hyperbard package¶

Submodules¶

hyperbard.compute_rawdata_xml_statistics module¶

hyperbard.compute_rawdata_xml_statistics.random() → x in the interval [0, 1).¶

hyperbard.compute_rawdata_xml_statistics.generate_xml_statistics(xml_counter, counted_name, files)¶

hyperbard.compute_rawdata_xml_statistics.count_paths(files)¶

hyperbard.compute_rawdata_xml_statistics.count_tags(files)¶

hyperbard.compute_rawdata_xml_statistics.count_attributes(files)¶

hyperbard.compute_rawdata_xml_statistics.make_tag_url(tag)¶

hyperbard.compute_rawdata_xml_statistics.get_tag_description(tag_url)¶

hyperbard.compute_rawdata_xml_statistics.retrieve_tag_descriptions()¶

hyperbard.create_graph_representations module¶

hyperbard.create_hypergraph_representations module¶

hyperbard.graph_io module¶

hyperbard.graph_representations module¶

hyperbard.graph_representations.get_weighted_multigraph(df: DataFrame, groupby: list) → MultiGraph¶

Create a weighted multigraph from an aggregated dataframe, with edges resolved at the level given by the groupby argument, where multiedges are kept, and n_tokens and n_lines are potential weights.

Representations: ce-{scene, group}-{mb,mw}

Parameters

df – pd.DataFrame generated from an .agg.csv file
groupby – [“act”, “scene”] -> one edge per act and scene, [“act”, “scene”, “stagegroup”] -> one edge per act, scene, and stagegroup

Returns

nx.MultiGraph corresponding to the specified groupby

hyperbard.graph_representations.get_count_weighted_graph(df: DataFrame, groupby: list)¶

Create a count-weighted graph from an aggregated dataframe, with edges resolved at the level given by the groupby argument, where multiedges are _not_ kept, and counts are potential weights.

Representations: ce-{act,group}-{b,w}

Parameters: groupby – [“act”, “scene”] -> one edge per act and scene, [“act”, “scene”, “stagegroup”] -> one edge per act, scene, and stagegroup
Returns: nx.Graph corresponding to the specified groupby
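
As an illustration of the count-weighted idea, here is a minimal sketch that uses only the standard library rather than pandas or networkx. It assumes that each group selected by groupby contributes one edge for every pair of characters in it, and that an edge's weight is the number of groups in which the pair co-occurs. The name count_weighted_edges and the shape of the groups mapping are hypothetical and not part of hyperbard.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple


def count_weighted_edges(groups: Dict[Tuple[int, int], Set[str]]) -> Counter:
    """Count, for each unordered character pair, the groups in which both appear."""
    counts: Counter = Counter()
    for characters in groups.values():
        # Sorting gives every pair a canonical order, so (a, b) and (b, a) coincide.
        for pair in combinations(sorted(characters), 2):
            counts[pair] += 1
    return counts


# Hypothetical input: (act, scene) -> characters present in that scene.
groups = {
    (1, 1): {"Romeo", "Benvolio"},
    (1, 4): {"Romeo", "Benvolio", "Mercutio"},
}
edge_counts = count_weighted_edges(groups)
for pair, count in sorted(edge_counts.items()):
    print(pair, count)
```

A multigraph variant would instead keep one edge per group and pair, rather than collapsing repeated pairs into a single count.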

hyperbard.graph_representations.format_text_unit_node(elem)¶

hyperbard.graph_representations.get_bipartite_graph(df: DataFrame, groupby: list) → Union[Graph, MultiDiGraph]¶

Create a weighted bipartite graph from an aggregated dataframe, with play-part nodes resolved at the level given by the groupby argument, where n_tokens and n_lines are potential weights.

Representations: se-{scene, group}-{b,w}, se-speech-mwd

Parameters

df – pd.DataFrame generated from an .agg.csv file
groupby – [“act”, “scene”] -> one play part node per act and scene, [“act”, “scene”, “stagegroup”] -> one play part node per act, scene, and stagegroup, [“act”, “scene”, “stagegroup”, “setting”, “speaker”] -> one play part node per act, scene, and stagegroup, directed edges for speech acts/information flow

Returns

nx.Graph (if groupby is not by speech act) or nx.MultiDiGraph (if groupby is by speech act)

hyperbard.graph_representations.get_weighted_bipartite_graph(df: DataFrame, groupby: list) → DiGraph¶

hyperbard.hypergraph_representations module¶

hyperbard.plot_graph_rankings module¶

hyperbard.plot_hypergraph_rankings module¶

hyperbard.plot_rank_correlations module¶

hyperbard.plot_rank_correlations.get_correlation_dfs(ranking_files)¶

hyperbard.plot_rank_correlations.get_average_correlation(corrs)¶

hyperbard.plot_rank_correlations.get_asymmetric_correlation_difference(first_corr_df, second_corr_df)¶

hyperbard.plot_rank_correlations.plot_correlation_difference_matrix(selected_correlation, difference_correlation, selected_name)¶

hyperbard.plot_romeo module¶

hyperbard.plot_toy module¶

hyperbard.plotting module¶

hyperbard.plotting_utils module¶

hyperbard.plotting_utils.set_rcParams(fontsize=None)¶

hyperbard.plotting_utils.save_pgf_fig(path, axis_off=False, tight=False)¶

hyperbard.plotting_utils.get_character_color(k)¶

hyperbard.plotting_utils.get_formatted_labels(G, selected_labels)¶

hyperbard.preprocessing module¶

hyperbard.preprocessing.get_soup(file: str, parser: str = 'lxml-xml') → BeautifulSoup¶

Parse an XML or HTML document with the specified BeautifulSoup parser.

Parameters

file – Path to file
parser – Parser to use

Returns

BeautifulSoup object containing the parsed file

hyperbard.preprocessing.get_cast_df(file: str) → DataFrame¶

hyperbard.preprocessing.get_body(soup: BeautifulSoup) → Tag¶

Extract the text body from an appropriately shaped BeautifulSoup object.

Parameters: soup – BeautifulSoup with exactly one text.body object
Returns: The text body as a BeautifulSoup object

hyperbard.preprocessing.is_leaf(elem: Tag) → bool¶

Check if a bs4 Tag element has at most one child, i.e., if it is a leaf in the Tag tree (single children are NavigableStrings).

Parameters: elem – bs4 Tag element
Returns: Whether the element has at most one child

hyperbard.preprocessing.get_attrs(elem: Tag) → dict¶

Get the attributes of a bs4 Tag element, plus its tag name and - if the element is a leaf - its text, as a dictionary.

Parameters: elem – bs4 Tag element
Returns: Dictionary containing the element’s tag name, attributes, and - if the element is a leaf - text

hyperbard.preprocessing.is_navigable_string(elem: PageElement) → bool¶

hyperbard.preprocessing.is_redundant_element(elem: Union[Tag, NavigableString]) → bool¶

Check if an element is redundant, i.e., if it is (contained in) a “head” or a “speaker” element. These elements are redundant because the information contained in them and their descendants is already encoded in XML attributes of other tags.

Parameters: elem – bs4 element
Returns: Whether the element is redundant

hyperbard.preprocessing.is_descendant_of_redundant_element(elem: Union[Tag, NavigableString]) → bool¶

hyperbard.preprocessing.keep_elem_in_xml_df(elem: Union[Tag, NavigableString]) → bool¶

Decide whether to keep a bs4 element in the XML dataframe produced as the raw preprocessed data.

Parameters: elem – bs4 element from a BeautifulSoup object
Returns: Whether to keep the element

hyperbard.preprocessing.get_xml_df(body: Tag) → DataFrame¶

Construct a pd.DataFrame from the non-redundant XML tags of a TEI-encoded BeautifulSoup object.

Parameters: body – Body of a TEI-encoded BeautifulSoup object
Returns: pd.DataFrame containing all non-redundant XML tags with names, attributes, and text

hyperbard.preprocessing.set_act(df: DataFrame) → None¶

Adds act information to a pd.DataFrame created with get_xml_df, using the observation that rows with type == “act” hold the new act number in column “n”. Missing values are completed by first forward-filling, then backward-filling, and act numbers are converted to integers. Inductions and prologues receive special treatment as act 0, epilogues as act 6.

Parameters: df – pd.DataFrame created with get_xml_df
Returns: None
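
The fill order described above can be sketched without pandas. In this minimal illustration, None stands for a row without an act number. The helper name fill_forward_then_backward and the sample list are hypothetical, and the actual function works on DataFrame columns.

```python
from typing import List, Optional


def fill_forward_then_backward(values: List[Optional[int]]) -> List[Optional[int]]:
    """Fill gaps with the previous known value, then leading gaps with the next one."""
    filled = list(values)
    # Forward fill: each gap takes the most recent value seen before it.
    last = None
    for i, value in enumerate(filled):
        if value is None:
            filled[i] = last
        else:
            last = value
    # Backward fill: gaps that remain at the start take the first value after them.
    following = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = following
        else:
            following = filled[i]
    return filled


# Act markers sit at indices 2 and 5; the first two rows precede any marker.
acts = [None, None, 1, None, None, 2, None]
print(fill_forward_then_backward(acts))  # [1, 1, 1, 1, 1, 2, 2]
```

Forward-filling first means that rows between two markers inherit the earlier act, and only rows before the first marker are filled from the act that follows them.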

hyperbard.preprocessing.set_scene(df: DataFrame) → None¶

Adds scene information to a pd.DataFrame created with get_xml_df, using the observation that rows with type == “scene” hold the new scene number in column “n”. Missing values are completed by first forward-filling, then backward-filling, and scene numbers are converted to integers. Acts, inductions, prologues, and epilogues receive special treatment as scene 0.

Parameters: df – pd.DataFrame created with get_xml_df
Returns: None

hyperbard.preprocessing.is_entrance(row)¶

hyperbard.preprocessing.is_exit(row)¶

hyperbard.preprocessing.has_speaker(row)¶

hyperbard.preprocessing.is_new_act(row, prev_act)¶

hyperbard.preprocessing.is_new_scene(row, prev_scene)¶

hyperbard.preprocessing.set_onstage(df: DataFrame) → None¶

Adds information on who is onstage to a pd.DataFrame created with get_xml_df, primarily based on hints in the XML attributes of “stage” and “sp” tags.

Notes:

We ensure that the speaker(s) are always onstage. Note that there can be multiple speakers for one line, hence the need to treat the “who” attribute as a set.
We flush characters when a new act starts. Rationale: Limit repercussions of encoding “errors” in stage directions, found, e.g., when characters are dead or unconscious and not marked as exiting. Example: R&J - Juliet not marked up as exiting at the end of Act IV
We _also_ flush characters when a new scene starts. Rationale: The same as for flushing when a new act starts, but somewhat more problematic. Example: R&J - Citizen from Act III, Scene I never marked up as exiting, and thus still onstage in Act III, Scene V (on the balcony!).
Thus, we currently model character presence on stage “conservatively” overall, and we are looking into better character management (not relying on the markup) as a potential improvement.
Flushing when a new scene starts is problematic: Stage directions in the Folger Shakespeare often use “Exeunt all but”, and as a consequence, only exits are marked up and not entries in the next scene. Example: Julius Caesar - Brutus and Cassius not marked up to enter in Act IV Scene III, but rather staying from Act IV Scene II (stage directions differ from the Oxford Shakespeare).
Even flushing when a new act starts is problematic with the Folger stage directions, but the problematic instances are very rare. We limit the impact of errors introduced by this modeling choice by also ensuring that the speaker is always onstage.

Parameters: df – pd.DataFrame created with get_xml_df, with act and scene already annotated
Returns: None
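
The rules in the notes above (flush on a new act or scene, and always keep speakers onstage) can be sketched as follows. This is a simplified illustration, not the function's implementation. The row shape (keys act, scene, kind, who) and the name track_onstage are hypothetical, whereas the real function reads the XML attributes of “stage” and “sp” tags from a DataFrame.

```python
from typing import Dict, List, Set


def track_onstage(rows: List[Dict]) -> List[Set[str]]:
    """Return the set of characters onstage after each row (hypothetical row shape)."""
    onstage: Set[str] = set()
    previous_part = None
    snapshots = []
    for row in rows:
        part = (row["act"], row["scene"])
        if part != previous_part:
            # Flush on every new act or scene to limit the effect of missing exits.
            onstage = set()
            previous_part = part
        who = set(row["who"].split())  # "who" may name several characters
        if row["kind"] == "exit":
            onstage -= who
        else:
            # Entrances add characters; speeches guarantee the speaker is onstage.
            onstage |= who
        snapshots.append(set(onstage))
    return snapshots


rows = [
    {"act": 3, "scene": 1, "kind": "entrance", "who": "#Citizen #Benvolio"},
    {"act": 3, "scene": 1, "kind": "speech", "who": "#Citizen"},
    {"act": 3, "scene": 5, "kind": "entrance", "who": "#Romeo #Juliet"},
    {"act": 3, "scene": 5, "kind": "speech", "who": "#Nurse"},  # no marked entrance
    {"act": 3, "scene": 5, "kind": "exit", "who": "#Romeo"},
]
for snapshot in track_onstage(rows):
    print(sorted(snapshot))
```

In this toy input the Citizen never exits, yet is absent from scene 5 because of the flush, and the Nurse is onstage from her first speech although no entrance is marked for her.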

hyperbard.preprocessing.set_stagegroup(df: DataFrame) → None¶

hyperbard.preprocessing.get_who_attributes(elem: Tag) → Union[str, float]¶

hyperbard.preprocessing.get_descendants_ids(elem: Tag) → List[str]¶

hyperbard.preprocessing.set_speaker(df: DataFrame, body: Tag) → None¶

hyperbard.preprocessing.get_raw_xml_df(file: str) → DataFrame¶

Construct and enrich a pd.DataFrame from the non-redundant XML tags of a TEI-encoded BeautifulSoup object.

Produces a DataFrame object of the shape of the *.raw.csv files.

Parameters: file – Path to file
Returns: pd.DataFrame containing all non-redundant XML tags with names, attributes, text, and annotations

hyperbard.preprocessing.get_aggregated(df: DataFrame) → DataFrame¶

Given a pd.DataFrame output by get_raw_xml_df, produce a pd.DataFrame containing only spoken words, aggregated by speech acts, i.e., consecutive settings with the same speaker and the same other characters on stage.

Parameters: df – pd.DataFrame output by get_raw_xml_df
Returns: pd.DataFrame containing only spoken words, aggregated by speech acts
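
To illustrate what aggregating by consecutive runs means, here is a minimal sketch using itertools.groupby, which merges only adjacent items with equal keys. The row shape, the n_tokens field, and the name aggregate_speech_acts are assumptions made for this example. The actual function operates on a DataFrame and may aggregate further columns.

```python
from itertools import groupby
from typing import Dict, List


def aggregate_speech_acts(rows: List[Dict]) -> List[Dict]:
    """Merge adjacent word rows that share both speaker and onstage characters."""
    def key(row: Dict):
        return (row["speaker"], row["onstage"])

    return [
        {"speaker": speaker, "onstage": onstage, "n_tokens": sum(1 for _ in run)}
        for (speaker, onstage), run in groupby(rows, key=key)
    ]


# One hypothetical row per spoken word.
words = [
    {"speaker": "#Romeo", "onstage": "#Juliet #Romeo"},
    {"speaker": "#Romeo", "onstage": "#Juliet #Romeo"},
    {"speaker": "#Juliet", "onstage": "#Juliet #Romeo"},
    {"speaker": "#Romeo", "onstage": "#Juliet #Romeo"},
    {"speaker": "#Romeo", "onstage": "#Juliet #Nurse #Romeo"},
]
speech_acts = aggregate_speech_acts(words)
for act in speech_acts:
    print(act)
```

The five rows yield four speech acts. Romeo's second turn is not merged with his first because Juliet speaks in between, and it splits again when the characters onstage change.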

hyperbard.preprocessing.set_setting(aggregated)¶

hyperbard.preprocessing.get_grouped_df(aggregated)¶

hyperbard.preprocessing.get_agg_xml_df(df: DataFrame) → DataFrame¶

Given a pd.DataFrame output by get_raw_xml_df, produce a pd.DataFrame containing only spoken words, aggregated by speech acts, i.e., consecutive settings with the same speaker and the same other characters on stage, with full setting annotations.

Produces a DataFrame object of the shape of the *.agg.csv files.

Parameters: df – pd.DataFrame output by get_raw_xml_df
Returns: pd.DataFrame containing only spoken words, aggregated by speech acts

hyperbard.ranking module¶

hyperbard.raw_summary_statistics module¶

hyperbard.run_preprocessing module¶

hyperbard.statics module¶

hyperbard.utils module¶

hyperbard.utils.character_string_to_sorted_list(character_string: str) → List[str]¶

Given a string of character identifiers of shape “id1 id2 id3 … id1 idn”, return a sorted, deduplicated list of these identifiers.

Parameters: character_string – String of character identifiers separated by whitespace
Returns: Sorted list of unique character identifiers
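
The described behavior can be reproduced with a short sketch. The function name below is a stand-in chosen for this example and the sample identifiers are made up, so treat it as an illustration rather than the library's code.

```python
from typing import List


def character_string_to_sorted_list_sketch(character_string: str) -> List[str]:
    # split() without arguments splits on any run of whitespace;
    # set() drops repeated identifiers; sorted() returns them in ascending order.
    return sorted(set(character_string.split()))


print(character_string_to_sorted_list_sketch("#Romeo_Rom #Juliet_Rom #Romeo_Rom"))
# ['#Juliet_Rom', '#Romeo_Rom']
```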

hyperbard.utils.get_name_from_identifier(character_identifier: str) → str¶

Given a character identifier of shape “#CharacterName_PlayAbbreviation”, extract CharacterName.

Parameters: character_identifier – Character identifier of shape “#CharacterName_PlayAbbreviation”
Returns: Character name
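
For the identifier shape given above, the extraction could be sketched as follows. This assumes that the play abbreviation is everything after the last underscore. It does not attempt any further normalization the real function may apply, such as handling uppercase prefixes. The function name is hypothetical.

```python
def get_name_sketch(character_identifier: str) -> str:
    # Drop a single leading "#", if present.
    if character_identifier.startswith("#"):
        character_identifier = character_identifier[1:]
    # Assumption: the play abbreviation follows the last underscore.
    return character_identifier.rsplit("_", 1)[0]


print(get_name_sketch("#CharacterName_PlayAbbreviation"))  # CharacterName
```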

hyperbard.utils.remove_hashtag(identifier: str) → str¶

hyperbard.utils.remove_play_abbreviation(identifier: str) → str¶

hyperbard.utils.remove_uppercase_prefixes(identifier: str) → str¶

hyperbard.utils.sort_join_strings(string_iterable: Iterable[str]) → str¶

Sort and concatenate an iterable of strings, joining on a whitespace character.

Parameters: string_iterable – Iterable of strings (e.g., a list or a set)
Returns: String with the entries in the iterable sorted and concatenated with a whitespace as the join character

hyperbard.utils.get_filename_base(file: str, full: bool = True) → str¶

Given a file name of shape “path/to/PlayName_XMLFlavor_Source.ext”, extract PlayName (if not full) or PlayName_XMLFlavor_Source (if full).

Parameters

file – Path of shape “path/to/PlayName_XMLFlavor_Source.ext”
full – Return “_XMLFlavor_Source” as part of the file name

Returns

String of shape “PlayName(_XMLFlavor_Source)”
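
A minimal sketch of this behavior, assuming (as the parameter and return descriptions state) that full keeps the “_XMLFlavor_Source” suffix and that the play name itself contains no underscore. The function name is a stand-in for this example.

```python
import os


def get_filename_base_sketch(file: str, full: bool = True) -> str:
    # Strip the directory part and the extension.
    stem = os.path.splitext(os.path.basename(file))[0]
    # Assumption: the play name is everything before the first underscore.
    return stem if full else stem.split("_")[0]


path = "path/to/PlayName_XMLFlavor_Source.ext"
print(get_filename_base_sketch(path))              # PlayName_XMLFlavor_Source
print(get_filename_base_sketch(path, full=False))  # PlayName
```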

hyperbard.utils.string_to_set(character_string: Union[str, float]) → Union[set, float]¶

Given a string of character identifiers of shape “id1 id2 id3 … id1 idn”, or nan, return a set of the identifiers or nan.

Parameters: character_string – string of character identifiers of shape “id1 id2 id3 … id1 idn” or nan
Returns: set of identifiers or nan
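
The nan pass-through matters because missing values in a pandas column typically surface as float nan rather than as strings. A minimal sketch of the described behavior, with a hypothetical function name:

```python
import math
from typing import Union


def string_to_set_sketch(character_string: Union[str, float]) -> Union[set, float]:
    # Missing values arrive as float nan and are returned unchanged.
    if isinstance(character_string, float) and math.isnan(character_string):
        return character_string
    # Otherwise split on whitespace; the set removes repeated identifiers.
    return set(character_string.split())


print(sorted(string_to_set_sketch("#Romeo_Rom #Juliet_Rom #Romeo_Rom")))
# ['#Juliet_Rom', '#Romeo_Rom']
print(string_to_set_sketch(float("nan")))  # nan
```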

Module contents¶
