hyperbard package¶
Submodules¶
hyperbard.compute_rawdata_xml_statistics module¶
- hyperbard.compute_rawdata_xml_statistics.random() → x in the interval [0, 1).¶
- hyperbard.compute_rawdata_xml_statistics.generate_xml_statistics(xml_counter, counted_name, files)¶
- hyperbard.compute_rawdata_xml_statistics.count_paths(files)¶
- hyperbard.compute_rawdata_xml_statistics.count_tags(files)¶
- hyperbard.compute_rawdata_xml_statistics.count_attributes(files)¶
- hyperbard.compute_rawdata_xml_statistics.make_tag_url(tag)¶
- hyperbard.compute_rawdata_xml_statistics.get_tag_description(tag_url)¶
- hyperbard.compute_rawdata_xml_statistics.retrieve_tag_descriptions()¶
hyperbard.create_graph_representations module¶
hyperbard.create_hypergraph_representations module¶
hyperbard.graph_io module¶
hyperbard.graph_representations module¶
- hyperbard.graph_representations.get_weighted_multigraph(df: DataFrame, groupby: list) → MultiGraph¶
Create a weighted multigraph from an aggregated dataframe, with edges resolved at the level given by the groupby argument, where multiedges are kept, and n_tokens and n_lines are potential weights.
Representations: ce-{scene, group}-{mb,mw}
- Parameters
df – pd.DataFrame generated from an .agg.csv file
groupby – [“act”, “scene”] -> one edge per act and scene, [“act”, “scene”, “stagegroup”] -> one edge per act, scene, and stagegroup
- Returns
nx.MultiGraph corresponding to the specified groupby
- hyperbard.graph_representations.get_count_weighted_graph(df: DataFrame, groupby: list)¶
Create a count-weighted graph from an aggregated dataframe, with edges resolved at the level given by the groupby argument, where multiedges are _not_ kept, and counts are potential weights.
Representations: ce-{act,group}-{b,w}
- Parameters
df – pd.DataFrame generated from an .agg.csv file
groupby – [“act”, “scene”] -> one edge per act and scene, [“act”, “scene”, “stagegroup”] -> one edge per act, scene, and stagegroup
- Returns
nx.Graph corresponding to the specified groupby
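As a rough illustration of what count weighting could look like, the sketch below counts, for each pair of characters, the number of groups in which both appear. It uses plain dictionaries instead of pandas and networkx. The row layout (an `onstage` field with whitespace-separated identifiers) and the name `count_cooccurrences` are assumptions made for this example, not the library's implementation.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(rows, groupby):
    """Count in how many groups each pair of characters shares the stage."""
    # Collect the characters present in each group (e.g., each act and scene).
    groups = {}
    for row in rows:
        key = tuple(row[col] for col in groupby)
        groups.setdefault(key, set()).update(row["onstage"].split())
    # Every pair within a group contributes one count to its (single) edge.
    counts = Counter()
    for characters in groups.values():
        for pair in combinations(sorted(characters), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: two rows in act 1, scene 1, and one row in scene 2.
rows = [
    {"act": 1, "scene": 1, "onstage": "#A #B"},
    {"act": 1, "scene": 1, "onstage": "#A #B #C"},
    {"act": 1, "scene": 2, "onstage": "#A #B"},
]
edge_counts = count_cooccurrences(rows, ["act", "scene"])
```

Passing a finer groupby such as [“act”, “scene”, “stagegroup”] would only change the grouping key, provided the rows carry a `stagegroup` field.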
- hyperbard.graph_representations.format_text_unit_node(elem)¶
- hyperbard.graph_representations.get_bipartite_graph(df: DataFrame, groupby: list) → Union[Graph, MultiDiGraph]¶
Create a weighted bipartite graph from an aggregated dataframe, with play-part nodes resolved at the level given by the groupby argument, where n_tokens and n_lines are potential weights.
Representations: se-{scene, group}-{b,w}, se-speech-mwd
- Parameters
df – pd.DataFrame generated from an .agg.csv file
groupby – [“act”, “scene”] -> one play part node per act and scene, [“act”, “scene”, “stagegroup”] -> one play part node per act, scene, and stagegroup, [“act”, “scene”, “stagegroup”, “setting”, “speaker”] -> one play part node per act, scene, and stagegroup, directed edges for speech acts/information flow
- Returns
nx.Graph (if groupby is not by speech act) or nx.MultiDiGraph (if groupby is by speech act)
- hyperbard.graph_representations.get_weighted_bipartite_graph(df: DataFrame, groupby: list) → DiGraph¶
hyperbard.hypergraph_representations module¶
hyperbard.plot_graph_rankings module¶
hyperbard.plot_hypergraph_rankings module¶
hyperbard.plot_rank_correlations module¶
- hyperbard.plot_rank_correlations.get_correlation_dfs(ranking_files)¶
- hyperbard.plot_rank_correlations.get_average_correlation(corrs)¶
- hyperbard.plot_rank_correlations.get_asymmetric_correlation_difference(first_corr_df, second_corr_df)¶
- hyperbard.plot_rank_correlations.plot_correlation_difference_matrix(selected_correlation, difference_correlation, selected_name)¶
hyperbard.plot_romeo module¶
hyperbard.plot_toy module¶
hyperbard.plotting module¶
hyperbard.plotting_utils module¶
- hyperbard.plotting_utils.set_rcParams(fontsize=None)¶
- hyperbard.plotting_utils.save_pgf_fig(path, axis_off=False, tight=False)¶
- hyperbard.plotting_utils.get_character_color(k)¶
- hyperbard.plotting_utils.get_formatted_labels(G, selected_labels)¶
hyperbard.preprocessing module¶
- hyperbard.preprocessing.get_soup(file: str, parser: str = 'lxml-xml') → BeautifulSoup¶
Parse an XML or HTML document with the specified BeautifulSoup parser.
- Parameters
file – Path to file
parser – Parser to use
- Returns
BeautifulSoup object containing the parsed file
- hyperbard.preprocessing.get_cast_df(file: str) → DataFrame¶
- hyperbard.preprocessing.get_body(soup: BeautifulSoup) → Tag¶
Extract the text body from an appropriately shaped BeautifulSoup object.
- Parameters
soup – BeautifulSoup with exactly one text.body object
- Returns
The text body as a BeautifulSoup object
- hyperbard.preprocessing.is_leaf(elem: Tag) → bool¶
Check if a bs4 Tag element has at most one child, i.e., if it is a leaf in the Tag tree (single children are NavigableStrings).
- Parameters
elem – bs4 Tag element
- Returns
True if the element has at most one child
- hyperbard.preprocessing.get_attrs(elem: Tag) → dict¶
Get the attributes of a bs4 Tag element, plus its tag name and - if the element is a leaf - its text, as a dictionary.
- Parameters
elem – bs4 Tag element
- Returns
Dictionary containing the element’s tag name, attributes, and - if the element is a leaf - text
- hyperbard.preprocessing.is_redundant_element(elem: Union[Tag, NavigableString]) → bool¶
Check if an element is redundant, i.e., if it is (contained in) a “head” or a “speaker” element. These elements are redundant because the information contained in them and their descendants is already encoded in XML attributes of other tags.
- Parameters
elem – bs4 element
- Returns
True if the element is redundant
- hyperbard.preprocessing.is_descendant_of_redundant_element(elem: Union[Tag, NavigableString]) → bool¶
- hyperbard.preprocessing.keep_elem_in_xml_df(elem: Union[Tag, NavigableString]) → bool¶
Decide whether to keep a bs4 element in the XML dataframe produced as the raw preprocessed data.
- Parameters
elem – bs4 element from a BeautifulSoup object
- Returns
Whether to keep the element
- hyperbard.preprocessing.get_xml_df(body: Tag) → DataFrame¶
Construct a pd.DataFrame from the non-redundant XML tags of a TEI-encoded BeautifulSoup object.
- Parameters
body – Body of a TEI-encoded BeautifulSoup object
- Returns
pd.DataFrame containing all non-redundant XML tags with names, attributes, and text
- hyperbard.preprocessing.set_act(df: DataFrame) → None¶
Adds act information to a pd.DataFrame created with get_xml_df, using the observation that rows with type == “act” hold the new act number in column “n”. Missing values are completed by forward-filling and then backward-filling, and act numbers are converted to integers. Inductions, prologues, and epilogues receive special treatment: inductions and prologues are assigned act 0, and epilogues act 6.
- Parameters
df – pd.DataFrame created with get_xml_df
- Returns
None
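The fill order described for set_act can be illustrated without pandas. The sketch below is a minimal stand-in under stated assumptions: missing values are represented as None, at least one value is known, and the helper name `fill_forward_then_backward` is hypothetical. It does not cover the special handling of inductions, prologues, and epilogues.

```python
def fill_forward_then_backward(values):
    """Fill None entries with the previous known value, then the next one."""
    filled = list(values)
    # Forward pass: carry the last seen value to later rows.
    last = None
    for i, value in enumerate(filled):
        if value is None:
            filled[i] = last
        else:
            last = value
    # Backward pass: rows before the first marker take the first known value.
    following = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = following
        else:
            following = filled[i]
    # Assumes at least one known value, so no None remains at this point.
    return [int(value) for value in filled]


# Rows before the first act marker inherit act 1 through the backward pass.
acts = fill_forward_then_backward([None, "1", None, None, "2", None])
```

On a pandas column, chaining `ffill()` and `bfill()` is the usual way to get the same fill order.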
- hyperbard.preprocessing.set_scene(df: DataFrame) → None¶
Adds scene information to a pd.DataFrame created with get_xml_df, using the observation that rows with type == “scene” hold the new scene number in column “n”. Missing values are completed by forward-filling and then backward-filling, and scene numbers are converted to integers. Acts, inductions, prologues, and epilogues receive special treatment as scene 0.
- Parameters
df – pd.DataFrame created with get_xml_df
- Returns
None
- hyperbard.preprocessing.is_entrance(row)¶
- hyperbard.preprocessing.is_exit(row)¶
- hyperbard.preprocessing.has_speaker(row)¶
- hyperbard.preprocessing.is_new_act(row, prev_act)¶
- hyperbard.preprocessing.is_new_scene(row, prev_scene)¶
- hyperbard.preprocessing.set_onstage(df: DataFrame) → None¶
Adds information on who is onstage to a pd.DataFrame created with get_xml_df, primarily based on hints in the XML attributes of “stage” and “sp” tags.
Notes:
We ensure that the speaker(s) are always onstage. Note that there can be multiple speakers for one line, hence the need to treat the “who” attribute as a set.
We flush characters when a new act starts. Rationale: Limit repercussions of encoding “errors” in stage directions, found, e.g., when characters are dead or unconscious and not marked as exiting. Example: R&J - Juliet not marked up as exiting at the end of Act IV
We _also_ flush characters when a new scene starts. Rationale: The same as for flushing when a new act starts, but somewhat more problematic. Example: R&J - Citizen from Act III, Scene I never marked up as exiting, and thus still onstage in Act III, Scene V (on the balcony!).
Thus, we currently model character presence on stage “conservatively” overall, and we are looking into better character management (not relying on the markup) as a potential improvement.
Flushing when a new scene starts is problematic: Stage directions in the Folger Shakespeare often use “Exeunt all but”, and as a consequence, only exits are marked up and not entries in the next scene. Example: Julius Caesar - Brutus and Cassius not marked up to enter in Act IV Scene III, but rather staying from Act IV Scene II (stage directions differ from the Oxford Shakespeare).
Even flushing when a new act starts is problematic with the Folger stage directions, but the problematic instances are very rare. We limit the impact of errors introduced by this modeling choice by also ensuring that the speaker is always onstage.
- Parameters
df – pd.DataFrame created with get_xml_df, with act and scene already annotated
- Returns
None
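The rules in the notes above (flush on a new act or scene, always keep speakers onstage) can be sketched over plain dictionaries. The row layout used here, with `kind` and `who` fields, and the function name `track_onstage` are assumptions made for illustration; the library itself works on the DataFrame produced by get_xml_df.

```python
def track_onstage(rows):
    """Return the set of characters on stage after each row."""
    onstage = set()
    previous_part = None
    result = []
    for row in rows:
        part = (row["act"], row["scene"])
        # Flush everyone when a new act or a new scene starts.
        if part != previous_part:
            onstage = set()
            previous_part = part
        # "who" may name several characters, so treat it as a set.
        who = set(row.get("who", "").split())
        if row["kind"] == "entrance":
            onstage |= who
        elif row["kind"] == "exit":
            onstage -= who
        elif row["kind"] == "speech":
            # Speakers are always onstage, even without a marked entrance.
            onstage |= who
        result.append(set(onstage))
    return result


# Hypothetical rows: #C speaks without an entrance, #B stays across scenes.
rows = [
    {"act": 1, "scene": 1, "kind": "entrance", "who": "#A #B"},
    {"act": 1, "scene": 1, "kind": "speech", "who": "#C"},
    {"act": 1, "scene": 1, "kind": "exit", "who": "#A"},
    {"act": 1, "scene": 2, "kind": "speech", "who": "#B"},
]
presence = track_onstage(rows)
```

In the last row, #B is onstage only because of the speaker rule: the scene change flushed the stage and no entrance is marked, which mirrors the Julius Caesar situation described in the notes.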
- hyperbard.preprocessing.set_stagegroup(df: DataFrame) → None¶
- hyperbard.preprocessing.get_who_attributes(elem: Tag) → Union[str, float]¶
- hyperbard.preprocessing.get_descendants_ids(elem: Tag) → List[str]¶
- hyperbard.preprocessing.set_speaker(df: DataFrame, body: Tag) → None¶
- hyperbard.preprocessing.get_raw_xml_df(file: str) → DataFrame¶
Construct and enrich a pd.DataFrame from the non-redundant XML tags of a TEI-encoded BeautifulSoup object.
Produces a DataFrame object of the shape of the *.raw.csv files.
- Parameters
file – Path to file
- Returns
pd.DataFrame containing all non-redundant XML tags with names, attributes, text, and annotations
- hyperbard.preprocessing.get_aggregated(df: DataFrame) → DataFrame¶
Given a pd.DataFrame output by get_raw_xml_df, produce a pd.DataFrame containing only spoken words, aggregated by speech acts, i.e., consecutive settings with the same speaker and the same other characters on stage.
- Parameters
df – pd.DataFrame output by get_raw_xml_df
- Returns
pd.DataFrame containing only spoken words, aggregated by speech acts
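One way to picture aggregation by speech acts is to merge consecutive rows that share the same speaker and the same characters on stage. The sketch below does this with itertools.groupby over plain dictionaries. It assumes one row per spoken word with `speaker`, `onstage`, and `line` fields; these field names and the function name `aggregate_speech` are hypothetical and not taken from the library.

```python
from itertools import groupby


def aggregate_speech(rows):
    """Merge consecutive rows with the same speaker and onstage characters."""
    aggregated = []
    # groupby only merges adjacent rows, which matches "consecutive".
    for (speaker, onstage), group in groupby(
        rows, key=lambda row: (row["speaker"], row["onstage"])
    ):
        group = list(group)
        aggregated.append(
            {
                "speaker": speaker,
                "onstage": onstage,
                "n_lines": len({row["line"] for row in group}),
                "n_tokens": len(group),
            }
        )
    return aggregated


# Hypothetical word-level rows: #A speaks, #B answers, #A speaks again.
rows = [
    {"speaker": "#A", "onstage": "#A #B", "line": 1},
    {"speaker": "#A", "onstage": "#A #B", "line": 1},
    {"speaker": "#A", "onstage": "#A #B", "line": 2},
    {"speaker": "#B", "onstage": "#A #B", "line": 3},
    {"speaker": "#A", "onstage": "#A #B", "line": 4},
]
speech_acts = aggregate_speech(rows)
```

The two speeches by #A stay separate because #B speaks in between, so the result has three speech acts rather than two.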
- hyperbard.preprocessing.set_setting(aggregated)¶
- hyperbard.preprocessing.get_grouped_df(aggregated)¶
- hyperbard.preprocessing.get_agg_xml_df(df: DataFrame) → DataFrame¶
Given a pd.DataFrame output by get_raw_xml_df, produce a pd.DataFrame containing only spoken words, aggregated by speech acts, i.e., consecutive settings with the same speaker and the same other characters on stage, with full setting annotations.
Produces a DataFrame object of the shape of the *.agg.csv files.
- Parameters
df – pd.DataFrame output by get_raw_xml_df
- Returns
pd.DataFrame containing only spoken words, aggregated by speech acts
hyperbard.ranking module¶
hyperbard.raw_summary_statistics module¶
hyperbard.run_preprocessing module¶
hyperbard.statics module¶
hyperbard.utils module¶
- hyperbard.utils.character_string_to_sorted_list(character_string: str) → List[str]¶
Given a string of character identifiers of shape “id1 id2 id3 … id1 idn”, return a sorted, deduplicated list of these identifiers.
- Parameters
character_string – String of character identifiers separated by whitespace
- Returns
Sorted list of unique character identifiers
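A behavior matching this description can be sketched in one line. The function below is an illustrative stand-in with a hypothetical name, not the library's source.

```python
def sorted_unique_identifiers(character_string):
    """Split on whitespace, drop duplicates, and sort the identifiers."""
    return sorted(set(character_string.split()))


identifiers = sorted_unique_identifiers("#Romeo_Rom #Juliet_Rom #Romeo_Rom")
```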
- hyperbard.utils.get_name_from_identifier(character_identifier: str) → str¶
Given a character identifier of shape “#CharacterName_PlayAbbreviation”, extract CharacterName.
- Parameters
character_identifier – Character identifier of shape “#CharacterName_PlayAbbreviation”
- Returns
Character name
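For the identifier shape given above, the extraction could look like the following sketch. It assumes that the play abbreviation is whatever follows the last underscore, and it does not attempt the prefix handling suggested by remove_uppercase_prefixes; the function name is hypothetical.

```python
def name_from_identifier_sketch(character_identifier):
    """Strip the leading '#' and the trailing '_PlayAbbreviation'."""
    without_hashtag = character_identifier.lstrip("#")
    # Assumption: the play abbreviation follows the last underscore.
    return without_hashtag.rsplit("_", 1)[0]


name = name_from_identifier_sketch("#Romeo_Rom")
```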
- hyperbard.utils.remove_hashtag(identifier: str) → str¶
- hyperbard.utils.remove_play_abbreviation(identifier: str) → str¶
- hyperbard.utils.remove_uppercase_prefixes(identifier: str) → str¶
- hyperbard.utils.sort_join_strings(string_iterable: Iterable[str]) → str¶
Sort and concatenate an iterable of strings, joining on a whitespace character.
- Parameters
string_iterable – Iterable of strings (e.g., a list or a set)
- Returns
String with the entries in the iterable sorted and concatenated with a whitespace as the join character
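The described behavior corresponds to a sort followed by a join. A minimal stand-in, with a hypothetical function name, might read:

```python
def sort_join_sketch(string_iterable):
    """Sort the strings and join them with a single space."""
    return " ".join(sorted(string_iterable))


joined = sort_join_sketch({"#Romeo_Rom", "#Juliet_Rom", "#Nurse_Rom"})
```

Sorting before joining makes the result independent of the iteration order of the input, which matters when the input is a set.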
- hyperbard.utils.get_filename_base(file: str, full: bool = True) → str¶
Given a file name of shape “path/to/PlayName_XMLFlavor_Source.ext”, extract PlayName (if not full) or PlayName_XMLFlavor (if full).
- Parameters
file – Path of shape “path/to/PlayName_XMLFlavor_Source.ext”
full – Return “_XMLFlavor_Source” as part of the file name
- Returns
String of shape “PlayName(_XMLFlavor_Source)”
- hyperbard.utils.string_to_set(character_string: Union[str, float]) → Union[set, float]¶
Given a string of character identifiers of shape “id1 id2 id3 … id1 idn”, or nan, return a set of the identifiers or nan.
- Parameters
character_string – string of character identifiers of shape “id1 id2 id3 … id1 idn” or nan
- Returns
set of identifiers or nan
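The nan pass-through is the part of this signature that is easy to overlook: missing values read from a CSV file typically arrive as float nan rather than as an empty string. The sketch below shows one way to handle both cases; the function name is hypothetical and the code is an illustration, not the library's source.

```python
import math


def string_to_set_sketch(character_string):
    """Return a set of identifiers, or pass nan through unchanged."""
    # Missing values are assumed to be represented as float nan.
    if isinstance(character_string, float) and math.isnan(character_string):
        return character_string
    return set(character_string.split())


present = string_to_set_sketch("#Romeo_Rom #Juliet_Rom #Romeo_Rom")
missing = string_to_set_sketch(float("nan"))
```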