hyperbard package

Submodules

hyperbard.compute_rawdata_xml_statistics module

hyperbard.compute_rawdata_xml_statistics.random()

Return x in the interval [0, 1).
hyperbard.compute_rawdata_xml_statistics.generate_xml_statistics(xml_counter, counted_name, files)
hyperbard.compute_rawdata_xml_statistics.count_paths(files)
hyperbard.compute_rawdata_xml_statistics.count_tags(files)
hyperbard.compute_rawdata_xml_statistics.count_attributes(files)
hyperbard.compute_rawdata_xml_statistics.make_tag_url(tag)
hyperbard.compute_rawdata_xml_statistics.get_tag_description(tag_url)
hyperbard.compute_rawdata_xml_statistics.retrieve_tag_descriptions()

hyperbard.create_graph_representations module

hyperbard.create_hypergraph_representations module

hyperbard.graph_io module

hyperbard.graph_representations module

hyperbard.graph_representations.get_weighted_multigraph(df: DataFrame, groupby: list) MultiGraph

Create a weighted multigraph from an aggregated dataframe, with edges resolved at the level given by the groupby argument, where multiedges are kept, and n_tokens and n_lines are potential weights.

Representations: ce-{scene, group}-{mb,mw}

Parameters
  • df – pd.DataFrame generated from an .agg.csv file

  • groupby – [“act”, “scene”] -> one edge per act and scene, [“act”, “scene”, “stagegroup”] -> one edge per act, scene, and stagegroup

Returns

nx.MultiGraph corresponding to the specified groupby
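The following is a minimal sketch of the idea described above, not the package's implementation. It uses plain dictionaries instead of pandas and networkx, and it assumes that each aggregated row carries a whitespace-separated `onstage` string next to `n_tokens` and `n_lines`. The function name and the edge layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def weighted_multigraph_edges_sketch(rows, groupby):
    """Return one (u, v, attrs) edge per character pair and group.

    Parallel edges between the same pair are kept, one per group.
    """
    totals = defaultdict(lambda: {"n_tokens": 0, "n_lines": 0})
    for row in rows:
        # The group key decides the resolution of the edges.
        key = tuple(row[col] for col in groupby)
        cast = sorted(set(row["onstage"].split()))
        for pair in combinations(cast, 2):
            totals[(key, pair)]["n_tokens"] += row["n_tokens"]
            totals[(key, pair)]["n_lines"] += row["n_lines"]
    return [(u, v, {"group": key, **w}) for (key, (u, v)), w in totals.items()]


# Hypothetical toy data in the assumed row layout.
rows = [
    {"act": 1, "scene": 1, "onstage": "A B", "n_tokens": 10, "n_lines": 2},
    {"act": 1, "scene": 1, "onstage": "A B C", "n_tokens": 5, "n_lines": 1},
    {"act": 1, "scene": 2, "onstage": "A B", "n_tokens": 7, "n_lines": 1},
]
edges = weighted_multigraph_edges_sketch(rows, ["act", "scene"])
```

With this toy input, A and B are connected by two parallel edges, one for each scene, each carrying its own token and line totals.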

hyperbard.graph_representations.get_count_weighted_graph(df: DataFrame, groupby: list)

Create a count-weighted graph from an aggregated dataframe, with edges resolved at the level given by the groupby argument, where multiedges are _not_ kept, and counts are potential weights.

Representations: ce-{act,group}-{b,w}

Parameters
  • df – pd.DataFrame generated from an .agg.csv file

  • groupby – [“act”, “scene”] -> one edge per act and scene, [“act”, “scene”, “stagegroup”] -> one edge per act, scene, and stagegroup

Returns

nx.Graph corresponding to the specified groupby
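To illustrate how collapsing multiedges into counts might work, here is a small stdlib-only sketch. It assumes rows with a whitespace-separated `onstage` string and counts, for every character pair, the number of groups in which the pair shares the stage. The function name and the row layout are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_weighted_edges_sketch(rows, groupby):
    """Map each character pair to the number of groups in which it co-occurs."""
    pairs_per_group = defaultdict(set)
    for row in rows:
        key = tuple(row[col] for col in groupby)
        cast = sorted(set(row["onstage"].split()))
        pairs_per_group[key].update(combinations(cast, 2))
    counts = Counter()
    for pairs in pairs_per_group.values():
        # Each pair contributes at most once per group.
        counts.update(pairs)
    return counts


# Hypothetical toy data in the assumed row layout.
rows = [
    {"act": 1, "scene": 1, "onstage": "A B"},
    {"act": 1, "scene": 1, "onstage": "A B C"},
    {"act": 1, "scene": 2, "onstage": "A B"},
]
counts = count_weighted_edges_sketch(rows, ["act", "scene"])
# counts[("A", "B")] is 2 because A and B meet in two different scenes.
```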

hyperbard.graph_representations.format_text_unit_node(elem)
hyperbard.graph_representations.get_bipartite_graph(df: DataFrame, groupby: list) Union[Graph, MultiDiGraph]

Create a weighted bipartite graph from an aggregated dataframe, with play-part nodes resolved at the level given by the groupby argument, where n_tokens and n_lines are potential weights.

Representations: se-{scene, group}-{b,w}, se-speech-mwd

Parameters
  • df – pd.DataFrame generated from an .agg.csv file

  • groupby – [“act”, “scene”] -> one play part node per act and scene, [“act”, “scene”, “stagegroup”] -> one play part node per act, scene, and stagegroup, [“act”, “scene”, “stagegroup”, “setting”, “speaker”] -> one play part node per act, scene, and stagegroup, directed edges for speech acts/information flow

Returns

nx.Graph (if groupby is not by speech act) or nx.MultiDiGraph (if groupby is by speech act)
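The undirected case can be pictured with the sketch below, which links every character to the play parts in which it appears and accumulates `n_tokens` and `n_lines` as edge weights. It uses plain dictionaries rather than networkx. The dotted play-part label, the `onstage` column, and the function name are assumptions, and the directed speech-act variant is not covered.

```python
from collections import defaultdict


def bipartite_edges_sketch(rows, groupby):
    """Map (character, play part) to accumulated n_tokens and n_lines."""
    weights = defaultdict(lambda: {"n_tokens": 0, "n_lines": 0})
    for row in rows:
        # Hypothetical play-part label, e.g. "1.2" for act 1, scene 2.
        part = ".".join(str(row[col]) for col in groupby)
        for character in set(row["onstage"].split()):
            weights[(character, part)]["n_tokens"] += row["n_tokens"]
            weights[(character, part)]["n_lines"] += row["n_lines"]
    return {edge: dict(w) for edge, w in weights.items()}


# Hypothetical toy data in the assumed row layout.
rows = [
    {"act": 1, "scene": 1, "onstage": "A B", "n_tokens": 10, "n_lines": 2},
    {"act": 1, "scene": 1, "onstage": "A B C", "n_tokens": 5, "n_lines": 1},
    {"act": 1, "scene": 2, "onstage": "A B", "n_tokens": 7, "n_lines": 1},
]
bipartite = bipartite_edges_sketch(rows, ["act", "scene"])
```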

hyperbard.graph_representations.get_weighted_bipartite_graph(df: DataFrame, groupby: list) DiGraph

hyperbard.hypergraph_representations module

hyperbard.plot_graph_rankings module

hyperbard.plot_hypergraph_rankings module

hyperbard.plot_rank_correlations module

hyperbard.plot_rank_correlations.get_correlation_dfs(ranking_files)
hyperbard.plot_rank_correlations.get_average_correlation(corrs)
hyperbard.plot_rank_correlations.get_asymmetric_correlation_difference(first_corr_df, second_corr_df)
hyperbard.plot_rank_correlations.plot_correlation_difference_matrix(selected_correlation, difference_correlation, selected_name)

hyperbard.plot_romeo module

hyperbard.plot_toy module

hyperbard.plotting module

hyperbard.plotting_utils module

hyperbard.plotting_utils.set_rcParams(fontsize=None)
hyperbard.plotting_utils.save_pgf_fig(path, axis_off=False, tight=False)
hyperbard.plotting_utils.get_character_color(k)
hyperbard.plotting_utils.get_formatted_labels(G, selected_labels)

hyperbard.preprocessing module

hyperbard.preprocessing.get_soup(file: str, parser: str = 'lxml-xml') BeautifulSoup

Parse an XML or HTML document with the specified BeautifulSoup parser.

Parameters
  • file – Path to file

  • parser – Parser to use

Returns

BeautifulSoup object containing the parsed file

hyperbard.preprocessing.get_cast_df(file: str) DataFrame
hyperbard.preprocessing.get_body(soup: BeautifulSoup) Tag

Extract the text body from an appropriately shaped BeautifulSoup object.

Parameters

soup – BeautifulSoup with exactly one text.body object

Returns

The text body as a BeautifulSoup object

hyperbard.preprocessing.is_leaf(elem: Tag) bool

Check if a bs4 Tag element has at most one child, i.e., if it is a leaf in the Tag tree (single children are NavigableStrings).

Parameters

elem – bs4 Tag element

Returns

True if the element has at most one child

hyperbard.preprocessing.get_attrs(elem: Tag) dict

Get the attributes of a bs4 Tag element, plus its tag name and - if the element is a leaf - its text, as a dictionary.

Parameters

elem – bs4 Tag element

Returns

Dictionary containing the element’s tag name, attributes, and - if the element is a leaf - text
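The leaf check and attribute extraction can be illustrated with the standard library. Note the swap: this sketch uses xml.etree.ElementTree rather than bs4, so a leaf is an element without child elements (ElementTree stores text separately instead of as a child node). The function names, the `tag` and `text` dictionary keys, and the sample markup are assumptions.

```python
import xml.etree.ElementTree as ET


def is_leaf_et(elem):
    """True if the element has no child elements (ElementTree analogue)."""
    return len(elem) == 0


def get_attrs_et(elem):
    """Collect tag name, attributes, and, for leaves only, the text."""
    attrs = {"tag": elem.tag, **elem.attrib}
    if is_leaf_et(elem):
        attrs["text"] = elem.text
    return attrs


# Hypothetical TEI-like fragment.
root = ET.fromstring('<sp who="#Juliet_Rom"><w n="1">Romeo</w></sp>')
parent_attrs = get_attrs_et(root)     # no "text" key: <sp> is not a leaf
child_attrs = get_attrs_et(root[0])   # includes "text": <w> is a leaf
```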

hyperbard.preprocessing.is_navigable_string(elem: PageElement) bool
hyperbard.preprocessing.is_redundant_element(elem: Union[Tag, NavigableString]) bool

Check if an element is redundant, i.e., it is (contained in) a “head” or a “speaker” element. Such elements are redundant because the information contained in “head” and “speaker” elements and their descendants is already encoded in XML attributes of other tags.

Parameters

elem – bs4 element

Returns

True if the element is redundant

hyperbard.preprocessing.is_descendant_of_redundant_element(elem: Union[Tag, NavigableString]) bool
hyperbard.preprocessing.keep_elem_in_xml_df(elem: Union[Tag, NavigableString]) bool

Decide whether to keep a bs4 element in the XML dataframe produced as the raw preprocessed data.

Parameters

elem – bs4 element from a BeautifulSoup object

Returns

Whether to keep the element

hyperbard.preprocessing.get_xml_df(body: Tag) DataFrame

Construct a pd.DataFrame from the non-redundant XML tags of a TEI-encoded BeautifulSoup object.

Parameters

body – Body of a TEI-encoded BeautifulSoup object

Returns

pd.DataFrame containing all non-redundant XML tags with names, attributes, and text

hyperbard.preprocessing.set_act(df: DataFrame) None

Adds act information to a pd.DataFrame created with get_xml_df, using the observation that rows with type == “act” hold the new act number in column “n”. Missing values are completed by first forward-filling, then backward-filling, and act numbers are converted to integers. Inductions and prologues receive special treatment as act 0, and epilogues as act 6.

Parameters

df – pd.DataFrame created with get_xml_df

Returns

None
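The fill logic described above can be sketched on a plain list, without pandas. The sketch assumes that rows without an act marker hold None, that at least one marker exists, and that special parts are labeled with the hypothetical strings "induction", "prologue", and "epilogue". How the actual data marks these parts is not stated here.

```python
def fill_acts_sketch(raw):
    """Forward-fill, then backward-fill act markers and convert to integers."""
    special = {"induction": 0, "prologue": 0, "epilogue": 6}

    # Forward fill: carry the last seen marker down.
    filled, last = [], None
    for value in raw:
        if value is not None:
            last = value
        filled.append(last)

    # Backward fill: leading gaps take the first marker that follows them.
    following = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = following
        else:
            following = filled[i]

    return [special[v] if v in special else int(v) for v in filled]


acts = fill_acts_sketch([None, "prologue", None, "1", None, "2", "epilogue", None])
# acts == [0, 0, 0, 1, 1, 2, 6, 6]
```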

hyperbard.preprocessing.set_scene(df: DataFrame) None

Adds scene information to a pd.DataFrame created with get_xml_df, using the observation that rows with type == “scene” hold the new scene number in column “n”. Missing values are completed by first forward-filling, then backward-filling, and scene numbers are converted to integers. Acts, inductions, prologues, and epilogues receive special treatment as scene 0.

Parameters

df – pd.DataFrame created with get_xml_df

Returns

None

hyperbard.preprocessing.is_entrance(row)
hyperbard.preprocessing.is_exit(row)
hyperbard.preprocessing.has_speaker(row)
hyperbard.preprocessing.is_new_act(row, prev_act)
hyperbard.preprocessing.is_new_scene(row, prev_scene)
hyperbard.preprocessing.set_onstage(df: DataFrame) None

Adds information on who is onstage to a pd.DataFrame created with get_xml_df, primarily based on hints in the XML attributes of “stage” and “sp” tags.

Notes:

  • We ensure that the speaker(s) are always onstage. Note that there can be multiple speakers for one line, hence the need to treat the “who” attribute as a set.

  • We flush characters when a new act starts. Rationale: Limit repercussions of encoding “errors” in stage directions, found, e.g., when characters are dead or unconscious and not marked as exiting. Example: R&J - Juliet is not marked up as exiting at the end of Act IV.

  • We _also_ flush characters when a new scene starts. Rationale: The same as for flushing when a new act starts, but somewhat more problematic. Example: R&J - Citizen from Act III, Scene I never marked up as exiting, and thus still onstage in Act III, Scene V (on the balcony!).

  • Thus, we currently model character presence on stage “conservatively” overall, and we are looking into better character management (not relying on the markup) as a potential improvement.

  • Flushing when a new scene starts is problematic: Stage directions in the Folger Shakespeare often use “Exeunt all but”, and as a consequence, only exits are marked up and not entries in the next scene. Example: Julius Caesar - Brutus and Cassius not marked up to enter in Act IV Scene III, but rather staying from Act IV Scene II (stage directions differ from the Oxford Shakespeare).

  • Even flushing when a new act starts is problematic with the Folger stage directions, but the problematic instances are very rare. We limit the impact of errors introduced by this modeling choice by also ensuring that the speaker is always onstage.

Parameters

df – pd.DataFrame created with get_xml_df, with act and scene already annotated

Returns

None
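The modeling choices in the notes above (flush at every new act or scene, always keep the speaker onstage) can be sketched as a simple loop over events. The event dictionaries, their `kind` values, and the character identifiers are hypothetical and stand in for the XML-derived rows.

```python
def track_onstage_sketch(events):
    """Return, per event, the sorted whitespace-joined set of characters onstage."""
    onstage, prev_part, result = set(), None, []
    for ev in events:
        part = (ev["act"], ev["scene"])
        if part != prev_part:
            # Flush everyone when a new act or scene starts.
            onstage = set()
            prev_part = part
        who = set(ev["who"].split())
        if ev["kind"] == "exit":
            onstage -= who
        else:
            # Entrances add characters; speakers are forced onstage as well.
            onstage |= who
        result.append(" ".join(sorted(onstage)))
    return result


events = [
    {"act": 3, "scene": 1, "kind": "entrance", "who": "Citizen Romeo"},
    {"act": 3, "scene": 1, "kind": "exit", "who": "Romeo"},
    {"act": 3, "scene": 5, "kind": "speech", "who": "Juliet"},
    {"act": 3, "scene": 5, "kind": "entrance", "who": "Romeo"},
]
presence = track_onstage_sketch(events)
# presence == ["Citizen Romeo", "Citizen", "Juliet", "Juliet Romeo"]
```

In this toy run the Citizen never exits, yet is no longer onstage in Scene 5 because of the flush, while Juliet is onstage without an entrance because she speaks.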

hyperbard.preprocessing.set_stagegroup(df: DataFrame) None
hyperbard.preprocessing.get_who_attributes(elem: Tag) Union[str, float]
hyperbard.preprocessing.get_descendants_ids(elem: Tag) List[str]
hyperbard.preprocessing.set_speaker(df: DataFrame, body: Tag) None
hyperbard.preprocessing.get_raw_xml_df(file: str) DataFrame

Construct and enrich a pd.DataFrame from the non-redundant XML tags of a TEI-encoded BeautifulSoup object.

Produces a DataFrame object of the shape of the *.raw.csv files.

Parameters

file – Path to file

Returns

pd.DataFrame containing all non-redundant XML tags with names, attributes, text, and annotations

hyperbard.preprocessing.get_aggregated(df: DataFrame) DataFrame

Given a pd.DataFrame output by get_raw_xml_df, produce a pd.DataFrame containing only spoken words, aggregated by speech acts, i.e., consecutive settings with the same speaker and the same other characters on stage.

Parameters

df – pd.DataFrame output by get_raw_xml_df

Returns

pd.DataFrame containing only spoken words, aggregated by speech acts
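As a rough illustration of aggregating by speech act, the sketch below merges consecutive words that share the same speaker and the same characters onstage, using itertools.groupby on a list of dictionaries. The `speaker` and `onstage` keys and the resulting `n_tokens` count are assumptions about the row layout, and the function name is hypothetical.

```python
from itertools import groupby


def aggregate_speech_acts_sketch(words):
    """Merge consecutive words with identical speaker and onstage set."""
    def key(word):
        return (word["speaker"], word["onstage"])

    return [
        {"speaker": speaker, "onstage": onstage, "n_tokens": sum(1 for _ in run)}
        for (speaker, onstage), run in groupby(words, key=key)
    ]


words = [
    {"speaker": "A", "onstage": "A B"},
    {"speaker": "A", "onstage": "A B"},
    {"speaker": "B", "onstage": "A B"},
    {"speaker": "A", "onstage": "A B"},
    {"speaker": "A", "onstage": "A B C"},
]
speech_acts = aggregate_speech_acts_sketch(words)
```

Because groupby only merges adjacent items, the two separate turns of speaker A stay separate, and a change in who is onstage starts a new speech act even when the speaker stays the same.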

hyperbard.preprocessing.set_setting(aggregated)
hyperbard.preprocessing.get_grouped_df(aggregated)
hyperbard.preprocessing.get_agg_xml_df(df: DataFrame) DataFrame

Given a pd.DataFrame output by get_raw_xml_df, produce a pd.DataFrame containing only spoken words, aggregated by speech acts, i.e., consecutive settings with the same speaker and the same other characters on stage, with full setting annotations.

Produces a DataFrame object of the shape of the *.agg.csv files.

Parameters

df – pd.DataFrame output by get_raw_xml_df

Returns

pd.DataFrame containing only spoken words, aggregated by speech acts

hyperbard.ranking module

hyperbard.raw_summary_statistics module

hyperbard.run_preprocessing module

hyperbard.statics module

hyperbard.utils module

hyperbard.utils.character_string_to_sorted_list(character_string: str) List[str]

Given a string of character identifiers of shape “id1 id2 id3 … id1 idn”, return a sorted, deduplicated list of these identifiers.

Parameters

character_string – String of character identifiers separated by whitespace

Returns

Sorted list of unique character identifiers
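A behavior matching this description can be sketched in one line. The function name below is hypothetical and the code is an illustration, not the package source.

```python
def to_sorted_unique_list_sketch(character_string):
    """Split on whitespace, drop duplicates, and sort."""
    return sorted(set(character_string.split()))


ids = to_sorted_unique_list_sketch("id3 id1 id2 id1")
# ids == ["id1", "id2", "id3"]
```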

hyperbard.utils.get_name_from_identifier(character_identifier: str) str

Given a character identifier of shape “#CharacterName_PlayAbbreviation”, extract CharacterName.

Parameters

character_identifier – Character identifier of shape “#CharacterName_PlayAbbreviation”

Returns

Character name
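One way to picture the extraction is shown below. This sketch only drops a leading "#" and the part after the last underscore. Whether the package applies further cleanup steps is not stated here, and the function name and sample identifier are hypothetical.

```python
def name_from_identifier_sketch(character_identifier):
    """Strip a leading '#' and the trailing '_PlayAbbreviation'."""
    if character_identifier.startswith("#"):
        character_identifier = character_identifier[1:]
    # Split once from the right so only the play abbreviation is removed.
    return character_identifier.rsplit("_", 1)[0]


name = name_from_identifier_sketch("#Juliet_Rom")
# name == "Juliet"
```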

hyperbard.utils.remove_hashtag(identifier: str) str
hyperbard.utils.remove_play_abbreviation(identifier: str) str
hyperbard.utils.remove_uppercase_prefixes(identifier: str) str
hyperbard.utils.sort_join_strings(string_iterable: Iterable[str]) str

Sort and concatenate an iterable of strings, joining on a whitespace character.

Parameters

string_iterable – Iterable of strings (e.g., a list or a set)

Returns

String with the entries in the iterable sorted and concatenated with a whitespace as the join character
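The described behavior corresponds to a sort followed by a join. The sketch below uses a hypothetical function name and is meant as an illustration only.

```python
def sort_join_sketch(string_iterable):
    """Sort the strings and join them with a single space."""
    return " ".join(sorted(string_iterable))


joined = sort_join_sketch({"b", "a", "c"})
# joined == "a b c"
```

Sorting before joining makes the result independent of the iteration order, which matters for sets.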

hyperbard.utils.get_filename_base(file: str, full: bool = True) str

Given a file name of shape “path/to/PlayName_XMLFlavor_Source.ext”, extract PlayName (if not full) or PlayName_XMLFlavor (if full).

Parameters
  • file – Path of shape “path/to/PlayName_XMLFlavor_Source.ext”

  • full – Return “_XMLFlavor_Source” as part of the file name

Returns

String of shape “PlayName(_XMLFlavor_Source)”

hyperbard.utils.string_to_set(character_string: Union[str, float]) Union[set, float]

Given a string of character identifiers of shape “id1 id2 id3 … id1 idn”, or nan, return a set of the identifiers or nan.

Parameters

character_string – string of character identifiers of shape “id1 id2 id3 … id1 idn” or nan

Returns

set of identifiers or nan
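The nan pass-through can be sketched as follows, assuming nan arrives as a Python float (as it would from a pandas column with missing values). The function name is hypothetical.

```python
import math


def string_to_set_sketch(character_string):
    """Return a set of identifiers, or pass nan through unchanged."""
    if isinstance(character_string, float) and math.isnan(character_string):
        return character_string
    return set(character_string.split())


id_set = string_to_set_sketch("id1 id2 id1")
missing = string_to_set_sketch(float("nan"))
# id_set == {"id1", "id2"}; missing is still nan
```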

Module contents