Visualization

textacy.viz.termite.draw_termite_plot(values_mat, col_labels, row_labels, *, highlight_cols=None, highlight_colors=None, save=False, rc_params=None)

Make a “termite” plot, typically used to assess topic models, with a tabular layout that promotes comparison of terms both within and across topics.

Parameters
  • values_mat (np.ndarray or matrix) – matrix of values with shape (# row labels, # col labels) used to size the dots on the grid

  • col_labels (seq[str]) – labels used to identify x-axis ticks on the grid

  • row_labels (seq[str]) – labels used to identify y-axis ticks on the grid

  • highlight_cols (int or seq[int], optional) – indices for columns to visually highlight in the plot with contrasting colors

  • highlight_colors (tuple of 2-tuples, optional) – each 2-tuple is a pair of (light, dark) matplotlib-friendly colors used to highlight a single column; if not specified (default), a built-in set of 6 pairs is used

  • save (str, optional) – full path on disk (e.g. /path/to/fname) at which to save the figure

  • rc_params (dict, optional) – parameters passed to matplotlib.pyplot.rc_context; for details, see https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.rc_context.html
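The highlight_cols and highlight_colors parameters work together: each highlighted column index is paired with one (light, dark) color pair, and there cannot be more highlighted columns than pairs. The sketch below illustrates that pairing under stated assumptions. The helper pair_highlight_colors and the PLACEHOLDER_COLORS palette are hypothetical; they are not textacy's internals or its actual default set of 6 pairs.

```python
from typing import Dict, Optional, Sequence, Tuple, Union

ColorPair = Tuple[str, str]  # (light, dark) matplotlib-friendly colors

# Hypothetical placeholder palette; textacy's real default pairs may differ.
PLACEHOLDER_COLORS: Tuple[ColorPair, ...] = (
    ("#a6cee3", "#1f78b4"),
    ("#b2df8a", "#33a02c"),
    ("#fb9a99", "#e31a1c"),
    ("#fdbf6f", "#ff7f00"),
    ("#cab2d6", "#6a3d9a"),
    ("#ffff99", "#b15928"),
)


def pair_highlight_colors(
    highlight_cols: Union[int, Sequence[int], None],
    highlight_colors: Optional[Sequence[ColorPair]] = None,
) -> Dict[int, ColorPair]:
    """Map each highlighted column index to a (light, dark) color pair (hypothetical helper)."""
    if highlight_cols is None:
        return {}
    # A single int is shorthand for highlighting one column.
    cols = [highlight_cols] if isinstance(highlight_cols, int) else list(highlight_cols)
    colors = list(highlight_colors) if highlight_colors is not None else list(PLACEHOLDER_COLORS)
    if len(cols) > len(colors):
        raise ValueError(
            f"{len(cols)} columns selected for highlighting, but only {len(colors)} color pairs given"
        )
    # Columns and color pairs are matched up in order.
    return dict(zip(cols, colors))


print(pair_highlight_colors(2))  # {2: ('#a6cee3', '#1f78b4')}
```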

Returns

Axes on which the termite plot is drawn.

Return type

matplotlib.axes.Axes

Raises

ValueError – if more columns are selected for highlighting than there are color pairs, or if the dimensions of any of the inputs don’t match
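The dimension requirement means values_mat must have exactly one row per row label and one column per column label. The sketch below checks that, then flattens the matrix into (x, y, value) triples, one per grid dot, which is one common way to feed a termite-style grid to a scatter plot. The function termite_grid_points is hypothetical; textacy's own plotting code may proceed differently.

```python
from typing import List, Sequence, Tuple

import numpy as np


def termite_grid_points(
    values_mat, col_labels: Sequence[str], row_labels: Sequence[str]
) -> List[Tuple[int, int, float]]:
    """Validate shapes and return (x, y, value) for each grid cell (hypothetical helper)."""
    values = np.asarray(values_mat, dtype=float)
    # Expected shape is (# row labels, # col labels).
    expected = (len(row_labels), len(col_labels))
    if values.shape != expected:
        raise ValueError(f"values_mat has shape {values.shape}, expected {expected}")
    # x indexes columns (x-axis ticks); y indexes rows (y-axis ticks).
    return [
        (x, y, float(values[y, x]))
        for y in range(values.shape[0])
        for x in range(values.shape[1])
    ]


points = termite_grid_points(
    [[0.5, 0.0], [0.1, 0.9], [0.0, 0.3]],
    col_labels=["topic 0", "topic 1"],
    row_labels=["apple", "bank", "river"],
)
print(points[:2])  # [(0, 0, 0.5), (1, 0, 0.0)]
```

Each value could then size one dot at position (x, y), for example as the marker area in a scatter call.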

References

Chuang, Jason, Christopher D. Manning, and Jeffrey Heer. “Termite: Visualization techniques for assessing textual topic models.” Proceedings of the International Working Conference on Advanced Visual Interfaces. ACM, 2012.

textacy.viz.termite.termite_df_plot(components, *, highlight_topics=None, n_terms=25, rank_terms_by='max', sort_terms_by='seriation', save=False, rc_params=None)

Make a “termite” plot for assessing topic models using a tabular layout to promote comparison of terms both within and across topics.

Parameters
  • components (pandas.DataFrame or sparse matrix) – corpus represented as a term-topic matrix with shape (n_terms, n_topics); should have terms as index and topics as column names

  • topics (int or Sequence[int]) – topic(s) to include in termite plot; if -1, all topics are included

  • highlight_topics (str or Sequence[str]) – names for up to 6 topics to visually highlight in the plot with contrasting colors

  • n_terms (int) – number of top terms to include in termite plot

  • rank_terms_by ({'max', 'mean', 'var'}) – aggregation passed to the DataFrame agg function to rank terms across topics; the top-ranked n_terms terms are included in the plot

  • sort_terms_by ({'seriation', 'weight', 'index', 'alphabetical'}) – method used to vertically sort the selected top n_terms terms; the default (“seriation”) groups similar terms together, which facilitates cross-topic assessment

  • save (str, optional) – full path on disk (e.g. /path/to/fname) at which to save the figure

  • rc_params (dict, optional) – parameters passed to matplotlib.pyplot.rc_context; for details, see https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.rc_context.html
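The rank_terms_by, n_terms, and sort_terms_by parameters describe a two-step selection: aggregate each term's weights across topics, keep the top n_terms, then reorder them for display. The stdlib-only sketch below illustrates that flow, assuming components is given as a plain mapping of term to per-topic weights instead of a pandas DataFrame. The helper select_terms is hypothetical; the "index" behavior (restore input order) and tie-breaking are assumptions, and "seriation" is not covered here (it raises ValueError in this sketch).

```python
import statistics
from typing import Dict, List, Sequence

# Aggregations named after the documented rank_terms_by options.
# "var" uses statistics.variance (sample variance, ddof=1), like pandas' default.
AGGREGATORS = {"max": max, "mean": statistics.mean, "var": statistics.variance}


def select_terms(
    components: Dict[str, Sequence[float]],
    n_terms: int = 25,
    rank_terms_by: str = "max",
    sort_terms_by: str = "weight",
) -> List[str]:
    """Keep the n_terms best-ranked terms, then order them for display (hypothetical helper)."""
    if rank_terms_by not in AGGREGATORS:
        raise ValueError(f"invalid rank_terms_by value: {rank_terms_by!r}")
    agg = AGGREGATORS[rank_terms_by]
    scores = {term: agg(weights) for term, weights in components.items()}
    # Highest score first; sorted() stays stable with reverse=True, so ties keep input order.
    top = sorted(components, key=scores.__getitem__, reverse=True)[:n_terms]
    if sort_terms_by == "weight":
        return top
    if sort_terms_by == "index":
        # Assumed meaning: restore the terms' original order in components.
        position = {term: i for i, term in enumerate(components)}
        return sorted(top, key=position.__getitem__)
    if sort_terms_by == "alphabetical":
        return sorted(top)
    raise ValueError(f"unsupported sort_terms_by value: {sort_terms_by!r}")


components = {"river": [0.1, 0.8], "bank": [0.5, 0.4], "loan": [0.7, 0.0], "the": [0.3, 0.3]}
print(select_terms(components, n_terms=3, sort_terms_by="alphabetical"))  # ['bank', 'loan', 'river']
```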

Returns

Axes on which the termite plot is drawn.

Return type

matplotlib.axes.Axes

Raises

ValueError – if more than 6 topics are selected for highlighting, or if an invalid value is passed for the rank_terms_by and/or sort_terms_by params
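termite_df_plot takes topic names for highlighting, whereas draw_termite_plot takes column indices. One plausible bridge between the two (an assumption about how they relate, not confirmed by this page) is to look up each name's position among the topic columns while enforcing the documented limit of 6. The helper highlight_topic_indices is hypothetical, and its ValueError for unknown names is an added assumption.

```python
from typing import List, Sequence, Union

MAX_HIGHLIGHTED_TOPICS = 6  # documented limit for highlight_topics


def highlight_topic_indices(
    highlight_topics: Union[str, Sequence[str], None],
    topic_names: Sequence[str],
) -> List[int]:
    """Translate highlighted topic names into column indices (hypothetical helper)."""
    if highlight_topics is None:
        return []
    # A single string is one topic name, not a sequence of characters.
    names = [highlight_topics] if isinstance(highlight_topics, str) else list(highlight_topics)
    if len(names) > MAX_HIGHLIGHTED_TOPICS:
        raise ValueError(
            f"at most {MAX_HIGHLIGHTED_TOPICS} topics can be highlighted, got {len(names)}"
        )
    columns = list(topic_names)
    # Assumption: unknown names are rejected rather than silently skipped.
    missing = [name for name in names if name not in columns]
    if missing:
        raise ValueError(f"unknown topic name(s): {missing}")
    return [columns.index(name) for name in names]


print(highlight_topic_indices(["topic 2", "topic 0"], ["topic 0", "topic 1", "topic 2"]))  # [2, 0]
```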

References

  • Chuang, Jason, Christopher D. Manning, and Jeffrey Heer. “Termite: Visualization techniques for assessing textual topic models.” Proceedings of the International Working Conference on Advanced Visual Interfaces. ACM, 2012.

  • Fogel, Fajwel, Alexandre d’Aspremont, and Milan Vojnovic. “Spectral ranking using seriation.” J. Mach. Learn. Res. 17, 1 (January 2016), 3013–3057.
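The default "seriation" ordering draws on the seriation literature cited above. A common spectral variant builds a term-term similarity matrix, forms its graph Laplacian, and sorts terms by the Fiedler vector (the eigenvector of the second-smallest eigenvalue). The numpy sketch below shows that general technique, assuming the term-topic weights arrive as a dense array with one row per term; it is not necessarily textacy's exact implementation, and spectral_seriation is a hypothetical name.

```python
from typing import List

import numpy as np


def spectral_seriation(term_topic) -> List[int]:
    """Order rows so terms with similar topic profiles sit next to each other (needs >= 2 rows)."""
    X = np.asarray(term_topic, dtype=float)
    # Term-term similarity: dot products of the terms' topic-weight vectors.
    S = X @ X.T
    # Shift so every similarity is nonnegative (a no-op when all weights are nonnegative).
    S = S - min(float(S.min()), 0.0)
    # Unnormalized graph Laplacian L = D - S, with D the diagonal matrix of row sums.
    L = np.diag(S.sum(axis=1)) - S
    # eigh returns eigenvalues in ascending order, so column 1 is the Fiedler vector.
    _, eigvecs = np.linalg.eigh(L)
    fiedler = eigvecs[:, 1]
    # Sorting terms by their Fiedler-vector entries yields the seriation order.
    return np.argsort(fiedler).tolist()


terms = ["a", "b", "c", "d"]
weights = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])
order = [terms[i] for i in spectral_seriation(weights)]
print(order)  # ['b', 'd', 'c', 'a'] or its reverse, since an eigenvector's sign is arbitrary
```

In this toy input, "a" and "c" load mostly on the first topic and "b" and "d" on the second, so each similar pair ends up adjacent in the ordering.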

See also

viz.termite_plot

TODO: rank_terms_by other metrics, e.g. topic salience or relevance

textacy.viz.network.draw_semantic_network(graph, *, node_weights=None, spread=3.0, draw_nodes=False, base_node_size=300, node_alpha=0.25, line_width=0.5, line_alpha=0.1, base_font_size=12, save=False)

Draw a semantic network with nodes representing either terms or sentences, edges representing co-occurrence or similarity, and positions given by a force-directed layout.
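A term co-occurrence network is one kind of graph this function can draw. As a minimal sketch, assuming the text is already split into sentences of terms and that two terms co-occur when they fall inside a sliding window of window_size consecutive tokens, the stdlib-only helper below counts undirected edge weights. The name cooccurrence_edges and the windowing scheme are illustrative assumptions, not textacy's API.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def cooccurrence_edges(
    sentences: Iterable[Sequence[str]], window_size: int = 2
) -> List[Tuple[str, str, int]]:
    """Count co-occurrences of distinct terms within a sliding window (hypothetical helper)."""
    counts: Counter = Counter()
    for tokens in sentences:
        for i, term in enumerate(tokens):
            # Pair each token with the next window_size - 1 tokens in the same sentence.
            for other in tokens[i + 1 : i + window_size]:
                if other != term:
                    # Sort the pair so (a, b) and (b, a) count toward one undirected edge.
                    counts[tuple(sorted((term, other)))] += 1
    return [(u, v, w) for (u, v), w in sorted(counts.items())]


sents = [["river", "bank", "flood"], ["bank", "loan"], ["river", "bank"]]
print(cooccurrence_edges(sents))  # [('bank', 'flood', 1), ('bank', 'loan', 1), ('bank', 'river', 2)]
```

The (u, v, weight) triples could then be loaded into a networkx.Graph with add_weighted_edges_from before drawing.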

Parameters
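  • graph (networkx.Graph) – network to draw, whose nodes are terms or sentences and whose edges represent co-occurrence or similarity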

  • node_weights (dict) – mapping of node: weight, used to size node labels (and, optionally, node circles) according to their weight

  • spread (float) – number that drives the spread of the network; higher values give more spread-out networks

  • draw_nodes (bool) – if True, circles are drawn under the node labels

  • base_node_size (int) – if node_weights is not given and draw_nodes is True, this is the size of all nodes in the network; if node_weights is given, node sizes are scaled against this value based on each node's weight relative to the maximum weight

  • node_alpha (float) – alpha of the circular nodes drawn behind labels if draw_nodes is True

  • line_width (float) – width of the lines (edges) drawn between nodes

  • line_alpha (float) – alpha of the lines (edges) drawn between nodes

  • base_font_size (int) – if node_weights is not given, this is the font size used to draw all labels; otherwise, font sizes are scaled against this value based on each node's weight relative to the maximum weight

  • save (str, optional) – full path on disk (e.g. /path/to/fname) at which to save the figure
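The node_weights, base_node_size, and base_font_size descriptions imply scaling each node relative to the maximum weight. The sketch below assumes plain linear scaling; textacy may apply a different, possibly nonlinear, transform. The helper scaled_sizes is hypothetical and could serve both node circle sizes (with base_node_size) and label font sizes (with base_font_size).

```python
from typing import Dict, Hashable, Iterable, Optional


def scaled_sizes(
    nodes: Iterable[Hashable],
    node_weights: Optional[Dict[Hashable, float]],
    base: float,
) -> Dict[Hashable, float]:
    """Scale base by each node's weight relative to the max weight (linear scaling assumed)."""
    if not node_weights:
        # Without weights, every node gets the same base size.
        return {node: base for node in nodes}
    max_weight = max(node_weights.values())
    if max_weight <= 0:
        raise ValueError("node_weights must contain at least one positive weight")
    return {node: base * node_weights[node] / max_weight for node in nodes}


weights = {"bank": 4.0, "river": 2.0, "loan": 1.0}
print(scaled_sizes(weights, weights, base=300))  # {'bank': 300.0, 'river': 150.0, 'loan': 75.0}
```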

Returns

Axes on which the network plot is drawn.

Return type

matplotlib.axes.Axes

Note

This function requires matplotlib.