Visualization

textacy.viz.termite.draw_termite_plot(values_mat, col_labels, row_labels, *, highlight_cols=None, highlight_colors=None, save=False, rc_params=None)
Make a “termite” plot, typically used for assessing topic models, with a tabular layout that promotes comparison of terms both within and across topics.
 Parameters
values_mat (np.ndarray or matrix) – matrix of values with shape (# row labels, # col labels) used to size the dots on the grid
col_labels (seq[str]) – labels used to identify x-axis ticks on the grid
row_labels (seq[str]) – labels used to identify y-axis ticks on the grid
highlight_cols (int or seq[int], optional) – indices for columns to visually highlight in the plot with contrasting colors
highlight_colors (tuple of 2-tuples) – each 2-tuple is a pair of (light, dark) matplotlib-friendly colors used to highlight a single column; if not specified (default), a built-in set of 6 pairs is used
save (str, optional) – full /path/to/fname on disk at which to save the figure
rc_params (dict, optional) – parameters passed to rc_context in matplotlib.pyplot; details in https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.rc_context.html
 Returns
Axis on which termite plot is plotted.
 Return type
matplotlib.axes.Axes.axis
 Raises
ValueError – if more columns are selected for highlighting than colors or if any of the inputs’ dimensions don’t match
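The checks described under “Raises” can be illustrated with a small, dependency-free sketch. The helper check_termite_inputs is hypothetical (it is not part of textacy), nested lists stand in for np.ndarray, and treating a single int in highlight_cols as a one-column selection is an assumption based on the documented type.

```python
from typing import List, Optional, Sequence, Tuple, Union

# Plain nested lists stand in for np.ndarray so the sketch has no dependencies.
Matrix = Sequence[Sequence[float]]


def check_termite_inputs(
    values_mat: Matrix,
    col_labels: Sequence[str],
    row_labels: Sequence[str],
    highlight_cols: Optional[Union[int, Sequence[int]]] = None,
    highlight_colors: Optional[Sequence[Tuple[str, str]]] = None,
) -> List[int]:
    """Hypothetical helper: apply the checks listed under "Raises".

    Returns highlight_cols normalized to a list of column indices.
    """
    # values_mat must have shape (# row labels, # col labels)
    if len(values_mat) != len(row_labels):
        raise ValueError(
            f"values_mat has {len(values_mat)} rows, "
            f"but {len(row_labels)} row labels were given"
        )
    for i, row in enumerate(values_mat):
        if len(row) != len(col_labels):
            raise ValueError(
                f"row {i} of values_mat has {len(row)} values, "
                f"but {len(col_labels)} col labels were given"
            )
    # a single int selects one column; None selects none
    if highlight_cols is None:
        cols: List[int] = []
    elif isinstance(highlight_cols, int):
        cols = [highlight_cols]
    else:
        cols = list(highlight_cols)
    # each highlighted column consumes one (light, dark) pair;
    # 6 is the size of the default set described above
    n_pairs = len(highlight_colors) if highlight_colors is not None else 6
    if len(cols) > n_pairs:
        raise ValueError(
            f"{len(cols)} columns selected for highlighting, "
            f"but only {n_pairs} color pairs are available"
        )
    return cols


# 2 terms (rows) x 3 topics (columns)
values = [[0.1, 0.0, 0.4], [0.3, 0.2, 0.0]]
print(check_termite_inputs(values, ["topic 0", "topic 1", "topic 2"], ["apple", "banana"], highlight_cols=1))
# [1]
```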
References
Chuang, Jason, Christopher D. Manning, and Jeffrey Heer. “Termite: Visualization techniques for assessing textual topic models.” Proceedings of the International Working Conference on Advanced Visual Interfaces. ACM, 2012.
See also

textacy.viz.termite.termite_df_plot(components, *, highlight_topics=None, n_terms=25, rank_terms_by='max', sort_terms_by='seriation', save=False, rc_params=None)
Make a “termite” plot for assessing topic models using a tabular layout to promote comparison of terms both within and across topics.
 Parameters
components (pandas.DataFrame or sparse matrix) – corpus represented as a term-topic matrix with shape (n_terms, n_topics); should have terms as index and topics as column names
topics (int or Sequence[int]) – topic(s) to include in termite plot; if -1, all topics are included
highlight_topics (str or Sequence[str]) – names for up to 6 topics to visually highlight in the plot with contrasting colors
n_terms (int) – number of top terms to include in termite plot
rank_terms_by ({'max', 'mean', 'var'}) – argument to the DataFrame agg function, used to rank terms; the top-ranked n_terms are included in the plot
sort_terms_by ({'seriation', 'weight', 'index', 'alphabetical'}) – method used to vertically sort the selected top n_terms terms; the default (“seriation”) groups similar terms together, which facilitates cross-topic assessment
save (str) – full /path/to/fname on disk at which to save the figure
rc_params (dict, optional) – parameters passed to rc_context in matplotlib.pyplot; details in https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.rc_context.html
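A rough sketch of how terms might be ranked and then ordered before plotting, assuming a plain dict that maps each term to its per-topic weights in place of the pandas.DataFrame. The helper select_top_terms and the RANKERS mapping are hypothetical; reading “weight” as descending score order is an assumption, and “seriation” is left out.

```python
import statistics
from typing import Dict, List

# Hypothetical stand-in for the components DataFrame: term -> weight in each topic
TermTopic = Dict[str, List[float]]

# Per-term aggregations analogous to rank_terms_by; statistics.variance is the
# sample variance (like pandas' default ddof=1) and needs at least 2 topics.
RANKERS = {
    "max": max,
    "mean": statistics.mean,
    "var": statistics.variance,
}


def select_top_terms(
    components: TermTopic,
    n_terms: int = 25,
    rank_terms_by: str = "max",
    sort_terms_by: str = "weight",
) -> List[str]:
    """Keep the n_terms top-ranked terms, then order them for the y-axis."""
    if rank_terms_by not in RANKERS:
        raise ValueError(f"invalid rank_terms_by: {rank_terms_by!r}")
    scores = {term: RANKERS[rank_terms_by](weights) for term, weights in components.items()}
    # highest score first; sorted() is stable, so ties keep their input order
    top = sorted(components, key=lambda term: scores[term], reverse=True)[:n_terms]
    if sort_terms_by == "weight":
        return top
    if sort_terms_by == "index":
        position = {term: i for i, term in enumerate(components)}
        return sorted(top, key=lambda term: position[term])
    if sort_terms_by == "alphabetical":
        return sorted(top)
    # "seriation" is not covered by this sketch
    raise ValueError(f"unsupported sort_terms_by: {sort_terms_by!r}")


components = {
    "bank": [0.9, 0.1, 0.0],
    "river": [0.1, 0.8, 0.1],
    "loan": [0.7, 0.0, 0.0],
    "fish": [0.0, 0.3, 0.2],
}
print(select_top_terms(components, n_terms=3))  # ['bank', 'river', 'loan']
print(select_top_terms(components, n_terms=3, sort_terms_by="alphabetical"))  # ['bank', 'loan', 'river']
```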
 Returns
Axis on which termite plot is plotted.
 Return type
matplotlib.axes.Axes.axis
 Raises
ValueError – if more than 6 topics are selected for highlighting, or an invalid value is passed for the rank_terms_by and/or sort_terms_by params
References
Chuang, Jason, Christopher D. Manning, and Jeffrey Heer. “Termite: Visualization techniques for assessing textual topic models.” Proceedings of the International Working Conference on Advanced Visual Interfaces. ACM, 2012.
Fogel, Fajwel, Alexandre d’Aspremont, and Milan Vojnovic. “Spectral ranking using seriation.” Journal of Machine Learning Research 17, 1 (January 2016), 3013–3057.
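The idea of placing similar terms next to each other can be illustrated with a simple greedy nearest-neighbor chain over cosine similarity. This is a swapped-in heuristic, not the spectral seriation method of Fogel et al. cited above, and both helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def greedy_similarity_order(components: Dict[str, List[float]]) -> List[str]:
    """Chain terms so that each one is followed by its most similar unplaced term.

    A greedy nearest-neighbor heuristic, NOT the spectral seriation of Fogel et al.
    """
    remaining = list(components)
    if not remaining:
        return []
    order = [remaining.pop(0)]  # start from the first term in input order
    while remaining:
        last = components[order[-1]]
        # max() returns the first maximal item, so ties go to the earlier term
        nearest = max(remaining, key=lambda term: cosine(last, components[term]))
        remaining.remove(nearest)
        order.append(nearest)
    return order


components = {
    "bank": [0.9, 0.1, 0.0],
    "river": [0.1, 0.8, 0.1],
    "loan": [0.7, 0.0, 0.0],
    "fish": [0.0, 0.3, 0.2],
}
print(greedy_similarity_order(components))  # ['bank', 'loan', 'river', 'fish']
```

Unlike a global method such as spectral seriation, the greedy chain only looks one step ahead, so its ordering depends on the starting term.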
See also
viz.termite_plot
TODO: rank_terms_by other metrics, e.g. topic salience or relevance

textacy.viz.network.draw_semantic_network(graph, *, node_weights=None, spread=3.0, draw_nodes=False, base_node_size=300, node_alpha=0.25, line_width=0.5, line_alpha=0.1, base_font_size=12, save=False)
Draw a semantic network with nodes representing either terms or sentences, edges representing co-occurrence or similarity, and positions given by a force-directed layout.
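When nodes are terms, co-occurrence edges are typically built by counting which terms appear near each other. The sketch below is hypothetical (it is not textacy's network builder) and assumes sentences are already tokenized; window_size is an illustrative parameter.

```python
from collections import Counter
from typing import Iterable, Sequence


def cooccurrence_edges(
    sentences: Iterable[Sequence[str]], window_size: int = 2
) -> Counter:
    """Count pairs of distinct terms that fall within window_size consecutive tokens.

    Keys are alphabetically sorted (term, term) tuples, so edges are undirected.
    """
    edges: Counter = Counter()
    for tokens in sentences:
        for i, term in enumerate(tokens):
            # the window starting at position i spans tokens[i : i + window_size]
            for other in tokens[i + 1 : i + window_size]:
                if other != term:
                    edges[tuple(sorted((term, other)))] += 1
    return edges


sentences = [["river", "bank", "fish"], ["bank", "loan", "bank"]]
edges = cooccurrence_edges(sentences)
print(edges[("bank", "loan")])  # 2
```

Each pair and its count could then be added to a networkx.Graph with graph.add_edge(u, v, weight=count) before drawing.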
 Parameters
graph (networkx.Graph) – semantic network to draw
node_weights (dict) – mapping of node: weight, used to size node labels (and, optionally, node circles) according to their weight
spread (float) – number that drives the spread of the network; higher values give more spread-out networks
draw_nodes (bool) – if True, circles are drawn under the node labels
base_node_size (int) – if node_weights is not given and draw_nodes is True, this is the size of all nodes in the network; if node_weights is given, node sizes are scaled against this value according to each node's weight relative to the maximum weight
node_alpha (float) – alpha of the circular nodes drawn behind labels if draw_nodes is True
line_width (float) – width of the lines (edges) drawn between nodes
line_alpha (float) – alpha of the lines (edges) drawn between nodes
base_font_size (int) – if node_weights is not given, this is the font size used to draw all labels; otherwise, font sizes are scaled against this value according to each node's weight relative to the maximum weight
save (str) – give the full /path/to/fname on disk to save figure (optional)
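The parameters above say that node sizes and font sizes are scaled against base_node_size and base_font_size according to each weight relative to the maximum weight, but the exact formula is not stated. The hypothetical helper below assumes simple linear scaling; the actual function may use a different curve.

```python
from typing import Dict, Hashable, Iterable, Optional


def scale_by_weight(
    nodes: Iterable[Hashable],
    node_weights: Optional[Dict[Hashable, float]],
    base_value: float,
) -> Dict[Hashable, float]:
    """Scale base_value by each node's weight relative to the maximum weight.

    Linear scaling is an assumption of this sketch.
    """
    node_list = list(nodes)
    if not node_list or not node_weights:
        # no weights given: every node gets the base value
        return {node: float(base_value) for node in node_list}
    max_weight = max(node_weights[node] for node in node_list)
    if max_weight <= 0:
        return {node: float(base_value) for node in node_list}
    return {node: base_value * node_weights[node] / max_weight for node in node_list}


weights = {"bank": 4, "loan": 2, "river": 1}
node_sizes = scale_by_weight(weights, weights, base_value=300)  # cf. base_node_size
font_sizes = scale_by_weight(weights, weights, base_value=12)   # cf. base_font_size
print(node_sizes)  # {'bank': 300.0, 'loan': 150.0, 'river': 75.0}
print(font_sizes)  # {'bank': 12.0, 'loan': 6.0, 'river': 3.0}
```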
 Returns
Axis on which network plot is drawn.
 Return type
Note
This function requires matplotlib.