Visualization
-
textacy.viz.termite.
draw_termite_plot
(values_mat, col_labels, row_labels, *, highlight_cols=None, highlight_colors=None, save=False, rc_params=None)
Make a “termite” plot, typically used for assessing topic models, with a tabular layout that promotes comparison of terms both within and across topics.
- Parameters
values_mat (np.ndarray or matrix) – matrix of values with shape (# row labels, # col labels) used to size the dots on the grid
col_labels (seq[str]) – labels used to identify x-axis ticks on the grid
row_labels (seq[str]) – labels used to identify y-axis ticks on the grid
highlight_cols (int or seq[int], optional) – indices for columns to visually highlight in the plot with contrasting colors
highlight_colors (tuple of 2-tuples) – each 2-tuple corresponds to a pair of (light/dark) matplotlib-friendly colors used to highlight a single column; if not specified (default), a built-in set of 6 pairs is used
save (str, optional) – give the full /path/to/fname on disk to save figure
rc_params (dict, optional) – parameters passed to rc_context in matplotlib.pyplot; details in https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.rc_context.html
- Returns
Axis on which termite plot is plotted.
- Return type
- Raises
ValueError – if more columns are selected for highlighting than colors or if any of the inputs’ dimensions don’t match
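The two conditions behind the ValueError above can be sketched as a pre-flight check. This is a minimal illustration, not the library's own code: the helper name `check_termite_inputs` and the `n_color_pairs` argument are hypothetical, and the default of 6 comes from the highlight_colors description.

```python
def check_termite_inputs(values_mat, col_labels, row_labels,
                         highlight_cols=None, n_color_pairs=6):
    """Hypothetical check mirroring the documented ValueError cases."""
    n_rows = len(values_mat)
    n_cols = len(values_mat[0]) if n_rows else 0
    # values_mat is documented as (# row labels, # col labels)
    if n_rows != len(row_labels) or any(len(row) != len(col_labels) for row in values_mat):
        raise ValueError(
            f"values_mat has shape ({n_rows}, {n_cols}), expected "
            f"({len(row_labels)}, {len(col_labels)})"
        )
    if highlight_cols is None:
        return []
    # the parameter accepts a single int or a sequence of ints
    if isinstance(highlight_cols, int):
        highlight_cols = [highlight_cols]
    highlight_cols = list(highlight_cols)
    if len(highlight_cols) > n_color_pairs:
        raise ValueError(
            f"{len(highlight_cols)} columns selected for highlighting, "
            f"but only {n_color_pairs} color pairs are available"
        )
    return highlight_cols


# 3 terms (rows) x 2 topics (columns)
values = [[0.9, 0.1], [0.4, 0.5], [0.1, 0.8]]
cols = check_termite_inputs(values, ["topic 0", "topic 1"], ["model", "data", "plot"],
                            highlight_cols=1)
```

Running a check like this before plotting surfaces a mismatch between the matrix and its labels early, with a message that names the offending shapes.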
References
Chuang, Jason, Christopher D. Manning, and Jeffrey Heer. “Termite: Visualization techniques for assessing textual topic models.” Proceedings of the International Working Conference on Advanced Visual Interfaces. ACM, 2012.
See also
-
textacy.viz.termite.
termite_df_plot
(components, *, highlight_topics=None, n_terms=25, rank_terms_by='max', sort_terms_by='seriation', save=False, rc_params=None)
Make a “termite” plot for assessing topic models using a tabular layout to promote comparison of terms both within and across topics.
- Parameters
components (pandas.DataFrame or sparse matrix) – corpus represented as a term-topic matrix with shape (n_terms, n_topics); should have terms as index and topics as column names
topics (int or Sequence[int]) – topic(s) to include in termite plot; if -1, all topics are included
highlight_topics (str or Sequence[str]) – names for up to 6 topics to visually highlight in the plot with contrasting colors
n_terms (int) – number of top terms to include in termite plot
rank_terms_by ({'max', 'mean', 'var'}) – argument to dataframe agg function, used to rank terms; the top-ranked n_terms are included in the plot
sort_terms_by ({'seriation', 'weight', 'index', 'alphabetical'}) – method used to vertically sort the selected top n_terms terms; the default (“seriation”) groups similar terms together, which facilitates cross-topic assessment
save (str) – give the full /path/to/fname on disk to save figure
rc_params (dict, optional) – parameters passed to rc_context in matplotlib.pyplot; details in https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.rc_context.html
- Returns
Axis on which termite plot is plotted.
- Return type
matplotlib.axes.Axes.axis
- Raises
ValueError – if more than 6 topics are selected for highlighting, or an invalid value is passed for the sort_topics_by, rank_terms_by, and/or sort_terms_by params
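To make the rank_terms_by and n_terms parameters concrete, here is a rough stdlib-only sketch of the selection step: aggregate each term's weights across topics, then keep the highest-scoring terms. The data and the names `term_topic`, `AGGREGATORS`, and `top_terms` are made up for illustration; the library itself is documented as passing the value to a DataFrame agg function, so its exact behavior may differ.

```python
import statistics

# Hypothetical term-topic weights: term -> one weight per topic.
term_topic = {
    "model": [0.9, 0.1, 0.2],
    "data": [0.4, 0.5, 0.4],
    "topic": [0.1, 0.8, 0.1],
    "plot": [0.2, 0.2, 0.3],
}

# Stand-ins for the aggregations named by rank_terms_by.
AGGREGATORS = {
    "max": max,
    "mean": statistics.mean,
    "var": statistics.variance,  # sample variance, like pandas' default ddof=1
}


def top_terms(term_topic, n_terms=25, rank_terms_by="max"):
    """Return the n_terms terms with the highest aggregated weight."""
    if rank_terms_by not in AGGREGATORS:
        raise ValueError(f"invalid rank_terms_by value: {rank_terms_by!r}")
    agg = AGGREGATORS[rank_terms_by]
    scores = {term: agg(weights) for term, weights in term_topic.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_terms]


by_max = top_terms(term_topic, n_terms=2, rank_terms_by="max")
by_mean = top_terms(term_topic, n_terms=2, rank_terms_by="mean")
```

With this toy data, ranking by 'max' favors terms that dominate a single topic, while 'mean' favors terms that carry weight in every topic, so the two settings can select different terms for the plot.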
References
- Chuang, Jason, Christopher D. Manning, and Jeffrey Heer. “Termite: Visualization techniques for assessing textual topic models.” Proceedings of the International Working Conference on Advanced Visual Interfaces. ACM, 2012.
- Fajwel Fogel, Alexandre d’Aspremont, and Milan Vojnovic. 2016. Spectral ranking using seriation. J. Mach. Learn. Res. 17, 1 (January 2016), 3013–3057.
See also
viz.termite_plot
TODO: rank_terms_by other metrics, e.g. topic salience or relevance
-
textacy.viz.network.
draw_semantic_network
(graph, *, node_weights=None, spread=3.0, draw_nodes=False, base_node_size=300, node_alpha=0.25, line_width=0.5, line_alpha=0.1, base_font_size=12, save=False)
Draw a semantic network with nodes representing either terms or sentences, edges representing co-occurrence or similarity, and positions given by a force-directed layout.
- Parameters
graph (networkx.Graph) –
node_weights (dict) – mapping of node: weight, used to size node labels (and, optionally, node circles) according to their weight
spread (float) – number that drives the spread of the network; higher values give more spread-out networks
draw_nodes (bool) – if True, circles are drawn under the node labels
base_node_size (int) – if node_weights not given and draw_nodes is True, this is the size of all nodes in the network; if node_weights _is_ given, node sizes will be scaled against this value based on their weights compared to the max weight
node_alpha (float) – alpha of the circular nodes drawn behind labels if draw_nodes is True
line_width (float) – width of the lines (edges) drawn between nodes
line_alpha (float) – alpha of the lines (edges) drawn between nodes
base_font_size (int) – if node_weights not given, this is the font size used to draw all labels; otherwise, font sizes will be scaled against this value based on the corresponding node weights compared to the max
save (str) – give the full /path/to/fname on disk to save figure (optional)
- Returns
Axis on which network plot is drawn.
- Return type
Note
This function requires matplotlib.
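The descriptions of base_node_size and base_font_size say that sizes are scaled against the base value according to each node's weight relative to the max weight. One plausible reading is a simple linear rule, sketched below. The formula, the helper name `scale_by_weight`, and the sample weights are assumptions for illustration; the library may apply a different (for example nonlinear) scaling.

```python
def scale_by_weight(node_weights, base_size):
    """Assumed linear scaling: the heaviest node gets exactly base_size."""
    if not node_weights:
        return {}
    max_weight = max(node_weights.values())  # assumes positive weights
    return {node: base_size * weight / max_weight
            for node, weight in node_weights.items()}


# Hypothetical node weights, e.g. term frequencies or centrality scores.
node_weights = {"cat": 4.0, "dog": 2.0, "fish": 1.0}

node_sizes = scale_by_weight(node_weights, base_size=300)  # default base_node_size
font_sizes = scale_by_weight(node_weights, base_size=12)   # default base_font_size
```

Under this reading, the base values act as upper bounds: no node circle is larger than base_node_size and no label is larger than base_font_size.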