Barplot#

Aggregate entity features across a SequencePool with SequenceVisualizer.

Three aggregation modes are available:

  • show_as="count": raw occurrences per label (all pool types)

  • show_as="rate": relative frequency, bars sum to 1 (all pool types)

  • show_as="duration": total cumulative duration per label (interval / state pools only)

Imports#

import polars as pl

from tanat import build_intervals
from tanat.dataset import simulate_intervals, simulate_static
from tanat.visualization import SequenceVisualizer

Simulate data#

simulate_intervals() produces one row per interval. The second feature (status) is categorical; it groups the bars.

temporal = simulate_intervals(
    n_ids=80,
    seq_length_range=(3, 12),
    features=["value", "status"],
    seed=42,
)
print(temporal.shape, temporal.columns.tolist())
(612, 5) ['id', 'start', 'end', 'value', 'status']
temporal.head()
   id                       start                         end  value  status
0   1  2005-12-10 07:06:29.451655  2005-12-21 09:20:16.986150     95       E
1   1  2016-10-16 04:02:09.928755  2016-11-10 21:57:50.336774     44       E
2   1  2021-04-27 14:20:04.629001  2021-05-07 06:23:57.841244     17       A
3   2  2002-02-13 11:59:12.580655  2002-02-23 02:15:17.350702     84       B
4   2  2004-09-09 02:31:55.499932  2004-09-21 22:27:34.852686     63       D


Build the pool#

pool = build_intervals(
    temporal_data=temporal,
    id_column="id",
    start_column="start",
    end_column="end",
)
┌─ Interval SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (80 sequences · 612 entities · 0.00s)
pool.cast_features({"status": pl.Categorical}, is_static=False)
print(pool)
┌────────────────────────────────────────────────┐
│          IntervalSequencePool Summary          │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequences          80
  Store              /home/runner/.tanat/_quick_interval_9a3daaa0
  id_column          id

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-01-06 04:30:56.712327 → 2025-01-20 12:52:39.461948]
  Columns            ['start', 'end']
  t0                 position=0, anchor=start

Entity Features (2)
─────────────────────────
  • status              Categorical (5 categories)
  • value               Numerical [1 → 100]

Count: occurrences per label#

show_as="count" (default) counts how many intervals carry each label.

# fmt: off
SequenceVisualizer.barplot(show_as="count") \
    .title("Interval count by status") \
    .draw(pool, entity_feature="status") \
    .show()
# fmt: on
Interval count by status

Rate: relative frequency#

show_as="rate" normalises counts so bars sum to 1. Combine with sort="descending" to put the most frequent label first.

# fmt: off
SequenceVisualizer.barplot(show_as="rate", sort="descending") \
    .title("Relative frequency by status (descending)") \
    .y_axis(label="Rate") \
    .draw(pool, entity_feature="status") \
    .show()
# fmt: on
Relative frequency by status (descending)
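
To make the normalisation concrete, here is a minimal pure-Python sketch of what a rate aggregation with descending sort amounts to. It is not tanat's implementation; the labels list is hypothetical toy data and the variable names are illustrative only.

```python
from collections import Counter

# Hypothetical toy data: one status label per interval.
labels = ["A", "B", "A", "C", "A", "B"]

counts = Counter(labels)
total = sum(counts.values())

# Rate = count / total, so the rates sum to 1 by construction.
rates = {label: n / total for label, n in counts.items()}

# Analogue of sort="descending": largest rate first.
ranked = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)

for label, rate in ranked:
    print(f"{label}: {rate:.3f}")
```

The same counts, left undivided, correspond to what show_as="count" displays.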

Duration: total time per label#

show_as="duration" sums end - start per label. display_unit converts the result to a human-readable time unit.

Note

Duration mode requires an interval or state pool. Event pools (point observations) have no duration.

# fmt: off
SequenceVisualizer.barplot(show_as="duration", display_unit="hours") \
    .title("Total duration per status (hours)") \
    .y_axis(label="Hours") \
    .draw(pool, entity_feature="status") \
    .show()
# fmt: on
Total duration per status (hours)
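
The duration aggregation can be sketched with the standard library alone. This is an illustration under assumptions, not tanat's code: the intervals are hypothetical, and the division by a one-hour timedelta stands in for what display_unit="hours" is described as doing.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical (start, end, status) intervals.
intervals = [
    (datetime(2021, 1, 1, 0, 0), datetime(2021, 1, 1, 12, 0), "A"),
    (datetime(2021, 1, 2, 0, 0), datetime(2021, 1, 4, 0, 0), "B"),
    (datetime(2021, 1, 5, 6, 0), datetime(2021, 1, 5, 18, 0), "A"),
]

# Sum end - start per label.
totals = defaultdict(timedelta)
for start, end, status in intervals:
    totals[status] += end - start

# Convert each total to hours (dividing two timedeltas yields a float).
hours = {status: td / timedelta(hours=1) for status, td in totals.items()}
print(hours)
```

An event pool has no end column to subtract from, which is why the mode is restricted to interval and state pools.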

Horizontal orientation#

orientation="horizontal" flips the axes, which is handy when label names are long.

# fmt: off
SequenceVisualizer.barplot(
    show_as="count",
    orientation="horizontal",
    sort="descending",
) \
    .title("Interval count by status (horizontal)") \
    .draw(pool, entity_feature="status") \
    .show()
# fmt: on
Interval count by status (horizontal)

Color customization#

The .colors() method accepts three formats:

  • Named colormap string: "Set2", "tab10", "Pastel1", …

  • Dict mapping label → hex color

  • No argument (default): matplotlib default color cycle

# Named colormap
# fmt: off
SequenceVisualizer.barplot(show_as="count") \
    .colors("Set2") \
    .title("Count (Set2 palette)") \
    .draw(pool, entity_feature="status") \
    .show()
# fmt: on
Count (Set2 palette)
# Explicit dict: one color per label
palette = {
    "A": "#2ecc71",
    "B": "#e74c3c",
    "C": "#3498db",
    "D": "#f39c12",
    "E": "#9b59b6",
}
# fmt: off
SequenceVisualizer.barplot(show_as="count") \
    .colors(palette) \
    .title("Count (custom dict palette)") \
    .draw(pool, entity_feature="status") \
    .show()
# fmt: on
Count (custom dict palette)
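
Before passing a dict palette, it can help to confirm that it covers every label and that each value is a well-formed hex string. The helper below, check_palette, is hypothetical and not part of tanat; how .colors() itself treats missing labels or malformed values is not covered here.

```python
import re

# Matches "#RRGGBB" hex colors only.
HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")


def check_palette(palette, labels):
    """Return (labels missing from the palette, keys whose value is not #RRGGBB)."""
    missing = sorted(set(labels) - set(palette))
    invalid = sorted(k for k, v in palette.items() if not HEX_COLOR.match(v))
    return missing, invalid


palette = {
    "A": "#2ecc71",
    "B": "#e74c3c",
    "C": "#3498db",
    "D": "#f39c12",
    "E": "#9b59b6",
}
print(check_palette(palette, ["A", "B", "C", "D", "E"]))
```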

Single sequence#

Pass a Sequence directly for a per-individual view.

seq = pool[pool.unique_ids[0]]
print(f"ID {seq.id_value}: {len(seq)} intervals")
ID 1: 3 intervals
# fmt: off
SequenceVisualizer.barplot(show_as="count") \
    .title(f"Status counts, sequence {seq.id_value}") \
    .colors("Set2") \
    .draw(seq, entity_feature="status") \
    .show()
# fmt: on
Status counts, sequence 1

Layout and style#

# Grid + capped y-axis
# fmt: off
SequenceVisualizer.barplot(show_as="rate", sort="descending") \
    .figsize(8, 4) \
    .grid() \
    .x_axis(rotation=30) \
    .y_axis(limit_max=1, label="Rate") \
    .colors("Set2") \
    .title("Rate (grid, capped y-axis)") \
    .draw(pool, entity_feature="status") \
    .show()
# fmt: on
Rate (grid, capped y-axis)
# Slim bars with a visible edge
# fmt: off
SequenceVisualizer.barplot(show_as="count") \
    .colors("Set2") \
    .marker(bar_width=0.5, alpha=0.85, edge_color="#333333") \
    .title("Count (slim bars with edge)") \
    .draw(pool, entity_feature="status") \
    .show()
# fmt: on
Count (slim bars with edge)

Faceting#

.facet() splits the chart into a grid of panels, one per unique value of a chosen feature. Here we attach per-sequence static data and facet on group.

static_df = simulate_static(n_ids=80, features=["age", "group"], seed=0)
pool.add_static_features(static_df)
pool.cast_features({"group": pl.Categorical}, is_static=True)
# fmt: off
SequenceVisualizer.barplot(show_as="count") \
    .facet(by="group", is_static=True, cols=3) \
    .colors("Set2") \
    .draw(pool, entity_feature="status") \
    .show()
# fmt: on
Count by status, one panel per group (group = A, B, C, D, E)
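
Conceptually, faceting on a static feature means each entity row inherits the group of its sequence, and the aggregation runs once per group. A minimal sketch of that idea, with hypothetical ids, groups, and rows (not tanat's implementation):

```python
from collections import Counter, defaultdict

# Hypothetical static feature: one group per sequence id.
group_of = {1: "A", 2: "B", 3: "A"}

# Hypothetical entity rows: (sequence id, status).
rows = [(1, "E"), (1, "E"), (1, "A"), (2, "B"), (2, "D"), (3, "A")]

# One Counter per panel, keyed by the sequence's group.
panels = defaultdict(Counter)
for seq_id, status in rows:
    panels[group_of[seq_id]][status] += 1

for group in sorted(panels):
    print(f"group = {group}: {dict(panels[group])}")
```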

Inspect prepare_data()#

prepare_data() returns the aggregated Polars DataFrame before rendering. The result is cached: calling .draw() on the same builder reuses it.

builder = SequenceVisualizer.barplot(show_as="rate", sort="descending")
df = builder.prepare_data(pool, entity_feature="status")
df
shape: (5, 3)
__LABEL__  __VALUE__  __COLOR__
str        f64        str
"D"        0.215686   "#d62728"
"C"        0.214052   "#2ca02c"
"A"        0.207516   "#1f77b4"
"E"        0.205882   "#9467bd"
"B"        0.156863   "#ff7f0e"

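As a quick sanity check on the rates printed above, multiplying each by the 612 intervals reported when the pool was built should recover whole-number counts. The sketch assumes rate = label count / total interval count; the values are copied from the output and rounded to six decimals, hence the round().

```python
# Rates copied from the prepare_data() output (six-decimal rounding).
rates = {
    "D": 0.215686,
    "C": 0.214052,
    "A": 0.207516,
    "E": 0.205882,
    "B": 0.156863,
}
n_intervals = 612  # entities reported by the build log

# Undo the normalisation to get per-label counts back.
counts = {label: round(rate * n_intervals) for label, rate in rates.items()}
print(counts)
print(sum(counts.values()))
```

The recovered counts add up to the number of rows in the simulated data, which is consistent with bars summing to 1 in rate mode.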
