Spanplot#

Visualize segment-duration distributions with SequenceVisualizer.

Three chart styles and two grouping dimensions are available:

  • kind: "box" (default), "violin", or "strip"

  • group_by: "category" (one column per label) or "id" (one column per sequence)

Note

Compatible with interval and state pools only. Event pools have no duration; passing one raises UnsupportedSequenceTypeError.

Imports#

import polars as pl

from tanat import build_intervals
from tanat.dataset import simulate_intervals, simulate_static
from tanat.visualization import SequenceVisualizer

Simulate data#

simulate_intervals() produces one row per interval. The second feature, status, is categorical and is used throughout this example to group the durations.

temporal = simulate_intervals(
    n_ids=80,
    seq_length_range=(4, 15),
    features=["value", "status"],
    seed=42,
)
print(temporal.shape, temporal.columns.tolist())
(780, 5) ['id', 'start', 'end', 'value', 'status']
temporal.head()
   id  start                       end                         value  status
0   1  2001-04-06 08:18:02.932054  2001-04-07 20:56:25.990885     95  E
1   1  2007-05-15 16:54:22.261907  2007-06-02 23:31:22.695760     44  E
2   1  2014-11-12 14:11:37.339414  2014-11-20 07:09:53.859653     17  A
3   1  2016-08-04 04:29:51.196041  2016-08-30 12:43:26.216389     84  D
4   1  2020-11-27 14:10:58.505661  2020-12-05 22:44:00.293302     63  E


Build the pool#

pool = build_intervals(
    temporal_data=temporal,
    id_column="id",
    start_column="start",
    end_column="end",
)
┌─ Interval SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (80 sequences · 780 entities · 0.00s)
pool.cast_features({"status": pl.Categorical}, is_static=False)
print(pool)
┌────────────────────────────────────────────────┐
│          IntervalSequencePool Summary          │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequences          80
  Store              /home/runner/.tanat/_quick_interval_fb7042aa
  id_column          id

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-01-05 17:41:31.636713 → 2025-01-19 23:04:04.968485]
  Columns            ['start', 'end']
  t0                 position=0, anchor=start

Entity Features (2)
─────────────────────────
  • status              Categorical (5 categories)
  • value               Numerical [1 → 100]

Box plot (default)#

kind="box" (default) renders a standard box-and-whisker plot. Groups are sorted by ascending median duration.

# fmt: off
SequenceVisualizer.spanplot(kind="box", display_unit="hours") \
    .title("Duration distribution by status (box)") \
    .y_axis(label="Duration (h)") \
    .colors("Set2") \
    .draw(pool, entity_feature="status") \
    .show()
# fmt: on
Duration distribution by status (box)
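
For readers unfamiliar with what a box-and-whisker plot summarizes, here is a minimal standard-library sketch. It assumes the common Tukey convention (whiskers reach the most extreme points within 1.5 × IQR of the quartiles); whether spanplot uses exactly this rule is not stated on this page. The helper box_stats and the sample durations are hypothetical.

```python
import statistics


def box_stats(durations):
    """Summarize one group the way a typical box plot does (Tukey rule, assumed)."""
    values = sorted(durations)
    # Quartiles with linear interpolation over the observed range.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Made-up durations in hours for a single status label.
stats = box_stats([10, 20, 30, 40, 50, 400])
print(stats)
```

The single very long interval ends up in "outliers" instead of stretching the whisker, which is why the box stays readable even with a heavy tail.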

Violin plot#

kind="violin" shows the full kernel-density estimate, which is more informative than a box plot when the distribution is multimodal or skewed.

# fmt: off
SequenceVisualizer.spanplot(kind="violin", display_unit="hours") \
    .title("Duration distribution by status (violin)") \
    .y_axis(label="Duration (h)") \
    .colors("Set2") \
    .draw(pool, entity_feature="status") \
    .show()
# fmt: on
Duration distribution by status (violin)
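
The kernel-density estimate behind a violin can be sketched in a few lines. This is a generic Gaussian KDE, not the library's implementation: the kernel, the default bandwidth (the normal-reference rule 1.06 · σ · n^(-1/5)), and the sample values are all assumptions made for illustration.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a function x -> estimated density, using Gaussian kernels."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb (assumed default for this sketch).
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Made-up bimodal durations in hours: short stays and long stays.
durations = [20.0, 22.0, 25.0, 30.0, 200.0, 210.0, 215.0]
density = gaussian_kde(durations, bandwidth=10.0)

# The estimate is high near each mode and close to zero in between,
# a shape that a box plot would hide.
print(density(25.0), density(110.0), density(210.0))
```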

Strip plot#

kind="strip" renders individual points with horizontal jitter, ideal for spotting outliers and showing raw data density.

# fmt: off
SequenceVisualizer.spanplot(kind="strip", display_unit="hours") \
    .title("Duration distribution by status (strip)") \
    .y_axis(label="Duration (h)") \
    .marker(alpha=0.4, point_size=3.5) \
    .colors("Set2") \
    .draw(pool, entity_feature="status") \
    .show()
# fmt: on
Duration distribution by status (strip)
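
Horizontal jitter simply adds a small random offset to each point's group position so that points with similar durations do not overlap. A minimal sketch, assuming uniform jitter; the helper jitter_positions, its width parameter, and the seed are hypothetical and not part of the tanat API.

```python
import random


def jitter_positions(group_index, n_points, width=0.2, seed=0):
    """Spread n_points around a group's x position by at most +/- width."""
    rng = random.Random(seed)  # seeded so the layout is reproducible
    return [group_index + rng.uniform(-width, width) for _ in range(n_points)]


# Five points belonging to the group drawn at x = 2.
xs = jitter_positions(group_index=2, n_points=5)
print(xs)
```

The duration axis is left untouched; only the position along the group axis is perturbed, so the values read off the chart remain exact.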

Group by sequence ID#

group_by="id" shows one distribution per sequence ID. We work on a small subset to keep the chart readable.

small_pool = pool.subset(ids=pool.unique_ids[:15])
# Box: one distribution per ID
# fmt: off
SequenceVisualizer.spanplot(group_by="id", kind="box", display_unit="hours") \
    .title("Duration per sequence ID (box)") \
    .y_axis(label="Duration (h)") \
    .x_axis(rotation=45) \
    .colors("tab20") \
    .draw(small_pool, entity_feature="status") \
    .show()
# fmt: on
Duration per sequence ID (box)
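
The difference between the two grouping dimensions can be illustrated without the library. In this sketch, rows are hypothetical (id, label, duration) tuples with made-up values, and group_durations is an illustrative helper rather than a tanat function.

```python
from collections import defaultdict

# (sequence id, status label, duration in hours); illustrative values only.
rows = [
    (1, "E", 36.6),
    (1, "E", 438.6),
    (1, "A", 185.0),
    (2, "A", 12.0),
    (2, "D", 48.0),
]


def group_durations(rows, group_by="category"):
    """Collect durations per label ("category") or per sequence ("id")."""
    if group_by not in ("category", "id"):
        raise ValueError(f"unknown group_by: {group_by!r}")
    groups = defaultdict(list)
    for seq_id, label, duration in rows:
        key = label if group_by == "category" else seq_id
        groups[key].append(duration)
    return dict(groups)


by_category = group_durations(rows, group_by="category")
by_id = group_durations(rows, group_by="id")
print(by_category)
print(by_id)
```

Each key becomes one column in the chart, which is why group_by="id" gets crowded quickly on a full pool.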

Sort order#

sort controls the group ordering on the x-axis:

  • "ascending": ascending median duration (default)

  • "descending": descending median duration

  • "alphabetic": alphabetical label order

# Descending: largest median first
# fmt: off
SequenceVisualizer.spanplot(kind="box", display_unit="hours", sort="descending") \
    .title("sort='descending': largest median first") \
    .y_axis(label="Duration (h)") \
    .colors("Set2") \
    .draw(pool, entity_feature="status") \
    .show()
# fmt: on
sort='descending': largest median first
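
The three sort modes listed in this section can be sketched as follows. The helper order_groups and the sample data are hypothetical; in particular, how the library breaks ties between equal medians is not documented here.

```python
import statistics


def order_groups(groups, sort="ascending"):
    """Return group labels in the order they would appear on the axis."""
    if sort == "alphabetic":
        return sorted(groups, key=str)
    if sort in ("ascending", "descending"):
        return sorted(
            groups,
            key=lambda label: statistics.median(groups[label]),
            reverse=(sort == "descending"),
        )
    raise ValueError(f"unknown sort: {sort!r}")


# Made-up durations in hours; medians are A=30, B=6, C=150.
groups = {"A": [10, 30, 50], "B": [5, 6, 7], "C": [100, 200]}

print(order_groups(groups, "ascending"))
print(order_groups(groups, "descending"))
print(order_groups(groups, "alphabetic"))
```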

Horizontal orientation#

orientation="horizontal" moves group labels to the y-axis, which is especially useful when label names are long.

# fmt: off
SequenceVisualizer.spanplot(
    kind="box",
    display_unit="hours",
    orientation="horizontal",
) \
    .title("Duration distribution (horizontal box)") \
    .x_axis(label="Duration (h)") \
    .colors("Pastel1") \
    .draw(pool, entity_feature="status") \
    .show()
# fmt: on
Duration distribution (horizontal box)

Single sequence#

Pass a Sequence directly for a per-individual view.

seq = pool[pool.unique_ids[0]]
print(f"ID {seq.id_value}: {len(seq)} intervals")
ID 1: 5 intervals
# fmt: off
SequenceVisualizer.spanplot(kind="strip", display_unit="hours") \
    .title(f"Duration distribution, sequence {seq.id_value}") \
    .y_axis(label="Duration (h)") \
    .colors("tab10") \
    .draw(seq, entity_feature="status") \
    .show()
# fmt: on
Duration distribution, sequence 1

Faceting#

.facet() splits the chart into a grid of panels, one per unique value of a chosen feature. Here we attach per-sequence static data and facet on group.

static_df = simulate_static(n_ids=80, features=["age", "group"], seed=0)
pool.add_static_features(static_df)
pool.cast_features({"group": pl.Categorical}, is_static=True)
# fmt: off
SequenceVisualizer.spanplot(kind="box", display_unit="hours") \
    .facet(by="group", is_static=True, cols=3) \
    .title("Duration distribution faceted by group") \
    .y_axis(label="Duration (h)") \
    .colors("Set2") \
    .draw(pool, entity_feature="status") \
    .show()
# fmt: on
Duration distribution faceted by group (panels: group = A, group = B, group = C, group = D, group = E)
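
To see how five group values fit a grid with cols=3, here is a small layout sketch. It assumes panels are ordered alphabetically and filled row by row; the helper facet_grid is hypothetical and the library's actual ordering may differ.

```python
import math


def facet_grid(values, cols=3):
    """Map each unique facet value to a (row, col) cell, filling row by row."""
    panels = sorted(set(values))
    n_rows = math.ceil(len(panels) / cols)
    positions = {value: divmod(i, cols) for i, value in enumerate(panels)}
    return positions, n_rows


# One facet value per sequence; five distinct groups, as in the figure.
positions, n_rows = facet_grid(["A", "B", "C", "D", "E", "A", "C"], cols=3)
print(n_rows, positions)
```

With five panels and three columns the grid needs two rows, and the last cell of the second row stays empty.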

Inspect prepare_data()#

prepare_data() returns the flat Polars DataFrame before rendering. Each row is one segment; __DURATION__ holds the computed duration, expressed in display_unit (hours here).

builder = SequenceVisualizer.spanplot(display_unit="hours")
df = builder.prepare_data(pool, entity_feature="status")
df.head()
shape: (5, 4)
__ID__  __LABEL__  __DURATION__  __COLOR__
i64     str        f64           str
1       "E"        36.639738     "#9467bd"
1       "E"        438.616787    "#9467bd"
1       "A"        184.971256    "#1f77b4"
1       "D"        632.226394    "#d62728"
1       "E"        200.550496    "#9467bd"
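
The __DURATION__ values can be checked by hand against the start and end timestamps printed by temporal.head(). The sketch below uses only the standard library; duration_in and the UNIT_SECONDS table are hypothetical helpers, and only "hours" is a display_unit value confirmed by this page.

```python
from datetime import datetime

# Seconds per unit; every key other than "hours" is an assumption.
UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def duration_in(start, end, display_unit="hours"):
    """Length of the interval [start, end] expressed in display_unit."""
    return (end - start).total_seconds() / UNIT_SECONDS[display_unit]


# First interval of sequence 1, copied from the simulated data.
start = datetime.fromisoformat("2001-04-06 08:18:02.932054")
end = datetime.fromisoformat("2001-04-07 20:56:25.990885")

hours = duration_in(start, end)
print(hours)  # close to the 36.639738 shown in the first row
```

The same arithmetic on the second interval gives roughly 438.6 hours, consistent with the second row of the table.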

