The Three Types of Sequences#

TanaT supports three types of temporal sequences depending on how each entity’s temporal extent is defined:

Type      Temporal extent   Key constraint                  Builder
────────  ────────────────  ──────────────────────────────  ─────────────────
Event     Single timestamp  None                            build_events()
Interval  [start, end]      Overlaps and gaps are allowed   build_intervals()
State     [start, end]      Contiguous, no overlap, no gap  build_states()

This example walks through each type step by step: data simulation, pool construction, pool-level exploration, navigation down to individual sequences and entities, and a comparison of the temporal_extent at the entity level.

For a broader conceptual introduction see Core Concepts.

Imports#

Each type has its own shortcut builder. All three live in the same tanat namespace.

from tanat import build_events, build_intervals, build_states
from tanat.dataset import (
    simulate_events,
    simulate_intervals,
    simulate_states,
    simulate_static,
)

1. Event Sequences#

An event is a point-in-time observation: it has one timestamp and no duration. Think of medical visits, user clicks, or purchase records.

Simulate data#

simulate_events() returns a DataFrame with columns id, time, and one column per feature.

events_data = simulate_events(
    n_ids=10,
    features=["value", "category"],
    seed=42,
)
events_data.head()
id time value category
0 1 2013-11-03 06:53:24.300281 53 C
1 1 2013-12-24 16:18:54.022502 98 D
2 1 2015-11-14 05:56:41.128402 74 A
3 2 2000-10-08 10:16:59.399710 77 C
4 2 2001-06-16 10:05:43.045833 72 A


Build the pool#

build_events() needs at minimum:

  • id_column: the column that identifies each sequence

  • time_column: the column containing the event timestamp

All remaining columns are automatically inferred as entity features.

events_pool = build_events(
    temporal_data=events_data,
    id_column="id",
    time_column="time",
)
┌─ Event SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)
print(events_pool)
┌────────────────────────────────────────────────┐
│           EventSequencePool Summary            │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequences          10
  Store              /home/runner/.tanat/_quick_event_20b56094
  id_column          id

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-07-26 09:45:34.720567 → 2024-03-25 12:26:53.773965]
  Columns            ['time']
  t0                 position=0, anchor=None

Entity Features (2)
─────────────────────────
  • category            String [len 1 → 1]
  • value               Numerical [5 → 98]
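
As a rough mental model of the build step above, the sketch below shows how feature columns can be inferred as "everything except the id and time columns", and how rows can be grouped per id and sorted chronologically. It is plain Python on a toy table, not TanaT's implementation; the helper names `infer_feature_columns` and `group_by_id` are hypothetical.

```python
from collections import defaultdict

# Toy rows loosely mirroring the simulated events table (dates shortened).
rows = [
    {"id": 1, "time": "2013-12-24", "value": 98, "category": "D"},
    {"id": 1, "time": "2013-11-03", "value": 53, "category": "C"},
    {"id": 2, "time": "2000-10-08", "value": 77, "category": "C"},
]


def infer_feature_columns(rows, id_column, time_column):
    """Return every column that is neither the id nor the time column."""
    reserved = {id_column, time_column}
    return [name for name in rows[0] if name not in reserved]


def group_by_id(rows, id_column, time_column):
    """Group rows per sequence id and sort each group chronologically."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[id_column]].append(row)
    # ISO-formatted date strings sort in chronological order.
    return {
        key: sorted(group, key=lambda r: r[time_column])
        for key, group in groups.items()
    }


feature_columns = infer_feature_columns(rows, "id", "time")
sequences = group_by_id(rows, "id", "time")

print(feature_columns)
print({key: len(seq) for key, seq in sequences.items()})
```

Each key of `sequences` plays the role of one sequence in the pool, and each row inside it plays the role of one entity.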

Explore the pool#

print(f"Number of sequences : {len(events_pool)}")
print(f"First IDs           : {events_pool.unique_ids[:5]}")
Number of sequences : 10
First IDs           : [1, 2, 3, 4, 5]

Temporal data of the pool in tabular form (one row = one entity)

events_pool.temporal_data().head()
id time category value
0 1 2013-11-03 06:53:24.300281 C 53
1 1 2013-12-24 16:18:54.022502 D 98
2 1 2015-11-14 05:56:41.128402 A 74
3 2 2000-10-08 10:16:59.399710 C 77
4 2 2001-06-16 10:05:43.045833 A 72


Iterate#

Pool → one sequence per ID

for seq in events_pool.subset(events_pool.unique_ids[:3]):
    print(f"  ID {seq.id_value!r}: {len(seq)} events")
ID 1: 3 events
ID 2: 9 events
ID 3: 8 events

Sequence → one entity per row

event_seq = events_pool[events_pool.unique_ids[0]]  # first sequence of the pool
for entity in event_seq:
    print(f"  t={entity.temporal_extent}  value={entity['value']}")
t=2013-11-03 06:53:24.300281  value=53
t=2013-12-24 16:18:54.022502  value=98
t=2015-11-14 05:56:41.128402  value=74

Static features#

Per-sequence static data (age, group, …) can be attached at build time via static_data, or added later with add_static_features(). Static features are shared by all entities of a given sequence.

Generate one row of static attributes per sequence ID

static_df = simulate_static(n_ids=10, features=["age", "group"], seed=0)
static_df.head()
id age group
0 1 86 D
1 2 64 E
2 3 52 C
3 4 27 D
4 5 31 E


Option 1: attach at build time

events_pool_with_static = build_events(
    temporal_data=events_data,
    id_column="id",
    time_column="time",
    static_data=static_df,
)
events_pool_with_static.static_data().head()
┌─ Event SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity, time index & static features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)
id age group
0 1 86 D
1 2 64 E
2 3 52 C
3 4 27 D
4 5 31 E


Option 2: add to an existing pool in place

events_pool.add_static_features(static_df)
events_pool.static_data().head()
id age group
0 1 86 D
1 2 64 E
2 3 52 C
3 4 27 D
4 5 31 E


Static data is also accessible per-sequence (single row)

events_pool[events_pool.unique_ids[0]].static_data()
id age group
0 1 86 D
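
The one-to-many relationship between static and temporal data can be pictured with plain dictionaries: one static row per sequence id, looked up by every entity of that sequence. This is only an illustration of the idea that static features are shared by all entities of a sequence. It says nothing about how TanaT stores them internally, and `attach_static` is a hypothetical helper.

```python
# One static row per sequence id (values taken from the table above).
static_rows = [
    {"id": 1, "age": 86, "group": "D"},
    {"id": 2, "age": 64, "group": "E"},
]

# Several entity rows per sequence id.
entity_rows = [
    {"id": 1, "value": 53},
    {"id": 1, "value": 98},
    {"id": 2, "value": 77},
]

# Index the static rows by id so each lookup is a single dict access.
static_by_id = {row["id"]: row for row in static_rows}


def attach_static(entity, static_by_id, id_column="id"):
    """Return a copy of the entity row extended with its sequence's static values."""
    static = static_by_id[entity[id_column]]
    extra = {key: value for key, value in static.items() if key != id_column}
    return {**entity, **extra}


enriched = [attach_static(entity, static_by_id) for entity in entity_rows]
for row in enriched:
    print(row)
```

Both entities of sequence 1 end up with the same `age` and `group`, which is the sense in which static features are shared.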


2. Interval Sequences#

An interval spans a period of time with a start and an end. Unlike states, intervals are not required to be contiguous: two intervals can overlap and gaps between them are allowed. Think of overlapping treatments, project assignments, or sensor readings.
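
To make "overlaps and gaps are allowed" concrete, here is a small standalone sketch that reports both for one sequence of intervals. It uses only the standard library and made-up dates; `find_overlaps_and_gaps` is a hypothetical helper, not part of TanaT.

```python
from datetime import datetime


def find_overlaps_and_gaps(intervals):
    """Scan intervals sorted by start and report overlapping and uncovered spans."""
    ordered = sorted(intervals)
    overlaps, gaps = [], []
    if not ordered:
        return overlaps, gaps
    latest_end = ordered[0][1]  # furthest end seen so far
    for start, end in ordered[1:]:
        if start < latest_end:
            # The interval begins before an earlier one has finished.
            overlaps.append((start, min(end, latest_end)))
        elif start > latest_end:
            # Nothing covers the time between latest_end and start.
            gaps.append((latest_end, start))
        latest_end = max(latest_end, end)
    return overlaps, gaps


# Three made-up treatment periods for one patient.
treatments = [
    (datetime(2024, 1, 1), datetime(2024, 1, 10)),
    (datetime(2024, 1, 5), datetime(2024, 1, 8)),
    (datetime(2024, 1, 15), datetime(2024, 1, 20)),
]

overlaps, gaps = find_overlaps_and_gaps(treatments)
print("overlaps:", overlaps)
print("gaps    :", gaps)
```

A sequence like this one is valid as an interval sequence even though it contains one overlap and one gap; the same rows would not satisfy the constraint of a state sequence.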

Simulate data#

simulate_intervals() produces a DataFrame with id, start, end, and feature columns.

intervals_data = simulate_intervals(
    n_ids=10,
    features=["value", "category"],
    seed=42,
)
intervals_data.head()
id start end value category
0 1 2013-11-03 06:53:24.300281 2013-11-13 02:26:21.665986 53 C
1 1 2013-12-24 16:18:54.022502 2013-12-26 13:45:51.188795 98 D
2 1 2015-11-14 05:56:41.128402 2015-11-27 21:54:00.218863 74 A
3 2 2001-06-16 10:05:43.045833 2001-07-10 03:41:18.398570 77 C
4 2 2005-05-13 14:05:36.861038 2005-06-02 20:27:20.867682 72 A


Build the pool#

build_intervals() needs:

  • id_column: sequence identifier

  • start_column: interval start

  • end_column: interval end

intervals_pool = build_intervals(
    temporal_data=intervals_data,
    id_column="id",
    start_column="start",
    end_column="end",
)
┌─ Interval SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)
print(intervals_pool)
┌────────────────────────────────────────────────┐
│          IntervalSequencePool Summary          │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequences          10
  Store              /home/runner/.tanat/_quick_interval_40b641d6
  id_column          id

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-07-27 05:52:43.701012 → 2024-11-19 06:48:23.343189]
  Columns            ['start', 'end']
  t0                 position=0, anchor=start

Entity Features (2)
─────────────────────────
  • category            String [len 1 → 1]
  • value               Numerical [5 → 98]

Navigate to a sequence then to an entity#

interval_seq = intervals_pool[intervals_pool.unique_ids[0]]
print(f"→ {len(interval_seq)} intervals for ID {interval_seq.id_value!r}")
→ 3 intervals for ID 1
interval_entity = interval_seq[0]
print(interval_entity)
┌────────────────────────────────────────────────┐
│             IntervalEntity Summary             │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequence ID        1
  Rank               0

Entity Features
─────────────────────────
  value              53
  category           C

The temporal extent is now a (start, end) pair

print("features      :", interval_entity.data())
print("temporal span :", interval_entity.temporal_extent)  # (start, end)
print("feature value :", interval_entity["value"])
features      : {'value': 53, 'category': 'C'}
temporal span : [datetime.datetime(2013, 11, 3, 6, 53, 24, 300281), datetime.datetime(2013, 11, 13, 2, 26, 21, 665986)]
feature value : 53

3. State Sequences#

A state sequence partitions the timeline into contiguous, non-overlapping periods: end[i] == start[i+1] within every sequence. The individual is in exactly one state at any point in time. Think of disease stages, employment status, or device modes.
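
The contiguity constraint can be checked in a few lines. The sketch below is a standalone illustration rather than TanaT's own validation; `is_contiguous` is a hypothetical helper, and it assumes the rows are already sorted by start. The timestamps are those of sequence 1 in the simulated states and intervals tables.

```python
from datetime import datetime


def is_contiguous(spans):
    """Return True when end[i] == start[i+1] for every consecutive pair."""
    return all(
        end == next_start
        for (_, end), (next_start, _) in zip(spans, spans[1:])
    )


# Sequence 1 of the simulated states: each end equals the next start.
states = [
    (datetime(2015, 11, 14, 5, 56, 41, 128402), datetime(2015, 12, 11, 9, 22, 51, 537076)),
    (datetime(2015, 12, 11, 9, 22, 51, 537076), datetime(2016, 1, 7, 18, 13, 11, 484612)),
    (datetime(2016, 1, 7, 18, 13, 11, 484612), datetime(2025, 1, 1)),
]

# Sequence 1 of the simulated intervals: there is a gap between the two rows.
intervals = [
    (datetime(2013, 11, 3, 6, 53, 24, 300281), datetime(2013, 11, 13, 2, 26, 21, 665986)),
    (datetime(2013, 12, 24, 16, 18, 54, 22502), datetime(2013, 12, 26, 13, 45, 51, 188795)),
]

print("states contiguous   :", is_contiguous(states))
print("intervals contiguous:", is_contiguous(intervals))
```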

Simulate data#

simulate_states() guarantees strict continuity: end[i] == start[i+1] by construction.

states_data = simulate_states(
    n_ids=10,
    features=["value", "category"],
    seed=42,
)
states_data.head()
id start end value category
0 1 2015-11-14 05:56:41.128402 2015-12-11 09:22:51.537076 53 C
1 1 2015-12-11 09:22:51.537076 2016-01-07 18:13:11.484612 98 D
2 1 2016-01-07 18:13:11.484612 2025-01-01 00:00:00.000000 74 A
3 2 2007-08-07 16:08:06.331871 2007-08-13 21:43:12.768137 77 C
4 2 2007-08-13 21:43:12.768137 2007-09-05 08:58:08.065324 72 A


Build the pool#

build_states() accepts the same start_column / end_column pair as build_intervals().

Note

end_column is optional for state sequences. When omitted, the end of state i is automatically derived from the start of state i+1. The last state per sequence will have end = null unless you supply an explicit sentinel value to the builder.
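
The inference rule described in this note can be sketched in plain Python: the end of each state is the start of the next one, and the last state keeps a null end unless a sentinel is given. This is an illustration of the rule only; `infer_ends` and its `sentinel` argument are hypothetical names and do not reflect the builder's actual parameter.

```python
from datetime import datetime


def infer_ends(starts, sentinel=None):
    """Pair each start with the following start; the last end is the sentinel."""
    ordered = sorted(starts)
    ends = ordered[1:] + [sentinel]
    return list(zip(ordered, ends))


# Start timestamps of sequence 1, deliberately given out of order.
starts = [
    datetime(2016, 1, 7, 18, 13, 11, 484612),
    datetime(2015, 11, 14, 5, 56, 41, 128402),
    datetime(2015, 12, 11, 9, 22, 51, 537076),
]

open_ended = infer_ends(starts)
closed = infer_ends(starts, sentinel=datetime(2025, 1, 1))

for start, end in open_ended:
    print(start, "->", end)
```

With no sentinel the final pair ends in `None`; passing `datetime(2025, 1, 1)` closes it, which matches the last row of sequence 1 in the simulated data.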

states_pool = build_states(
    temporal_data=states_data,
    id_column="id",
    start_column="start",
    end_column="end",  # optional: omit to let TanaT infer it
)
┌─ State SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)
print(states_pool)
┌────────────────────────────────────────────────┐
│           StateSequencePool Summary            │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequences          10
  Store              /home/runner/.tanat/_quick_state_cf0d3870
  id_column          id

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-07-26 09:45:34.720567 → 2025-01-01 00:00:00]
  Columns            ['start', 'end']
  t0                 position=0, anchor=start

Entity Features (2)
─────────────────────────
  • category            String [len 1 → 1]
  • value               Numerical [5 → 98]

Navigate to a sequence then to an entity#

state_seq = states_pool[states_pool.unique_ids[0]]
print(f"→ {len(state_seq)} states for ID {state_seq.id_value!r}")
→ 3 states for ID 1
state_entity = state_seq[0]
print(state_entity)
┌────────────────────────────────────────────────┐
│              StateEntity Summary               │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequence ID        1
  Rank               0

Entity Features
─────────────────────────
  value              53
  category           C

Like intervals, the temporal extent is a (start, end) pair

print("features      :", state_entity.data())
print("temporal span :", state_entity.temporal_extent)  # (start, end)
print("feature value :", state_entity["value"])
features      : {'value': 53, 'category': 'C'}
temporal span : [datetime.datetime(2015, 11, 14, 5, 56, 41, 128402), datetime.datetime(2015, 12, 11, 9, 22, 51, 537076)]
feature value : 53

4. Side-by-side comparison#

To summarise the differences, we build all three pools from the same underlying dataset (states data, which contains both start and end columns) and compare the temporal_extent of the first entity of the first sequence.

Re-use states_data for all three types so the raw data is identical

common_events = build_events(
    temporal_data=states_data,
    id_column="id",
    time_column="start",  # use start as the single event timestamp
)
common_intervals = build_intervals(
    temporal_data=states_data,
    id_column="id",
    start_column="start",
    end_column="end",
)
common_states = build_states(
    temporal_data=states_data,
    id_column="id",
    start_column="start",
    end_column="end",
)

first_id = common_events.unique_ids[0]

for label, pool in [
    ("Event   ", common_events),
    ("Interval", common_intervals),
    ("State   ", common_states),
]:
    entity = pool[first_id][0]
    print(f"{label} → temporal_extent: {entity.temporal_extent}")
┌─ Event SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)
┌─ Interval SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)
┌─ State SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)
Event    → temporal_extent: 2015-11-14 05:56:41.128402
Interval → temporal_extent: [datetime.datetime(2015, 11, 14, 5, 56, 41, 128402), datetime.datetime(2015, 12, 11, 9, 22, 51, 537076)]
State    → temporal_extent: [datetime.datetime(2015, 11, 14, 5, 56, 41, 128402), datetime.datetime(2015, 12, 11, 9, 22, 51, 537076)]

Note

  • Event : a single timestamp (no duration)

  • Interval : a (start, end) pair; gaps and overlaps are allowed

  • State : a (start, end) pair; end[i] == start[i+1] is guaranteed by construction
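
The practical consequence of these extents is duration. The standalone sketch below takes the first row of sequence 1 and reads it once as an event and once as a span; the helper names `event_extent` and `span_extent` are hypothetical and only mimic the shape of the values printed above.

```python
from datetime import datetime, timedelta

# First row of sequence 1 in the simulated states table.
row = {
    "start": datetime(2015, 11, 14, 5, 56, 41, 128402),
    "end": datetime(2015, 12, 11, 9, 22, 51, 537076),
}


def event_extent(row, time_column="start"):
    """An event keeps a single timestamp."""
    return row[time_column]


def span_extent(row, start_column="start", end_column="end"):
    """Intervals and states keep a [start, end] pair."""
    return [row[start_column], row[end_column]]


event = event_extent(row)
span = span_extent(row)

event_duration = timedelta(0)     # a single timestamp has no duration
span_duration = span[1] - span[0]  # end minus start

print("event:", event, "duration:", event_duration)
print("span :", span, "duration:", span_duration)
```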
