The Three Types of Sequences#
TanaT supports three types of temporal sequences depending on how each entity’s temporal extent is defined:
| Type | Temporal extent | Key constraint | Builder |
|---|---|---|---|
| Event | Single timestamp | None | build_events() |
| Interval | (start, end) pair | Overlaps and gaps are allowed | build_intervals() |
| State | (start, end) pair | Contiguous, no overlap, no gap | build_states() |
This example walks through each type step by step: data simulation, pool
construction, pool-level exploration, navigation down to individual
sequences and entities, and a comparison of the temporal_extent at
the entity level.
For a broader conceptual introduction see Core Concepts.
Imports#
Each type has its own shortcut builder. All three live in the same
tanat namespace.
from tanat import build_events, build_intervals, build_states
from tanat.dataset import (
    simulate_events,
    simulate_intervals,
    simulate_states,
    simulate_static,
)
1. Event Sequences#
An event is a point-in-time observation: it has one timestamp and no duration. Think of medical visits, user clicks, or purchase records.
Simulate data#
simulate_events() returns a
DataFrame with columns id, time, and one column per feature.
events_data = simulate_events(
    n_ids=10,
    features=["value", "category"],
    seed=42,
)
events_data.head()
Build the pool#
build_events() needs at minimum:
- id_column: the column that identifies each sequence
- time_column: the column containing the event timestamp
All remaining columns are automatically inferred as entity features.
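As a rough illustration of the inference rule above, the sketch below splits a list of column names into structural columns and entity features. `infer_feature_columns` is a hypothetical helper written for this example; it is not part of TanaT, and the library's actual inference logic may differ.

```python
def infer_feature_columns(columns, id_column, time_columns):
    """Return the columns that are neither the ID nor a time column.

    Hypothetical helper: it only mirrors the rule described in the text.
    """
    reserved = {id_column, *time_columns}
    # Keep the original column order for the remaining names.
    return [col for col in columns if col not in reserved]


columns = ["id", "time", "value", "category"]
features = infer_feature_columns(columns, id_column="id", time_columns=["time"])
print(features)
```

For an interval or state table the same idea would apply with two time columns (for example `["start", "end"]`).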
events_pool = build_events(
    temporal_data=events_data,
    id_column="id",
    time_column="time",
)
┌─ Event SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)
print(events_pool)
┌────────────────────────────────────────────────┐
│ EventSequencePool Summary │
└────────────────────────────────────────────────┘
Overview
─────────────────────────
Sequences 10
Store /home/runner/.tanat/_quick_event_20b56094
id_column id
Time Index
─────────────────────────
Type Datetime(time_unit='us', time_zone=None) [2000-07-26 09:45:34.720567 → 2024-03-25 12:26:53.773965]
Columns ['time']
t0 position=0, anchor=None
Entity Features (2)
─────────────────────────
• category String [len 1 → 1]
• value Numerical [5 → 98]
Explore the pool#
print(f"Number of sequences : {len(events_pool)}")
print(f"First IDs : {events_pool.unique_ids[:5]}")
Number of sequences : 10
First IDs : [1, 2, 3, 4, 5]
Temporal data of the pool in tabular form (one row = one entity)
events_pool.temporal_data().head()
Iterate#
Pool → one sequence per ID
for seq in events_pool.subset(events_pool.unique_ids[:3]):
    print(f" ID {seq.id_value!r}: {len(seq)} events")
ID 1: 3 events
ID 2: 9 events
ID 3: 8 events
Sequence → one entity per row
event_seq = events_pool[events_pool.unique_ids[0]]
for entity in event_seq:
    print(f" t={entity.temporal_extent} value={entity['value']}")
t=2013-11-03 06:53:24.300281 value=53
t=2013-12-24 16:18:54.022502 value=98
t=2015-11-14 05:56:41.128402 value=74
Static features#
Per-sequence static data (age, group, …) can be attached at build time
via static_data, or added later with
add_static_features().
Static features are shared by all entities of a given sequence.
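To picture what "shared by all entities of a given sequence" means, here is a minimal pure-Python sketch that joins per-sequence attributes onto entity rows. The rows and values are made up for illustration, and this is not how TanaT stores static data internally.

```python
# Hypothetical entity rows: two entities for sequence 1, one for sequence 2.
entities = [
    {"id": 1, "value": 53},
    {"id": 1, "value": 98},
    {"id": 2, "value": 12},
]

# Hypothetical static attributes: exactly one record per sequence ID.
static = {
    1: {"age": 40, "group": "A"},
    2: {"age": 65, "group": "B"},
}

# Each entity sees the static attributes of the sequence it belongs to.
enriched = [{**row, **static[row["id"]]} for row in entities]
for row in enriched:
    print(row)
```

Both entities of sequence 1 end up with the same `age` and `group`, which is the property the paragraph above describes.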
Generate one row of static attributes per sequence ID
static_df = simulate_static(n_ids=10, features=["age", "group"], seed=0)
static_df.head()
Option 1: attach at build time
events_pool_with_static = build_events(
    temporal_data=events_data,
    id_column="id",
    time_column="time",
    static_data=static_df,
)
events_pool_with_static.static_data().head()
┌─ Event SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity, time index & static features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)
Option 2: add to an existing pool in place
events_pool.add_static_features(static_df)
events_pool.static_data().head()
Static data is also accessible per-sequence (single row)
events_pool[events_pool.unique_ids[0]].static_data()
2. Interval Sequences#
An interval spans a period of time with a start and an end.
Unlike states, intervals are not required to be contiguous:
two intervals can overlap and gaps between them are allowed.
Think of overlapping treatments, project assignments, or sensor readings.
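The sketch below shows one way to detect the overlaps and gaps that interval sequences tolerate. `find_overlaps_and_gaps` is a hypothetical helper using only the standard library, with made-up dates; it is not a TanaT function.

```python
from datetime import datetime


def find_overlaps_and_gaps(intervals):
    """Scan (start, end) pairs in start order and report overlaps and gaps."""
    ordered = sorted(intervals)
    overlaps, gaps = [], []
    if not ordered:
        return overlaps, gaps
    covered_until = ordered[0][1]  # latest end seen so far
    for start, end in ordered[1:]:
        if start < covered_until:
            # The new interval begins before the covered span has ended.
            overlaps.append((start, min(end, covered_until)))
        elif start > covered_until:
            # Nothing covers the time between covered_until and start.
            gaps.append((covered_until, start))
        covered_until = max(covered_until, end)
    return overlaps, gaps


intervals = [
    (datetime(2024, 1, 1), datetime(2024, 1, 10)),
    (datetime(2024, 1, 5), datetime(2024, 1, 8)),  # overlaps the first
    (datetime(2024, 2, 1), datetime(2024, 2, 3)),  # starts after a gap
]
overlaps, gaps = find_overlaps_and_gaps(intervals)
print("overlaps:", overlaps)
print("gaps    :", gaps)
```

Tracking the latest end seen so far, rather than only comparing neighbours, keeps the gap detection correct when one interval is nested inside another.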
Simulate data#
simulate_intervals() produces a
DataFrame with id, start, end, and feature columns.
intervals_data = simulate_intervals(
    n_ids=10,
    features=["value", "category"],
    seed=42,
)
intervals_data.head()
Build the pool#
build_intervals() needs:
- id_column: sequence identifier
- start_column: interval start
- end_column: interval end
intervals_pool = build_intervals(
    temporal_data=intervals_data,
    id_column="id",
    start_column="start",
    end_column="end",
)
┌─ Interval SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)
print(intervals_pool)
┌────────────────────────────────────────────────┐
│ IntervalSequencePool Summary │
└────────────────────────────────────────────────┘
Overview
─────────────────────────
Sequences 10
Store /home/runner/.tanat/_quick_interval_40b641d6
id_column id
Time Index
─────────────────────────
Type Datetime(time_unit='us', time_zone=None) [2000-07-27 05:52:43.701012 → 2024-11-19 06:48:23.343189]
Columns ['start', 'end']
t0 position=0, anchor=start
Entity Features (2)
─────────────────────────
• category String [len 1 → 1]
• value Numerical [5 → 98]
Navigate to a sequence then to an entity#
interval_seq = intervals_pool[intervals_pool.unique_ids[0]]
print(f"→ {len(interval_seq)} intervals for ID {interval_seq.id_value!r}")
→ 3 intervals for ID 1
interval_entity = interval_seq[0]
print(interval_entity)
┌────────────────────────────────────────────────┐
│ IntervalEntity Summary │
└────────────────────────────────────────────────┘
Overview
─────────────────────────
Sequence ID 1
Rank 0
Entity Features
─────────────────────────
value 53
category C
The temporal extent is now a (start, end) pair
print("features :", interval_entity.data())
print("temporal span :", interval_entity.temporal_extent) # (start, end)
print("feature value :", interval_entity["value"])
features : {'value': 53, 'category': 'C'}
temporal span : [datetime.datetime(2013, 11, 3, 6, 53, 24, 300281), datetime.datetime(2013, 11, 13, 2, 26, 21, 665986)]
feature value : 53
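Because the extent is a pair of datetimes, a duration follows by subtraction. The sketch below copies the two values from the printed output above into a plain list and assumes, as that output suggests, that the extent unpacks into `(start, end)`.

```python
from datetime import datetime

# Values copied from the printed temporal span of the first interval.
temporal_extent = [
    datetime(2013, 11, 3, 6, 53, 24, 300281),
    datetime(2013, 11, 13, 2, 26, 21, 665986),
]

start, end = temporal_extent
duration = end - start  # a datetime.timedelta
print("duration :", duration)
print("in days  :", duration.total_seconds() / 86400)
```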
3. State Sequences#
A state sequence partitions the timeline into contiguous,
non-overlapping periods: end[i] == start[i+1] within every sequence.
The individual is in exactly one state at any point in time.
Think of disease stages, employment status, or device modes.
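The contiguity rule can be checked with a few lines of standard-library Python. `is_contiguous` is a hypothetical helper for illustration, with made-up dates; it is not a TanaT function.

```python
from datetime import datetime


def is_contiguous(states):
    """Return True when every state ends exactly where the next one starts."""
    ordered = sorted(states)
    return all(
        end == next_start
        for (_, end), (next_start, _) in zip(ordered, ordered[1:])
    )


contiguous = [
    (datetime(2024, 1, 1), datetime(2024, 1, 10)),
    (datetime(2024, 1, 10), datetime(2024, 2, 1)),
]
with_gap = [
    (datetime(2024, 1, 1), datetime(2024, 1, 10)),
    (datetime(2024, 1, 12), datetime(2024, 2, 1)),  # two days are uncovered
]
print(is_contiguous(contiguous), is_contiguous(with_gap))
```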
Simulate data#
simulate_states() guarantees
strict continuity: end[i] == start[i+1] by construction.
states_data = simulate_states(
    n_ids=10,
    features=["value", "category"],
    seed=42,
)
states_data.head()
Build the pool#
build_states() accepts the same
start_column / end_column pair as build_intervals().
Note
end_column is optional for state sequences. When omitted, the
end of state i is automatically derived from the start of state i+1.
The last state per sequence will have end = null unless you supply
an explicit sentinel value to the builder.
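The derivation described in the note can be sketched in plain Python for a single sequence. `derive_ends` and its `last_end` parameter are hypothetical names chosen for this example; they do not correspond to any documented TanaT argument, and the dates are made up.

```python
from datetime import datetime


def derive_ends(starts, last_end=None):
    """Pair each start with the next start; the last state gets `last_end`."""
    ordered = sorted(starts)
    ends = ordered[1:] + [last_end]
    return list(zip(ordered, ends))


starts = [datetime(2024, 1, 1), datetime(2024, 1, 10), datetime(2024, 2, 1)]

# Without a sentinel the final state stays open-ended (end is None).
states = derive_ends(starts)

# With a sentinel the final state is closed at the supplied value.
closed = derive_ends(starts, last_end=datetime(2025, 1, 1))

for start, end in states:
    print(start, "→", end)
```

By construction every derived end equals the following start, so the result satisfies the contiguity constraint of state sequences.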
states_pool = build_states(
    temporal_data=states_data,
    id_column="id",
    start_column="start",
    end_column="end",  # optional: omit to let TanaT infer it
)
┌─ State SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)
print(states_pool)
┌────────────────────────────────────────────────┐
│ StateSequencePool Summary │
└────────────────────────────────────────────────┘
Overview
─────────────────────────
Sequences 10
Store /home/runner/.tanat/_quick_state_cf0d3870
id_column id
Time Index
─────────────────────────
Type Datetime(time_unit='us', time_zone=None) [2000-07-26 09:45:34.720567 → 2025-01-01 00:00:00]
Columns ['start', 'end']
t0 position=0, anchor=start
Entity Features (2)
─────────────────────────
• category String [len 1 → 1]
• value Numerical [5 → 98]
Navigate to a sequence then to an entity#
state_seq = states_pool[states_pool.unique_ids[0]]
print(f"→ {len(state_seq)} states for ID {state_seq.id_value!r}")
→ 3 states for ID 1
state_entity = state_seq[0]
print(state_entity)
┌────────────────────────────────────────────────┐
│ StateEntity Summary │
└────────────────────────────────────────────────┘
Overview
─────────────────────────
Sequence ID 1
Rank 0
Entity Features
─────────────────────────
value 53
category C
Like intervals, the temporal extent is a (start, end) pair
print("features :", state_entity.data())
print("temporal span :", state_entity.temporal_extent) # (start, end)
print("feature value :", state_entity["value"])
features : {'value': 53, 'category': 'C'}
temporal span : [datetime.datetime(2015, 11, 14, 5, 56, 41, 128402), datetime.datetime(2015, 12, 11, 9, 22, 51, 537076)]
feature value : 53
4. Side-by-side comparison#
To summarise the differences, we build all three pools from the
same underlying dataset (states data, which contains both start
and end columns) and compare the temporal_extent of the first
entity of the first sequence.
Re-use states_data for all three types so the raw data is identical
common_events = build_events(
    temporal_data=states_data,
    id_column="id",
    time_column="start",  # use start as the single event timestamp
)
common_intervals = build_intervals(
    temporal_data=states_data,
    id_column="id",
    start_column="start",
    end_column="end",
)
common_states = build_states(
    temporal_data=states_data,
    id_column="id",
    start_column="start",
    end_column="end",
)
first_id = common_events.unique_ids[0]
for label, pool in [
    ("Event ", common_events),
    ("Interval", common_intervals),
    ("State ", common_states),
]:
    entity = pool[first_id][0]
    print(f"{label} → temporal_extent: {entity.temporal_extent}")
┌─ Event SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)
┌─ Interval SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)
┌─ State SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)
Event → temporal_extent: 2015-11-14 05:56:41.128402
Interval → temporal_extent: [datetime.datetime(2015, 11, 14, 5, 56, 41, 128402), datetime.datetime(2015, 12, 11, 9, 22, 51, 537076)]
State → temporal_extent: [datetime.datetime(2015, 11, 14, 5, 56, 41, 128402), datetime.datetime(2015, 12, 11, 9, 22, 51, 537076)]
Note
- Event: a single timestamp (no duration)
- Interval: a (start, end) pair; gaps and overlaps are allowed
- State: a (start, end) pair; end[i] == start[i+1] is guaranteed by construction