The Three Types of Sequences#

TanaT supports three types of temporal sequences depending on how each entity’s temporal extent is defined:

Type	Temporal extent	Key constraint	Builder
Event	Single timestamp	None	`build_events()`
Interval	`[start, end]`	Overlaps and gaps are allowed	`build_intervals()`
State	`[start, end]`	Contiguous, no overlap, no gap	`build_states()`

This example walks through each type step by step: data simulation, pool construction, pool-level exploration, navigation down to individual sequences and entities, and a comparison of the temporal_extent at the entity level.

For a broader conceptual introduction see Core Concepts.

Imports#

Each type has its own shortcut builder. All three live in the same tanat namespace.

from tanat import build_events, build_intervals, build_states
from tanat.dataset import (
    simulate_events,
    simulate_intervals,
    simulate_states,
    simulate_static,
)

1. Event Sequences#

An event is a point-in-time observation: it has one timestamp and no duration. Think of medical visits, user clicks, or purchase records.

Simulate data#

simulate_events() returns a DataFrame with columns id, time, and one column per feature.

events_data = simulate_events(
    n_ids=10,
    features=["value", "category"],
    seed=42,
)
events_data.head()

	id	time	value	category
0	1	2013-11-03 06:53:24.300281	53	C
1	1	2013-12-24 16:18:54.022502	98	D
2	1	2015-11-14 05:56:41.128402	74	A
3	2	2000-10-08 10:16:59.399710	77	C
4	2	2001-06-16 10:05:43.045833	72	A

Build the pool#

build_events() needs at minimum:

id_column: the column that identifies each sequence
time_column: the column containing the event timestamp

All remaining columns are automatically inferred as entity features.
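The inference rule above amounts to a set difference over column names. The sketch below illustrates that rule in plain Python; `infer_feature_columns` is a hypothetical helper written for this example, not a TanaT function, and it makes no claim about how TanaT implements the inference internally.

```python
def infer_feature_columns(columns, id_column, time_columns):
    """Return every column that is neither the ID nor a time column.

    Hypothetical helper: it mirrors the rule described above, not
    TanaT's actual implementation.
    """
    reserved = {id_column, *time_columns}
    return [name for name in columns if name not in reserved]


# Column names taken from the simulated events table above.
event_columns = ["id", "time", "value", "category"]
event_features = infer_feature_columns(event_columns, "id", ["time"])
print(event_features)  # ['value', 'category']
```

The same rule would apply to interval and state data, with `["start", "end"]` passed as the time columns.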

events_pool = build_events(
    temporal_data=events_data,
    id_column="id",
    time_column="time",
)

┌─ Event SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)

print(events_pool)

┌────────────────────────────────────────────────┐
│           EventSequencePool Summary            │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequences          10
  Store              /home/runner/.tanat/_quick_event_621ace8c
  id_column          id

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-07-26 09:45:34.720567 → 2024-03-25 12:26:53.773965]
  Columns            ['time']
  t0                 position=0, anchor=None

Entity Features (2)
─────────────────────────
  • category            String [len 1 → 1]
  • value               Numerical [5 → 98]

Explore the pool#

print(f"Number of sequences : {len(events_pool)}")
print(f"First IDs           : {events_pool.unique_ids[:5]}")

Number of sequences : 10
First IDs           : [1, 2, 3, 4, 5]

Temporal data of the pool in tabular form (one row = one entity)

events_pool.temporal_data().head()

	id	time	category	value
0	1	2013-11-03 06:53:24.300281	C	53
1	1	2013-12-24 16:18:54.022502	D	98
2	1	2015-11-14 05:56:41.128402	A	74
3	2	2000-10-08 10:16:59.399710	C	77
4	2	2001-06-16 10:05:43.045833	A	72

Navigate to a sequence then to an entity#

Index the pool by ID to get an EventSequence

event_seq = events_pool[events_pool.unique_ids[0]]
print(event_seq)
print(f"→ {len(event_seq)} events for ID {event_seq.id_value!r}")

┌────────────────────────────────────────────────┐
│             EventSequence Summary              │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequence ID        1
  Length             3

Time Index
─────────────────────────
  Range              2013-11-03 06:53:24.300281 → 2015-11-14 05:56:41.128402
  t0                 2013-11-03 06:53:24.300281 (rank 0)

Entity Features (2)
─────────────────────────
  • category            String [len 1 → 1]
  • value               Numerical [5 → 98]
→ 3 events for ID 1

Index the sequence by integer to get an EventEntity

event_entity = event_seq[0]  # first event
print(event_entity)

┌────────────────────────────────────────────────┐
│              EventEntity Summary               │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequence ID        1
  Rank               0

Entity Features
─────────────────────────
  value              53
  category           C

At the entity level the temporal extent is a single timestamp

print("features      :", event_entity.data())
print("temporal span :", event_entity.temporal_extent)  # single date/time value
print("feature value :", event_entity["value"])

features      : {'value': 53, 'category': 'C'}
temporal span : 2013-11-03 06:53:24.300281
feature value : 53

Iterate#

Pool → one sequence per ID

for seq in events_pool.subset(events_pool.unique_ids[:3]):
    print(f"  ID {seq.id_value!r}: {len(seq)} events")

ID 1: 3 events
ID 2: 9 events
ID 3: 8 events

Sequence → one entity per row

for entity in event_seq:
    print(f"  t={entity.temporal_extent}  value={entity['value']}")

t=2013-11-03 06:53:24.300281  value=53
t=2013-12-24 16:18:54.022502  value=98
t=2015-11-14 05:56:41.128402  value=74

Static features#

Per-sequence static data (age, group, …) can be attached at build time via static_data, or added later with add_static_features(). Static features are shared by all entities of a given sequence.

Generate one row of static attributes per sequence ID

static_df = simulate_static(n_ids=10, features=["age", "group"], seed=0)
static_df.head()

	id	age	group
0	1	86	D
1	2	64	E
2	3	52	C
3	4	27	D
4	5	31	E

Option 1: attach at build time

events_pool_with_static = build_events(
    temporal_data=events_data,
    id_column="id",
    time_column="time",
    static_data=static_df,
)
events_pool_with_static.static_data().head()

┌─ Event SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity, time index & static features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)

	id	age	group
0	1	86	D
1	2	64	E
2	3	52	C
3	4	27	D
4	5	31	E

Option 2: add to an existing pool in place

events_pool.add_static_features(static_df)
events_pool.static_data().head()

	id	age	group
0	1	86	D
1	2	64	E
2	3	52	C
3	4	27	D
4	5	31	E

Static data is also accessible per-sequence (single row)

events_pool[events_pool.unique_ids[0]].static_data()

	id	age	group
0	1	86	D
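Conceptually, "shared by all entities of a given sequence" behaves like a lookup keyed on the sequence ID. The sketch below shows that idea with plain dictionaries instead of TanaT objects. The values are copied from the tables above, and the variable names are chosen for illustration only.

```python
# Static rows for sequence IDs 1 and 2, copied from the static table above.
static_by_id = {
    1: {"age": 86, "group": "D"},
    2: {"age": 64, "group": "E"},
}

# A few entity rows (ID and one feature) from the events table above.
entities = [
    {"id": 1, "value": 53},
    {"id": 1, "value": 98},
    {"id": 2, "value": 77},
]

# Each entity row is combined with the static row of its own sequence,
# so both entities of ID 1 end up with the same age and group.
enriched = [{**row, **static_by_id[row["id"]]} for row in entities]

for row in enriched:
    print(row)
```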

2. Interval Sequences#

An interval spans a period of time with a start and an end. Unlike states, intervals are not required to be contiguous: two intervals can overlap and gaps between them are allowed. Think of overlapping treatments, project assignments, or sensor readings.
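To make "overlaps and gaps" concrete, the sketch below labels each pair of consecutive intervals in one sequence. It uses only the standard library; `describe_neighbours` and the treatment dates are hypothetical and chosen for illustration, not taken from TanaT or from the simulated data.

```python
from datetime import datetime


def describe_neighbours(intervals):
    """Label each consecutive pair as 'overlap', 'gap', or 'contiguous'.

    `intervals` is a list of (start, end) tuples sorted by start.
    Only neighbouring intervals are compared, so an overlap with an
    earlier, longer interval is not reported.
    Hypothetical helper, not part of TanaT.
    """
    labels = []
    for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
        if next_start < prev_end:
            labels.append("overlap")
        elif next_start > prev_end:
            labels.append("gap")
        else:
            labels.append("contiguous")
    return labels


# Illustrative treatment periods for a single patient.
treatments = [
    (datetime(2024, 1, 1), datetime(2024, 1, 10)),
    (datetime(2024, 1, 5), datetime(2024, 1, 20)),  # starts before the first ends
    (datetime(2024, 2, 1), datetime(2024, 2, 15)),  # starts after a pause
]
neighbour_labels = describe_neighbours(treatments)
print(neighbour_labels)  # ['overlap', 'gap']
```

An interval sequence may contain any mix of these labels, whereas a state sequence would contain only `'contiguous'`.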

Simulate data#

simulate_intervals() produces a DataFrame with id, start, end, and feature columns.

intervals_data = simulate_intervals(
    n_ids=10,
    features=["value", "category"],
    seed=42,
)
intervals_data.head()

	id	start	end	value	category
0	1	2013-11-03 06:53:24.300281	2013-11-13 02:26:21.665986	53	C
1	1	2013-12-24 16:18:54.022502	2013-12-26 13:45:51.188795	98	D
2	1	2015-11-14 05:56:41.128402	2015-11-27 21:54:00.218863	74	A
3	2	2001-06-16 10:05:43.045833	2001-07-10 03:41:18.398570	77	C
4	2	2005-05-13 14:05:36.861038	2005-06-02 20:27:20.867682	72	A

Build the pool#

build_intervals() needs:

id_column: sequence identifier
start_column: interval start
end_column: interval end

intervals_pool = build_intervals(
    temporal_data=intervals_data,
    id_column="id",
    start_column="start",
    end_column="end",
)

┌─ Interval SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)

print(intervals_pool)

┌────────────────────────────────────────────────┐
│          IntervalSequencePool Summary          │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequences          10
  Store              /home/runner/.tanat/_quick_interval_4bcaa142
  id_column          id

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-07-27 05:52:43.701012 → 2024-11-19 06:48:23.343189]
  Columns            ['start', 'end']
  t0                 position=0, anchor=start

Entity Features (2)
─────────────────────────
  • category            String [len 1 → 1]
  • value               Numerical [5 → 98]

Navigate to a sequence then to an entity#

interval_seq = intervals_pool[intervals_pool.unique_ids[0]]
print(f"→ {len(interval_seq)} intervals for ID {interval_seq.id_value!r}")

→ 3 intervals for ID 1

interval_entity = interval_seq[0]
print(interval_entity)

┌────────────────────────────────────────────────┐
│             IntervalEntity Summary             │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequence ID        1
  Rank               0

Entity Features
─────────────────────────
  value              53
  category           C

The temporal extent is now a (start, end) pair, printed as a two-element list

print("features      :", interval_entity.data())
print("temporal span :", interval_entity.temporal_extent)  # (start, end)
print("feature value :", interval_entity["value"])

features      : {'value': 53, 'category': 'C'}
temporal span : [datetime.datetime(2013, 11, 3, 6, 53, 24, 300281), datetime.datetime(2013, 11, 13, 2, 26, 21, 665986)]
feature value : 53

3. State Sequences#

A state sequence partitions the timeline into contiguous, non-overlapping periods: end[i] == start[i+1] within every sequence. The individual is in exactly one state at any point in time. Think of disease stages, employment status, or device modes.
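Because the periods are contiguous and never overlap, the state at a given instant can be found with a binary search over the start times. The sketch below is a standard-library illustration under stated assumptions: the stage labels and dates are invented, `state_at` is a hypothetical helper, and the half-open `[start, end)` boundary convention is a choice made for this sketch rather than documented TanaT behaviour.

```python
from bisect import bisect_right
from datetime import datetime

# Illustrative states for one sequence: (start, end, label), sorted by start.
states = [
    (datetime(2024, 1, 1), datetime(2024, 3, 1), "stage_1"),
    (datetime(2024, 3, 1), datetime(2024, 6, 1), "stage_2"),
    (datetime(2024, 6, 1), datetime(2025, 1, 1), "stage_3"),
]

# Contiguity check: every end equals the next start.
is_contiguous = all(prev[1] == nxt[0] for prev, nxt in zip(states, states[1:]))


def state_at(states, when):
    """Return the label of the state covering `when`, or None if outside.

    Assumes half-open periods [start, end); that boundary convention is
    an assumption of this sketch.
    """
    starts = [start for start, _, _ in states]
    position = bisect_right(starts, when) - 1
    if position < 0:
        return None  # `when` lies before the first state
    _, end, label = states[position]
    return label if when < end else None


print(is_contiguous)                             # True
print(state_at(states, datetime(2024, 3, 1)))    # stage_2
print(state_at(states, datetime(2023, 12, 31)))  # None
```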

Simulate data#

simulate_states() guarantees strict continuity: end[i] == start[i+1] by construction.

states_data = simulate_states(
    n_ids=10,
    features=["value", "category"],
    seed=42,
)
states_data.head()

	id	start	end	value	category
0	1	2015-11-14 05:56:41.128402	2015-12-11 09:22:51.537076	53	C
1	1	2015-12-11 09:22:51.537076	2016-01-07 18:13:11.484612	98	D
2	1	2016-01-07 18:13:11.484612	2025-01-01 00:00:00.000000	74	A
3	2	2007-08-07 16:08:06.331871	2007-08-13 21:43:12.768137	77	C
4	2	2007-08-13 21:43:12.768137	2007-09-05 08:58:08.065324	72	A

Build the pool#

build_states() accepts the same start_column / end_column pair as build_intervals().

Note

end_column is optional for state sequences. When omitted, the end of state i is automatically derived from the start of state i+1. The last state per sequence will have end = null unless you supply an explicit sentinel value to the builder.
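The derivation rule in this note can be pictured as shifting the start times by one position. The sketch below reproduces that rule with the standard library only; `derive_ends` and its `sentinel` argument are hypothetical names chosen for this example and are not TanaT parameters.

```python
from datetime import datetime


def derive_ends(starts, sentinel=None):
    """Pair each start with the next start; the last end is `sentinel`.

    Hypothetical helper mirroring the rule described above; `sentinel`
    is a name chosen for this sketch, not a TanaT parameter.
    """
    ordered = sorted(starts)
    ends = ordered[1:] + [sentinel]
    return list(zip(ordered, ends))


# Illustrative start times for one sequence.
starts = [datetime(2024, 1, 1), datetime(2024, 3, 1), datetime(2024, 6, 1)]

open_ended = derive_ends(starts)
closed = derive_ends(starts, sentinel=datetime(2025, 1, 1))

print(open_ended[-1])  # (datetime.datetime(2024, 6, 1, 0, 0), None)
print(closed[-1][1])   # 2025-01-01 00:00:00
```

Without a sentinel the last state stays open-ended, which matches the `end = null` behaviour described in the note.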

states_pool = build_states(
    temporal_data=states_data,
    id_column="id",
    start_column="start",
    end_column="end",  # optional: omit to let TanaT infer it
)

┌─ State SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)

print(states_pool)

┌────────────────────────────────────────────────┐
│           StateSequencePool Summary            │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequences          10
  Store              /home/runner/.tanat/_quick_state_c1eeca33
  id_column          id

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-07-26 09:45:34.720567 → 2025-01-01 00:00:00]
  Columns            ['start', 'end']
  t0                 position=0, anchor=start

Entity Features (2)
─────────────────────────
  • category            String [len 1 → 1]
  • value               Numerical [5 → 98]

Navigate to a sequence then to an entity#

state_seq = states_pool[states_pool.unique_ids[0]]
print(f"→ {len(state_seq)} states for ID {state_seq.id_value!r}")

→ 3 states for ID 1

state_entity = state_seq[0]
print(state_entity)

┌────────────────────────────────────────────────┐
│              StateEntity Summary               │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequence ID        1
  Rank               0

Entity Features
─────────────────────────
  value              53
  category           C

As with intervals, the temporal extent is a (start, end) pair, printed as a two-element list

print("features      :", state_entity.data())
print("temporal span :", state_entity.temporal_extent)  # (start, end)
print("feature value :", state_entity["value"])

features      : {'value': 53, 'category': 'C'}
temporal span : [datetime.datetime(2015, 11, 14, 5, 56, 41, 128402), datetime.datetime(2015, 12, 11, 9, 22, 51, 537076)]
feature value : 53

4. Side-by-side comparison#

To summarise the differences, we build all three pools from the same underlying dataset (states data, which contains both start and end columns) and compare the temporal_extent of the first entity of the first sequence.

Re-use states_data for all three types so the raw data is identical

common_events = build_events(
    temporal_data=states_data,
    id_column="id",
    time_column="start",  # use start as the single event timestamp
)
common_intervals = build_intervals(
    temporal_data=states_data,
    id_column="id",
    start_column="start",
    end_column="end",
)
common_states = build_states(
    temporal_data=states_data,
    id_column="id",
    start_column="start",
    end_column="end",
)

first_id = common_events.unique_ids[0]

for label, pool in [
    ("Event   ", common_events),
    ("Interval", common_intervals),
    ("State   ", common_states),
]:
    entity = pool[first_id][0]
    print(f"{label} → temporal_extent: {entity.temporal_extent}")

┌─ Event SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)
┌─ Interval SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)
┌─ State SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (10 sequences · 59 entities · 0.00s)
Event    → temporal_extent: 2015-11-14 05:56:41.128402
Interval → temporal_extent: [datetime.datetime(2015, 11, 14, 5, 56, 41, 128402), datetime.datetime(2015, 12, 11, 9, 22, 51, 537076)]
State    → temporal_extent: [datetime.datetime(2015, 11, 14, 5, 56, 41, 128402), datetime.datetime(2015, 12, 11, 9, 22, 51, 537076)]

Note

Event : a single timestamp (no duration)
Interval : a (start, end) pair; gaps and overlaps are allowed
State : a (start, end) pair; end[i] == start[i+1] is guaranteed by construction
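One practical consequence of these shapes is that code consuming `temporal_extent` has to handle both a single timestamp and a pair. The sketch below computes a duration for either shape. It assumes the extent is a `datetime` for events and a two-element `[start, end]` sequence otherwise, as the printed output in this example suggests. `extent_duration` is a hypothetical helper, not part of TanaT.

```python
from datetime import datetime, timedelta


def extent_duration(extent):
    """Return the duration covered by a temporal extent.

    Assumes the shapes printed in this example: a single datetime for
    events, a two-element [start, end] sequence for intervals and states.
    Hypothetical helper, not part of TanaT.
    """
    if isinstance(extent, datetime):
        return timedelta(0)  # a point in time has no duration
    start, end = extent
    return end - start


# Values copied from the comparison output above.
event_extent = datetime(2015, 11, 14, 5, 56, 41, 128402)
state_extent = [
    datetime(2015, 11, 14, 5, 56, 41, 128402),
    datetime(2015, 12, 11, 9, 22, 51, 537076),
]

print(extent_duration(event_extent))  # 0:00:00
print(extent_duration(state_extent))  # 27 days, 3:26:10.408674
```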
