Trajectories#
This example shows how to build a TrajectoryPool
by composing several sequence pools, then how to navigate from the pool down
to an individual trajectory, its sub-sequences, and their entities.
A trajectory groups all sequences belonging to the same individual across multiple temporal dimensions (e.g. visits, treatments, lab results). A trajectory pool aggregates trajectories across an entire cohort.
Each sequence pool is registered under an alias that acts as the key for retrieval:
tpool.sequence_pools["events"] → EventSequencePool (full pool)
tpool[id] → Trajectory (one individual)
tpool[id]["events"] → EventSequence (one sequence)
tpool[id]["events"][0] → EventEntity (one entity)
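The access chain above can be mimicked with plain nested containers. The following is a toy stand-in written with the standard library only, not tanat's implementation; the names toy_pools and toy_trajectory are hypothetical and exist purely to show how an alias and an ID together select one sequence.

```python
# Toy stand-in for the pool -> trajectory -> sequence -> entity chain.
# None of these names come from tanat; they only illustrate the lookups.

# Each "sequence pool" maps an individual's ID to that individual's entities.
toy_pools = {
    "events": {1: ["admission", "discharge"], 2: ["admission"]},
    "states": {1: ["healthy", "sick"], 2: ["healthy"]},
}


def toy_trajectory(pools, individual_id):
    """Collect, for one individual, the sequence held by every aliased pool."""
    return {alias: pool[individual_id] for alias, pool in pools.items()}


full_pool = toy_pools["events"]             # alias -> full pool
trajectory = toy_trajectory(toy_pools, 1)   # id -> one individual
sequence = trajectory["events"]             # alias -> one sequence
entity = sequence[0]                        # position -> one entity

print(sorted(trajectory))  # aliases available on the toy trajectory
print(entity)
```

The alias plays the same role at both levels: it selects a full pool on the pool side and a single sequence on the trajectory side.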
Imports#
from tanat import build_events, build_intervals, build_states, build_trajectories
Simulate data#
simulate_trajectories() is a
convenience wrapper that calls each simulate_* function in one shot
and guarantees a shared ID space across all sequence types.
from tanat.dataset import simulate_trajectories, simulate_static
data = simulate_trajectories(
    sequences={
        "events": {"type": "event", "n_ids": 50, "features": ["value", "category"]},
        "intervals": {
            "type": "interval",
            "n_ids": 50,
            "features": ["duration_days", "label"],
        },
        "states": {"type": "state", "n_ids": 50, "features": ["score", "status"]},
    },
    shared_ids=True,
    seed=42,
)
# Each value is a plain DataFrame.
print("events :", data["events"].shape)
print("intervals:", data["intervals"].shape)
print("states :", data["states"].shape)
events : (328, 4)
intervals: (327, 5)
states : (314, 5)
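Since shared_ids=True is meant to give every sequence type the same individuals, a quick sanity check is to compare the ID sets of the tables. The sketch below uses hand-written rows instead of the simulated DataFrames so that it runs without tanat; the helper ids_by_table and the sample rows are hypothetical.

```python
# Hypothetical check that several long-format tables cover the same IDs.
# Rows are plain dicts here; with real DataFrames the "id" column would be
# read from the frame instead.

tables = {
    "events": [
        {"id": 1, "time": "2020-01-01"},
        {"id": 2, "time": "2020-02-01"},
        {"id": 1, "time": "2020-03-01"},
    ],
    "states": [
        {"id": 1, "start": "2020-01-01", "end": "2020-06-01"},
        {"id": 2, "start": "2020-01-01", "end": "2020-06-01"},
    ],
}


def ids_by_table(tables, id_column="id"):
    """Return the set of distinct IDs found in each table."""
    return {name: {row[id_column] for row in rows} for name, rows in tables.items()}


id_sets = ids_by_table(tables)
reference = next(iter(id_sets.values()))
shared = all(ids == reference for ids in id_sets.values())
print("shared ID space:", shared)
```

Row counts can still differ between tables, as in the shapes printed above, because each individual contributes a different number of entities to each sequence type.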
Build the sequence pools#
Each pool is built independently with its own build_* shortcut
(build_events(),
build_intervals(),
build_states()).
event_pool = build_events(
    temporal_data=data["events"],
    id_column="id",
    time_column="time",
)
interval_pool = build_intervals(
    temporal_data=data["intervals"],
    id_column="id",
    start_column="start",
    end_column="end",
)
state_pool = build_states(
    temporal_data=data["states"],
    id_column="id",
    start_column="start",
    end_column="end",
)
┌─ Event SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (50 sequences · 328 entities · 0.00s)
┌─ Interval SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (50 sequences · 327 entities · 0.00s)
┌─ State SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (50 sequences · 314 entities · 0.00s)
Build the trajectory pool#
build_trajectories() composes the pools
under their aliases. The alias becomes the key used to retrieve a
sub-sequence from a trajectory (traj["events"]).
tpool = build_trajectories(
    pools={
        "events": event_pool,
        "intervals": interval_pool,
        "states": state_pool,
    },
)
┌─ TrajectoryStore
│
│ Step 1/2: Linking pools: events, intervals, states
│
│ Step 2/2: Building trajectory index & metadata
│
└─ Done (50 trajectories · 3 pool(s) · 0.01s)
print(tpool)
┌────────────────────────────────────────────────┐
│ TrajectoryPool Summary │
└────────────────────────────────────────────────┘
Overview
─────────────────────────
Trajectories 50
Store /home/runner/.tanat/_quick_trajectory_407ed794
id_column id
Time Index
─────────────────────────
Type Datetime(time_unit='us', time_zone=None) [2000-01-05 11:22:13.038801 → 2025-01-02 20:49:35.424120]
t0 position=0, anchor=start
Sequences (3)
─────────────────────────
• events EventSequencePool(n=50, entity_features=2, static_features=0, store='/home/runner/.tanat/_quick_event_4b94bed6')
• intervals IntervalSequencePool(n=50, entity_features=2, static_features=0, store='/home/runner/.tanat/_quick_interval_e4d0e50a')
• states StateSequencePool(n=50, entity_features=2, static_features=0, store='/home/runner/.tanat/_quick_state_d82374ed')
Explore the trajectory pool#
print(f"Trajectories : {len(tpool)}")
print(f"First IDs : {tpool.unique_ids[:5]}")
Trajectories : 50
First IDs : [1, 2, 3, 4, 5]
Access one of the sequence pools#
The underlying sequence pools are exposed through the read-only mapping tpool.sequence_pools.
To access the pool registered under the alias states:
print(tpool.sequence_pools["states"])
┌────────────────────────────────────────────────┐
│ StateSequencePool Summary │
└────────────────────────────────────────────────┘
Overview
─────────────────────────
Sequences 50
Store /home/runner/.tanat/_quick_state_d82374ed
id_column id
Time Index
─────────────────────────
Type Datetime(time_unit='us', time_zone=None) [2000-06-28 16:23:08.055289 → 2025-01-01 00:00:00]
Columns ['start', 'end']
t0 position=0, anchor=start (from trajectory)
Entity Features (2)
─────────────────────────
• score Numerical [1 → 100]
• status String [len 1 → 1]
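A read-only mapping allows lookups by key but rejects assignment. The standard-library sketch below shows that behaviour with types.MappingProxyType; whether tanat uses this exact type is not stated here, so treat it purely as an illustration of the concept.

```python
from types import MappingProxyType

# Placeholder strings stand in for real pool objects.
_pools = {"events": "event pool", "states": "state pool"}
sequence_pools = MappingProxyType(_pools)  # read-only view over the dict

print(sequence_pools["states"])  # lookup by alias works

try:
    sequence_pools["labs"] = "lab pool"  # assignment is rejected
except TypeError as exc:
    rejected = True
    print("cannot modify:", type(exc).__name__)
else:
    rejected = False
```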
Access a trajectory of the trajectory pool#
tpool[id] returns a Trajectory, a
lightweight view over all sub-sequences for that individual.
traj = tpool[tpool.unique_ids[0]]
print(traj)
/opt/hostedtoolcache/Python/3.13.13/x64/lib/python3.13/site-packages/tanat_utils/caching/cachable.py:133: UserWarning: anchor='start' has no effect on event pools (single time column). The argument will be ignored.
value = method(self, *args, **kwargs)
┌────────────────────────────────────────────────┐
│ Trajectory Summary │
└────────────────────────────────────────────────┘
Overview
─────────────────────────
Trajectory ID 1
Sequences events, intervals, states
Time Index
─────────────────────────
Type Datetime(time_unit='us', time_zone=None) [2000-01-05 11:22:13.038801 → 2025-01-02 20:49:35.424120]
t0 2002-04-25 19:24:32.505355 (events: rank 0, intervals: rank 0, states: rank 0)
Sequences
─────────────────────────
• events EventSequence(id=1, length=10, entity_features=2)
• intervals IntervalSequence(id=1, length=3, entity_features=2)
• states StateSequence(id=1, length=3, entity_features=2)
Sequences of a trajectory#
Use the alias as the key to retrieve the sequence of an individual trajectory.
event_seq = traj["events"]
interval_seq = traj["intervals"]
state_seq = traj["states"]
print(f"events : {len(event_seq)} events")
print(f"intervals : {len(interval_seq)} intervals")
print(f"states : {len(state_seq)} states")
events : 10 events
intervals : 3 intervals
states : 3 states
print(event_seq)
┌────────────────────────────────────────────────┐
│ EventSequence Summary │
└────────────────────────────────────────────────┘
Overview
─────────────────────────
Sequence ID 1
Length 10
Time Index
─────────────────────────
Range 2002-04-25 19:24:32.505355 → 2024-09-20 05:32:51.832082
t0 2002-04-25 19:24:32.505355 (rank 0)
Entity Features (2)
─────────────────────────
• category String [len 1 → 1]
• value Numerical [1 → 100]
print(interval_seq)
┌────────────────────────────────────────────────┐
│ IntervalSequence Summary │
└────────────────────────────────────────────────┘
Overview
─────────────────────────
Sequence ID 1
Length 3
Time Index
─────────────────────────
Range 2007-02-18 04:42:11.948119 → 2016-01-31 17:21:10.983866
t0 2002-04-25 19:24:32.505355 (rank 0)
Entity Features (2)
─────────────────────────
• duration_days Numerical [2 → 100]
• label String [len 1 → 1]
print(state_seq)
┌────────────────────────────────────────────────┐
│ StateSequence Summary │
└────────────────────────────────────────────────┘
Overview
─────────────────────────
Sequence ID 1
Length 3
Time Index
─────────────────────────
Range 2016-12-21 16:57:35.449531 → 2025-01-01 00:00:00
t0 2002-04-25 19:24:32.505355 (rank 0)
Entity Features (2)
─────────────────────────
• score Numerical [1 → 100]
• status String [len 1 → 1]
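In the summaries above, every sub-sequence reports the same t0 (2002-04-25 19:24:32.505355), which coincides with the earliest range start among the three sequences. One reading consistent with that output, and only an assumption here, is that the trajectory-level t0 is the earliest position-0 start across its sequences. The sketch below checks that reading against the printed values using the standard library only.

```python
from datetime import datetime

# Range starts copied from the three sequence summaries printed above.
first_starts = {
    "events": datetime(2002, 4, 25, 19, 24, 32, 505355),
    "intervals": datetime(2007, 2, 18, 4, 42, 11, 948119),
    "states": datetime(2016, 12, 21, 16, 57, 35, 449531),
}

# Assumption (not confirmed by the source): the shared t0 is the earliest
# of these rank-0 start times.
alias, t0 = min(first_starts.items(), key=lambda item: item[1])
print(alias, t0)
```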
Static features#
Per-trajectory static data (age, group, …) is passed at build time
via build_trajectories(). It is then
accessible on the pool and on individual trajectories.
Generate a static DataFrame matching the shared ID space:
static_df = simulate_static(n_ids=50, features=["age", "group"], seed=0)
static_df.head()
tpool_with_static = build_trajectories(
    pools={
        "events": event_pool,
        "intervals": interval_pool,
        "states": state_pool,
    },
    static_data=static_df,
    id_column="id",
)
┌─ TrajectoryStore
│
│ Step 1/2: Linking pools: events, intervals, states
│
│ Step 2/2: Building trajectory index & metadata
│
└─ Done (50 trajectories · 3 pool(s) · 0.01s)
Static data is accessed on trajectory pools the same way as on sequence pools.
tpool_with_static.static_data().head()
Static data is also accessible per trajectory (single row):
tpool_with_static[tpool_with_static.unique_ids[0]].static_data()
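Conceptually, static data is a table with one row per individual, so a per-trajectory accessor reduces to selecting the row whose ID matches. A minimal standard-library sketch of that idea follows, with made-up values and a hypothetical helper static_row; it does not reproduce tanat's static_data() method.

```python
# Made-up static table: one row per individual, keyed by the "id" column.
static_rows = [
    {"id": 1, "age": 34, "group": "A"},
    {"id": 2, "age": 58, "group": "B"},
]


def static_row(rows, individual_id, id_column="id"):
    """Return the single static row of one individual, or raise KeyError."""
    matches = [row for row in rows if row[id_column] == individual_id]
    if len(matches) != 1:
        raise KeyError(
            f"expected exactly one row for id={individual_id!r}, got {len(matches)}"
        )
    return matches[0]


print(static_row(static_rows, 1))
```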
Note
If a sequence pool used to build a trajectory pool contains static features, they are kept in the sequence pool but are not visible at the trajectory level.
Iteration#
All pool and trajectory objects are iterable.
tpool.sequence_pools is a mapping: iterating it yields the aliases; .items() gives (alias, sequence pool) pairs.
TrajectoryPool yields Trajectory objects; .items() gives (id, trajectory) pairs.
Trajectory yields its aliases (string keys); .items() gives (alias, sequence) pairs.
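# TrajectoryPool.sequence_pools.items() → (alias, SequencePool) pairs
# (iterating the mapping directly yields the alias strings, not the pools)
for alias, seq_pool in tpool.sequence_pools.items():
    print(f" {alias}: {type(seq_pool).__name__}")
events: EventSequencePool
intervals: IntervalSequencePool
states: StateSequencePool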
# TrajectoryPool → Trajectory
for t in tpool:
    print(f" {t.id_value}: sequences={list(t)}")
1: sequences=['events', 'intervals', 'states']
2: sequences=['events', 'intervals', 'states']
3: sequences=['events', 'intervals', 'states']
4: sequences=['events', 'intervals', 'states']
5: sequences=['events', 'intervals', 'states']
6: sequences=['events', 'intervals', 'states']
7: sequences=['events', 'intervals', 'states']
8: sequences=['events', 'intervals', 'states']
9: sequences=['events', 'intervals', 'states']
10: sequences=['events', 'intervals', 'states']
11: sequences=['events', 'intervals', 'states']
12: sequences=['events', 'intervals', 'states']
13: sequences=['events', 'intervals', 'states']
14: sequences=['events', 'intervals', 'states']
15: sequences=['events', 'intervals', 'states']
16: sequences=['events', 'intervals', 'states']
17: sequences=['events', 'intervals', 'states']
18: sequences=['events', 'intervals', 'states']
19: sequences=['events', 'intervals', 'states']
20: sequences=['events', 'intervals', 'states']
21: sequences=['events', 'intervals', 'states']
22: sequences=['events', 'intervals', 'states']
23: sequences=['events', 'intervals', 'states']
24: sequences=['events', 'intervals', 'states']
25: sequences=['events', 'intervals', 'states']
26: sequences=['events', 'intervals', 'states']
27: sequences=['events', 'intervals', 'states']
28: sequences=['events', 'intervals', 'states']
29: sequences=['events', 'intervals', 'states']
30: sequences=['events', 'intervals', 'states']
31: sequences=['events', 'intervals', 'states']
32: sequences=['events', 'intervals', 'states']
33: sequences=['events', 'intervals', 'states']
34: sequences=['events', 'intervals', 'states']
35: sequences=['events', 'intervals', 'states']
36: sequences=['events', 'intervals', 'states']
37: sequences=['events', 'intervals', 'states']
38: sequences=['events', 'intervals', 'states']
39: sequences=['events', 'intervals', 'states']
40: sequences=['events', 'intervals', 'states']
41: sequences=['events', 'intervals', 'states']
42: sequences=['events', 'intervals', 'states']
43: sequences=['events', 'intervals', 'states']
44: sequences=['events', 'intervals', 'states']
45: sequences=['events', 'intervals', 'states']
46: sequences=['events', 'intervals', 'states']
47: sequences=['events', 'intervals', 'states']
48: sequences=['events', 'intervals', 'states']
49: sequences=['events', 'intervals', 'states']
50: sequences=['events', 'intervals', 'states']
# TrajectoryPool.items() → (id, Trajectory) pairs
for tid, t in tpool.items():
    print(f" {tid}: {type(t).__name__}")
1: Trajectory
2: Trajectory
3: Trajectory
4: Trajectory
5: Trajectory
6: Trajectory
7: Trajectory
8: Trajectory
9: Trajectory
10: Trajectory
11: Trajectory
12: Trajectory
13: Trajectory
14: Trajectory
15: Trajectory
16: Trajectory
17: Trajectory
18: Trajectory
19: Trajectory
20: Trajectory
21: Trajectory
22: Trajectory
23: Trajectory
24: Trajectory
25: Trajectory
26: Trajectory
27: Trajectory
28: Trajectory
29: Trajectory
30: Trajectory
31: Trajectory
32: Trajectory
33: Trajectory
34: Trajectory
35: Trajectory
36: Trajectory
37: Trajectory
38: Trajectory
39: Trajectory
40: Trajectory
41: Trajectory
42: Trajectory
43: Trajectory
44: Trajectory
45: Trajectory
46: Trajectory
47: Trajectory
48: Trajectory
49: Trajectory
50: Trajectory
# Trajectory.items() → (alias, Sequence) pairs
traj = tpool[tpool.unique_ids[0]]
for alias, seq in traj.items():
    print(f" {alias}: {len(seq)} entities")
events: 10 entities
intervals: 3 entities
states: 3 entities
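The iteration behaviour of a Trajectory (aliases when iterated, pairs through .items()) matches the standard Python mapping protocol. The sketch below builds a toy class on top of collections.abc.Mapping to show how little is needed to get that behaviour; ToyTrajectory is hypothetical and says nothing about how tanat implements Trajectory.

```python
from collections.abc import Mapping


class ToyTrajectory(Mapping):
    """Hypothetical alias -> sequence container following the mapping protocol."""

    def __init__(self, sequences):
        self._sequences = dict(sequences)

    def __getitem__(self, alias):
        return self._sequences[alias]

    def __iter__(self):
        # Iterating yields the aliases (string keys), as a dict would.
        return iter(self._sequences)

    def __len__(self):
        return len(self._sequences)


toy = ToyTrajectory({"events": [1, 2, 3], "states": ["a"]})
print(list(toy))                 # aliases
for alias, seq in toy.items():   # items() is provided by the Mapping mixin
    print(alias, len(seq))
```

Only __getitem__, __iter__ and __len__ are written by hand; keys(), values(), items() and the in operator come from the Mapping base class.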