Trajectories

We illustrate here how to build a TrajectoryPool by composing several sequence pools, and then how to navigate from the pool down to an individual trajectory and its sub-sequences.

A trajectory groups all sequences belonging to the same individual across multiple temporal dimensions (e.g. visits, treatments, lab results). A trajectory pool aggregates trajectories across an entire cohort.

Each sequence pool is registered under an alias that acts as the key for retrieval:

tpool.sequence_pools["events"]  → EventSequencePool   (full pool)
tpool[id]                       → Trajectory          (one individual)
tpool[id]["events"]             → EventSequence       (one sequence)
tpool[id]["events"][0]          → EventEntity         (one entity)
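As a rough mental model only, these four levels behave like nested mappings: first keyed by alias (pool level), then by individual ID (trajectory level), then by position (entity level). The sketch below mimics that chain with plain dicts; `pools`, `get_trajectory` and the entity strings are hypothetical stand-ins, not tanat objects or tanat API.

```python
# Hypothetical stand-in for a TrajectoryPool built from plain dicts (not tanat objects).
# Each alias maps to a "pool"; each pool maps an individual's ID to its sequence.
pools = {
    "events": {1: ["ev-a", "ev-b"], 2: ["ev-c"]},
    "intervals": {1: ["iv-a"], 2: ["iv-b", "iv-c"]},
    "states": {1: ["st-a"], 2: ["st-b"]},
}


def get_trajectory(pools, id_value):
    """Collect every sequence belonging to one individual, keyed by alias."""
    return {alias: pool[id_value] for alias, pool in pools.items()}


full_pool = pools["events"]             # ~ tpool.sequence_pools["events"]
trajectory = get_trajectory(pools, 1)   # ~ tpool[id]
sequence = trajectory["events"]         # ~ tpool[id]["events"]
entity = sequence[0]                    # ~ tpool[id]["events"][0]

print(entity)  # ev-a
```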

Imports

from tanat import build_events, build_intervals, build_states, build_trajectories

Simulate data

simulate_trajectories() is a convenience wrapper that calls each simulate_* function in one shot and guarantees a shared ID space across all sequence types.

from tanat.dataset import simulate_trajectories, simulate_static

data = simulate_trajectories(
    sequences={
        "events": {"type": "event", "n_ids": 50, "features": ["value", "category"]},
        "intervals": {
            "type": "interval",
            "n_ids": 50,
            "features": ["duration_days", "label"],
        },
        "states": {"type": "state", "n_ids": 50, "features": ["score", "status"]},
    },
    shared_ids=True,
    seed=42,
)

# Each value is a plain DataFrame.
print("events   :", data["events"].shape)
print("intervals:", data["intervals"].shape)
print("states   :", data["states"].shape)
events   : (328, 4)
intervals: (327, 5)
states   : (314, 5)
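With shared_ids=True, the three tables describe the same individuals even though their row counts differ (328, 327 and 314 rows). A minimal sketch of what a shared ID space means, using lists of row dicts in place of the DataFrames; the helper `id_space` and the toy rows are hypothetical:

```python
def id_space(rows, id_column="id"):
    """Return the set of individual IDs present in a table of row dicts."""
    return {row[id_column] for row in rows}


# Toy tables: each has a different number of rows, but the same IDs.
events = [{"id": 1, "time": 3}, {"id": 1, "time": 7}, {"id": 2, "time": 1}]
intervals = [{"id": 2, "start": 0, "end": 4}, {"id": 1, "start": 2, "end": 5}]
states = [{"id": 1, "start": 0, "end": 9}, {"id": 2, "start": 0, "end": 6}]

# The ID space is shared when every table yields exactly the same ID set.
spaces = [id_space(table) for table in (events, intervals, states)]
shared = all(space == spaces[0] for space in spaces)
print(shared)  # True
```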

Build the sequence pools

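Each pool is built independently with its own build_* shortcut (build_events(), build_intervals(), build_states()). Event pools take a single time_column, while interval and state pools take a start_column and an end_column.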

event_pool = build_events(
    temporal_data=data["events"],
    id_column="id",
    time_column="time",
)

interval_pool = build_intervals(
    temporal_data=data["intervals"],
    id_column="id",
    start_column="start",
    end_column="end",
)

state_pool = build_states(
    temporal_data=data["states"],
    id_column="id",
    start_column="start",
    end_column="end",
)
┌─ Event SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (50 sequences · 328 entities · 0.00s)
┌─ Interval SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (50 sequences · 327 entities · 0.00s)
┌─ State SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (50 sequences · 314 entities · 0.00s)
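The build logs report one sequence per ID (50 for each pool) and one entity per input row (328, 327 and 314, matching the table shapes). Conceptually, sorting the rows and grouping them by ID yields exactly those counts. The sketch below illustrates that grouping with the standard library only; it is not tanat's implementation, and `build_sequences` is a hypothetical helper.

```python
from itertools import groupby
from operator import itemgetter


def build_sequences(rows, id_column, time_column):
    """Sort rows by (id, time) and group them into one time-ordered list per ID."""
    ordered = sorted(rows, key=itemgetter(id_column, time_column))
    return {
        key: list(group)
        for key, group in groupby(ordered, key=itemgetter(id_column))
    }


rows = [
    {"id": 2, "time": 5, "value": 10},
    {"id": 1, "time": 9, "value": 20},
    {"id": 1, "time": 4, "value": 30},
]
sequences = build_sequences(rows, id_column="id", time_column="time")

n_sequences = len(sequences)                                # one per ID
n_entities = sum(len(seq) for seq in sequences.values())    # one per row
print(n_sequences, n_entities)  # 2 3
print([row["time"] for row in sequences[1]])  # [4, 9]
```

In this sketch, interval or state rows would simply be sorted on their start column instead of a single time column.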

Build the trajectory pool

build_trajectories() composes the pools under their aliases. The alias becomes the key used to retrieve a sub-sequence from a trajectory (traj["events"]).

tpool = build_trajectories(
    pools={
        "events": event_pool,
        "intervals": interval_pool,
        "states": state_pool,
    },
)
┌─ TrajectoryStore
│
│ Step 1/2: Linking pools: events, intervals, states
│
│ Step 2/2: Building trajectory index & metadata
│
└─ Done (50 trajectories · 3 pool(s) · 0.01s)
print(tpool)
┌────────────────────────────────────────────────┐
│             TrajectoryPool Summary             │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Trajectories       50
  Store              /home/runner/.tanat/_quick_trajectory_407ed794
  id_column          id

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-01-05 11:22:13.038801 → 2025-01-02 20:49:35.424120]
  t0                 position=0, anchor=start

Sequences (3)
─────────────────────────
  • events              EventSequencePool(n=50, entity_features=2, static_features=0, store='/home/runner/.tanat/_quick_event_4b94bed6')
  • intervals           IntervalSequencePool(n=50, entity_features=2, static_features=0, store='/home/runner/.tanat/_quick_interval_e4d0e50a')
  • states              StateSequencePool(n=50, entity_features=2, static_features=0, store='/home/runner/.tanat/_quick_state_d82374ed')

Explore the trajectory pool

print(f"Trajectories : {len(tpool)}")
print(f"First IDs    : {tpool.unique_ids[:5]}")
Trajectories : 50
First IDs    : [1, 2, 3, 4, 5]

Access a sequence pool

The underlying sequence pools are available through tpool.sequence_pools, a read-only mapping keyed by alias.

To access the pool registered under the alias states:

print(tpool.sequence_pools["states"])
┌────────────────────────────────────────────────┐
│           StateSequencePool Summary            │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequences          50
  Store              /home/runner/.tanat/_quick_state_d82374ed
  id_column          id

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-06-28 16:23:08.055289 → 2025-01-01 00:00:00]
  Columns            ['start', 'end']
  t0                 position=0, anchor=start (from trajectory)

Entity Features (2)
─────────────────────────
  • score               Numerical [1 → 100]
  • status              String [len 1 → 1]
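Being read-only means pools can be looked up by alias but not added, replaced or removed through sequence_pools. Python's standard types.MappingProxyType behaves the same way and is used here purely as an analogy; this page does not say how tanat implements the mapping, and the string values are placeholders for real pool objects.

```python
from types import MappingProxyType

# Placeholder registry: the strings stand in for real sequence pools.
_registry = {"events": "EventSequencePool", "states": "StateSequencePool"}
sequence_pools = MappingProxyType(_registry)  # read-only view of the registry

print(sequence_pools["states"])  # StateSequencePool

# Any attempt to write through the proxy raises TypeError.
write_rejected = False
try:
    sequence_pools["labs"] = "another pool"
except TypeError as err:
    write_rejected = True
    print("read-only:", err)
```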

Access a trajectory

tpool[id] returns a Trajectory, a lightweight view over all sub-sequences for that individual.

traj = tpool[tpool.unique_ids[0]]
print(traj)
/opt/hostedtoolcache/Python/3.13.13/x64/lib/python3.13/site-packages/tanat_utils/caching/cachable.py:133: UserWarning: anchor='start' has no effect on event pools (single time column). The argument will be ignored.
  value = method(self, *args, **kwargs)
┌────────────────────────────────────────────────┐
│               Trajectory Summary               │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Trajectory ID      1
  Sequences          events, intervals, states

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-01-05 11:22:13.038801 → 2025-01-02 20:49:35.424120]
  t0                 2002-04-25 19:24:32.505355 (events: rank 0, intervals: rank 0, states: rank 0)

Sequences
─────────────────────────
  • events              EventSequence(id=1, length=10, entity_features=2)
  • intervals           IntervalSequence(id=1, length=3, entity_features=2)
  • states              StateSequence(id=1, length=3, entity_features=2)
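A view stores only a reference to the underlying pools plus the individual's ID, and resolves each sub-sequence on access instead of copying data. A minimal sketch of that idea, assuming dict-based pools; `TrajectoryView` is a hypothetical class, not tanat's Trajectory:

```python
from collections.abc import Mapping


class TrajectoryView(Mapping):
    """Read-only, lazily resolved view of one individual's sequences."""

    def __init__(self, pools, id_value):
        self._pools = pools       # shared reference, nothing is copied
        self.id_value = id_value

    def __getitem__(self, alias):
        # Resolved at access time, so the view always reflects the pools.
        return self._pools[alias][self.id_value]

    def __iter__(self):
        return iter(self._pools)  # iterating yields the aliases

    def __len__(self):
        return len(self._pools)


pools = {"events": {1: ["a", "b"]}, "states": {1: ["s"]}}
traj = TrajectoryView(pools, 1)
print(list(traj))            # ['events', 'states']
print(len(traj["events"]))   # 2
```

Subclassing collections.abc.Mapping supplies keys(), items() and membership tests from the three methods above, which is consistent with the iteration output at the end of this page, where list(t) returns the aliases.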

Sequences of a trajectory

Use an alias as the key to retrieve the corresponding sequence from a trajectory.

event_seq = traj["events"]
interval_seq = traj["intervals"]
state_seq = traj["states"]

print(f"events    : {len(event_seq)} events")
print(f"intervals : {len(interval_seq)} intervals")
print(f"states    : {len(state_seq)} states")
events    : 10 events
intervals : 3 intervals
states    : 3 states
print(event_seq)
┌────────────────────────────────────────────────┐
│             EventSequence Summary              │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequence ID        1
  Length             10

Time Index
─────────────────────────
  Range              2002-04-25 19:24:32.505355 → 2024-09-20 05:32:51.832082
  t0                 2002-04-25 19:24:32.505355 (rank 0)

Entity Features (2)
─────────────────────────
  • category            String [len 1 → 1]
  • value               Numerical [1 → 100]
print(interval_seq)
┌────────────────────────────────────────────────┐
│            IntervalSequence Summary            │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequence ID        1
  Length             3

Time Index
─────────────────────────
  Range              2007-02-18 04:42:11.948119 → 2016-01-31 17:21:10.983866
  t0                 2002-04-25 19:24:32.505355 (rank 0)

Entity Features (2)
─────────────────────────
  • duration_days       Numerical [2 → 100]
  • label               String [len 1 → 1]
print(state_seq)
┌────────────────────────────────────────────────┐
│             StateSequence Summary              │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequence ID        1
  Length             3

Time Index
─────────────────────────
  Range              2016-12-21 16:57:35.449531 → 2025-01-01 00:00:00
  t0                 2002-04-25 19:24:32.505355 (rank 0)

Entity Features (2)
─────────────────────────
  • score               Numerical [1 → 100]
  • status              String [len 1 → 1]

Static features

Per-trajectory static data (age, group, …) is passed at build time via build_trajectories(). It is then accessible on the pool and on individual trajectories.
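Conceptually, the pool-level static table holds one row per individual, and the per-trajectory view is that table filtered down to a single ID. A standard-library sketch of that relationship; the helper `static_row` and the toy values are made up for illustration and are not tanat API:

```python
# Toy static table: one row per individual (values are illustrative only).
static_rows = [
    {"id": 1, "age": 40, "group": "A"},
    {"id": 2, "age": 55, "group": "B"},
    {"id": 3, "age": 33, "group": "A"},
]


def static_row(rows, id_value, id_column="id"):
    """Return the static rows for one individual, checking there is exactly one."""
    matches = [row for row in rows if row[id_column] == id_value]
    if len(matches) != 1:
        raise ValueError(f"expected one static row for id={id_value}, got {len(matches)}")
    return matches


print(static_row(static_rows, 2))  # [{'id': 2, 'age': 55, 'group': 'B'}]
```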

Generate a static DataFrame covering the same shared ID space:

static_df = simulate_static(n_ids=50, features=["age", "group"], seed=0)
static_df.head()
   id  age group
0   1   86     D
1   2   64     B
2   3   52     C
3   4   27     E
4   5   31     E


tpool_with_static = build_trajectories(
    pools={
        "events": event_pool,
        "intervals": interval_pool,
        "states": state_pool,
    },
    static_data=static_df,
    id_column="id",
)
┌─ TrajectoryStore
│
│ Step 1/2: Linking pools: events, intervals, states
│
│ Step 2/2: Building trajectory index & metadata
│
└─ Done (50 trajectories · 3 pool(s) · 0.01s)

Static data is accessed on trajectory pools in the same way as on sequence pools:

tpool_with_static.static_data().head()
   id  age group
0   1   86     D
1   2   64     B
2   3   52     C
3   4   27     E
4   5   31     E


Static data is also accessible per trajectory (a single row):

tpool_with_static[tpool_with_static.unique_ids[0]].static_data()
   id  age group
0   1   86     D


Note

If a sequence pool used to build a trajectory pool has its own static features, they are kept in that sequence pool but are not visible at the trajectory level.

Iteration

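All pool and trajectory objects are iterable, but they yield different things: iterating a TrajectoryPool yields Trajectory objects, whereas iterating the sequence_pools mapping or a single Trajectory yields aliases.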

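# TrajectoryPool.sequence_pools → aliases: iterating a mapping yields its keys,
# so len() here measures each alias string, not the pool behind it
for alias in tpool.sequence_pools:
    print(f"  {len(alias)}")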
6
9
6
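Because iterating a mapping yields its keys, the 6, 9 and 6 printed here are the lengths of the strings "events", "intervals" and "states". To visit the pools themselves, iterate over (alias, pool) pairs instead. The sketch reproduces both behaviours with a plain dict standing in for tpool.sequence_pools (the string values are placeholders for real pools); it assumes the real mapping supports the standard items() method, which this page does not show directly.

```python
# Plain dict standing in for tpool.sequence_pools; values are placeholders.
sequence_pools = {
    "events": "EventSequencePool",
    "intervals": "IntervalSequencePool",
    "states": "StateSequencePool",
}

# Iterating the mapping yields its keys, so len() measures the alias strings.
key_lengths = [len(alias) for alias in sequence_pools]
print(key_lengths)  # [6, 9, 6]

# Iterating over items() reaches the pools themselves.
for alias, pool in sequence_pools.items():
    print(f"  {alias}: {pool}")
```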
# TrajectoryPool → Trajectory
for t in tpool:
    print(f"  {t.id_value}: sequences={list(t)}")
1: sequences=['events', 'intervals', 'states']
2: sequences=['events', 'intervals', 'states']
3: sequences=['events', 'intervals', 'states']
4: sequences=['events', 'intervals', 'states']
5: sequences=['events', 'intervals', 'states']
6: sequences=['events', 'intervals', 'states']
7: sequences=['events', 'intervals', 'states']
8: sequences=['events', 'intervals', 'states']
9: sequences=['events', 'intervals', 'states']
10: sequences=['events', 'intervals', 'states']
11: sequences=['events', 'intervals', 'states']
12: sequences=['events', 'intervals', 'states']
13: sequences=['events', 'intervals', 'states']
14: sequences=['events', 'intervals', 'states']
15: sequences=['events', 'intervals', 'states']
16: sequences=['events', 'intervals', 'states']
17: sequences=['events', 'intervals', 'states']
18: sequences=['events', 'intervals', 'states']
19: sequences=['events', 'intervals', 'states']
20: sequences=['events', 'intervals', 'states']
21: sequences=['events', 'intervals', 'states']
22: sequences=['events', 'intervals', 'states']
23: sequences=['events', 'intervals', 'states']
24: sequences=['events', 'intervals', 'states']
25: sequences=['events', 'intervals', 'states']
26: sequences=['events', 'intervals', 'states']
27: sequences=['events', 'intervals', 'states']
28: sequences=['events', 'intervals', 'states']
29: sequences=['events', 'intervals', 'states']
30: sequences=['events', 'intervals', 'states']
31: sequences=['events', 'intervals', 'states']
32: sequences=['events', 'intervals', 'states']
33: sequences=['events', 'intervals', 'states']
34: sequences=['events', 'intervals', 'states']
35: sequences=['events', 'intervals', 'states']
36: sequences=['events', 'intervals', 'states']
37: sequences=['events', 'intervals', 'states']
38: sequences=['events', 'intervals', 'states']
39: sequences=['events', 'intervals', 'states']
40: sequences=['events', 'intervals', 'states']
41: sequences=['events', 'intervals', 'states']
42: sequences=['events', 'intervals', 'states']
43: sequences=['events', 'intervals', 'states']
44: sequences=['events', 'intervals', 'states']
45: sequences=['events', 'intervals', 'states']
46: sequences=['events', 'intervals', 'states']
47: sequences=['events', 'intervals', 'states']
48: sequences=['events', 'intervals', 'states']
49: sequences=['events', 'intervals', 'states']
50: sequences=['events', 'intervals', 'states']
# TrajectoryPool.items() → (id, Trajectory) pairs
for tid, t in tpool.items():
    print(f"  {tid}: {type(t).__name__}")
1: Trajectory
2: Trajectory
3: Trajectory
4: Trajectory
5: Trajectory
6: Trajectory
7: Trajectory
8: Trajectory
9: Trajectory
10: Trajectory
11: Trajectory
12: Trajectory
13: Trajectory
14: Trajectory
15: Trajectory
16: Trajectory
17: Trajectory
18: Trajectory
19: Trajectory
20: Trajectory
21: Trajectory
22: Trajectory
23: Trajectory
24: Trajectory
25: Trajectory
26: Trajectory
27: Trajectory
28: Trajectory
29: Trajectory
30: Trajectory
31: Trajectory
32: Trajectory
33: Trajectory
34: Trajectory
35: Trajectory
36: Trajectory
37: Trajectory
38: Trajectory
39: Trajectory
40: Trajectory
41: Trajectory
42: Trajectory
43: Trajectory
44: Trajectory
45: Trajectory
46: Trajectory
47: Trajectory
48: Trajectory
49: Trajectory
50: Trajectory
# Trajectory.items() → (alias, Sequence) pairs
traj = tpool[tpool.unique_ids[0]]
for alias, seq in traj.items():
    print(f"  {alias}: {len(seq)} entities")
events: 10 entities
intervals: 3 entities
states: 3 entities
