StaticCriterion

Select sequences or trajectories using a Polars expression evaluated against the static (per-ID) data. Static features do not vary over time; typical examples are age, group membership, or a baseline score.

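Method                        Behaviour
----------------------------  ---------------------------------------------------------------
which() on a SequencePool     Returns the IDs whose static row satisfies the expression.
which() on a TrajectoryPool   Same, but evaluated per trajectory; returns trajectory IDs.
match()                       Returns True if and only if this sequence's or trajectory's
                              static row satisfies the expression.
filter_entities()             Not supported: static data has no entity rows to prune.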

See Criteria for the full reference.
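Read as a whole, the table describes one per-ID test used in two ways: which() collects every ID that passes it, while match() asks the same question about a single sequence or trajectory. The sketch below models that reading in plain Python so it runs without tanat. It is not tanat's implementation; the names model_which, model_match, Row and Query are hypothetical, as is the convention that a query returns None where Polars would produce a null. It also encodes the rule that an ID without a static row never matches.

```python
from typing import Callable, Mapping, Optional

# Plain-Python model of the behaviour table (NOT tanat's implementation).
# Assumption: a query maps one static row to True, False or None (a Polars null).
Row = Mapping[str, object]
Query = Callable[[Row], Optional[bool]]


def model_which(ids: list[int], static: Mapping[int, Row], query: Query) -> list[int]:
    """IDs that have a static row on which the query is exactly True."""
    return [i for i in ids if i in static and query(static[i]) is True]


def model_match(id_value: int, static: Mapping[int, Row], query: Query) -> bool:
    """True iff this one ID has a static row and the query is True on it."""
    return id_value in static and query(static[id_value]) is True


def age_over_50(row: Row) -> Optional[bool]:
    # A comparison against a missing age yields None, mimicking a Polars null.
    age = row.get("age")
    return None if age is None else age > 50


# Toy data: rows copied from static.head(); ID 99 is hypothetical and has
# no static row at all.
IDS = [1, 2, 3, 4, 5, 99]
STATIC = {
    1: {"age": 86, "group": "D"},
    2: {"age": 64, "group": "B"},
    3: {"age": 52, "group": "C"},
    4: {"age": 27, "group": "E"},
    5: {"age": 31, "group": "E"},
}

print(model_which(IDS, STATIC, age_over_50))  # [1, 2, 3]
print(model_match(99, STATIC, age_over_50))   # False: no static row
```

In this model, calling model_match on every ID and keeping the hits gives exactly the model_which result, which is the relationship the table implies between the two methods.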

Imports

import polars as pl

from tanat import build_events, build_intervals, build_trajectories
from tanat.criterion import StaticCriterion
from tanat.dataset import simulate_events, simulate_intervals, simulate_static

Simulate data

StaticCriterion requires the pool to have static features attached. Pass static_data to the builder (or call pool.add_static_features() later).

temporal = simulate_intervals(
    n_ids=50,
    features=["value", "status"],
    seed=42,
)
static = simulate_static(n_ids=50, features=["age", "group"], seed=0)
static.head()
   id  age group
0   1   86     D
1   2   64     B
2   3   52     C
3   4   27     E
4   5   31     E


pool = build_intervals(
    temporal_data=temporal,
    id_column="id",
    start_column="start",
    end_column="end",
    static_data=static,
)
┌─ Interval SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity, time index & static features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (50 sequences · 343 entities · 0.01s)
print(pool)
┌────────────────────────────────────────────────┐
│          IntervalSequencePool Summary          │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequences          50
  Store              /home/runner/.tanat/_quick_interval_54e0d144
  id_column          id

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-01-12 06:14:52.240595 → 2025-01-20 05:35:23.188780]
  Columns            ['start', 'end']
  t0                 position=0, anchor=start

Entity Features (2)
─────────────────────────
  • status              String [len 1 → 1]
  • value               Numerical [1 → 100]

Static Features (2)
─────────────────────────
  • age                 Numerical [1 → 98]
  • group               String [len 1 → 1]

which(): sequence-level selection

The expression is evaluated once per ID against the static table. IDs that lack a static row (e.g. IDs not present in static_data) do not appear in the result.

# Numeric threshold.
ids_old = pool.which(StaticCriterion(query=pl.col("age") > 50))
[which]           StaticCriterion → 27 / 50 IDs (54.0%)
# Categorical filter.
target_group = "A"
ids_group = pool.which(StaticCriterion(query=pl.col("group") == target_group))
[which]           StaticCriterion → 4 / 50 IDs (8.0%)
# Combine conditions.
ids_combined = pool.which(
    StaticCriterion(query=(pl.col("age") > 50) & (pl.col("group") == target_group))
)
[which]           StaticCriterion → 4 / 50 IDs (8.0%)
# Use the result to subset the pool.
pool_old = pool.subset(ids_old)
print(pool_old)
┌────────────────────────────────────────────────┐
│          IntervalSequencePool Summary          │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequences          27
  Store              /home/runner/.tanat/_quick_interval_54e0d144
  id_column          id

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-02-06 14:20:19.371107 → 2025-01-07 04:41:52.057717]
  Columns            ['start', 'end']
  t0                 position=0, anchor=start

Entity Features (2)
─────────────────────────
  • status              String [len 1 → 1]
  • value               Numerical [2 → 100]

Static Features (2)
─────────────────────────
  • age                 Numerical [51 → 98]
  • group               String [len 1 → 1]
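Combining conditions with & selects exactly the IDs that pass both queries, so ids_combined is the intersection of the IDs each condition selects on its own. In the run above, the combined query and the group query both return 4 IDs, which means every group-A ID in this simulated data is also older than 50. The plain-Python sketch below checks the intersection property on a hypothetical toy table that contains nulls. It assumes Kleene (three-valued) AND for nulls, although the property holds under any rule in which a null never counts as True; all function names are illustrative, not tanat API.

```python
from typing import Callable, Mapping, Optional

Row = Mapping[str, object]
Query = Callable[[Row], Optional[bool]]


def kleene_and(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued AND: False dominates, then null (None), else True."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def both(q1: Query, q2: Query) -> Query:
    """Model of `q1 & q2` applied row by row."""
    return lambda row: kleene_and(q1(row), q2(row))


def selected(static: Mapping[int, Row], query: Query) -> set[int]:
    """IDs whose row evaluates to exactly True (False and null are dropped)."""
    return {i for i, row in static.items() if query(row) is True}


def age_over_50(row: Row) -> Optional[bool]:
    age = row.get("age")
    return None if age is None else age > 50


def in_group_a(row: Row) -> Optional[bool]:
    group = row.get("group")
    return None if group is None else group == "A"


# Hypothetical toy static table with nulls in both columns.
STATIC = {
    1: {"age": 86, "group": "A"},
    2: {"age": 40, "group": "A"},
    3: {"age": 70, "group": "B"},
    4: {"age": None, "group": "A"},
    5: {"age": 65, "group": None},
}

combined = selected(STATIC, both(age_over_50, in_group_a))
intersection = selected(STATIC, age_over_50) & selected(STATIC, in_group_a)
print(sorted(combined), sorted(intersection))  # [1] [1]
```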

Complement and partitioning

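The complementary filters age > 50 and age <= 50 partition the IDs that have a non-null age: every such ID matches exactly one of them. An ID with a null age matches neither, because in Polars a comparison against null evaluates to null rather than True; the is_null() query picks those IDs up instead. Here 27 + 23 = 50, and no ID has a null age.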

ids_young = pool.which(StaticCriterion(query=pl.col("age") <= 50))
ids_null_age = pool.which(StaticCriterion(query=pl.col("age").is_null()))
[which]           StaticCriterion → 23 / 50 IDs (46.0%)
[which]           StaticCriterion → 0 / 50 IDs (0.0%)
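The partition can be checked on a toy example. The sketch below is plain Python with hypothetical ages and illustrative helper names (where, null_safe); it mimics Polars by letting a comparison against a missing age yield None (null) and by keeping only the IDs whose result is exactly True. ID 4 has no age, so it falls into neither comparison and only into the null check.

```python
from typing import Callable, Optional

# Hypothetical ages keyed by ID; ID 4 has a null age.
AGES: dict[int, Optional[int]] = {1: 86, 2: 64, 3: 27, 4: None, 5: 50}


def where(pred: Callable[[Optional[int]], Optional[bool]]) -> set[int]:
    """IDs whose predicate result is exactly True; null results never match."""
    return {i for i, age in AGES.items() if pred(age) is True}


def null_safe(cmp: Callable[[int], bool]) -> Callable[[Optional[int]], Optional[bool]]:
    """Mimic Polars: comparing against null yields null (None), not False."""
    return lambda age: None if age is None else cmp(age)


old = where(null_safe(lambda a: a > 50))
young = where(null_safe(lambda a: a <= 50))
null_age = where(lambda a: a is None)

print(sorted(old), sorted(young), sorted(null_age))  # [1, 2] [3, 5] [4]
```

The three sets are pairwise disjoint and together cover every ID, which is the same bookkeeping as 27 + 23 + 0 = 50 in the simulated pool.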

Trajectory pool

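StaticCriterion works the same way on a TrajectoryPool. The expression is evaluated against the trajectory-level static table, i.e. the static_data passed to build_trajectories, so the member pools do not need static features of their own (the labs pool below has none).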

temporal_events = simulate_events(n_ids=50, features=["value", "status"], seed=1)

event_pool = build_events(
    temporal_data=temporal_events,
    id_column="id",
    time_column="time",
)
tpool = build_trajectories(
    pools={"admissions": pool, "labs": event_pool},
    static_data=static,
    id_column="id",
)
┌─ Event SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (50 sequences · 323 entities · 0.00s)
┌─ TrajectoryStore
│
│ Step 1/2: Linking pools: admissions, labs
│
│ Step 2/2: Building trajectory index & metadata
│
└─ Done (50 trajectories · 2 pool(s) · 0.01s)
print(tpool)
┌────────────────────────────────────────────────┐
│             TrajectoryPool Summary             │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Trajectories       50
  Store              /home/runner/.tanat/_quick_trajectory_34979d41
  id_column          id

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-01-12 06:14:52.240595 → 2025-01-20 05:35:23.188780]
  t0                 position=0, anchor=start

Sequences (2)
─────────────────────────
  • admissions          IntervalSequencePool(n=50, entity_features=2, static_features=2, store='/home/runner/.tanat/_quick_interval_54e0d144')
  • labs                EventSequencePool(n=50, entity_features=2, static_features=0, store='/home/runner/.tanat/_quick_event_49e4f192')

Static Features (2)
─────────────────────────
  • age                 Numerical [1 → 98]
  • group               String [len 1 → 1]
# Query on static features to get trajectory IDs.
traj_ids = tpool.which(StaticCriterion(query=pl.col("age") > 50))
[which]           StaticCriterion → 27 / 50 IDs (54.0%)

match(): single-trajectory evaluation

# Iterate to find the first trajectory that matches.
criterion = StaticCriterion(query=pl.col("age") > 50)
first_match = next((t for t in tpool if t.match(criterion)), None)
if first_match is not None:  # next(..., None) returns None when nothing matches
    print(f"First matching trajectory: id={first_match.id_value}")
First matching trajectory: id=1
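The generator passed to next() stops calling match() after the first trajectory for which it returns True, so, unlike which(), it does not evaluate every ID when an early one matches. The sketch below shows the same pattern in plain Python on hypothetical IDs and ages, recording each evaluation; first_where and is_old are illustrative names, not tanat API. It tests the result with is not None because next(..., None) signals "no match" with None, and a truthiness check could misread a valid but falsy result, such as ID 0 here.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def first_where(items: Iterable[T], pred: Callable[[T], bool]) -> Optional[T]:
    """Return the first item satisfying pred, or None; stops at the first hit."""
    return next((item for item in items if pred(item)), None)


# Hypothetical ages keyed by ID; ID 0 shows a falsy-but-valid result.
AGES = {0: 70, 1: 86, 2: 64, 3: 52, 4: 27}
evaluated: list[int] = []


def is_old(id_value: int) -> bool:
    evaluated.append(id_value)  # record each evaluation to show short-circuiting
    return AGES[id_value] > 50


hit = first_where(AGES, is_old)
print(hit, evaluated)  # 0 [0]: ID 0 matches first, so only one evaluation happens
if hit is not None:    # `if hit:` would wrongly treat ID 0 as "no match"
    print(f"First match: id={hit}")
```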
