StaticCriterion
Select sequences or trajectories using a Polars expression evaluated against the static (per-ID) data. Static features do not vary over time; typical examples are age, group membership, or a baseline score.
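| Level | Behaviour |
|---|---|
| Sequence pool, which() | Returns IDs whose static row satisfies the expression. |
| Trajectory pool, which() | Same, at trajectory level. |
| Single trajectory, match() | Returns whether that trajectory's static row satisfies the expression. |
| Entity | Not supported: static data has no entity rows to prune. |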
See Criteria for the full reference.
Imports
import polars as pl
from tanat import build_events, build_intervals, build_trajectories
from tanat.criterion import StaticCriterion
from tanat.dataset import simulate_events, simulate_intervals, simulate_static
Simulate data
StaticCriterion requires the pool to have static
features attached. Pass static_data to the builder (or call
pool.add_static_features() later).
temporal = simulate_intervals(
    n_ids=50,
    features=["value", "status"],
    seed=42,
)
static = simulate_static(n_ids=50, features=["age", "group"], seed=0)
static.head()
pool = build_intervals(
    temporal_data=temporal,
    id_column="id",
    start_column="start",
    end_column="end",
    static_data=static,
)
┌─ Interval SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity, time index & static features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (50 sequences · 343 entities · 0.01s)
print(pool)
┌────────────────────────────────────────────────┐
│ IntervalSequencePool Summary │
└────────────────────────────────────────────────┘
Overview
─────────────────────────
Sequences 50
Store /home/runner/.tanat/_quick_interval_54e0d144
id_column id
Time Index
─────────────────────────
Type Datetime(time_unit='us', time_zone=None) [2000-01-12 06:14:52.240595 → 2025-01-20 05:35:23.188780]
Columns ['start', 'end']
t0 position=0, anchor=start
Entity Features (2)
─────────────────────────
• status String [len 1 → 1]
• value Numerical [1 → 100]
Static Features (2)
─────────────────────────
• age Numerical [1 → 98]
• group String [len 1 → 1]
which(): sequence-level selection
The expression is evaluated once per ID against the static table.
IDs that lack a static row (e.g. IDs not present in static_data) do
not appear in the result.
# Numeric threshold.
ids_old = pool.which(StaticCriterion(query=pl.col("age") > 50))
[which] StaticCriterion → 27 / 50 IDs (54.0%)
# Categorical filter.
target_group = "A"
ids_group = pool.which(StaticCriterion(query=pl.col("group") == target_group))
[which] StaticCriterion → 4 / 50 IDs (8.0%)
# Combine conditions.
ids_combined = pool.which(
    StaticCriterion(query=(pl.col("age") > 50) & (pl.col("group") == target_group))
)
[which] StaticCriterion → 4 / 50 IDs (8.0%)
# Use the result to subset the pool.
pool_old = pool.subset(ids_old)
print(pool_old)
┌────────────────────────────────────────────────┐
│ IntervalSequencePool Summary │
└────────────────────────────────────────────────┘
Overview
─────────────────────────
Sequences 27
Store /home/runner/.tanat/_quick_interval_54e0d144
id_column id
Time Index
─────────────────────────
Type Datetime(time_unit='us', time_zone=None) [2000-02-06 14:20:19.371107 → 2025-01-07 04:41:52.057717]
Columns ['start', 'end']
t0 position=0, anchor=start
Entity Features (2)
─────────────────────────
• status String [len 1 → 1]
• value Numerical [2 → 100]
Static Features (2)
─────────────────────────
• age Numerical [51 → 98]
• group String [len 1 → 1]
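Complement and partitioning
The filters age > 50 and age <= 50 are complementary, so together they partition the IDs that have a non-null age. IDs with a null age are picked up only by is_null(); here there are none, so the 27 IDs found earlier and the 23 found below account for all 50.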
ids_young = pool.which(StaticCriterion(query=pl.col("age") <= 50))
ids_null_age = pool.which(StaticCriterion(query=pl.col("age").is_null()))
[which] StaticCriterion → 23 / 50 IDs (46.0%)
[which] StaticCriterion → 0 / 50 IDs (0.0%)
Trajectory pool
StaticCriterion works identically on a
TrajectoryPool because trajectories share
the same static-data concept.
temporal_events = simulate_events(n_ids=50, features=["value", "status"], seed=1)
event_pool = build_events(
    temporal_data=temporal_events,
    id_column="id",
    time_column="time",
)
tpool = build_trajectories(
    pools={"admissions": pool, "labs": event_pool},
    static_data=static,
    id_column="id",
)
┌─ Event SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (50 sequences · 323 entities · 0.00s)
┌─ TrajectoryStore
│
│ Step 1/2: Linking pools: admissions, labs
│
│ Step 2/2: Building trajectory index & metadata
│
└─ Done (50 trajectories · 2 pool(s) · 0.01s)
print(tpool)
┌────────────────────────────────────────────────┐
│ TrajectoryPool Summary │
└────────────────────────────────────────────────┘
Overview
─────────────────────────
Trajectories 50
Store /home/runner/.tanat/_quick_trajectory_34979d41
id_column id
Time Index
─────────────────────────
Type Datetime(time_unit='us', time_zone=None) [2000-01-12 06:14:52.240595 → 2025-01-20 05:35:23.188780]
t0 position=0, anchor=start
Sequences (2)
─────────────────────────
• admissions IntervalSequencePool(n=50, entity_features=2, static_features=2, store='/home/runner/.tanat/_quick_interval_54e0d144')
• labs EventSequencePool(n=50, entity_features=2, static_features=0, store='/home/runner/.tanat/_quick_event_49e4f192')
Static Features (2)
─────────────────────────
• age Numerical [1 → 98]
• group String [len 1 → 1]
# Query on static features to get trajectory IDs.
traj_ids = tpool.which(StaticCriterion(query=pl.col("age") > 50))
[which] StaticCriterion → 27 / 50 IDs (54.0%)
match(): single-trajectory evaluation
# Iterate to find the first trajectory that matches.
criterion = StaticCriterion(query=pl.col("age") > 50)
first_match = next((t for t in tpool if t.match(criterion)), None)
if first_match is not None:
    print(f"First matching trajectory: id={first_match.id_value}")
First matching trajectory: id=1
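The search above uses the built-in next() with a generator expression and a default value. The sketch below isolates that idiom in pure Python. ToyTrajectory and its match() method are hypothetical stand-ins invented for illustration, not tanat classes, and the age threshold replaces a real criterion object.

```python
from dataclasses import dataclass


@dataclass
class ToyTrajectory:
    """Hypothetical stand-in for a trajectory; not part of tanat."""

    id_value: int
    age: int

    def match(self, min_age: int) -> bool:
        # Toy criterion: true when the static age exceeds the threshold.
        return self.age > min_age


toy_pool = [ToyTrajectory(1, 72), ToyTrajectory(2, 30), ToyTrajectory(3, 64)]

# The generator is lazy, so iteration stops at the first match.
first = next((t for t in toy_pool if t.match(50)), None)

# With no match, next() returns the default instead of raising StopIteration.
nobody = next((t for t in toy_pool if t.match(100)), None)

print(first.id_value if first is not None else None)  # 1
print(nobody)  # None
```

Because the generator stops early, this pattern avoids evaluating the criterion on every item when only one match is needed.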