EntityCriterion#

Select sequences or prune entity rows using any Polars expression evaluated against the temporal data.

Method	Behaviour
`which()`	Returns IDs that have at least one row satisfying the expression.
`filter_entities()`	Keeps only the rows where the expression is `True`; sequences with zero matching rows disappear from the filtered view.
`match()`	Returns `True` iff the sequence has at least one matching row.

See Criteria for the full reference.

Imports#

import polars as pl

from tanat import build_intervals
from tanat.criterion import EntityCriterion
from tanat.dataset import simulate_intervals, simulate_static

Simulate data#

temporal = simulate_intervals(
    n_ids=50,
    features=["value", "status"],
    seed=42,
)
static = simulate_static(n_ids=50, features=["age", "group"], seed=0)

pool = build_intervals(
    temporal_data=temporal,
    id_column="id",
    start_column="start",
    end_column="end",
    static_data=static,
)

┌─ Interval SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity, time index & static features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (50 sequences · 343 entities · 0.01s)

print(pool)

┌────────────────────────────────────────────────┐
│          IntervalSequencePool Summary          │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequences          50
  Store              /home/runner/.tanat/_quick_interval_109c58b0
  id_column          id

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-01-12 06:14:52.240595 → 2025-01-20 05:35:23.188780]
  Columns            ['start', 'end']
  t0                 position=0, anchor=start

Entity Features (2)
─────────────────────────
  • status              String [len 1 → 1]
  • value               Numerical [1 → 100]

Static Features (2)
─────────────────────────
  • age                 Numerical [1 → 98]
  • group               String [len 1 → 1]

# Inspect the unique status values present in the data.
pool.temporal_data()["status"].unique()

<ArrowExtensionArray>
['C', 'D', 'E', 'A', 'B']
Length: 5, dtype: large_string[pyarrow]

`which()`: sequence-level selection#

Return the IDs of all sequences that have at least one entity row satisfying the expression. The original pool is left unchanged.

# Pick a status value that exists in the data
target_status = "A"
# Select sequences that have at least one entity with that status.
ids_with_status = pool.which(EntityCriterion(query=pl.col("status") == target_status))

[which]           EntityCriterion → 31 / 50 IDs (62.0%)

Numeric threshold: sequences with at least one entity whose `value` exceeds 80.

ids_high_value = pool.which(EntityCriterion(query=pl.col("value") > 80))

[which]           EntityCriterion → 36 / 50 IDs (72.0%)

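Combine conditions with `&` inside a single Polars expression. The expression is evaluated row by row, so a sequence is selected only if one and the same row satisfies both conditions.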

ids_combined = pool.which(
    EntityCriterion(query=(pl.col("status") == target_status) & (pl.col("value") > 80))
)

[which]           EntityCriterion → 9 / 50 IDs (18.0%)

`filter_entities()`: entity-level pruning#

Return a new pool view that contains only the rows satisfying the expression. The original pool is unchanged. Sequences with zero surviving rows no longer appear in the filtered pool.

filtered = pool.filter_entities(
    EntityCriterion(query=pl.col("status") == target_status)
)

[filter_entities] EntityCriterion → 73 / 343 entities (21.3%) · 19 IDs affected

# Combine two conditions in a single criterion to narrow further.
filtered2 = pool.filter_entities(
    EntityCriterion(query=(pl.col("status") == target_status) & (pl.col("value") > 80))
)

[filter_entities] EntityCriterion → 11 / 343 entities (3.2%) · 41 IDs affected

`match()`: single-sequence evaluation#

Evaluate the criterion against a single sequence: `match()` returns `True` if and only if that sequence has at least one row satisfying the expression.

criterion = EntityCriterion(query=pl.col("status") == target_status)
# Iterate to find the first sequence that matches.
first_match = next((s for s in pool if s.match(criterion)), None)
if first_match is not None:
    print(f"First matching sequence: id={first_match.id_value}")

First matching sequence: id=2
