Criteria#
Criteria are composable filtering objects that evaluate temporal or static properties of sequences and entities. They expose a uniform three-operation API:
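| Operation | Description |
|---|---|
| which | Returns a set of the IDs that satisfy the criterion (sequence level). |
| filter_entities | Returns a new pool view where only the entity rows satisfying the criterion are kept (entity level). The original pool is unchanged. |
| match | Returns True if a single sequence satisfies the criterion, False otherwise. |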
Each criterion declares which levels it supports. Applying an unsupported
operation raises CriterionLevelError.
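As a rough mental model of the three operations, the sketch below uses plain Python rather than tanat itself. The toy `pool` dict, the `is_error` predicate, and the three helper functions are hypothetical stand-ins, not the library's implementation; they only mirror the behaviour described above.

```python
# Toy pool: sequence ID -> list of entity rows (hypothetical data).
pool = {
    "s1": [{"status": "ok"}, {"status": "error"}],
    "s2": [{"status": "ok"}],
}

def is_error(row):
    """Row-level predicate standing in for a criterion."""
    return row["status"] == "error"

def which(pool, predicate):
    """IDs of sequences with at least one row satisfying the predicate."""
    return {sid for sid, rows in pool.items() if any(predicate(r) for r in rows)}

def filter_entities(pool, predicate):
    """New pool keeping only the satisfying rows; the input is not modified."""
    return {sid: [r for r in rows if predicate(r)] for sid, rows in pool.items()}

def match(rows, predicate):
    """True if a single sequence has at least one satisfying row."""
    return any(predicate(r) for r in rows)

ids = which(pool, is_error)
pruned = filter_entities(pool, is_error)
ok = match(pool["s2"], is_error)
```

Note that the pruned pool still lists every ID in this sketch; a sequence with no matching row simply ends up with zero rows.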
EntityCriterion#
Filter entities or select sequences using any Polars expression evaluated against the temporal data.
```python
from tanat.criterion import EntityCriterion
import polars as pl

# Select sequences with at least one "error" row.
ids = pool.which(EntityCriterion(query=pl.col("status") == "error"))

# Keep only the "error" rows across all sequences.
pool2 = pool.filter_entities(EntityCriterion(query=pl.col("status") == "error"))

# Combine conditions with any Polars expression.
pool3 = pool.filter_entities(
    EntityCriterion(query=(pl.col("status") == "error") & (pl.col("value") > 0.5))
)

# Single-sequence match.
ok = seq.match(EntityCriterion(query=pl.col("status") == "error"))
```
The expression must return a Boolean column. Rows where it evaluates to
True are kept (filter_entities) or counted towards sequence selection
(which).
| Parameter | Type | Description |
|---|---|---|
| query | pl.Expr | A Polars expression evaluated per entity row against the temporal data. |
StaticCriterion#
Select sequences or trajectories using a Polars expression evaluated against the static (per-ID) data. The pool must have static features attached.
```python
from tanat.criterion import StaticCriterion
import polars as pl

# Select IDs whose age exceeds 50.
ids = pool.which(StaticCriterion(query=pl.col("age") > 50))
pool2 = pool.subset(ids)

# Works identically on a TrajectoryPool.
traj_ids = tpool.which(StaticCriterion(query=pl.col("group") == "A"))

# Single match.
ok = seq.match(StaticCriterion(query=pl.col("age") > 50))
```
filter_entities() is not supported: static data has no entity rows.
| Parameter | Type | Description |
|---|---|---|
| query | pl.Expr | A Polars expression evaluated per ID against the static data frame. |
TimeCriterion#
Filter entities or select sequences based on temporal bounds on the start and/or end time columns. All bounds are inclusive.
```python
import datetime as dt
from tanat.criterion import TimeCriterion

t0 = dt.datetime(2020, 1, 1)
t1 = dt.datetime(2021, 1, 1)

# Sequences with at least one entity starting on or after t0.
ids = pool.which(TimeCriterion(start_ge=t0))

# Sequences where ALL entities start on or after t0.
ids = pool.which(TimeCriterion(start_ge=t0, all_entities=True))

# Entity pruning: keep rows inside [t0, t1] (overlap mode, default).
pool2 = pool.filter_entities(TimeCriterion(start_ge=t0, end_le=t1))

# Containment mode: entity interval must be fully inside [t0, t1].
pool3 = pool.filter_entities(
    TimeCriterion(start_ge=t0, end_le=t1, duration_within=True)
)

# Numeric bounds for timestep pools.
ids = state_pool.which(TimeCriterion(start_ge=200.0, start_le=400.0))
```
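| Parameter | Type | Description |
|---|---|---|
| start_ge | TimeBound \| None | Minimum value for the start column (inclusive). |
| start_le | TimeBound \| None | Maximum value for the start column (inclusive). |
| end_ge | TimeBound \| None | Minimum value for the end column: interval/state pools only. |
| end_le | TimeBound \| None | Maximum value for the end column: interval/state pools only. |
| all_entities | bool | If True, which() selects a sequence only when all of its entities satisfy the bounds; otherwise one matching entity is enough. |
| duration_within | bool | If True, use containment mode instead of overlap mode (default False). |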
TimeBound
TimeBound = datetime.datetime | datetime.date | int | float
All bounds within a single criterion call must share the same Python type, with one exception:
datetime and date may be mixed (datetime takes precedence).
Use int or float for numeric timestep sequences.
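The typing rule can be illustrated with a small standalone check. This is a sketch only: `normalize_bounds` is a hypothetical helper, not part of tanat, and promoting a date to midnight of the same day is an assumption about what "datetime takes precedence" means.

```python
import datetime as dt

def normalize_bounds(*bounds):
    """Validate that bounds share one type; mix date/datetime by promotion.

    Assumption: a date mixed with a datetime is promoted to midnight.
    """
    given = [b for b in bounds if b is not None]
    temporal = [isinstance(b, dt.date) for b in given]  # datetime subclasses date
    if given and all(temporal):
        if any(isinstance(b, dt.datetime) for b in given):
            return [
                b if isinstance(b, dt.datetime) else dt.datetime.combine(b, dt.time.min)
                for b in given
            ]
        return given
    if any(temporal) or len({type(b) for b in given}) > 1:
        raise TypeError("bounds must share one type (date/datetime may be mixed)")
    return given

mixed = normalize_bounds(dt.date(2020, 1, 1), dt.datetime(2021, 1, 1, 12))
numeric = normalize_bounds(200.0, 400.0)
```

Under this sketch, passing an int together with a float, or a number together with a date, raises TypeError.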
Overlap vs containment (two-column pools)#
For interval and state sequences (duration-based sequences):
Overlap (duration_within=False, default): entity [s, e] overlaps window [lo, hi] when s ≤ hi AND e ≥ lo. Provide start_ge=lo, end_le=hi.
Containment (duration_within=True): entity is fully inside when s ≥ lo AND e ≤ hi. Provide start_ge=lo, end_le=hi.
Open-ended states (end = null) are treated as still-ongoing: their end is
considered +∞ in overlap mode (they satisfy any end ≥ lo condition).
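The two modes reduce to a pair of comparisons. The following sketch uses numeric bounds and a hypothetical `keeps_entity` helper (not a tanat function). Treating an open-ended end as +∞ in containment mode as well, so that such an entity is never contained, is an assumption; the text above only specifies overlap mode.

```python
import math

def keeps_entity(s, e, lo, hi, duration_within=False):
    """Decide whether entity [s, e] is kept for window [lo, hi]."""
    end = math.inf if e is None else e  # open-ended state: still ongoing
    if duration_within:
        # Containment: the whole interval lies inside the window.
        return s >= lo and end <= hi
    # Overlap: the interval and the window share at least one point.
    return s <= hi and end >= lo

straddling = keeps_entity(150, 250, lo=200, hi=400)
straddling_contained = keeps_entity(150, 250, lo=200, hi=400, duration_within=True)
open_ended = keeps_entity(100, None, lo=200, hi=400)
```

An entity that starts before the window and ends inside it is kept in overlap mode but dropped in containment mode.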
PatternCriterion#
Select sequences or extract witness rows based on an ordered pattern of string values in a feature column. Elements are matched in temporal order.
```python
from tanat.criterion import PatternCriterion, ANY, WILDCARD

# A directly followed by B (adjacent).
ids = pool.which(PatternCriterion(feature="status", pattern=["A", "B"]))

# A before B with any number of rows in between.
ids = pool.which(PatternCriterion(feature="status", pattern=["A", ANY, "B"]))

# A, then exactly one element, then B.
ids = pool.which(PatternCriterion(feature="status", pattern=["A", WILDCARD, "B"]))

# Sequences that never contain A→B.
ids = pool.which(
    PatternCriterion(feature="status", pattern=["A", "B"], present=False)
)

# Keep only the witness rows (greedy first match).
pool2 = pool.filter_entities(
    PatternCriterion(feature="status", pattern=["A", "B"])
)
```
Sentinels#
| Constant | Value | Description |
|---|---|---|
| ANY | | Matches zero or more elements: free gap between adjacent sub-patterns. |
| WILDCARD | | Matches exactly one element of any value. |
Parameters#
| Parameter | Type | Description |
|---|---|---|
| feature | | Name of the string feature column to match against. |
| pattern | | Ordered pattern. A bare string is a single-element pattern. |
| | | |
| | | |
| | | |
Entity-level behaviour#
present=True: keeps the greedy first-match witness rows only. Each ID contributes at most len(pattern) rows; IDs with no match contribute 0 rows.
present=False: keeps all rows that are not witnesses. IDs with no match keep all their rows.
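To make the witness-row behaviour concrete, here is a standalone sketch of a first-match search over a list of values. It does not use tanat: the `ANY` and `WILDCARD` objects, `first_match`, and `kept_rows` are local stand-ins, and counting a WILDCARD position as a witness row (while ANY gaps contribute none) is an assumption.

```python
ANY = object()       # stand-in sentinel: zero or more elements
WILDCARD = object()  # stand-in sentinel: exactly one element

def first_match(values, pattern):
    """Return witness positions of the earliest match, or None."""
    def match_from(i, j):
        if j == len(pattern):
            return []
        element = pattern[j]
        if element is ANY:
            # Try the shortest gap first, then widen it.
            for skip in range(len(values) - i + 1):
                rest = match_from(i + skip, j + 1)
                if rest is not None:
                    return rest
            return None
        if i < len(values) and (element is WILDCARD or values[i] == element):
            rest = match_from(i + 1, j + 1)
            if rest is not None:
                return [i] + rest
        return None

    for start in range(len(values) + 1):
        witnesses = match_from(start, 0)
        if witnesses is not None:
            return witnesses
    return None

def kept_rows(values, pattern, present=True):
    """Positions kept by entity-level filtering under the rules above."""
    witnesses = first_match(values, pattern) or []
    if present:
        return witnesses
    hit = set(witnesses)
    return [i for i in range(len(values)) if i not in hit]

statuses = ["A", "C", "B", "A", "B"]
adjacent = kept_rows(statuses, ["A", "B"])
gapped = kept_rows(statuses, ["A", ANY, "B"])
complement = kept_rows(statuses, ["A", "B"], present=False)
```

In this example the adjacent pattern only matches at positions 3 and 4, whereas the gapped pattern already matches at positions 0 and 2.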
LengthCriterion#
Select sequences by their number of entity rows (sequence length).
```python
from tanat.criterion import LengthCriterion

# More than 6 entities.
ids = pool.which(LengthCriterion(gt=6))

# Between 3 and 10 entities (inclusive on both ends).
ids = pool.which(LengthCriterion(ge=3, le=10))

# Single match.
ok = seq.match(LengthCriterion(ge=3, lt=20))
```
filter_entities() is not supported.
| Parameter | Type | Description |
|---|---|---|
| gt | | Strictly greater than. |
| ge | | Greater than or equal to. |
| lt | | Strictly less than. |
| le | | Less than or equal to. |
At least one bound must be provided. Contradictory bounds (e.g. gt=5,
lt=3) raise ValueError at construction time.
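The validation rule can be sketched with a small dataclass. `LengthBounds` is a hypothetical stand-in, not the library class, and it assumes that "contradictory" means no integer length can satisfy all bounds at once.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LengthBounds:
    gt: Optional[int] = None
    ge: Optional[int] = None
    lt: Optional[int] = None
    le: Optional[int] = None

    def __post_init__(self):
        if all(b is None for b in (self.gt, self.ge, self.lt, self.le)):
            raise ValueError("at least one bound must be provided")
        # Convert strict bounds to inclusive ones (lengths are integers).
        lows = [self.ge, None if self.gt is None else self.gt + 1]
        highs = [self.le, None if self.lt is None else self.lt - 1]
        lows = [v for v in lows if v is not None]
        highs = [v for v in highs if v is not None]
        if lows and highs and max(lows) > min(highs):
            raise ValueError("contradictory bounds: no length can satisfy them")

    def accepts(self, length: int) -> bool:
        """True if the sequence length satisfies every provided bound."""
        return (
            (self.gt is None or length > self.gt)
            and (self.ge is None or length >= self.ge)
            and (self.lt is None or length < self.lt)
            and (self.le is None or length <= self.le)
        )

in_range = LengthBounds(ge=3, le=10).accepts(3)
too_long = LengthBounds(ge=3, le=10).accepts(11)
```

Failing at construction time, as in this sketch, surfaces a mistyped bound before any pool is scanned.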
RankCriterion#
Prune entity rows by their 0-based positional rank within each sequence.
```python
from tanat.criterion import RankCriterion

# Keep the first 3 entities.
pool2 = pool.filter_entities(RankCriterion(first=3))

# Keep all except the last 2 entities.
pool2 = pool.filter_entities(RankCriterion(first=-2))

# Keep the last 2 entities.
pool2 = pool.filter_entities(RankCriterion(last=2))

# Python-slice: ranks 1, 2, 3 (0-based).
pool2 = pool.filter_entities(RankCriterion(start=1, end=4))

# Every other entity.
pool2 = pool.filter_entities(RankCriterion(step=2))

# First and last entity.
pool2 = pool.filter_entities(RankCriterion(ranks=[0, -1]))

# Relative to T0: entity at T0 and the one after it.
pool.set_t0(position=0, anchor="start")
pool2 = pool.filter_entities(RankCriterion(start=0, end=2, relative=True))
```
which() and match() are not supported.
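| Parameter | Type | Description |
|---|---|---|
| first | | Keep first N rows (a negative value keeps all except the last rows, e.g. first=-2 drops the last 2). |
| last | | Keep last N rows (…). |
| start | | Start rank inclusive (Python-style negative supported). |
| end | | End rank exclusive (Python-style negative supported). |
| step | | Sub-sample every N-th entity (≥ 1). Compatible with … |
| ranks | | Explicit 0-based positions (negative = from end). A single … |
| relative | | If True, ranks are interpreted relative to T0 (see set_t0). |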
Exactly one parameter group must be active at a time.
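The rank parameters map closely onto Python list indexing. The sketch below is a hypothetical `select_ranks` helper operating on positions only, not the tanat implementation. It assumes that start, end, and step form one parameter group, handles only positive `last`, and ignores `relative`.

```python
def select_ranks(n, first=None, last=None, start=None, end=None, step=None, ranks=None):
    """Return the 0-based positions kept from a sequence of n rows."""
    groups = [
        first is not None,
        last is not None,
        any(v is not None for v in (start, end, step)),  # assumed slice group
        ranks is not None,
    ]
    if sum(groups) != 1:
        raise ValueError("exactly one parameter group must be active")

    positions = list(range(n))
    if first is not None:
        return positions[:first]          # first=-2 drops the last 2 rows
    if last is not None:
        if last <= 0:
            raise ValueError("this sketch only handles positive `last`")
        return positions[-last:]
    if ranks is not None:
        # Normalise negatives, skip out-of-range ranks, drop duplicates.
        return sorted({r % n for r in ranks if -n <= r < n})
    return positions[start:end:step]

head = select_ranks(5, first=3)
all_but_tail = select_ranks(5, first=-2)
ends = select_ranks(5, ranks=[0, -1])
```

Working on positions rather than rows keeps the sketch independent of how a pool stores its entities.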
Chaining criteria#
Criteria can be chained by passing the result of one operation as the target of the next. Each call returns a new pool view; the original is never modified.
```python
import datetime as dt
import polars as pl
from tanat.criterion import RankCriterion, StaticCriterion, TimeCriterion

# 1. Select IDs matching a static condition.
ids = pool.which(StaticCriterion(query=pl.col("age") > 50))

# 2. Restrict the pool to those IDs.
pool2 = pool.subset(ids)

# 3. Prune entity rows by time window.
pool3 = pool2.filter_entities(
    TimeCriterion(start_ge=dt.datetime(2020, 1, 1), end_le=dt.datetime(2021, 1, 1))
)

# 4. Keep only the first 2 entities per sequence.
pool4 = pool3.filter_entities(RankCriterion(first=2))
```
Alternatively, use which() results to drive multi-step pipelines:
```python
from tanat.criterion import LengthCriterion, PatternCriterion

ids_long = pool.which(LengthCriterion(gt=5))
ids_error = pool.which(PatternCriterion(feature="status", pattern="error"))
ids_target = ids_long & ids_error  # set intersection
pool_target = pool.subset(ids_target)
```
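If which() results behave like ordinary Python sets, as the intersection above suggests, other set operators combine them in the same way. The ID values below are made up for illustration, and support for `|` and `-` on the library's return type is an assumption.

```python
# Hypothetical which() results, written as plain Python sets.
ids_long = {"s1", "s2", "s3"}
ids_error = {"s2", "s3", "s4"}

both = ids_long & ids_error                 # long AND containing "error"
either = ids_long | ids_error               # long OR containing "error"
long_without_error = ids_long - ids_error   # long but never "error"
```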
See Also#
EntityCriterion : EntityCriterion examples.
StaticCriterion : StaticCriterion examples.
TimeCriterion : TimeCriterion examples.
PatternCriterion : PatternCriterion examples.
LengthCriterion : LengthCriterion examples.
RankCriterion : RankCriterion examples.
Data Manipulation : Full operation reference (which, filter_entities, subset).