Criteria#

Criteria are composable filtering objects that evaluate temporal or static properties of sequences and entities. They expose a uniform three-operation API:

| Operation | Description |
| --- | --- |
| pool.which(criterion) | Returns a set of IDs whose sequences satisfy the criterion at sequence level. |
| pool.filter_entities(criterion) | Returns a new pool view where only the entity rows satisfying the criterion are kept (entity level). The original pool is unchanged. |
| seq.match(criterion) | Returns True if the single sequence satisfies the criterion. |

Each criterion declares which levels it supports. Applying an unsupported operation raises CriterionLevelError.
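The three operations and the level check can be modelled in plain Python. This is a minimal sketch, not the library's implementation: the pool is a plain dict mapping IDs to lists of row dicts, and `RowCriterion`, `_require`, and the `levels` attribute are hypothetical names introduced for illustration. Keeping IDs with zero remaining rows after filtering is also an assumption.

```python
class CriterionLevelError(Exception):
    """Stand-in for the library error of the same name."""


class RowCriterion:
    """Hypothetical criterion wrapping a per-row predicate."""

    def __init__(self, test, levels=("sequence", "entity")):
        self.test = test
        self.levels = set(levels)  # levels this criterion declares


def _require(criterion, level):
    # Reject operations the criterion does not declare support for.
    if level not in criterion.levels:
        raise CriterionLevelError(f"{level} level not supported")


def match(rows, criterion):
    """Sequence level, single sequence: does at least one row pass?"""
    _require(criterion, "sequence")
    return any(criterion.test(row) for row in rows)


def which(pool, criterion):
    """Sequence level: IDs of the sequences that satisfy the criterion."""
    _require(criterion, "sequence")
    return {sid for sid, rows in pool.items() if match(rows, criterion)}


def filter_entities(pool, criterion):
    """Entity level: a new dict keeping only the rows that pass."""
    _require(criterion, "entity")
    return {sid: [r for r in rows if criterion.test(r)] for sid, rows in pool.items()}


pool = {
    "s1": [{"status": "ok"}, {"status": "error"}],
    "s2": [{"status": "ok"}],
}
is_error = RowCriterion(lambda row: row["status"] == "error")

error_ids = which(pool, is_error)              # sequence-level selection
errors_only = filter_entities(pool, is_error)  # entity-level pruning
```

In this model `which` answers "which sequences contain a matching row", while `filter_entities` answers "which rows match"; the input dict is never mutated.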


EntityCriterion#

Filter entities or select sequences using any Polars expression evaluated against the temporal data.

from tanat.criterion import EntityCriterion
import polars as pl

# Select sequences with at least one "error" row.
ids = pool.which(EntityCriterion(query=pl.col("status") == "error"))

# Keep only the "error" rows across all sequences.
pool2 = pool.filter_entities(EntityCriterion(query=pl.col("status") == "error"))

# Combine conditions with any Polars expression.
pool3 = pool.filter_entities(
    EntityCriterion(query=(pl.col("status") == "error") & (pl.col("value") > 0.5))
)

# Single-sequence match.
ok = seq.match(EntityCriterion(query=pl.col("status") == "error"))

The expression must return a Boolean column. Rows where it evaluates to True are kept (filter_entities) or counted towards sequence selection (which).

| Parameter | Type | Description |
| --- | --- | --- |
| query | pl.Expr | A Polars expression evaluated per entity row against the temporal data. |


StaticCriterion#

Select sequences or trajectories using a Polars expression evaluated against the static (per-ID) data. The pool must have static features attached.

from tanat.criterion import StaticCriterion
import polars as pl

# Select IDs whose age exceeds 50.
ids = pool.which(StaticCriterion(query=pl.col("age") > 50))
pool2 = pool.subset(ids)

# Works identically on a TrajectoryPool.
traj_ids = tpool.which(StaticCriterion(query=pl.col("group") == "A"))

# Single match.
ok = seq.match(StaticCriterion(query=pl.col("age") > 50))

filter_entities() is not supported: static data has no entity rows.

| Parameter | Type | Description |
| --- | --- | --- |
| query | pl.Expr | A Polars expression evaluated per ID against the static data frame. |


TimeCriterion#

Filter entities or select sequences based on temporal bounds on the start and/or end time columns. All bounds are inclusive.

import datetime as dt
from tanat.criterion import TimeCriterion

t0 = dt.datetime(2020, 1, 1)
t1 = dt.datetime(2021, 1, 1)

# Sequences with at least one entity starting on or after t0.
ids = pool.which(TimeCriterion(start_ge=t0))

# Sequences where ALL entities start on or after t0.
ids = pool.which(TimeCriterion(start_ge=t0, all_entities=True))

# Entity pruning: keep rows that overlap [t0, t1] (overlap mode, default).
pool2 = pool.filter_entities(TimeCriterion(start_ge=t0, end_le=t1))

# Containment mode: entity interval must be fully inside [t0, t1].
pool3 = pool.filter_entities(
    TimeCriterion(start_ge=t0, end_le=t1, duration_within=True)
)

# Numeric bounds for timestep pools.
ids = state_pool.which(TimeCriterion(start_ge=200.0, start_le=400.0))

| Parameter | Type | Description |
| --- | --- | --- |
| start_ge | TimeBound \| None | Minimum value for the start column (inclusive). |
| start_le | TimeBound \| None | Maximum value for the start column (inclusive). |
| end_ge | TimeBound \| None | Minimum value for the end column: interval/state pools only. |
| end_le | TimeBound \| None | Maximum value for the end column: interval/state pools only. |
| duration_within | bool | False (default): overlap is sufficient. True: entity interval must be fully contained in the window. |
| all_entities | bool | False (default): at least one row must match. True: every row must match. |

TimeBound

TimeBound = datetime.datetime | datetime.date | int | float

All bounds within a single criterion must share the same Python type, with one exception: datetime and date may be mixed, in which case datetime takes precedence. Use int or float for numeric timestep sequences.
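The type rule can be illustrated with a small validator. This is a sketch under stated assumptions: `normalize_bounds` is a hypothetical helper, promoting a date to a datetime at midnight is one plausible reading of "datetime takes precedence", and raising TypeError is an assumption, since the source does not say which exception the library uses.

```python
import datetime as dt


def normalize_bounds(**bounds):
    """Hypothetical helper: check that all given bounds share one type."""
    given = {name: value for name, value in bounds.items() if value is not None}
    kinds = {type(value) for value in given.values()}

    # date and datetime may be mixed; plain dates are promoted (assumption: midnight).
    if kinds <= {dt.date, dt.datetime}:
        if dt.datetime in kinds:
            given = {
                name: value
                if isinstance(value, dt.datetime)
                else dt.datetime.combine(value, dt.time.min)
                for name, value in given.items()
            }
        return given

    # Any other combination must be a single type (all int, or all float).
    if len(kinds) > 1:
        names = sorted(kind.__name__ for kind in kinds)
        raise TypeError(f"bounds mix incompatible types: {names}")
    return given


mixed = normalize_bounds(
    start_ge=dt.date(2020, 1, 1),
    end_le=dt.datetime(2021, 1, 1, 12, 0),
)
numeric = normalize_bounds(start_ge=200.0, start_le=400.0)
```

Here `mixed["start_ge"]` becomes a datetime so that both bounds compare cleanly, while a call such as `normalize_bounds(start_ge=200, end_le=dt.date(2021, 1, 1))` is rejected.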

Overlap vs containment (two-column pools)#

For interval and state sequences (duration-based sequences):

  • Overlap (duration_within=False, default): entity [s, e] overlaps window [lo, hi] when s ≤ hi AND e ≥ lo. Provide start_ge=lo, end_le=hi.

  • Containment (duration_within=True): entity is fully inside when s ≥ lo AND e ≤ hi. Provide start_ge=lo, end_le=hi.

Open-ended states (end = null) are treated as still-ongoing: their end is considered +∞ in overlap mode (they satisfy any end ≥ lo condition).
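The two modes reduce to two pairs of inequalities, which the following sketch spells out for numeric bounds. `in_window` is a hypothetical function, not part of the library. Treating an open end as +∞ in containment mode as well is an assumption; the text above only specifies it for overlap mode.

```python
import math


def in_window(start, end, lo, hi, duration_within=False):
    """Test one entity interval [start, end] against the window [lo, hi]."""
    # An open-ended state (end is None) is treated as still ongoing.
    end_eff = math.inf if end is None else end
    if duration_within:
        # Containment: the whole interval lies inside the window.
        return start >= lo and end_eff <= hi
    # Overlap: the interval and the window share at least one point.
    return start <= hi and end_eff >= lo


lo, hi = 200.0, 400.0
straddles = in_window(100.0, 250.0, lo, hi)                         # overlaps the window
straddles_strict = in_window(100.0, 250.0, lo, hi, duration_within=True)
inside_strict = in_window(250.0, 300.0, lo, hi, duration_within=True)
ongoing = in_window(300.0, None, lo, hi)                            # open-ended state
after = in_window(450.0, 500.0, lo, hi)                             # entirely after hi
```

An interval that straddles the lower edge passes in overlap mode but fails containment, and an open-ended state that started inside the window always overlaps it.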


PatternCriterion#

Select sequences or extract witness rows based on an ordered pattern of string values in a feature column. Elements are matched in temporal order.

from tanat.criterion import PatternCriterion, ANY, WILDCARD

# A directly followed by B (adjacent).
ids = pool.which(PatternCriterion(feature="status", pattern=["A", "B"]))

# A before B with any number of rows in between.
ids = pool.which(PatternCriterion(feature="status", pattern=["A", ANY, "B"]))

# A, then exactly one element, then B.
ids = pool.which(PatternCriterion(feature="status", pattern=["A", WILDCARD, "B"]))

# Sequences that never contain A→B.
ids = pool.which(
    PatternCriterion(feature="status", pattern=["A", "B"], present=False)
)

# Keep only the witness rows (greedy first match).
pool2 = pool.filter_entities(
    PatternCriterion(feature="status", pattern=["A", "B"])
)

Sentinels#

| Constant | Value | Description |
| --- | --- | --- |
| ANY | "..." | Matches zero or more elements: free gap between adjacent sub-patterns. |
| WILDCARD | "*" | Matches exactly one element of any value. |

Parameters#

| Parameter | Type | Description |
| --- | --- | --- |
| feature | str | Name of the string feature column to match against. |
| pattern | str \| list[str] | Ordered pattern. A bare string is a single-element pattern. |
| present | bool | True (default): pattern must be present. False: pattern must be absent. |
| regex | bool | True (default): elements are regular expressions. False: literal substring matching. |
| case_sensitive | bool | True (default): case-sensitive. False: case-insensitive. |

Entity-level behaviour#

  • present=True: keeps the greedy first-match witness rows only. Each ID contributes at most len(pattern) rows; IDs with no match contribute 0 rows.

  • present=False: keeps all rows that are not witnesses. IDs with no match keep all their rows.
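The sentinel semantics and the witness-row behaviour can be sketched with a small backtracking matcher over a list of values. This is an illustration only: `first_match` and `filter_rows` are hypothetical names, elements are compared by exact equality rather than as regular expressions, the local `ANY` and `WILDCARD` constants mirror the values in the Sentinels table, and treating only non-gap elements as witnesses is an assumption.

```python
ANY = "..."       # zero or more elements (gap)
WILDCARD = "*"    # exactly one element of any value


def _match_from(values, i, pattern, j):
    """Match pattern[j:] starting at values[i]; return witness indices or None."""
    if j == len(pattern):
        return []
    token = pattern[j]
    if token == ANY:
        # Try the shortest gap first, then progressively longer ones.
        for k in range(i, len(values) + 1):
            rest = _match_from(values, k, pattern, j + 1)
            if rest is not None:
                return rest
        return None
    if i < len(values) and (token == WILDCARD or values[i] == token):
        rest = _match_from(values, i + 1, pattern, j + 1)
        if rest is not None:
            return [i] + rest
    return None


def first_match(values, pattern):
    """Witness indices of the earliest match, or None if there is none."""
    for start in range(len(values)):
        witness = _match_from(values, start, pattern, 0)
        if witness is not None:
            return witness
    return None


def filter_rows(values, pattern, present=True):
    """Keep witness rows (present=True) or everything else (present=False)."""
    witness = set(first_match(values, pattern) or [])
    return [v for i, v in enumerate(values) if (i in witness) == present]


values = ["A", "C", "B", "A", "B"]
adjacent = first_match(values, ["A", "B"])           # [3, 4]
gapped = first_match(values, ["A", ANY, "B"])        # [0, 2]
one_between = first_match(values, ["A", WILDCARD, "B"])  # [0, 1, 2]
```

With this model, `filter_rows(values, ["A", "B"], present=False)` keeps the three non-witness rows, and a sequence with no match keeps all its rows when present=False and none when present=True.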


LengthCriterion#

Select sequences by their number of entity rows (sequence length).

from tanat.criterion import LengthCriterion

# More than 6 entities.
ids = pool.which(LengthCriterion(gt=6))

# Between 3 and 10 entities (inclusive on both ends).
ids = pool.which(LengthCriterion(ge=3, le=10))

# Single match.
ok = seq.match(LengthCriterion(ge=3, lt=20))

filter_entities() is not supported.

| Parameter | Type | Description |
| --- | --- | --- |
| gt | int | Strictly greater than. |
| ge | int | Greater than or equal to. |
| lt | int | Strictly less than. |
| le | int | Less than or equal to. |

At least one bound must be provided. Contradictory bounds (e.g. gt=5, lt=3) raise ValueError at construction time.
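One way to check bounds at construction time is to reduce them to a single inclusive integer range. The sketch below assumes that approach; `length_bounds` and `length_ok` are hypothetical helpers, and treating every empty integer range (for example gt=5, lt=6) as contradictory may be stricter than what the library does.

```python
def length_bounds(gt=None, ge=None, lt=None, le=None):
    """Reduce the four bounds to an inclusive (lo, hi) range; hi may be None."""
    if all(bound is None for bound in (gt, ge, lt, le)):
        raise ValueError("at least one bound must be provided")

    # Strict bounds on integers become inclusive ones: n > 5 is n >= 6.
    lows = [b for b in (ge, None if gt is None else gt + 1) if b is not None]
    highs = [b for b in (le, None if lt is None else lt - 1) if b is not None]

    lo = max(lows) if lows else 0          # a length is never negative
    hi = min(highs) if highs else None     # None means no upper limit
    if hi is not None and lo > hi:
        raise ValueError(f"contradictory bounds: no length in [{lo}, {hi}]")
    return lo, hi


def length_ok(n, **bounds):
    """Does a sequence of n rows satisfy the bounds?"""
    lo, hi = length_bounds(**bounds)
    return n >= lo and (hi is None or n <= hi)


more_than_six = length_bounds(gt=6)          # (7, None)
three_to_ten = length_bounds(ge=3, le=10)    # (3, 10)
```

Reducing to one range makes the contradiction check a single comparison: gt=5, lt=3 gives lo=6 and hi=2, which is rejected.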


RankCriterion#

Prune entity rows by their 0-based positional rank within each sequence.

from tanat.criterion import RankCriterion

# Keep the first 3 entities.
pool2 = pool.filter_entities(RankCriterion(first=3))

# Keep all except the last 2 entities.
pool2 = pool.filter_entities(RankCriterion(first=-2))

# Keep the last 2 entities.
pool2 = pool.filter_entities(RankCriterion(last=2))

# Python-slice: ranks 1, 2, 3 (0-based).
pool2 = pool.filter_entities(RankCriterion(start=1, end=4))

# Every other entity.
pool2 = pool.filter_entities(RankCriterion(step=2))

# First and last entity.
pool2 = pool.filter_entities(RankCriterion(ranks=[0, -1]))

# Relative to T0: entity at T0 and the one after it.
pool.set_t0(position=0, anchor="start")
pool2 = pool.filter_entities(RankCriterion(start=0, end=2, relative=True))

which() and match() are not supported.

| Parameter | Type | Description |
| --- | --- | --- |
| first | int | Keep first N rows (< 0 → all except last \|N\|). Cannot be 0. |
| last | int | Keep last N rows (< 0 → all except first \|N\|). Cannot be 0. |
| start | int | Start rank inclusive (Python-style negative supported). |
| end | int | End rank exclusive (Python-style negative supported). |
| step | int | Sub-sample every N-th entity (≥ 1). Compatible with start/end or standalone. |
| ranks | list[int] | Explicit 0-based positions (negative = from end). A single int is accepted. |
| relative | bool | False (default): absolute ranks from start of sequence. True: ranks relative to T0 (requires pool.set_t0() first). Not compatible with first/last. |

Exactly one parameter group must be active at a time.
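For absolute ranks, each parameter group behaves like a Python slice or index lookup on the rows of one sequence. The sketch below models that correspondence on a plain list. `select_ranks` is a hypothetical function; skipping out-of-range positions, keeping rows in sequence order, and raising ValueError for invalid combinations are assumptions, and the relative (T0-based) mode is not modelled.

```python
def select_ranks(rows, first=None, last=None, start=None, end=None,
                 step=None, ranks=None):
    """Apply one rank-selection group to the rows of a single sequence."""
    groups = [
        first is not None,
        last is not None,
        ranks is not None,
        any(v is not None for v in (start, end, step)),
    ]
    if sum(groups) != 1:
        raise ValueError("exactly one parameter group must be active")

    if first is not None:
        if first == 0:
            raise ValueError("first cannot be 0")
        return rows[:first]        # first=3 -> rows[:3], first=-2 -> rows[:-2]
    if last is not None:
        if last == 0:
            raise ValueError("last cannot be 0")
        return rows[-last:]        # last=2 -> rows[-2:], last=-2 -> rows[2:]
    if ranks is not None:
        if isinstance(ranks, int):
            ranks = [ranks]
        n = len(rows)
        # Normalize negative positions and ignore out-of-range ones.
        keep = {r % n for r in ranks if -n <= r < n}
        return [row for i, row in enumerate(rows) if i in keep]
    return rows[start:end:step]    # plain Python slice semantics


rows = ["a", "b", "c", "d", "e", "f"]
head = select_ranks(rows, first=3)            # ['a', 'b', 'c']
drop_tail = select_ranks(rows, first=-2)      # ['a', 'b', 'c', 'd']
tail = select_ranks(rows, last=2)             # ['e', 'f']
middle = select_ranks(rows, start=1, end=4)   # ['b', 'c', 'd']
ends = select_ranks(rows, ranks=[0, -1])      # ['a', 'f']
```

The slice view also suggests why 0 is rejected for first and last: `rows[:0]` is empty while `rows[-0:]` is the whole list, so neither has a useful "keep N rows" meaning.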


Chaining criteria#

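Criteria can be chained by passing the result of one operation as the target of the next. Each subset() or filter_entities() call returns a new pool view; the original is never modified.

import datetime as dt
import polars as pl
from tanat.criterion import RankCriterion, StaticCriterion, TimeCriterion

# 1. Select IDs matching a static condition.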
ids = pool.which(StaticCriterion(query=pl.col("age") > 50))

# 2. Restrict the pool to those IDs.
pool2 = pool.subset(ids)

# 3. Prune entity rows by time window.
pool3 = pool2.filter_entities(
    TimeCriterion(start_ge=dt.datetime(2020, 1, 1), end_le=dt.datetime(2021, 1, 1))
)

# 4. Keep only the first 2 entities per sequence.
pool4 = pool3.filter_entities(RankCriterion(first=2))

Alternatively, use which() results to drive multi-step pipelines:

from tanat.criterion import LengthCriterion, PatternCriterion

ids_long = pool.which(LengthCriterion(gt=5))
ids_error = pool.which(PatternCriterion(feature="status", pattern="error"))
ids_target = ids_long & ids_error   # set intersection

pool_target = pool.subset(ids_target)
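Because which() is described as returning a set of IDs, the usual Python set operators should cover the other combinations too. The sketch below uses hand-written ID sets in place of real which() results, so the values are purely illustrative.

```python
# Stand-ins for the results of two which() calls (illustrative IDs).
ids_long = {1, 2, 3, 5}
ids_error = {2, 3, 7}

both = ids_long & ids_error            # long AND contains an error
either = ids_long | ids_error          # long OR contains an error
long_no_error = ids_long - ids_error   # long but without any error
exactly_one = ids_long ^ ids_error     # satisfies exactly one condition
```

Any of these sets can then be passed to pool.subset() in the same way as ids_target above.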

See Also#