TimeCriterion#
Filter entities or select sequences based on temporal bounds applied to the start and/or end time columns.
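| Parameter | Description |
|---|---|
| start_ge / start_le | Inclusive bounds on the start column. |
| end_ge / end_le | Inclusive bounds on the end column (interval/state pools only). |
| duration_within | If True, an entity must lie fully inside the window (containment); if False (default), it only needs to overlap the window (interval/state pools only). |
| all_entities | For sequence-level selection: if True, every entity in the sequence must match; if False (default), one matching entity is enough. |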
All bounds are inclusive. At least one bound must be supplied.
See Criteria for the full reference.
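The inclusive-bound rule can be illustrated without the library. The sketch below is a plain-Python illustration only: within_bounds is a hypothetical helper written for this page, not part of tanat, and it assumes that supplying no bound at all is rejected with a ValueError.

```python
import datetime as dt
from typing import Optional


def within_bounds(
    value: dt.datetime,
    ge: Optional[dt.datetime] = None,
    le: Optional[dt.datetime] = None,
) -> bool:
    """Hypothetical helper (not tanat's implementation): inclusive bound check."""
    if ge is None and le is None:
        # Mirrors the rule that at least one bound must be supplied.
        raise ValueError("at least one bound must be supplied")
    if ge is not None and value < ge:
        return False
    if le is not None and value > le:
        return False
    return True


cutoff = dt.datetime(2000, 7, 1)
# A value exactly equal to the bound passes, because bounds are inclusive.
on_cutoff = within_bounds(cutoff, ge=cutoff)
# One day earlier fails the lower bound.
before_cutoff = within_bounds(dt.datetime(2000, 6, 30), ge=cutoff)
```

A value equal to the bound is accepted, which is the practical meaning of "inclusive" here.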
Imports#
import datetime as dt
from tanat import build_intervals, build_events
from tanat.criterion import TimeCriterion
from tanat.dataset import simulate_intervals, simulate_events
Simulate data#
temporal = simulate_intervals(n_ids=50, features=["value", "status"], seed=42)
pool = build_intervals(
    temporal_data=temporal,
    id_column="id",
    start_column="start",
    end_column="end",
)
┌─ Interval SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (50 sequences · 343 entities · 0.00s)
print(pool)
┌────────────────────────────────────────────────┐
│ IntervalSequencePool Summary │
└────────────────────────────────────────────────┘
Overview
─────────────────────────
Sequences 50
Store /home/runner/.tanat/_quick_interval_9ef04e68
id_column id
Time Index
─────────────────────────
Type Datetime(time_unit='us', time_zone=None) [2000-01-12 06:14:52.240595 → 2025-01-20 05:35:23.188780]
Columns ['start', 'end']
t0 position=0, anchor=start
Entity Features (2)
─────────────────────────
• status String [len 1 → 1]
• value Numerical [1 → 100]
Inspect the time range covered by the data.
df = pool.temporal_data()
print(
    f"start range: {df['start'].min()} → {df['start'].max()}\n"
    f"end range  : {df['end'].min()} → {df['end'].max()}"
)
start range: 2000-01-12 06:14:52.240595 → 2024-12-23 19:47:08.046880
end range : 2000-01-25 18:27:05.222274 → 2025-01-20 05:35:23.188780
which(): sequence-level selection#
# Any entity starts on or after a given date (default: all_entities=False).
cutoff = dt.datetime(2000, 7, 1)
ids_after = pool.which(TimeCriterion(start_ge=cutoff))
[which] TimeCriterion → 50 / 50 IDs (100.0%)
# All entities must start on or after the cutoff (stricter: all_entities=True).
ids_all_after = pool.which(TimeCriterion(start_ge=cutoff, all_entities=True))
[which] TimeCriterion → 42 / 50 IDs (84.0%)
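The difference between the two selections above comes down to "any entity matches" versus "every entity matches". The following sketch reproduces that logic on hypothetical toy data with plain Python; it does not call tanat, and the dictionary layout is an assumption made for illustration.

```python
import datetime as dt

# Hypothetical toy data: entity start times grouped by sequence ID.
starts_by_id = {
    "a": [dt.datetime(2000, 3, 1), dt.datetime(2001, 1, 1)],
    "b": [dt.datetime(2000, 8, 1), dt.datetime(2002, 5, 1)],
    "c": [dt.datetime(2000, 1, 15)],
}
cutoff = dt.datetime(2000, 7, 1)

# Analogue of all_entities=False: one matching entity is enough.
ids_any = [
    seq_id
    for seq_id, starts in starts_by_id.items()
    if any(start >= cutoff for start in starts)
]

# Analogue of all_entities=True: every entity must match.
ids_all = [
    seq_id
    for seq_id, starts in starts_by_id.items()
    if all(start >= cutoff for start in starts)
]
```

Every sequence selected by the strict mode is also selected by the default mode, which is why the strict count (42) cannot exceed the default count (50) in the output above.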
# Two-sided window: sequences with at least one entity that starts in [t0, t1].
t0 = dt.datetime(2000, 3, 1)
t1 = dt.datetime(2000, 9, 1)
ids_window = pool.which(TimeCriterion(start_ge=t0, start_le=t1))
[which] TimeCriterion → 8 / 50 IDs (16.0%)
Overlap vs containment#
For duration-based sequences (Interval/State), two modes control how entities relate to the query window:
- Overlap (duration_within=False, default): the entity touches the window, i.e. start ≤ window_end and end ≥ window_start.
- Containment (duration_within=True): the entity lies fully inside the window, i.e. start ≥ window_start and end ≤ window_end.
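The two conditions can be written as small predicates. This is a minimal sketch of the inequalities stated above, not tanat's implementation; the function names overlaps and contained and the sample intervals are hypothetical.

```python
import datetime as dt


def overlaps(start, end, window_start, window_end):
    """Overlap mode: the interval touches the window (inclusive bounds)."""
    return start <= window_end and end >= window_start


def contained(start, end, window_start, window_end):
    """Containment mode: the interval lies fully inside the window."""
    return start >= window_start and end <= window_end


window = (dt.datetime(2007, 1, 1), dt.datetime(2008, 1, 1))

# Hypothetical intervals, chosen to exercise each case.
straddling = (dt.datetime(2006, 12, 1), dt.datetime(2007, 2, 1))  # crosses window start
inside = (dt.datetime(2007, 3, 1), dt.datetime(2007, 4, 1))       # fully inside
outside = (dt.datetime(2009, 1, 1), dt.datetime(2009, 2, 1))      # after the window

straddling_result = (overlaps(*straddling, *window), contained(*straddling, *window))
inside_result = (overlaps(*inside, *window), contained(*inside, *window))
outside_result = (overlaps(*outside, *window), contained(*outside, *window))
```

Any interval that satisfies containment also satisfies overlap, so containment can only return the same entities or fewer.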
window_start = dt.datetime(2007, 1, 1)
window_end = dt.datetime(2008, 1, 1)
filtered_overlap = pool.filter_entities(
    TimeCriterion(start_ge=window_start, end_le=window_end, duration_within=False)
)
[filter_entities] TimeCriterion → 17 / 343 entities (5.0%) · 34 IDs affected
filtered_within = pool.filter_entities(
    TimeCriterion(start_ge=window_start, end_le=window_end, duration_within=True)
)
[filter_entities] TimeCriterion → 13 / 343 entities (3.8%) · 37 IDs affected
ids_overlap = pool.which(TimeCriterion(start_ge=window_start, end_le=window_end))
[which] TimeCriterion → 16 / 50 IDs (32.0%)
ids_within = pool.which(
    TimeCriterion(start_ge=window_start, end_le=window_end, duration_within=True)
)
[which] TimeCriterion → 13 / 50 IDs (26.0%)
Event pools (single time column)#
For event sequences only start_ge / start_le apply; end_ge /
end_le and duration_within are unavailable.
raw_events = simulate_events(n_ids=50, features=["value", "status"], seed=1)
event_pool = build_events(
    temporal_data=raw_events,
    id_column="id",
    time_column="time",
)
# Inspect time range.
ev_df = event_pool.temporal_data()
print(f"event time range: {ev_df['time'].min()} → {ev_df['time'].max()}")
┌─ Event SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (50 sequences · 323 entities · 0.00s)
event time range: 2000-01-19 18:47:39.050831 → 2024-12-18 20:09:17.806869
ev_cutoff = dt.datetime(2000, 6, 1)
ids_ev = event_pool.which(TimeCriterion(start_ge=ev_cutoff))
[which] TimeCriterion → 50 / 50 IDs (100.0%)
match(): single-sequence evaluation#
criterion = TimeCriterion(start_ge=cutoff)
# Iterate to find the first sequence that matches.
first_match = next((s for s in pool if s.match(criterion)), None)
if first_match:
    print(f"First matching sequence: id={first_match.id_value}")
First matching sequence: id=1