LengthCriterion#

Select sequences by their number of entity rows (sequence length).

Parameter	Description
`gt` / `ge`	Lower bound: length strictly greater than / greater than or equal to the given value.
`lt` / `le`	Upper bound: length strictly less than / less than or equal to the given value.

At least one bound must be supplied. Contradictory bounds (e.g. gt=5, lt=3) are rejected at construction time.
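The two rules above can be illustrated with a small stand-alone sketch. `LengthBounds` is a hypothetical stand-in written for this illustration only; it is not tanat's `LengthCriterion`, and the library's actual validation rule and error type may differ. Here "contradictory" is assumed to mean that no integer length can satisfy every bound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LengthBounds:
    """Hypothetical stand-in for illustration; not tanat's LengthCriterion."""

    gt: Optional[int] = None
    ge: Optional[int] = None
    lt: Optional[int] = None
    le: Optional[int] = None

    def __post_init__(self) -> None:
        # Rule 1: at least one bound must be supplied.
        if all(b is None for b in (self.gt, self.ge, self.lt, self.le)):
            raise ValueError("at least one of gt, ge, lt, le must be supplied")
        # Rule 2 (assumed meaning): reject bounds that no integer can satisfy.
        low, high = self.smallest_allowed(), self.largest_allowed()
        if low is not None and high is not None and low > high:
            raise ValueError(f"contradictory bounds: no length in [{low}, {high}]")

    def smallest_allowed(self) -> Optional[int]:
        # gt=n allows n + 1 and above; ge=n allows n and above.
        candidates = []
        if self.gt is not None:
            candidates.append(self.gt + 1)
        if self.ge is not None:
            candidates.append(self.ge)
        return max(candidates) if candidates else None

    def largest_allowed(self) -> Optional[int]:
        # lt=n allows n - 1 and below; le=n allows n and below.
        candidates = []
        if self.lt is not None:
            candidates.append(self.lt - 1)
        if self.le is not None:
            candidates.append(self.le)
        return min(candidates) if candidates else None

    def matches(self, length: int) -> bool:
        low, high = self.smallest_allowed(), self.largest_allowed()
        return (low is None or length >= low) and (high is None or length <= high)


def is_rejected(**bounds: int) -> bool:
    """Return True if constructing LengthBounds with these bounds fails."""
    try:
        LengthBounds(**bounds)
    except ValueError:
        return True
    return False
```

With this sketch, `is_rejected(gt=5, lt=3)` and `is_rejected()` both return `True`, while `LengthBounds(gt=3, le=6).matches(4)` returns `True`.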

LengthCriterion supports SEQUENCE level only (which(), match()); filter_entities() is not available.

See Criteria for the full reference.

Imports#

from tanat import build_intervals
from tanat.criterion import LengthCriterion
from tanat.dataset import simulate_intervals

Simulate data#

temporal = simulate_intervals(n_ids=50, features=["value", "status"], seed=42)

pool = build_intervals(
    temporal_data=temporal,
    id_column="id",
    start_column="start",
    end_column="end",
)

┌─ Interval SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity & time index features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (50 sequences · 343 entities · 0.00s)

print(pool)

┌────────────────────────────────────────────────┐
│          IntervalSequencePool Summary          │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequences          50
  Store              /home/runner/.tanat/_quick_interval_2793f7bf
  id_column          id

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-01-12 06:14:52.240595 → 2025-01-20 05:35:23.188780]
  Columns            ['start', 'end']
  t0                 position=0, anchor=start

Entity Features (2)
─────────────────────────
  • status              String [len 1 → 1]
  • value               Numerical [1 → 100]

Inspect the length distribution and other summary statistics.

pool.describe(by_id=False)

	length	n_unique_entities	temporal_span	mean_duration	median_duration	duration_std
count	50.0	50.0	50	50	50	50
mean	6.86	6.76	6480 days, 2:48:22.247079	15 days, 3:30:25.259960	15 days, 5:12:18.547281	7 days, 21:43:35.511771
std	2.285804	2.254791	1941 days, 6:52:13.531688	3 days, 5:28:15.660040	4 days, 17:46:58.276359	2 days, 14:25:50.553305
min	3.0	3.0	1706 days, 19:27:07.917732	5 days, 8:01:28.714157	5 days, 0:11:27.379177	1 day, 4:52:50.506351
25%	5.0	5.0	5254 days 23:37:06.250402	13 days 00:36:46.440355	11 days 18:29:55.648568	6 days 18:26:20.009336
50%	7.0	7.0	7335 days 12:11:54.677473	16 days 02:40:15.396103	15 days 21:25:56.404198	8 days 03:08:02.672516
75%	9.0	9.0	7857 days 05:37:09.423327	17 days 03:52:02.575867	18 days 10:58:30.878432	9 days 15:57:33.150707
max	10.0	10.0	9050 days, 11:50:31.178892	20 days, 0:24:03.241089	24 days, 11:07:49.202588	14 days, 11:20:26.388911
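One way to turn such a summary into candidate bounds is to read cutoffs off the quartiles. The sketch below uses only the standard library on a hypothetical list of lengths (not the simulated pool above); `method="inclusive"` is chosen because it interpolates linearly between the observed minimum and maximum.

```python
import statistics

# Hypothetical sequence lengths, for illustration only.
lengths = [3, 4, 5, 5, 6, 7, 7, 8, 9, 9, 10]

# Quartiles with linear interpolation between the observed min and max.
q1, q2, q3 = statistics.quantiles(lengths, n=4, method="inclusive")

# Candidate cutoffs: "short" at or below the first quartile,
# "long" strictly above the median.
n_short = sum(1 for n in lengths if n <= q1)
n_long = sum(1 for n in lengths if n > q2)
print(f"q1={q1}, q2={q2}, q3={q3}, short={n_short}, long={n_long}")
```

Cutoffs chosen this way could then be passed as `le` and `gt` bounds, as in the selections that follow.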

`which()`: single-bound selection#

# Long sequences: more than 6 entities.
ids_long = pool.which(LengthCriterion(gt=6))

[which]           LengthCriterion → 29 / 50 IDs (58.0%)

# Short sequences: at most 3 entities.
ids_short = pool.which(LengthCriterion(le=3))
print(f"Length ≤ 3 : {len(ids_short)} / {len(pool)} IDs")

[which]           LengthCriterion → 6 / 50 IDs (12.0%)
Length ≤ 3 : 6 / 50 IDs

Range selection#

Combine bounds to select sequences whose length falls in a range.

# Medium sequences: length in (3, 6], i.e. 4 to 6 entities.
ids_medium = pool.which(LengthCriterion(gt=3, le=6))

[which]           LengthCriterion → 15 / 50 IDs (30.0%)
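The three selections shown so far (`le=3`, `gt=3, le=6`, and `gt=6`) use bounds that meet without overlapping, so together they should split the pool into disjoint groups. This is consistent with the reported counts: 6 + 15 + 29 = 50. A minimal sketch of that reasoning in plain Python, using a hypothetical mapping of IDs to lengths rather than the tanat pool:

```python
# Hypothetical sequence lengths keyed by ID; not taken from the simulated pool.
lengths = {"a": 3, "b": 4, "c": 6, "d": 7, "e": 10}

short = {i for i, n in lengths.items() if n <= 3}        # mirrors le=3
medium = {i for i, n in lengths.items() if 3 < n <= 6}   # mirrors gt=3, le=6
long_ = {i for i, n in lengths.items() if n > 6}         # mirrors gt=6

# Every ID lands in exactly one group.
covers_all = (short | medium | long_) == set(lengths)
disjoint = not (short & medium or medium & long_ or short & long_)

# Counts reported in this example for the simulated pool of 50 sequences.
reported_total = 6 + 15 + 29
print(covers_all, disjoint, reported_total)
```

Shifting one bound, for example using `ge=3` instead of `gt=3` for the middle group, would make length-3 sequences fall into two groups at once.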

Subset the pool#

Use subset() to obtain a restricted pool from the selected IDs.

pool_long = pool.subset(ids_long)

print(pool_long)

┌────────────────────────────────────────────────┐
│          IntervalSequencePool Summary          │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequences          29
  Store              /home/runner/.tanat/_quick_interval_2793f7bf
  id_column          id

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-01-12 06:14:52.240595 → 2025-01-05 21:55:52.963626]
  Columns            ['start', 'end']
  t0                 position=0, anchor=start

Entity Features (2)
─────────────────────────
  • status              String [len 1 → 1]
  • value               Numerical [1 → 100]

# Inspect the length distribution in the subset.
pool_long.describe(by_id=False)

	length	n_unique_entities	temporal_span	mean_duration	median_duration	duration_std
count	29.0	29.0	29	29	29	29
mean	8.551724	8.413793	7328 days, 8:26:00.520809	15 days, 20:58:02.933330	15 days, 23:06:09.672359	8 days, 1:20:48.519965
std	0.985111	1.052794	1134 days, 21:23:06.091385	2 days, 9:56:11.098362	3 days, 22:47:57.361909	1 day, 12:37:21.243486
min	7.0	6.0	4366 days, 10:32:34.098950	9 days, 8:30:53.548390	5 days, 0:11:27.379177	4 days, 16:56:01.574529
25%	8.0	8.0	6845 days 05:07:11.823338	14 days 12:02:41.460306	14 days 17:06:18.669219	7 days 02:18:28.157876
50%	9.0	9.0	7689 days 12:33:58.121600	16 days 03:03:48.430141	15 days 21:59:53.357110	8 days 00:15:55.894990
75%	9.0	9.0	8004 days 19:04:00.961682	17 days 09:54:28.346099	18 days 12:41:38.304297	9 days 06:20:38.367288
max	10.0	10.0	9050 days, 11:50:31.178892	19 days, 12:30:30.076833	24 days, 11:07:49.202588	11 days, 1:30:43.139872

`match()`: single-sequence evaluation#

seq = pool[pool.unique_ids[0]]
seq_len = len(seq)
print(
    f"Sequence {seq.id_value}: length={seq_len}  "
    f"gt=6? {seq.match(LengthCriterion(gt=6))}  "
    f"le=3? {seq.match(LengthCriterion(le=3))}"
)

Sequence 1: length=3  gt=6? False  le=3? True
