RankCriterion#

Prune entity rows by their 0-based positional rank within each sequence. Ranks can be absolute (from the first entity) or relative to T0 (the nearest entity to the reference date set via pool.set_t0()).

Exactly one parameter group must be specified:

Group               Description
first=N             Keep first N rows (N < 0 → all except last |N|).
last=N              Keep last N rows (N < 0 → all except first |N|).
start / end / step  Python-slice semantics (negative indices supported).
ranks=[…]           Explicit list of 0-based positions (negative = from end).

Pass relative=True to interpret ranks relative to T0 rather than the start of the sequence.
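The "exactly one parameter group" rule can be pictured as a small check. The sketch below is plain Python and does not call tanat. `pick_group` is a hypothetical helper, and raising ValueError is only an assumption about how a violation might be reported.

```python
def pick_group(first=None, last=None, start=None, end=None, step=None, ranks=None):
    """Return the name of the single parameter group that was set (hypothetical helper)."""
    used = {
        "first": first is not None,
        "last": last is not None,
        # start / end / step count together as one group.
        "slice": any(v is not None for v in (start, end, step)),
        "ranks": ranks is not None,
    }
    chosen = [name for name, is_set in used.items() if is_set]
    if len(chosen) != 1:
        # Assumption: zero groups or several groups are both rejected.
        raise ValueError(f"expected exactly one parameter group, got {chosen}")
    return chosen[0]


group_a = pick_group(first=2)
group_b = pick_group(start=1, end=4)
```

Here `start=1, end=4` counts as a single group, while combining, say, `first` with `ranks` would be rejected.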

RankCriterion supports ENTITY level only (filter_entities()); which() and match() are not available.

See Criteria for the full reference.

Imports#

from tanat import build_intervals
from tanat.criterion import RankCriterion
from tanat.dataset import simulate_intervals, simulate_static

Simulate data#

temporal = simulate_intervals(n_ids=50, features=["value", "status"], seed=42)
static = simulate_static(n_ids=50, features=["age"], seed=0)

pool = build_intervals(
    temporal_data=temporal,
    id_column="id",
    start_column="start",
    end_column="end",
    static_data=static,
)
┌─ Interval SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity, time index & static features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (50 sequences · 343 entities · 0.01s)
print(pool)
┌────────────────────────────────────────────────┐
│          IntervalSequencePool Summary          │
└────────────────────────────────────────────────┘

Overview
─────────────────────────
  Sequences          50
  Store              /home/runner/.tanat/_quick_interval_8578a7a3
  id_column          id

Time Index
─────────────────────────
  Type               Datetime(time_unit='us', time_zone=None) [2000-01-12 06:14:52.240595 → 2025-01-20 05:35:23.188780]
  Columns            ['start', 'end']
  t0                 position=0, anchor=start

Entity Features (2)
─────────────────────────
  • status              String [len 1 → 1]
  • value               Numerical [1 → 100]

Static Features (1)
─────────────────────────
  • age                 Numerical [1 → 98]
# Inspect length distribution or other summary statistics.
pool.describe(by_id=False)
length n_unique_entities temporal_span mean_duration median_duration duration_std
count 50.0 50.0 50 50 50 50
mean 6.86 6.76 6480 days, 2:48:22.247079 15 days, 3:30:25.259960 15 days, 5:12:18.547281 7 days, 21:43:35.511771
std 2.285804 2.254791 1941 days, 6:52:13.531688 3 days, 5:28:15.660040 4 days, 17:46:58.276359 2 days, 14:25:50.553305
min 3.0 3.0 1706 days, 19:27:07.917732 5 days, 8:01:28.714157 5 days, 0:11:27.379177 1 day, 4:52:50.506351
25% 5.0 5.0 5254 days 23:37:06.250402 13 days 00:36:46.440355 11 days 18:29:55.648568 6 days 18:26:20.009336
50% 7.0 7.0 7335 days 12:11:54.677473 16 days 02:40:15.396103 15 days 21:25:56.404198 8 days 03:08:02.672516
75% 9.0 9.0 7857 days 05:37:09.423327 17 days 03:52:02.575867 18 days 10:58:30.878432 9 days 15:57:33.150707
max 10.0 10.0 9050 days, 11:50:31.178892 20 days, 0:24:03.241089 24 days, 11:07:49.202588 14 days, 11:20:26.388911


first and last#

Positive N: keep the first (or last) N entities per sequence. Negative N: drop the last (or first) |N| entities per sequence.
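As a plain-Python analogy (not tanat's implementation), the sign convention maps onto ordinary list slicing applied to each sequence. `keep_first` and `keep_last` are hypothetical helpers, and treating N = 0 as "keep nothing" is an assumption.

```python
def keep_first(rows, n):
    """first=N: the first N rows; N < 0 keeps all except the last |N|."""
    return rows[:n]


def keep_last(rows, n):
    """last=N: the last N rows; N < 0 keeps all except the first |N|."""
    if n == 0:
        return []  # assumption: guard, because rows[-0:] would return every row
    return rows[-n:]


seq = ["a", "b", "c", "d", "e"]
first_two = keep_first(seq, 2)        # positive N: head of the sequence
last_three = keep_last(seq, 3)        # positive N: tail of the sequence
drop_last = keep_first(seq, -1)       # negative N: everything but the final row
drop_first_two = keep_last(seq, -2)   # negative N: everything but the first two rows
```

Because every simulated sequence has at least 3 entities (see the length summary above), first=2 should keep 2 × 50 = 100 entities and first=-1 should keep 343 − 50 = 293.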

Keep the first 2 entities.

pool_first2 = pool.filter_entities(RankCriterion(first=2))
[filter_entities] RankCriterion → 100 / 343 entities (29.2%) · 0 IDs affected
# Inspect length of filtered sequences.
pool_first2.describe(by_id=False)
length n_unique_entities temporal_span mean_duration median_duration duration_std
count 50.0 50.0 50 50 50 50
mean 2.0 2.0 1102 days, 17:12:06.441231 15 days, 6:17:02.075181 15 days, 6:17:02.075181 6 days, 11:46:38.236155
std 0.0 0.0 1295 days, 21:59:44.794284 5 days, 17:25:40.840769 5 days, 17:25:40.840769 4 days, 11:30:05.029431
min 2.0 2.0 27 days, 13:00:38.184406 3 days, 6:58:38.300422 3 days, 6:58:38.300422 8:19:40.982244
25% 2.0 2.0 198 days 12:28:46.096482 11 days 06:23:14.007313 11 days 06:23:14.007313 2 days 12:54:47.526374
50% 2.0 2.0 612 days 01:31:56.227738 15 days 17:37:18.523221 15 days 17:37:18.523221 5 days 18:24:00.869290
75% 2.0 2.0 1505 days 07:18:38.524703 19 days 06:09:47.451627 19 days 06:09:47.451627 9 days 15:54:10.479344
max 2.0 2.0 6096 days, 18:48:20.592952 29 days, 1:30:26.294125 29 days, 1:30:26.294125 16 days, 0:10:41.477308


Keep the last 3 entities.

pool_last3 = pool.filter_entities(RankCriterion(last=3))
[filter_entities] RankCriterion → 150 / 343 entities (43.7%) · 0 IDs affected
# Inspect length of filtered sequences.
pool_last3.describe(by_id=False)
length n_unique_entities temporal_span mean_duration median_duration duration_std
count 50.0 50.0 50 50 50 50
mean 3.0 3.0 2409 days, 22:36:59.299727 14 days, 10:18:46.546880 14 days, 4:22:39.013089 7 days, 23:27:06.548171
std 0.0 0.0 1711 days, 19:00:00.715298 4 days, 9:35:30.105332 6 days, 5:36:19.771278 3 days, 21:03:59.597423
min 3.0 3.0 114 days, 8:25:23.982373 5 days, 8:01:28.714157 3 days, 23:41:23.525356 1 day, 8:36:49.778852
25% 3.0 3.0 1178 days 11:02:10.068570 11 days 19:34:17.671606 9 days 07:11:18.119190 5 days 03:15:13.768991
50% 3.0 3.0 1906 days 06:23:15.141141 13 days 17:41:16.683981 14 days 09:28:58.416352 8 days 10:13:13.658113
75% 3.0 3.0 3234 days 03:57:33.858806 17 days 14:12:27.847166 18 days 22:14:07.122773 10 days 22:49:35.909691
max 3.0 3.0 7693 days, 14:18:02.546716 25 days, 4:12:04.688773 25 days, 19:25:19.607111 14 days, 21:22:24.397270


Drop the last entity: first=-1 keeps all except the final row.

pool_drop_last = pool.filter_entities(RankCriterion(first=-1))
[filter_entities] RankCriterion → 293 / 343 entities (85.4%) · 0 IDs affected
# Inspect length of filtered sequences.
pool_drop_last.describe(by_id=False)
length n_unique_entities temporal_span mean_duration median_duration duration_std
count 50.0 50.0 50 50 50 50
mean 5.86 5.78 5409 days, 17:00:39.123569 15 days, 11:46:19.223260 16 days, 1:57:10.295216 7 days, 20:20:49.523824
std 2.285804 2.261388 2132 days, 19:09:44.231253 3 days, 20:23:33.200145 5 days, 5:21:02.459091 2 days, 21:11:24.320615
min 2.0 2.0 84 days, 6:22:44.238474 4 days, 20:10:02.642094 4 days, 14:35:40.286550 10:38:10.195443
25% 4.0 4.0 4265 days 05:44:33.344111 13 days 18:08:33.661429 13 days 07:10:07.615751 6 days 18:09:43.067696
50% 6.0 6.0 5892 days 23:37:25.765871 16 days 09:20:36.611135 16 days 20:36:33.650821 8 days 04:02:17.117695
75% 8.0 8.0 7287 days 00:11:11.270728 17 days 23:00:06.184673 19 days 13:43:56.072013 9 days 14:45:16.126218
max 9.0 9.0 9022 days, 18:03:04.805359 21 days, 23:52:13.248387 28 days, 15:17:14.627064 15 days, 0:08:02.244075


Slice: start / end / step#

Python-slice semantics. Negative indices count from the end of each sequence.
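Since the semantics are those of a Python slice, a per-sequence list is enough to preview what a given start / end / step combination selects. This is a plain-Python illustration that does not use tanat, and `slice_rows` is a hypothetical helper.

```python
def slice_rows(rows, start=None, end=None, step=None):
    """Apply Python-slice semantics to one sequence's rows (end is exclusive)."""
    return rows[slice(start, end, step)]


long_seq = list(range(7))    # ranks 0..6
short_seq = list(range(3))   # ranks 0..2

mid_long = slice_rows(long_seq, start=1, end=4)    # ranks 1, 2, 3
mid_short = slice_rows(short_seq, start=1, end=4)  # only ranks 1, 2 exist
every_other = slice_rows(long_seq, step=2)         # ranks 0, 2, 4, 6
last_two = slice_rows(long_seq, start=-2)          # negative start counts from the end
```

A slice that runs past the end of a short sequence simply returns fewer rows, which is consistent with the minimum length of 2 reported for start=1, end=4 below.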

# Entities at absolute ranks 1, 2, 3 (0-based → second to fourth row).
pool_slice = pool.filter_entities(RankCriterion(start=1, end=4))
[filter_entities] RankCriterion → 144 / 343 entities (42.0%) · 0 IDs affected
# Inspect length of filtered sequences.
pool_slice.describe(by_id=False)
length n_unique_entities temporal_span mean_duration median_duration duration_std
count 50.0 50.0 50 50 50 50
mean 2.88 2.84 2527 days, 1:50:01.528276 15 days, 23:20:10.328789 16 days, 1:16:14.553214 7 days, 20:22:40.269830
std 0.328261 0.370328 1532 days, 23:47:23.996785 4 days, 0:18:49.335931 5 days, 13:30:25.115618 3 days, 14:41:44.646000
min 2.0 2.0 330 days, 7:26:00.883757 4 days, 23:47:18.505237 4 days, 23:47:18.505237 11:30:32.534109
25% 3.0 3.0 1461 days 05:33:23.019443 12 days 15:44:47.537266 12 days 09:35:18.809547 5 days 07:36:32.281857
50% 3.0 3.0 2296 days 00:02:41.542028 16 days 14:53:08.575124 16 days 11:51:07.018873 8 days 00:00:51.582739
75% 3.0 3.0 3165 days 10:05:16.732374 18 days 16:33:56.745437 20 days 20:52:01.806712 10 days 07:55:43.200818
max 3.0 3.0 7498 days, 12:04:19.250631 23 days, 9:55:02.392406 24 days, 17:32:32.195825 16 days, 4:36:46.197545


# Every other entity: ranks 0, 2, 4, … (even-ranked rows).
pool_step = pool.filter_entities(RankCriterion(step=2))
[filter_entities] RankCriterion → 184 / 343 entities (53.6%) · 0 IDs affected
# Inspect length of filtered sequences.
pool_step.describe(by_id=False)
length n_unique_entities temporal_span mean_duration median_duration duration_std
count 50.0 50.0 50 50 50 50
mean 3.68 3.66 6039 days, 6:55:55.528152 14 days, 13:55:34.788337 14 days, 18:34:19.437633 6 days, 17:44:46.282293
std 1.150687 1.135872 2032 days, 18:53:36.531325 5 days, 2:34:14.119958 6 days, 0:14:32.681523 3 days, 19:16:13.214073
min 2.0 2.0 1252 days, 16:33:53.835901 4 days, 13:53:32.425783 3 days, 2:09:11.398906 5:07:15.494454
25% 3.0 3.0 5091 days 11:49:33.900772 11 days 22:14:27.194932 10 days 16:29:13.043258 3 days 23:37:27.693552
50% 4.0 4.0 6914 days 00:09:57.672607 14 days 09:13:14.053611 15 days 15:01:23.480211 6 days 12:43:00.014330
75% 5.0 5.0 7533 days 14:55:40.472623 17 days 16:05:05.697831 18 days 21:35:19.501998 8 days 23:18:23.792210
max 5.0 5.0 9050 days, 11:50:31.178892 25 days, 17:01:49.395339 27 days, 18:57:11.476803 18 days, 16:07:16.124050


Explicit ranks#

Pass a list of 0-based positions. Negative values index from the end.
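A minimal plain-Python sketch of resolving such a list against one sequence follows. It is not tanat's code: `pick_ranks` is a hypothetical helper, and three behaviours are assumptions made for the illustration. Out-of-range positions are skipped, duplicate positions collapse to one row, and rows come back in sequence order.

```python
def pick_ranks(rows, ranks):
    """Select rows at the given 0-based positions; negative values count from the end."""
    n = len(rows)
    wanted = set()
    for r in ranks:
        idx = r + n if r < 0 else r   # normalise negative positions
        if 0 <= idx < n:              # assumption: ignore positions outside the sequence
            wanted.add(idx)
    return [rows[i] for i in sorted(wanted)]


ends = pick_ranks(["a", "b", "c", "d", "e"], [0, -1])   # first and last row
single = pick_ranks(["x"], [0, -1])                     # both ranks hit the same row
partial = pick_ranks(["a", "b", "c"], [5, 1])           # rank 5 does not exist
```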

# First and last entity of each sequence.
pool_ends = pool.filter_entities(RankCriterion(ranks=[0, -1]))
[filter_entities] RankCriterion → 100 / 343 entities (29.2%) · 0 IDs affected
# Inspect length of filtered sequences.
pool_ends.describe(by_id=False)
length n_unique_entities temporal_span mean_duration median_duration duration_std
count 50.0 50.0 50 50 50 50
mean 2.0 2.0 6479 days, 13:22:26.821449 14 days, 4:00:00.062427 14 days, 4:00:00.062427 7 days, 7:34:39.259236
std 0.0 0.0 1941 days, 7:56:06.169969 5 days, 4:38:10.410121 5 days, 4:38:10.410121 5 days, 2:59:04.771182
min 2.0 2.0 1706 days, 19:27:07.917732 2 days, 19:45:59.936197 2 days, 19:45:59.936197 0:51:14.229357
25% 2.0 2.0 5251 days 16:40:04.503601 10 days 21:42:49.048818 10 days 21:42:49.048818 3 days 20:47:16.152678
50% 2.0 2.0 7335 days 12:11:54.677473 13 days 21:07:13.730603 13 days 21:07:13.730603 6 days 07:37:39.644856
75% 2.0 2.0 7857 days 05:37:09.423327 17 days 04:53:44.247330 17 days 04:53:44.247330 10 days 16:25:03.828029
max 2.0 2.0 9050 days, 11:50:31.178892 26 days, 20:15:14.330130 26 days, 20:15:14.330130 18 days, 1:53:42.911846


Relative mode: ranks relative to T0#

Set a reference date with pool.set_t0() first. With relative=True, ranks are then counted from the entity nearest to T0: rank 0 is that entity, rank -1 the entity just before it, and rank +1 the entity just after it.
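In plain Python, relative ranks can be read as offsets added to the position of the T0 entity. The sketch below is not tanat's implementation. `relative_slice` and `t0_index` are hypothetical names, and dropping offsets that fall outside the sequence (rather than wrapping around) is an assumption.

```python
def relative_slice(rows, t0_index, start, end):
    """Rows whose rank relative to T0 lies in [start, end); end is exclusive."""
    picked = []
    for offset in range(start, end):
        idx = t0_index + offset
        if 0 <= idx < len(rows):   # assumption: offsets outside the sequence are dropped
            picked.append(rows[idx])
    return picked


seq = ["e0", "e1", "e2", "e3", "e4"]

t0_last = len(seq) - 1                                  # T0 on the last entity
window_at_end = relative_slice(seq, t0_last, -2, 1)     # T-2, T-1, T0
window_mid = relative_slice(seq, 2, -1, 2)              # T-1, T0, T+1 around e2
anchor_only = relative_slice(seq, 2, 0, 1)              # rank 0 alone
```

With T0 on the last entity, start=-2, end=1 selects the same rows as last=3, which matches the identical summary statistics the two filters produce on this page.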

pool.set_t0(position=-1, anchor="start")  # T0 = start of last entity

# Keep the entity at T0 and the 2 entities before it: [T-2, T-1, T0].
# NOTE: end is exclusive in relative mode too, so end=1 includes rank 0 (T0).
pool_t0 = pool.filter_entities(RankCriterion(start=-2, end=1, relative=True))

# Inspect length of filtered sequences.
pool_t0.describe(by_id=False)
[filter_entities] RankCriterion → 150 / 343 entities (43.7%) · 0 IDs affected
length n_unique_entities temporal_span mean_duration median_duration duration_std
count 50.0 50.0 50 50 50 50
mean 3.0 3.0 2409 days, 22:36:59.299727 14 days, 10:18:46.546880 14 days, 4:22:39.013089 7 days, 23:27:06.548171
std 0.0 0.0 1711 days, 19:00:00.715298 4 days, 9:35:30.105332 6 days, 5:36:19.771278 3 days, 21:03:59.597423
min 3.0 3.0 114 days, 8:25:23.982373 5 days, 8:01:28.714157 3 days, 23:41:23.525356 1 day, 8:36:49.778852
25% 3.0 3.0 1178 days 11:02:10.068570 11 days 19:34:17.671606 9 days 07:11:18.119190 5 days 03:15:13.768991
50% 3.0 3.0 1906 days 06:23:15.141141 13 days 17:41:16.683981 14 days 09:28:58.416352 8 days 10:13:13.658113
75% 3.0 3.0 3234 days 03:57:33.858806 17 days 14:12:27.847166 18 days 22:14:07.122773 10 days 22:49:35.909691
max 3.0 3.0 7693 days, 14:18:02.546716 25 days, 4:12:04.688773 25 days, 19:25:19.607111 14 days, 21:22:24.397270


Rank 0 alone: a single “anchor” entity per sequence.

pool_anchor = pool.filter_entities(RankCriterion(ranks=0, relative=True))

# Inspect T0 anchor entities.
pool_anchor.temporal_data().head()
[filter_entities] RankCriterion → 50 / 343 entities (14.6%) · 0 IDs affected
id start end status value
0 1 2022-06-17 21:11:58.027679 2022-06-28 00:02:58.842079 E 37
1 2 2023-05-28 19:55:43.087483 2023-06-02 08:47:10.044719 C 75
2 3 2020-08-20 13:09:36.251551 2020-09-14 19:38:24.321550 B 47
3 4 2024-12-23 19:47:08.046880 2025-01-20 05:35:23.188780 B 48
4 5 2020-01-15 03:09:55.880770 2020-02-07 12:33:10.685566 C 44

