RankCriterion
Prune entity rows by their 0-based positional rank within each sequence.
Ranks can be absolute (from the first entity) or relative to T0 (the nearest
entity to the reference date set via pool.set_t0()).
Exactly one parameter group must be specified: first / last, start / end / step, or ranks.
Pass relative=True to interpret ranks relative to T0 rather than the
start of the sequence.
RankCriterion supports the ENTITY level only
(filter_entities()); which() and match() are not available.
See Criteria for the full reference.
Imports
from tanat import build_intervals
from tanat.criterion import RankCriterion
from tanat.dataset import simulate_intervals, simulate_static
Simulate data
temporal = simulate_intervals(n_ids=50, features=["value", "status"], seed=42)
static = simulate_static(n_ids=50, features=["age"], seed=0)
pool = build_intervals(
    temporal_data=temporal,
    id_column="id",
    start_column="start",
    end_column="end",
    static_data=static,
)
┌─ Interval SequenceStore
│
│ Step 1/4: Sorting & preparing data
│
│ Step 2/4: Building sequence index
│
│ Step 3/4: Writing entity, time index & static features
│
│ Step 4/4: Computing & writing metadata
│
└─ Done (50 sequences · 343 entities · 0.01s)
┌────────────────────────────────────────────────┐
│ IntervalSequencePool Summary │
└────────────────────────────────────────────────┘
Overview
─────────────────────────
Sequences 50
Store /home/runner/.tanat/_quick_interval_8578a7a3
id_column id
Time Index
─────────────────────────
Type Datetime(time_unit='us', time_zone=None) [2000-01-12 06:14:52.240595 → 2025-01-20 05:35:23.188780]
Columns ['start', 'end']
t0 position=0, anchor=start
Entity Features (2)
─────────────────────────
• status String [len 1 → 1]
• value Numerical [1 → 100]
Static Features (1)
─────────────────────────
• age Numerical [1 → 98]
# Inspect length distribution or other summary statistics.
pool.describe(by_id=False)
| | length | n_unique_entities | temporal_span | mean_duration | median_duration | duration_std |
| count | 50.0 | 50.0 | 50 | 50 | 50 | 50 |
| mean | 6.86 | 6.76 | 6480 days, 2:48:22.247079 | 15 days, 3:30:25.259960 | 15 days, 5:12:18.547281 | 7 days, 21:43:35.511771 |
| std | 2.285804 | 2.254791 | 1941 days, 6:52:13.531688 | 3 days, 5:28:15.660040 | 4 days, 17:46:58.276359 | 2 days, 14:25:50.553305 |
| min | 3.0 | 3.0 | 1706 days, 19:27:07.917732 | 5 days, 8:01:28.714157 | 5 days, 0:11:27.379177 | 1 day, 4:52:50.506351 |
| 25% | 5.0 | 5.0 | 5254 days 23:37:06.250402 | 13 days 00:36:46.440355 | 11 days 18:29:55.648568 | 6 days 18:26:20.009336 |
| 50% | 7.0 | 7.0 | 7335 days 12:11:54.677473 | 16 days 02:40:15.396103 | 15 days 21:25:56.404198 | 8 days 03:08:02.672516 |
| 75% | 9.0 | 9.0 | 7857 days 05:37:09.423327 | 17 days 03:52:02.575867 | 18 days 10:58:30.878432 | 9 days 15:57:33.150707 |
| max | 10.0 | 10.0 | 9050 days, 11:50:31.178892 | 20 days, 0:24:03.241089 | 24 days, 11:07:49.202588 | 14 days, 11:20:26.388911 |
first and last
Positive N: keep the first (or last) N entities per sequence.
Negative N: drop the last (or first) |N| entities per sequence.
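The sign convention above can be illustrated with plain Python list slicing. This is only a sketch of the described semantics, not tanat's implementation: keep_first and keep_last are hypothetical helpers, and treating N = 0 as "keep nothing" is an assumption.

```python
def keep_first(entities, n):
    """Positive n keeps the first n items; negative n drops the last |n|."""
    return entities[:n]


def keep_last(entities, n):
    """Positive n keeps the last n items; negative n drops the first |n|."""
    if n == 0:
        # Assumption: 0 keeps nothing. entities[-0:] would return everything.
        return []
    return entities[-n:]


seq = ["a", "b", "c", "d", "e"]
print(keep_first(seq, 2))   # ['a', 'b']
print(keep_first(seq, -1))  # ['a', 'b', 'c', 'd']
print(keep_last(seq, 3))    # ['c', 'd', 'e']
print(keep_last(seq, -2))   # ['c', 'd', 'e']
```

A request for more entities than a sequence contains simply returns the whole sequence in this sketch.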
Keep the first 2 entities.
pool_first2 = pool.filter_entities(RankCriterion(first=2))
[filter_entities] RankCriterion → 100 / 343 entities (29.2%) · 0 IDs affected
# Inspect length of filtered sequences.
pool_first2.describe(by_id=False)
| | length | n_unique_entities | temporal_span | mean_duration | median_duration | duration_std |
| count | 50.0 | 50.0 | 50 | 50 | 50 | 50 |
| mean | 2.0 | 2.0 | 1102 days, 17:12:06.441231 | 15 days, 6:17:02.075181 | 15 days, 6:17:02.075181 | 6 days, 11:46:38.236155 |
| std | 0.0 | 0.0 | 1295 days, 21:59:44.794284 | 5 days, 17:25:40.840769 | 5 days, 17:25:40.840769 | 4 days, 11:30:05.029431 |
| min | 2.0 | 2.0 | 27 days, 13:00:38.184406 | 3 days, 6:58:38.300422 | 3 days, 6:58:38.300422 | 8:19:40.982244 |
| 25% | 2.0 | 2.0 | 198 days 12:28:46.096482 | 11 days 06:23:14.007313 | 11 days 06:23:14.007313 | 2 days 12:54:47.526374 |
| 50% | 2.0 | 2.0 | 612 days 01:31:56.227738 | 15 days 17:37:18.523221 | 15 days 17:37:18.523221 | 5 days 18:24:00.869290 |
| 75% | 2.0 | 2.0 | 1505 days 07:18:38.524703 | 19 days 06:09:47.451627 | 19 days 06:09:47.451627 | 9 days 15:54:10.479344 |
| max | 2.0 | 2.0 | 6096 days, 18:48:20.592952 | 29 days, 1:30:26.294125 | 29 days, 1:30:26.294125 | 16 days, 0:10:41.477308 |
Keep the last 3 entities.
pool_last3 = pool.filter_entities(RankCriterion(last=3))
[filter_entities] RankCriterion → 150 / 343 entities (43.7%) · 0 IDs affected
# Inspect length of filtered sequences.
pool_last3.describe(by_id=False)
| | length | n_unique_entities | temporal_span | mean_duration | median_duration | duration_std |
| count | 50.0 | 50.0 | 50 | 50 | 50 | 50 |
| mean | 3.0 | 3.0 | 2409 days, 22:36:59.299727 | 14 days, 10:18:46.546880 | 14 days, 4:22:39.013089 | 7 days, 23:27:06.548171 |
| std | 0.0 | 0.0 | 1711 days, 19:00:00.715298 | 4 days, 9:35:30.105332 | 6 days, 5:36:19.771278 | 3 days, 21:03:59.597423 |
| min | 3.0 | 3.0 | 114 days, 8:25:23.982373 | 5 days, 8:01:28.714157 | 3 days, 23:41:23.525356 | 1 day, 8:36:49.778852 |
| 25% | 3.0 | 3.0 | 1178 days 11:02:10.068570 | 11 days 19:34:17.671606 | 9 days 07:11:18.119190 | 5 days 03:15:13.768991 |
| 50% | 3.0 | 3.0 | 1906 days 06:23:15.141141 | 13 days 17:41:16.683981 | 14 days 09:28:58.416352 | 8 days 10:13:13.658113 |
| 75% | 3.0 | 3.0 | 3234 days 03:57:33.858806 | 17 days 14:12:27.847166 | 18 days 22:14:07.122773 | 10 days 22:49:35.909691 |
| max | 3.0 | 3.0 | 7693 days, 14:18:02.546716 | 25 days, 4:12:04.688773 | 25 days, 19:25:19.607111 | 14 days, 21:22:24.397270 |
Drop the last entity: first=-1 keeps all except the final row.
pool_drop_last = pool.filter_entities(RankCriterion(first=-1))
[filter_entities] RankCriterion → 293 / 343 entities (85.4%) · 0 IDs affected
# Inspect length of filtered sequences.
pool_drop_last.describe(by_id=False)
| | length | n_unique_entities | temporal_span | mean_duration | median_duration | duration_std |
| count | 50.0 | 50.0 | 50 | 50 | 50 | 50 |
| mean | 5.86 | 5.78 | 5409 days, 17:00:39.123569 | 15 days, 11:46:19.223260 | 16 days, 1:57:10.295216 | 7 days, 20:20:49.523824 |
| std | 2.285804 | 2.261388 | 2132 days, 19:09:44.231253 | 3 days, 20:23:33.200145 | 5 days, 5:21:02.459091 | 2 days, 21:11:24.320615 |
| min | 2.0 | 2.0 | 84 days, 6:22:44.238474 | 4 days, 20:10:02.642094 | 4 days, 14:35:40.286550 | 10:38:10.195443 |
| 25% | 4.0 | 4.0 | 4265 days 05:44:33.344111 | 13 days 18:08:33.661429 | 13 days 07:10:07.615751 | 6 days 18:09:43.067696 |
| 50% | 6.0 | 6.0 | 5892 days 23:37:25.765871 | 16 days 09:20:36.611135 | 16 days 20:36:33.650821 | 8 days 04:02:17.117695 |
| 75% | 8.0 | 8.0 | 7287 days 00:11:11.270728 | 17 days 23:00:06.184673 | 19 days 13:43:56.072013 | 9 days 14:45:16.126218 |
| max | 9.0 | 9.0 | 9022 days, 18:03:04.805359 | 21 days, 23:52:13.248387 | 28 days, 15:17:14.627064 | 15 days, 0:08:02.244075 |
Slice: start / end / step
Python-slice semantics. Negative indices count from the end of each
sequence.
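Because the parameters follow Python-slice semantics, the ranks selected in a sequence of a given length can be previewed with the built-in slice object. The sketch below is illustrative only: slice_ranks is a hypothetical helper, and it assumes RankCriterion matches Python slicing exactly, including for out-of-range bounds.

```python
def slice_ranks(length, start=None, end=None, step=None):
    """Return the 0-based ranks a Python slice selects in a sequence of `length` entities."""
    return list(range(length))[slice(start, end, step)]


print(slice_ranks(7, start=1, end=4))  # [1, 2, 3]
print(slice_ranks(3, start=1, end=4))  # [1, 2]  (end is clipped to the sequence length)
print(slice_ranks(7, step=2))          # [0, 2, 4, 6]
print(slice_ranks(7, start=-3))        # [4, 5, 6]  (negative start counts from the end)
```

Under this reading, a sequence shorter than the requested window contributes fewer entities and is not dropped.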
# Entities at absolute ranks 1, 2, 3 (0-based → second to fourth row).
pool_slice = pool.filter_entities(RankCriterion(start=1, end=4))
[filter_entities] RankCriterion → 144 / 343 entities (42.0%) · 0 IDs affected
# Inspect length of filtered sequences.
pool_slice.describe(by_id=False)
| | length | n_unique_entities | temporal_span | mean_duration | median_duration | duration_std |
| count | 50.0 | 50.0 | 50 | 50 | 50 | 50 |
| mean | 2.88 | 2.84 | 2527 days, 1:50:01.528276 | 15 days, 23:20:10.328789 | 16 days, 1:16:14.553214 | 7 days, 20:22:40.269830 |
| std | 0.328261 | 0.370328 | 1532 days, 23:47:23.996785 | 4 days, 0:18:49.335931 | 5 days, 13:30:25.115618 | 3 days, 14:41:44.646000 |
| min | 2.0 | 2.0 | 330 days, 7:26:00.883757 | 4 days, 23:47:18.505237 | 4 days, 23:47:18.505237 | 11:30:32.534109 |
| 25% | 3.0 | 3.0 | 1461 days 05:33:23.019443 | 12 days 15:44:47.537266 | 12 days 09:35:18.809547 | 5 days 07:36:32.281857 |
| 50% | 3.0 | 3.0 | 2296 days 00:02:41.542028 | 16 days 14:53:08.575124 | 16 days 11:51:07.018873 | 8 days 00:00:51.582739 |
| 75% | 3.0 | 3.0 | 3165 days 10:05:16.732374 | 18 days 16:33:56.745437 | 20 days 20:52:01.806712 | 10 days 07:55:43.200818 |
| max | 3.0 | 3.0 | 7498 days, 12:04:19.250631 | 23 days, 9:55:02.392406 | 24 days, 17:32:32.195825 | 16 days, 4:36:46.197545 |
# Every other entity (even-ranked rows).
pool_step = pool.filter_entities(RankCriterion(step=2))
[filter_entities] RankCriterion → 184 / 343 entities (53.6%) · 0 IDs affected
# Inspect length of filtered sequences.
pool_step.describe(by_id=False)
| | length | n_unique_entities | temporal_span | mean_duration | median_duration | duration_std |
| count | 50.0 | 50.0 | 50 | 50 | 50 | 50 |
| mean | 3.68 | 3.66 | 6039 days, 6:55:55.528152 | 14 days, 13:55:34.788337 | 14 days, 18:34:19.437633 | 6 days, 17:44:46.282293 |
| std | 1.150687 | 1.135872 | 2032 days, 18:53:36.531325 | 5 days, 2:34:14.119958 | 6 days, 0:14:32.681523 | 3 days, 19:16:13.214073 |
| min | 2.0 | 2.0 | 1252 days, 16:33:53.835901 | 4 days, 13:53:32.425783 | 3 days, 2:09:11.398906 | 5:07:15.494454 |
| 25% | 3.0 | 3.0 | 5091 days 11:49:33.900772 | 11 days 22:14:27.194932 | 10 days 16:29:13.043258 | 3 days 23:37:27.693552 |
| 50% | 4.0 | 4.0 | 6914 days 00:09:57.672607 | 14 days 09:13:14.053611 | 15 days 15:01:23.480211 | 6 days 12:43:00.014330 |
| 75% | 5.0 | 5.0 | 7533 days 14:55:40.472623 | 17 days 16:05:05.697831 | 18 days 21:35:19.501998 | 8 days 23:18:23.792210 |
| max | 5.0 | 5.0 | 9050 days, 11:50:31.178892 | 25 days, 17:01:49.395339 | 27 days, 18:57:11.476803 | 18 days, 16:07:16.124050 |
Explicit ranks
Pass a list of 0-based positions. Negative values index from the end.
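Negative positions can be resolved the same way Python resolves negative list indices. The following sketch is not taken from tanat: pick_ranks is a hypothetical helper, and both the de-duplication and the silent skipping of out-of-range positions are assumptions about sensible behaviour.

```python
def pick_ranks(length, ranks):
    """Resolve 0-based and negative positions to sorted absolute ranks."""
    selected = set()
    for r in ranks:
        idx = r + length if r < 0 else r  # negative values index from the end
        if 0 <= idx < length:             # assumption: out-of-range ranks are ignored
            selected.add(idx)
    return sorted(selected)


print(pick_ranks(7, [0, -1]))  # [0, 6]
print(pick_ranks(1, [0, -1]))  # [0]  (first and last are the same entity)
print(pick_ranks(3, [0, 5]))   # [0]
```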
# First and last entity of each sequence.
pool_ends = pool.filter_entities(RankCriterion(ranks=[0, -1]))
[filter_entities] RankCriterion → 100 / 343 entities (29.2%) · 0 IDs affected
# Inspect length of filtered sequences.
pool_ends.describe(by_id=False)
| | length | n_unique_entities | temporal_span | mean_duration | median_duration | duration_std |
| count | 50.0 | 50.0 | 50 | 50 | 50 | 50 |
| mean | 2.0 | 2.0 | 6479 days, 13:22:26.821449 | 14 days, 4:00:00.062427 | 14 days, 4:00:00.062427 | 7 days, 7:34:39.259236 |
| std | 0.0 | 0.0 | 1941 days, 7:56:06.169969 | 5 days, 4:38:10.410121 | 5 days, 4:38:10.410121 | 5 days, 2:59:04.771182 |
| min | 2.0 | 2.0 | 1706 days, 19:27:07.917732 | 2 days, 19:45:59.936197 | 2 days, 19:45:59.936197 | 0:51:14.229357 |
| 25% | 2.0 | 2.0 | 5251 days 16:40:04.503601 | 10 days 21:42:49.048818 | 10 days 21:42:49.048818 | 3 days 20:47:16.152678 |
| 50% | 2.0 | 2.0 | 7335 days 12:11:54.677473 | 13 days 21:07:13.730603 | 13 days 21:07:13.730603 | 6 days 07:37:39.644856 |
| 75% | 2.0 | 2.0 | 7857 days 05:37:09.423327 | 17 days 04:53:44.247330 | 17 days 04:53:44.247330 | 10 days 16:25:03.828029 |
| max | 2.0 | 2.0 | 9050 days, 11:50:31.178892 | 26 days, 20:15:14.330130 | 26 days, 20:15:14.330130 | 18 days, 1:53:42.911846 |
Relative mode: ranks relative to T0
Set a reference date with pool.set_t0() first. Then
relative=True interprets ranks relative to the nearest entity to T0:
rank 0 = that entity, rank -1 = one entity before, rank +1 = one after.
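In relative mode a negative rank means "before T0", not "from the end of the sequence", so the window has to be shifted by the T0 position before it is applied. The sketch below shows one way to think about that mapping. relative_window and t0_index are hypothetical names, and clipping ranks that fall outside the sequence is an assumption, not documented tanat behaviour.

```python
def relative_window(length, t0_index, start, end):
    """Map a relative [start, end) window around T0 to absolute 0-based ranks."""
    lo = max(t0_index + start, 0)     # assumption: ranks before the sequence are clipped
    hi = min(t0_index + end, length)  # end is exclusive; clipped to the sequence length
    return list(range(lo, hi))


# T0 on the last entity of a 7-entity sequence: [T0-2, T0-1, T0]
print(relative_window(7, t0_index=6, start=-2, end=1))  # [4, 5, 6]
# T0 in the middle of the sequence
print(relative_window(7, t0_index=3, start=-2, end=1))  # [1, 2, 3]
# T0 on the first entity: nothing exists before it
print(relative_window(7, t0_index=0, start=-2, end=1))  # [0]
```

With T0 placed on the last entity, the window start=-2, end=1 selects the same ranks as last=3 whenever the sequence has at least three entities.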
pool.set_t0(position=-1, anchor="start")  # T0 = start of the last entity
# Keep the entity at T0 and the 2 entities before it: [T-2, T-1, T0].
# NOTE: end is exclusive, so end=1 is needed to include relative rank 0 (T0).
pool_t0 = pool.filter_entities(RankCriterion(start=-2, end=1, relative=True))
# Inspect length of filtered sequences.
pool_t0.describe(by_id=False)
[filter_entities] RankCriterion → 150 / 343 entities (43.7%) · 0 IDs affected
| | length | n_unique_entities | temporal_span | mean_duration | median_duration | duration_std |
| count | 50.0 | 50.0 | 50 | 50 | 50 | 50 |
| mean | 3.0 | 3.0 | 2409 days, 22:36:59.299727 | 14 days, 10:18:46.546880 | 14 days, 4:22:39.013089 | 7 days, 23:27:06.548171 |
| std | 0.0 | 0.0 | 1711 days, 19:00:00.715298 | 4 days, 9:35:30.105332 | 6 days, 5:36:19.771278 | 3 days, 21:03:59.597423 |
| min | 3.0 | 3.0 | 114 days, 8:25:23.982373 | 5 days, 8:01:28.714157 | 3 days, 23:41:23.525356 | 1 day, 8:36:49.778852 |
| 25% | 3.0 | 3.0 | 1178 days 11:02:10.068570 | 11 days 19:34:17.671606 | 9 days 07:11:18.119190 | 5 days 03:15:13.768991 |
| 50% | 3.0 | 3.0 | 1906 days 06:23:15.141141 | 13 days 17:41:16.683981 | 14 days 09:28:58.416352 | 8 days 10:13:13.658113 |
| 75% | 3.0 | 3.0 | 3234 days 03:57:33.858806 | 17 days 14:12:27.847166 | 18 days 22:14:07.122773 | 10 days 22:49:35.909691 |
| max | 3.0 | 3.0 | 7693 days, 14:18:02.546716 | 25 days, 4:12:04.688773 | 25 days, 19:25:19.607111 | 14 days, 21:22:24.397270 |
Rank 0 alone: a single “anchor” entity per sequence.
pool_anchor = pool.filter_entities(RankCriterion(ranks=0, relative=True))
# Inspect T0 anchor entities.
pool_anchor.temporal_data().head()
[filter_entities] RankCriterion → 50 / 343 entities (14.6%) · 0 IDs affected
| | id | start | end | status | value |
| 0 | 1 | 2022-06-17 21:11:58.027679 | 2022-06-28 00:02:58.842079 | E | 37 |
| 1 | 2 | 2023-05-28 19:55:43.087483 | 2023-06-02 08:47:10.044719 | C | 75 |
| 2 | 3 | 2020-08-20 13:09:36.251551 | 2020-09-14 19:38:24.321550 | B | 47 |
| 3 | 4 | 2024-12-23 19:47:08.046880 | 2025-01-20 05:35:23.188780 | B | 48 |
| 4 | 5 | 2020-01-15 03:09:55.880770 | 2020-02-07 12:33:10.685566 | C | 44 |