Data Manipulation#

Reference for the main operations available on sequence pools, trajectory pools, individual sequences, trajectories, and entities.

Iteration#

All containers implement the standard Python iteration protocol.

Syntax → yields	SP	S	TP	T
`for x in obj` → item	Sequence	Entity	Trajectory	alias (str)
`obj.items()` → (key, item) pairs	✗	✗	(id, Trajectory)	(alias, Sequence)

SP: SequencePool · TP: TrajectoryPool · S: Sequence · T: Trajectory

for traj in tpool:                  # TP → Trajectory
    print(traj.id_value)

for alias, seq in traj.items():     # T  → (alias, Sequence)
    print(alias, len(seq))

for seq in pool:                    # SP → Sequence
    print(seq.id_value, len(seq))

for entity in seq:                  # S  → Entity
    print(entity.temporal_extent, entity.data())
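
The relationship between `for x in obj` and `obj.items()` in the table above can be mirrored with a small stand-in container. This is a minimal sketch, not the library's implementation: `ToyTrajectory` and its `_sequences` attribute are hypothetical names, and plain lists stand in for sequences.

```python
class ToyTrajectory:
    """Hypothetical stand-in: maps aliases to sequences (plain lists here)."""

    def __init__(self, sequences):
        self._sequences = dict(sequences)

    def __iter__(self):
        # Plain iteration yields aliases, like iterating a dict.
        return iter(self._sequences)

    def __len__(self):
        return len(self._sequences)

    def items(self):
        # (alias, sequence) pairs, mirroring the T column of the table.
        return self._sequences.items()


traj = ToyTrajectory({"visits": [1, 2, 3], "labs": [4, 5]})
aliases = list(traj)
lengths = {alias: len(seq) for alias, seq in traj.items()}
```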

Subset#

Restrict a pool to a subset of IDs without copying data.

view = pool.subset(ids=["id_001", "id_042", "id_099"])
print(len(view))   # 3

The returned object is a view: it shares the same underlying store as the pool it was created from. Changes made on the view (casts, feature drops, …) are visible only through that view.
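
To illustrate what "view" means here, the following sketch keeps one shared store and gives each view its own ID mask and settings. `ToyPool`, `_store`, `_ids` and `_hidden` are hypothetical names chosen for the example; the real classes may be organised quite differently.

```python
class ToyPool:
    """Hypothetical pool: a shared store plus per-view state."""

    def __init__(self, store, ids=None, hidden=None):
        self._store = store                       # shared, never copied
        self._ids = set(store) if ids is None else set(ids)
        self._hidden = set(hidden or ())          # view-local setting

    def subset(self, ids):
        # New view over the same store, restricted to the requested IDs.
        return ToyPool(self._store, self._ids & set(ids), self._hidden)

    def drop_features(self, names):
        # Only this view's settings change; the store is untouched.
        self._hidden |= set(names)

    def columns(self):
        first = next(iter(self._store.values()))
        return [c for c in first if c not in self._hidden]

    def __len__(self):
        return len(self._ids)


store = {
    "id_001": {"value": 1.0, "flag": True},
    "id_042": {"value": 2.0, "flag": False},
    "id_099": {"value": 3.0, "flag": True},
    "id_100": {"value": 4.0, "flag": False},
}
pool = ToyPool(store)
view = pool.subset(ids=["id_001", "id_042", "id_099"])
view.drop_features(["flag"])
```

In this sketch the parent pool still reports both columns and all four IDs after the view has been narrowed and modified.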

Feature Engineering#

All methods below operate lazily: transformations are applied on the fly at materialisation time and do not rewrite the store. Call `pool.save()` to persist them.
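
One way to picture lazy behaviour is a table that records requested casts and applies them only when data is read. The sketch below assumes nothing about the library's internals; `LazyTable`, `cast` and `materialise` are hypothetical names.

```python
class LazyTable:
    """Hypothetical table that records casts instead of rewriting rows."""

    def __init__(self, rows):
        self._rows = rows          # stands in for the on-disk store
        self._casts = {}           # column name -> callable, applied lazily

    def cast(self, schema):
        # Record the request; nothing is computed yet.
        self._casts.update(schema)
        return self

    def materialise(self):
        # Apply pending casts on the fly to a fresh copy of each row.
        identity = lambda v: v
        return [
            {k: self._casts.get(k, identity)(v) for k, v in row.items()}
            for row in self._rows
        ]


rows = [{"age": "41", "site": "A"}, {"age": "67", "site": "B"}]
table = LazyTable(rows).cast({"age": int})
result = table.materialise()
```

The stored rows keep their original string values; only the materialised result carries the cast.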

Add and remove columns#

Attach new columns to the view or hide existing ones; the underlying store is never rewritten.

Method	Scope	Description
`pool.add_entity_features(df)`	SP	Append new entity-level columns. `df` must be positionally aligned with the full entity row set of the store. Blocked on filtered views; call `pool.save()` first.
`pool.add_static_features(df)`	SP, TP	Append new static columns joined by ID. Works on filtered views. Pass `id_column` if the join key column has a non-standard name.
`pool.drop_features(names, is_static)`	SP	Hide entity (default) or static features from the view. Pass `permanently=True` to also delete from disk.
`tpool.drop_static_features(names)`	TP	Hide static features from a TrajectoryPool view. Pass `permanently=True` to also delete from disk.

SP: SequencePool · TP: TrajectoryPool
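
The difference between positional alignment (entity features) and joining by ID (static features) can be sketched with plain Python structures. The row layout and the column names `scaled`, `age` and `site` are made up for illustration and are not taken from the library.

```python
# Entity rows: several rows per ID, in store order.
entity_rows = [
    {"id": "a", "t": 0, "value": 1.0},
    {"id": "a", "t": 1, "value": 2.0},
    {"id": "b", "t": 0, "value": 5.0},
]

# Entity-level: one new value per entity row, matched by position.
new_entity_col = [10, 20, 30]
if len(new_entity_col) != len(entity_rows):
    raise ValueError("entity features must align with every entity row")
with_entity = [dict(row, scaled=x) for row, x in zip(entity_rows, new_entity_col)]

# Static-level: one record per ID, matched by key, so input order is free.
static_rows = {"a": {"age": 41}, "b": {"age": 67}}
new_static = [{"id": "b", "site": "north"}, {"id": "a", "site": "south"}]
for record in new_static:
    extra = {k: v for k, v in record.items() if k != "id"}
    static_rows[record["id"]].update(extra)
```

Because the positional path depends on the complete row order, it is plausible that it cannot be used safely once rows have been filtered out, which matches the restriction noted in the table.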

Type casting#

All casts are lazy and scoped to the current view. Call `pool.save()` to persist them.

Method	Scope	Description
`pool.cast_features(schema, is_static)`	SP	Re-type entity (default) or static features. `schema` is a `dict[str, pl.DataType]`.
`tpool.cast_static_features(schema)`	TP	Re-type static features. Entity features must be cast on each linked sequence pool directly.
`pool.cast_to_datetime(unit, time_zone)`	SP, TP	Convert the time index to `pl.Datetime`. unit: `"s"` / `"ms"` / `"us"` (default) / `"ns"`. On TP the cast propagates to all linked sequence pools.
`pool.cast_to_timestep(dtype)`	SP, TP	Convert the time index to an integer type (e.g. `pl.Int64`). Cannot be applied if the time index is already in Datetime format.

SP: SequencePool · TP: TrajectoryPool

import polars as pl

# SequencePool: cast entity feature
pool.cast_features({"status": pl.Categorical})

# SequencePool: cast static feature
pool.cast_features({"age": pl.UInt8}, is_static=True)

# TrajectoryPool: cast static feature (different method name!)
tpool.cast_static_features({"group": pl.Categorical})

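# Both SP and TP: convert the time index to Datetime...
pool.cast_to_datetime(unit="us", time_zone="UTC")

# ...or to an integer timestep (not allowed once the index is Datetime,
# so use one or the other on a given pool)
pool.cast_to_timestep(pl.Int32)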

# Drop features (SP: drop_features with is_static; TP: drop_static_features)
pool.drop_features(["flag_valid"], is_static=False)
tpool.drop_static_features(["debug_col"])
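
The role of the `unit` argument can be illustrated with the standard library alone. The sketch below assumes the integer time index counts units elapsed since the Unix epoch, which is how Polars interprets an integer cast to `pl.Datetime`; whether the pool applies exactly this rule is an assumption. The helper `to_datetime` is hypothetical, and `"ns"` is left out because Python's `datetime` only resolves microseconds.

```python
from datetime import datetime, timedelta, timezone

# Number of microseconds per supported unit.
MICROS_PER_UNIT = {"s": 1_000_000, "ms": 1_000, "us": 1}


def to_datetime(value, unit="us"):
    """Interpret an integer as an offset from the Unix epoch in `unit`."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=value * MICROS_PER_UNIT[unit])


one_day = to_datetime(86_400, unit="s")
one_and_a_half_seconds = to_datetime(1_500, unit="ms")
```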

Transformation#

All methods in this section return new data (a DataFrame, or an ndarray for `to_tensor`) and do not modify the pool.

`apply`: evaluate an expression#

Evaluate a Polars expression against the pool’s temporal or static data. Available on SP and TP. Pass `is_static=True` to target static features.

At pool level, `by_id=True` groups the evaluation per ID, which makes it well suited to deriving per-sequence aggregates. A natural follow-up is to pass the result directly to `add_static_features` or `add_entity_features`:

# Per-sequence mean → attach as a static feature
means = pool.apply(pl.col("value").mean().alias("value_mean"), by_id=True)
pool.add_static_features(means)

# Without by_id: expression runs over the full temporal data
flags = pool.apply(pl.col("value") > 0)

# On static data
result = pool.apply(pl.col("age") > 65, is_static=True)

# Works on TrajectoryPool too
stats = tpool.apply(pl.col("score").max().alias("score_max"), by_id=True)

SP: SequencePool · TP: TrajectoryPool
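
The effect of `by_id` can be reproduced on a toy table with the standard library. This sketch only illustrates grouped versus ungrouped evaluation of a mean; the row layout is invented and no Polars expression is involved.

```python
from collections import defaultdict
from statistics import mean

rows = [
    {"id": "a", "value": 1.0},
    {"id": "a", "value": 3.0},
    {"id": "b", "value": 10.0},
]

# Grouped evaluation (the by_id=True case): one aggregate per ID.
groups = defaultdict(list)
for row in rows:
    groups[row["id"]].append(row["value"])
per_id_mean = {key: mean(values) for key, values in groups.items()}

# Ungrouped evaluation: one aggregate over all rows.
overall_mean = mean(row["value"] for row in rows)
```

The grouped result has one value per ID, which is the shape a static feature needs.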

`to_dummies`: one-hot encode#

One-hot encode one or more Categorical features into binary indicator columns. Pass `is_static=True` to target static features instead of entity features.

# Entity features (default)
dummies = pool.to_dummies(["status", "category"])

# Static features
dummies = pool.to_dummies(["site"], is_static=True)
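
For readers unfamiliar with one-hot encoding, here is a minimal pure-Python sketch of the idea. The function below is a hypothetical stand-in, not the pool method, and the `column_value` naming of the indicator columns is an assumption borrowed from the convention Polars uses by default.

```python
def to_dummies(rows, column):
    """Return one 0/1 indicator column per distinct value of `column`."""
    categories = sorted({row[column] for row in rows})
    return [
        {f"{column}_{cat}": int(row[column] == cat) for cat in categories}
        for row in rows
    ]


rows = [{"status": "open"}, {"status": "closed"}, {"status": "open"}]
dummies = to_dummies(rows, "status")
```

Each output row has exactly one indicator set to 1.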

`binned_data` / `to_tensor`: regular time bins#

Project temporal features onto a regular time grid.

`binned_data()` returns a long-format DataFrame (pandas or polars). Useful for exploration, joins, and plotting.
`to_tensor()` returns a dense (N, M, K) ndarray together with IDs and K-axis feature labels. Useful for ML pipelines.

# Long-format dataframe
df = pool.binned_data(features=["value", "score"], bin_size="1d")

# ML-ready tensor with IDs and feature names
arr, ids, feature_names = pool.to_tensor(features=["value", "score"], bin_size="1d")
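
The projection onto a regular grid can be sketched without any array library. The example below makes several assumptions that the text above does not confirm: N indexes IDs and M indexes bins, values falling in the same bin are averaged, empty cells are filled with `None`, and time is an integer so that `bin_size` is a plain number rather than a string such as `"1d"`. The function is a hypothetical stand-in for illustration only.

```python
def to_tensor(records, features, bin_size, n_bins):
    """Build a nested (N, M, K) list: IDs x time bins x features."""
    ids = sorted({r["id"] for r in records})
    cells = {}  # (id, bin, feature index) -> list of observed values
    for r in records:
        b = r["t"] // bin_size
        if 0 <= b < n_bins:
            for k, name in enumerate(features):
                cells.setdefault((r["id"], b, k), []).append(r[name])

    def cell(i, b, k):
        # Mean of the values in the bin, or None when the bin is empty.
        values = cells.get((i, b, k))
        return sum(values) / len(values) if values else None

    tensor = [
        [[cell(i, b, k) for k in range(len(features))] for b in range(n_bins)]
        for i in ids
    ]
    return tensor, ids, list(features)


records = [
    {"id": "a", "t": 0, "value": 1.0, "score": 10.0},
    {"id": "a", "t": 1, "value": 3.0, "score": 20.0},
    {"id": "a", "t": 2, "value": 5.0, "score": 30.0},
    {"id": "b", "t": 3, "value": 7.0, "score": 40.0},
]
tensor, ids, feature_names = to_tensor(records, ["value", "score"], bin_size=2, n_bins=2)
```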

Descriptive Statistics#

# One row per sequence (length, temporal span, …)
pool.describe()

# Cross-ID aggregated stats (equivalent to pandas .describe())
pool.describe(by_id=False)

# Attach stats as static features (side-effect)
pool.describe(add_to_static=True)

# Single sequence
seq = pool[pool.unique_ids[0]]
seq.describe()

# TrajectoryPool: one row per trajectory, columns prefixed by alias
tpool.describe()
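
As a rough picture of the per-sequence summary, the sketch below computes a length and a temporal span for each ID from `(id, time)` pairs. The output keys `length` and `span` are illustrative names; the actual column names and the full set of statistics returned by `describe()` are not specified here.

```python
from collections import defaultdict

rows = [("a", 0), ("a", 4), ("a", 9), ("b", 2), ("b", 3)]

# Collect the time points of each sequence.
times = defaultdict(list)
for seq_id, t in rows:
    times[seq_id].append(t)

# One summary record per sequence.
summary = {
    seq_id: {"length": len(ts), "span": max(ts) - min(ts)}
    for seq_id, ts in times.items()
}
```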

Persistence#

Transformations are lazy by default. Save a snapshot to make them permanent or to share a modified pool.

# Save under a new name (returns the new store path)
saved_path = pool.save("my_pool_optimised", overwrite=True)

# Copy the pool in-memory (deep copy of settings, same store)
clone = pool.copy()

Composition#

`extend`#

Merge another pool into the current one. Two execution paths:

Situation	Behaviour
Both pools share the same store	Fast path: union of ID masks, no I/O
Different stores	Cross-store: rebuilds a new store on disk; `destination` is required

# Same-store fast path (e.g. after train_test_split)
train, test = pool.train_test_split(test_size=0.3)
merged = train.extend(test)

# Cross-store merge
pool_a.extend(pool_b, destination="merged_pool",
              on_duplicate="skip", overwrite=True)
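
The two execution paths can be sketched with dictionaries standing in for stores and sets standing in for ID masks. This is an illustration under stated assumptions, not the library's code: the function signature is hypothetical, and raising on any `on_duplicate` value other than `"skip"` is a simplification, since the other accepted values are not listed above.

```python
def extend(store_a, ids_a, store_b, ids_b, on_duplicate="skip"):
    """Return (store, ids) for the merged pool."""
    if store_a is store_b:
        # Fast path: same store, so merging is a set union of ID masks.
        return store_a, ids_a | ids_b
    # Cross-store path: build a new store holding rows from both pools.
    merged = {i: store_a[i] for i in ids_a}
    for i in ids_b:
        if i in merged:
            if on_duplicate == "skip":
                continue  # keep the row that is already present
            raise ValueError(f"duplicate id: {i}")
        merged[i] = store_b[i]
    return merged, set(merged)


store = {"a": 1, "b": 2, "c": 3}
same_store, same_ids = extend(store, {"a"}, store, {"b", "c"})

other = {"c": 30, "d": 4}
new_store, new_ids = extend(store, {"a", "c"}, other, {"c", "d"})
```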

`train_test_split`#

Split a pool by unique IDs. The interface mirrors `sklearn.model_selection.train_test_split`.

train, test = pool.train_test_split(test_size=0.2, random_state=42)

# Guarantee: zero ID overlap
assert not set(train.unique_ids) & set(test.unique_ids)
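
Splitting at the ID level rather than the row level is what guarantees that no sequence appears on both sides. A minimal sketch of that idea follows; `split_ids` is a hypothetical helper, and its rounding of the test size is a simplification that may differ from what the library or scikit-learn does.

```python
import random


def split_ids(unique_ids, test_size=0.2, random_state=None):
    """Shuffle IDs reproducibly and cut them into (train, test) sets."""
    ids = sorted(unique_ids)                  # stable order before shuffling
    random.Random(random_state).shuffle(ids)
    n_test = max(1, round(len(ids) * test_size))
    return set(ids[n_test:]), set(ids[:n_test])


all_ids = {f"id_{i:03d}" for i in range(10)}
train_ids, test_ids = split_ids(all_ids, test_size=0.2, random_state=42)
```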

Type Conversion#

Convert a pool between the three sequence types. The conversion is always view-level: the original store is not modified.

Method	Converts to
`pool.as_event()`	`EventSequencePool`
`pool.as_interval()`	`IntervalSequencePool`
`pool.as_state()`	`StateSequencePool`

event_view = interval_pool.as_event()   # treat interval start as event time
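
Reading an interval as an event, as in the comment above, amounts to keeping the start as the single time point. The sketch below shows that reinterpretation on plain dictionaries; the field names `start`, `end`, `time` and `drug` are invented for the example.

```python
intervals = [
    {"id": "a", "start": 0, "end": 5, "drug": "x"},
    {"id": "a", "start": 7, "end": 9, "drug": "y"},
]

# Derive events from intervals without modifying the interval records.
events = [
    {"id": r["id"], "time": r["start"], "drug": r["drug"]}
    for r in intervals
]
```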

Temporal Alignment#

See Zeroing & Alignment for the full T0 reference.

# Set a reference date using the position strategy
pool.set_t0(position=0, anchor="start")

# Retrieve T0 values as a DataFrame
pool.t0_data()

# Sequence-level properties (available after set_t0)
seq = pool[pool.unique_ids[0]]
seq.t0               # T0 value for this sequence
seq.t0_nearest_rank  # 0-based index of the entity at or just before T0
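
The "entity at or just before T0" lookup is a classic sorted search. The sketch below shows one way to compute it with `bisect`, along with a position-based T0 and the resulting relative times. Both helpers are hypothetical, the `anchor` argument is ignored, and picking the last entity among ties is an assumption.

```python
from bisect import bisect_right


def t0_from_position(times, position=0):
    """T0 is the timestamp of the entity at `position` (times sorted)."""
    return times[position]


def nearest_rank(times, t0):
    """0-based index of the last entity at or before t0, else None."""
    idx = bisect_right(times, t0) - 1
    return idx if idx >= 0 else None


times = [3, 8, 8, 15]
t0 = t0_from_position(times, position=1)

# Times expressed relative to T0.
relative = [t - t0 for t in times]

rank_between = nearest_rank(times, 10)
rank_before_all = nearest_rank(times, 2)
```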
