tanat.trajectory package
Submodules
tanat.trajectory.pool module
TrajectoryPool: aggregation of SequencePool views.
- class tanat.trajectory.pool.TrajectoryPool(store: str | Path | TrajectoryStore, *, id_column: str = 'id', static_features: list[str] | None = None, cast_recipe: TrajectoryCastRecipe | dict | None = None)
Bases: TrajectoryViewMixin, CachableSettings
Aggregates SequencePool views into trajectories. Accepts a store name, path, or TrajectoryStore instance, following the same convention as SequencePool.
Usage:
    store_path = (
        TrajectoryPool.builder()
        .add("medical", medical_pool)
        .add("lab", lab_pool)
        .build("./my_trajectories")
    )
    pool = TrajectoryPool(store="./my_trajectories")
- SETTINGS_CLASS
alias of TrajectorySettings
- __init__(store: str | Path | TrajectoryStore, *, id_column: str = 'id', static_features: list[str] | None = None, cast_recipe: TrajectoryCastRecipe | dict | None = None) → None
Create a trajectory pool backed by store.
- Parameters:
store – Store path, name, or TrajectoryStore instance.
id_column – User-facing name for the trajectory ID column.
static_features – Static feature names to expose. None → all available; [] → none.
cast_recipe – Optional cast recipe (or dict) applied at read time. Only the id and static fields are meaningful at this level. Normalised via TrajectoryCastRecipe.coerce() and probed eagerly.
- Raises:
TypeError – If cast_recipe is not a TrajectoryCastRecipe, dict, or None.
- add_static_features(df: DataFrame | LazyFrame | DataFrame, *, id_column: str | None = None, overwrite: bool = False) → None
Add static features to the trajectory pool via an ID-keyed join.
The input DataFrame must include the trajectory ID column (either under settings.id_column or under the name given by id_column). A LEFT JOIN against the full trajectory index is performed internally, so partial DataFrames (covering only a subset of trajectory IDs) are valid: absent IDs receive null in the virtual context.
Because alignment is handled by the join rather than by row position, this method works on views with pending changes (casts, virtual features, masks). Only the IDs visible in the view are exposed when reading back with static_data().
- Parameters:
df – DataFrame containing the ID column plus one or more feature columns. Can be pandas, Polars eager, or Polars lazy.
id_column – Name of the ID column in df. Defaults to settings.id_column when None. Pass an explicit name when the join key in df differs from the pool’s public ID name (e.g. id_column="traj_id").
overwrite – If True, replaces features that already exist in the virtual context.
- Raises:
KeyError – If the resolved ID column is not found in df.
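The join-based alignment above can be modelled without tanat. The sketch below is an illustration under assumptions, not tanat's implementation: left_join_static is a hypothetical helper, plain dicts stand in for DataFrames, and None stands in for a Polars null. Every ID in the trajectory index is looked up by key, so the row order and partial coverage of the input do not matter.

```python
def left_join_static(trajectory_index, features_by_id):
    """Align a partial ID-keyed feature mapping onto the full trajectory index.

    Hypothetical helper: every ID in trajectory_index gets an entry; IDs absent
    from features_by_id receive None (standing in for a Polars null).
    """
    return {traj_id: features_by_id.get(traj_id) for traj_id in trajectory_index}


# Full trajectory index vs. a feature table covering only some IDs.
trajectory_index = ["a", "b", "c", "d"]
partial_scores = {"d": 0.1, "b": 0.7}  # input order is irrelevant to a keyed join

joined = left_join_static(trajectory_index, partial_scores)
print(joined)  # {'a': None, 'b': 0.7, 'c': None, 'd': 0.1}
```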
- binned_data(features: dict[str, list[str] | str], bin_size: str | int | float, max_bins: int | None = None, fill_value: Any = None, overlap_rule: str = 'first', ohe: bool = False, fmt: Literal['pandas', 'polars'] = 'pandas', use_arrow: bool = True, bin_col: str = '__bin__') → DataFrame | DataFrame
Project all aliases onto a single shared binned table (long format).
All sub-pools share one global (t_min, t_max, bin_size) axis, derived from the union of their temporal spans. Output columns are prefixed "{alias}_{feature}" to avoid collisions.
For an ML-ready 3-D tensor with feature labels and ID order, see to_tensor().
- Parameters:
features – Mapping {alias: feature(s)}. Each alias must exist in this pool. str values are auto-promoted to [str].
bin_size – Bin width on the shared axis.
max_bins – When None, the number of bins is capped at MAX_BINS_LIMIT. An explicit value bypasses this cap; the caller opts in knowingly.
fill_value – Applied once, after the cross-join over trajectory IDs. Per-alias fills are not applied.
overlap_rule – In-bin aggregation, applied per alias.
ohe – One-hot encode per alias. Output names reflect the post-OHE columns.
fmt – "pandas" or "polars".
use_arrow – Arrow-backed pandas conversion.
bin_col – Output bin index column name.
- Returns:
DataFrame with columns [traj_id, bin_col, "{alias1}_{feat1}", "{alias1}_{feat2}", ..., "{alias2}_{feat1}", ...].
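The shared-axis idea can be sketched in plain Python for numeric time values. This is not tanat's binning code. The helper names are hypothetical. Bins are assumed to be left-closed intervals starting at the global t_min. The max_bins_limit default only stands in for MAX_BINS_LIMIT, whose real value is not given here. String bin sizes such as "1d" are not handled.

```python
import math


def shared_axis(spans, bin_size, max_bins=None, max_bins_limit=10_000):
    """Derive one (t_min, t_max, n_bins) axis from the union of per-alias spans.

    Hypothetical helper: spans maps alias -> (start, end) in numeric time units.
    max_bins_limit is an assumed stand-in for MAX_BINS_LIMIT.
    """
    t_min = min(start for start, _ in spans.values())
    t_max = max(end for _, end in spans.values())
    n_bins = math.floor((t_max - t_min) / bin_size) + 1
    # None applies the default cap; an explicit max_bins replaces it.
    cap = max_bins if max_bins is not None else max_bins_limit
    return t_min, t_max, min(n_bins, cap)


def bin_index(t, t_min, bin_size):
    """Index k of the left-closed bin [t_min + k*bin_size, t_min + (k+1)*bin_size) holding t."""
    return math.floor((t - t_min) / bin_size)


# Two aliases with different spans end up on one common axis.
spans = {"drugs": (0, 5), "labs": (3, 12)}
t_min, t_max, n_bins = shared_axis(spans, bin_size=5)
print(t_min, t_max, n_bins)     # 0 12 3
print(bin_index(12, t_min, 5))  # 2
```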
- classmethod builder() → TrajectoryStoreBuilder
Return a fluent builder for constructing a trajectory store.
- cast_id(dtype: DataType) → None
Casts the trajectory ID column to a new type.
The cast is propagated automatically to all linked sequence pools (accessible via sequence_pools), so that entity data and static data surface IDs with the same type at every level.
- Parameters:
dtype – Target Polars DataType (e.g. pl.String, pl.UInt32).
- Raises:
TypeError – If the cast is incompatible with the stored ID values.
- cast_static_features(schema: dict[str, DataType | type]) → None
Casts trajectory-level static-feature columns to new types.
Only static features can be cast at trajectory level: entity features live inside the linked sequence stores and must be cast there.
- Parameters:
schema – Dictionary mapping feature names to target Polars DataTypes (e.g. {"group": pl.Categorical}).
- Raises:
TypeError – If schema is not a dict.
KeyError – If a feature name does not exist in the current view.
- cast_to_datetime(unit: str = 'us', time_zone: str | None = None) → None
Casts time columns to Datetime across all linked sequence pools.
All sequence stores are guaranteed to share the same temporal schema (enforced at build time), so a single probe against the trajectory store is sufficient, as with cast_id(). The cast is stored in the trajectory-level recipe and re-propagated to every pool on the next sequence_pools access.
- Parameters:
unit – Datetime resolution ("ms", "us", "ns"). Default is "us" (microseconds).
time_zone – Optional time zone string (e.g. "UTC", "Europe/Paris").
- Raises:
ValueError – If unit is not one of the accepted values.
TypeError – If the cast is incompatible with the temporal data.
- cast_to_timestep(dtype: DataType = Int64) → None
Casts time columns to numeric timesteps across all linked sequence pools.
All sequence stores are guaranteed to share the same temporal schema (enforced at build time), so a single probe against the trajectory store is sufficient, as with cast_id(). The cast is stored in the trajectory-level recipe and re-propagated to every pool on the next sequence_pools access.
- Parameters:
dtype – Target numeric type (e.g. pl.UInt32, pl.Int64, pl.Float64). Default is pl.Int64.
- Raises:
TypeError – If dtype is not a numeric type, or if the temporal data is already in Datetime format.
- copy() → TrajectoryPool
Return a shallow copy sharing the same store, with all view state preserved.
The new pool references the same TrajectoryStore and the same virtual context (_virtual_id), so virtual features are immediately visible.
- Returns:
A new TrajectoryPool with identical settings, casts, masks and virtual context.
Note
Chaining with save() produces a fully independent pool at a new path without mutating the original instance:
    pool2 = TrajectoryPool(store=pool.copy().save("other_path"))
Use this when you need both the original and a snapshot at a new destination. Calling pool.save("other_path") alone would redirect pool itself to "other_path".
- describe(by_id: bool = True, add_to_static: bool = False, separator: str = '_', fmt: Literal['pandas', 'polars'] = 'pandas', use_arrow: bool = True) → DataFrame | DataFrame
Compute summary statistics across all sequences and all trajectories.
- Parameters:
by_id – If True (default), return one row per trajectory. If False, return a cross-trajectory summary in the style of pandas describe().
add_to_static – If True, persist the per-ID result via add_static_features(). Ignored (with a warning) when by_id=False.
separator – Separator between alias and metric name (default "_").
fmt – "pandas" (default) or "polars".
use_arrow – Use Arrow extension arrays for the Polars → pandas conversion.
- Returns:
DataFrame with columns [id, n_sequences, {alias}{sep}length, …].
Examples:
    traj_pool.describe()
    traj_pool.describe(separator=".")
    traj_pool.describe(by_id=False)
    traj_pool.describe(add_to_static=True)
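The per-ID column layout can be sketched with plain dicts. The helper below is hypothetical and only shows how alias-prefixed metric columns are assembled with a configurable separator. The metric values are illustrative inputs, and n_sequences is assumed here to be the number of aliases with a sequence. tanat computes the real metrics via each sequence's describe().

```python
def prefixed_row(traj_id, metrics_by_alias, separator="_"):
    """Build one describe()-style row: id, n_sequences, then {alias}{separator}{metric}.

    Hypothetical helper; metrics_by_alias maps alias -> {metric: value}.
    """
    row = {"id": traj_id, "n_sequences": len(metrics_by_alias)}
    for alias, metrics in metrics_by_alias.items():
        for metric, value in metrics.items():
            row[f"{alias}{separator}{metric}"] = value
    return row


# Illustrative per-sequence metrics for one trajectory with two aliases.
metrics = {"medical": {"length": 4}, "lab": {"length": 9}}
print(prefixed_row(42, metrics))
# {'id': 42, 'n_sequences': 2, 'medical_length': 4, 'lab_length': 9}
print(list(prefixed_row(42, metrics, separator=".")))
# ['id', 'n_sequences', 'medical.length', 'lab.length']
```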
- drop_sequence_pools(*aliases: str) → None
Hides one or more store aliases from this view.
The underlying TrajectoryStore is not modified. Only the pool’s visible aliases (and derived properties such as trajectory_index and unique_ids) are affected.
- Parameters:
aliases – One or more alias names to hide.
- Raises:
RuntimeError – If the pool is not built yet.
KeyError – If an alias does not exist in the store.
- drop_static_features(features: list[str] | str, *, permanently: bool = False) → None
Removes static features from the view (and optionally from disk).
By default this is a soft drop: features are removed from the settings so they no longer appear in static_data(), but the underlying data is left untouched.
With permanently=True the columns are also deleted from disk / the virtual context (irreversible).
- Parameters:
features – Feature name(s) to drop.
permanently – If True, also remove them from disk / the virtual context.
- extend(other: TrajectoryPool | Trajectory, destination: str | Path | None = None, *, on_duplicate: Literal['raise', 'skip'] = 'raise', overwrite: bool = False) → TrajectoryPool
Merge other into this trajectory pool, materialising the result on disk when needed. Mirrors the semantics of save().
Same-store fast path: if both trajectory pools share _store.root_path and neither carries virtual content (_virtual_id is None on both sides), no I/O is performed. A new pool backed by the same store, with the union of the ID masks, is returned immediately. If destination is provided, the merged pool is materialised via save(); otherwise it is returned as an in-memory view with zero I/O.
Different stores (or virtual content present): destination is required. For each alias in this pool, extend() is called on the corresponding sub-pools, and the results are assembled into a new trajectory store via the builder. Pass destination=self._store.root_path with overwrite=True to rewrite in place.
- Parameters:
other – Trajectory pool or single Trajectory to merge.
destination – None → in-memory view (same-store fast path only; no I/O); str/Path → materialise the merged data to disk. destination is required when merging from different stores.
on_duplicate – Behaviour when other contains a trajectory ID already present in this pool:
"raise" (default): raise ValueError.
"skip": silently ignore duplicates.
overwrite – Allows overwriting destination when it already exists on disk.
- Returns:
Always a new TrajectoryPool, never self.
- Raises:
TypeError – If other is not a TrajectoryPool or Trajectory.
TypeError – If a sub-pool has an incompatible ID dtype or temporal schema.
ValueError – If a sub-pool in other is missing features present in the corresponding sub-pool of self.
ValueError – If on_duplicate="raise" and duplicate IDs are found.
ValueError – If destination=None and the stores differ.
FileExistsError – If destination exists and overwrite=False.
Note
Aliases present in other but absent from self are ignored (a WARNING is logged). Aliases present in self but absent from other are carried over unchanged.
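The on_duplicate policy can be modelled as a union over ID lists. merge_ids is a hypothetical helper, not tanat code. It assumes that "skip" keeps the entry already in this pool and drops the colliding ID from other, which matches the "silently ignore duplicates" wording above.

```python
def merge_ids(self_ids, other_ids, on_duplicate="raise"):
    """Union two ID lists following the on_duplicate policy.

    Hypothetical helper: "raise" rejects any shared ID; "skip" keeps the
    existing entry from self_ids and drops the duplicate from other_ids.
    """
    if on_duplicate not in ("raise", "skip"):
        raise ValueError(f"unknown on_duplicate: {on_duplicate!r}")
    existing = set(self_ids)
    duplicates = [i for i in other_ids if i in existing]
    if duplicates and on_duplicate == "raise":
        raise ValueError(f"duplicate trajectory IDs: {duplicates}")
    return list(self_ids) + [i for i in other_ids if i not in existing]


print(merge_ids([1, 2, 3], [3, 4], on_duplicate="skip"))  # [1, 2, 3, 4]
try:
    merge_ids([1, 2, 3], [3, 4])
except ValueError as exc:
    print(exc)  # duplicate trajectory IDs: [3]
```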
- filter_entities(criterion: Criterion, *, alias: str, inplace: bool = False, verbose: bool = True) → TrajectoryPool
Return a new TrajectoryPool view with entities filtered by criterion.
- Parameters:
criterion – A Criterion instance.
alias – Sequence alias to apply the criterion on.
inplace – If True, modify this pool in place instead of returning a new view.
verbose – If True, print a one-line report.
- Returns:
A new TrajectoryPool view with the criterion applied, or self if inplace is True.
- Raises:
TypeError – If criterion is not a Criterion object.
CriterionLevelError – If the criterion is incompatible with entity filtering.
- get_trajectories(static_features: list[str] | None = None, aliases: list[str] | None = None) → dict[str, Trajectory]
All visible Trajectory instances, keyed by ID.
Materialises every trajectory reachable through the current view (respecting _id_mask and _alias_mask). Useful for iteration-heavy workflows where the same trajectory is accessed multiple times.
- Parameters:
static_features – Static features to expose in each Trajectory. None → use the pool-level setting; [] → no static features.
aliases – Sequence-store aliases to expose in each Trajectory. None → use the pool-level alias mask. Must be a subset of the pool’s visible aliases.
- Raises:
KeyError – If any alias in aliases is not visible in the current pool view.
- property is_dirty: bool
True if the pool (or any linked sequence pool) has unsaved state.
Trajectory level: virtual features, ID mask, type casts, soft drops. Sub-pool level: delegates to each pool’s is_dirty.
A dirty pool needs save() with a destination to materialise all pending changes (sub-pool changes cannot be saved in place).
- save(destination: str | Path | None = None, *, overwrite: bool = False, deep: bool = False) → Path
Persists the current pool state to disk.
Trajectory level (trajectory_index.arrow, static_features.arrow): always written, with ID and static casts baked in.
Sequence pools are persisted according to their state:
Modified pools (virtual features or casts) are always saved to destination/stores/<alias>/, regardless of deep.
Unmodified pools are copied when deep=True and referenced by absolute path when deep=False.
For in-place saves the linked sequence stores are never touched (they may be shared with other pool instances).
Without destination, the trajectory-level files are rewritten in place. With destination, a copy is created and the original store on disk is left untouched.
- Parameters:
destination – Where to save. One of:
None → in place;
a workspace store name (containing no / or \);
a filesystem Path or path string.
Passing a path that resolves to the current store root is equivalent to None (treated as in place).
overwrite – Required when saving in place (trajectory-level files will be overwritten). Also allows overwriting an existing destination. Each dirty sub-pool is saved in place at its current location with overwrite=True automatically.
deep – If True, all sequence stores (including unmodified ones) are copied to destination/stores/<alias>/. When False (default), only modified stores are materialised there; the rest are kept as absolute links. Ignored when saving in place.
- Returns:
The Path of the written store: the in-place root when destination is None, otherwise the resolved destination path. Useful for chaining:
    pool2 = TrajectoryPool(store=pool.save("my_trajectories"))
- Raises:
RuntimeError – If saving in place without overwrite=True.
FileExistsError – If destination already exists and overwrite is False.
Note
This method mutates self: after the call, the pool is redirected to destination (its store, masks and virtual context are all reset to the written state). To keep the original instance unchanged while creating an independent copy elsewhere, call copy() first:
    pool2 = TrajectoryPool(store=pool.copy().save("other_path"))
    # pool still points to its original store
Warning
Saving in place with dirty sequence pools will overwrite those stores at their current location (stores/<alias>/ within the trajectory root). If the stores are shared with other pool instances, those instances will also reflect the changes.
Note
With deep=True all links are relative, which makes the result suitable for archiving or transfer. With deep=False, absolute links to unchanged stores are not portable across machines.
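The rules for saving to a new destination can be summarised as a small decision function. This is a hypothetical sketch of the documented behaviour. plan_store_layout and its string labels are invented for illustration, and in-place saves are deliberately left out.

```python
def plan_store_layout(modified_by_alias, deep):
    """Decide how each sequence store is written when saving to a new destination.

    Hypothetical helper: modified pools are always materialised under
    destination/stores/<alias>/; unmodified pools are copied when deep=True
    and kept as absolute links otherwise.
    """
    plan = {}
    for alias, modified in modified_by_alias.items():
        if modified:
            plan[alias] = "materialise"
        elif deep:
            plan[alias] = "copy"
        else:
            plan[alias] = "link"
    return plan


state = {"medical": True, "lab": False}
print(plan_store_layout(state, deep=False))  # {'medical': 'materialise', 'lab': 'link'}
print(plan_store_layout(state, deep=True))   # {'medical': 'materialise', 'lab': 'copy'}
```

With deep=True nothing ends up as a link, which is why the resulting layout is the portable one.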
- property sequence_pools: MappingProxyType
Visible SequencePool instances, keyed by alias.
Returns a read-only mapping filtered by the current alias mask. Direct item assignment (e.g. tpool.sequence_pools[alias] = …) raises TypeError; use subset() or drop_sequence_pools() to change the visible pools.
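The read-only behaviour comes from the standard library's types.MappingProxyType, as the return type indicates. The snippet below is a standalone illustration, not tanat code. Placeholder strings stand in for SequencePool objects, and alias_mask is a hypothetical set of visible aliases.

```python
from types import MappingProxyType

# Placeholder strings stand in for SequencePool objects (hypothetical values).
all_pools = {"medical": "<medical pool>", "lab": "<lab pool>", "imaging": "<imaging pool>"}
alias_mask = {"medical", "lab"}

# Read-only view filtered by the alias mask.
visible = MappingProxyType({a: p for a, p in all_pools.items() if a in alias_mask})

print(sorted(visible))  # ['lab', 'medical']
try:
    visible["imaging"] = "<imaging pool>"  # item assignment is rejected
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```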
- set_t0(*, position: int | None = None, direct: datetime | date | int | float | None | dict[Any, datetime | date | int | float | None] = None, feature: str | None = None, query: Expr | None = None, anchor: Literal['start', 'end', 'middle'] | None = None, use_first: bool = True, on: str | None = None) → TrajectoryPool
Configure the T0 strategy for this trajectory pool.
Builds a T0Setter via the registry and delegates to setter.compute_from_trajectory(self, on=on). The setter stores the resulting [id_col, _T0_] DataFrame; per-alias nearest ranks are computed lazily in _get_traj_t0_df().
- Parameters:
position – Row index (0-based; negative indexing supported).
direct – Scalar value or {traj_id: value} dict.
feature – Trajectory-level static feature column name.
query – Polars boolean expression evaluated on the reference sub-pool’s columns.
anchor – "start" / "end" / "middle" for interval/state pools.
use_first – Used by the query strategy only.
on – Alias of the sub-pool used to compute T0. Required for the position and query strategies. Ignored (with a warning) for direct and feature.
- Returns:
self, for chaining.
- Raises:
TypeError – If on is missing for position/query.
KeyError – If on refers to an alias not visible in this pool.
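The position strategy can be pictured as indexing into each trajectory's rows in the reference sub-pool. This is a hypothetical model, not the T0Setter implementation. It assumes each trajectory's times are already sorted. It also assumes that an out-of-range position yields None, which is consistent with Trajectory.t0 being None when T0 cannot be determined.

```python
def t0_by_position(rows_by_id, position):
    """Pick each trajectory's T0 as the time at a 0-based row position.

    Hypothetical helper: rows_by_id maps trajectory ID -> sorted times from the
    reference sub-pool (the alias passed as `on`). Negative positions count from
    the end; out-of-range positions yield None (assumed behaviour).
    """
    t0 = {}
    for traj_id, times in rows_by_id.items():
        try:
            t0[traj_id] = times[position]
        except IndexError:
            t0[traj_id] = None
    return t0


rows = {"a": [3, 7, 11], "b": [5]}
print(t0_by_position(rows, 0))   # {'a': 3, 'b': 5}
print(t0_by_position(rows, -1))  # {'a': 11, 'b': 5}
print(t0_by_position(rows, 1))   # {'a': 7, 'b': None}
```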
- static_data(features: list[str] | str | None = None, fmt: Literal['pandas', 'polars'] = 'pandas', use_arrow: bool = True) → DataFrame | DataFrame | None
Return trajectory-level static data for visible trajectories.
- Parameters:
features – Static feature name(s) to include. None → all visible static features.
fmt – "pandas" (default) or "polars".
use_arrow – Use Arrow extension arrays for the Polars → pandas conversion.
- Returns:
One-row-per-trajectory DataFrame with columns [id, feature...], or None when no static features are exposed by this pool view.
To restrict to a subset of IDs, use pool.subset(ids).static_data().
- subset(ids, *, inplace: bool = False) → TrajectoryPool
Return a view restricted to the given trajectory IDs.
All IDs must be present in the current unique_ids (i.e. they must pass the existing mask, if any). The new view inherits the full pool state (casts, virtual features, alias mask).
- Parameters:
ids – Trajectory ID(s) to keep. A single value is accepted and treated as a one-element list.
inplace – If True, modify this pool in place rather than returning a new instance.
- Returns:
A TrajectoryPool restricted to ids (or self when inplace=True).
- Raises:
KeyError – If any ID is not present in unique_ids.
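The ID validation described above can be modelled in a few lines. validate_subset is a hypothetical helper, not tanat code. Which container types count as "several IDs" is an assumption here: lists, tuples and sets do, and anything else, including a string, is wrapped as a single ID.

```python
def validate_subset(ids, unique_ids):
    """Normalise ids to a list and check every ID against the visible ones.

    Hypothetical helper: lists, tuples and sets count as several IDs; any
    other value (including a string) is treated as a one-element list.
    """
    if not isinstance(ids, (list, tuple, set)):
        ids = [ids]
    visible = set(unique_ids)
    missing = [i for i in ids if i not in visible]
    if missing:
        raise KeyError(f"IDs not in unique_ids: {missing}")
    return list(ids)


unique_ids = ["a", "b", "c"]
print(validate_subset("b", unique_ids))         # ['b']
print(validate_subset(["a", "c"], unique_ids))  # ['a', 'c']
try:
    validate_subset(["a", "z"], unique_ids)
except KeyError as exc:
    print(exc)  # reports the missing ID 'z'
```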
- t0_data(fmt: Literal['pandas', 'polars'] = 'pandas', use_arrow: bool = True) → DataFrame | DataFrame
Return the T0 table for all visible trajectories.
Columns: [id_col, _T0_, <alias1>_T0_NEAREST_RANK_, ...]. Each alias gets its own nearest-rank column because the floor lookup depends on the alias-specific temporal index.
- Parameters:
fmt – "pandas" (default) or "polars".
use_arrow – Use Arrow extension arrays for the Polars → pandas conversion.
- Returns:
One row per visible trajectory ID.
- to_tensor(features: dict[str, list[str] | str], bin_size: str | int | float, max_bins: int | None = None, fill_value: Any = None, overlap_rule: str = 'first', ohe: bool = False, bin_col: str = '__bin__') → tuple[ndarray, list, list[str]]
Project all aliases onto a single 3-D tensor with prefixed labels.
The K axis stacks features from every alias, ordered by the iteration order of features and then by feature order within each alias. The returned feature_names list mirrors that ordering exactly.
For a long-format DataFrame variant (joins, plotting, exploration), see binned_data().
- Parameters:
features – Same as binned_data().
bin_size – Same semantics as binned_data().
max_bins – Same semantics as binned_data().
fill_value – Same semantics as binned_data().
overlap_rule – Same semantics as binned_data().
ohe – Same semantics as binned_data().
bin_col – Same semantics as binned_data().
- Returns:
A 3-tuple (arr, ids, feature_names), where:
arr has shape (N, M, K) = (len(unique_ids), n_bins, len(feature_names));
ids is the trajectory ID sequence matching the N-axis order (identical to unique_ids);
feature_names lists the K-axis labels in column order, prefixed "{alias}_{feat}".
- Return type:
tuple[ndarray, list, list[str]]
Examples:
    arr, ids, names = tpool.to_tensor(
        {"drugs": "dose", "labs": ["hb", "wbc"]}, "1d"
    )
    # names == ["drugs_dose", "labs_hb", "labs_wbc"]
    # arr.shape == (N, M, 3)
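The K-axis ordering rule can be reproduced in plain Python. tensor_feature_names is a hypothetical helper. It promotes str values to one-element lists as binned_data() describes, and it ignores the column expansion that ohe=True would add. It reproduces the names from the example above.

```python
def tensor_feature_names(features):
    """K-axis labels: aliases in dict order, then features in per-alias order.

    Hypothetical helper; str values are promoted to one-element lists and
    one-hot expansion (ohe=True) is not modelled.
    """
    names = []
    for alias, feats in features.items():
        if isinstance(feats, str):
            feats = [feats]
        names.extend(f"{alias}_{feat}" for feat in feats)
    return names


print(tensor_feature_names({"drugs": "dose", "labs": ["hb", "wbc"]}))
# ['drugs_dose', 'labs_hb', 'labs_wbc']
```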
- train_test_split(*, test_size: float | int | None = None, train_size: float | int | None = None, random_state: int | None = None, shuffle: bool = True) → tuple[TrajectoryPool, TrajectoryPool]
Split the pool into train and test subsets.
Mirrors the interface of sklearn.model_selection.train_test_split().
- Parameters:
test_size – Proportion (float in (0, 1)) or absolute count (int) of samples for the test subset. Defaults to 0.25 when both test_size and train_size are None.
train_size – Proportion (float in (0, 1)) or absolute count (int) of samples for the train subset. Defaults to the complement of test_size.
random_state – Seed for the random number generator. Pass an integer for reproducibility.
shuffle – Whether to shuffle IDs before splitting. When False, the first IDs go to train and the last to test.
- Returns:
(train_pool, test_pool): two new, non-overlapping pool views.
- Raises:
ValueError – If the pool is empty, sizes are non-positive, or n_train + n_test exceeds the pool size.
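The size resolution and ID split can be sketched on a plain list of IDs. split_sizes and split_ids are hypothetical helpers. The rounding follows scikit-learn's convention (ceil for a test proportion, floor for a train proportion), and that convention is an assumption here. tanat only states that it mirrors scikit-learn's interface, so its exact rounding may differ.

```python
import math
import random


def split_sizes(n, test_size=None, train_size=None):
    """Resolve (n_train, n_test) with sklearn-style rounding (assumed)."""
    if n == 0:
        raise ValueError("cannot split an empty pool")
    if test_size is None and train_size is None:
        test_size = 0.25

    def resolve(size, rounding):
        # Floats are proportions in (0, 1); ints are absolute counts.
        if isinstance(size, float):
            if not 0.0 < size < 1.0:
                raise ValueError(f"proportion must be in (0, 1), got {size}")
            return rounding(size * n)
        return size

    n_test = resolve(test_size, math.ceil) if test_size is not None else None
    n_train = resolve(train_size, math.floor) if train_size is not None else None
    if n_train is None:
        n_train = n - n_test  # train defaults to the complement of test
    if n_test is None:
        n_test = n - n_train
    if n_train <= 0 or n_test <= 0:
        raise ValueError("train and test sizes must be positive")
    if n_train + n_test > n:
        raise ValueError("n_train + n_test exceeds the pool size")
    return n_train, n_test


def split_ids(ids, test_size=None, train_size=None, random_state=None, shuffle=True):
    """Return (train_ids, test_ids): first n_train IDs to train, last n_test to test."""
    n_train, n_test = split_sizes(len(ids), test_size, train_size)
    ids = list(ids)
    if shuffle:
        random.Random(random_state).shuffle(ids)  # seeded for reproducibility
    return ids[:n_train], ids[len(ids) - n_test:]


print(split_sizes(10))                                  # (7, 3)
print(split_ids(range(8), test_size=2, shuffle=False))  # ([0, 1, 2, 3, 4, 5], [6, 7])
```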
- property unique_ids: list
Visible trajectory IDs as a plain Python list.
Respects _id_mask.
Warning
Converting to a list erases rich Polars dtypes. Prefer _id_lf when the result feeds a Polars join.
- which(criterion: Criterion, *, verbose: bool = True) → set
Return the set of IDs in this pool satisfying criterion.
- Parameters:
criterion – A Criterion instance.
verbose – If True, print a one-line report.
- Returns:
Set of matching IDs.
- Raises:
TypeError – If criterion is not a Criterion object.
CriterionLevelError – If the criterion is incompatible with the Trajectory level.
tanat.trajectory.settings module
Settings for TrajectoryPool views.
Mirrors the SequenceSettings pattern: the Store holds all
columns on disk; the view exposes only the features listed here.
- class tanat.trajectory.settings.TrajectorySettings(*, id_column: str = '_traj_id', static_features: list[str] = <factory>)
Bases: object
View-layer settings for a TrajectoryPool.
- static_features
Feature names visible in static_data(). An empty list means no static features are exposed (the default until add_static_features() is called).
- Type:
list[str]
- get_column_rename_map() → dict[str, str]
Returns the mapping from store-internal column names to user-facing names.
Currently only the trajectory-ID column is renamed: _traj_id → id_column.
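Applying the rename map to a list of store columns can be illustrated as follows. column_rename_map is a hypothetical stand-in that returns the single documented entry (_traj_id → id_column), and the store column names are illustrative. Columns without an entry keep their names.

```python
def column_rename_map(id_column, internal_id="_traj_id"):
    """Store-internal -> user-facing names; only the trajectory ID is renamed."""
    return {internal_id: id_column}


rename = column_rename_map("patient_id")
store_columns = ["_traj_id", "age", "sex"]  # illustrative store schema
print([rename.get(col, col) for col in store_columns])  # ['patient_id', 'age', 'sex']
```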
- model_dump(*, mode='python', **dump_kwargs)
Dump settings to a dict via Pydantic serialization.
- validate_features(features: list[str] | str, *, on_missing: str = 'raise') → list[str]
Validates explicit feature names against the current settings.
- Parameters:
features – Feature name(s) to validate.
on_missing – "raise" (default), "warn" or "ignore".
- Returns:
List of validated feature names.
- Raises:
KeyError – If on_missing="raise" and a feature is missing.
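The three on_missing modes can be modelled with the standard warnings module. This is a hypothetical re-implementation, not tanat's code. It assumes that in "warn" and "ignore" modes the unknown names are dropped from the returned list. The available list stands in for the settings' known features.

```python
import warnings


def validate_features(features, available, on_missing="raise"):
    """Normalise features to a list and check them against available names.

    Hypothetical model of the documented on_missing policy; unknown names
    are assumed to be dropped in "warn" and "ignore" modes.
    """
    if on_missing not in ("raise", "warn", "ignore"):
        raise ValueError(f"invalid on_missing: {on_missing!r}")
    if isinstance(features, str):
        features = [features]
    missing = [f for f in features if f not in available]
    if missing and on_missing == "raise":
        raise KeyError(f"unknown features: {missing}")
    if missing and on_missing == "warn":
        warnings.warn(f"ignoring unknown features: {missing}")
    return [f for f in features if f in available]


print(validate_features("age", ["age", "sex"]))                                # ['age']
print(validate_features(["age", "bmi"], ["age", "sex"], on_missing="ignore"))  # ['age']
```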
tanat.trajectory.shortcuts module
Quick-build helper for trajectory pools.
- tanat.trajectory.shortcuts.build_trajectories(pools: dict[str, SequencePool], *, static_data: pd.DataFrame | pl.DataFrame | pl.LazyFrame | None = None, id_column: str | None = None, store_name: str | None = None) → TrajectoryPool
Build a TrajectoryPool from a dict of pre-built sequence pools.
- Parameters:
pools – Mapping of {alias: SequencePool}. Each alias becomes the key used to access the sub-sequence inside a trajectory (e.g. traj["admissions"]).
static_data – Optional DataFrame or LazyFrame with per-trajectory static features. When provided, id_column must also be set.
id_column – Name of the ID column in static_data. Required when static_data is not None; ignored otherwise.
store_name – Name for the on-disk store. When None, a unique name is generated automatically (_quick_trajectory_<hex8>).
- Returns:
A ready-to-use TrajectoryPool.
- Raises:
ValueError – If static_data is provided without id_column, or if id_column is absent from static_data.
Examples:
    from tanat.trajectory.shortcuts import build_trajectories

    # adm_pool and proc_pool are pre-built SequencePool instances
    tpool = build_trajectories(
        pools={"admissions": adm_pool, "procedures": proc_pool},
    )
    tpool[tpool.unique_ids[0]]["admissions"].temporal_data(fmt="polars")
tanat.trajectory.trajectory module
Single trajectory: sequences sharing the same ID across stores.
- class tanat.trajectory.trajectory.Trajectory(id_value, store: str | Path | TrajectoryStore, *, id_column: str = 'id', static_features: list[str] | None = None)
Bases: TrajectoryViewMixin, CachableSettings
Access every Sequence that shares a given ID across the linked stores.
Usage:
    traj["medical"]                       → Sequence
    "medical" in traj                     → bool
    for alias in traj: ...                → iterates aliases
    for alias, seq in traj.items(): ...
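The usage pattern above is the standard read-only mapping protocol. The toy class below shows how collections.abc.Mapping supplies `in`, iteration and items() from three methods. ToyTrajectory is hypothetical. The documented bases of Trajectory are TrajectoryViewMixin and CachableSettings, so this sketch does not claim that Trajectory subclasses Mapping.

```python
from collections.abc import Mapping


class ToyTrajectory(Mapping):
    """Hypothetical stand-in for the dict-like access Trajectory documents."""

    def __init__(self, sequences):
        self._sequences = dict(sequences)  # alias -> sequence placeholder

    def __getitem__(self, alias):
        return self._sequences[alias]

    def __iter__(self):
        return iter(self._sequences)  # iterating yields aliases

    def __len__(self):
        return len(self._sequences)


traj = ToyTrajectory({"medical": ["admit", "discharge"], "lab": ["hb"]})
print(traj["medical"])     # ['admit', 'discharge']
print("lab" in traj)       # True
print(list(traj))          # ['medical', 'lab']
print(dict(traj.items()))  # {'medical': ['admit', 'discharge'], 'lab': ['hb']}
```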
- SETTINGS_CLASS
alias of TrajectorySettings
- __init__(id_value, store: str | Path | TrajectoryStore, *, id_column: str = 'id', static_features: list[str] | None = None) → None
Create a trajectory view for id_value.
- Parameters:
id_value – Trajectory identifier.
store – Store path, name, or TrajectoryStore instance.
id_column – User-facing name for the trajectory ID column.
static_features – Static feature names to expose. None → all available; [] → none.
- describe(separator: str = '_', fmt: Literal['pandas', 'polars'] = 'pandas', use_arrow: bool = True) → DataFrame | DataFrame
Compute summary statistics for this single trajectory.
Calls seq.describe() for each visible sequence and prefixes metric columns with {alias}{separator}. The result is a single-row DataFrame.
- Parameters:
separator – Separator between alias and metric name (default "_").
fmt – "pandas" (default) or "polars".
use_arrow – Use Arrow extension arrays for the Polars → pandas conversion.
- Returns:
Single-row DataFrame with columns [n_sequences, {alias}{sep}length, …].
Examples:
    traj = traj_pool[42]
    traj.describe()
    traj.describe(separator=".", fmt="polars")
- classmethod from_parent(id_value, store: TrajectoryStore, settings: TrajectorySettings, *, parent_pool: TrajectoryPool, alias_mask: set[str] | None = None) → Trajectory
Create a pool-managed trajectory. Not part of the public API.
Bypasses store resolution, feature resolution, and the cast probe, all of which the pool has already performed. Pool context (casts, virtual ID, sequence pools, metadata) is read lazily from parent_pool via the corresponding properties.
- Parameters:
id_value – Trajectory identifier.
store – Already-resolved TrajectoryStore.
settings – Fully resolved TrajectorySettings.
parent_pool – The owning TrajectoryPool.
alias_mask – Override for the pool-level alias mask (used by TrajectoryPool.get_trajectories() with explicit aliases).
- Returns:
A new Trajectory instance bound to parent_pool.
- items() → Iterator[tuple[str, Sequence]]
Yield (alias, sequence) pairs; mirrors dict.items().
- match(criterion: Criterion) → bool
Return True if this trajectory satisfies criterion.
- Parameters:
criterion – A Criterion instance.
- Raises:
TypeError – If criterion is not a Criterion object.
CriterionLevelError – If the criterion is incompatible with trajectories.
- property sequences: dict
All visible Sequence instances for this trajectory, keyed by store alias.
Cached per trajectory state: built once and reused across calls. Invalidated automatically when the underlying settings change.
- static_data(features: list[str] | str | None = None, fmt: Literal['pandas', 'polars'] = 'pandas', use_arrow: bool = True) → DataFrame | DataFrame | None
Return trajectory-level static data for this trajectory only.
- Parameters:
features – Static feature name(s) to include. None → all visible static features.
fmt – "pandas" (default) or "polars".
use_arrow – Use Arrow extension arrays for the Polars → pandas conversion.
- Returns:
One-row DataFrame with [id, feature...], or None when no static features are available in the current view.
- property t0: datetime | date | int | float | None
T0 value for this trajectory. None when T0 could not be determined (e.g. no matching row).
- property t0_nearest_rank: dict[str, int | None]
Per-alias nearest rank at or before T0.
Returns a dict keyed by visible alias name, e.g. {"medical": 2, "lab": 5}. The value is None when T0 is None or when no row satisfies start <= T0 in that alias. An empty dict is returned for standalone (non-pool) trajectories.
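The "at or before T0" floor lookup maps naturally onto bisect.bisect_right over sorted start times. nearest_rank is a hypothetical helper, not tanat code. It assumes 0-based ranks over one alias's start times for a single trajectory. It returns None when T0 is None or when no row satisfies start <= T0.

```python
from bisect import bisect_right


def nearest_rank(starts, t0):
    """0-based rank of the last row with start <= t0, or None.

    Hypothetical helper; `starts` must be sorted ascending (one alias's
    temporal index for a single trajectory).
    """
    if t0 is None:
        return None
    idx = bisect_right(starts, t0) - 1  # last position whose start <= t0
    return idx if idx >= 0 else None


starts_by_alias = {"medical": [1, 4, 9], "lab": [2, 3, 5, 6, 8, 12]}
t0 = 8
print({alias: nearest_rank(s, t0) for alias, s in starts_by_alias.items()})
# {'medical': 1, 'lab': 4}
```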
tanat.trajectory.view_mixin module
TrajectoryViewMixin: shared view-layer logic for TrajectoryPool and Trajectory.
Both TrajectoryPool and Trajectory are scoped views on a
TrajectoryStore.
- class tanat.trajectory.view_mixin.TrajectoryFrameAssembler(view: Trajectory | TrajectoryPool)
Bases: object
Assembles view-schema LazyFrames from the store for one trajectory view.
- __init__(view: Trajectory | TrajectoryPool) → None
- select(lf: LazyFrame, feature_names: list[str]) → LazyFrame
Select the trajectory ID column plus feature_names.
- class tanat.trajectory.view_mixin.TrajectoryViewMixin
Bases: object
Mixin providing view-layer helpers shared by TrajectoryPool and Trajectory.
- apply(exprs: Expr | list[Expr], *, lazy: bool = False, to_pandas: bool = False) → LazyFrame | DataFrame | DataFrame
Evaluates Polars expressions against trajectory-level static features.
This is a read-only computation: the result is returned, not stored. Use add_static_features() to persist the result.
Each expression must produce a named column (.alias()).
- Parameters:
exprs – One or more Polars expressions producing new columns.
lazy – If True, returns a pl.LazyFrame (no collect).
to_pandas – If True, returns a pandas.DataFrame.
- Returns:
The computed columns as a DataFrame (or LazyFrame).
- Raises:
ValueError – If no static features are available.
Examples:
    import polars as pl

    result = pool.apply(
        (pl.col("score") * pl.col("weight")).alias("weighted_score"),
    )
    pool.add_static_features(result)
- property metadata: TrajectoryMetadata
Returns trajectory-level metadata, fully reflecting this view’s cast recipes, masks, and feature selection.
When created from a parent TrajectoryPool, the pool’s metadata is returned directly, or scoped by the view’s settings when the view was built with a feature subset.
For a standalone view, the traj_id dtype is derived from the cast recipe (or from the store schema when no cast is active); no full plan traversal is required.
Automatically cached via CachableSettings: the cache is invalidated whenever settings change.
Module contents
Trajectory module.
- class tanat.trajectory.Trajectory(id_value, store: str | Path | TrajectoryStore, *, id_column: str = 'id', static_features: list[str] | None = None)[source]#
Bases:
TrajectoryViewMixin,CachableSettingsAccess every
Sequencethat shares a given ID across the linked stores.Usage:
traj["medical"] → Sequence "medical" in traj → bool for alias in traj: ... → iterates aliases for alias, seq in traj.items(): ...
- SETTINGS_CLASS[source]#
alias of
TrajectorySettings
- __init__(id_value, store: str | Path | TrajectoryStore, *, id_column: str = 'id', static_features: list[str] | None = None) None[source]#
Create a trajectory view for id_value.
- Parameters:
id_value – Trajectory identifier.
store – Store path, name, or
TrajectoryStoreinstance.id_column – User-facing name for the trajectory ID column.
static_features – Static feature names to expose.
None→ all available.[]→ none.
- describe(separator: str = '_', fmt: Literal['pandas', 'polars'] = 'pandas', use_arrow: bool = True) DataFrame | DataFrame[source]#
Compute summary statistics for this single trajectory.
Calls
seq.describe()for each visible sequence and prefixes metric columns with{alias}{separator}. The result is a single-row DataFrame.- Parameters:
separator – Separator between alias and metric name (default
_).fmt –
"pandas"(default) or"polars".use_arrow – Use Arrow extension arrays for polars -> pandas conversion.
- Returns:
Single-row DataFrame with columns
[n_sequences, {alias}{sep}length, …].
Examples:
traj = traj_pool[42] traj.describe() traj.describe(separator=".", fmt="polars")
- classmethod from_parent(id_value, store: TrajectoryStore, settings: TrajectorySettings, *, parent_pool: TrajectoryPool, alias_mask: set[str] | None = None) → Trajectory
Create a pool-managed trajectory. Not part of the public API.
Bypasses store resolution, feature resolution, and the cast probe, all of which the pool has already performed. Pool context (casts, virtual ID, sequence pools, metadata) is read lazily from parent_pool through the corresponding properties.
- Parameters:
id_value – Trajectory identifier.
store – Already-resolved TrajectoryStore.
settings – Fully-resolved TrajectorySettings.
parent_pool – The owning TrajectoryPool.
alias_mask – Overrides the pool-level alias mask (used by TrajectoryPool.get_trajectories() with explicit aliases).
- Returns:
A new Trajectory instance bound to parent_pool.
- items() → Iterator[tuple[str, Sequence]]
Yield (alias, sequence) pairs; mirrors dict.items().
- match(criterion: Criterion) → bool
Return True if this trajectory satisfies criterion.
- Parameters:
criterion – A Criterion instance.
- Raises:
TypeError – If criterion is not a Criterion object.
CriterionLevelError – If the criterion is incompatible with trajectories.
- property sequences: dict
All visible Sequence instances for this trajectory, keyed by store alias.
Cached per trajectory state: built once and reused across calls, and invalidated automatically when the underlying settings change.
- static_data(features: list[str] | str | None = None, fmt: Literal['pandas', 'polars'] = 'pandas', use_arrow: bool = True) → DataFrame | DataFrame | None
Return trajectory-level static data for this trajectory only.
- Parameters:
features – Static feature name(s) to include. None → all visible static features.
fmt – "pandas" (default) or "polars".
use_arrow – Use Arrow extension arrays for the polars → pandas conversion.
- Returns:
One-row DataFrame with columns [id, feature...], or None when no static features are available in the current view.
- property t0: datetime | date | int | float | None
T0 value for this trajectory, or None when T0 could not be determined (e.g. no matching row).
- property t0_nearest_rank: dict[str, int | None]
Per-alias nearest rank at or before T0.
Returns a dict keyed by visible alias name, e.g. {"medical": 2, "lab": 5}. A value is None when T0 is None or when no row in that alias satisfies start <= T0. An empty dict is returned for standalone (non-pool) trajectories.
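The "nearest rank at or before T0" is a floor lookup on each alias's sorted start times. A minimal sketch with the standard bisect module follows; it assumes the start values are already sorted and that ranks are 0-based row positions, neither of which the reference above confirms. `nearest_rank` is a hypothetical helper.

```python
from __future__ import annotations

from bisect import bisect_right


def nearest_rank(starts: list[float], t0: float | None) -> int | None:
    """Return the index of the last start <= t0, or None if there is none."""
    if t0 is None:
        return None
    # bisect_right returns the insertion point after any entries equal to t0,
    # so the element just before it is the last start <= t0.
    pos = bisect_right(starts, t0)
    return pos - 1 if pos > 0 else None


starts_by_alias = {"medical": [1.0, 4.0, 9.0], "lab": [5.0, 6.0]}
ranks = {alias: nearest_rank(starts, 4.5) for alias, starts in starts_by_alias.items()}
print(ranks)  # {'medical': 1, 'lab': None}
```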
- class tanat.trajectory.TrajectoryPool(store: str | Path | TrajectoryStore, *, id_column: str = 'id', static_features: list[str] | None = None, cast_recipe: TrajectoryCastRecipe | dict | None = None)
Bases: TrajectoryViewMixin, CachableSettings
Aggregates SequencePool views into trajectories.
Accepts a store name, path, or TrajectoryStore instance, following the same convention as SequencePool.
Usage:
    store_path = (
        TrajectoryPool.builder()
        .add("medical", medical_pool)
        .add("lab", lab_pool)
        .build("./my_trajectories")
    )
    pool = TrajectoryPool(store="./my_trajectories")
- SETTINGS_CLASS
alias of TrajectorySettings
- __init__(store: str | Path | TrajectoryStore, *, id_column: str = 'id', static_features: list[str] | None = None, cast_recipe: TrajectoryCastRecipe | dict | None = None) → None
Create a trajectory pool backed by store.
- Parameters:
store – Store path, name, or TrajectoryStore instance.
id_column – User-facing name for the trajectory ID column.
static_features – Static feature names to expose: None → all available; [] → none.
cast_recipe – Optional cast recipe (or dict) applied at read time. Only the id and static fields are meaningful at this level. Normalised via TrajectoryCastRecipe.coerce() and probed eagerly.
- Raises:
TypeError – If cast_recipe is not a TrajectoryCastRecipe, dict, or None.
- add_static_features(df: DataFrame | LazyFrame | DataFrame, *, id_column: str | None = None, overwrite: bool = False) → None
Add static features to the trajectory pool via an ID-keyed join.
The input DataFrame must include the trajectory ID column, either under settings.id_column or under the name given by id_column. A LEFT JOIN against the full trajectory index is performed internally, so partial DataFrames (covering only a subset of trajectory IDs) are valid: absent IDs receive null in the virtual context.
Because alignment is handled by the join rather than by row position, this method works on views with pending changes (casts, virtual features, masks). Only the IDs visible in the view are exposed when reading back with static_data().
- Parameters:
df – DataFrame containing the ID column plus one or more feature columns. Can be pandas, Polars eager, or Polars lazy.
id_column – Name of the ID column in df. Defaults to settings.id_column when None. Pass an explicit name when the join key in df differs from the pool’s public ID name (e.g. id_column="traj_id").
overwrite – If True, replaces features that already exist in the virtual context.
- Raises:
KeyError – If the resolved ID column is not found in df.
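Because the features are attached with a LEFT JOIN on the trajectory index, a partial table is enough: every indexed ID keeps its row, IDs missing from df get null, and IDs that are not in the index are dropped, as in any left join. The pure-Python sketch below models those semantics with dicts (None plays the role of null). `left_join_features` is hypothetical and is not the library's code.

```python
def left_join_features(index_ids: list, features: dict, feature_names: list[str]) -> dict:
    """Attach per-ID features to every ID of the index; absent IDs get None."""
    empty = {name: None for name in feature_names}
    # The index drives the output: extra IDs in `features` are dropped,
    # missing ones fall back to the all-None row.
    return {tid: {**empty, **features.get(tid, {})} for tid in index_ids}


index_ids = [1, 2, 3]
partial = {1: {"group": "A"}, 3: {"group": "B"}, 99: {"group": "Z"}}
result = left_join_features(index_ids, partial, ["group"])
print(result)  # {1: {'group': 'A'}, 2: {'group': None}, 3: {'group': 'B'}}
```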
- binned_data(features: dict[str, list[str] | str], bin_size: str | int | float, max_bins: int | None = None, fill_value: Any = None, overlap_rule: str = 'first', ohe: bool = False, fmt: Literal['pandas', 'polars'] = 'pandas', use_arrow: bool = True, bin_col: str = '__bin__') → DataFrame | DataFrame
Project all aliases onto a single shared binned table (long format).
All sub-pools share one global (t_min, t_max, bin_size) axis, derived from the union of their temporal spans. Output columns are prefixed "{alias}_{feature}" to avoid collisions.
For an ML-ready 3-D tensor with feature labels and ID order, see to_tensor().
- Parameters:
features – Mapping {alias: feature(s)}. Each alias must exist in this pool; str values are auto-promoted to [str].
bin_size – Bin width on the shared axis.
max_bins – Maximum number of bins. When None, the bin count is capped at MAX_BINS_LIMIT; an explicit value bypasses that cap, so the caller opts in knowingly.
fill_value – Applied once, after the cross-join over trajectory IDs. Per-alias fills are not applied.
overlap_rule – In-bin aggregation rule, applied per alias.
ohe – One-hot encode per alias. Output names reflect the post-OHE columns.
fmt – "pandas" or "polars".
use_arrow – Arrow-backed pandas conversion.
bin_col – Name of the output bin-index column.
- Returns:
DataFrame with columns [traj_id, bin_col, "{alias1}_{feat1}", "{alias1}_{feat2}", ..., "{alias2}_{feat1}", ...].
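The shared axis can be sketched as follows: take the union of every alias's time span to get one (t_min, t_max), then map each timestamp to floor((t - t_min) / bin_size). The sketch assumes numeric timestamps and a numeric bin_size (the real method also accepts strings such as "1d"), and the "+1" that keeps t_max inside the last bin is an assumed edge rule, not documented behaviour. `shared_axis` and `bin_index` are hypothetical helpers.

```python
import math


def shared_axis(spans: dict[str, tuple[float, float]], bin_size: float) -> tuple[float, int]:
    """Return (t_min, n_bins) for one global axis covering every alias."""
    t_min = min(lo for lo, _ in spans.values())
    t_max = max(hi for _, hi in spans.values())
    # Assumed edge rule: +1 so that t_max itself falls inside the last bin.
    n_bins = math.floor((t_max - t_min) / bin_size) + 1
    return t_min, n_bins


def bin_index(t: float, t_min: float, bin_size: float) -> int:
    """Bin position of timestamp t on the shared axis."""
    return math.floor((t - t_min) / bin_size)


spans = {"drugs": (0.0, 10.0), "labs": (3.0, 14.0)}
t_min, n_bins = shared_axis(spans, bin_size=5.0)
print(t_min, n_bins)                # 0.0 3
print(bin_index(12.0, t_min, 5.0))  # 2
# Output columns carry the alias prefix to avoid collisions.
print([f"{a}_{f}" for a, f in [("drugs", "dose"), ("labs", "hb")]])  # ['drugs_dose', 'labs_hb']
```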
- classmethod builder() → TrajectoryStoreBuilder
Return a fluent builder for constructing a trajectory store.
- cast_id(dtype: DataType) → None
Casts the trajectory ID column to a new type.
The cast is propagated automatically to all linked sequence pools (accessible via sequence_pools), so entity data and static data surface IDs with the same type at every level.
- Parameters:
dtype – Target Polars DataType (e.g. pl.String, pl.UInt32).
- Raises:
TypeError – If the cast is incompatible with the stored ID values.
- cast_static_features(schema: dict[str, DataType | type]) → None
Casts trajectory-level static-feature columns to new types.
Only static features can be cast at trajectory level: entity features live inside the linked sequence stores and must be cast there.
- Parameters:
schema – Dictionary mapping feature names to target Polars DataTypes (e.g. {"group": pl.Categorical}).
- Raises:
TypeError – If schema is not a dict.
KeyError – If a feature name does not exist in the current view.
- cast_to_datetime(unit: str = 'us', time_zone: str | None = None) → None
Casts time columns to Datetime across all linked sequence pools.
All sequence stores are guaranteed to share the same temporal schema (enforced at build time), so a single probe against the trajectory store is sufficient, exactly as in cast_id(). The cast is stored in the trajectory-level recipe and re-propagated to every pool on the next sequence_pools access.
- Parameters:
unit – Datetime resolution ("ms", "us", "ns"). Default is "us" (microseconds).
time_zone – Optional time zone string (e.g. "UTC", "Europe/Paris").
- Raises:
ValueError – If unit is not one of the accepted values.
TypeError – If the cast is incompatible with the temporal data.
- cast_to_timestep(dtype: DataType = Int64) → None
Casts time columns to numeric timesteps across all linked sequence pools.
All sequence stores are guaranteed to share the same temporal schema (enforced at build time), so a single probe against the trajectory store is sufficient, exactly as in cast_id(). The cast is stored in the trajectory-level recipe and re-propagated to every pool on the next sequence_pools access.
- Parameters:
dtype – Target numeric type (e.g. pl.UInt32, pl.Int64, pl.Float64). Default is pl.Int64.
- Raises:
TypeError – If dtype is not a numeric type, or if the temporal data is already in Datetime format.
- copy() → TrajectoryPool
Return a shallow copy sharing the same store, with all view state preserved.
The new pool references the same TrajectoryStore and the same virtual context (_virtual_id), so virtual features are immediately visible.
- Returns:
A new TrajectoryPool with identical settings, casts, masks, and virtual context.
Note
Chaining with save() produces a fully independent pool at a new path without mutating the original instance:
    pool2 = TrajectoryPool(store=pool.copy().save("other_path"))
Use this when you need both the original and a snapshot at a new destination. pool.save("other_path") alone would redirect pool itself to "other_path".
- describe(by_id: bool = True, add_to_static: bool = False, separator: str = '_', fmt: Literal['pandas', 'polars'] = 'pandas', use_arrow: bool = True) → DataFrame | DataFrame
Compute summary statistics across all sequences and all trajectories.
- Parameters:
by_id – If True (default), return one row per trajectory. If False, return a cross-trajectory pandas.describe() summary.
add_to_static – If True, persist the per-ID result via add_static_features(). Ignored (with a warning) when by_id=False.
separator – Separator between alias and metric name (default _).
fmt – "pandas" (default) or "polars".
use_arrow – Use Arrow extension arrays for the polars → pandas conversion.
- Returns:
DataFrame with columns [id, n_sequences, {alias}{sep}length, …].
Examples:
    traj_pool.describe()
    traj_pool.describe(separator=".")
    traj_pool.describe(by_id=False)
    traj_pool.describe(add_to_static=True)
- drop_sequence_pools(*aliases: str) → None
Hides one or more store aliases from this view.
The underlying TrajectoryStore is not modified. Only the pool’s visible aliases (and derived properties such as trajectory_index and unique_ids) are affected.
- Parameters:
aliases – One or more alias names to hide.
- Raises:
RuntimeError – If the pool is not built yet.
KeyError – If an alias does not exist in the store.
- drop_static_features(features: list[str] | str, *, permanently: bool = False) → None
Removes static features from the view (and optionally from disk).
By default this is a soft drop: the features are removed from the settings so they no longer appear in static_data(), but the underlying data is left untouched.
With permanently=True the columns are also deleted from disk and from the virtual context (irreversible).
- Parameters:
features – Feature name(s) to drop.
permanently – If True, also remove them from disk and from the virtual context.
- extend(other: TrajectoryPool | Trajectory, destination: str | Path | None = None, *, on_duplicate: Literal['raise', 'skip'] = 'raise', overwrite: bool = False) → TrajectoryPool
Merge other into this trajectory pool and write the result to disk.
Mirrors the semantics of save().
Same-store fast path: if both trajectory pools share _store.root_path and neither carries virtual content (_virtual_id is None on both sides), the merge itself performs no I/O. A new pool backed by the same store, with the union of the ID masks, is returned immediately. If destination is provided, the merged pool is materialised via save(); otherwise it is returned as an in-memory view with zero I/O.
Different stores (or virtual content present): destination is required. For each alias in this pool, extend() is called on the corresponding sub-pools, and the results are assembled into a new trajectory store via the builder. Pass destination=self._store.root_path with overwrite=True to rewrite in place.
- Parameters:
other – Trajectory pool or single Trajectory to merge.
destination – None → in-memory view (same-store fast path only; no I/O); str/Path → materialise the merged data to disk. destination is required when merging from different stores.
on_duplicate – Behaviour when other contains a trajectory ID already present in this pool:
"raise" (default): raise ValueError.
"skip": silently ignore duplicates.
overwrite – Allows overwriting destination when it already exists on disk.
- Returns:
Always a new TrajectoryPool, never self.
- Raises:
TypeError – If other is not a TrajectoryPool or Trajectory.
TypeError – If a sub-pool has an incompatible ID dtype or temporal schema.
ValueError – If a sub-pool in other is missing features present in the corresponding sub-pool of self.
ValueError – If on_duplicate="raise" and duplicate IDs are found.
ValueError – If destination=None and the stores differ.
FileExistsError – If destination exists and overwrite=False.
Note
Aliases present in other but absent from self are ignored (a WARNING is logged). Aliases present in self but absent from other are carried over unchanged.
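The on_duplicate policy can be modeled on plain ID lists. This sketch only illustrates the documented "raise"/"skip" behaviour on trajectory IDs; it does not touch stores, aliases, or schemas, and `merge_ids` is a hypothetical name.

```python
from typing import Literal


def merge_ids(self_ids: list, other_ids: list, on_duplicate: Literal["raise", "skip"] = "raise") -> list:
    """Append other_ids to self_ids following the on_duplicate policy."""
    existing = set(self_ids)
    duplicates = [tid for tid in other_ids if tid in existing]
    if duplicates and on_duplicate == "raise":
        raise ValueError(f"Duplicate trajectory IDs: {duplicates}")
    # "skip": keep self's entry and ignore the incoming duplicate.
    return self_ids + [tid for tid in other_ids if tid not in existing]


print(merge_ids([1, 2, 3], [3, 4], on_duplicate="skip"))  # [1, 2, 3, 4]
```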
- filter_entities(criterion: Criterion, *, alias: str, inplace: bool = False, verbose: bool = True) → TrajectoryPool
Return a new TrajectoryPool view with entities filtered by criterion.
- Parameters:
criterion – A Criterion instance.
alias – Sequence alias to apply the criterion on.
inplace – If True, modify this pool in place instead of returning a new view.
verbose – If True, print a one-line report.
- Returns:
A new TrajectoryPool view with the criterion applied, or self if inplace is True.
- Raises:
TypeError – If criterion is not a Criterion object.
CriterionLevelError – If the criterion is incompatible with entity filtering.
- get_trajectories(static_features: list[str] | None = None, aliases: list[str] | None = None) → dict[str, Trajectory]
All visible Trajectory instances, keyed by ID.
Materialises every trajectory reachable through the current view (respecting _id_mask and _alias_mask). Useful for iteration-heavy workflows where the same trajectory is accessed multiple times.
- Parameters:
static_features – Static features to expose in each Trajectory: None → use the pool-level setting; [] → no static features.
aliases – Sequence-store aliases to expose in each Trajectory: None → use the pool-level alias mask. Must be a subset of the pool’s visible aliases.
- Raises:
KeyError – If any alias in aliases is not visible in the current pool view.
- property is_dirty: bool
True if the pool (or any linked sequence pool) has unsaved state.
Trajectory level: virtual features, ID mask, type casts, soft drops. Sub-pool level: delegates to each pool’s is_dirty.
A dirty pool needs save() with a destination to materialise all pending changes (sub-pool changes cannot be saved in place).
- save(destination: str | Path | None = None, *, overwrite: bool = False, deep: bool = False) → Path
Persists the current pool state to disk.
Trajectory level: trajectory_index.arrow and static_features.arrow are always written, with ID and static casts baked in.
Sequence pools are persisted according to their state:
Modified pools (virtual features or casts) are always saved to destination/stores/<alias>/, regardless of deep.
Unmodified pools are copied when deep=True and referenced by absolute path when deep=False.
For in-place saves the linked sequence stores are never touched (they may be shared with other pool instances).
Without destination, the trajectory-level files are rewritten in place. With destination, a copy is created and the original is left untouched.
- Parameters:
destination – Where to save. Can be:
None → in place;
a workspace store name (containing no / or \);
a filesystem Path or path string.
Passing a path that resolves to the current store root is equivalent to None (treated as in place).
overwrite – Required when saving in place (trajectory-level files will be overwritten). Also allows overwriting an existing destination. Each dirty sub-pool is saved in place at its current location with overwrite=True automatically.
deep – If True, all sequence stores (including unmodified ones) are copied to destination/stores/<alias>/. When False (default), only modified stores are materialised there; the rest are kept as absolute links. Ignored when saving in place.
- Returns:
The Path of the written store: the in-place root when destination is None, otherwise the resolved destination path. Useful for chaining:
    pool2 = TrajectoryPool(store=pool.save("my_trajectories"))
- Raises:
RuntimeError – If saving in place without overwrite=True.
FileExistsError – If destination already exists and overwrite is False.
Note
This method mutates self: after the call, the pool is redirected to destination (its store, masks and virtual context are all reset to the written state). To keep the original instance unchanged while creating an independent copy elsewhere, use copy() first:
    pool2 = TrajectoryPool(store=pool.copy().save("other_path"))
    # pool is still pointing to its original store
Warning
Saving in place with dirty sequence pools will overwrite those stores at their current location (stores/<alias>/ within the trajectory root). If the stores are shared with other pool instances, those instances will also reflect the changes.
Note
With deep=True all links are relative, which suits archiving or transfer. With deep=False, the absolute links to unchanged stores are not portable across machines.
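For a save to a new destination, the placement rules for sequence pools reduce to a small decision table. The sketch below encodes them for one alias; `plan_alias` and its return strings are hypothetical labels, and the in-place case is left out because it follows separate rules.

```python
def plan_alias(alias: str, modified: bool, deep: bool) -> str:
    """Describe where one sequence pool ends up when saving to a new destination."""
    if modified:
        # Virtual features or casts: always materialised, whatever `deep` says.
        return f"write destination/stores/{alias}/"
    if deep:
        # Unmodified, deep save: copy the store alongside (relative links).
        return f"copy to destination/stores/{alias}/"
    # Unmodified, shallow save: keep an absolute link to the existing store.
    return "link by absolute path"


for alias, modified in [("medical", True), ("lab", False)]:
    print(alias, "->", plan_alias(alias, modified, deep=False))
# medical -> write destination/stores/medical/
# lab -> link by absolute path
```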
- property sequence_pools: MappingProxyType
Visible SequencePool instances, keyed by alias.
Returns a read-only mapping filtered by the current alias mask. Direct item assignment (e.g. tpool.sequence_pools[alias] = …) raises TypeError; use subset() or drop_sequence_pools() to change the visible pools.
- set_t0(*, position: int | None = None, direct: datetime | date | int | float | None | dict[Any, datetime | date | int | float | None] = None, feature: str | None = None, query: Expr | None = None, anchor: Literal['start', 'end', 'middle'] | None = None, use_first: bool = True, on: str | None = None) → TrajectoryPool
Configure the T0 strategy for this trajectory pool.
Builds a T0Setter via the registry and delegates to setter.compute_from_trajectory(self, on=on). The setter stores the resulting [id_col, _T0_] DataFrame; per-alias nearest ranks are computed lazily in _get_traj_t0_df().
- Parameters:
position – Row index (0-based; negative indexing supported).
direct – Scalar value or {traj_id: value} dict.
feature – Trajectory-level static feature column name.
query – Polars boolean expression evaluated on the reference sub-pool’s columns.
anchor – "start", "end", or "middle" for interval/state pools.
use_first – Applies to the query strategy only.
on – Alias of the sub-pool used to compute T0. Required for the position and query strategies. Ignored (with a warning) for direct and feature.
- Returns:
self, for chaining.
- Raises:
TypeError – If on is missing for position/query.
KeyError – If on refers to an alias not visible in this pool.
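For the position strategy, T0 comes from the row at a given 0-based index of each trajectory's reference sub-pool, with Python-style negative indexing. This sketch assumes the selected row's start time becomes T0 and that an out-of-range position yields None (consistent with t0 being None when no row matches); both edge-case rules are assumptions, and `t0_from_position` is hypothetical.

```python
def t0_from_position(starts_by_id: dict, position: int) -> dict:
    """Pick starts[position] per trajectory; None when the index is out of range."""
    t0 = {}
    for tid, starts in starts_by_id.items():
        try:
            t0[tid] = starts[position]  # negative positions count from the end
        except IndexError:
            t0[tid] = None
    return t0


starts = {1: [10, 20, 30], 2: [5]}
print(t0_from_position(starts, 0))   # {1: 10, 2: 5}
print(t0_from_position(starts, -1))  # {1: 30, 2: 5}
print(t0_from_position(starts, 2))   # {1: 30, 2: None}
```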
- static_data(features: list[str] | str | None = None, fmt: Literal['pandas', 'polars'] = 'pandas', use_arrow: bool = True) → DataFrame | DataFrame | None
Return trajectory-level static data for visible trajectories.
- Parameters:
features – Static feature name(s) to include. None → all visible static features.
fmt – "pandas" (default) or "polars".
use_arrow – Use Arrow extension arrays for the polars → pandas conversion.
- Returns:
One-row-per-trajectory DataFrame with columns [id, feature...], or None when no static features are exposed by this pool view.
To restrict to a subset of IDs, use pool.subset(ids).static_data().
- subset(ids, *, inplace: bool = False) → TrajectoryPool
Return a view restricted to the given trajectory IDs.
All IDs must be present in the current unique_ids (i.e. they must pass the existing mask, if any). The new view inherits the full pool state (casts, virtual features, alias mask).
- Parameters:
ids – Trajectory ID(s) to keep. A single value is accepted and treated as a one-element list.
inplace – If True, modify this pool in place rather than returning a new instance.
- Returns:
A TrajectoryPool restricted to ids (or self when inplace=True).
- Raises:
KeyError – If any ID is not present in unique_ids.
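The subset contract (accept a scalar, require every ID to be visible) can be sketched as a validation step on plain lists. The helper name is hypothetical, and treating strings as single IDs rather than iterables is an assumption.

```python
def normalise_subset_ids(ids, unique_ids: list) -> list:
    """Promote a scalar to a one-element list and check every ID is visible."""
    if isinstance(ids, (str, bytes)) or not hasattr(ids, "__iter__"):
        ids = [ids]
    ids = list(ids)
    visible = set(unique_ids)
    missing = [tid for tid in ids if tid not in visible]
    if missing:
        raise KeyError(f"IDs not in unique_ids: {missing}")
    return ids


print(normalise_subset_ids(2, [1, 2, 3]))       # [2]
print(normalise_subset_ids([1, 3], [1, 2, 3]))  # [1, 3]
```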
- t0_data(fmt: Literal['pandas', 'polars'] = 'pandas', use_arrow: bool = True) → DataFrame | DataFrame
Return the T0 table for all visible trajectories.
Columns: [id_col, _T0_, <alias1>_T0_NEAREST_RANK_, ...]. Each alias gets its own nearest-rank column because the floor lookup depends on the alias-specific temporal index.
- Parameters:
fmt – "pandas" (default) or "polars".
use_arrow – Use Arrow extension arrays for the polars → pandas conversion.
- Returns:
One row per visible trajectory ID.
- to_tensor(features: dict[str, list[str] | str], bin_size: str | int | float, max_bins: int | None = None, fill_value: Any = None, overlap_rule: str = 'first', ohe: bool = False, bin_col: str = '__bin__') → tuple[ndarray, list, list[str]]
Project all aliases onto a single 3-D tensor with prefixed labels.
The K axis stacks features from every alias, ordered first by the iteration order of features and then by feature order within each alias. The returned feature_names list mirrors that ordering exactly.
For a long-format DataFrame variant (joins, plotting, exploration), see binned_data().
- Parameters:
features – Same as binned_data().
bin_size – Same semantics as binned_data().
max_bins – Same semantics as binned_data().
fill_value – Same semantics as binned_data().
overlap_rule – Same semantics as binned_data().
ohe – Same semantics as binned_data().
bin_col – Same semantics as binned_data().
- Returns:
A 3-tuple (arr, ids, feature_names) where:
arr has shape (N, M, K) = (len(unique_ids), n_bins, len(feature_names));
ids is the trajectory ID sequence matching the N-axis order (identical to unique_ids);
feature_names lists the K-axis labels in column order, prefixed "{alias}_{feat}".
- Return type:
tuple[ndarray, list, list[str]]
Examples:
    arr, ids, names = tpool.to_tensor(
        {"drugs": "dose", "labs": ["hb", "wbc"]}, "1d"
    )
    # names == ["drugs_dose", "labs_hb", "labs_wbc"]
    # arr.shape == (N, M, 3)
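The (N, M, K) layout can be reproduced from a long-format table with plain lists: ids fix the N order, bin indices the M axis, and feature_names the K axis. This sketch assumes a dense grid filled with fill_value wherever a (trajectory, bin) cell has no data, and uses nested lists rather than NumPy to stay dependency-free. `long_to_tensor` is hypothetical.

```python
def long_to_tensor(rows: list[dict], ids: list, n_bins: int, feature_names: list[str], fill_value=None):
    """Build an N x M x K nested list from long-format rows."""
    pos = {tid: i for i, tid in enumerate(ids)}
    arr = [[[fill_value] * len(feature_names) for _ in range(n_bins)] for _ in ids]
    for row in rows:
        n, m = pos[row["id"]], row["__bin__"]
        for k, name in enumerate(feature_names):
            if name in row:
                arr[n][m][k] = row[name]
    return arr


features = {"drugs": "dose", "labs": ["hb", "wbc"]}
# K-axis order: dict iteration order, then feature order within each alias;
# a bare str is promoted to a one-element list.
names = [f"{alias}_{f}" for alias, feats in features.items()
         for f in ([feats] if isinstance(feats, str) else feats)]
rows = [{"id": "a", "__bin__": 0, "drugs_dose": 1.5, "labs_hb": 13.0},
        {"id": "b", "__bin__": 1, "labs_wbc": 7.2}]
arr = long_to_tensor(rows, ["a", "b"], n_bins=2, feature_names=names)
print(names)                                  # ['drugs_dose', 'labs_hb', 'labs_wbc']
print(len(arr), len(arr[0]), len(arr[0][0]))  # 2 2 3
```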
- train_test_split(*, test_size: float | int | None = None, train_size: float | int | None = None, random_state: int | None = None, shuffle: bool = True) → tuple[TrajectoryPool, TrajectoryPool]
Split the pool into train and test subsets.
Mirrors the interface of sklearn.model_selection.train_test_split().
- Parameters:
test_size – Proportion (float in (0, 1)) or absolute count (int) of samples for the test subset. Defaults to 0.25 when both test_size and train_size are None.
train_size – Proportion (float in (0, 1)) or absolute count (int) of samples for the train subset. Defaults to the complement of test_size.
random_state – Seed for the random number generator. Pass an integer for reproducibility.
shuffle – Whether to shuffle IDs before splitting. When False, the first IDs go to train and the last to test.
- Returns:
(train_pool, test_pool): two new, non-overlapping pool views.
- Raises:
ValueError – If the pool is empty, sizes are non-positive, or n_train + n_test exceeds the pool size.
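Since the method mirrors scikit-learn, a plausible way to resolve test_size/train_size into counts is scikit-learn's own rule: ceil for a float test size, floor for a float train size, and the complement when only one is given. Whether TanaT rounds the same way is an assumption; the sketch covers only the size arithmetic and the documented errors, and `split_sizes` is hypothetical.

```python
import math


def split_sizes(n: int, test_size=None, train_size=None) -> tuple[int, int]:
    """Resolve (n_train, n_test), assuming scikit-learn's rounding rules."""
    if n == 0:
        raise ValueError("Cannot split an empty pool.")
    if test_size is None and train_size is None:
        test_size = 0.25  # documented default

    def resolve(size, rounding):
        # Floats are proportions of n; ints are absolute counts.
        return rounding(size * n) if isinstance(size, float) else size

    n_test = resolve(test_size, math.ceil) if test_size is not None else None
    n_train = resolve(train_size, math.floor) if train_size is not None else None
    if n_test is None:
        n_test = n - n_train
    if n_train is None:
        n_train = n - n_test
    if n_train <= 0 or n_test <= 0 or n_train + n_test > n:
        raise ValueError(f"Invalid split: n_train={n_train}, n_test={n_test}, n={n}")
    return n_train, n_test


print(split_sizes(10))                  # (7, 3)
print(split_sizes(10, train_size=0.6))  # (6, 4)
print(split_sizes(10, test_size=2))     # (8, 2)
```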
- property unique_ids: list
Visible trajectory IDs as a plain Python list.
Respects _id_mask.
Warning
A plain list erases rich Polars dtypes. Prefer _id_lf when the result feeds a Polars join.
- which(criterion: Criterion, *, verbose: bool = True) → set
Return the set of IDs in this pool that satisfy criterion.
- Parameters:
criterion – A Criterion instance.
verbose – If True, print a one-line report.
- Returns:
Set of matching IDs.
- Raises:
TypeError – If criterion is not a Criterion object.
CriterionLevelError – If the criterion is incompatible with the trajectory level.
- class tanat.trajectory.TrajectorySettings(*, id_column: str = '_traj_id', static_features: list[str] = <factory>)
Bases: object
View-layer settings for a TrajectoryPool.
- static_features
Feature names visible in static_data(). None means no static features are exposed (the default until add_static_features is called).
- Type:
list[str]
- get_column_rename_map() → dict[str, str]
Returns the mapping from store-internal column names to user-facing names.
Currently only the trajectory-ID column is renamed: _traj_id → id_column.
- model_dump(*, mode='python', **dump_kwargs)
Dump settings to a dict via Pydantic serialization.
- validate_features(features: list[str] | str, *, on_missing: str = 'raise') → list[str]
Validates explicit feature names against the current settings.
- Parameters:
features – Feature name(s) to validate.
on_missing – "raise" (default), "warn", or "ignore".
- Returns:
List of validated feature names.
- Raises:
KeyError – If on_missing="raise" and a feature is missing.
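The three on_missing modes can be sketched as below. Returning only the known features in "warn" and "ignore" mode is an assumption (the reference only says a list of validated names is returned), the `known` parameter stands in for whatever the settings hold, and `check_features` is a hypothetical name.

```python
import warnings


def check_features(features, known: list[str], on_missing: str = "raise") -> list[str]:
    """Return the requested features that exist; handle unknown ones per on_missing."""
    if on_missing not in ("raise", "warn", "ignore"):
        raise ValueError(f"Invalid on_missing: {on_missing!r}")
    requested = [features] if isinstance(features, str) else list(features)
    missing = [f for f in requested if f not in known]
    if missing and on_missing == "raise":
        raise KeyError(f"Unknown features: {missing}")
    if missing and on_missing == "warn":
        warnings.warn(f"Ignoring unknown features: {missing}")
    return [f for f in requested if f in known]


print(check_features("age", ["age", "group"]))                                # ['age']
print(check_features(["age", "bmi"], ["age", "group"], on_missing="ignore"))  # ['age']
```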
- tanat.trajectory.build_trajectories(pools: dict[str, SequencePool], *, static_data: pd.DataFrame | pl.DataFrame | pl.LazyFrame | None = None, id_column: str | None = None, store_name: str | None = None) → TrajectoryPool
Build a TrajectoryPool from a dict of pre-built sequence pools.
- Parameters:
pools – Mapping of {alias: SequencePool}. Each alias becomes the key used to access the sub-sequence inside a trajectory (e.g. traj["admissions"]).
static_data – Optional DataFrame or LazyFrame with per-trajectory static features. When provided, id_column must also be set.
id_column – Name of the ID column in static_data. Required when static_data is not None; ignored otherwise.
store_name – Name for the on-disk store. When None, a unique name is generated automatically (_quick_trajectory_<hex8>).
- Returns:
A ready-to-use TrajectoryPool.
- Raises:
ValueError – If static_data is provided without id_column, or if id_column is absent from static_data.
Examples:
    tpool = build_trajectories(
        pools={"admissions": adm_pool, "procedures": proc_pool},
    )
    tpool[tpool.unique_ids[0]]["admissions"].temporal_data(fmt="polars")