tanat.store.trajectory package
Submodules
tanat.store.trajectory.builder module
TrajectoryStoreBuilder: fluent builder for creating a TrajectoryPool.
- class tanat.store.trajectory.builder.TrajectoryStoreBuilder
Bases: DisplayMixin
Fluent builder for constructing a TrajectoryPool.
Each add() call registers a SequencePool under an alias. Call build() to write the trajectory store to disk and obtain the resolved path, mirroring SequenceStoreBuilder.
Usage:
    store_path = (
        TrajectoryPool.builder()
        .add("admissions", admissions_pool)
        .add("pharmacy", pharmacy_pool)
        .build("my_trajectories")
    )
    pool = TrajectoryPool(store=store_path)
- add(alias: str, pool: SequencePool, *, overwrite: bool = False) → TrajectoryStoreBuilder
Register a SequencePool under alias.
- Parameters:
alias – Short name (e.g. "admissions").
pool – A SequencePool instance.
overwrite – If True, silently replaces an existing alias.
- Returns:
self, for method chaining.
- Raises:
TypeError – If pool is not a SequencePool.
ValueError – If the alias is already registered and overwrite is False.
TypeError – If the pool's schema (ID dtype or time index type) is incompatible with already-registered pools.
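The add() contract described above (type check, alias collision, schema compatibility, fluent return) can be sketched without the library. This is a minimal illustration only: `SequencePool` here is a hypothetical stand-in with a single `id_dtype` attribute, and `MiniBuilder` is an invented name, not tanat's implementation.

```python
class SequencePool:
    """Hypothetical stand-in for tanat's SequencePool (not the real class)."""

    def __init__(self, id_dtype: str):
        self.id_dtype = id_dtype


class MiniBuilder:
    """Toy fluent builder that mirrors the documented add() behaviour."""

    def __init__(self):
        self._pools = {}

    def add(self, alias, pool, *, overwrite=False):
        if not isinstance(pool, SequencePool):
            raise TypeError(f"expected SequencePool, got {type(pool).__name__}")
        if alias in self._pools and not overwrite:
            raise ValueError(f"alias {alias!r} is already registered")
        # Assumed schema rule: every registered pool must share one ID dtype.
        for other_alias, other in self._pools.items():
            if other_alias != alias and other.id_dtype != pool.id_dtype:
                raise TypeError(
                    f"ID dtype {pool.id_dtype!r} is incompatible with "
                    f"{other_alias!r} ({other.id_dtype!r})"
                )
        self._pools[alias] = pool
        return self  # returning self is what enables method chaining


builder = (
    MiniBuilder()
    .add("admissions", SequencePool("Int64"))
    .add("pharmacy", SequencePool("Int64"))
)
print(sorted(builder._pools))
```

Returning `self` from every registration call is what lets the calls be chained in a single expression, as in the usage example for the real builder.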
- add_csv(path, *, id_column: str, features: list[str], **reader_kwargs) → TrajectoryStoreBuilder
Register a CSV file as static trajectory features.
- add_dataframe(data, *, id_column: str, features: list[str]) → TrajectoryStoreBuilder
Register an in-memory Polars / Pandas DataFrame as static trajectory features.
- add_parquet(path, *, id_column: str, features: list[str], **reader_kwargs) → TrajectoryStoreBuilder
Register a Parquet file (glob patterns supported) as static trajectory features.
- add_sql(connection: str, query: str, *, id_column: str, features: list[str], **sql_kwargs) → TrajectoryStoreBuilder
Register a SQL query as static trajectory features (requires connectorx).
- build(store_path: str | Path, *, exist_ok: bool = False) → Path
Persist the trajectory store to store_path.
Resolves store_path via the workspace (bare name → workspace directory), then writes core.json, trajectory_index.arrow, and metadata.json.
- Parameters:
store_path – Destination directory. Can be a workspace store name (no /), a relative path, or an absolute path.
exist_ok – If True, overwrites an existing store on disk.
- Returns:
The resolved Path to the written store directory.
- Raises:
RuntimeError – If no pools have been registered.
FileExistsError – If the store exists and exist_ok=False.
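The path resolution and error conditions of build() can be illustrated with the standard library alone. The sketch below is an assumption-laden approximation: `resolve_store_path` and `write_store` are hypothetical helpers, the workspace is passed explicitly, only core.json is written, and a name is treated as "bare" simply when it contains no `/`.

```python
import json
import tempfile
from pathlib import Path


def resolve_store_path(store_path, workspace):
    """Bare names (no '/') land in the workspace; other paths are used as given."""
    text = str(store_path)
    if "/" not in text:
        return Path(workspace) / text
    return Path(text)


def write_store(store_path, links, workspace, *, exist_ok=False):
    """Write a minimal core.json, raising the errors documented for build()."""
    if not links:
        raise RuntimeError("no pools have been registered")
    target = resolve_store_path(store_path, workspace)
    if target.exists() and not exist_ok:
        raise FileExistsError(target)
    target.mkdir(parents=True, exist_ok=True)
    (target / "core.json").write_text(json.dumps(links, indent=2))
    return target


with tempfile.TemporaryDirectory() as workspace:
    written = write_store(
        "my_trajectories", {"admissions": "stores/admissions"}, workspace
    )
    print(written.name)  # my_trajectories
```

Checking for an existing target before creating anything keeps a failed call from leaving a half-written directory behind.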
- build_from_frames(store_path: str | Path, traj_idx: LazyFrame, static_lf: LazyFrame | None, links: dict[str, str], *, exist_ok: bool = False) → Path
Write trajectory store files from pre-prepared LazyFrames and links.
- Parameters:
store_path – Destination directory.
traj_idx – Trajectory index frame (TRAJ_ID + bool presence columns), already filtered and cast.
static_lf – Optional static frame including the TRAJ_ID column, already filtered and cast. The frame is aligned to traj_idx via a left-join before writing, which guarantees row order regardless of the frame's input order. None if there are no static features.
links – {alias: path_string} mapping written verbatim to core.json.
exist_ok – If True, the destination may already exist.
- Returns:
The resolved Path to the written store directory.
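The left-join alignment mentioned for static_lf can be shown in plain Python. This is a sketch of the idea only, not the library's LazyFrame code: `align_static` is a hypothetical helper, rows are plain dicts, and `_traj_id` is assumed as the ID column name.

```python
def align_static(traj_ids, static_rows, id_key="_traj_id"):
    """Left-join static rows onto the index order; missing IDs get None values."""
    by_id = {row[id_key]: row for row in static_rows}
    feature_names = sorted(
        {name for row in static_rows for name in row if name != id_key}
    )
    aligned = []
    for traj_id in traj_ids:  # output order follows the index, not the input
        row = by_id.get(traj_id, {})
        aligned.append(
            {id_key: traj_id, **{name: row.get(name) for name in feature_names}}
        )
    return aligned


index_order = [3, 1, 2]
static = [{"_traj_id": 1, "sex": "F"}, {"_traj_id": 3, "sex": "M"}]
aligned = align_static(index_order, static)
print([row["_traj_id"] for row in aligned])  # [3, 1, 2]
```

Because the index drives the loop, the output has exactly one row per trajectory in index order, and trajectories without static data still appear with empty feature values.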
tanat.store.trajectory.schema module
Centralised column-name constants for the Trajectory Store layout.
tanat.store.trajectory.store module
Trajectory Store: persistent storage for trajectory data.
Layout:
<root>/
├── core.json ← source of truth: {alias: relative_path}
├── metadata.json ← trajectory-level metadata (ID dtype + static features)
├── trajectory_index.arrow ← _traj_id + bool presence columns
├── static_features.arrow ← (optional) trajectory-level features
├── stores/ ← materialised stores (from masked pools)
└── tmp/ ← virtual feature contexts
└── <virtual_id>/
└── static_features.arrow
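To make the "_traj_id + bool presence columns" layout of trajectory_index.arrow concrete, here is a small pure-Python sketch. It assumes the index covers the union of IDs across all aliases, which the layout above does not state; `build_presence_index` is a hypothetical helper and rows are plain dicts rather than Arrow data.

```python
def build_presence_index(pools):
    """One row per trajectory ID, with one boolean column per alias."""
    all_ids = sorted(set().union(*pools.values()))  # assumed: union of all IDs
    return [
        {"_traj_id": traj_id, **{alias: traj_id in ids for alias, ids in pools.items()}}
        for traj_id in all_ids
    ]


presence = build_presence_index({"admissions": {1, 2}, "pharmacy": {2, 3}})
for row in presence:
    print(row)
```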
- class tanat.store.trajectory.store.TrajectoryStore(root_path: str | Path)
Bases: BaseStore
Persistent storage for a TrajectoryPool.
Manages on disk:
core.json: {alias: relative_path} (user-editable)
trajectory_index.arrow: presence map
static_features.arrow: trajectory-level features
- copy_to(target: Path, *, exist_ok: bool = False) → None
Copy all trajectory store files to target.
Fast path used by save() when no transformation is needed (no virtual features, no masks, no casts).
Copies only the physical store files. tmp/ (virtual feature contexts) is intentionally skipped so the destination starts with a clean slate. stores/ (materialised sub-stores) is copied when present so that relative store links in core.json remain valid at the new location.
- Parameters:
target – Destination directory (must not exist, or exist_ok=True).
exist_ok – If True, allows the target directory to already exist.
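The copy rule described above (skip the top-level tmp/, keep stores/ and the physical files) maps naturally onto `shutil.copytree` with an `ignore` callback. The following is a sketch under that assumption; `copy_store` is a hypothetical helper and not the method's actual body.

```python
import shutil
import tempfile
from pathlib import Path


def copy_store(source, target, *, exist_ok=False):
    """Copy a store directory, skipping only the top-level tmp/ folder."""
    source, target = Path(source), Path(target)

    def skip_top_level_tmp(directory, names):
        # Only ignore tmp/ directly under the store root, not nested ones.
        if Path(directory) == source and "tmp" in names:
            return {"tmp"}
        return set()

    shutil.copytree(
        source, target, ignore=skip_top_level_tmp, dirs_exist_ok=exist_ok
    )


with tempfile.TemporaryDirectory() as scratch:
    src = Path(scratch) / "src"
    (src / "tmp" / "v1").mkdir(parents=True)
    (src / "stores" / "s1").mkdir(parents=True)
    (src / "core.json").write_text("{}")
    copy_store(src, Path(scratch) / "dst")
    print(sorted(p.name for p in (Path(scratch) / "dst").iterdir()))
```

Restricting the filter to the store root means a directory that happens to be named tmp inside a materialised sub-store is still copied.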
- get_frames_for_save(id_mask: set | None, cast_recipe: TrajectoryCastRecipe | None, virtual_id: str | None, features: list[str] | None = None) → tuple[pl.LazyFrame, pl.LazyFrame | None]
Prepare trajectory-level frames ready to be handed to the builder.
Centralises all filtering and casting logic that was previously scattered inside save().
- Parameters:
id_mask – Optional set of trajectory IDs to keep (None = all).
cast_recipe – TrajectoryCastRecipe whose id and static fields are applied.
virtual_id – Virtual context UUID for merged static features (None if no virtual features).
features – Static feature names to materialise. None keeps all available columns. Pass settings.static_features to materialise soft drops (columns absent from the list are excluded from the written frame).
- Returns:
(traj_idx, static_lf), where traj_idx is the trajectory-index LazyFrame with filters and ID cast applied, and static_lf includes the TRAJ_ID column (required by build_from_frames() for left-join alignment), or is None when no feature columns exist after filtering.
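The filtering half of this method (ID mask, feature selection, and the None result when no feature columns remain) can be sketched with plain dict rows. Casting and virtual features are deliberately left out, and `frames_for_save` is a hypothetical helper with `_traj_id` assumed as the ID column name.

```python
def frames_for_save(index_rows, static_rows, id_mask=None, features=None,
                    id_key="_traj_id"):
    """Apply an ID mask and a feature selection to index and static rows."""

    def keep(row):
        return id_mask is None or row[id_key] in id_mask

    traj_idx = [row for row in index_rows if keep(row)]
    available = [k for k in (static_rows[0] if static_rows else {}) if k != id_key]
    # features=None keeps every column; otherwise unlisted columns are dropped.
    selected = available if features is None else [f for f in available if f in features]
    if not selected:
        return traj_idx, None  # no feature columns left after filtering
    static = [
        {id_key: row[id_key], **{name: row[name] for name in selected}}
        for row in static_rows
        if keep(row)
    ]
    return traj_idx, static


index_rows = [{"_traj_id": 1, "admissions": True}, {"_traj_id": 2, "admissions": False}]
static_rows = [{"_traj_id": 1, "sex": "F", "age": 40}, {"_traj_id": 2, "sex": "M", "age": 51}]
idx, static = frames_for_save(index_rows, static_rows, id_mask={2}, features=["sex"])
print(idx, static)
```

Keeping the ID column in the static result matches the documented requirement that the frame can later be left-joined onto the trajectory index.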
- get_id_lf() → LazyFrame
All trajectory IDs as a single-column lazy frame.
Preserves the physical dtype — stays lazy until collected.
- property sequence_stores: dict[str, SequenceStore]
Linked SequenceStore instances, keyed by alias.
- property traj_id_dtype: DataType
Physical dtype of the trajectory ID column (IPC header read, no data scan).
Module contents
Package stub.