tanat.store.trajectory package#

Submodules#

tanat.store.trajectory.builder module#

TrajectoryStoreBuilder: fluent builder that writes the trajectory store backing a TrajectoryPool.

class tanat.store.trajectory.builder.TrajectoryStoreBuilder[source]#

Bases: DisplayMixin

Fluent builder for constructing the on-disk store of a TrajectoryPool.

Each add() call registers a SequencePool under an alias. Call build() to write the trajectory store to disk and obtain its resolved path, mirroring SequenceStoreBuilder.

Usage:

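# admissions_pool and pharmacy_pool are existing SequencePool instances.
store_path = (
    TrajectoryPool.builder()
    .add("admissions", admissions_pool)
    .add("pharmacy", pharmacy_pool)
    .build("my_trajectories")  # bare name: resolved inside the workspace
)
# Open the written store as a pool.
pool = TrajectoryPool(store=store_path)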
__init__() → None[source]#
add(alias: str, pool: SequencePool, *, overwrite: bool = False) → TrajectoryStoreBuilder[source]#

Register a SequencePool under alias.

Parameters:
  • alias – Short name (e.g. "admissions").

  • pool – A SequencePool instance.

  • overwrite – If True, replaces an existing alias silently.

Returns:

self for method chaining.

Raises:
  • TypeError – If pool is not a SequencePool.

  • ValueError – If the alias is already registered and overwrite is False.

  • TypeError – If the pool’s schema (ID dtype or time index type) is incompatible with already-registered pools.
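The alias rules above can be pictured with a small stand-in. This is a sketch only, not tanat's implementation: AliasRegistry is a hypothetical class that mimics the documented ValueError / overwrite behaviour and the chaining return value, and it deliberately skips the type and schema checks.

```python
class AliasRegistry:
    """Hypothetical stand-in mimicking the documented alias rules of add()."""

    def __init__(self) -> None:
        self._pools = {}

    def add(self, alias, pool, *, overwrite=False):
        # Documented rule: a duplicate alias is an error unless overwrite=True.
        if alias in self._pools and not overwrite:
            raise ValueError(f"alias {alias!r} is already registered")
        self._pools[alias] = pool
        return self  # returning self is what enables method chaining

    @property
    def aliases(self):
        return list(self._pools)


# Strings stand in for SequencePool instances in this sketch.
registry = AliasRegistry().add("admissions", "pool-a").add("pharmacy", "pool-b")
registry.add("pharmacy", "pool-c", overwrite=True)  # replaces silently
```

Calling registry.add("admissions", "other") without overwrite=True raises ValueError in this sketch, matching the documented behaviour.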

add_csv(path, *, id_column: str, features: list[str], **reader_kwargs) → TrajectoryStoreBuilder[source]#

Register a CSV file as static trajectory features.

add_dataframe(data, *, id_column: str, features: list[str]) → TrajectoryStoreBuilder[source]#

Register an in-memory Polars / Pandas DataFrame as static trajectory features.

add_parquet(path, *, id_column: str, features: list[str], **reader_kwargs) → TrajectoryStoreBuilder[source]#

Register a Parquet file (glob patterns supported) as static trajectory features.

add_sql(connection: str, query: str, *, id_column: str, features: list[str], **sql_kwargs) → TrajectoryStoreBuilder[source]#

Register a SQL query as static trajectory features (requires connectorx).

build(store_path: str | Path, *, exist_ok: bool = False) → Path[source]#

Persist the trajectory store to store_path.

Resolves store_path via the workspace (bare name → workspace directory), then writes core.json, trajectory_index.arrow, and metadata.json.

Parameters:
  • store_path – Destination directory. Can be a workspace store name (no /), a relative path, or an absolute path.

  • exist_ok – If True, overwrites an existing store on disk.

Returns:

The resolved Path to the written store directory.

Raises:
  • RuntimeError – If no pools have been registered.

  • FileExistsError – If the store exists and exist_ok=False.
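The path rules described for build() can be sketched as follows. Both functions are hypothetical helpers written for illustration, not part of tanat; in particular, resolving relative paths against the current working directory is an assumption, since the documentation only states how bare names are handled.

```python
from __future__ import annotations

from pathlib import Path


def resolve_store_path(store_path: str | Path, workspace: Path) -> Path:
    """Hypothetical resolver: bare names go under the workspace directory."""
    path = Path(store_path)
    if path.is_absolute():
        return path
    if "/" not in str(store_path):
        # Bare store name (no "/"): place it inside the workspace.
        return workspace / path
    # Relative path: resolved against the current directory (an assumption).
    return Path.cwd() / path


def prepare_destination(target: Path, *, exist_ok: bool = False) -> Path:
    """Create the store directory, refusing to reuse one unless exist_ok."""
    if target.exists() and not exist_ok:
        raise FileExistsError(f"store already exists: {target}")
    target.mkdir(parents=True, exist_ok=True)
    return target
```

With a workspace at /data/ws, the bare name "my_trajectories" would resolve to /data/ws/my_trajectories under these assumptions.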

build_from_frames(store_path: str | Path, traj_idx: LazyFrame, static_lf: LazyFrame | None, links: dict[str, str], *, exist_ok: bool = False) → Path[source]#

Write trajectory store files from pre-prepared LazyFrames and links.

Parameters:
  • store_path – Destination directory.

  • traj_idx – Trajectory index frame (TRAJ_ID + bool presence columns), already filtered and cast.

  • static_lf – Optional static frame including the TRAJ_ID column, already filtered and cast. The frame is aligned to traj_idx via a left join before writing, so the written rows follow the order of traj_idx regardless of the frame’s input order. None if there are no static features.

  • links – {alias: path_string} mapping written verbatim to core.json.

  • exist_ok – If True, the destination may already exist.

Returns:

The resolved Path to the written store directory.
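The left-join alignment described for static_lf can be illustrated without Polars. The sketch below uses plain dictionaries as rows; align_static_to_index is a hypothetical helper, not tanat code, and it only shows the effect of the join: output rows follow the index order, and IDs missing from the static data get None values.

```python
def align_static_to_index(traj_ids, static_rows, feature_names, id_key="_traj_id"):
    """Left-join static rows onto the index order; missing IDs get None."""
    by_id = {row[id_key]: row for row in static_rows}
    aligned = []
    for traj_id in traj_ids:
        row = by_id.get(traj_id, {})
        features = {name: row.get(name) for name in feature_names}
        aligned.append({id_key: traj_id, **features})
    return aligned


# Index order is 3, 1, 2; the static rows arrive in a different order
# and carry no entry for trajectory 2.
traj_ids = [3, 1, 2]
static_rows = [{"_traj_id": 1, "sex": "F"}, {"_traj_id": 3, "sex": "M"}]
aligned = align_static_to_index(traj_ids, static_rows, ["sex"])
```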

tanat.store.trajectory.schema module#

Centralised column-name constants for the Trajectory Store layout.

class tanat.store.trajectory.schema.TrajectorySchema[source]#

Bases: object

Internal column names used by the trajectory store.

class Files[source]#

Bases: object

Physical file names for a trajectory store on disk.

CORE: Final[str] = 'core.json'[source]#
DIR_STORES: Final[str] = 'stores'[source]#
METADATA: Final[str] = 'metadata.json'[source]#
STATIC_FEATURES: Final[str] = 'static_features.arrow'[source]#
TRAJECTORY_INDEX: Final[str] = 'trajectory_index.arrow'[source]#
TRAJ_ID: Final[str] = '_traj_id'[source]#
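As a usage sketch, the file-name constants can drive a quick completeness check of a store directory. StoreFiles below is a local mirror of the documented values, not an import from tanat, and missing_files is a hypothetical helper; the set of required files follows what build() is documented to write.

```python
from __future__ import annotations

from pathlib import Path
from typing import Final


class StoreFiles:
    """Local mirror of the documented file-name constants."""

    CORE: Final[str] = "core.json"
    METADATA: Final[str] = "metadata.json"
    TRAJECTORY_INDEX: Final[str] = "trajectory_index.arrow"
    STATIC_FEATURES: Final[str] = "static_features.arrow"  # optional file
    DIR_STORES: Final[str] = "stores"


# The three files that build() is documented to write.
REQUIRED = ("CORE", "METADATA", "TRAJECTORY_INDEX")


def missing_files(root: Path) -> list[str]:
    """Return the names of required constants whose file is absent in root."""
    return sorted(
        name for name in REQUIRED if not (root / getattr(StoreFiles, name)).is_file()
    )
```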

tanat.store.trajectory.store module#

Trajectory Store: persistent storage for trajectory data.

Layout:

<root>/
├── core.json                ← source of truth: {alias: relative_path}
├── metadata.json            ← trajectory-level metadata (ID dtype + static features)
├── trajectory_index.arrow   ← _traj_id + bool presence columns
├── static_features.arrow    ← (optional) trajectory-level features
├── stores/                  ← materialised stores (from masked pools)
└── tmp/                     ← virtual feature contexts
    └── <virtual_id>/
        └── static_features.arrow
class tanat.store.trajectory.store.TrajectoryStore(root_path: str | Path)[source]#

Bases: BaseStore

Persistent storage for a TrajectoryPool.

Manages on disk:

  • core.json: {alias: relative_path} (user-editable)

  • trajectory_index.arrow: presence map

  • static_features.arrow: trajectory-level features

__init__(root_path: str | Path) → None[source]#
copy_to(target: Path, *, exist_ok: bool = False) → None[source]#

Copy all trajectory store files to target.

Fast-path used by save() when no transformation is needed (no virtual features, no masks, no casts).

Copies only the physical store files. tmp/ (virtual feature contexts) is intentionally skipped so the destination starts with a clean slate. stores/ (materialised sub-stores) is copied when present so that relative store links in core.json remain valid at the new location.

Parameters:
  • target – Destination directory (must not exist, or exist_ok=True).

  • exist_ok – If True, allows the target directory to already exist.
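The copy behaviour described above (skip tmp/, keep stores/) can be approximated with the standard library. copy_store is a hypothetical function written for illustration and is not how tanat necessarily implements copy_to(); it skips only the top-level tmp/ directory.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def copy_store(source: Path, target: Path, *, exist_ok: bool = False) -> None:
    """Copy a store directory, leaving out the top-level tmp/ directory."""
    if target.exists() and not exist_ok:
        raise FileExistsError(f"target already exists: {target}")

    def skip_top_level_tmp(directory: str, names: list[str]) -> set[str]:
        # Only ignore "tmp" directly under the store root, not deeper down.
        return {"tmp"} if Path(directory) == source and "tmp" in names else set()

    shutil.copytree(
        source,
        target,
        ignore=skip_top_level_tmp,
        dirs_exist_ok=exist_ok,  # available since Python 3.8
    )
```

Because stores/ is copied along with core.json, relative links recorded in core.json keep pointing at valid locations in the copy.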

get_frames_for_save(id_mask: set | None, cast_recipe: TrajectoryCastRecipe | None, virtual_id: str | None, features: list[str] | None = None) → tuple[pl.LazyFrame, pl.LazyFrame | None][source]#

Prepare trajectory-level frames ready to be handed to the builder.

Centralises all filtering and casting logic that was previously scattered inside save().

Parameters:
  • id_mask – Optional set of trajectory IDs to keep (None = all).

  • cast_recipe – TrajectoryCastRecipe whose id and static fields are applied.

  • virtual_id – Virtual context UUID for merged static features (None if no virtual features).

  • features – Static feature names to materialise. None keeps all available columns. Pass settings.static_features to materialise soft drops (columns absent from the list are excluded from the written frame).

Returns:

(traj_idx, static_lf), where:

  • traj_idx is the trajectory-index LazyFrame with filters and ID cast applied.

  • static_lf includes the TRAJ_ID column (required by build_from_frames() for left-join alignment), or None when no feature columns exist after filtering.

Return type:

tuple[pl.LazyFrame, pl.LazyFrame | None]
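The effect of id_mask and features on the static frame can be sketched with plain Python rows. select_for_save is a hypothetical helper for illustration only; it ignores casting and virtual features, and it models just the documented filtering: None means "keep everything", and the result is None when no feature column survives.

```python
def select_for_save(rows, id_mask=None, features=None, id_key="_traj_id"):
    """Filter rows by ID and keep only the requested feature columns."""
    available = [key for key in rows[0] if key != id_key] if rows else []
    # features=None keeps every available column; otherwise absent names drop out.
    wanted = available if features is None else [f for f in features if f in available]
    if not wanted:
        return None  # mirrors "None when no feature columns exist after filtering"
    kept = [row for row in rows if id_mask is None or row[id_key] in id_mask]
    return [{id_key: row[id_key], **{f: row[f] for f in wanted}} for row in kept]


rows = [
    {"_traj_id": 1, "sex": "F", "age": 40},
    {"_traj_id": 2, "sex": "M", "age": 31},
]
subset = select_for_save(rows, id_mask={2}, features=["age"])
```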

get_id_lf() → LazyFrame[source]#

All trajectory IDs as a single-column lazy frame.

Preserves the physical dtype and stays lazy until collected.

property sequence_stores: dict[str, SequenceStore][source]#

Linked SequenceStore instances, keyed by alias.

property store_aliases: list[str][source]#

List of registered store aliases.

{alias: relative_path} from core.json.

property traj_id_col: str[source]#

Internal name of the trajectory ID column.

property traj_id_dtype: DataType[source]#

Physical dtype of the trajectory ID column (IPC header read, no data scan).

property trajectory_index: LazyFrame[source]#

Navigation index (_traj_id + bool presence columns) - physical, no cast overlay.
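The shape of the navigation index can be pictured as one row per trajectory ID with one boolean column per linked store. The sketch below builds such rows from plain sets; build_presence_index is a hypothetical helper, and naming the presence columns after the store aliases, as well as sorting the IDs, are assumptions made for illustration.

```python
def build_presence_index(ids_by_alias, id_key="_traj_id"):
    """One row per trajectory ID, with a bool column per alias (assumed naming)."""
    all_ids = sorted(set().union(*ids_by_alias.values()))
    return [
        {id_key: traj_id, **{alias: traj_id in ids for alias, ids in ids_by_alias.items()}}
        for traj_id in all_ids
    ]


index_rows = build_presence_index(
    {"admissions": {1, 2}, "pharmacy": {2, 3}}
)
```

Here trajectory 2 is present in both stores, while trajectories 1 and 3 each appear in only one.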

static write_core_json(path: Path, links: dict[str, str], *, n_trajectories: int | None = None) → None[source]#

Writes core.json to path with the given links.

Module contents#

Package stub.