tanat.store.sequence package

Subpackages

Submodules

tanat.store.sequence.schema module

Centralised column-name constants for the Sequence Store layout.

Every internal column name used across the store, pool, sequence and entity layers is defined here. Import from this module instead of hard-coding strings.

class tanat.store.sequence.schema.StoreSchema

Bases: object

Internal column names used by the store layer.

class Files

Bases: object

Physical file names for a sequence store on disk.

CORE = 'core.json'
ENTITY_FEATURES = 'entity_features.arrow'
METADATA = 'metadata.json'
SEQUENCE_INDEX = 'sequence_index.arrow'
STATIC_FEATURES = 'static_features.arrow'
TIME_INDEX = 'time_index.arrow'

LENGTH: Final[str] = 'length'
OFFSET: Final[str] = 'offset'
SEQ_ID: Final[str] = '_seq_id'
STORE_INDEX: Final[str] = '__store_idx__'
T_END: Final[str] = '_t_end'
T_EVENT: Final[str] = '_t_event'
T_START: Final[str] = '_t_start'
classmethod internal_columns() → frozenset[str]

All internal (non-feature) column names.

classmethod time_index_columns() → list[str]

All possible time index column names, in canonical order (start → end → event).
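A caller might use internal_columns() to tell user feature columns apart from structural ones. The sketch below mirrors that with plain Python stand-ins rather than importing tanat; the assumption that internal_columns() returns exactly the seven column constants listed above, and the helper name feature_columns, are ours and purely illustrative.

```python
from typing import Final

# Plain stand-ins mirroring the StoreSchema column constants above.
SEQ_ID: Final[str] = "_seq_id"
STORE_INDEX: Final[str] = "__store_idx__"
OFFSET: Final[str] = "offset"
LENGTH: Final[str] = "length"
T_START: Final[str] = "_t_start"
T_END: Final[str] = "_t_end"
T_EVENT: Final[str] = "_t_event"

# Assumption: internal_columns() returns exactly these names.
INTERNAL = frozenset({SEQ_ID, STORE_INDEX, OFFSET, LENGTH, T_START, T_END, T_EVENT})


def feature_columns(columns: list[str]) -> list[str]:
    """Hypothetical helper: keep only user feature columns, in their original order."""
    return [c for c in columns if c not in INTERNAL]


cols = ["__store_idx__", "_seq_id", "_t_start", "_t_end", "heart_rate", "unit"]
print(feature_columns(cols))  # ['heart_rate', 'unit']
```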

tanat.store.sequence.store module

Sequence Store Base Class.

class tanat.store.sequence.store.SequenceStore(root_path: str | Path)

Bases: BaseStore

Sequence store.

Delegates virtual (temporary) feature storage to a VirtualStore and inherits shared I/O helpers from BaseStore (which itself inherits StaticStoreMixin).

__init__(root_path: str | Path) → None

Initialise the store. Data files are loaded lazily, on first access, rather than at construction time.

Parameters:

root_path – Root directory of the store.

add_entity_features(virtual_id: str, df: DataFrame | LazyFrame | DataFrame) → list[str]

Add positional entity features to a virtual store context.

Computes the expected row count from the time index and delegates height validation to VirtualStore.add_entity_features().

Parameters:
  • virtual_id – Identifier of the virtual context; the context must already exist.

  • df – Feature-only DataFrame whose rows are positionally aligned with the entity rows (row i of df describes entity row i).

Returns:

The list of column names written.
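The positional contract can be pictured with plain lists: the expected row count is the number of entity rows in the time index, and a feature frame is accepted only if its height matches, because row i of the features is attached to entity row i. check_feature_height is a hypothetical stand-in for the validation VirtualStore.add_entity_features() performs; the real method works on polars frames, and its exception type and message are not documented here.

```python
def check_feature_height(n_entity_rows: int, features: dict[str, list]) -> list[str]:
    """Hypothetical helper: reject feature columns that are not row-aligned."""
    for name, values in features.items():
        if len(values) != n_entity_rows:
            raise ValueError(
                f"feature {name!r} has {len(values)} rows; expected {n_entity_rows}"
            )
    # Mirrors the documented return value: the names of the columns written.
    return list(features)


# The time index holds three entity rows, so each feature needs three values.
written = check_feature_height(3, {"dose": [1.0, 2.5, 0.0], "route": ["iv", "po", "po"]})
print(written)  # ['dose', 'route']
```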

copy_to(target: Path, type_name: str, *, exist_ok: bool = False) → None

Copy all store files to target, writing type_name into core.json.

Fast path used by save() when no transformation is needed (no virtual features, no masks, no casts). type_name is always supplied by the pool via get_registration_name() and is never copied verbatim from the physical core.json. Copying it verbatim would cause a silent-corruption bug: a converted pool would persist its original type instead of the converted one.

Parameters:
  • target – Destination directory; it is created if it does not already exist (see exist_ok).

  • type_name – Sequence type to write into core.json.

  • exist_ok – If True, target may already exist; if False (the default), it must not.
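A minimal sketch of this fast path using only the standard library follows. The "sequence_type" key and the type names "event" and "interval" are assumptions (the core.json layout is not documented here), and copy_store is a hypothetical name; the point it illustrates is that the type written to the copy always comes from the caller, never from the source file.

```python
import json
import shutil
import tempfile
from pathlib import Path


def copy_store(source: Path, target: Path, type_name: str, *, exist_ok: bool = False) -> None:
    """Hypothetical sketch of a copy_to-style fast path."""
    # copytree creates target; it raises FileExistsError if target exists and exist_ok is False.
    shutil.copytree(source, target, dirs_exist_ok=exist_ok)
    core_path = target / "core.json"
    core = json.loads(core_path.read_text())
    # "sequence_type" is an assumed key name.
    core["sequence_type"] = type_name  # taken from the caller, never from the source
    core_path.write_text(json.dumps(core))


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "src", Path(tmp) / "dst"
    src.mkdir()
    (src / "core.json").write_text(json.dumps({"sequence_type": "event"}))
    copy_store(src, dst, "interval")
    result = json.loads((dst / "core.json").read_text())

print(result)  # {'sequence_type': 'interval'}
```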

drop_features(features: list[str], is_static: bool = False, virtual_id: str | None = None) → None

Removes feature columns from disk.

The full list is sent to both the physical and the virtual store; each one silently ignores columns it does not own.

Parameters:
  • features – Column names to remove.

  • is_static – If True, drop static features; otherwise drop entity features.

  • virtual_id – Optional virtual context identifier.
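The "send everything to both layers" dispatch can be sketched with dictionaries standing in for the physical and virtual column sets. drop_everywhere is a hypothetical name, and the real stores delete columns from files rather than dict entries.

```python
def drop_everywhere(features: list[str], physical: dict[str, list], virtual: dict[str, list]) -> None:
    """Hypothetical sketch: send the full list to both layers; each ignores names it lacks."""
    for layer in (physical, virtual):
        for name in features:
            layer.pop(name, None)  # no error for columns this layer does not own


physical = {"age": [30, 41], "sex": ["f", "m"]}
virtual = {"age_group": ["30s", "40s"]}
drop_everywhere(["sex", "age_group"], physical, virtual)
print(sorted(physical), sorted(virtual))  # ['age'] []
```

Sending the whole list to both layers spares the caller from knowing which layer owns which column.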

entity(virtual_id: str | None = None, *, with_store_index: bool = False) → LazyFrame

Entity feature rows (physical + virtual), without the sequence ID column.

Virtual features take precedence: any physical column whose name also appears in the virtual context is silently shadowed, so the virtual value is always returned.

Parameters:
  • virtual_id – Optional virtual context identifier.

  • with_store_index – When True, prepends SCH.STORE_INDEX (the absolute physical row position) to the result.
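The shadowing rule can be illustrated with plain dictionaries of columns. merge_entity_columns is a hypothetical helper, and the column order it produces is not meant to reflect the order entity() actually returns.

```python
def merge_entity_columns(physical: dict[str, list], virtual: dict[str, list]) -> dict[str, list]:
    """Hypothetical sketch: a virtual column shadows any physical column of the same name."""
    merged = {name: col for name, col in physical.items() if name not in virtual}
    merged.update(virtual)  # the virtual value always wins
    return merged


physical = {"value": [1, 2, 3], "unit": ["mg", "mg", "g"]}
virtual = {"value": [10, 20, 30], "flag": [True, False, True]}
merged = merge_entity_columns(physical, virtual)
print(merged["value"])  # [10, 20, 30]
```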

entity_features(virtual_id: str | None = None) → list[str]

List of entity feature column names (physical, plus virtual ones when virtual_id is set).

get_id_lf(*, explode: bool = False, with_store_index: bool = False) → LazyFrame

Sequence IDs as a single-column lazy frame.

Parameters:
  • explode – When False (default), returns one row per sequence (unique IDs). When True, repeats each ID once per entity row of its sequence, so the result is row-aligned with entity() and get_time_index().

  • with_store_index – When True, prepends a SCH.STORE_INDEX column with the absolute physical row index. Only meaningful when explode=True.

Returns:

A polars.LazyFrame with a single column of sequence IDs (plus a leading SCH.STORE_INDEX column when requested).
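The effect of explode=True can be sketched with plain tuples shaped like the (seq_id, offset, length) rows of the sequence index. This assumes the index rows are listed in physical order and cover contiguous row ranges, which is what makes the output row-aligned; exploded_ids is a hypothetical name.

```python
def exploded_ids(sequence_index: list[tuple[str, int, int]], *, with_store_index: bool = False) -> list:
    """Hypothetical sketch: repeat each seq_id once per entity row of its sequence."""
    rows = []
    for seq_id, offset, length in sequence_index:
        for i in range(length):
            # offset + i is the absolute physical row of this entity
            rows.append((offset + i, seq_id) if with_store_index else seq_id)
    return rows


index = [("a", 0, 2), ("b", 2, 3)]  # (seq_id, offset, length)
print(exploded_ids(index))  # ['a', 'a', 'b', 'b', 'b']
print(exploded_ids(index, with_store_index=True))
# [(0, 'a'), (1, 'a'), (2, 'b'), (3, 'b'), (4, 'b')]
```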

get_id_time_index(virtual_id: str | None = None, *, with_store_index: bool = False) → LazyFrame

Returns the sequence ID and time-index columns only (no entity features).

Cheaper than get_temporal_data() when entity features are not needed.

Parameters:
  • virtual_id – Optional virtual context identifier.

  • with_store_index – When True, prepends SCH.STORE_INDEX (the absolute physical row position in the store) to the result.

get_sequence_type() → str

Returns the sequence type declared in core.json.

get_temporal_data(virtual_id: str | None = None, *, with_store_index: bool = False) → LazyFrame

Returns the full temporal data: sequence ID, time index and entity features.

Parameters:
  • virtual_id – Optional virtual context identifier.

  • with_store_index – When True, prepends SCH.STORE_INDEX (the absolute physical row position in the store) to the result.

get_time_index(virtual_id: str | None = None, *, with_store_index: bool = False) → LazyFrame

Returns time-index columns only.

Cheaper than get_id_time_index() when the sequence ID column is not needed.

Parameters:
  • virtual_id – Optional virtual context identifier.

  • with_store_index – When True, prepends SCH.STORE_INDEX (the 0-based absolute row position in the physical store) to the result.

Returns:

A polars.LazyFrame of the time-index columns (plus SCH.STORE_INDEX when requested).

property n_entities: int

Total number of entity rows in the physical store.

Computed by summing the length column of the sequence index.

property seq_id_col: str

Internal name of the sequence ID column.

property seq_id_dtype: DataType

Physical dtype of the sequence ID column, read from the Arrow IPC file header without scanning any data.

property sequence_index: LazyFrame

Navigation index with one row per sequence (seq_id, offset, length), read from the physical file without any cast overlay.
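The offset/length layout lets a reader fetch one sequence's entity rows as a contiguous slice. The sketch below uses a list of tuples for the index and a list for the physical entity rows; rows_of is a hypothetical helper (the real index is a polars LazyFrame), and the last lines reproduce the documented n_entities computation.

```python
def rows_of(sequence_index: list[tuple[str, int, int]], entity_rows: list, seq_id: str) -> list:
    """Hypothetical helper: slice the entity rows of one sequence via its offset and length."""
    for sid, offset, length in sequence_index:
        if sid == seq_id:
            return entity_rows[offset : offset + length]
    raise KeyError(seq_id)


index = [("a", 0, 2), ("b", 2, 3)]            # (seq_id, offset, length)
entity_rows = ["a0", "a1", "b0", "b1", "b2"]  # physical row order
print(rows_of(index, entity_rows, "b"))       # ['b0', 'b1', 'b2']

# n_entities is documented as the sum of the length column.
n_entities = sum(length for _, _, length in index)
print(n_entities)  # 5
```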

structural_columns(is_static: bool = False, virtual_id: str | None = None) → list[str]

Returns the structural column names for a data access call.

Always includes the sequence ID column. For entity (non-static) data, it also includes the time columns actually present in this store’s time index (physical or virtual).
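The selection rule can be sketched as follows. Ordering the time columns by the canonical start → end → event order of time_index_columns() is our assumption, since the ordering of structural_columns() is not documented; structural_columns_sketch is a hypothetical name.

```python
SEQ_ID = "_seq_id"
TIME_ORDER = ["_t_start", "_t_end", "_t_event"]  # canonical order from time_index_columns()


def structural_columns_sketch(time_index_cols: list[str], is_static: bool = False) -> list[str]:
    """Hypothetical sketch: sequence ID, plus the time columns present for entity data."""
    cols = [SEQ_ID]
    if not is_static:
        present = set(time_index_cols)
        cols += [c for c in TIME_ORDER if c in present]
    return cols


print(structural_columns_sketch(["_t_end", "_t_start"]))        # ['_seq_id', '_t_start', '_t_end']
print(structural_columns_sketch(["_t_event"], is_static=True))  # ['_seq_id']
```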

time_index(virtual_id: str | None = None, *, with_store_index: bool = False) → LazyFrame

Time-index rows (_t_event, or _t_start and _t_end), with an optional virtual override.

When virtual_id is given and tmp/<virtual_id>/time_index.arrow exists, the virtual time index is returned instead of the physical one. The override is a full replacement, because the whole temporal structure changes during type conversions. When the virtual override is absent, the physical file is used instead.

When virtual_id is None, always returns the physical time index.

Parameters:
  • virtual_id – Optional virtual context identifier.

  • with_store_index – When True, prepends SCH.STORE_INDEX (the absolute physical row position) to the result.

Returns:

A polars.LazyFrame of the time-index rows.
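The override lookup can be sketched as a path resolution. This assumes tmp/ lives directly under the store's root_path and that the physical index sits at the root as time_index.arrow (the Files.TIME_INDEX name); resolve_time_index is a hypothetical helper that returns paths instead of lazy frames.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_time_index(root: Path, virtual_id: Optional[str] = None) -> Path:
    """Hypothetical sketch: pick the virtual override if present, else the physical file."""
    physical = root / "time_index.arrow"
    if virtual_id is None:
        return physical  # no virtual context: always the physical index
    override = root / "tmp" / virtual_id / "time_index.arrow"
    return override if override.exists() else physical  # fall back when absent


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "time_index.arrow").touch()
    (root / "tmp" / "v1").mkdir(parents=True)
    (root / "tmp" / "v1" / "time_index.arrow").touch()
    picked = {
        vid: resolve_time_index(root, vid).relative_to(root).as_posix()
        for vid in (None, "v1", "v2")
    }

print(picked)
# {None: 'time_index.arrow', 'v1': 'tmp/v1/time_index.arrow', 'v2': 'time_index.arrow'}
```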

static write_core_json(path: Path, sequence_type: str, n_sequences: int, n_entities: int) → None

Writes core.json, which records static store facts set once at build time.
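A minimal sketch of writing such a file is shown below. It treats path as the full path of core.json and uses the parameter names as JSON keys; both are assumptions, as the on-disk layout of core.json is not documented here, and write_core_json_sketch is a hypothetical name.

```python
import json
import tempfile
from pathlib import Path


def write_core_json_sketch(path: Path, sequence_type: str, n_sequences: int, n_entities: int) -> None:
    """Hypothetical sketch: persist build-time facts once; key names are assumptions."""
    payload = {
        "sequence_type": sequence_type,
        "n_sequences": n_sequences,
        "n_entities": n_entities,
    }
    path.write_text(json.dumps(payload, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    core = Path(tmp) / "core.json"
    write_core_json_sketch(core, "event", n_sequences=2, n_entities=5)
    loaded = json.loads(core.read_text())

print(loaded)  # {'sequence_type': 'event', 'n_sequences': 2, 'n_entities': 5}
```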

write_virtual_time_index(virtual_id: str, time_index_lf: LazyFrame) → None

Write a virtual time-index override for virtual_id.

Creates tmp/<virtual_id>/ if it does not exist, then writes time_index_lf as time_index.arrow there. A subsequent call to time_index() with the same virtual_id will return this override instead of the physical time index.

Parameters:
  • virtual_id – Virtual context identifier (a UUID string).

  • time_index_lf – LazyFrame containing the new time columns (_t_start + _t_end for period types, or _t_event for event types).
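Before writing an override, a caller may want to confirm that the frame carries the time columns its target type needs. The check below is a hypothetical illustration (check_override_columns and is_period are our names; the real method may perform no such validation), and the virtual_id is generated as a UUID string to match the parameter description.

```python
import uuid


def check_override_columns(columns: list[str], is_period: bool) -> None:
    """Hypothetical check: period types need _t_start and _t_end; event types need _t_event."""
    required = {"_t_start", "_t_end"} if is_period else {"_t_event"}
    missing = required - set(columns)
    if missing:
        raise ValueError(f"time index override is missing {sorted(missing)}")


virtual_id = str(uuid.uuid4())  # virtual contexts are identified by UUID strings
check_override_columns(["_t_start", "_t_end"], is_period=True)  # passes silently
print(f"tmp/{virtual_id}/time_index.arrow")
```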

Module contents

Package stub.