tanat.store.sequence package
Subpackages
Submodules
tanat.store.sequence.schema module
Centralised column-name constants for the Sequence Store layout.
Every internal column name used across the store, pool, sequence and entity layers is defined here. Import from this module instead of hard-coding strings.
tanat.store.sequence.store module
Sequence Store Base Class.
- class tanat.store.sequence.store.SequenceStore(root_path: str | Path)
Bases: BaseStore
Sequence store.
Delegates virtual (temporary) feature storage to a VirtualStore and inherits shared I/O helpers from BaseStore (which itself inherits StaticStoreMixin).
- __init__(root_path: str | Path) → None
Initialise the store (lazy loading).
- Parameters:
root_path – Root directory of the store.
- add_entity_features(virtual_id: str, df: DataFrame | LazyFrame | DataFrame) → list[str]
Add positional entity features to a virtual store context.
Computes the expected row count from the time index and delegates height validation to VirtualStore.add_entity_features().
- Parameters:
virtual_id – Virtual context identifier; the context must already exist.
df – Feature-only DataFrame positionally aligned with entity rows.
- Returns:
The list of column names written.
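The height check can be pictured with a small pure-Python model. This is a hypothetical sketch, not the tanat API: plain lists stand in for polars frames, `check_positional_height` is an invented helper, and it assumes the expected row count is simply the number of time-index rows (one per entity).

```python
def check_positional_height(expected_rows: int, columns: dict) -> list:
    """Hypothetical helper: return the column names if every column has expected_rows values."""
    for name, values in columns.items():
        if len(values) != expected_rows:
            raise ValueError(
                f"column {name!r} has {len(values)} rows, expected {expected_rows}"
            )
    return list(columns)


# One time-index row per entity (illustrative event times).
time_index = [0, 5, 3]
features = {"dose": [1.0, 2.5, 0.7], "unit": ["mg", "mg", "ml"]}

written = check_positional_height(len(time_index), features)
print(written)  # ['dose', 'unit']
```

Because alignment is purely positional, the row count is the only thing that can be checked; values in the wrong order would pass silently.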
- copy_to(target: Path, type_name: str, *, exist_ok: bool = False) → None
Copy all store files to target, writing type_name into core.json.
Fast path used by save() when no transformation is needed (no virtual features, no masks, no casts). type_name is always provided by the pool via get_registration_name(); it is never copied verbatim from the physical core.json. This avoids a silent-corruption bug in which a converted pool would persist the wrong type.
- Parameters:
target – Destination directory; it is created if absent, and an existing directory is accepted only when exist_ok is True.
type_name – Sequence type to write into core.json.
exist_ok – If True, allows the target directory to already exist.
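The copy-then-overwrite-type behaviour can be sketched with the standard library. This is a hypothetical illustration, not tanat's implementation: `copy_store` is an invented name, and the `core.json` keys (`sequence_type`, `n_sequences`, `n_entities`) are assumed from the parameters of write_core_json(). The type names "event" and "interval" are illustrative only.

```python
import json
import shutil
import tempfile
from pathlib import Path


def copy_store(source: Path, target: Path, type_name: str, *, exist_ok: bool = False) -> None:
    """Hypothetical sketch of a copy_to-style fast path."""
    # Copy every file; an existing target is accepted only when exist_ok is True.
    shutil.copytree(source, target, dirs_exist_ok=exist_ok)
    core_path = target / "core.json"
    core = json.loads(core_path.read_text())
    # The type name always comes from the caller, never from the copied file.
    core["sequence_type"] = type_name  # assumed key name
    core_path.write_text(json.dumps(core))


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "core.json").write_text(
        json.dumps({"sequence_type": "event", "n_sequences": 2, "n_entities": 3})
    )
    dst = Path(tmp) / "dst"
    copy_store(src, dst, "interval")
    copied = json.loads((dst / "core.json").read_text())
    original = json.loads((src / "core.json").read_text())
    print(copied["sequence_type"], original["sequence_type"])  # interval event
```

Overwriting the type after the copy, rather than trusting the copied file, is what keeps a converted pool from persisting its old type.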
- drop_features(features: list[str], is_static: bool = False, virtual_id: str | None = None) → None
Removes feature columns from disk.
The full list is sent to both physical and virtual stores; each one silently ignores columns it doesn’t own.
- Parameters:
features – Column names to remove.
is_static – If True, targets static features; otherwise entity features.
virtual_id – Optional virtual context.
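The "send everything, ignore what you don't own" rule can be modelled in a few lines. This is a hypothetical in-memory stand-in (`ColumnStore` is invented), not the physical or virtual store classes:

```python
class ColumnStore:
    """Hypothetical in-memory stand-in for a physical or virtual feature store."""

    def __init__(self, columns: dict):
        self.columns = dict(columns)

    def drop(self, features: list) -> None:
        # Names this store does not own are skipped silently.
        for name in features:
            self.columns.pop(name, None)


physical = ColumnStore({"age": [30, 41], "sex": ["F", "M"]})
virtual = ColumnStore({"age_bin": ["30s", "40s"]})

# The same full list goes to both stores.
for store in (physical, virtual):
    store.drop(["sex", "age_bin"])

print(sorted(physical.columns), sorted(virtual.columns))  # ['age'] []
```

The caller never has to work out which store owns which column, at the cost of typos in feature names also being ignored silently.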
- entity(virtual_id: str | None = None, *, with_store_index: bool = False) → LazyFrame
Entity feature rows (physical + virtual), without seq_id.
Virtual features take precedence: any physical column whose name is also present in the virtual context is silently shadowed, so the virtual value is always returned.
- Parameters:
with_store_index – When True, prepends SCH.STORE_INDEX (the absolute physical row position) to the result.
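Virtual-over-physical shadowing amounts to a column merge in which the virtual side wins on name clashes. This is a hypothetical pure-Python model (`merge_entity_columns` is invented). The resulting column order is an assumption, since the order tanat uses is not documented here.

```python
def merge_entity_columns(physical: dict, virtual: dict) -> dict:
    """Hypothetical model of virtual-over-physical column shadowing."""
    # Keep physical columns that the virtual context does not redefine...
    merged = {name: col for name, col in physical.items() if name not in virtual}
    # ...then add all virtual columns, so a shared name resolves to the virtual value.
    merged.update(virtual)
    return merged


physical = {"dose": [1.0, 2.0], "unit": ["mg", "mg"]}
virtual = {"dose": [0.5, 0.5], "flag": [True, False]}
entity = merge_entity_columns(physical, virtual)
print(entity["dose"])  # [0.5, 0.5]
```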
- entity_features(virtual_id: str | None = None) → list[str]
List of entity feature column names (physical + virtual when virtual_id is set).
- get_id_lf(*, explode: bool = False, with_store_index: bool = False) → LazyFrame
Sequence IDs as a single-column lazy frame.
- Parameters:
explode – When False (default), returns one row per sequence (unique IDs). When True, expands each ID by its entity count so the result is row-aligned with entity() and get_time_index().
with_store_index – When True, prepends a SCH.STORE_INDEX column with the absolute physical row index. Only meaningful when explode=True.
- Returns:
A polars.LazyFrame with a single column of sequence IDs (and optionally a leading SCH.STORE_INDEX column).
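The explode behaviour follows directly from the (seq_id, offset, length) navigation index. The sketch below is a hypothetical pure-Python model over tuples (`id_rows` is invented). It assumes each sequence's entity rows occupy the physical positions offset through offset + length - 1.

```python
def id_rows(sequence_index, *, explode=False, with_store_index=False):
    """Hypothetical model of get_id_lf() over (seq_id, offset, length) tuples."""
    if not explode:
        # One row per sequence; with_store_index has no meaning here.
        return [(seq_id,) for seq_id, _offset, _length in sequence_index]
    rows = []
    for seq_id, offset, length in sequence_index:
        for position in range(offset, offset + length):
            # Repeat the ID once per entity, optionally led by the physical row index.
            rows.append((position, seq_id) if with_store_index else (seq_id,))
    return rows


index = [("a", 0, 2), ("b", 2, 1)]
print(id_rows(index))  # [('a',), ('b',)]
print(id_rows(index, explode=True, with_store_index=True))  # [(0, 'a'), (1, 'a'), (2, 'b')]
```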
- get_id_time_index(virtual_id: str | None = None, *, with_store_index: bool = False) → LazyFrame
Returns seq_id + time-index columns only (no entity features).
Cheaper than get_temporal_data() when entity features are not needed.
- Parameters:
with_store_index – When True, prepends SCH.STORE_INDEX (the absolute physical row position in the store) to the result.
- get_temporal_data(virtual_id: str | None = None, *, with_store_index: bool = False) → LazyFrame
Returns the full temporal data: seq_id + time index + entity features.
- Parameters:
with_store_index – When True, prepends SCH.STORE_INDEX (the absolute physical row position in the store) to the result.
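The difference between get_id_time_index() and get_temporal_data() is which column groups end up in the result. The sketch below models only the column layout. `ID_COL` and `STORE_INDEX` are placeholders, since the real value of SCH.STORE_INDEX is not given here, and `temporal_columns` is a hypothetical helper.

```python
ID_COL = "seq_id"            # placeholder for the sequence ID column name
STORE_INDEX = "store_index"  # placeholder for the value of SCH.STORE_INDEX


def temporal_columns(time_cols, entity_cols, *, with_entities, with_store_index=False):
    """Hypothetical column layout: seq_id + time index (+ entity features)."""
    cols = [ID_COL, *time_cols]
    if with_entities:
        cols += entity_cols
    if with_store_index:
        cols.insert(0, STORE_INDEX)  # prepended, as documented
    return cols


time_cols = ["_t_start", "_t_end"]
entity_cols = ["dose"]

# Layout of get_id_time_index()
id_time = temporal_columns(time_cols, entity_cols, with_entities=False)
# Layout of get_temporal_data(with_store_index=True)
full = temporal_columns(time_cols, entity_cols, with_entities=True, with_store_index=True)
print(id_time)  # ['seq_id', '_t_start', '_t_end']
print(full)     # ['store_index', 'seq_id', '_t_start', '_t_end', 'dose']
```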
- get_time_index(virtual_id: str | None = None, *, with_store_index: bool = False) → LazyFrame
Returns time-index columns only.
Cheaper than get_id_time_index() when the sequence ID column is not needed.
- Parameters:
virtual_id – Optional virtual context identifier.
with_store_index – When True, appends SCH.STORE_INDEX (the 0-based absolute row position in the physical store).
- Returns:
A polars.LazyFrame of the time-index columns (plus SCH.STORE_INDEX when requested).
- property n_entities: int
Total number of entity rows in the physical store.
Computed by summing the length column of the sequence index.
- property seq_id_dtype: DataType
Physical dtype of the sequence ID column (read from the IPC header, without scanning the data).
- property sequence_index: LazyFrame
Navigation index (seq_id, offset, length) read from the physical store, with no cast overlay applied.
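The navigation index and n_entities are two views of the same per-sequence lengths. The sketch below is a hypothetical pure-Python model in which `build_sequence_index` and `rows_for` are invented helpers. It assumes a contiguous layout where each offset is the running total of the preceding lengths.

```python
from itertools import accumulate


def build_sequence_index(lengths: dict) -> list:
    """Hypothetical: derive (seq_id, offset, length) rows from per-sequence lengths."""
    # Offsets are the running total of preceding lengths (contiguous layout assumed).
    offsets = [0, *accumulate(lengths.values())][:-1]
    return [(seq_id, off, n) for (seq_id, n), off in zip(lengths.items(), offsets)]


def rows_for(sequence_index: list, seq_id, entity_rows: list) -> list:
    """Slice the entity rows belonging to one sequence."""
    for sid, offset, length in sequence_index:
        if sid == seq_id:
            return entity_rows[offset:offset + length]
    raise KeyError(seq_id)


index = build_sequence_index({"a": 3, "b": 1, "c": 2})
n_entities = sum(length for _, _, length in index)  # mirrors the n_entities property
print(index)       # [('a', 0, 3), ('b', 3, 1), ('c', 4, 2)]
print(n_entities)  # 6
print(rows_for(index, "c", list(range(n_entities))))  # [4, 5]
```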
- structural_columns(is_static: bool = False, virtual_id: str | None = None) → list[str]
Returns the structural column names for a data access call.
Always includes the sequence ID column. For entity (non-static) data, it also includes the time columns actually present in this store’s time index (physical or virtual).
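The selection rule can be sketched as a filter over the known time-column names. This is a hypothetical model: `structural_columns` here is a free function with an invented signature, and the ordering of the time columns is an assumption.

```python
TIME_COLUMNS = ("_t_event", "_t_start", "_t_end")


def structural_columns(id_col: str, time_index_cols: list, *, is_static: bool = False) -> list:
    """Hypothetical model: the ID column, plus the time columns this store actually has."""
    cols = [id_col]
    if not is_static:
        cols += [c for c in TIME_COLUMNS if c in time_index_cols]
    return cols


print(structural_columns("seq_id", ["_t_start", "_t_end"]))         # ['seq_id', '_t_start', '_t_end']
print(structural_columns("seq_id", ["_t_event"], is_static=True))  # ['seq_id']
```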
- time_index(virtual_id: str | None = None, *, with_store_index: bool = False) → LazyFrame
Time-index rows (_t_event, or _t_start and _t_end), with an optional virtual override.
When virtual_id is given and tmp/<virtual_id>/time_index.arrow exists, the virtual time index is returned instead of the physical one. This is a full replacement, because the whole temporal structure changes during type conversions. When the virtual override is absent, the physical file is used.
When virtual_id is None, the physical time index is always returned.
- Parameters:
virtual_id – Optional virtual context identifier.
with_store_index – When True, prepends SCH.STORE_INDEX (the absolute physical row position) to the result.
- Returns:
A polars.LazyFrame of the time-index rows.
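The override-or-fallback rule is a path decision that can be sketched independently of polars. This is a hypothetical model: `resolve_time_index` is invented, and the physical file name `time_index.arrow` at the store root is an assumption. Only the `tmp/<virtual_id>/time_index.arrow` location comes from the description above.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_time_index(root: Path, virtual_id: Optional[str] = None) -> Path:
    """Hypothetical: choose which time-index file time_index() would read."""
    physical = root / "time_index.arrow"  # assumed physical file name
    if virtual_id is None:
        return physical  # no virtual context: always physical
    override = root / "tmp" / virtual_id / "time_index.arrow"
    # Full replacement when the override exists, otherwise fall back.
    return override if override.exists() else physical


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    no_context = resolve_time_index(root).relative_to(root).as_posix()
    before = resolve_time_index(root, "v1").relative_to(root).as_posix()
    (root / "tmp" / "v1").mkdir(parents=True)
    (root / "tmp" / "v1" / "time_index.arrow").touch()
    after = resolve_time_index(root, "v1").relative_to(root).as_posix()
    print(no_context, before, after)  # time_index.arrow time_index.arrow tmp/v1/time_index.arrow
```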
- static write_core_json(path: Path, sequence_type: str, n_sequences: int, n_entities: int) → None
Writes core.json: static store facts set once at build time.
- write_virtual_time_index(virtual_id: str, time_index_lf: LazyFrame) → None
Write a virtual time-index override for virtual_id.
Creates tmp/<virtual_id>/ if it does not exist, then writes time_index_lf there as time_index.arrow. A subsequent call to time_index() with the same virtual_id returns this override instead of the physical time index.
- Parameters:
virtual_id – Virtual context identifier (a UUID string).
time_index_lf – LazyFrame containing the new time columns (_t_start and _t_end for period types, or _t_event for event types).
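Before writing, a caller might check that the frame carries the right time columns for its type and make sure the target directory exists. The sketch below is hypothetical. The column check is an illustrative caller-side addition that the method is not documented to perform, `prepare_virtual_time_index` is an invented helper, and the actual Arrow IPC write is left out.

```python
import tempfile
from pathlib import Path

PERIOD_COLUMNS = ("_t_start", "_t_end")
EVENT_COLUMNS = ("_t_event",)


def prepare_virtual_time_index(root: Path, virtual_id: str, columns: list, *, is_period: bool) -> Path:
    """Hypothetical pre-write step: check time columns, then create tmp/<virtual_id>/."""
    required = PERIOD_COLUMNS if is_period else EVENT_COLUMNS
    missing = [c for c in required if c not in columns]
    if missing:
        raise ValueError(f"missing time columns: {missing}")
    target_dir = root / "tmp" / virtual_id
    target_dir.mkdir(parents=True, exist_ok=True)  # create the directory only if absent
    return target_dir / "time_index.arrow"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    target = prepare_virtual_time_index(root, "v1", ["_t_start", "_t_end"], is_period=True)
    rel = target.relative_to(root).as_posix()
    dir_created = target.parent.is_dir()
    print(rel, dir_created)  # tmp/v1/time_index.arrow True
```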
Module contents
Package stub.