tanat.store.sequence.builder package#
Subpackages#
Submodules#
tanat.store.sequence.builder.base module#
SequenceStoreBuilder: fluent base class for constructing a SequenceStore.
Each add_* call declares the source-local column names; the builder
renames them directly to SCH.* internal names before writing.
There is no intermediate canonical layer; column naming for display is
a View-layer concern handled by Pool/Sequence settings.
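To make the direct renaming concrete, here is a minimal sketch in plain Python. It is not the builder's code: the internal names (`INTERNAL_SEQ_ID`, `INTERNAL_TIME`), the helper `to_internal`, and the `time_column` keyword are hypothetical stand-ins, since the actual `SCH.*` constants and time index kwargs are not listed on this page.

```python
# Hypothetical stand-ins for SCH.* internal names (real values are not documented here).
INTERNAL_SEQ_ID = "__seq_id__"
INTERNAL_TIME = "__time__"


def to_internal(rows, *, id_column, time_column, features):
    """Rename source-local columns to internal names and drop undeclared columns."""
    mapping = {id_column: INTERNAL_SEQ_ID, time_column: INTERNAL_TIME}
    # Assumption of this sketch: feature columns keep their own names.
    mapping.update({name: name for name in features})
    return [{mapping[col]: row[col] for col in mapping} for row in rows]


source_rows = [
    {"patient": "p1", "visit_date": "2024-01-03", "code": "A", "note": "ignored"},
    {"patient": "p1", "visit_date": "2024-02-10", "code": "B", "note": "ignored"},
]
internal_rows = to_internal(
    source_rows, id_column="patient", time_column="visit_date", features=["code"]
)
print(internal_rows[0])
```

The source-local names (`patient`, `visit_date`) appear only in the call; everything downstream sees the internal names, which is the point of having no canonical layer in between.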
- class tanat.store.sequence.builder.base.SequenceStoreBuilder[source]#
Bases: ABC, Registrable, DisplayMixin
Fluent builder that accumulates data sources then writes a SequenceStore.
Each add_* call declares:
- id_column - which source column is the sequence ID
- time index kwargs - which column(s) are the time index dimension
- features - feature columns to extract
The builder renames every source directly to SCH.* internal names, concatenates/joins all sources, then runs the write pipeline inline.
Obtain an instance via SequencePool.builder() (recommended).
- abstractmethod add_csv(path, *, id_column: str, features: list[str], is_static: bool = False, **kw) SequenceStoreBuilder[source]#
Register a CSV file as a source.
- abstractmethod add_dataframe(data, *, id_column: str, features: list[str], is_static: bool = False, **time_index_kwargs) SequenceStoreBuilder[source]#
Register an in-memory Polars / Pandas DataFrame.
- abstractmethod add_parquet(path, *, id_column: str, features: list[str], is_static: bool = False, **kw) SequenceStoreBuilder[source]#
Register a Parquet file (glob patterns supported).
- abstractmethod add_sql(connection: str, query: str, *, id_column: str, features: list[str], is_static: bool = False, **kw) SequenceStoreBuilder[source]#
Register a SQL query (requires connectorx).
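The add_* methods all return the builder, which is what allows calls to be chained. The following toy class sketches that pattern under stated assumptions: `ToyBuilder`, `_register`, the file names, and the `time_column` keyword are hypothetical, and treating `is_static=True` as "route to a separate static source list" is an assumed reading of the flag, not confirmed behavior.

```python
class ToyBuilder:
    """Hypothetical miniature of a fluent builder; not tanat's implementation."""

    def __init__(self):
        self.entity_sources = []
        self.static_sources = []

    def _register(self, kind, ref, *, id_column, features, is_static=False,
                  **time_index_kwargs):
        spec = {
            "kind": kind,
            "ref": ref,
            "id_column": id_column,
            "features": list(features),
            "time_index": time_index_kwargs,
        }
        # Assumed meaning of is_static: keep static sources apart from timed ones.
        target = self.static_sources if is_static else self.entity_sources
        target.append(spec)
        return self  # returning self is what makes the calls chainable

    def add_csv(self, path, **kw):
        return self._register("csv", path, **kw)

    def add_parquet(self, path, **kw):
        return self._register("parquet", path, **kw)


builder = (
    ToyBuilder()
    .add_csv("events.csv", id_column="patient", features=["code"],
             time_column="visit_date")
    .add_parquet("patients/*.parquet", id_column="patient", features=["sex"],
                 is_static=True)
)
print(len(builder.entity_sources), len(builder.static_sources))
```

Nothing is read or written while sources are registered; the sketch only records declarations, leaving all I/O to a later build step.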
- build(store_path: str | Path, *, exist_ok: bool = False) Path[source]#
Run the write pipeline and persist the store to store_path.
- Parameters:
store_path – Destination directory for the store.
exist_ok – Overwrite an existing store if True.
- Returns:
The resolved Path to the written store directory.
- Raises:
FileExistsError – If the store exists and exist_ok=False.
ValueError – If no entity source is registered.
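The error contract of build() can be sketched with the standard library alone. `toy_build` is a hypothetical stand-in, the order in which the two checks run is an assumption, and the `MARKER` file is only a placeholder for the real write pipeline.

```python
import shutil
import tempfile
from pathlib import Path


def toy_build(store_path, entity_sources, *, exist_ok=False):
    """Hypothetical stand-in mirroring the documented error contract of build()."""
    if not entity_sources:
        raise ValueError("no entity source registered")
    path = Path(store_path).resolve()
    if path.exists():
        if not exist_ok:
            raise FileExistsError(f"store already exists: {path}")
        shutil.rmtree(path)  # overwrite: discard the previous store
    path.mkdir(parents=True)
    (path / "MARKER").write_text("written")  # placeholder for the write pipeline
    return path


root = Path(tempfile.mkdtemp())
first = toy_build(root / "store", ["events.csv"])
try:
    toy_build(root / "store", ["events.csv"])
    second_attempt = "succeeded"
except FileExistsError:
    second_attempt = "FileExistsError"
overwritten = toy_build(root / "store", ["events.csv"], exist_ok=True)
print(second_attempt, overwritten == first)
```

Defaulting `exist_ok` to `False` makes an accidental overwrite fail loudly; callers opt in to replacement explicitly.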
- build_from_frames(store_path: str | Path, entity_lf: LazyFrame, static_lf: LazyFrame | None = None, *, presorted: bool = False, exist_ok: bool = False) Path[source]#
Write a store directly from pre-prepared LazyFrames.
Unlike build(), this method bypasses source registration and _to_internal() renaming. The caller provides frames already in SCH.* internal names:
- entity_lf: SEQ_ID | time_cols | feature_cols
- static_lf: SEQ_ID | static_cols (optional)
Intended for save() so that the pool can prepare its frames (filter, cast, virtual merge) and delegate all I/O to the builder, keeping stores as read-only objects.
- Parameters:
store_path – Destination directory.
entity_lf – Entity LazyFrame in SCH.* names.
static_lf – Optional static LazyFrame in SCH.* names (with SEQ_ID column included).
presorted – Skip the _prepare_entity() step when frames are already ordered by SEQ_ID then by time column within each sequence (always the case for frames read from an existing store).
exist_ok – Overwrite an existing store if True.
- Returns:
The resolved Path to the written store directory.
- Raises:
FileExistsError – If the store exists and exist_ok=False.
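The ordering that presorted=True promises (by SEQ_ID, then by time within each sequence) amounts to a lexicographic order on (sequence ID, time) pairs. A minimal check in plain Python, assuming hypothetical key names `seq_id` and `time` in place of the real `SCH.*` columns, might look like this:

```python
def is_presorted(rows, *, id_key="seq_id", time_key="time"):
    """Return True if rows are ordered by sequence ID, then by time within each ID."""
    keys = [(row[id_key], row[time_key]) for row in rows]
    # Tuples compare lexicographically: first by ID, then by time.
    return all(a <= b for a, b in zip(keys, keys[1:]))


ordered = [
    {"seq_id": 1, "time": 0},
    {"seq_id": 1, "time": 5},
    {"seq_id": 2, "time": 1},
]
shuffled = [ordered[1], ordered[0], ordered[2]]
print(is_presorted(ordered), is_presorted(shuffled))  # True False
```

A check like this can be useful before passing presorted=True for frames that did not come from an existing store, since the flag skips the preparation step rather than verifying the order.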
Module contents#
Sequence store builder package.
- class tanat.store.sequence.builder.SequenceStoreBuilder[source]#
Identical to tanat.store.sequence.builder.base.SequenceStoreBuilder, documented above.