tanat.store.sequence.builder package

Submodules

tanat.store.sequence.builder.base module

SequenceStoreBuilder: fluent base class for constructing a SequenceStore.

Each add_* call declares the source-local column names, and the builder renames them directly to SCH.* internal names before writing. There is no intermediate canonical layer: column naming for display is a View-layer concern handled by Pool/Sequence settings.

class tanat.store.sequence.builder.base.SequenceStoreBuilder

Bases: ABC, Registrable, DisplayMixin

Fluent builder that accumulates data sources then writes a SequenceStore.

Each add_* call declares:

  • id_column: the source column that holds the sequence ID

  • time index kwargs: the column(s) that form the time index

  • features: the feature columns to extract

The builder renames the columns of every source directly to SCH.* internal names, concatenates or joins all sources, then runs the write pipeline inline.

Obtain an instance via SequencePool.builder() (recommended).
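
The fluent pattern described above can be sketched with a small stand-in class. This is not tanat's implementation: ToyBuilder, add_rows, the time_column keyword, and the internal name "T" are hypothetical placeholders chosen for illustration (only SEQ_ID is named in this documentation). The sketch only shows how each add_* style call can rename source-local columns to internal names and return the builder so that calls chain.

```python
# Hypothetical internal names; only SEQ_ID is mentioned in the docs above.
SEQ_ID = "SEQ_ID"
TIME = "T"  # assumed placeholder for the time index column


class ToyBuilder:
    """Stand-in illustrating the fluent add_* pattern (not tanat code)."""

    def __init__(self):
        self.sources = []

    def add_rows(self, rows, *, id_column, time_column, features):
        # Map source-local names straight to internal names.
        mapping = {id_column: SEQ_ID, time_column: TIME}
        keep = [id_column, time_column, *features]
        renamed = [{mapping.get(k, k): row[k] for k in keep} for row in rows]
        self.sources.append(renamed)
        return self  # returning self is what enables chaining

    def collect(self):
        # Concatenate every registered source into one table.
        return [row for source in self.sources for row in source]


rows_a = [{"patient": 1, "day": 0, "event": "admit", "noise": "x"}]
rows_b = [{"pid": 2, "ts": 3, "event": "discharge"}]

table = (
    ToyBuilder()
    .add_rows(rows_a, id_column="patient", time_column="day", features=["event"])
    .add_rows(rows_b, id_column="pid", time_column="ts", features=["event"])
    .collect()
)
print(table)
```

Both sources use different local column names, yet the collected rows share one internal schema, and the undeclared "noise" column is dropped because it is not listed in features.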

__init__() -> None

abstractmethod add_csv(path, *, id_column: str, features: list[str], is_static: bool = False, **kw) -> SequenceStoreBuilder

Register a CSV file as a source.

abstractmethod add_dataframe(data, *, id_column: str, features: list[str], is_static: bool = False, **time_index_kwargs) -> SequenceStoreBuilder

Register an in-memory Polars or pandas DataFrame.

abstractmethod add_parquet(path, *, id_column: str, features: list[str], is_static: bool = False, **kw) -> SequenceStoreBuilder

Register a Parquet file (glob patterns supported).

abstractmethod add_sql(connection: str, query: str, *, id_column: str, features: list[str], is_static: bool = False, **kw) -> SequenceStoreBuilder

Register a SQL query (requires connectorx).

build(store_path: str | Path, *, exist_ok: bool = False) -> Path

Run the write pipeline and persist the store to store_path.

Parameters:
  • store_path – Destination directory for the store.

  • exist_ok – Overwrite an existing store if True.

Returns:

The resolved Path to the written store directory.

Raises:
  • FileExistsError – If the store exists and exist_ok=False.

  • ValueError – If no entity source is registered.
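
The documented contract of build() (resolved path returned, FileExistsError unless exist_ok=True, ValueError without an entity source) can be illustrated with a stdlib-only sketch. prepare_store_dir and its has_entity_source flag are hypothetical; the order of the two checks and the interpretation of "overwrite" as removing the old directory are assumptions, not confirmed tanat behaviour.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_store_dir(store_path, *, exist_ok=False, has_entity_source=True):
    """Hypothetical helper mirroring the documented build() contract."""
    if not has_entity_source:
        raise ValueError("at least one entity source must be registered")
    path = Path(store_path).resolve()
    if path.exists():
        if not exist_ok:
            raise FileExistsError(f"store already exists: {path}")
        shutil.rmtree(path)  # assumed meaning of "overwrite"
    path.mkdir(parents=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "store"
    first = prepare_store_dir(target)

    # A second write to the same location fails unless exist_ok=True.
    try:
        prepare_store_dir(target)
        second_call = "succeeded"
    except FileExistsError:
        second_call = "FileExistsError"

    overwritten = prepare_store_dir(target, exist_ok=True)
    same_path = first == overwritten

    # Building with nothing registered is rejected.
    try:
        prepare_store_dir(Path(tmp) / "other", has_entity_source=False)
        empty_call = "succeeded"
    except ValueError:
        empty_call = "ValueError"

print(second_call, same_path, empty_call)
```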

build_from_frames(store_path: str | Path, entity_lf: LazyFrame, static_lf: LazyFrame | None = None, *, presorted: bool = False, exist_ok: bool = False) -> Path

Write a store directly from pre-prepared LazyFrames.

Unlike build(), this method bypasses source registration and _to_internal() renaming. The caller provides frames already in SCH.* internal names:

  • entity_lf: SEQ_ID | time_cols | feature_cols

  • static_lf: SEQ_ID | static_cols (optional)

Intended for use by save(): the pool prepares its frames (filter, cast, virtual merge) and delegates all I/O to the builder, so stores remain read-only objects.

Parameters:
  • store_path – Destination directory.

  • entity_lf – Entity LazyFrame in SCH.* names.

  • static_lf – Optional static LazyFrame in SCH.* names (with SEQ_ID column included).

  • presorted – Skip the _prepare_entity() step when the frames are already ordered by SEQ_ID and then by the time column within each sequence (always the case for frames read from an existing store).

  • exist_ok – Overwrite an existing store if True.

Returns:

The resolved Path to the written store directory.

Raises:
  • FileExistsError – If the store exists and exist_ok=False.
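
The ordering that presorted=True relies on (rows ordered by SEQ_ID, then by time within each sequence) can be checked with a few lines of plain Python. This is a sketch only: is_presorted and the time key "T" are hypothetical, rows are modelled as dictionaries rather than LazyFrames, and ties are assumed to be acceptable.

```python
def is_presorted(rows, id_key="SEQ_ID", time_key="T"):
    """True if rows are ordered by id_key, then by time_key within each id."""
    keys = [(row[id_key], row[time_key]) for row in rows]
    # Lexicographic comparison of (id, time) pairs encodes both conditions.
    return all(a <= b for a, b in zip(keys, keys[1:]))


ordered = [
    {"SEQ_ID": 1, "T": 0},
    {"SEQ_ID": 1, "T": 5},
    {"SEQ_ID": 2, "T": 1},  # time may restart when a new sequence begins
]
time_out_of_order = [
    {"SEQ_ID": 1, "T": 5},
    {"SEQ_ID": 1, "T": 0},
]
ids_interleaved = [
    {"SEQ_ID": 2, "T": 0},
    {"SEQ_ID": 1, "T": 1},
]

ordered_ok = is_presorted(ordered)
time_ok = is_presorted(time_out_of_order)
ids_ok = is_presorted(ids_interleaved)
print(ordered_ok, time_ok, ids_ok)
```

Frames that fail such a check would need the default presorted=False so that the preparation step is not skipped.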

Module contents

Sequence store builder package.

class tanat.store.sequence.builder.SequenceStoreBuilder

Same class as tanat.store.sequence.builder.base.SequenceStoreBuilder, exposed at package level; its documentation is identical to the base module entry above.