tanat.store.source.type package

Submodules

tanat.store.source.type.dataframe module

In-memory DataFrame source (Polars or Pandas).

class tanat.store.source.type.dataframe.DataFrameSource(data: pl.DataFrame | pl.LazyFrame | pd.DataFrame)

Bases: AbstractSource

Wraps an in-memory Polars or Pandas DataFrame as a source.

Parameters:

data – A polars.DataFrame, polars.LazyFrame, or pandas.DataFrame.

__init__(data: pl.DataFrame | pl.LazyFrame | pd.DataFrame) → None

read() → LazyFrame

Read the source and return a lazy frame.

schema() → Schema

Return the column schema without reading the full dataset.

Implementations should be as cheap as possible, reading only metadata (file headers, a SQL LIMIT 0 probe, the in-memory schema, …).

tanat.store.source.type.file module

File-based sources: CSV and Parquet.

class tanat.store.source.type.file.CsvSource(path: str | Path, **kwargs)

Bases: AbstractSource

Reads a CSV file via polars.scan_csv().

Parameters:

path – Path to the CSV file.
kwargs – Forwarded verbatim to polars.scan_csv() (e.g. separator, schema_overrides, null_values).

__init__(path: str | Path, **kwargs) → None

read() → LazyFrame

Read the source and return a lazy frame.

schema() → Schema

Return the column schema without reading the full dataset.

Implementations should be as cheap as possible, reading only metadata (file headers, a SQL LIMIT 0 probe, the in-memory schema, …).

class tanat.store.source.type.file.ParquetSource(path: str | Path, **kwargs)

Bases: AbstractSource

Reads a Parquet file via polars.scan_parquet().

Parameters:

path – Path to the Parquet file (glob patterns supported).
kwargs – Forwarded verbatim to polars.scan_parquet().

__init__(path: str | Path, **kwargs) → None

read() → LazyFrame

Read the source and return a lazy frame.

schema() → Schema

Return the column schema without reading the full dataset.

Implementations should be as cheap as possible, reading only metadata (file headers, a SQL LIMIT 0 probe, the in-memory schema, …).

tanat.store.source.type.sql module

SQL source via Polars + connectorx (optional dependency).

class tanat.store.source.type.sql.SqlSource(connection: str, query: str, **kwargs)

Bases: AbstractSource

Executes a SQL query and returns the result as a polars.LazyFrame.

String columns that look like dates/datetimes are automatically cast to polars.Datetime.

Requires connectorx:

pip install tanat[sql]

Parameters:

connection – Connection string (e.g. "postgresql://user:pwd@host/db").
query – SQL SELECT query to execute.
kwargs – Forwarded to polars.read_database_uri().

__init__(connection: str, query: str, **kwargs) → None

read() → LazyFrame

Read the source and return a lazy frame.

schema() → Schema

Probe the SQL schema with a zero-row query.

Wraps the user query as SELECT * FROM (...) AS _q LIMIT 0 so the database resolves column names and types without transferring any data rows.
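The zero-row probe can be illustrated with Python's built-in sqlite3 module standing in for a real database; SqlSource itself goes through connectorx. The helper name probe_query and the stripping of a trailing semicolon are assumptions of this sketch, not documented tanat behavior.

```python
import sqlite3


def probe_query(query: str) -> str:
    """Hypothetical helper: wrap a user query so it returns no rows."""
    # Assumption: drop a trailing semicolon so the query can sit inside parentheses.
    inner = query.strip().rstrip(";")
    return f"SELECT * FROM ({inner}) AS _q LIMIT 0"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, label TEXT)")
conn.execute("INSERT INTO events VALUES (1, 'a'), (2, 'b')")

cur = conn.execute(probe_query("SELECT id, label FROM events;"))
# Column names are available from the cursor even though no rows come back.
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns, rows)  # ['id', 'label'] []
```

Note that sqlite3 only exposes column names through cursor.description; a typed database queried through connectorx also reports column types, which is what makes the probe useful for schema().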

Module contents

The package re-exports CsvSource, DataFrameSource, ParquetSource, and SqlSource, so each class can also be imported directly from tanat.store.source.type (for example, tanat.store.source.type.CsvSource). Their signatures, parameters, and methods are identical to the submodule entries documented above.