tanat.store.source.type package

Submodules

tanat.store.source.type.dataframe module

In-memory DataFrame source (Polars or Pandas).

class tanat.store.source.type.dataframe.DataFrameSource(data: pl.DataFrame | pl.LazyFrame | pd.DataFrame)

Bases: AbstractSource

Wraps an in-memory Polars or Pandas DataFrame as a source.

Parameters:
  • data – A polars.DataFrame, polars.LazyFrame, or pandas.DataFrame.

__init__(data: pl.DataFrame | pl.LazyFrame | pd.DataFrame) → None

read() → LazyFrame

Read the source and return a lazy frame.

schema() → Schema

Return the column schema without reading the full dataset.

Implementations should be as cheap as possible and read metadata only (for example file headers, a SQL LIMIT 0 probe, or an in-memory schema).

tanat.store.source.type.file module

File-based sources: CSV and Parquet.

class tanat.store.source.type.file.CsvSource(path: str | Path, **kwargs)

Bases: AbstractSource

Reads a CSV file via polars.scan_csv().

Parameters:
  • path – Path to the CSV file.

  • kwargs – Forwarded verbatim to polars.scan_csv() (e.g. separator, schema_overrides, null_values).

__init__(path: str | Path, **kwargs) → None

read() → LazyFrame

Read the source and return a lazy frame.

schema() → Schema

Return the column schema without reading the full dataset.

Implementations should be as cheap as possible and read metadata only (for example file headers, a SQL LIMIT 0 probe, or an in-memory schema).

class tanat.store.source.type.file.ParquetSource(path: str | Path, **kwargs)

Bases: AbstractSource

Reads a Parquet file via polars.scan_parquet().

Parameters:
  • path – Path to the Parquet file (glob patterns supported).

  • kwargs – Forwarded verbatim to polars.scan_parquet().

__init__(path: str | Path, **kwargs) → None

read() → LazyFrame

Read the source and return a lazy frame.

schema() → Schema

Return the column schema without reading the full dataset.

Implementations should be as cheap as possible and read metadata only (for example file headers, a SQL LIMIT 0 probe, or an in-memory schema).

tanat.store.source.type.sql module

SQL source via Polars + connectorx (optional dependency).

class tanat.store.source.type.sql.SqlSource(connection: str, query: str, **kwargs)

Bases: AbstractSource

Executes a SQL query and returns the result as a polars.LazyFrame.

String columns that look like dates/datetimes are automatically cast to polars.Datetime.

Requires connectorx:

pip install tanat[sql]

Parameters:
  • connection – Connection string (e.g. "postgresql://user:pwd@host/db").

  • query – SQL SELECT query to execute.

  • kwargs – Forwarded to polars.read_database_uri().

__init__(connection: str, query: str, **kwargs) → None

read() → LazyFrame

Read the source and return a lazy frame.
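The class description says that string columns that look like dates or datetimes are cast to polars.Datetime, but this page does not say how such columns are recognized. The sketch below shows one plausible check, based on ISO 8601 parsing from the standard library. It is an assumption for illustration only; the helper looks_like_datetime is hypothetical and may differ from what tanat actually does.

```python
from datetime import datetime


def looks_like_datetime(values):
    """Hypothetical check: True if every non-null string parses as ISO 8601."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return False  # an all-null column gives no evidence either way
    for value in non_null:
        try:
            datetime.fromisoformat(value)
        except ValueError:
            return False  # one unparseable value disqualifies the column
    return True


dates_ok = looks_like_datetime(["2024-01-05", "2024-01-06 10:30:00", None])
dates_bad = looks_like_datetime(["2024-01-05", "n/a"])
```

A column that passes such a check could then be cast. One that fails would stay a string column.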

schema() → Schema

Probe the SQL schema with a zero-row query.

Wraps the user query as SELECT * FROM (...) AS _q LIMIT 0 so the database resolves column names and types without transferring any data rows.
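The zero-row wrapping can be tried out against any database. The sketch below uses the standard-library sqlite3 module instead of connectorx so that it runs without extra dependencies. The helper probe_columns and the events table are made up for illustration and are not part of tanat.

```python
import sqlite3


def probe_columns(conn, query):
    """Hypothetical helper: list the column names of `query` without fetching rows."""
    wrapped = f"SELECT * FROM ({query}) AS _q LIMIT 0"
    cursor = conn.execute(wrapped)
    # cursor.description is populated even though the result set is empty.
    names = [column[0] for column in cursor.description]
    rows = cursor.fetchall()  # always empty because of LIMIT 0
    return names, rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, label TEXT, ts TEXT)")
conn.execute("INSERT INTO events VALUES (1, 'start', '2024-01-01')")

columns, rows = probe_columns(conn, "SELECT id, ts FROM events WHERE id > 0")
conn.close()
```

A user query that ends with a semicolon would not survive this kind of wrapping unchanged, since it is embedded as a subquery. Note also that sqlite3 reports only column names in cursor.description, whereas the probe described above also resolves column types.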

Module contents

Register AbstractSource subtypes.

The package also exposes the four source classes at the package level. Their documentation is the same as in the submodule sections above:

  • class tanat.store.source.type.CsvSource(path: str | Path, **kwargs)

  • class tanat.store.source.type.DataFrameSource(data: pl.DataFrame | pl.LazyFrame | pd.DataFrame)

  • class tanat.store.source.type.ParquetSource(path: str | Path, **kwargs)

  • class tanat.store.source.type.SqlSource(connection: str, query: str, **kwargs)