tanat.dataset.access package

Submodules

tanat.dataset.access.utils module

User data access utils.

tanat.dataset.access.utils.access(data_type: str, cache_dir: Path | None = None, force: bool = False) → Any

Access a dataset from Zenodo and return a ready-to-use object.

Depending on the dataset type, this returns:
  • a DataFrame (for file-based datasets: CSV, Parquet, etc.)

  • a Path to the local database file (for SQL-based datasets), from which a connection can be opened

Data is cached locally after the first download. Later calls reuse the cached copy unless force=True is set.

Parameters:
  • data_type (str) – Name of the dataset registered in ZenodoAccessor.

  • cache_dir (Path, optional) – Directory used for caching. Defaults to the system temp directory.

  • force (bool) – If True, forces re-download even if data is already cached.

Returns:

A usable object for interacting with the dataset. The concrete type depends on the accessor implementation (e.g. a Path for SQL-based datasets such as "mimic4", or a pandas.DataFrame for CSV-based ones such as "mvad").

Return type:

Any

Raises:

ValueError – If data_type is not registered in the accessor.

Examples

>>> from pathlib import Path
>>> from tanat.dataset.access import access
>>> # Access the MVAD CSV dataset as a DataFrame
>>> df = access("mvad")
>>> # Access the mimic4 SQLite database: returns the Path to the .db file
>>> db_path: Path = access("mimic4")
>>> DB = f"sqlite:///{db_path}"  # SQLAlchemy-compatible URL
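The second example stops at building a SQLAlchemy URL. As a minimal sketch of an alternative that needs only the standard library, the returned Path can also be opened with sqlite3. Here a temporary database file stands in for the value that access("mimic4") would return, and both the helper list_tables and the table name patients are hypothetical, chosen only for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path


def list_tables(db_path: Path) -> list[str]:
    """Return the table names found in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Stand-in for `db_path = access("mimic4")`: build a throwaway database.
demo_path = Path(tempfile.mkdtemp()) / "demo.db"
conn = sqlite3.connect(demo_path)
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY)")  # hypothetical table
conn.commit()
conn.close()

tables = list_tables(demo_path)
```

With a real dataset, the same helper would be called on the path returned by access().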

tanat.dataset.access.zenodo module

Zenodo dataset accessor.

class tanat.dataset.access.zenodo.ZenodoAccessor(record_id: int | str, filename: str, cache_dir: Path | None = None)

Bases: ABC, Registrable

Zenodo dataset accessor.

__init__(record_id: int | str, filename: str, cache_dir: Path | None = None) → None

Initialize ZenodoAccessor.

Parameters:
  • record_id – Zenodo record ID.

  • filename – Name of the file to download.

  • cache_dir – Cache directory. Defaults to the system temp directory.

property cache_dir: Path

Cache directory.

download(force: bool = False) → Path

Download the file from Zenodo if it is not cached or if the cached copy is invalid.

Parameters:

force – Force download even if file exists.

Returns:

Path to the downloaded file.
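The download-if-missing-or-invalid behaviour described above can be sketched as follows. This is not the library's implementation: it assumes that a cached file counts as valid when its size matches the expected size, and the names is_cache_valid, ensure_cached, and fake_fetch are hypothetical. A fake fetcher replaces the real network call so the sketch runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable


def is_cache_valid(path: Path, expected_size: int) -> bool:
    """Assumed rule: valid when the file exists and has the expected size."""
    return path.is_file() and path.stat().st_size == expected_size


def ensure_cached(
    path: Path,
    expected_size: int,
    fetch: Callable[[Path], None],
    force: bool = False,
) -> Path:
    """Call `fetch` only when forced or when the cached copy is missing or invalid."""
    if force or not is_cache_valid(path, expected_size):
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(path)
    return path


# Demo with a fake fetcher that records how often it runs.
calls: list[Path] = []


def fake_fetch(dest: Path) -> None:
    calls.append(dest)
    dest.write_bytes(b"12345")


target = Path(tempfile.mkdtemp()) / "cache" / "data.csv"
ensure_cached(target, expected_size=5, fetch=fake_fetch)              # first call fetches
ensure_cached(target, expected_size=5, fetch=fake_fetch)              # cache hit, no fetch
ensure_cached(target, expected_size=5, fetch=fake_fetch, force=True)  # forced re-fetch
```

Passing the fetcher as a parameter keeps the caching decision separate from the transport, which makes the logic easy to test without network access.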

property expected_size: int

Get expected file size from Zenodo API.
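How the size is obtained is not documented here. One plausible approach, sketched below, is to read it from the record metadata. The sketch assumes that the Zenodo records endpoint returns JSON with a "files" list whose entries carry "key" (the filename) and "size" (bytes). The function name expected_size_from_record and the sample payload are made up for illustration, and no network request is made.

```python
from typing import Any


def expected_size_from_record(record: dict[str, Any], filename: str) -> int:
    """Look up the size in bytes of `filename` in a record payload."""
    for entry in record.get("files", []):
        if entry.get("key") == filename:
            return int(entry["size"])
    raise FileNotFoundError(f"{filename!r} not found in record")


# Made-up payload shaped like the assumed API response.
sample_record = {
    "files": [
        {"key": "data.csv", "size": 2048},
        {"key": "README.md", "size": 512},
    ]
}
size = expected_size_from_record(sample_record, "data.csv")
```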

get(force: bool = False) → Any

Download the file if needed and return an object that gives access to the data of the Zenodo dataset.

Parameters:

force – Force download even if file exists.

Returns:

The object returned by _access_impl(). Its type depends on the subclass (e.g. Path, pandas.DataFrame).

Return type:

Any
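The relationship between get() and _access_impl() resembles the template-method pattern: the base class handles the download and each subclass decides what object to hand back. The sketch below illustrates that pattern with hypothetical classes (SketchAccessor, RowsAccessor, PathAccessor). It is not the tanat code, the download is replaced by writing a small local file, and passing the path into _access_impl() is an assumption about its signature.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class SketchAccessor(ABC):
    """Hypothetical stand-in for an accessor; not the tanat class."""

    def __init__(self, local_path: Path) -> None:
        self.local_path = local_path

    def download(self, force: bool = False) -> Path:
        # Placeholder for the real download: write a tiny file when needed.
        if force or not self.local_path.exists():
            self.local_path.write_text("a,b\n1,2\n")
        return self.local_path

    def get(self, force: bool = False) -> Any:
        # Template method: fetch first, then let the subclass open the file.
        path = self.download(force=force)
        return self._access_impl(path)

    @abstractmethod
    def _access_impl(self, path: Path) -> Any:
        """Turn the downloaded file into a usable object."""


class RowsAccessor(SketchAccessor):
    """File-based variant: parse the file and return its rows."""

    def _access_impl(self, path: Path) -> list[list[str]]:
        return [line.split(",") for line in path.read_text().splitlines()]


class PathAccessor(SketchAccessor):
    """Database-style variant: hand back the path itself."""

    def _access_impl(self, path: Path) -> Path:
        return path


tmp = Path(tempfile.mkdtemp())
rows = RowsAccessor(tmp / "rows.csv").get()
db_path = PathAccessor(tmp / "data.db").get()
```

This is why the return type is Any: callers receive whatever the concrete subclass chooses to produce.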

classmethod init(accessor_type: str, cache_dir: Path | None = None) → ZenodoAccessor

Create a Zenodo accessor instance dynamically from its registered name.

Parameters:
  • accessor_type – Registered name of the accessor to create.

  • cache_dir – Cache directory. Defaults to the system temp directory.

Returns:

Instance of the requested accessor subclass.

Return type:

ZenodoAccessor

Raises:

ValueError – If accessor_type is not registered.
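Name-based construction of this kind is commonly built on a registry that maps strings to subclasses. The sketch below shows one such registry with a ValueError for unknown names. It is an illustration only: RegistryBase, its register decorator, and DemoAccessor are hypothetical, and nothing here describes how tanat's Registrable base is actually implemented.

```python
from abc import ABC
from typing import ClassVar


class RegistryBase(ABC):
    """Hypothetical name-to-subclass registry; not tanat's Registrable."""

    _registry: ClassVar[dict[str, type]] = {}

    @classmethod
    def register(cls, name: str):
        """Class decorator that stores the subclass under `name`."""
        def decorator(subclass: type) -> type:
            cls._registry[name] = subclass
            return subclass
        return decorator

    @classmethod
    def init(cls, accessor_type: str, **kwargs) -> "RegistryBase":
        """Instantiate the subclass registered under `accessor_type`."""
        try:
            subclass = cls._registry[accessor_type]
        except KeyError:
            known = ", ".join(sorted(cls._registry)) or "none"
            raise ValueError(
                f"Unknown accessor type {accessor_type!r}; registered: {known}"
            ) from None
        return subclass(**kwargs)


@RegistryBase.register("demo")
class DemoAccessor(RegistryBase):
    def __init__(self, cache_dir=None):
        self.cache_dir = cache_dir


accessor = RegistryBase.init("demo", cache_dir="/tmp/cache")
```

Listing the registered names in the error message helps users spot a misspelled dataset name quickly.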

property local_path: Path

Local path to the cached file.

Module contents

Dataset access utilities.

tanat.dataset.access.access(data_type: str, cache_dir: Path | None = None, force: bool = False) → Any

Access a dataset from Zenodo and return a ready-to-use object.

Depending on the dataset type, this returns:
  • a DataFrame (for file-based datasets: CSV, Parquet, etc.)

  • a Path to the local database file (for SQL-based datasets), from which a connection can be opened

Data is cached locally after the first download. Later calls reuse the cached copy unless force=True is set.

Parameters:
  • data_type (str) – Name of the dataset registered in ZenodoAccessor.

  • cache_dir (Path, optional) – Directory used for caching. Defaults to the system temp directory.

  • force (bool) – If True, forces re-download even if data is already cached.

Returns:

A usable object for interacting with the dataset. The concrete type depends on the accessor implementation (e.g. a Path for SQL-based datasets such as "mimic4", or a pandas.DataFrame for CSV-based ones such as "mvad").

Return type:

Any

Raises:

ValueError – If data_type is not registered in the accessor.

Examples

>>> from pathlib import Path
>>> from tanat.dataset.access import access
>>> # Access the MVAD CSV dataset as a DataFrame
>>> df = access("mvad")
>>> # Access the mimic4 SQLite database: returns the Path to the .db file
>>> db_path: Path = access("mimic4")
>>> DB = f"sqlite:///{db_path}"  # SQLAlchemy-compatible URL