tanat.dataset.access package
Subpackages
Submodules
tanat.dataset.access.utils module
User data access utils.
- tanat.dataset.access.utils.access(data_type: str, cache_dir: Path | None = None, force: bool = False) -> Any
Access a dataset from Zenodo and return a ready-to-use object.
Depending on the dataset type, this may return:
- a DataFrame (for file-based datasets: CSV, Parquet, etc.)
- a Path to the database file (for SQL-based datasets)
Data is cached locally after the first download. Pass force=True to download it again.
- Parameters:
data_type (str) – Name of the dataset registered in ZenodoAccessor.
cache_dir (Path, optional) – Directory used for caching. Defaults to the system temp directory.
force (bool) – If True, forces re-download even if data is already cached.
- Returns:
A usable object for interacting with the dataset. The concrete type depends on the accessor implementation (e.g. a Path for SQL-based datasets such as "mimic4", or a pandas.DataFrame for CSV-based ones such as "mvad").
- Return type:
Any
- Raises:
ValueError – If data_type is not registered in the accessor.
Examples
>>> # Access the MVAD CSV dataset as a DataFrame
>>> df = access("mvad")
>>> # Access the mimic4 SQLite database: returns the Path to the .db file
>>> db_path: Path = access("mimic4")
>>> DB = f"sqlite:///{db_path}"  # SQLAlchemy-compatible URL
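Since the "mimic4" accessor is documented to return a Path to a SQLite file, the standard-library sqlite3 module can also open it directly. The following is a minimal sketch, not part of tanat: list_tables is a hypothetical helper, and a throwaway database with a made-up demo_events table stands in for the downloaded file so the snippet runs offline.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path
from typing import List


def list_tables(db_path: Path) -> List[str]:
    """Return the names of the tables in a SQLite file, sorted by name."""
    # closing() guarantees the connection is closed when the block exits.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Stand-in for `db_path = access("mimic4")` (assumption: any SQLite file works).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.db"
    with closing(sqlite3.connect(demo_path)) as conn:
        conn.execute("CREATE TABLE demo_events (id INTEGER PRIMARY KEY)")
        conn.commit()
    tables = list_tables(demo_path)

print(tables)
```

With the real dataset, the same helper would be called on the Path returned by access("mimic4").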
tanat.dataset.access.zenodo module
Zenodo dataset accessor.
- class tanat.dataset.access.zenodo.ZenodoAccessor(record_id: int | str, filename: str, cache_dir: Path | None = None)
Bases: ABC, Registrable
Zenodo dataset accessor.
- __init__(record_id: int | str, filename: str, cache_dir: Path | None = None) -> None
Initialize ZenodoAccessor.
- Parameters:
record_id – Zenodo record ID.
filename – Name of the file to download.
cache_dir – Cache directory. Defaults to the system temp directory.
- download(force: bool = False) -> Path
Download the file from Zenodo if it is not cached or the cached copy is invalid.
- Parameters:
force – Force download even if file exists.
- Returns:
Path to the downloaded file.
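The download-or-reuse behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not tanat's implementation: cached_download and fake_fetch are hypothetical names, the HTTP request is replaced by a local callable, and "invalid" is assumed to mean a missing or empty file.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def cached_download(target: Path, fetch: Callable[[], bytes], force: bool = False) -> Path:
    """Write fetch() to target unless a usable cached copy already exists."""
    # Assumed validity rule: the file exists and is not empty.
    is_valid = target.is_file() and target.stat().st_size > 0
    if force or not is_valid:
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch())
    return target


calls: List[int] = []


def fake_fetch() -> bytes:
    """Hypothetical stand-in for the HTTP request to Zenodo."""
    calls.append(1)
    return b"id,state\n1,school\n"


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "cache" / "data.csv"
    cached_download(target, fake_fetch)              # first call downloads
    cached_download(target, fake_fetch)              # second call reuses the cache
    cached_download(target, fake_fetch, force=True)  # force=True downloads again
    download_count = len(calls)

print(download_count)
```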
- get(force: bool = False) -> Any
Download the data from the Zenodo dataset and return an object for accessing it.
- Parameters:
force – Force download even if file exists.
- Returns:
The object returned by _access_impl(). Its type depends on the subclass (e.g. Path, pandas.DataFrame).
- Return type:
Any
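The relationship between get(), download() and _access_impl() reads like a template method: the base class handles the download, and each subclass decides what object to hand back. A minimal sketch of that pattern follows. All class names are hypothetical, the download is simulated by writing a small local file, and the argument-free signature of _access_impl() is an assumption.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, List


class SketchAccessor(ABC):
    """Hypothetical base class: get() = download() followed by _access_impl()."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def download(self, force: bool = False) -> Path:
        # Stand-in for the real download: write a tiny file if needed.
        if force or not self.path.exists():
            self.path.write_text("a,b\n1,2\n")
        return self.path

    def get(self, force: bool = False) -> Any:
        self.download(force=force)
        return self._access_impl()

    @abstractmethod
    def _access_impl(self) -> Any:
        """Turn the downloaded file into a ready-to-use object."""


class PathAccessor(SketchAccessor):
    """Returns the file location, as a SQL-style accessor might."""

    def _access_impl(self) -> Path:
        return self.path


class RowsAccessor(SketchAccessor):
    """Returns parsed rows, as a CSV-style accessor might."""

    def _access_impl(self) -> List[List[str]]:
        lines = self.path.read_text().splitlines()
        return [line.split(",") for line in lines]


with tempfile.TemporaryDirectory() as tmp_dir:
    as_path = PathAccessor(Path(tmp_dir) / "a.csv").get()
    as_rows = RowsAccessor(Path(tmp_dir) / "b.csv").get()
    path_exists = as_path.exists()
```

Callers use get() the same way for every accessor, which is why its return type is documented as Any.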
- classmethod init(accessor_type: str, cache_dir: Path | None = None) -> ZenodoAccessor
Initialize a Zenodo accessor class dynamically.
- Parameters:
accessor_type – Registered name of the accessor to create.
cache_dir – Cache directory. Defaults to the system temp directory.
- Returns:
Instance of the requested accessor subclass.
- Return type:
ZenodoAccessor
- Raises:
ValueError – If accessor_type is not registered.
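Name-based construction of this kind is commonly built on a class registry. The sketch below shows one way such a lookup could work. It does not reflect how Registrable is actually implemented: RegistryBase, DemoCsvAccessor and the register_name keyword are all hypothetical.

```python
from typing import Dict, Type


class RegistryBase:
    """Hypothetical registry keyed by a name declared on each subclass."""

    _registry: Dict[str, Type["RegistryBase"]] = {}

    def __init_subclass__(cls, register_name: str = "", **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        if register_name:
            RegistryBase._registry[register_name] = cls

    @classmethod
    def init(cls, accessor_type: str) -> "RegistryBase":
        """Instantiate the subclass registered under accessor_type."""
        try:
            subclass = cls._registry[accessor_type]
        except KeyError:
            known = ", ".join(sorted(cls._registry))
            raise ValueError(
                f"Unknown accessor type {accessor_type!r}; registered: {known}"
            ) from None
        return subclass()


class DemoCsvAccessor(RegistryBase, register_name="demo_csv"):
    """Registered under the made-up name "demo_csv"."""


accessor = RegistryBase.init("demo_csv")

try:
    RegistryBase.init("missing")
except ValueError as exc:
    error_message = str(exc)
```

Listing the registered names in the error message makes a misspelled accessor_type easy to diagnose.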
Module contents
Dataset access utilities.
- tanat.dataset.access.access(data_type: str, cache_dir: Path | None = None, force: bool = False) -> Any
Access a dataset from Zenodo and return a ready-to-use object.
Depending on the dataset type, this may return:
- a DataFrame (for file-based datasets: CSV, Parquet, etc.)
- a Path to the database file (for SQL-based datasets)
Data is cached locally after the first download. Pass force=True to download it again.
- Parameters:
data_type (str) – Name of the dataset registered in ZenodoAccessor.
cache_dir (Path, optional) – Directory used for caching. Defaults to the system temp directory.
force (bool) – If True, forces re-download even if data is already cached.
- Returns:
A usable object for interacting with the dataset. The concrete type depends on the accessor implementation (e.g. a Path for SQL-based datasets such as "mimic4", or a pandas.DataFrame for CSV-based ones such as "mvad").
- Return type:
Any
- Raises:
ValueError – If data_type is not registered in the accessor.
Examples
>>> # Access the MVAD CSV dataset as a DataFrame
>>> df = access("mvad")
>>> # Access the mimic4 SQLite database: returns the Path to the .db file
>>> db_path: Path = access("mimic4")
>>> DB = f"sqlite:///{db_path}"  # SQLAlchemy-compatible URL
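Because the return type is Any, calling code that handles several datasets may want to branch on what it received. A small sketch, assuming only the two documented result kinds (a Path for SQL-based datasets, something else otherwise): sqlite_url_or_none is a hypothetical helper, not part of tanat.

```python
from pathlib import Path
from typing import Any, Optional


def sqlite_url_or_none(result: Any) -> Optional[str]:
    """Return a SQLAlchemy-style SQLite URL for Path results, else None."""
    if isinstance(result, Path):
        # resolve() makes the path absolute so the URL does not depend on the cwd.
        return f"sqlite:///{result.resolve().as_posix()}"
    return None


url = sqlite_url_or_none(Path("demo.db"))     # Path result: build a URL
no_url = sqlite_url_or_none([["a", "b"]])     # any other result: nothing to build
```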