tanat.metadata package#
Submodules#
tanat.metadata.feature module#
Feature metadata definitions and helpers.
- class tanat.metadata.feature.ArrayInfo(name: str, dtype: str, dimension: int | None)[source]#
Bases: FeatureInfo
Metadata for array features (fixed-size Array or variable-size List).
- classmethod from_stats(col_name: str, dtype: Any, stats: dict) -> ArrayInfo[source]#
Factory: Builds instance from stats.
- class tanat.metadata.feature.BooleanInfo(name: str, dtype: str, true_count: int | None, false_count: int | None)[source]#
Bases: FeatureInfo
Metadata for boolean features.
- classmethod from_stats(col_name: str, dtype: Any, stats: dict) -> BooleanInfo[source]#
Builds a BooleanInfo from precomputed stats.
- class tanat.metadata.feature.CategoricalInfo(name: str, dtype: str, n_unique: int | None, ordered: bool = False)[source]#
Bases: FeatureInfo
Metadata for categorical/string features.
Note
n_unique is computed from the first 10 000 rows sampled by build_feature_metadata(). On large datasets this may undercount the actual number of distinct categories.
- classmethod from_stats(col_name: str, dtype: Any, stats: dict) -> CategoricalInfo[source]#
Builds a CategoricalInfo from precomputed stats.
Only the Enum dtype is considered ordered (user-defined order); standard Categorical uses lexical ordering.
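The undercount described in the note on n_unique can be illustrated without tanat or Polars. The following is a minimal sketch, assuming the sample is simply the first 10 000 rows of the column; the helper `n_unique_in_head` is hypothetical and not part of the library.

```python
SAMPLE_SIZE = 10_000  # sample size stated in the note above


def n_unique_in_head(values, sample_size=SAMPLE_SIZE):
    """Count distinct values among the first `sample_size` rows only (hypothetical helper)."""
    return len(set(values[:sample_size]))


# 20 000 rows: category "a" fills the sampled head, "b" and "c" only appear later.
column = ["a"] * 10_000 + ["b"] * 5_000 + ["c"] * 5_000

sampled = n_unique_in_head(column)  # distinct values visible in the sample
actual = len(set(column))           # distinct values in the full column

print(sampled, actual)
```

When the data is sorted or grouped by category, as in this sketch, the sampled count can fall well below the true count, so n_unique is best read as a lower bound.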
- class tanat.metadata.feature.FeatureInfo(name: str, dtype: str)[source]#
Bases: ABC
Base class for feature metadata.
- abstractmethod classmethod from_stats(col_name: str, dtype: str | DataType, stats: dict) -> FeatureInfo[source]#
Factory: Builds instance from stats.
- class tanat.metadata.feature.NumericalInfo(name: str, dtype: str, min: float | int | None, max: float | int | None)[source]#
Bases: FeatureInfo
Metadata for numerical features.
- classmethod from_stats(col_name: str, dtype: str | DataType, stats: dict) -> NumericalInfo[source]#
Builds a NumericalInfo from precomputed stats.
- class tanat.metadata.feature.StringInfo(name: str, dtype: str, min_length: int | None, max_length: int | None)[source]#
Bases: FeatureInfo
Metadata for string features.
- classmethod from_stats(col_name: str, dtype: Any, stats: dict) -> StringInfo[source]#
Factory: Builds instance from stats.
- class tanat.metadata.feature.TemporalInfo(name: str, dtype: str, min: Any | None, max: Any | None)[source]#
Bases: FeatureInfo
Metadata for temporal features (Date, Time, Datetime, Duration).
- classmethod from_stats(col_name: str, dtype: Any, stats: dict) -> TemporalInfo[source]#
Factory: Builds instance from stats.
- tanat.metadata.feature.build_feature_metadata(lf: LazyFrame) -> list[FeatureInfo][source]#
Compute semantic metadata for every column in lf.
Statistics are computed on a sample of 10 000 rows, so the call stays fast on large datasets.
- Returns:
List of FeatureInfo instances in alphabetical order.
- tanat.metadata.feature.get_feature_info_class(dtype: DataType) -> type[FeatureInfo][source]#
Returns the appropriate FeatureInfo class for a given Polars DataType.
tanat.metadata.sequence module#
Sequence Metadata definitions.
- class tanat.metadata.sequence.SequenceMetadata(seq_id: DataType, time_index: TimeIndexInfo, entity_features: list[FeatureInfo], static_features: list[FeatureInfo] | None)[source]#
Bases: object
Rich Sequence metadata with semantic profiling.
Feature lists (entity_features, static_features) are guaranteed to be in alphabetical order.
- __init__(seq_id: DataType, time_index: TimeIndexInfo, entity_features: list[FeatureInfo], static_features: list[FeatureInfo] | None) -> None[source]#
- assert_features_compatible_with(other: SequenceMetadata, alias: str, *, context: str = 'Features must be compatible for merging.') -> list[str][source]#
Check that other exposes at least all entity features declared in self, with matching dtypes.
- Parameters:
other – Metadata of the sequence or pool being compared.
alias – Label used in error messages to identify other.
context – Sentence appended to each error message.
- Returns:
List of feature names present in other but absent in self (extras).
- Raises:
ValueError – If other is missing a feature present in self.
TypeError – If a shared feature has an incompatible dtype.
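The contract of assert_features_compatible_with can be sketched in plain Python. This is not the library's implementation: the function `check_features_compatible` is hypothetical, features are reduced to a mapping of name to dtype string, and "matching dtypes" is assumed to mean strict equality.

```python
def check_features_compatible(self_features, other_features, alias,
                              context="Features must be compatible for merging."):
    """Hypothetical sketch: both arguments map feature name -> dtype string."""
    # Every feature declared in self must also exist in other.
    missing = [name for name in self_features if name not in other_features]
    if missing:
        raise ValueError(f"{alias} is missing features {missing}. {context}")

    # Shared features must have the same dtype (assumption: strict equality).
    for name, dtype in self_features.items():
        if other_features[name] != dtype:
            raise TypeError(
                f"{alias}: feature {name!r} has dtype {other_features[name]}, "
                f"expected {dtype}. {context}"
            )

    # Features only present in other are reported as extras, not as errors.
    return [name for name in other_features if name not in self_features]


base = {"status": "Enum", "value": "Float64"}
wider = {"status": "Enum", "value": "Float64", "ward": "String"}

extras = check_features_compatible(base, wider, alias="pool_b")
print(extras)  # ['ward']
```

The check is asymmetric: other may carry additional features, which are returned so the caller can decide what to do with them, while a feature missing from other is an error.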
- assert_id_compatible_with(other: SequenceMetadata, alias: str, *, context: str = 'ID dtypes must match.') -> None[source]#
Raises TypeError if other has a different ID column dtype.
- Parameters:
other – Metadata of the sequence or pool being compared.
alias – Label used in the error message to identify other.
context – Sentence appended to the error message.
- Raises:
TypeError – If the ID dtypes differ.
- assert_time_index_compatible_with(other: SequenceMetadata, alias: str, *, context: str = 'Temporal schemas must match.') -> None[source]#
Raises TypeError if other has an incompatible time index schema.
Two time index schemas are compatible when they share the same Datetime unit and time_zone (or identical numeric dtype for timestep sequences). Range information (min/max) is not checked.
- Parameters:
other – Metadata of the sequence or pool being compared.
alias – Label used in the error message to identify other.
context – Sentence appended to the error message.
- Raises:
TypeError – If the time index schemas are incompatible.
- entity_features: list[FeatureInfo][source]#
- feature_info(name: str, is_static: bool = False) -> FeatureInfo | None[source]#
Return the FeatureInfo for name, or None.
- Parameters:
name – Feature name to look up.
is_static – True for static features, False for entity features.
- Returns:
The matching FeatureInfo instance, or None if not found.
- classmethod infer_entity_features(entity_lf: LazyFrame | None) -> list[FeatureInfo][source]#
Returns the list of FeatureInfo for entity_lf, or [] if entity_lf is None.
- classmethod infer_static_features(static_lf: LazyFrame | None) -> list[FeatureInfo] | None[source]#
Returns the list of FeatureInfo for static_lf, or None if static_lf is None.
- classmethod infer_time_index(time_index: LazyFrame) -> TimeIndexInfo[source]#
Returns a TimeIndexInfo built from the time index LazyFrame.
- is_categorical_feature(name: str, is_static: bool = False) -> bool[source]#
Returns True if name is a Categorical or Enum feature.
- Parameters:
name – Feature name to check.
is_static – True for static features, False for entity features.
- Raises:
KeyError – If the feature name is not found.
Examples:
>>> pool.metadata.is_categorical_feature("status")
True
- is_datetime_feature(name: str, is_static: bool = False) -> bool[source]#
Returns True if name is a pl.Datetime or pl.Date feature (not a Duration).
- Parameters:
name – Feature name to check.
is_static – True for static features, False for entity features.
- Raises:
KeyError – If the feature name is not found.
Examples:
>>> pool.metadata.is_datetime_feature("discharge_time")
True
- is_duration_feature(name: str, is_static: bool = False) -> bool[source]#
Returns True if name is a pl.Duration feature.
- Parameters:
name – Feature name to check.
is_static – True for static features, False for entity features.
- Raises:
KeyError – If the feature name is not found.
Examples:
>>> pool.metadata.is_duration_feature("los")
True
- is_numeric_feature(name: str, is_static: bool = False) -> bool[source]#
Returns True if name is a numeric (integer or float) feature.
- Parameters:
name – Feature name to check.
is_static – True for static features, False for entity features.
- Raises:
KeyError – If the feature name is not found.
Examples:
>>> pool.metadata.is_numeric_feature("duration_hrs")
True
- scope(entity_features: list[str] | None = None, static_features: list[str] | None = None) -> SequenceMetadata[source]#
Return a new metadata restricted to the given feature subsets.
- Parameters:
entity_features – Feature names to keep. None keeps all.
static_features – Feature names to keep. None keeps all. An empty list produces static_features=None.
- Returns:
A filtered copy, or self when nothing was actually removed. Original feature order is preserved.
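The rules stated for scope (None keeps all, an empty static list yields None, self is returned when nothing changes, order is preserved) can be sketched as follows. `MetaSketch` is a hypothetical stand-in that stores plain feature names instead of FeatureInfo objects; silently ignoring unknown names is an assumption of this sketch, not documented behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MetaSketch:
    """Hypothetical stand-in for SequenceMetadata; features are plain names."""

    entity_features: list[str]
    static_features: list[str] | None

    def scope(self, entity_features=None, static_features=None) -> MetaSketch:
        # None keeps everything; otherwise keep requested names in original order.
        kept_entity = self.entity_features
        if entity_features is not None:
            kept_entity = [f for f in self.entity_features if f in entity_features]

        kept_static = self.static_features
        if static_features is not None and self.static_features is not None:
            selected = [f for f in self.static_features if f in static_features]
            kept_static = selected or None  # an empty selection becomes None

        # Nothing removed: hand back the same object rather than a copy.
        if kept_entity == self.entity_features and kept_static == self.static_features:
            return self
        return MetaSketch(kept_entity, kept_static)


meta = MetaSketch(["age", "dose", "status"], ["group", "site"])
narrowed = meta.scope(entity_features=["status", "age"], static_features=[])

print(narrowed.entity_features)  # ['age', 'status']
print(narrowed.static_features)  # None
print(meta.scope() is meta)      # True
```

Note that the requested order (["status", "age"]) does not affect the result: the kept features follow the original order of the metadata.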
- static_features: list[FeatureInfo] | None[source]#
- time_index: TimeIndexInfo[source]#
- class tanat.metadata.sequence.TimeIndexInfo(dtype: str, is_datetime: bool, min: Any | None, max: Any | None, unit: str | None = None, time_zone: str | None = None)[source]#
Bases: object
Metadata for the sequence time index (time columns).
- __init__(dtype: str, is_datetime: bool, min: Any | None, max: Any | None, unit: str | None = None, time_zone: str | None = None) -> None[source]#
- classmethod from_lazyframe(lf: LazyFrame) -> TimeIndexInfo[source]#
Factory: builds a TimeIndexInfo by inspecting all columns of the time index LazyFrame.
Validates that:
- Every column is a supported type (pl.Datetime, pl.Date, or a numeric type, integer or float, for discrete timesteps).
- All columns share the same base type (no mix of a Datetime start with a Date end, for example).
Also computes the global min/max across all time index columns.
- Raises:
TypeError – If a column has an unsupported type or if column types are inconsistent.
- is_schema_compatible(other: TimeIndexInfo) -> bool[source]#
Returns True if other has the same schema as this instance.
Only the structural fields are compared (dtype, is_datetime, unit, time_zone). Range fields (min, max) are intentionally ignored: different stores can cover different time periods and still be joinable.
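The comparison described for is_schema_compatible can be sketched with a small dataclass. `TimeIndexSketch` is a hypothetical stand-in that mirrors the documented TimeIndexInfo fields; the dtype strings are placeholders and the body is an assumption based on the description, not the library's code.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass
class TimeIndexSketch:
    """Hypothetical stand-in mirroring the TimeIndexInfo fields listed above."""

    dtype: str
    is_datetime: bool
    min: Any | None
    max: Any | None
    unit: str | None = None
    time_zone: str | None = None

    def is_schema_compatible(self, other: TimeIndexSketch) -> bool:
        # Compare structural fields only; min/max are deliberately left out.
        mine = (self.dtype, self.is_datetime, self.unit, self.time_zone)
        theirs = (other.dtype, other.is_datetime, other.unit, other.time_zone)
        return mine == theirs


a = TimeIndexSketch("Datetime", True, datetime(2020, 1, 1), datetime(2020, 12, 31), "us", "UTC")
b = TimeIndexSketch("Datetime", True, datetime(2023, 1, 1), datetime(2023, 6, 30), "us", "UTC")
c = TimeIndexSketch("Datetime", True, datetime(2020, 1, 1), datetime(2020, 12, 31), "ms", "UTC")

same_schema = a.is_schema_compatible(b)     # different periods, same schema
different_unit = a.is_schema_compatible(c)  # same period, different unit
print(same_schema, different_unit)
```

Here a and b cover disjoint periods yet compare as compatible, while a and c cover the same period but differ in unit and therefore do not.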
tanat.metadata.trajectory module#
Trajectory Metadata definitions.
- class tanat.metadata.trajectory.TrajectoryMetadata(traj_id: DataType, time_index: TimeIndexInfo, static_features: list[FeatureInfo] | None)[source]#
Bases: object
Rich Trajectory metadata with semantic profiling.
- traj_id[source]#
Polars DataType of the trajectory ID column.
- Type:
polars.datatypes.classes.DataType
- time_index[source]#
Aggregated time index info across all linked stores. A TrajectoryStore always has at least one linked store, so this field is never None.
- static_features[source]#
List of FeatureInfo for each static feature (alphabetical order), or None if none exist.
- Type:
list[tanat.metadata.feature.FeatureInfo] | None
- __init__(traj_id: DataType, time_index: TimeIndexInfo, static_features: list[FeatureInfo] | None) -> None[source]#
- classmethod infer_static(lf: LazyFrame | None) -> list[FeatureInfo] | None[source]#
Returns the list of FeatureInfo for lf, or None if lf is None.
- classmethod infer_time_index(seq_stores: dict[str, SequenceStore]) -> TimeIndexInfo[source]#
Aggregates the time index range across all linked stores by taking the global min/max.
- Raises:
ValueError – If seq_stores is empty (should never happen for a valid TrajectoryStore).
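The aggregation performed by infer_time_index amounts to taking the smallest minimum and the largest maximum over the linked stores. A minimal sketch, assuming each store is reduced to a (min, max) pair with no missing bounds; `aggregate_time_range` and the store names are hypothetical.

```python
from datetime import date


def aggregate_time_range(store_ranges):
    """Return the global (min, max) over a mapping of store name -> (min, max)."""
    if not store_ranges:
        # Mirrors the documented ValueError for an empty seq_stores mapping.
        raise ValueError("seq_stores is empty: at least one linked store is required")
    lows = [low for low, _ in store_ranges.values()]
    highs = [high for _, high in store_ranges.values()]
    return min(lows), max(highs)


ranges = {
    "visits": (date(2021, 3, 1), date(2022, 1, 15)),
    "labs": (date(2020, 11, 20), date(2021, 9, 30)),
}

overall = aggregate_time_range(ranges)
print(overall)
```

The resulting range starts with the earliest store ("labs") and ends with the latest one ("visits"), so it can be wider than the range of any single store.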
- is_categorical_feature(name: str) -> bool[source]#
Returns True if name is a Categorical or Enum static feature.
- Parameters:
name – Feature name to check.
- Raises:
KeyError – If the feature name is not found.
Examples:
>>> traj.metadata.is_categorical_feature("group")
True
- scope(static_features: list[str] | None = None) -> TrajectoryMetadata[source]#
Return a new metadata restricted to the given static feature subset.
- Parameters:
static_features – Feature names to keep. None keeps all. An empty list produces static_features=None.
- Returns:
A filtered copy, or self when nothing was actually removed. Original feature order is preserved.
- static_features: list[FeatureInfo] | None[source]#
- time_index: TimeIndexInfo[source]#
Module contents#
Package stub.