Metadata
TanaT automatically infers rich metadata from your data at build time.
Metadata is attached to every pool via pool.metadata and describes
the time index, entity features, and static features.
Metadata Objects
On a SequencePool, pool.metadata returns a SequenceMetadata instance;
on a TrajectoryPool, it returns a TrajectoryMetadata instance.
print(pool.metadata) # human-readable summary
pool.metadata.time_index # TimeIndexInfo (dtype, range, tz…)
pool.metadata.entity_features # list[FeatureInfo], alphabetical
pool.metadata.static_features # list[FeatureInfo] | None
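The time_index entry bundles a dtype, a range, and time-zone information. As a rough, stdlib-only illustration of that kind of summary, the sketch below derives those three pieces from a list of values. ToyTimeIndexInfo and describe_time_index are hypothetical names; the real TimeIndexInfo may carry more fields and compute them differently.

```python
from dataclasses import dataclass
from datetime import datetime, timezone, tzinfo
from typing import Any, Optional, Sequence


@dataclass(frozen=True)
class ToyTimeIndexInfo:
    """Hypothetical stand-in for TimeIndexInfo: dtype, range and time zone."""
    dtype: str
    start: Any
    end: Any
    tz: Optional[tzinfo]


def describe_time_index(values: Sequence[Any]) -> ToyTimeIndexInfo:
    """Summarise a non-empty, homogeneous time index (sketch only)."""
    if isinstance(values[0], datetime):
        # Datetime index: the time zone is read from the values themselves.
        dtype, tz = "Datetime", values[0].tzinfo
    else:
        # Integer timesteps carry no time zone.
        dtype, tz = "Timestep", None
    return ToyTimeIndexInfo(dtype, min(values), max(values), tz)


stamps = [
    datetime(2024, 3, 5, tzinfo=timezone.utc),
    datetime(2024, 1, 1, tzinfo=timezone.utc),
]
info = describe_time_index(stamps)
print(info.dtype, info.start.date(), info.end.date(), info.tz)
```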
Both objects expose is_categorical_feature(name); SequenceMetadata
additionally provides is_numeric_feature, is_datetime_feature, and
is_duration_feature. All of these methods raise KeyError when given an
unknown feature name.
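The predicate contract above can be modelled with a small stand-in class. This is a sketch under stated assumptions, not TanaT's implementation: ToyMetadata and its kinds dictionary are hypothetical, and only the method names and the KeyError behaviour come from this page.

```python
class ToyMetadata:
    """Hypothetical stand-in mimicking the is_*_feature contract."""

    def __init__(self, kinds):
        # Feature name -> kind ("categorical", "numeric", "datetime", "duration").
        self._kinds = dict(kinds)

    def _kind(self, name):
        # Unknown names raise KeyError, as the documented predicates do.
        if name not in self._kinds:
            raise KeyError(f"unknown feature: {name!r}")
        return self._kinds[name]

    def is_categorical_feature(self, name):
        return self._kind(name) == "categorical"

    def is_numeric_feature(self, name):
        return self._kind(name) == "numeric"

    def is_datetime_feature(self, name):
        return self._kind(name) == "datetime"

    def is_duration_feature(self, name):
        return self._kind(name) == "duration"


meta = ToyMetadata({"status": "categorical", "response_time": "duration"})
print(meta.is_categorical_feature("status"))     # True
print(meta.is_numeric_feature("response_time"))  # False
```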
Tip
Printing any TanaT object renders a metadata summary: print(pool),
print(sequence), print(trajectory), and even print(entity) all
display the object's type, features, and temporal range in human-readable form.
Feature Types
TanaT maps each Polars dtype to a FeatureInfo
subclass with type-specific extra attributes:
| Class | Polars dtypes | Extra attributes |
|---|---|---|
| | integers, floats | |
info = pool.metadata.feature_info("status")
print(info.summary) # e.g. "Categorical (5 categories)"
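As an illustration of dtype-driven inference, the sketch below maps dtype names to coarse feature kinds and builds a summary string in the style of the info.summary example. DTYPE_KINDS, infer_kind, and summarize are hypothetical; the mapping does not reproduce TanaT's FeatureInfo subclasses, treating Enum as categorical is an assumption, and dtypes are plain strings so the code runs without Polars.

```python
# Hypothetical mapping from Polars dtype names to coarse feature kinds.
NUMERIC = ["Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32",
           "UInt64", "Float32", "Float64"]
DTYPE_KINDS = {name: "Numeric" for name in NUMERIC}
DTYPE_KINDS.update({
    "Categorical": "Categorical",
    "Enum": "Categorical",  # assumption made for this sketch
    "Datetime": "Datetime",
    "Duration": "Duration",
})


def infer_kind(dtype_name):
    """Return the feature kind for a dtype name; unsupported dtypes raise."""
    try:
        return DTYPE_KINDS[dtype_name]
    except KeyError:
        raise TypeError(f"no feature kind for dtype {dtype_name!r}") from None


def summarize(dtype_name, values):
    """One-line summary; categorical columns also report distinct categories."""
    kind = infer_kind(dtype_name)
    if kind == "Categorical":
        n_categories = len({v for v in values if v is not None})
        return f"{kind} ({n_categories} categories)"
    return kind


statuses = ["open", "closed", "open", "pending", "escalated", "resolved", None]
print(summarize("Categorical", statuses))  # Categorical (5 categories)
print(summarize("Float64", [1.0, 2.5]))    # Numeric
```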
Cast Methods
Casts are lazy and view-local: they are applied on the fly when data
is materialised and do not modify the store files. Call pool.save()
to persist them.
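The lazy, view-local behaviour can be pictured with a toy pool that records casts and applies them only when data is read. ToyPool, materialise, and the dict-based store are hypothetical, and Python callables stand in for Polars dtypes; the point is only the ordering: casting changes nothing in the store until save() is called.

```python
class ToyPool:
    """Hypothetical pool over a dict store mapping column -> raw values."""

    def __init__(self, store):
        self._store = store   # stands in for the on-disk store files
        self._pending = {}    # column -> converter, recorded but not applied

    def cast_features(self, casts):
        # Lazy: only remember the requested conversions.
        self._pending.update(casts)

    def materialise(self):
        # Apply pending casts on the fly; the store itself is not modified.
        out = {}
        for column, values in self._store.items():
            convert = self._pending.get(column)
            out[column] = [convert(v) for v in values] if convert else list(values)
        return out

    def save(self):
        # Persist: write the cast values back to the store, then drop the casts.
        self._store.update(self.materialise())
        self._pending.clear()


store = {"age": ["34", "51"], "group": ["a", "b"]}
pool = ToyPool(store)
pool.cast_features({"age": int})
print(pool.materialise()["age"])  # [34, 51]
print(store["age"])               # ['34', '51'] (store untouched)
pool.save()
print(store["age"])               # [34, 51]
```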
Note
Cast methods are only available at Pool level (SequencePool,
TrajectoryPool). Casting directly on a Sequence, Trajectory,
or Entity is intentionally not supported: those objects are views
derived from a pool, and mutating their types independently would
desynchronise them from their siblings in the pool.
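A toy model shows why the pool is the single place to cast: the sequence objects below are views that keep a reference to their pool rather than a copy of the data, so one pool-level cast is seen by every sibling at once. ToyPool, ToySequence, and cast_values are hypothetical names, not TanaT classes.

```python
class ToyPool:
    """Hypothetical pool owning the data and a single pool-level cast."""

    def __init__(self, rows):
        self._rows = list(rows)        # (sequence_id, raw_value) pairs
        self._convert = lambda v: v    # identity until a cast is recorded

    def cast_values(self, convert):
        # The only place a cast can be recorded.
        self._convert = convert

    def values_for(self, seq_id):
        return [self._convert(v) for sid, v in self._rows if sid == seq_id]

    def sequence(self, seq_id):
        return ToySequence(self, seq_id)


class ToySequence:
    """Hypothetical read-only view: it stores a pool reference, not data."""

    def __init__(self, pool, seq_id):
        self._pool = pool
        self._seq_id = seq_id

    def values(self):
        # Always read through the pool, so pool-level casts are picked up.
        return self._pool.values_for(self._seq_id)


pool = ToyPool([("a", "1"), ("a", "2"), ("b", "3")])
seq_a, seq_b = pool.sequence("a"), pool.sequence("b")
pool.cast_values(int)                  # one cast at pool level...
print(seq_a.values(), seq_b.values())  # ...seen by both views: [1, 2] [3]
```

ToySequence deliberately has no cast method, mirroring the restriction described in the note.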
SequencePool
cast_features()
Cast one or more entity or static feature columns.
import polars as pl
# Entity features (default)
pool.cast_features({"status": pl.Categorical})
pool.cast_features({"response_time": pl.Duration("ms")})
pool.cast_features({"severity": pl.Enum(["low", "medium", "high"])})
# Static features
pool.cast_features({"age": pl.UInt8, "group": pl.Categorical}, is_static=True)
cast_id()
Cast the sequence ID column.
pool.cast_id(pl.Categorical)
cast_to_datetime() / cast_to_timestep()
Change the type of the time index.
pool.cast_to_datetime()  # defaults: unit="us" (microseconds), no time zone
pool.cast_to_datetime(unit="ms", time_zone="UTC")
pool.cast_to_timestep(pl.UInt32)
Note
cast_to_timestep() raises TypeError if the time index is already
a Datetime type.
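The ordering rule in the note can be expressed as a small state check. ToyTimeIndex and its string dtypes are hypothetical; the defaults of cast_to_datetime() are taken from the comment in the example above, and the TypeError mirrors the documented behaviour of cast_to_timestep().

```python
class ToyTimeIndex:
    """Hypothetical model of a time-index dtype and the two documented casts."""

    def __init__(self, dtype):
        self.dtype = dtype  # e.g. "Int64" for integer timesteps

    def cast_to_datetime(self, unit="us", time_zone=None):
        # Defaults follow the page: microsecond unit, no time zone.
        self.dtype = f"Datetime({unit!r}, {time_zone!r})"

    def cast_to_timestep(self, dtype):
        # Documented rule: a Datetime index cannot be cast to timesteps.
        if self.dtype.startswith("Datetime"):
            raise TypeError(f"time index is already {self.dtype}")
        self.dtype = dtype


idx = ToyTimeIndex("Int64")
idx.cast_to_timestep("UInt32")   # allowed: still a timestep index
idx.cast_to_datetime(unit="ms", time_zone="UTC")
print(idx.dtype)                 # Datetime('ms', 'UTC')
```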
TrajectoryPool
Entity features live inside each linked sequence store; cast them directly
on tpool.sequence_pools["<alias>"].
cast_static_features()
Cast trajectory-level static features.
tpool.cast_static_features({"group": pl.Categorical})
# For entity features, go through the sequence pool:
tpool.sequence_pools["pharmacy"].cast_features({"medication": pl.Categorical})
cast_id() / cast_to_datetime() / cast_to_timestep()
Same signatures as on SequencePool,
but the cast is automatically propagated to all linked sequence pools.
tpool.cast_id(pl.Categorical)                       # propagates to all sub-pools
tpool.cast_to_datetime(unit="ms", time_zone="UTC")  # also propagated
tpool.cast_to_timestep(pl.UInt32)                   # also propagated
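Propagation amounts to a fan-out: the trajectory pool applies the cast to itself, then forwards the same call to every linked sequence pool. ToySequencePool, ToyTrajectoryPool, the starting dtypes, and the "hospital" alias are hypothetical; only the method names and the sequence_pools mapping come from this page.

```python
class ToySequencePool:
    """Hypothetical sequence pool tracking only its ID and time-index dtypes."""

    def __init__(self, alias):
        self.alias = alias
        self.id_dtype = "String"   # assumed starting dtypes
        self.time_dtype = "Int64"

    def cast_id(self, dtype):
        self.id_dtype = dtype

    def cast_to_datetime(self, unit="us", time_zone=None):
        self.time_dtype = f"Datetime({unit!r}, {time_zone!r})"


class ToyTrajectoryPool:
    """Hypothetical trajectory pool that fans shared casts out to sub-pools."""

    def __init__(self, sequence_pools):
        self.sequence_pools = {p.alias: p for p in sequence_pools}
        self.id_dtype = "String"
        self.time_dtype = "Int64"

    def cast_id(self, dtype):
        self.id_dtype = dtype
        # Forward the same cast to every linked sequence pool.
        for sub_pool in self.sequence_pools.values():
            sub_pool.cast_id(dtype)

    def cast_to_datetime(self, unit="us", time_zone=None):
        self.time_dtype = f"Datetime({unit!r}, {time_zone!r})"
        for sub_pool in self.sequence_pools.values():
            sub_pool.cast_to_datetime(unit=unit, time_zone=time_zone)


tpool = ToyTrajectoryPool([ToySequencePool("pharmacy"), ToySequencePool("hospital")])
tpool.cast_id("Categorical")
tpool.cast_to_datetime(unit="ms", time_zone="UTC")
for alias, sub_pool in tpool.sequence_pools.items():
    print(alias, sub_pool.id_dtype, sub_pool.time_dtype)
```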
Compatibility Matrix
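| Method | SequencePool | TrajectoryPool |
|---|---|---|
| cast_features() | ✓ | ✗ |
| cast_static_features() | ✗ | ✓ |
| cast_id() | ✓ | ✓ |
| cast_to_datetime() | ✓ | ✓ |
| cast_to_timestep() | ✓ | ✓ |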
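The availability rules from the SequencePool and TrajectoryPool sections can also be encoded as a lookup table, for example to check up front which pool type accepts a given cast. CAST_SUPPORT and supports are hypothetical helpers written by hand from this page, not part of TanaT.

```python
# Hand-written from the SequencePool and TrajectoryPool sections of this page.
CAST_SUPPORT = {
    "cast_features": {"SequencePool"},
    "cast_static_features": {"TrajectoryPool"},
    "cast_id": {"SequencePool", "TrajectoryPool"},
    "cast_to_datetime": {"SequencePool", "TrajectoryPool"},
    "cast_to_timestep": {"SequencePool", "TrajectoryPool"},
}


def supports(pool_kind, method):
    """Return True if `method` is available on objects of kind `pool_kind`."""
    return pool_kind in CAST_SUPPORT.get(method, set())


for method in CAST_SUPPORT:
    print(method, supports("SequencePool", method), supports("TrajectoryPool", method))
```

Any other kind, such as "Sequence" or "Entity", returns False for every method, matching the note that casts exist only at pool level.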
See Also
tanat.metadata: full API for all metadata classes