Metadata

TanaT automatically infers rich metadata from your data at build time. Metadata is attached to every pool via pool.metadata and describes the time index, entity features, and static features.


Metadata Objects

pool.metadata on a SequencePool returns a SequenceMetadata instance; on a TrajectoryPool it returns a TrajectoryMetadata.

print(pool.metadata)              # human-readable summary
pool.metadata.time_index          # TimeIndexInfo (dtype, range, tz…)
pool.metadata.entity_features     # list[FeatureInfo], alphabetical
pool.metadata.static_features     # list[FeatureInfo] | None

Both objects expose is_categorical_feature(name); SequenceMetadata also has is_numeric_feature, is_datetime_feature, and is_duration_feature. All raise KeyError for unknown feature names.
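Because these predicates raise KeyError for unknown names, code that probes optional features may want to catch that case explicitly. The following is a minimal sketch: `safe_is_categorical` is a hypothetical helper and `_StubMetadata` is a stand-in for `pool.metadata` that exists only to make the example runnable. Only `is_categorical_feature(name)` and its KeyError behaviour come from the description above.

```python
from typing import Optional


def safe_is_categorical(metadata, name: str) -> Optional[bool]:
    """Hypothetical helper: True/False for a known feature, None if the name is unknown."""
    try:
        return metadata.is_categorical_feature(name)
    except KeyError:
        # The metadata object does not know this feature name.
        return None


class _StubMetadata:
    """Stand-in for pool.metadata (an assumption, not the real TanaT class)."""

    def __init__(self, categorical_by_name):
        self._categorical_by_name = categorical_by_name

    def is_categorical_feature(self, name):
        # Dict lookup raises KeyError for unknown names, mirroring the documented behaviour.
        return self._categorical_by_name[name]


stub = _StubMetadata({"status": True, "response_time": False})
known = safe_is_categorical(stub, "status")
unknown = safe_is_categorical(stub, "no_such_feature")
```

With a real pool, `stub` would be replaced by `pool.metadata`.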

Tip

Printing any TanaT object renders a metadata summary: print(pool), print(sequence), print(trajectory) and even print(entity) all display type, features and temporal range in a human-readable form.


Feature Types

TanaT maps each Polars dtype to a FeatureInfo subclass with type-specific extra attributes:

| Class | Polars dtypes | Extra attributes |
| --- | --- | --- |
| NumericalInfo | integers, floats | min, max |
| CategoricalInfo | Categorical, Enum | n_unique, ordered |
| BooleanInfo | Boolean | true_count, false_count |
| StringInfo | String | min_length, max_length |
| TemporalInfo | Date, Datetime, Duration | min, max, is_duration |
| ArrayInfo | Array, List | dimension |

info = pool.metadata.feature_info("status")
print(info.summary)   # e.g. "Categorical (5 categories)"

Cast Methods

Casts are lazy and view-local: they are applied on the fly when data is materialised, and do not touch the store files. Call pool.save() to persist them.
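The idea of a lazy, view-local cast can be pictured with a small stand-alone sketch. `CastView` below is a hypothetical stand-in written for this illustration, not TanaT's implementation: it records the requested conversions and applies them only when rows are materialised, leaving the stored rows (which play the role of the store files) untouched.

```python
class CastView:
    """Hypothetical stand-in for a pool view with lazily applied casts."""

    def __init__(self, stored_rows):
        self._stored_rows = stored_rows  # plays the role of the store files
        self._pending_casts = {}         # column name -> converter callable

    def cast_features(self, schema):
        # Only record the request; nothing is converted at this point.
        self._pending_casts.update(schema)
        return self

    def materialise(self):
        # Apply the recorded converters on the fly, building new row dicts.
        return [
            {
                column: self._pending_casts.get(column, lambda value: value)(value)
                for column, value in row.items()
            }
            for row in self._stored_rows
        ]


stored = [{"age": "42", "group": "a"}]
view = CastView(stored).cast_features({"age": int})
rows = view.materialise()
```

Here `rows` holds the converted values while `stored` still holds the original strings, which mirrors why an explicit save step is needed to persist a cast.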

Note

Cast methods are only available at Pool level (SequencePool, TrajectoryPool). Casting directly on a Sequence, Trajectory, or Entity is intentionally not supported: those objects are views derived from a pool, and mutating their types independently would desynchronise them from their siblings in the pool.

SequencePool

cast_features()

Cast one or more entity or static feature columns.

import polars as pl

# Entity features (default)
pool.cast_features({"status": pl.Categorical})
pool.cast_features({"response_time": pl.Duration("ms")})
pool.cast_features({"severity": pl.Enum(["low", "medium", "high"])})

# Static features
pool.cast_features({"age": pl.UInt8, "group": pl.Categorical}, is_static=True)

cast_id()

Cast the sequence ID column.

pool.cast_id(pl.Categorical)

cast_to_datetime() / cast_to_timestep()

Change the type of the time index.

pool.cast_to_datetime()                            # us, no timezone
pool.cast_to_datetime(unit="ms", time_zone="UTC")

pool.cast_to_timestep(pl.UInt32)

Note

cast_to_timestep() raises TypeError if the time index is already a Datetime type.


TrajectoryPool

Entity features live inside each linked sequence store; cast them directly on tpool.sequence_pools["<alias>"].

cast_static_features()

Cast trajectory-level static features.

tpool.cast_static_features({"group": pl.Categorical})

# For entity features, go through the sequence pool:
tpool.sequence_pools["pharmacy"].cast_features({"medication": pl.Categorical})

cast_id() / cast_to_datetime() / cast_to_timestep()

Same signatures as on SequencePool, but the cast is automatically propagated to all linked sequence pools.

tpool.cast_id(pl.Categorical)                      # propagates to all sub-pools
tpool.cast_to_datetime(unit="ms", time_zone="UTC") # also propagates to all sub-pools
tpool.cast_to_timestep(pl.UInt32)                  # also propagates to all sub-pools
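The propagation behaviour can be sketched with two stand-in classes. `StubSequencePool` and `StubTrajectoryPool` are hypothetical and only model the forwarding of a cast from a parent to its linked pools; they are not TanaT's classes, the `"visits"` alias is invented, and dtypes are represented as plain strings to keep the example dependency-free.

```python
class StubSequencePool:
    """Stand-in for a sequence pool that only tracks its ID dtype."""

    def __init__(self):
        self.id_dtype = "String"

    def cast_id(self, dtype):
        self.id_dtype = dtype


class StubTrajectoryPool:
    """Stand-in for a trajectory pool that forwards ID casts to its sub-pools."""

    def __init__(self, sequence_pools):
        self.sequence_pools = sequence_pools  # alias -> StubSequencePool
        self.id_dtype = "String"

    def cast_id(self, dtype):
        self.id_dtype = dtype
        # Forward the same cast so every linked pool keeps a matching ID dtype.
        for sub_pool in self.sequence_pools.values():
            sub_pool.cast_id(dtype)


tpool = StubTrajectoryPool(
    {"pharmacy": StubSequencePool(), "visits": StubSequencePool()}
)
tpool.cast_id("Categorical")
sub_pool_dtypes = {alias: sp.id_dtype for alias, sp in tpool.sequence_pools.items()}
```

Keeping the ID and time index types identical across linked pools is what lets sequences from different pools be aligned under one trajectory.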

Compatibility Matrix

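| Method | SequencePool | TrajectoryPool |
| --- | --- | --- |
| cast_features(schema, is_static=False) | yes | via tpool.sequence_pools["<alias>"] |
| cast_static_features(schema) | via cast_features(schema, is_static=True) | yes |
| cast_id(dtype) | yes | yes, propagated to all linked sequence pools |
| cast_to_datetime(unit, time_zone) | yes | yes, propagated to all linked sequence pools |
| cast_to_timestep(dtype) | yes | yes, propagated to all linked sequence pools |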


See Also