tanat.metric package
Subpackages
- tanat.metric.entity package
- tanat.metric.sequence package
- Subpackages
- Submodules
- tanat.metric.sequence.base module
- Module contents
- tanat.metric.trajectory package
Submodules
tanat.metric.matrix module
DistanceMatrix: thin numpy wrapper with associated IDs.
- class tanat.metric.matrix.DistanceMatrix(data: ndarray, ids: list)[source]#
Bases: object
Pairwise distance matrix with associated IDs.
Wrapper around a square numpy.ndarray, associating each row/column with a sequence (or trajectory) identifier.
Example:
    dm = DistanceMatrix(np.zeros((3, 3), dtype="float32"), ids=[1, 2, 3])
    dm.to_frame()          # pandas DataFrame (default)
    dm.to_frame("polars")  # polars DataFrame
    dm.to_numpy()          # raw array
- __init__(data: ndarray, ids: list) None[source]#
Create a DistanceMatrix.
- Parameters:
data – Square numpy array of shape (n, n).
ids – List of n identifiers matching the matrix rows/columns. Stored as-is (order preserved from pool.unique_ids).
- classmethod empty(ids: list, dtype: str = 'float32') DistanceMatrix[source]#
Create a zero-initialised square matrix.
- Parameters:
ids – List of identifiers.
dtype – Numpy dtype string (default "float32").
- Returns:
A DistanceMatrix of shape (n, n) filled with zeros.
- classmethod from_path(path: str | Path) DistanceMatrix[source]#
Load a previously computed distance matrix from disk.
Uses resolve_path to resolve the storage directory (workspace name or filesystem path), then opens the memmap in read-only mode.
- Parameters:
path – Storage directory. Same formats as StorageOptions.store_path: plain name ("distances"), relative path ("./distances"), or absolute path.
- Returns:
A DistanceMatrix backed by a read-only memmap.
- Raises:
FileNotFoundError – If the directory or required files don't exist.
ValueError – If progress.json status is not "complete" (incomplete computation).
- to_frame(fmt: Literal['pandas', 'polars'] = 'pandas') pandas.DataFrame | polars.DataFrame[source]#
Return a labelled dataframe with IDs as index/columns.
- Parameters:
fmt – "pandas" (default) returns a pandas.DataFrame; "polars" returns a polars.DataFrame with an extra "id" column (Polars has no row index).
- Returns:
Square dataframe of shape (n, n); the Polars variant carries the additional "id" column.
- Raises:
ValueError – If fmt is not "pandas" or "polars".
Module contents
Metric Module.
- class tanat.metric.AggregationSettings(*, default_metric: SequenceMetric = 'linearpairwise', sequence_metrics: dict[str, SequenceMetric] | None = None, agg_fun: str = 'mean', weights: dict[str, float] | None = None)[source]#
Bases: object
Settings for AggregationTrajectoryMetric.
agg_fun and weights are orthogonal: "mean" + weights computes a weighted mean, "sum" + weights a weighted sum. Aliases absent from weights default to 1.0.
- default_metric: SequenceMetric = 'linearpairwise'[source]#
- model_dump(*, mode='python', **dump_kwargs)[source]#
Dump settings to a dict via Pydantic serialization.
- sequence_metrics: dict[str, SequenceMetric] | None = None[source]#
- class tanat.metric.AggregationTrajectoryMetric(default_metric: SequenceMetric | str = 'linearpairwise', sequence_metrics: dict[str, SequenceMetric | str] | None = None, agg_fun: str = 'mean', weights: dict[str, float] | None = None, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32')[source]#
Bases: TrajectoryMetric
Trajectory distance by per-alias sequence distances, then weighted aggregation.
For each store alias visible on both trajectories, computes the sequence-level distance using the configured SequenceMetric. The resulting per-alias distances are aggregated (weighted mean/sum) into a scalar trajectory distance.
Example:
    hamming = HammingEntityMetric(entity_feature="status")
    lp = LinearPairwiseSequenceMetric(entity_metric=hamming)
    agg = AggregationTrajectoryMetric(
        default_metric=lp,
        agg_fun="mean",
        weights={"events": 1.0, "states": 0.5},
    )
    dist = agg(traj_a, traj_b)
    dm = agg.compute_matrix(traj_pool)
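The aggregation step described above can be sketched in plain Python. This is an illustration, not tanat's implementation: the helper name aggregate_distances is hypothetical, and dividing the weighted mean by the total weight is an assumption, since the page only says that "mean" plus weights yields a weighted mean.

```python
def aggregate_distances(per_alias, weights=None, agg_fun="mean"):
    """Combine per-alias sequence distances into one scalar (illustrative only)."""
    weights = weights or {}
    # Aliases missing from `weights` fall back to 1.0, as documented.
    w = {alias: weights.get(alias, 1.0) for alias in per_alias}
    weighted_sum = sum(w[alias] * d for alias, d in per_alias.items())
    if agg_fun == "sum":
        return weighted_sum
    if agg_fun == "mean":
        # Assumption: the weighted mean normalises by the total weight.
        return weighted_sum / sum(w.values())
    raise ValueError(f"unsupported agg_fun: {agg_fun!r}")


# Hypothetical per-alias distances; "labs" has no explicit weight.
per_alias = {"events": 1.0, "states": 3.0, "labs": 2.0}
weights = {"events": 1.0, "states": 0.5}
weighted_total = aggregate_distances(per_alias, weights, "sum")
weighted_mean = aggregate_distances(per_alias, weights, "mean")
```

Under these assumptions the same weights drive both aggregation modes, which is what "orthogonal" means in AggregationSettings.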
- MEMMAP_SUPPORT: bool = True[source]#
Set to True in subclasses that implement disk-backed (memmap) computation.
- SETTINGS_CLASS[source]#
alias of AggregationSettings
- __init__(default_metric: SequenceMetric | str = 'linearpairwise', sequence_metrics: dict[str, SequenceMetric | str] | None = None, agg_fun: str = 'mean', weights: dict[str, float] | None = None, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32') None[source]#
- class tanat.metric.Chi2SequenceMetric(entity_feature: str | None = None, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32')[source]#
Bases: SequenceMetric
Chi-squared distance between the state-time distributions of two sequences.
Rather than comparing sequences element-by-element, Chi² compares the proportion of time (or event count) spent in each categorical state.
Event sequences: each event contributes a weight of 1.
Interval / State sequences: each entity contributes end − start as its weight.
The per-category proportions are computed independently for each sequence, then compared via the Chi-squared distance formula:
\[d(a, b) = \sqrt{\sum_j \frac{(p_{aj} - p_{bj})^2}{p_{aj} + p_{bj}}}\]
Note
Chi² does not use an entity metric; entity_metric is absent from its settings. The validate_composition method checks only the requested feature (present and categorical).
Empty-sequence behaviour:
Both empty → 0.0 (identical empty distributions).
One empty → 1.0 (maximally different distributions).
Example:
    chi2 = Chi2SequenceMetric(entity_feature="status")
    d = chi2(seq_a, seq_b)
    dm = chi2.compute_matrix(pool)
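As a rough illustration of the formula and the empty-sequence rules above, the following sketch works on raw per-category weights. The helper names (state_time_histogram, chi2_distance) and the (state, start, end) tuple layout are assumptions made for this example and are not part of tanat.

```python
import math


def state_time_histogram(intervals, categories):
    """Sum `end - start` per state; `intervals` holds (state, start, end) tuples."""
    hist = [0.0] * len(categories)
    for state, start, end in intervals:
        hist[categories.index(state)] += end - start
    return hist


def chi2_distance(hist_a, hist_b):
    """Chi-squared distance between two raw (unnormalised) histograms."""
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a == 0 and total_b == 0:
        return 0.0  # both empty: identical empty distributions
    if total_a == 0 or total_b == 0:
        return 1.0  # one empty: documented special case
    acc = 0.0
    for wa, wb in zip(hist_a, hist_b):
        pa, pb = wa / total_a, wb / total_b
        if pa + pb > 0:  # skip categories absent from both sequences
            acc += (pa - pb) ** 2 / (pa + pb)
    return math.sqrt(acc)


categories = ["A", "B"]
seq_a = [("A", 0, 3), ("B", 3, 4)]  # 3 time units in A, 1 in B
seq_b = [("A", 0, 1), ("B", 1, 4)]  # 1 time unit in A, 3 in B
d = chi2_distance(
    state_time_histogram(seq_a, categories),
    state_time_histogram(seq_b, categories),
)
```

For event sequences the same function applies with a weight of 1 per event in place of end − start.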
- MEMMAP_SUPPORT: bool = True[source]#
Set to True in subclasses that implement disk-backed (memmap) computation. When False, passing store_path or an instance-level StorageOptions raises NotImplementedError early with a clear message.
- SETTINGS_CLASS[source]#
alias of Chi2Settings
- __init__(entity_feature: str | None = None, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32') None[source]#
- prepare_batch_data(pool: SequencePool) tuple[source]#
Build histogram arrays for all sequences in pool.
- Returns:
(hists, n_cats) where hists is a float32 numpy array of shape (n, n_cats) containing raw (unnormalised) weights, with rows ordered to match pool.unique_ids.
- validate_composition(seq_a: Sequence, seq_b: Sequence | None = None) None[source]#
Resolve and validate the target feature.
If entity_feature was not specified, resolves to the first entity feature of seq_a and stores it in target_feature. Then checks that the feature is present and categorical in every provided sequence.
- Parameters:
seq_a – Primary sequence.
seq_b – Optional second sequence.
- Raises:
KeyError – If the feature is absent from a sequence.
TypeError – If the feature is not categorical.
- class tanat.metric.Chi2Settings(*, entity_feature: str | None = None)[source]#
Bases: object
Settings for Chi2SequenceMetric.
- Parameters:
entity_feature – Categorical feature name used as the histogram key (same semantics as HammingEntityMetric). None → resolved from the first entity feature of the sequence at validate_composition() time.
- class tanat.metric.DTWSequenceMetric(entity_metric: EntityMetric | str = 'hamming', window: int | None = None, normalize: bool = False, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32')[source]#
Bases: SequenceMetric
Dynamic Time Warping distance between two sequences.
Uses a space-optimised 2-row DP. The Sakoe-Chiba band is applied when window is set, limiting the warping path to stay within window cells of the diagonal.
Empty-sequence behaviour:
Both empty → nan (no alignment possible).
One empty → nan (no alignment possible).
When normalize=True, divides the raw DTW cost by len_a + len_b (an approximation that does not require path backtracking).
Example:
    dtw = DTWSequenceMetric(window=3, normalize=True)
    d = dtw(seq_a, seq_b)
    dm = dtw.compute_matrix(pool)
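The two-row DP, the band constraint, and the length normalisation described above can be sketched as follows. This is a standalone illustration with a 0/1 Hamming cost on plain values, not tanat's kernel; the function name dtw_distance is hypothetical, and leaving the band unwidened when the lengths differ by more than window (which yields inf) is an assumption of this sketch.

```python
import math


def dtw_distance(a, b, window=None, normalize=False):
    """DTW with a 2-row rolling array and an optional Sakoe-Chiba band."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return math.nan  # no alignment possible
    inf = math.inf
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        # Restrict j to cells within `window` of the diagonal when a band is set.
        lo = 1 if window is None else max(1, i - window)
        hi = m if window is None else min(m, i + window)
        for j in range(lo, hi + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0  # Hamming entity cost
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    total = prev[m]
    # Length normalisation avoids backtracking the warping path.
    return total / (n + m) if normalize else total


raw = dtw_distance("abc", "abd")
banded = dtw_distance("abc", "abd", window=1)
scaled = dtw_distance("abc", "abd", normalize=True)
```

Only two rows of the cost table are alive at any time, so memory is O(m) while runtime stays O(n × m), or O(n × window) inside the band.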
- MEMMAP_SUPPORT: bool = True[source]#
Set to True in subclasses that implement disk-backed (memmap) computation. When False, passing store_path or an instance-level StorageOptions raises NotImplementedError early with a clear message.
- SETTINGS_CLASS[source]#
alias of DTWSettings
- __init__(entity_metric: EntityMetric | str = 'hamming', window: int | None = None, normalize: bool = False, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32') None[source]#
- class tanat.metric.DTWSettings(*, entity_metric: EntityMetric = 'hamming', window: int | None = None, normalize: bool = False)[source]#
Bases: object
Settings for DTWSequenceMetric.
- Parameters:
entity_metric – Entity-level distance metric. Default: "hamming".
window – Sakoe-Chiba band width (number of cells off the diagonal). None means no constraint (full DTW). Must be > 0 when set.
normalize – When True, divide the DTW cost by len_a + len_b (approximation that avoids O(n×m) backtracking). Default: False.
- entity_metric: EntityMetric = 'hamming'[source]#
- class tanat.metric.DistanceMatrix(data: ndarray, ids: list)[source]#
Bases: object
Pairwise distance matrix with associated IDs.
Wrapper around a square numpy.ndarray, associating each row/column with a sequence (or trajectory) identifier.
Example:
    dm = DistanceMatrix(np.zeros((3, 3), dtype="float32"), ids=[1, 2, 3])
    dm.to_frame()          # pandas DataFrame (default)
    dm.to_frame("polars")  # polars DataFrame
    dm.to_numpy()          # raw array
- __init__(data: ndarray, ids: list) None[source]#
Create a DistanceMatrix.
- Parameters:
data – Square numpy array of shape (n, n).
ids – List of n identifiers matching the matrix rows/columns. Stored as-is (order preserved from pool.unique_ids).
- classmethod empty(ids: list, dtype: str = 'float32') DistanceMatrix[source]#
Create a zero-initialised square matrix.
- Parameters:
ids – List of identifiers.
dtype – Numpy dtype string (default "float32").
- Returns:
A DistanceMatrix of shape (n, n) filled with zeros.
- classmethod from_path(path: str | Path) DistanceMatrix[source]#
Load a previously computed distance matrix from disk.
Uses resolve_path to resolve the storage directory (workspace name or filesystem path), then opens the memmap in read-only mode.
- Parameters:
path – Storage directory. Same formats as StorageOptions.store_path: plain name ("distances"), relative path ("./distances"), or absolute path.
- Returns:
A DistanceMatrix backed by a read-only memmap.
- Raises:
FileNotFoundError – If the directory or required files don't exist.
ValueError – If progress.json status is not "complete" (incomplete computation).
- to_frame(fmt: Literal['pandas', 'polars'] = 'pandas') pandas.DataFrame | polars.DataFrame[source]#
Return a labelled dataframe with IDs as index/columns.
- Parameters:
fmt – "pandas" (default) returns a pandas.DataFrame; "polars" returns a polars.DataFrame with an extra "id" column (Polars has no row index).
- Returns:
Square dataframe of shape (n, n); the Polars variant carries the additional "id" column.
- Raises:
ValueError – If fmt is not "pandas" or "polars".
- class tanat.metric.EditSequenceMetric(entity_metric: EntityMetric | str = 'hamming', indel_cost: float = 1.0, normalize: bool = False, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32')[source]#
Bases: SequenceMetric
Needleman-Wunsch edit distance between two sequences.
Computes the minimum-cost alignment between two sequences using a full O(n × m) DP matrix (Needleman-Wunsch). Substitution cost comes from the entity metric; insertions and deletions cost indel_cost each.
When normalize=True, the raw distance is divided by max(len_a, len_b), so the result lies in [0, 1] provided that substitution and indel costs do not exceed 1.
Empty-sequence behaviour:
Both empty → 0.0 (no edits needed).
One empty → n × indel_cost (all insertions/deletions).
Example:
    edit = EditSequenceMetric(indel_cost=0.5, normalize=True)
    d = edit(seq_a, seq_b)
    dm = edit.compute_matrix(pool)
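The alignment recurrence described above can be sketched with a full DP table. This is a minimal illustration using a 0/1 substitution cost on plain values; the function name edit_distance is hypothetical and the code is not tanat's implementation.

```python
def edit_distance(a, b, indel_cost=1.0, normalize=False):
    """Needleman-Wunsch style edit distance with a full (n+1) x (m+1) table."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix with the empty sequence costs one indel per element.
    for i in range(1, n + 1):
        dp[i][0] = i * indel_cost
    for j in range(1, m + 1):
        dp[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else 1.0  # Hamming substitution cost
            dp[i][j] = min(
                dp[i - 1][j - 1] + sub,      # match or substitute
                dp[i - 1][j] + indel_cost,   # delete from a
                dp[i][j - 1] + indel_cost,   # insert into a
            )
    raw = dp[n][m]
    if normalize:
        longest = max(n, m)
        return raw / longest if longest else 0.0  # both empty: 0.0
    return raw


raw = edit_distance("kitten", "sitting")
scaled = edit_distance("kitten", "sitting", normalize=True)
one_empty = edit_distance("abc", "", indel_cost=0.5)
```

With indel_cost below half the substitution cost, a delete-plus-insert pair becomes cheaper than a substitution, which changes which alignments are optimal.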
- MEMMAP_SUPPORT: bool = True[source]#
Set to True in subclasses that implement disk-backed (memmap) computation. When False, passing store_path or an instance-level StorageOptions raises NotImplementedError early with a clear message.
- SETTINGS_CLASS[source]#
alias of EditSettings
- __init__(entity_metric: EntityMetric | str = 'hamming', indel_cost: float = 1.0, normalize: bool = False, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32') None[source]#
- class tanat.metric.EditSettings(*, entity_metric: EntityMetric = 'hamming', indel_cost: float = 1.0, normalize: bool = False)[source]#
Bases: object
Settings for EditSequenceMetric.
- Parameters:
entity_metric – Entity-level substitution cost metric. Default: "hamming".
indel_cost – Cost per insertion or deletion. Must be > 0. Default: 1.0.
normalize – When True, divide the raw edit distance by max(len_a, len_b); the value lies in [0, 1] provided that substitution and indel costs do not exceed 1. Default: False.
- entity_metric: EntityMetric = 'hamming'[source]#
- class tanat.metric.EntityMetric(settings: Any = None)[source]#
Bases: SettingsMixin, Registrable, ABC
Abstract base for entity-level distance metrics.
Computes a scalar distance between two Entity objects.
- IS_SYMMETRIC: bool = True[source]#
Set to True when dist(a, b) == dist(b, a) for all inputs. Subclasses that implement a directional distance must set this to False so that the full n² kernel is used instead.
- NUMBA_OPTIM: bool = False[source]#
Subclasses that provide prepare_batch_data / distance_kernel / prepare_cross_batch_data set this to True to opt into the Numba fast path.
- abstractmethod validate_entity(ent_a: Entity, ent_b: Entity | None = None) None[source]#
Validate one or two entities against this metric’s requirements.
Called from __call__() and from validate_composition().
Implementations should call _validate_entity_instance() first for the type check, then add metric-specific checks.
- Parameters:
ent_a – Primary entity.
ent_b – Optional second entity (None → probe single entity only).
- Raises:
TypeError – Wrong argument type or incompatible feature dtype.
KeyError – Required feature absent from the entity.
- class tanat.metric.HammingEntityMetric(entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0)[source]#
Bases: EntityMetric
Categorical Hamming distance between two entities.
Returns 0.0 when both entities share the same value for the configured feature, and mismatch_cost (default 1.0) when they differ. A custom cost dict enables partial costs.
Example:
    hamming = HammingEntityMetric()
    hamming(ent_a, ent_b)  # 0.0 or 1.0

    hamming = HammingEntityMetric(
        entity_feature="status",
        cost={("A", "B"): 0.5},
        mismatch_cost=0.8,
    )
    hamming(ent_a, ent_b)  # looks up in cost dict
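The lookup order implied by the description (equal values, then the cost dict in either key order, then mismatch_cost) can be sketched on raw feature values. The helper name hamming_cost is hypothetical; the real metric operates on Entity objects and may resolve costs differently internally.

```python
def hamming_cost(val_a, val_b, cost=None, mismatch_cost=1.0):
    """Categorical Hamming cost with an optional order-insensitive cost table."""
    if val_a == val_b:
        return 0.0
    cost = cost or {}
    # Key order does not matter: try (a, b) first, then (b, a).
    if (val_a, val_b) in cost:
        return cost[(val_a, val_b)]
    if (val_b, val_a) in cost:
        return cost[(val_b, val_a)]
    return mismatch_cost


table = {("A", "B"): 0.5}
same = hamming_cost("A", "A", table, mismatch_cost=0.8)
listed = hamming_cost("B", "A", table, mismatch_cost=0.8)
unlisted = hamming_cost("A", "C", table, mismatch_cost=0.8)
```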
- IS_SYMMETRIC: bool = True[source]#
Set to True when dist(a, b) == dist(b, a) for all inputs. Subclasses that implement a directional distance must set this to False so that the full n² kernel is used instead.
- NUMBA_OPTIM: bool = True[source]#
Subclasses that provide prepare_batch_data / distance_kernel / prepare_cross_batch_data set this to True to opt into the Numba fast path.
- SETTINGS_CLASS[source]#
alias of HammingSettings
- __init__(entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0) None[source]#
- property distance_kernel: Callable[source]#
Numba-compiled entity distance kernel (simple or weighted).
- prepare_batch_data(pool: SequencePool) tuple[source]#
Extract and encode the categorical feature for Numba batch computation.
- Returns:
(arrays, lengths, context)
- class tanat.metric.HammingSettings(*, entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0)[source]#
Bases: object
Settings for HammingEntityMetric.
- Parameters:
entity_feature – Name of the categorical feature to compare. None → first entity feature from the pool/entity metadata.
cost – Pairwise cost lookup. Keys are (val_a, val_b) tuples; order does not matter (both (A, B) and (B, A) are checked). Conflicting entries are rejected at construction. Default: None (every mismatch uses mismatch_cost).
mismatch_cost – Default cost applied when the pair is not in cost and values differ (default: 1.0).
- class tanat.metric.LCPSequenceMetric(entity_metric: EntityMetric | str = 'hamming', equality_threshold: float = 0.0, mode: Literal['length', 'distance', 'normalized'] = 'distance', *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32')[source]#
Bases: SequenceMetric
Longest Common Prefix distance between two sequences.
Scans the sequences from the start and counts consecutive positions where the two entities are equal (i.e. their entity distance ≤ equality_threshold). The scan stops at the first mismatch.
Three output modes are available:
"length" → raw prefix length (not a proper distance).
"distance" → len_a + len_b − 2·LCP (always ≥ 0).
"normalized" → 1 − 2·LCP / (len_a + len_b) ∈ [0, 1].
Empty-sequence behaviour:
Both empty → 0.0 (for all modes).
One empty (length n vs 0) → length: 0.0, distance: n, normalized: 1.0.
Example:
    lcp = LCPSequenceMetric(mode="normalized")
    d = lcp(seq_a, seq_b)
    dm = lcp.compute_matrix(pool)
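The prefix scan and the three output modes above reduce to a few lines. This sketch compares plain values with exact equality (the equality_threshold=0.0 case under a Hamming cost); the function name lcp_metric is hypothetical.

```python
def lcp_metric(a, b, mode="distance"):
    """Longest-common-prefix length, additive distance, or normalised distance."""
    lcp = 0
    for x, y in zip(a, b):
        if x != y:
            break  # the scan stops at the first mismatch
        lcp += 1
    total = len(a) + len(b)
    if mode == "length":
        return float(lcp)
    if mode == "distance":
        return float(total - 2 * lcp)
    if mode == "normalized":
        # Both empty: 0.0, matching the documented behaviour.
        return 1.0 - 2.0 * lcp / total if total else 0.0
    raise ValueError(f"unsupported mode: {mode!r}")


length = lcp_metric("abcd", "abxy", mode="length")
distance = lcp_metric("abcd", "abxy", mode="distance")
normalized = lcp_metric("abcd", "abxy", mode="normalized")
```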
- MEMMAP_SUPPORT: bool = True[source]#
Set to True in subclasses that implement disk-backed (memmap) computation. When False, passing store_path or an instance-level StorageOptions raises NotImplementedError early with a clear message.
- SETTINGS_CLASS[source]#
alias of LCPSettings
- __init__(entity_metric: EntityMetric | str = 'hamming', equality_threshold: float = 0.0, mode: Literal['length', 'distance', 'normalized'] = 'distance', *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32') None[source]#
- class tanat.metric.LCPSettings(*, entity_metric: EntityMetric = 'hamming', equality_threshold: float = 0.0, mode: Literal['length', 'distance', 'normalized'] = 'distance')[source]#
Bases: object
Settings for LCPSequenceMetric.
- Parameters:
entity_metric – Entity-level metric. Default: "hamming".
equality_threshold – Two entities are considered equal when their entity distance is ≤ this threshold. Must be ≥ 0. Default: 0.0.
mode – Output mode.
"length" → raw LCP length (not a distance, can be > 1).
"distance" → additive distance: len_a + len_b − 2·LCP (default, per the signature).
"normalized" → Jaccard-like distance: 1 − 2·LCP / (len_a + len_b), in [0, 1].
- entity_metric: EntityMetric = 'hamming'[source]#
- class tanat.metric.LCSSequenceMetric(entity_metric: EntityMetric | str = 'hamming', equality_threshold: float = 0.0, mode: Literal['length', 'distance', 'normalized'] = 'distance', *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32')[source]#
Bases: SequenceMetric
Longest Common Subsequence distance between two sequences.
Computes the LCS length using a space-optimised DP (2-row rolling array). Two entities are considered equal when their entity distance ≤ equality_threshold.
Three output modes are available:
"length" → raw LCS length (not a proper distance).
"distance" → len_a + len_b − 2·LCS (always ≥ 0).
"normalized" → 1 − 2·LCS / (len_a + len_b) ∈ [0, 1].
Empty-sequence behaviour:
Both empty → 0.0 (for all modes).
One empty (length n vs 0) → length: 0.0, distance: n, normalized: 1.0.
Example:
    lcs = LCSSequenceMetric(mode="normalized", equality_threshold=0.1)
    d = lcs(seq_a, seq_b)
    dm = lcs.compute_matrix(pool)
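A two-row rolling DP for the LCS length, with threshold-based equality, might look like the sketch below. The entity distance is passed in as a plain callable (dist), which is an assumption of this example and stands in for the configured entity metric; the function names are hypothetical.

```python
def lcs_length(a, b, dist, equality_threshold=0.0):
    """LCS length using two rolling rows; elements match when dist <= threshold."""
    m = len(b)
    prev = [0] * (m + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            if dist(a[i - 1], b[j - 1]) <= equality_threshold:
                curr[j] = prev[j - 1] + 1          # extend the common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # drop one element
        prev = curr
    return prev[m]


def lcs_metric(a, b, dist, equality_threshold=0.0, mode="distance"):
    """Map the LCS length to the documented output modes."""
    lcs = lcs_length(a, b, dist, equality_threshold)
    total = len(a) + len(b)
    if mode == "length":
        return float(lcs)
    if mode == "distance":
        return float(total - 2 * lcs)
    if mode == "normalized":
        return 1.0 - 2.0 * lcs / total if total else 0.0
    raise ValueError(f"unsupported mode: {mode!r}")


def hamming(x, y):
    return 0.0 if x == y else 1.0


def abs_diff(x, y):
    return abs(x - y)


exact = lcs_length("ABCBDAB", "BDCABA", hamming)
fuzzy = lcs_length([1.0, 2.0, 3.0], [1.05, 5.0, 3.0], abs_diff, equality_threshold=0.1)
```

The numeric case shows why the threshold matters: 1.0 and 1.05 count as equal at a threshold of 0.1, but not at the default of 0.0.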
- MEMMAP_SUPPORT: bool = True[source]#
Set to True in subclasses that implement disk-backed (memmap) computation. When False, passing store_path or an instance-level StorageOptions raises NotImplementedError early with a clear message.
- SETTINGS_CLASS[source]#
alias of LCSSettings
- __init__(entity_metric: EntityMetric | str = 'hamming', equality_threshold: float = 0.0, mode: Literal['length', 'distance', 'normalized'] = 'distance', *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32') None[source]#
- class tanat.metric.LCSSettings(*, entity_metric: EntityMetric = 'hamming', equality_threshold: float = 0.0, mode: Literal['length', 'distance', 'normalized'] = 'distance')[source]#
Bases: object
Settings for LCSSequenceMetric.
- Parameters:
entity_metric – Entity-level metric. Default: "hamming".
equality_threshold – Two entities are considered equal when their entity distance is ≤ this threshold. Default: 0.0.
mode – Output mode.
"length" → raw LCS length (not a proper distance).
"distance" → additive distance: len_a + len_b − 2·LCS.
"normalized" → Jaccard-like: 1 − 2·LCS / (len_a + len_b), in [0, 1].
- entity_metric: EntityMetric = 'hamming'[source]#
- class tanat.metric.LinearPairwiseSequenceMetric(entity_metric: EntityMetric | str = 'hamming', agg_fun: str = 'mean', padding_penalty: float | None = None, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32')[source]#
Bases: SequenceMetric
Sequence metric by linear (position-wise) alignment of entities.
Aligns seq_a and seq_b rank-by-rank and applies the configured entity metric to each aligned pair. The resulting vector of entity distances is aggregated (mean, sum, …) to produce a single scalar sequence distance.
When sequences differ in length, padding_penalty is applied for each unmatched position of the longer sequence. If padding_penalty is None, only the overlapping prefix is used.
Empty-sequence behaviour:
Both empty → nan (distance is undefined).
One empty, padding_penalty is set → all positions are padded.
One empty, padding_penalty is None → nan with a warning suggesting to set padding_penalty.
Example:
    hamming = HammingEntityMetric(entity_feature="status")
    lp = LinearPairwiseSequenceMetric(entity_metric=hamming)
    dist = lp(seq_a, seq_b)
    dm = lp.compute_matrix(pool)
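The rank-by-rank alignment and the padding rule above can be sketched on plain values with a 0/1 entity cost. The function name linear_pairwise is hypothetical, and returning nan whenever there is nothing to aggregate is an assumption of this sketch; it does not reproduce any warning or exception the library may emit.

```python
import math


def linear_pairwise(a, b, agg_fun="mean", padding_penalty=None):
    """Position-wise distance with optional padding for unmatched positions."""
    # Entity distances over the overlapping prefix (Hamming cost on raw values).
    dists = [0.0 if x == y else 1.0 for x, y in zip(a, b)]
    if padding_penalty is not None:
        # One penalty per unmatched position of the longer sequence.
        dists += [padding_penalty] * abs(len(a) - len(b))
    if not dists:
        return math.nan  # nothing to aggregate: distance undefined
    total = sum(dists)
    if agg_fun == "sum":
        return total
    if agg_fun == "mean":
        return total / len(dists)
    raise ValueError(f"unsupported agg_fun: {agg_fun!r}")


overlap_only = linear_pairwise("ab", "abcd")
padded_mean = linear_pairwise("ab", "abcd", padding_penalty=1.0)
padded_sum = linear_pairwise("ab", "abcd", agg_fun="sum", padding_penalty=1.0)
```

Without a padding penalty, "ab" and "abcd" are at distance 0.0 because only the shared prefix is compared, so setting padding_penalty is what makes length differences count.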
- MEMMAP_SUPPORT: bool = True[source]#
Set to True in subclasses that implement disk-backed (memmap) computation. When False, passing store_path or an instance-level StorageOptions raises NotImplementedError early with a clear message.
- SETTINGS_CLASS[source]#
alias of LinearPairwiseSettings
- __init__(entity_metric: EntityMetric | str = 'hamming', agg_fun: str = 'mean', padding_penalty: float | None = None, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32') None[source]#
- class tanat.metric.LinearPairwiseSettings(*, entity_metric: EntityMetric = 'hamming', agg_fun: str = 'mean', padding_penalty: float | None = None)[source]#
Bases: object
Settings for LinearPairwiseSequenceMetric.
- Parameters:
entity_metric – Entity-level metric. Accepts a registration name (string) or an instance. Pydantic auto-resolves strings via Registrable.__get_pydantic_core_schema__. Default: "hamming".
agg_fun – Aggregation function applied to the vector of entity distances. One of "mean" (default) or "sum".
padding_penalty – Distance value used for unmatched positions when sequences have different lengths. None → unmatched positions are ignored (only the overlap is aggregated). When the overlap is empty (one sequence has length 0), None makes the distance undefined (nan in a matrix, ValueError on a direct call).
- entity_metric: EntityMetric = 'hamming'[source]#
- class tanat.metric.SequenceMetric(settings=None, storage: StorageOptions | dict | None = None)[source]#
Bases: SettingsMixin, Registrable, DisplayMixin, ABC
Abstract base for sequence-level distance metrics.
Computes a scalar distance between two Sequence objects and a full pairwise DistanceMatrix over a pool.
- MEMMAP_SUPPORT: bool = False[source]#
Set to True in subclasses that implement disk-backed (memmap) computation. When False, passing store_path or an instance-level StorageOptions raises NotImplementedError early with a clear message.
- __init__(settings=None, storage: StorageOptions | dict | None = None) None[source]#
- compute_cross_matrix(pool_rows: SequencePool, pool_cols: SequencePool) ndarray[source]#
Compute an asymmetric (n × k) distance matrix between two pools.
Row i ↔ sequence i in pool_rows; column j ↔ sequence j in pool_cols. The result is not symmetric.
Validates both pools (type-check + composition probe), then delegates to _compute_cross_matrix_impl(). Subclasses override _compute_cross_matrix_impl() to use Numba kernels when available.
- Parameters:
pool_rows – Pool whose sequences form the rows (n items).
pool_cols – Pool whose sequences form the columns (k items).
- Returns:
float32 numpy array of shape (n, k).
- compute_matrix(pool: SequencePool, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32') DistanceMatrix[source]#
Compute the full pairwise distance matrix for pool.
- Parameters:
pool – A SequencePool.
store_path – Storage directory (None → in-memory).
chunk_size – Rows per flush chunk (default 500).
resume – Skip already-computed chunks (default True).
dtype – Numpy dtype for the matrix (default "float32").
- Returns:
A DistanceMatrix of shape (n, n).
- property entity_metric: EntityMetric[source]#
Resolve settings.entity_metric (string → instance or pass-through).
- Returns:
The resolved EntityMetric.
- Raises:
AttributeError – If the concrete settings class has no entity_metric field.
- abstractmethod validate_composition(seq_a: Sequence, seq_b: Sequence | None = None) None[source]#
Composition compatibility check between this metric and the given sequence(s).
Called from __call__() with both sequences, and from compute_matrix() with a single sample sequence (seq_b=None). Subclasses that compose with an EntityMetric probe a sample entity to surface compatibility errors early.
- Parameters:
seq_a – Primary sequence to probe.
seq_b – Optional second sequence.
- Raises:
TypeError – If the entity feature has an incompatible dtype.
KeyError – If a required feature is absent.
- class tanat.metric.SoftDTWSequenceMetric(entity_metric: EntityMetric | str = 'hamming', gamma: float = 1.0, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32')[source]#
Bases: SequenceMetric
Soft Dynamic Time Warping distance between two sequences.
Replaces the min operator in the DTW recurrence with a differentiable soft-minimum parameterised by gamma:
\[\text{soft-min}(a, b, c; \gamma) = -\gamma \log\bigl( e^{-a/\gamma} + e^{-b/\gamma} + e^{-c/\gamma}\bigr)\]
As gamma → 0, SoftDTW converges to standard DTW. As gamma → ∞, it approaches the mean of all alignment costs.
Empty-sequence behaviour:
Both empty → nan (no alignment possible).
One empty → nan (no alignment possible).
References
Cuturi & Blondel (2017). Soft-DTW: a Differentiable Loss Function for Time-Series, ICML.
Example:
    sdtw = SoftDTWSequenceMetric(gamma=0.1)
    d = sdtw(seq_a, seq_b)
    dm = sdtw.compute_matrix(pool)
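The soft-minimum above is usually evaluated in a shifted (log-sum-exp) form to avoid overflow. The sketch below plugs it into the same two-row recurrence as plain DTW, with a 0/1 entity cost on raw values. The function names soft_min and soft_dtw are hypothetical, and this is an illustration rather than tanat's kernel.

```python
import math


def soft_min(values, gamma):
    """Numerically stable -gamma * log(sum(exp(-v / gamma)))."""
    lowest = min(values)
    if math.isinf(lowest):
        return lowest  # every predecessor is unreachable
    # Shifting by the minimum keeps every exponent <= 0.
    return lowest - gamma * math.log(
        sum(math.exp(-(v - lowest) / gamma) for v in values)
    )


def soft_dtw(a, b, gamma=1.0):
    """Soft-DTW with a 2-row rolling array and a Hamming entity cost."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return math.nan  # no alignment possible
    inf = math.inf
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            curr[j] = cost + soft_min((prev[j], curr[j - 1], prev[j - 1]), gamma)
        prev = curr
    return prev[m]


near_hard = soft_dtw("abc", "abd", gamma=0.01)  # close to the plain DTW cost
smooth = soft_dtw("abc", "abd", gamma=1.0)      # never above the plain DTW cost
```

Because the soft-minimum never exceeds the true minimum, Soft-DTW values can fall below the DTW cost and may even be negative, so they are best read as a smooth loss rather than a proper distance.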
- MEMMAP_SUPPORT: bool = True[source]#
Set to True in subclasses that implement disk-backed (memmap) computation. When False, passing store_path or an instance-level StorageOptions raises NotImplementedError early with a clear message.
- SETTINGS_CLASS[source]#
alias of SoftDTWSettings
- __init__(entity_metric: EntityMetric | str = 'hamming', gamma: float = 1.0, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32') None[source]#
- class tanat.metric.SoftDTWSettings(*, entity_metric: EntityMetric = 'hamming', gamma: float = 1.0)[source]#
Bases: object
Settings for SoftDTWSequenceMetric.
- Parameters:
entity_metric – Entity-level distance metric. Default: "hamming".
gamma – Regularisation parameter for the soft-min operator. Must be > 0. Large values produce a smoother (mean-like) approximation; small values approach standard DTW. Default: 1.0.
- entity_metric: EntityMetric = 'hamming'[source]#
- class tanat.metric.StorageOptions(*, store_path: str | Path, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32')[source]#
Bases: object
Disk-backed storage options for distance matrix computation.
- Parameters:
store_path – Directory where the matrix file and metadata are stored (required). Accepts the same formats as resolve_path:
a plain name (e.g. "distances"), resolved via workspace store,
a relative or absolute path (e.g. "./distances", Path(...)).
chunk_size – Number of matrix rows computed per chunk before flushing to disk. Larger = fewer I/O ops, smaller = finer resume granularity. Default: 500.
resume – If True (default), skip chunks already computed. If False, delete and recompute from scratch.
dtype – Numpy dtype string for the matrix. Default: "float32".
- class tanat.metric.TrajectoryMetric(settings=None, storage: StorageOptions | dict | None = None)[source]#
Bases: SettingsMixin, Registrable, DisplayMixin, ABC
Abstract base for trajectory-level distance metrics.
Computes a scalar distance between two Trajectory objects and a full pairwise DistanceMatrix over a TrajectoryPool.
- MEMMAP_SUPPORT: bool = False[source]#
Set to True in subclasses that implement disk-backed (memmap) computation.
- __init__(settings=None, storage: StorageOptions | dict | None = None) None[source]#
- compute_cross_matrix(pool_rows: TrajectoryPool, pool_cols: TrajectoryPool) ndarray[source]#
Compute an asymmetric (n × k) distance matrix between two pools.
Row i ↔ trajectory i in pool_rows; column j ↔ trajectory j in pool_cols.
Validates both pools, then delegates to _compute_cross_matrix_impl(). Subclasses override _compute_cross_matrix_impl() to use optimised kernels.
- Parameters:
pool_rows – Pool whose trajectories form the rows (n items).
pool_cols – Pool whose trajectories form the columns (k items).
- Returns:
float32 numpy array of shape (n, k).
- compute_matrix(pool: TrajectoryPool, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32') DistanceMatrix[source]#
Compute full pairwise trajectory distance matrix.
Storage kwargs are forwarded to StorageOptions.
- Parameters:
pool – A TrajectoryPool.
store_path – Storage directory (None → in-memory).
chunk_size – Rows per flush chunk (default 500).
resume – Skip already-computed chunks (default True).
dtype – Numpy dtype for the matrix (default "float32").
- Returns: