tanat.metric.sequence.type.lcs package
Submodules
tanat.metric.sequence.type.lcs.kernels module
Numba kernels for LCSSequenceMetric.
All functions are @njit (no Python objects). They operate on int32-encoded
feature arrays produced by the entity metric’s prepare_batch_data.
- Output-mode integer encoding (mirrors _MODE_MAP in metric.py): 0 → length, 1 → distance, 2 → normalized.
- tanat.metric.sequence.type.lcs.kernels.compute_lcs_matrix(result, start, end, arrays_a, lengths_a, arrays_b, lengths_b, dist_kernel, context, threshold, mode, symmetric)
Parallel LCS matrix kernel.
Processes rows [start, end).
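As a rough illustration of the row-range contract, the pure-Python sketch below fills rows [start, end) of a result matrix. It is not the Numba kernel: the names `fill_rows_sketch` and `lcs_distance_sketch` are hypothetical, exact equality stands in for `dist_kernel` and `threshold`, and the reading of `symmetric` (compute only columns j ≥ i and mirror the value) is an assumption.

```python
def lcs_distance_sketch(a, b):
    """Additive LCS distance len_a + len_b - 2*LCS, using exact equality (illustrative)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return float(len(a) + len(b) - 2 * prev[len(b)])


def fill_rows_sketch(result, start, end, seqs_a, seqs_b, symmetric):
    """Fill rows [start, end) of `result` in place (hypothetical helper)."""
    for i in range(start, end):
        # Assumption: in symmetric mode only the upper triangle is computed.
        first_col = i if symmetric else 0
        for j in range(first_col, len(seqs_b)):
            value = lcs_distance_sketch(seqs_a[i], seqs_b[j])
            result[i][j] = value
            if symmetric:
                result[j][i] = value  # mirror into the lower triangle


seqs = [[1, 2, 3], [1, 3], [4]]
matrix = [[0.0] * len(seqs) for _ in seqs]
# Two disjoint row ranges, as separate workers or chunks might process them.
fill_rows_sketch(matrix, 0, 2, seqs, seqs, symmetric=True)
fill_rows_sketch(matrix, 2, 3, seqs, seqs, symmetric=True)
```

Because each row range writes to distinct cells, the ranges can be processed in any order, which is what makes a half-open [start, end) interface convenient for chunked or parallel work.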
- tanat.metric.sequence.type.lcs.kernels.compute_lcs_pair(arr_a, arr_b, len_a, len_b, dist_kernel, context, threshold, mode)
Compute LCS distance for a single pair of int32-encoded sequences.
Uses a space-optimised 2-row rolling DP. Two entities are considered equal when their distance ≤ threshold.
- Parameters:
arr_a – int32-encoded sequence A.
arr_b – int32-encoded sequence B.
len_a – Length of A.
len_b – Length of B.
dist_kernel – Numba entity distance kernel.
context – Opaque context tuple forwarded to dist_kernel.
threshold – Equality threshold (float32).
mode – Integer output mode (0/1/2).
- Returns:
float32 result.
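The two-row rolling DP with threshold-based equality can be sketched in plain Python as follows. This is a minimal illustration, not the library's kernel: `lcs_pair_sketch`, `hamming`, and `abs_diff` are hypothetical names, and a plain callable `entity_dist(x, y)` replaces the `dist_kernel` and `context` pair. The mode integers follow the 0/1/2 encoding described above.

```python
MODE_LENGTH, MODE_DISTANCE, MODE_NORMALIZED = 0, 1, 2


def lcs_pair_sketch(seq_a, seq_b, entity_dist, threshold=0.0, mode=MODE_DISTANCE):
    """LCS result for one pair, keeping only two DP rows in memory."""
    len_a, len_b = len(seq_a), len(seq_b)
    if len_a == 0 and len_b == 0:
        return 0.0  # both empty: 0.0 in every mode
    prev = [0] * (len_b + 1)  # DP row i - 1
    curr = [0] * (len_b + 1)  # DP row i
    for i in range(1, len_a + 1):
        for j in range(1, len_b + 1):
            # Entities count as equal when their distance is <= threshold.
            if entity_dist(seq_a[i - 1], seq_b[j - 1]) <= threshold:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev, curr = curr, prev  # reuse the two buffers instead of allocating
    lcs = prev[len_b]
    if mode == MODE_LENGTH:
        return float(lcs)
    if mode == MODE_DISTANCE:
        return float(len_a + len_b - 2 * lcs)
    return 1.0 - 2.0 * lcs / (len_a + len_b)


def hamming(x, y):
    """Hypothetical entity distance: 0 when equal, 1 otherwise."""
    return 0.0 if x == y else 1.0


def abs_diff(x, y):
    """Hypothetical numeric entity distance."""
    return abs(x - y)


length = lcs_pair_sketch([1, 2, 3], [1, 3], hamming, 0.0, MODE_LENGTH)
tolerant = lcs_pair_sketch([1.0, 2.0], [1.05, 3.0], abs_diff, 0.1, MODE_LENGTH)
```

Memory use is O(len_b) rather than O(len_a · len_b), since each DP row depends only on the row before it.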
tanat.metric.sequence.type.lcs.metric module
LCSSequenceMetric: Longest Common Subsequence distance between sequences.
- class tanat.metric.sequence.type.lcs.metric.LCSSequenceMetric(entity_metric: EntityMetric | str = 'hamming', equality_threshold: float = 0.0, mode: Literal['length', 'distance', 'normalized'] = 'distance', *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32')
Bases: SequenceMetric
Longest Common Subsequence distance between two sequences.
Computes the LCS length using a space-optimised DP (2-row rolling array). Two entities are considered equal when their entity distance ≤ equality_threshold.
Three output modes are available:
"length" → raw LCS length (not a proper distance).
"distance" → len_a + len_b − 2·LCS (always ≥ 0).
"normalized" → 1 − 2·LCS / (len_a + len_b) ∈ [0, 1].
Empty-sequence behaviour:
Both empty → 0.0 (for all modes).
One empty (length n vs 0) → length: 0.0, distance: n, normalized: 1.0.
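The relationship between the three output modes, including the empty-sequence cases, can be checked with a small helper. This is a sketch only: `lcs_outputs_sketch` is a hypothetical name, and it takes an already computed LCS length rather than calling the library.

```python
def lcs_outputs_sketch(len_a, len_b, lcs):
    """Map an LCS length to the three documented outputs (illustrative)."""
    total = len_a + len_b
    if total == 0:
        # Both sequences empty: 0.0 in every mode (avoids dividing by zero).
        return {"length": 0.0, "distance": 0.0, "normalized": 0.0}
    distance = float(total - 2 * lcs)
    # distance / total is algebraically the same as 1 - 2*lcs / total.
    return {"length": float(lcs), "distance": distance, "normalized": distance / total}


one_empty = lcs_outputs_sketch(3, 0, 0)
typical = lcs_outputs_sketch(7, 6, 4)
```

Since 0 ≤ LCS ≤ min(len_a, len_b), the distance is never negative and the normalized value stays within [0, 1]; it reaches 1.0 exactly when the sequences share no matching entities.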
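Example:
from tanat.metric.sequence.type.lcs import LCSSequenceMetric

# seq_a, seq_b and pool are existing sequence and pool objects.
lcs = LCSSequenceMetric(mode="normalized", equality_threshold=0.1)
d = lcs(seq_a, seq_b)  # value for a single pair
dm = lcs.compute_matrix(pool)  # pairwise matrix over the pool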
- MEMMAP_SUPPORT: bool = True
Set to True in subclasses that implement disk-backed (memmap) computation. When False, passing store_path or an instance-level StorageOptions raises NotImplementedError early with a clear message.
- SETTINGS_CLASS
alias of LCSSettings
- __init__(entity_metric: EntityMetric | str = 'hamming', equality_threshold: float = 0.0, mode: Literal['length', 'distance', 'normalized'] = 'distance', *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32') → None
- class tanat.metric.sequence.type.lcs.metric.LCSSettings(*, entity_metric: EntityMetric = 'hamming', equality_threshold: float = 0.0, mode: Literal['length', 'distance', 'normalized'] = 'distance')
Bases: object
Settings for LCSSequenceMetric.
- Parameters:
entity_metric – Entity-level metric. Default: "hamming".
equality_threshold – Two entities are considered equal when their entity distance is ≤ this threshold. Default: 0.0.
mode – Output mode.
"length" → raw LCS length (not a proper distance).
"distance" → additive distance: len_a + len_b − 2·LCS.
"normalized" → Jaccard-like: 1 − 2·LCS / (len_a + len_b), in [0, 1].
- entity_metric: EntityMetric = 'hamming'
Module contents
LCSSequenceMetric package.
- class tanat.metric.sequence.type.lcs.LCSSequenceMetric(entity_metric: EntityMetric | str = 'hamming', equality_threshold: float = 0.0, mode: Literal['length', 'distance', 'normalized'] = 'distance', *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32')
Bases: SequenceMetric
Longest Common Subsequence distance between two sequences.
Computes the LCS length using a space-optimised DP (2-row rolling array). Two entities are considered equal when their entity distance ≤ equality_threshold.
Three output modes are available:
"length" → raw LCS length (not a proper distance).
"distance" → len_a + len_b − 2·LCS (always ≥ 0).
"normalized" → 1 − 2·LCS / (len_a + len_b) ∈ [0, 1].
Empty-sequence behaviour:
Both empty → 0.0 (for all modes).
One empty (length n vs 0) → length: 0.0, distance: n, normalized: 1.0.
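Example:
from tanat.metric.sequence.type.lcs import LCSSequenceMetric

# seq_a, seq_b and pool are existing sequence and pool objects.
lcs = LCSSequenceMetric(mode="normalized", equality_threshold=0.1)
d = lcs(seq_a, seq_b)  # value for a single pair
dm = lcs.compute_matrix(pool)  # pairwise matrix over the pool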
- MEMMAP_SUPPORT: bool = True
Set to True in subclasses that implement disk-backed (memmap) computation. When False, passing store_path or an instance-level StorageOptions raises NotImplementedError early with a clear message.
- SETTINGS_CLASS
alias of LCSSettings
- __init__(entity_metric: EntityMetric | str = 'hamming', equality_threshold: float = 0.0, mode: Literal['length', 'distance', 'normalized'] = 'distance', *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32') → None
- class tanat.metric.sequence.type.lcs.LCSSettings(*, entity_metric: EntityMetric = 'hamming', equality_threshold: float = 0.0, mode: Literal['length', 'distance', 'normalized'] = 'distance')
Bases: object
Settings for LCSSequenceMetric.
- Parameters:
entity_metric – Entity-level metric. Default: "hamming".
equality_threshold – Two entities are considered equal when their entity distance is ≤ this threshold. Default: 0.0.
mode – Output mode.
"length" → raw LCS length (not a proper distance).
"distance" → additive distance: len_a + len_b − 2·LCS.
"normalized" → Jaccard-like: 1 − 2·LCS / (len_a + len_b), in [0, 1].
- entity_metric: EntityMetric = 'hamming'