tanat.metric.sequence.type.edit package
Submodules
tanat.metric.sequence.type.edit.kernels module
Numba kernels for EditSequenceMetric.
All functions are @njit (no Python objects). They operate on int32-encoded
feature arrays produced by the entity metric’s prepare_batch_data.
Implementation: 2-row rolling Needleman-Wunsch DP. Only two rows of the full (n+1) × (m+1) matrix are kept in memory at any time.
- tanat.metric.sequence.type.edit.kernels.compute_edit_matrix(result, start, end, arrays_a, lengths_a, arrays_b, lengths_b, dist_kernel, context, indel_cost, normalize, symmetric)
Parallel edit matrix kernel. Processes rows [start, end) of result.
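The row-range contract can be illustrated with a small pure-Python sketch. Everything here is hypothetical and not tanat's kernel: `fill_rows`, `pair_fn` and `length_gap` are made-up names, and the reading of `symmetric` (both inputs are the same pool, so only the upper triangle is computed and mirrored) is an assumption.

```python
def fill_rows(result, start, end, seqs_a, seqs_b, pair_fn, symmetric=False):
    """Fill rows [start, end) of a pairwise distance matrix in place.

    Hypothetical stand-in for a row-chunked kernel; pair_fn(a, b) is any
    function returning the distance for one pair.
    """
    for i in range(start, end):
        # Assumption: in symmetric mode both inputs are the same pool, so
        # only j >= i is computed and the value is mirrored to (j, i).
        j_start = i if symmetric else 0
        for j in range(j_start, len(seqs_b)):
            d = pair_fn(seqs_a[i], seqs_b[j])
            result[i][j] = d
            if symmetric:
                result[j][i] = d
    return result


def length_gap(a, b):
    # Toy pair function used only for this illustration.
    return float(abs(len(a) - len(b)))


pool = [[1], [1, 2], [1, 2, 3], [4, 5, 6, 7]]
n = len(pool)
matrix = [[0.0] * n for _ in range(n)]
# Two disjoint row chunks, as two workers might process them.
fill_rows(matrix, 0, 2, pool, pool, length_gap, symmetric=True)
fill_rows(matrix, 2, 4, pool, pool, length_gap, symmetric=True)
```

Because each chunk owns a disjoint set of rows, and mirrored writes land in distinct cells, chunks of this shape can be processed independently.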
- tanat.metric.sequence.type.edit.kernels.compute_edit_pair(arr_a, arr_b, len_a, len_b, dist_kernel, context, indel_cost, normalize)
Compute Needleman-Wunsch edit distance for a single pair.
Uses a 2-row rolling DP (O(m) space, O(n×m) time).
- Parameters:
arr_a – int32-encoded sequence A.
arr_b – int32-encoded sequence B.
len_a – Length of A.
len_b – Length of B.
dist_kernel – Numba entity distance kernel (substitution cost).
context – Opaque context tuple forwarded to dist_kernel.
indel_cost – Cost per insertion / deletion (float32).
normalize – When True, divide result by max(len_a, len_b).
- Returns:
float32 edit distance.
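A minimal pure-Python sketch of the two-row rolling DP described above, for illustration only. The names `edit_pair_sketch`, `sub_cost` and `hamming` are hypothetical stand-ins for the Numba kernel and its dist_kernel/context pair; the actual kernel operates on int32 arrays under @njit and is not reproduced here.

```python
def edit_pair_sketch(seq_a, seq_b, sub_cost, indel_cost=1.0, normalize=False):
    """Needleman-Wunsch edit distance using two rolling rows (O(m) space)."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 and m == 0:
        return 0.0  # both empty: no edits needed
    # Row 0: aligning an empty prefix of A with the first j items of B.
    prev = [j * indel_cost for j in range(m + 1)]
    curr = [0.0] * (m + 1)
    for i in range(1, n + 1):
        curr[0] = i * indel_cost  # first i items of A against empty B
        for j in range(1, m + 1):
            curr[j] = min(
                prev[j] + indel_cost,      # delete seq_a[i - 1]
                curr[j - 1] + indel_cost,  # insert seq_b[j - 1]
                prev[j - 1] + sub_cost(seq_a[i - 1], seq_b[j - 1]),
            )
        prev, curr = curr, prev  # reuse the two buffers for the next row
    dist = prev[m]
    if normalize:
        dist /= max(n, m)
    return dist


def hamming(x, y):
    # Assumed substitution cost: 0 for equal symbols, 1 otherwise.
    return 0.0 if x == y else 1.0


kitten = [ord(c) for c in "kitten"]
sitting = [ord(c) for c in "sitting"]
raw = edit_pair_sketch(kitten, sitting, hamming)
scaled = edit_pair_sketch(kitten, sitting, hamming, normalize=True)
```

With unit costs this reduces to the classic Levenshtein distance, so `raw` is 3.0 for the kitten/sitting pair and `scaled` is 3/7.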
tanat.metric.sequence.type.edit.metric module
EditSequenceMetric: Needleman-Wunsch edit distance between sequences.
- class tanat.metric.sequence.type.edit.metric.EditSequenceMetric(entity_metric: EntityMetric | str = 'hamming', indel_cost: float = 1.0, normalize: bool = False, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32')
Bases: SequenceMetric
Needleman-Wunsch edit distance between two sequences.
Computes the minimum-cost alignment between two sequences with Needleman-Wunsch dynamic programming (O(n × m) time). Substitution cost comes from the entity metric; insertions and deletions cost indel_cost each.
When normalize=True, the raw distance is divided by max(len_a, len_b), so the result lies in [0, 1] provided that substitution and indel costs do not exceed 1.
Empty-sequence behaviour:
- Both empty → 0.0 (no edits needed).
- One empty → n × indel_cost, where n is the length of the non-empty sequence (all insertions/deletions).
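Example:
    from tanat.metric.sequence.type.edit import EditSequenceMetric

    edit = EditSequenceMetric(indel_cost=0.5, normalize=True)
    d = edit(seq_a, seq_b)          # distance for one pair of sequences
    dm = edit.compute_matrix(pool)  # pairwise distances over a pool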
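The empty-sequence and normalization rules stated in this class description can be checked with a little arithmetic. The helper below is a hypothetical illustration, not part of tanat; it only encodes those two rules for the case where at least one sequence is empty.

```python
def empty_case_distance(len_a, len_b, indel_cost=1.0, normalize=False):
    """Distance when at least one sequence is empty, per the documented rules."""
    if len_a != 0 and len_b != 0:
        raise ValueError("this helper only covers the empty-sequence cases")
    n = max(len_a, len_b)  # length of the non-empty sequence, if any
    if n == 0:
        return 0.0  # both empty: no edits needed
    raw = n * indel_cost  # every item is inserted or deleted
    return raw / n if normalize else raw


print(empty_case_distance(0, 0))                                  # 0.0
print(empty_case_distance(0, 4, indel_cost=0.5))                  # 2.0
print(empty_case_distance(0, 4, indel_cost=0.5, normalize=True))  # 0.5
```

In the one-empty case the normalized value equals indel_cost itself, so under these rules a result within [0, 1] appears to depend on indel_cost being at most 1.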
- MEMMAP_SUPPORT: bool = True
Set to True in subclasses that implement disk-backed (memmap) computation. When False, passing store_path or an instance-level StorageOptions raises NotImplementedError early with a clear message.
- SETTINGS_CLASS
alias of EditSettings
- __init__(entity_metric: EntityMetric | str = 'hamming', indel_cost: float = 1.0, normalize: bool = False, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32') -> None
- class tanat.metric.sequence.type.edit.metric.EditSettings(*, entity_metric: EntityMetric = 'hamming', indel_cost: float = 1.0, normalize: bool = False)
Bases: object
Settings for EditSequenceMetric.
- Parameters:
entity_metric – Entity-level substitution cost metric. Default: "hamming".
indel_cost – Cost per insertion or deletion. Must be > 0. Default: 1.0.
normalize – When True, divide the raw edit distance by max(len_a, len_b) to obtain a value in [0, 1]. Default: False.
- entity_metric: EntityMetric = 'hamming'
Module contents
EditSequenceMetric package.
- class tanat.metric.sequence.type.edit.EditSequenceMetric(entity_metric: EntityMetric | str = 'hamming', indel_cost: float = 1.0, normalize: bool = False, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32')
Bases: SequenceMetric
Needleman-Wunsch edit distance between two sequences.
Computes the minimum-cost alignment between two sequences with Needleman-Wunsch dynamic programming (O(n × m) time). Substitution cost comes from the entity metric; insertions and deletions cost indel_cost each.
When normalize=True, the raw distance is divided by max(len_a, len_b), so the result lies in [0, 1] provided that substitution and indel costs do not exceed 1.
Empty-sequence behaviour:
- Both empty → 0.0 (no edits needed).
- One empty → n × indel_cost, where n is the length of the non-empty sequence (all insertions/deletions).
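Example:
    from tanat.metric.sequence.type.edit import EditSequenceMetric

    edit = EditSequenceMetric(indel_cost=0.5, normalize=True)
    d = edit(seq_a, seq_b)          # distance for one pair of sequences
    dm = edit.compute_matrix(pool)  # pairwise distances over a pool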
- MEMMAP_SUPPORT: bool = True
Set to True in subclasses that implement disk-backed (memmap) computation. When False, passing store_path or an instance-level StorageOptions raises NotImplementedError early with a clear message.
- SETTINGS_CLASS
alias of EditSettings
- __init__(entity_metric: EntityMetric | str = 'hamming', indel_cost: float = 1.0, normalize: bool = False, *, store_path: str | Path | None = None, chunk_size: int = 500, resume: bool = True, dtype: str = 'float32') -> None
- class tanat.metric.sequence.type.edit.EditSettings(*, entity_metric: EntityMetric = 'hamming', indel_cost: float = 1.0, normalize: bool = False)
Bases: object
Settings for EditSequenceMetric.
- Parameters:
entity_metric – Entity-level substitution cost metric. Default: "hamming".
indel_cost – Cost per insertion or deletion. Must be > 0. Default: 1.0.
normalize – When True, divide the raw edit distance by max(len_a, len_b) to obtain a value in [0, 1]. Default: False.
- entity_metric: EntityMetric = 'hamming'