tanat.metric.entity.type.hamming package
Submodules
tanat.metric.entity.type.hamming.kernels module
Numba kernels for HammingEntityMetric.
These are pure Numba functions (no Python objects, no classes). They receive integer-encoded feature values and a context tuple.
- tanat.metric.entity.type.hamming.kernels.hamming_dist_simple(a, b, context)
Return 1.0 if the two codes differ and 0.0 if they are equal; context is ignored.
- Parameters:
a – Integer code for the first entity’s feature value.
b – Integer code for the second entity’s feature value.
context – Unused (empty tuple for this kernel).
- Returns:
0.0 if a == b, 1.0 otherwise.
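As a minimal plain-Python sketch of the documented contract (the library's version is Numba-compiled; `simple_kernel_sketch` and the value-to-code mapping in the comments are illustrative assumptions), the simple kernel reduces to an equality test on integer codes:

```python
def simple_kernel_sketch(a: int, b: int, context: tuple) -> float:
    """Plain-Python mirror of hamming_dist_simple's documented behavior."""
    # context is part of the shared kernel signature (a, b, context),
    # but this kernel never reads it.
    return 0.0 if a == b else 1.0


# Hypothetical encoding: "A" -> 0, "B" -> 1, "C" -> 2
print(simple_kernel_sketch(0, 0, ()))  # 0.0
print(simple_kernel_sketch(0, 2, ()))  # 1.0
```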
- tanat.metric.entity.type.hamming.kernels.hamming_dist_weighted(a, b, context)
Look up the pairwise cost in the pre-built cost matrix: context[0][a, b].
- Parameters:
a – Integer code for the first entity’s feature value.
b – Integer code for the second entity’s feature value.
context – One-element tuple containing the (V × V) float32 cost matrix.
- Returns:
The pre-built pairwise cost context[0][a, b], a float read from the cost matrix.
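A plain-Python sketch of the weighted lookup, assuming a hand-built 3 × 3 cost matrix for the hypothetical codes "A" -> 0, "B" -> 1, "C" -> 2, with a 0.5 cost for the A/B pair and 0.8 for every other mismatch. Nested tuples stand in for the float32 NumPy matrix, so indexing is written `[a][b]` rather than `[a, b]`:

```python
def weighted_kernel_sketch(a: int, b: int, context: tuple) -> float:
    """Plain-Python mirror of hamming_dist_weighted: return context[0][a][b]."""
    cost_matrix = context[0]  # (V x V) costs, indexed by integer codes
    return cost_matrix[a][b]


# Hypothetical matrix: zero diagonal, symmetric off-diagonal costs.
COST = (
    (0.0, 0.5, 0.8),  # A vs A, B, C
    (0.5, 0.0, 0.8),  # B vs A, B, C
    (0.8, 0.8, 0.0),  # C vs A, B, C
)
context = (COST,)  # one-element tuple, as the kernel expects

print(weighted_kernel_sketch(0, 1, context))  # 0.5
print(weighted_kernel_sketch(2, 0, context))  # 0.8
print(weighted_kernel_sketch(1, 1, context))  # 0.0
```

Because every pairwise cost is resolved before the kernel runs, the per-pair work is a single array read that needs no Python objects, consistent with the module's description of the kernels.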
tanat.metric.entity.type.hamming.metric module
HammingEntityMetric: categorical feature distance by Hamming equality.
- class tanat.metric.entity.type.hamming.metric.HammingEntityMetric(entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0)
Bases: EntityMetric
Categorical Hamming distance between two entities.
Returns 0.0 when both entities share the same value for the configured feature, and mismatch_cost (default 1.0) when they differ. A custom cost dict assigns partial costs to specific value pairs.
Example:

    from tanat.metric.entity.type.hamming import HammingEntityMetric

    # ent_a, ent_b: two entities taken from a sequence pool
    hamming = HammingEntityMetric()
    hamming(ent_a, ent_b)  # 0.0 or 1.0

    hamming = HammingEntityMetric(
        entity_feature="status",
        cost={("A", "B"): 0.5},  # A/B in either order costs 0.5
        mismatch_cost=0.8,       # any other mismatch costs 0.8
    )
    hamming(ent_a, ent_b)  # looks up the pair in the cost dict
- IS_SYMMETRIC: bool = True
Set to True when dist(a, b) == dist(b, a) for all inputs. Subclasses that implement a directional distance must set this to False so that the full n² kernel (every ordered pair) is used instead.
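To illustrate why the flag matters, here is a hypothetical plain-Python pairwise loop (`pairwise_matrix`, `mismatch`, and `signed_gap` are not part of the documented API). With `symmetric=True` it evaluates each unordered pair once and mirrors the result, so it makes n(n + 1)/2 kernel calls instead of n²; a directional kernel needs `symmetric=False`, or the mirrored values would be wrong:

```python
from typing import Callable, List, Sequence

Kernel = Callable[[int, int, tuple], float]


def pairwise_matrix(
    codes: Sequence[int], kernel: Kernel, context: tuple, symmetric: bool
) -> List[List[float]]:
    """Fill an n x n distance matrix from integer codes (illustrative only)."""
    n = len(codes)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Symmetric: visit only j >= i (upper triangle incl. diagonal).
        # Directional: visit every ordered pair (the full n^2 loop).
        start = i if symmetric else 0
        for j in range(start, n):
            d = kernel(codes[i], codes[j], context)
            out[i][j] = d
            if symmetric:
                out[j][i] = d  # mirror into the lower triangle
    return out


def mismatch(a: int, b: int, context: tuple) -> float:
    return 0.0 if a == b else 1.0


def signed_gap(a: int, b: int, context: tuple) -> float:
    # A deliberately directional toy kernel: signed_gap(a, b) == -signed_gap(b, a)
    return float(a - b)


print(pairwise_matrix([0, 1, 1], mismatch, (), symmetric=True))
# [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(pairwise_matrix([0, 2], signed_gap, (), symmetric=False))
# [[0.0, -2.0], [2.0, 0.0]]
```

Running `signed_gap` with `symmetric=True` would copy -2.0 into both off-diagonal cells, which is exactly the error the flag guards against.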
- NUMBA_OPTIM: bool = True
Subclasses that provide prepare_batch_data / distance_kernel / prepare_cross_batch_data set this to True to opt into the Numba fast path.
- SETTINGS_CLASS
Alias of HammingSettings.
- __init__(entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0) -> None
- property distance_kernel: Callable
Numba-compiled entity distance kernel (simple or weighted).
- prepare_batch_data(pool: SequencePool) -> tuple
Extract and encode the categorical feature for Numba batch computation.
- Returns:
(arrays, lengths, context)
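The exact layout of arrays and lengths is not documented here, so the following sketch covers only the encoding and context steps, under stated assumptions: `encode_values` and `build_context` are hypothetical helpers, codes are assigned in first-seen order, pairs absent from the pool are skipped, and the weighted context is assumed to be needed whenever a cost dict is given or mismatch_cost differs from 1.0 (the simple kernel can only return 1.0 for a mismatch). Python floats stand in for the float32 matrix:

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def encode_values(values: Sequence[Hashable]) -> Tuple[List[int], List[Hashable]]:
    """Map each distinct value to a dense integer code (first-seen order)."""
    vocab: List[Hashable] = []
    index: Dict[Hashable, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in index:
            index[v] = len(vocab)
            vocab.append(v)
        codes.append(index[v])
    return codes, vocab


def build_context(
    vocab: Sequence[Hashable],
    cost: Optional[Dict[tuple, float]],
    mismatch_cost: float,
) -> tuple:
    """Return () for the simple kernel, or (V x V matrix,) for the weighted one."""
    if cost is None and mismatch_cost == 1.0:
        return ()  # plain 0/1 Hamming: the kernel needs no extra data
    v = len(vocab)
    # Zero on the diagonal (equal values), mismatch_cost everywhere else.
    matrix = [[0.0 if i == j else mismatch_cost for j in range(v)] for i in range(v)]
    code = {value: i for i, value in enumerate(vocab)}
    for (a, b), c in (cost or {}).items():
        if a in code and b in code:  # assumption: skip pairs absent from the pool
            i, j = code[a], code[b]
            matrix[i][j] = matrix[j][i] = c  # order of the key does not matter
    return (matrix,)


codes, vocab = encode_values(["A", "B", "A", "C"])
print(codes, vocab)  # [0, 1, 0, 2] ['A', 'B', 'C']
print(build_context(vocab, {("B", "A"): 0.5}, 0.8))
# ([[0.0, 0.5, 0.8], [0.5, 0.0, 0.8], [0.8, 0.8, 0.0]],)
```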
- class tanat.metric.entity.type.hamming.metric.HammingSettings(*, entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0)
Bases: object
Settings for HammingEntityMetric.
- Parameters:
entity_feature – Name of the categorical feature to compare. If None, the first entity feature from the pool/entity metadata is used.
cost – Pairwise cost lookup. Keys are (val_a, val_b) tuples; order does not matter (both (A, B) and (B, A) are checked). Conflicting entries are rejected at construction. Default: None (every mismatch uses mismatch_cost).
mismatch_cost – Default cost applied when the values differ and the pair is not in cost (default: 1.0).
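A sketch of how the cost and mismatch_cost settings could resolve to a distance for raw values. `normalize_cost` and `resolve` are hypothetical names; the sketch stores each pair under a frozenset key so (A, B) and (B, A) hit the same entry, raises ValueError on conflicting entries (the exception type is an assumption), and, as a sketch choice, accepts a duplicated pair when both entries agree:

```python
from typing import Dict, FrozenSet, Hashable, Optional, Tuple


def normalize_cost(
    cost: Optional[Dict[Tuple[Hashable, Hashable], float]],
) -> Dict[FrozenSet[Hashable], float]:
    """Key each pair by an unordered frozenset; reject conflicting duplicates."""
    merged: Dict[FrozenSet[Hashable], float] = {}
    for (a, b), value in (cost or {}).items():
        key = frozenset((a, b))
        if key in merged and merged[key] != value:
            raise ValueError(f"conflicting costs for pair ({a!r}, {b!r})")
        merged[key] = value
    return merged


def resolve(
    a: Hashable,
    b: Hashable,
    merged: Dict[FrozenSet[Hashable], float],
    mismatch_cost: float = 1.0,
) -> float:
    """Distance between two raw feature values."""
    if a == b:
        return 0.0  # equal values always cost 0.0
    return merged.get(frozenset((a, b)), mismatch_cost)


merged = normalize_cost({("A", "B"): 0.5})
print(resolve("B", "A", merged, mismatch_cost=0.8))  # 0.5 (order ignored)
print(resolve("A", "C", merged, mismatch_cost=0.8))  # 0.8 (falls back)
print(resolve("A", "A", merged, mismatch_cost=0.8))  # 0.0
```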
Module contents
HammingEntityMetric package.
- class tanat.metric.entity.type.hamming.HammingEntityMetric(entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0)
Documented identically to tanat.metric.entity.type.hamming.metric.HammingEntityMetric in the metric module above.
- class tanat.metric.entity.type.hamming.HammingSettings(*, entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0)
Documented identically to tanat.metric.entity.type.hamming.metric.HammingSettings in the metric module above.