tanat.metric.entity.type.hamming package

Submodules

tanat.metric.entity.type.hamming.kernels module

Numba kernels for HammingEntityMetric.

These are pure Numba functions (no Python objects, no classes). They receive integer-encoded feature values and a context tuple.

tanat.metric.entity.type.hamming.kernels.hamming_dist_simple(a, b, context)

Returns 1.0 if a and b differ, 0.0 if they are equal. context is ignored.

Parameters:
  • a – Integer code for the first entity’s feature value.

  • b – Integer code for the second entity’s feature value.

  • context – Unused (empty tuple for this kernel).

Returns:

0.0 if a == b, 1.0 otherwise.
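The behaviour described above can be sketched in plain Python. This is only an illustration: the function name hamming_dist_simple_sketch is hypothetical, and the real kernel is Numba-compiled rather than interpreted.

```python
def hamming_dist_simple_sketch(a: int, b: int, context: tuple = ()) -> float:
    """Pure-Python stand-in for the simple kernel (hypothetical name)."""
    # context is accepted only so the signature matches the weighted kernel.
    return 0.0 if a == b else 1.0


same = hamming_dist_simple_sketch(3, 3)  # equal integer codes
diff = hamming_dist_simple_sketch(3, 7)  # different integer codes
```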

tanat.metric.entity.type.hamming.kernels.hamming_dist_weighted(a, b, context)

Looks up the pairwise cost in the cost matrix: context[0][a, b].

Parameters:
  • a – Integer code for the first entity’s feature value.

  • b – Integer code for the second entity’s feature value.

  • context – One-element tuple containing the (V × V) float32 cost matrix.

Returns:

context[0][a, b], the pre-built pairwise cost.
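A minimal sketch of the lookup, assuming a small three-value vocabulary. To stay dependency-free it uses a nested list indexed as [a][b], whereas the documented kernel indexes a 2-D float32 array as [a, b]. The function name and the matrix values are made up for illustration.

```python
def hamming_dist_weighted_sketch(a: int, b: int, context: tuple) -> float:
    """Pure-Python stand-in for the weighted kernel (hypothetical name)."""
    cost_matrix = context[0]  # the only element of the context tuple
    return cost_matrix[a][b]


# Illustrative 3 x 3 cost matrix: zero diagonal, symmetric off-diagonal costs.
cost_matrix = [
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
context = (cost_matrix,)

d01 = hamming_dist_weighted_sketch(0, 1, context)
d10 = hamming_dist_weighted_sketch(1, 0, context)
d22 = hamming_dist_weighted_sketch(2, 2, context)
```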

tanat.metric.entity.type.hamming.metric module

HammingEntityMetric: categorical feature distance by Hamming equality.

class tanat.metric.entity.type.hamming.metric.HammingEntityMetric(entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0)

Bases: EntityMetric

Categorical Hamming distance between two entities.

Returns 0.0 when both entities share the same value for the configured feature, and mismatch_cost (default 1.0) when they differ. A custom cost dict overrides mismatch_cost for specific value pairs, which allows partial costs.

Example:

hamming = HammingEntityMetric()
hamming(ent_a, ent_b)                           # 0.0 or 1.0

hamming = HammingEntityMetric(
    entity_feature="status",
    cost={("A", "B"): 0.5},
    mismatch_cost=0.8,
)
hamming(ent_a, ent_b)                           # 0.5 for A/B, 0.8 for any other mismatch, 0.0 if equal

IS_SYMMETRIC: bool = True

Set to True when dist(a, b) == dist(b, a) for all inputs. Subclasses that implement a directional distance must set this to False so that the full n² kernel is used instead.
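Why symmetry matters can be shown with a small sketch. It does not reproduce tanat's batch code; pairwise_distances and simple are hypothetical helpers that only count how many kernel calls each strategy needs.

```python
def pairwise_distances(codes, dist, symmetric):
    """Fill an n x n distance matrix and count the kernel calls."""
    n = len(codes)
    matrix = [[0.0] * n for _ in range(n)]
    calls = 0
    for i in range(n):
        # Symmetric metrics only need j >= i; directional ones need every j.
        for j in range(i if symmetric else 0, n):
            matrix[i][j] = dist(codes[i], codes[j])
            calls += 1
            if symmetric:
                matrix[j][i] = matrix[i][j]  # mirror into the lower triangle
    return matrix, calls


def simple(a, b):
    return 0.0 if a == b else 1.0


codes = [0, 1, 1, 2]
half_matrix, half_calls = pairwise_distances(codes, simple, symmetric=True)
full_matrix, full_calls = pairwise_distances(codes, simple, symmetric=False)
```

For four items the symmetric path evaluates 10 pairs instead of 16, and both paths produce the same matrix as long as the distance really is symmetric.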

NUMBA_OPTIM: bool = True

Subclasses that provide prepare_batch_data / distance_kernel / prepare_cross_batch_data set this to True to opt into the Numba fast path.

SETTINGS_CLASS

alias of HammingSettings

__init__(entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0) → None

property distance_kernel: Callable

Numba-compiled entity distance kernel (simple or weighted).

prepare_batch_data(pool: SequencePool) → tuple

Extract and encode the categorical feature for Numba batch computation.

Returns:

(arrays, lengths, context)

prepare_cross_batch_data(pool_rows, pool_cols) → tuple

Encode two pools with a shared vocabulary for cross-distance.

Returns:

(arrays_rows, lengths_rows, arrays_cols, lengths_cols, context)
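The point of a shared vocabulary is that the same integer code must mean the same categorical value in both pools. Otherwise equality checks and cost-matrix indices would not line up. The sketch below illustrates the idea on plain lists of values. encode_with_shared_vocab is a hypothetical helper, not tanat's encoder, and it does not reproduce the exact layout of the returned arrays.

```python
def encode_with_shared_vocab(rows, cols):
    """Encode two collections of value sequences with one vocabulary."""
    vocab = {}

    def encode(seq):
        # setdefault assigns the next free code the first time a value is seen.
        return [vocab.setdefault(value, len(vocab)) for value in seq]

    rows_codes = [encode(seq) for seq in rows]
    cols_codes = [encode(seq) for seq in cols]
    return rows_codes, cols_codes, vocab


rows = [["A", "B"], ["B"]]
cols = [["B", "C"]]
rows_codes, cols_codes, vocab = encode_with_shared_vocab(rows, cols)
```

Here "B" receives code 1 in both pools, so comparing codes across pools is equivalent to comparing the original values.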

validate_entity(ent_a: Entity, ent_b: Entity | None = None) → None

Verify the configured feature exists and is categorical.

class tanat.metric.entity.type.hamming.metric.HammingSettings(*, entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0)

Bases: object

Settings for HammingEntityMetric.

Parameters:
  • entity_feature – Name of the categorical feature to compare. If None, the first entity feature from the pool/entity metadata is used.

  • cost – Pairwise cost lookup. Keys are (val_a, val_b) tuples; order does not matter (both (A, B) and (B, A) are checked). Conflicting entries are rejected at construction. Default: None (every mismatch uses mismatch_cost).

  • mismatch_cost – Default cost applied when the pair is not in cost and values differ (default: 1.0).
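How cost and mismatch_cost could combine into the (V × V) matrix consumed by the weighted kernel is sketched below. This is an assumption about the construction, not the library's code: build_cost_matrix and the vocab mapping are hypothetical, and nested lists stand in for the float32 array.

```python
def build_cost_matrix(vocab, cost=None, mismatch_cost=1.0):
    """Build a symmetric V x V cost table from a sparse cost dict."""
    n = len(vocab)
    # Start from the defaults: 0.0 on the diagonal, mismatch_cost elsewhere.
    matrix = [
        [0.0 if i == j else mismatch_cost for j in range(n)]
        for i in range(n)
    ]
    for (val_a, val_b), pair_cost in (cost or {}).items():
        if val_a in vocab and val_b in vocab:
            i, j = vocab[val_a], vocab[val_b]
            # Key order does not matter, so write both directions.
            matrix[i][j] = pair_cost
            matrix[j][i] = pair_cost
    return matrix


vocab = {"A": 0, "B": 1, "C": 2}
matrix = build_cost_matrix(vocab, cost={("A", "B"): 0.5}, mismatch_cost=0.8)
```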

__init__(*args: Any, **kwargs: Any) → None

cost: dict[tuple, float] | None = None

entity_feature: str | None = None

mismatch_cost: float = 1.0

model_dump(*, mode='python', **dump_kwargs)

Dump settings to a dict via Pydantic serialization.

classmethod validate_cost_symmetry(v)

Reject cost dicts with conflicting asymmetric entries.
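A conflict here means that both (A, B) and (B, A) are present with different costs. The check can be sketched as a standalone function. check_cost_symmetry is a hypothetical name, and the exception type raised by the real validator is not stated on this page, so ValueError is an assumption.

```python
def check_cost_symmetry(cost):
    """Return cost unchanged, or raise if (a, b) and (b, a) disagree."""
    if cost is None:
        return None
    for (val_a, val_b), pair_cost in cost.items():
        reverse = cost.get((val_b, val_a))
        if reverse is not None and reverse != pair_cost:
            raise ValueError(
                f"conflicting costs for {(val_a, val_b)!r}: "
                f"{pair_cost} vs {reverse}"
            )
    return cost


consistent = check_cost_symmetry({("A", "B"): 0.5, ("B", "A"): 0.5})

try:
    check_cost_symmetry({("A", "B"): 0.5, ("B", "A"): 0.9})
    conflict_rejected = False
except ValueError:
    conflict_rejected = True
```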

Module contents

HammingEntityMetric package.

class tanat.metric.entity.type.hamming.HammingEntityMetric(entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0)

Bases: EntityMetric

Categorical Hamming distance between two entities.

Returns 0.0 when both entities share the same value for the configured feature, and mismatch_cost (default 1.0) when they differ. A custom cost dict overrides mismatch_cost for specific value pairs, which allows partial costs.

Example:

hamming = HammingEntityMetric()
hamming(ent_a, ent_b)                           # 0.0 or 1.0

hamming = HammingEntityMetric(
    entity_feature="status",
    cost={("A", "B"): 0.5},
    mismatch_cost=0.8,
)
hamming(ent_a, ent_b)                           # 0.5 for A/B, 0.8 for any other mismatch, 0.0 if equal

IS_SYMMETRIC: bool = True

Set to True when dist(a, b) == dist(b, a) for all inputs. Subclasses that implement a directional distance must set this to False so that the full n² kernel is used instead.

NUMBA_OPTIM: bool = True

Subclasses that provide prepare_batch_data / distance_kernel / prepare_cross_batch_data set this to True to opt into the Numba fast path.

SETTINGS_CLASS

alias of HammingSettings

__init__(entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0) → None

property distance_kernel: Callable

Numba-compiled entity distance kernel (simple or weighted).

prepare_batch_data(pool: SequencePool) → tuple

Extract and encode the categorical feature for Numba batch computation.

Returns:

(arrays, lengths, context)

prepare_cross_batch_data(pool_rows, pool_cols) → tuple

Encode two pools with a shared vocabulary for cross-distance.

Returns:

(arrays_rows, lengths_rows, arrays_cols, lengths_cols, context)

validate_entity(ent_a: Entity, ent_b: Entity | None = None) → None

Verify the configured feature exists and is categorical.

class tanat.metric.entity.type.hamming.HammingSettings(*, entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0)

Bases: object

Settings for HammingEntityMetric.

Parameters:
  • entity_feature – Name of the categorical feature to compare. If None, the first entity feature from the pool/entity metadata is used.

  • cost – Pairwise cost lookup. Keys are (val_a, val_b) tuples; order does not matter (both (A, B) and (B, A) are checked). Conflicting entries are rejected at construction. Default: None (every mismatch uses mismatch_cost).

  • mismatch_cost – Default cost applied when the pair is not in cost and values differ (default: 1.0).

__init__(*args: Any, **kwargs: Any) → None

cost: dict[tuple, float] | None = None

entity_feature: str | None = None

mismatch_cost: float = 1.0

model_dump(*, mode='python', **dump_kwargs)

Dump settings to a dict via Pydantic serialization.

classmethod validate_cost_symmetry(v)

Reject cost dicts with conflicting asymmetric entries.