tanat.metric.entity package

Submodules

tanat.metric.entity.base module

EntityMetric ABC: base class for all entity-level distance metrics.

class tanat.metric.entity.base.EntityMetric(settings: Any = None)

Bases: SettingsMixin, Registrable, ABC

Abstract base for entity-level distance metrics.

Computes a scalar distance between two Entity objects.

IS_SYMMETRIC: bool = True

Set to True when dist(a, b) == dist(b, a) for all inputs. Subclasses that implement a directional distance must set this to False so that the full n² kernel is used instead.
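To illustrate why the flag matters, here is a minimal sketch of a pairwise loop that evaluates only the upper triangle for a symmetric distance and every ordered pair otherwise. `pairwise_matrix` and the toy distances are hypothetical names for illustration, not part of tanat, and the symmetric branch assumes dist(x, x) == 0.

```python
from typing import Callable, List, Sequence


def pairwise_matrix(
    items: Sequence[str],
    dist: Callable[[str, str], float],
    is_symmetric: bool,
) -> List[List[float]]:
    """Fill an n x n distance matrix, halving the work when dist is symmetric."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    if is_symmetric:
        # Upper triangle only: n * (n - 1) / 2 evaluations, mirrored below.
        # Assumption: dist(x, x) == 0, so the diagonal is left at 0.0.
        for i in range(n):
            for j in range(i + 1, n):
                matrix[i][j] = matrix[j][i] = dist(items[i], items[j])
    else:
        # Directional distance: all n * n ordered pairs are evaluated.
        for i in range(n):
            for j in range(n):
                matrix[i][j] = dist(items[i], items[j])
    return matrix


# Symmetric toy distance: 0.0 on equal labels, 1.0 otherwise.
sym = pairwise_matrix(["A", "B", "A"], lambda a, b: 0.0 if a == b else 1.0, True)

# Directional toy distance: how many characters b has beyond a.
growth = pairwise_matrix(
    ["x", "xyz"], lambda a, b: float(max(0, len(b) - len(a))), False
)
```

With four items the symmetric branch calls `dist` 6 times and the full branch 16 times, which is the saving a symmetric metric can claim.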

NUMBA_OPTIM: bool = False

Subclasses that provide prepare_batch_data / distance_kernel / prepare_cross_batch_data set this to True to opt into the Numba fast path.

abstractmethod validate_entity(ent_a: Entity, ent_b: Entity | None = None) → None

Validate one or two entities against this metric’s requirements.

Called from __call__() and from validate_composition().

Implementations should call _validate_entity_instance() first for the type check, then add metric-specific checks.

Parameters:
  • ent_a – Primary entity.

  • ent_b – Optional second entity (None → probe single entity only).

Raises:
  • TypeError – Wrong argument type or incompatible feature dtype.

  • KeyError – Required feature absent from the entity.
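The contract above (type check first, then metric-specific checks, raising TypeError or KeyError) can be sketched without tanat as follows. `ToyEntity`, its `features` dict, and the `feature` argument are hypothetical stand-ins; the real Entity API and the internals of _validate_entity_instance() are not shown in this reference.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ToyEntity:
    """Hypothetical stand-in for Entity: a bag of named feature values."""

    features: Dict[str, Any] = field(default_factory=dict)


def validate_entity(
    ent_a: Any, ent_b: Optional[Any] = None, feature: str = "status"
) -> None:
    """Type check first, then metric-specific checks, for one or two entities."""
    # ent_b=None means "probe a single entity only".
    entities = [ent_a] if ent_b is None else [ent_a, ent_b]
    for ent in entities:
        # Step 1: the type check (the role _validate_entity_instance() plays).
        if not isinstance(ent, ToyEntity):
            raise TypeError(f"expected ToyEntity, got {type(ent).__name__}")
        # Step 2: metric-specific checks on the required feature.
        if feature not in ent.features:
            raise KeyError(feature)
        if not isinstance(ent.features[feature], str):
            raise TypeError(f"feature {feature!r} must be categorical (str)")


# Passes silently: both entities carry a string-valued "status" feature.
validate_entity(ToyEntity({"status": "A"}), ToyEntity({"status": "B"}))
```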

Module contents

Entity metric sub-package.

class tanat.metric.entity.EntityMetric(settings: Any = None)

Bases: SettingsMixin, Registrable, ABC

Documented above as tanat.metric.entity.base.EntityMetric; the attributes and the validate_entity() contract are identical.

class tanat.metric.entity.HammingEntityMetric(entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0)

Bases: EntityMetric

Categorical Hamming distance between two entities.

Returns 0.0 when both entities share the same value for the configured feature, and mismatch_cost (default 1.0) when they differ. A custom cost dict enables partial costs.

Example:

from tanat.metric.entity import HammingEntityMetric

# ent_a and ent_b are existing Entity objects.
hamming = HammingEntityMetric()
hamming(ent_a, ent_b)  # 0.0 or 1.0

hamming = HammingEntityMetric(
    entity_feature="status",
    cost={("A", "B"): 0.5},
    mismatch_cost=0.8,
)
hamming(ent_a, ent_b)  # 0.5 for an A/B pair, 0.8 for any other mismatch

IS_SYMMETRIC: bool = True

Set to True when dist(a, b) == dist(b, a) for all inputs. Subclasses that implement a directional distance must set this to False so that the full n² kernel is used instead.

NUMBA_OPTIM: bool = True

Subclasses that provide prepare_batch_data / distance_kernel / prepare_cross_batch_data set this to True to opt into the Numba fast path.

SETTINGS_CLASS

alias of HammingSettings

__init__(entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0) → None

property distance_kernel: Callable

Numba-compiled entity distance kernel (simple or weighted).

prepare_batch_data(pool: SequencePool) → tuple

Extract and encode the categorical feature for Numba batch computation.

Returns:

(arrays, lengths, context)
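The exact layout of the returned tuple is not specified here. As one plausible reading, the sketch below encodes categorical values as integer codes, pads each sequence to a common width, and returns the vocabulary as context. `encode_pool`, the list-of-lists input, the `-1` padding value, and the `"vocab"` key are all assumptions for illustration; the real method takes a SequencePool.

```python
from typing import Dict, List, Tuple


def encode_pool(
    sequences: List[List[str]], pad: int = -1
) -> Tuple[List[List[int]], List[int], Dict[str, Dict[str, int]]]:
    """Map categorical values to integer codes and pad to a rectangular layout."""
    # Assign codes in order of first appearance.
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for value in seq:
            vocab.setdefault(value, len(vocab))

    # True lengths let a kernel ignore the padded tail of each row.
    lengths = [len(seq) for seq in sequences]
    width = max(lengths, default=0)
    arrays = [
        [vocab[value] for value in seq] + [pad] * (width - len(seq))
        for seq in sequences
    ]
    return arrays, lengths, {"vocab": vocab}


arrays, lengths, context = encode_pool([["A", "B", "A"], ["B"]])
```

Integer codes are used here because compiled kernels generally work on numeric arrays rather than Python strings.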

prepare_cross_batch_data(pool_rows, pool_cols) → tuple

Encode two pools with a shared vocabulary for cross-distance.

Returns:

(arrays_rows, lengths_rows, arrays_cols, lengths_cols, context)
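A shared vocabulary matters because codes assigned per pool are not comparable across pools. The sketch below, which uses hypothetical names (`encode_two_pools`, the `"vocab"` key) and plain lists instead of SequencePool objects, builds one vocabulary over both pools before encoding either.

```python
from typing import Dict, List, Tuple


def encode_two_pools(
    rows: List[List[str]], cols: List[List[str]]
) -> Tuple[List[List[int]], List[int], List[List[int]], List[int], dict]:
    """Encode two pools with one vocabulary so equal codes mean equal values."""
    vocab: Dict[str, int] = {}
    for seq in rows + cols:
        for value in seq:
            vocab.setdefault(value, len(vocab))

    def encode(pool: List[List[str]]) -> List[List[int]]:
        return [[vocab[value] for value in seq] for seq in pool]

    return (
        encode(rows),
        [len(seq) for seq in rows],
        encode(cols),
        [len(seq) for seq in cols],
        {"vocab": vocab},
    )


# Encoded independently, both pools would become [[0, 1]] and look identical.
rows_codes, rows_len, cols_codes, cols_len, ctx = encode_two_pools(
    [["A", "B"]], [["B", "A"]]
)
```

With the shared vocabulary the two pools encode to different code sequences, so a code-wise comparison reports the mismatches that are really there.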

validate_entity(ent_a: Entity, ent_b: Entity | None = None) → None

Verify the configured feature exists and is categorical.

class tanat.metric.entity.HammingSettings(*, entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0)

Bases: object

Settings for HammingEntityMetric.

Parameters:
  • entity_feature – Name of the categorical feature to compare. If None, the first entity feature from the pool/entity metadata is used.

  • cost – Pairwise cost lookup. Keys are (val_a, val_b) tuples; order does not matter (both (A, B) and (B, A) are checked). Conflicting entries are rejected at construction. Default: None (every mismatch uses mismatch_cost).

  • mismatch_cost – Default cost applied when the pair is not in cost and values differ (default: 1.0).
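The lookup rules described by these parameters can be sketched in plain Python. `hamming_cost` is a hypothetical helper written for illustration; it mirrors the documented semantics (0.0 on a match, order-insensitive lookup in cost, mismatch_cost as the fallback) and is not the library's kernel.

```python
from typing import Dict, Hashable, Optional, Tuple


def hamming_cost(
    val_a: Hashable,
    val_b: Hashable,
    cost: Optional[Dict[Tuple[Hashable, Hashable], float]] = None,
    mismatch_cost: float = 1.0,
) -> float:
    """Return the categorical Hamming cost between two feature values."""
    # Identical values always cost nothing.
    if val_a == val_b:
        return 0.0
    if cost is not None:
        # Key order does not matter: try (a, b), then (b, a).
        if (val_a, val_b) in cost:
            return cost[(val_a, val_b)]
        if (val_b, val_a) in cost:
            return cost[(val_b, val_a)]
    # Pair not listed: fall back to the default mismatch cost.
    return mismatch_cost


partial = {("A", "B"): 0.5}
hamming_cost("B", "A", cost=partial, mismatch_cost=0.8)  # 0.5, found as (A, B)
hamming_cost("A", "C", cost=partial, mismatch_cost=0.8)  # 0.8, pair not listed
```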

__init__(*args: Any, **kwargs: Any) → None

cost: dict[tuple, float] | None = None

entity_feature: str | None = None

mismatch_cost: float = 1.0

model_dump(*, mode='python', **dump_kwargs)

Dump settings to a dict via Pydantic serialization.

classmethod validate_cost_symmetry(v)

Reject cost dicts with conflicting asymmetric entries.
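A conflicting entry is one where (A, B) and (B, A) are both present with different values. The following sketch shows one way such a check might look; `check_cost_symmetry` is a hypothetical name, and raising ValueError is an assumption, since the exception type is not stated in this reference.

```python
from typing import Dict, Hashable, Optional, Tuple

CostDict = Dict[Tuple[Hashable, Hashable], float]


def check_cost_symmetry(cost: Optional[CostDict]) -> Optional[CostDict]:
    """Return cost unchanged, or raise if mirrored keys disagree."""
    if cost is None:
        return None
    for (val_a, val_b), value in cost.items():
        mirrored = cost.get((val_b, val_a))
        # A mirrored key is fine only when it carries the same value.
        if mirrored is not None and mirrored != value:
            raise ValueError(
                f"conflicting costs for ({val_a!r}, {val_b!r}): "
                f"{value} vs {mirrored}"
            )
    return cost


# Redundant but consistent entries are accepted.
check_cost_symmetry({("A", "B"): 0.5, ("B", "A"): 0.5})
```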