tanat.metric.entity package

Subpackages

tanat.metric.entity.type package
- Subpackages
- Module contents

Submodules

tanat.metric.entity.base module

EntityMetric ABC: base class for all entity-level distance metrics.

class tanat.metric.entity.base.EntityMetric(settings: Any = None)

Bases: SettingsMixin, Registrable, ABC

Abstract base for entity-level distance metrics.

Computes a scalar distance between two Entity objects.

IS_SYMMETRIC: bool = True

Set to True when dist(a, b) == dist(b, a) for all inputs. Subclasses that implement a directional distance must set this to False so that the full n² kernel is used instead.
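To illustrate why the flag matters, here is a minimal pure-Python sketch. It is not tanat's kernel: the helper name `pairwise_matrix` and the use of plain values and callables instead of Entity objects are assumptions. With a symmetric distance, only the pairs with i ≤ j are evaluated (n(n+1)/2 calls) and mirrored; a directional distance needs all n² ordered pairs.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def pairwise_matrix(
    items: Sequence[T],
    dist: Callable[[T, T], float],
    is_symmetric: bool = True,
) -> list[list[float]]:
    """Hypothetical helper: full distance matrix, exploiting symmetry when allowed."""
    n = len(items)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Symmetric: upper triangle including the diagonal; otherwise every j.
        start = i if is_symmetric else 0
        for j in range(start, n):
            d = dist(items[i], items[j])
            out[i][j] = d
            if is_symmetric:
                out[j][i] = d  # mirror, valid only because dist(a, b) == dist(b, a)
    return out
```

For three items, the symmetric path makes 6 distance calls instead of 9; the saving approaches one half as n grows.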

NUMBA_OPTIM: bool = False

Subclasses that provide prepare_batch_data / distance_kernel / prepare_cross_batch_data set this to True to opt into the Numba fast path.

abstractmethod validate_entity(ent_a: Entity, ent_b: Entity | None = None) → None

Validate one or two entities against this metric’s requirements.

Called from __call__() and from validate_composition().

Implementations should call _validate_entity_instance() first for the type check, then add metric-specific checks.

Parameters:

ent_a – Primary entity.
ent_b – Optional second entity (None → probe single entity only).

Raises:

TypeError – Wrong argument type or incompatible feature dtype.
KeyError – Required feature absent from the entity.
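The validate_entity() contract above can be illustrated with a self-contained sketch. It does not use tanat: `ToyEntity` is a hypothetical stand-in for Entity (a plain feature mapping), `validate_numeric_feature` is a hypothetical function, and the numeric check is only one example of a metric-specific requirement. The order follows the documented pattern: a type check first, then a KeyError for a missing feature and a TypeError for an incompatible dtype.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToyEntity:
    """Hypothetical stand-in for tanat's Entity: a plain feature mapping."""
    features: dict[str, Any] = field(default_factory=dict)


def validate_numeric_feature(
    ent_a: Any, ent_b: Optional[Any] = None, feature: str = "value"
) -> None:
    """Sketch of the validate_entity() contract for a numerical feature."""
    # ent_b omitted (None): probe the single primary entity only.
    entities = [ent_a] if ent_b is None else [ent_a, ent_b]
    for ent in entities:
        # 1. Generic type check (the role of _validate_entity_instance()).
        if not isinstance(ent, ToyEntity):
            raise TypeError(f"expected ToyEntity, got {type(ent).__name__}")
        # 2. Metric-specific checks: feature present, then dtype compatible.
        if feature not in ent.features:
            raise KeyError(feature)
        value = ent.features[feature]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"feature {feature!r} is not numerical")
```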

class tanat.metric.entity.base.EntityMetricSettings

Bases: ABC

Abstract base for the settings of an entity-level metric.

__init__(*args: Any, **kwargs: Any) → None

model_dump(*, mode='python', **dump_kwargs)

Dump settings to a dict via Pydantic serialization.

Module contents

Entity metric sub-package.

class tanat.metric.entity.CombinedEntityMetric(metrics_config: list[dict[str, Any]], weights: list | None = None, agg: str = 'sum')

Bases: EntityMetric

Entity metric that aggregates several other entity metrics. The combined metrics may compare different entity features, or compare the same feature in different ways.

Parameters:

metrics_config – List of metric configurations (dictionaries). The example below shows how to build them from existing metric classes with to_config().
agg – Name of the aggregation function (default: ‘sum’). Only ‘sum’ is currently implemented.
weights – Optional list of weights for the aggregation function (default: None). If given, it must contain one real value per entry in metrics_config.

Example:

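from tanat.metric.entity import CombinedEntityMetric, HammingEntityMetric, L2EntityMetric

# ent_a, ent_b: two Entity objects to compare
metric = CombinedEntityMetric(
    metrics_config=[
        L2EntityMetric(entity_feature="value").to_config(),
        HammingEntityMetric(entity_feature="status").to_config(),
    ],
    weights=[0.7, 0.3],
)

metric(ent_a, ent_b)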

Warning:

A CombinedEntityMetric cannot be used as one of the metrics it combines. This prevents recursive metric definitions, which may be problematic.
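A minimal sketch of the documented ‘sum’ aggregation, written over plain callables rather than tanat metric objects. The helper name `combine_sum` is hypothetical, and two behaviors are assumptions not stated above: that weights=None counts every metric with weight 1.0, and that a weight-count mismatch raises ValueError.

```python
from typing import Any, Callable, Optional, Sequence

Metric = Callable[[Any, Any], float]


def combine_sum(
    metrics: Sequence[Metric],
    weights: Optional[Sequence[float]] = None,
) -> Metric:
    """Hypothetical weighted-sum combination of entity distance functions."""
    if weights is None:
        # Assumption: no weights means every metric counts once.
        weights = [1.0] * len(metrics)
    if len(weights) != len(metrics):
        raise ValueError("weights must contain one value per metric")

    def combined(ent_a: Any, ent_b: Any) -> float:
        # 'sum' aggregation: weighted sum of the individual distances.
        return sum(w * m(ent_a, ent_b) for w, m in zip(weights, metrics))

    return combined
```

With dict-based toy entities, a squared difference on "value" and a 0/1 mismatch on "status" combined with weights [0.7, 0.3] give 0.7 · 4.0 + 0.3 · 1.0 for values 1.0 vs 3.0 and statuses "A" vs "B".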

NUMBA_OPTIM: bool = False

Subclasses that provide prepare_batch_data / distance_kernel / prepare_cross_batch_data set this to True to opt into the Numba fast path.

SETTINGS_CLASS

Alias of CombinedEntityMetricSettings.

__init__(metrics_config: list[dict[str, Any]], weights: list | None = None, agg: str = 'sum') → None

validate_entity(ent_a: Entity, ent_b: Entity | None = None) → None

Validate the entities against the requirements of each underlying metric (for example, that each configured feature exists).

class tanat.metric.entity.CombinedEntityMetricSettings(*, metrics_config: list = <factory>, weights: list | None = None, agg: str = 'sum')

Bases: EntityMetricSettings

Settings for CombinedEntityMetric: the metrics to combine, their weights, and how to aggregate them.

Parameters:

metrics_config – List of metric configurations (dictionaries). The example below shows how to build them from existing metric classes with to_config().
agg – Name of the aggregation function (default: ‘sum’). Only ‘sum’ is currently implemented.
weights – Optional list of weights for the aggregation function (default: None). If given, it must contain one real value per entry in metrics_config.

Example:

from tanat.metric.entity import CombinedEntityMetricSettings, HammingEntityMetric, L2EntityMetric

settings = CombinedEntityMetricSettings(
    metrics_config=[
        L2EntityMetric(entity_feature="value").to_config(),
        HammingEntityMetric(entity_feature="status").to_config(),
    ],
    weights=[0.7, 0.3],
)

Warning:

A CombinedEntityMetric cannot be used as one of the metrics it combines. This prevents recursive metric definitions, which may be problematic.

__init__(*args: Any, **kwargs: Any) → None

agg: str = 'sum'

metrics_config: list

model_dump(*, mode='python', **dump_kwargs)

Dump settings to a dict via Pydantic serialization.

weights: list | None = None
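The constraints listed above (only ‘sum’ is supported, one weight per metric) can be sketched as a construction-time check. This is an illustration only: the real settings class relies on Pydantic, whereas `ToyCombinedSettings` is a hypothetical stdlib dataclass, its ValueError messages are assumptions, and its simplified model_dump() takes no mode argument. The configuration dicts in the test are placeholders, not real to_config() output.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional

SUPPORTED_AGGS = {"sum"}  # only 'sum' is implemented, per the docs above


@dataclass
class ToyCombinedSettings:
    """Hypothetical stand-in for CombinedEntityMetricSettings (stdlib, not Pydantic)."""
    metrics_config: list = field(default_factory=list)
    weights: Optional[list] = None
    agg: str = "sum"

    def __post_init__(self) -> None:
        # Validate at construction time, like a settings model would.
        if self.agg not in SUPPORTED_AGGS:
            raise ValueError(f"unsupported aggregation: {self.agg!r}")
        if self.weights is not None and len(self.weights) != len(self.metrics_config):
            raise ValueError("weights must have one entry per metric config")

    def model_dump(self) -> dict[str, Any]:
        # Stdlib analogue of the Pydantic-based model_dump().
        return asdict(self)
```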

class tanat.metric.entity.EntityMetric(settings: Any = None)

Same class as tanat.metric.entity.base.EntityMetric, re-exported at package level; see tanat.metric.entity.base above for its bases, the IS_SYMMETRIC and NUMBA_OPTIM attributes, and the validate_entity() contract.

class tanat.metric.entity.HammingEntityMetric(entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0)

Bases: EntityMetric

Categorical Hamming distance between two entities.

Returns 0.0 when both entities share the same value for the configured feature, and mismatch_cost (default 1.0) when they differ. A custom cost dict assigns specific, possibly partial, costs to given pairs of values.

Example:

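from tanat.metric.entity import HammingEntityMetric

# ent_a, ent_b: two Entity objects to compare
hamming = HammingEntityMetric()
hamming(ent_a, ent_b)  # 0.0 if equal, else 1.0

hamming = HammingEntityMetric(
    entity_feature="status",
    cost={("A", "B"): 0.5},
    mismatch_cost=0.8,
)
hamming(ent_a, ent_b)  # 0.0 if equal; 0.5 for "A" vs "B" (either order); 0.8 otherwise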
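The lookup rule described here and in HammingSettings (equal values cost 0.0, a listed pair uses its cost in either order, any other mismatch uses mismatch_cost) can be sketched on raw values. The function name `hamming_cost` is hypothetical, and returning 0.0 for equal values even when the cost dict happens to contain an (a, a) entry is an assumption.

```python
from typing import Hashable, Optional


def hamming_cost(
    a: Hashable,
    b: Hashable,
    cost: Optional[dict[tuple, float]] = None,
    mismatch_cost: float = 1.0,
) -> float:
    """Hypothetical sketch of the documented Hamming cost lookup."""
    if a == b:
        return 0.0  # assumption: equal values always cost nothing
    if cost:
        # Keys are order-insensitive: try (a, b), then (b, a).
        for key in ((a, b), (b, a)):
            if key in cost:
                return cost[key]
    return mismatch_cost  # pair not listed: default mismatch cost
```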

IS_SYMMETRIC: bool = True

Set to True when dist(a, b) == dist(b, a) for all inputs. Subclasses that implement a directional distance must set this to False so that the full n² kernel is used instead.

NUMBA_OPTIM: bool = True

Subclasses that provide prepare_batch_data / distance_kernel / prepare_cross_batch_data set this to True to opt into the Numba fast path.

SETTINGS_CLASS

Alias of HammingSettings.

__init__(entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0) → None

property distance_kernel: Callable

Numba-compiled entity distance kernel (simple or weighted).

prepare_batch_data(pool: SequencePool) → tuple

Extract and encode the categorical feature for Numba batch computation.

Returns: (arrays, lengths, context)

prepare_cross_batch_data(pool_rows, pool_cols) → tuple

Encode two pools with a shared vocabulary for cross-distance.

Returns: (arrays_rows, lengths_rows, arrays_cols, lengths_cols, context)
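The shared-vocabulary idea behind prepare_cross_batch_data() can be sketched in plain Python: the categorical values of both pools are mapped to integer codes from one vocabulary, so a kernel can compare integers instead of strings. The function name `encode_shared`, the list-of-lists input (one inner list per sequence), and the flat (codes, lengths) layout are assumptions; the vocabulary returned in last position only stands in for the real context object, whose contents are not documented here.

```python
from typing import Hashable, Sequence


def encode_shared(
    rows: Sequence[Sequence[Hashable]],
    cols: Sequence[Sequence[Hashable]],
) -> tuple[list[int], list[int], list[int], list[int], dict]:
    """Hypothetical encoder: one vocabulary shared by both pools."""
    vocab: dict = {}
    # Build the vocabulary over both pools, in first-seen order.
    for seq in list(rows) + list(cols):
        for value in seq:
            vocab.setdefault(value, len(vocab))

    def flatten(pool):
        # Concatenated codes plus per-sequence lengths to split them again.
        codes = [vocab[v] for seq in pool for v in seq]
        lengths = [len(seq) for seq in pool]
        return codes, lengths

    codes_r, len_r = flatten(rows)
    codes_c, len_c = flatten(cols)
    return codes_r, len_r, codes_c, len_c, vocab
```

Encoding each pool with its own vocabulary could give the same category different codes in the two pools, which would break equality tests in a cross-distance kernel.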

validate_entity(ent_a: Entity, ent_b: Entity | None = None) → None

Verify the configured feature exists and is categorical.

class tanat.metric.entity.HammingSettings(*, entity_feature: str | None = None, cost: dict[tuple, float] | None = None, mismatch_cost: float = 1.0)

Bases: EntityMetricSettings

Settings for HammingEntityMetric.

Parameters:

entity_feature – Name of the categorical feature to compare. If None, the first categorical entity feature from the pool/entity metadata is used.
cost – Pairwise cost lookup. Keys are (val_a, val_b) tuples; order does not matter (both (A, B) and (B, A) are checked). Conflicting entries are rejected at construction. Default: None (every mismatch uses mismatch_cost).
mismatch_cost – Default cost applied when the values differ and the pair is not in cost (default: 1.0).

__init__(*args: Any, **kwargs: Any) → None

cost: dict[tuple, float] | None = None

entity_feature: str | None = None

mismatch_cost: float = 1.0

model_dump(*, mode='python', **dump_kwargs)

Dump settings to a dict via Pydantic serialization.

classmethod validate_cost_symmetry(v)

Reject cost dicts with conflicting asymmetric entries.
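The symmetry rule can be sketched as a plain function: a dict that maps (a, b) and (b, a) to different costs is rejected, while a consistent duplicate or a one-sided entry is accepted. The real check is a Pydantic classmethod validator; `check_cost_symmetry` is a hypothetical name, and raising ValueError (the usual way for a Pydantic validator to signal failure) is an assumption about the error type.

```python
from typing import Optional


def check_cost_symmetry(
    cost: Optional[dict[tuple, float]],
) -> Optional[dict[tuple, float]]:
    """Hypothetical sketch: reject (a, b) / (b, a) entries with different costs."""
    if cost is None:
        return None
    for (a, b), value in cost.items():
        reverse = cost.get((b, a))
        if reverse is not None and reverse != value:
            raise ValueError(
                f"conflicting costs for {(a, b)} and {(b, a)}: {value} vs {reverse}"
            )
    return cost  # validators return the (unchanged) value on success
```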

class tanat.metric.entity.L2EntityMetric(entity_feature: str | None = None, nan_cost: float = 1.0, normalize: bool = True)

Bases: EntityMetric

Numerical distance between two entities, evaluated as the squared difference of their feature values (no square root is applied).

Returns the squared difference of the feature values when both are defined (normalized by default, see below), and nan_cost when at least one of them is NaN.

If no entity_feature is given when the metric is created, the first numerical attribute found the first time the metric is applied to an entity is used. That feature name is then frozen for subsequent calls. If the entity has no numerical attribute, an error is raised.
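The resolve-once-then-freeze behavior can be sketched with a small helper over a feature mapping. `LazyFeature` is hypothetical, "first" is taken to mean mapping order, booleans are assumed not to count as numerical, and the ValueError type is an assumption (the documentation only says that an error is raised).

```python
from typing import Any, Mapping, Optional


class LazyFeature:
    """Hypothetical helper: resolve the feature name once, then freeze it."""

    def __init__(self, feature: Optional[str] = None) -> None:
        self.feature = feature

    def resolve(self, features: Mapping[str, Any]) -> str:
        if self.feature is None:
            for name, value in features.items():
                # First numerical attribute, in mapping order.
                if isinstance(value, (int, float)) and not isinstance(value, bool):
                    self.feature = name  # frozen for all later calls
                    break
            else:
                raise ValueError("no numerical feature found")
        return self.feature
```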

Example:

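from tanat.metric.entity import L2EntityMetric

# ent_a, ent_b: two Entity objects with a numerical "value" feature
metric = L2EntityMetric(
    entity_feature="value",
    nan_cost=0.8,
)
metric(ent_a, ent_b)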
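A minimal sketch of what such a call evaluates for two scalar feature values, following the description above and the normalized form (f_1-f_2)^2/(f_1^2+f_2^2) given in the docstring. The function name `l2_entity_distance` is hypothetical, and the handling of two zero values (0/0) is not documented: this sketch assumes 0.0.

```python
import math


def l2_entity_distance(
    f1: float, f2: float, nan_cost: float = 1.0, normalize: bool = True
) -> float:
    """Hypothetical sketch of the L2 entity distance (squared, no square root)."""
    if math.isnan(f1) or math.isnan(f2):
        return nan_cost  # at least one value undefined
    diff2 = (f1 - f2) ** 2
    if not normalize:
        return diff2
    denom = f1 ** 2 + f2 ** 2
    # Assumption: both values zero (0/0) is treated as distance 0.0.
    return diff2 / denom if denom > 0 else 0.0
```

For example, 3.0 vs 1.0 gives 4.0 unnormalized and 4 / 10 = 0.4 normalized, while 1.0 vs -1.0 gives 2.0 normalized.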

The normalized version of the L2 metric evaluates $\frac{(f_1-f_2)^2}{f_1^2+f_2^2}$, which is 0 when $f_1 = f_2$ and 1 when exactly one of the two values is zero. It lies between 0 and 1 whenever $f_1$ and $f_2$ have the same sign, but can reach 2 when $f_1 = -f_2$.
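The range of the normalized quantity follows from rewriting it and bounding the cross term, for $(f_1, f_2) \neq (0, 0)$:

```latex
\frac{(f_1-f_2)^2}{f_1^2+f_2^2}
  = \frac{f_1^2+f_2^2-2f_1f_2}{f_1^2+f_2^2}
  = 1 - \frac{2f_1f_2}{f_1^2+f_2^2},
\qquad
|2f_1f_2| \le f_1^2+f_2^2 \quad\text{since}\quad (|f_1|-|f_2|)^2 \ge 0 .
```

Hence the value lies in $[0, 2]$: it is 0 exactly when $f_1 = f_2$, 1 exactly when $f_1 f_2 = 0$, at most 1 exactly when $f_1 f_2 \ge 0$, and 2 exactly when $f_1 = -f_2$.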

NUMBA_OPTIM: bool = False

Subclasses that provide prepare_batch_data / distance_kernel / prepare_cross_batch_data set this to True to opt into the Numba fast path.

SETTINGS_CLASS

Alias of L2Settings.

__init__(entity_feature: str | None = None, nan_cost: float = 1.0, normalize: bool = True) → None

validate_entity(ent_a: Entity, ent_b: Entity | None = None) → None

Verify the configured feature exists and is numerical.

class tanat.metric.entity.L2Settings(*, entity_feature: str | None = None, nan_cost: float = 1.0, normalize: bool = True)

Bases: EntityMetricSettings

Settings for L2EntityMetric.

Parameters:

entity_feature – Name of the numerical feature to compare. If None, the first numerical entity feature found in the pool/entity metadata is used.
nan_cost – Cost applied when at least one value is NaN (default: 1.0).
normalize – If True (default), return the normalized squared difference $\frac{(f_1-f_2)^2}{f_1^2+f_2^2}$ described in L2EntityMetric instead of the raw squared difference.

__init__(*args: Any, **kwargs: Any) → None

entity_feature: str | None = None

model_dump(*, mode='python', **dump_kwargs)

Dump settings to a dict via Pydantic serialization.

nan_cost: float = 1.0

normalize: bool = True