tanat.clustering package
Subpackages
- tanat.clustering.type package
  - Submodules
    - tanat.clustering.type.clara module
    - tanat.clustering.type.hierarchical module
    - tanat.clustering.type.pam module
  - Module contents
Submodules
tanat.clustering.base module
Clusterer ABC: base class for all clustering algorithms.
- class tanat.clustering.base.Clusterer(settings=None)
Bases: SettingsMixin, Registrable, DisplayMixin, ABC
Abstract base class for all clustering algorithms.
- fit(pool: SequencePool | TrajectoryPool) → Self
Fit the clustering model to pool.
- Returns:
self, for method chaining.
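The contract above (an abstract base whose fit() returns self) can be illustrated with a minimal sketch. SketchClusterer, ParityClusterer, _assign and labels_ are hypothetical names invented for this example; they are not part of tanat, and the real Clusterer may be organised differently.

```python
from abc import ABC, abstractmethod


class SketchClusterer(ABC):
    """Hypothetical stand-in for a clusterer base class (not tanat's code)."""

    def __init__(self):
        self.labels_ = None

    def fit(self, items):
        # Delegate the actual work to the subclass, then return self
        # so that calls can be chained.
        self.labels_ = self._assign(items)
        return self

    @abstractmethod
    def _assign(self, items):
        """Return one integer cluster label per item."""


class ParityClusterer(SketchClusterer):
    """Toy subclass: groups integers by parity."""

    def _assign(self, items):
        return [item % 2 for item in items]


# Chaining works because fit() returns the fitted object itself.
labels = ParityClusterer().fit([1, 2, 3, 4]).labels_
print(labels)
```

Returning self keeps construction, fitting and attribute access on one line, which is the usage the Returns note describes.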
Module contents
Clustering module.
- class tanat.clustering.CLARAClusterer(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', sampling_ratio: float = 0.1, nb_pam_instances: int = 5, n_clusters: int = 2, max_iter: int = 50, random_state: int | None = None, cluster_column: str = '__CLARA_CLUSTERS__')
Bases: MedoidMixin, Clusterer
CLARA (Clustering Large Applications), sampling-based PAM.
Runs PAM on nb_pam_instances random sub-samples of the pool, evaluates each result on the full pool, and keeps the best medoids.
Example:
    from tanat.clustering import CLARAClusterer

    # pool is an existing SequencePool or TrajectoryPool
    clara = CLARAClusterer(
        metric="linearpairwise",
        n_clusters=5,
        sampling_ratio=0.1,
        nb_pam_instances=5,
        random_state=42,
    )
    clara.fit(pool)
    clara.medoids  # best medoids across all PAM instances
- SETTINGS_CLASS
alias of CLARASettings
- __init__(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', sampling_ratio: float = 0.1, nb_pam_instances: int = 5, n_clusters: int = 2, max_iter: int = 50, random_state: int | None = None, cluster_column: str = '__CLARA_CLUSTERS__') → None
Initialise with the given settings.
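The procedure described for CLARAClusterer (run PAM on random sub-samples, score each result on the full pool, keep the best medoids) can be sketched in plain Python. This is an illustration under assumptions, not tanat's implementation: it works on a precomputed distance matrix given as a list of lists, and the helper names cost, pam and clara, as well as the max(k, ceil(...)) sample-size rule, are made up for the example.

```python
import math
import random


def cost(dist, medoids, items):
    """Sum of distances from each item to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in items)


def pam(dist, items, k, max_iter=50):
    """Compact PAM restricted to the given item indices."""
    medoids = []
    for _ in range(k):  # BUILD: add the item that lowers the cost most
        candidates = [i for i in items if i not in medoids]
        medoids.append(
            min(candidates, key=lambda c: cost(dist, medoids + [c], items))
        )
    for _ in range(max_iter):  # SWAP: apply the best improving swap, if any
        best, best_cost = None, cost(dist, medoids, items)
        for pos in range(k):
            for c in items:
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                trial_cost = cost(dist, trial, items)
                if trial_cost < best_cost:
                    best, best_cost = trial, trial_cost
        if best is None:
            break
        medoids = best
    return medoids


def clara(dist, k, sampling_ratio=0.1, nb_pam_instances=5, random_state=None):
    """Return (sorted best medoids, their cost on the full pool)."""
    n = len(dist)
    rng = random.Random(random_state)
    # Assumed rule: never sample fewer items than there are medoids.
    sample_size = max(k, math.ceil(sampling_ratio * n))
    everything = range(n)
    best_medoids, best_cost = None, math.inf
    for _ in range(nb_pam_instances):
        sample = rng.sample(everything, sample_size)
        medoids = pam(dist, sample, k)
        full_cost = cost(dist, medoids, everything)  # score on the full pool
        if full_cost < best_cost:
            best_medoids, best_cost = medoids, full_cost
    return sorted(best_medoids), best_cost


points = [0, 1, 2, 3, 4, 100, 101, 102, 103, 104]
dist = [[abs(a - b) for b in points] for a in points]
print(clara(dist, k=2, sampling_ratio=0.5, random_state=42))
```

Each PAM run only sees sample_size items, so its cost grows with the sample rather than with the pool; only the final scoring pass touches every item.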
- class tanat.clustering.CLARASettings(*, metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', sampling_ratio: float = 0.1, nb_pam_instances: int = 5, n_clusters: int = 2, max_iter: int = 50, random_state: int | None = None, cluster_column: str = '__CLARA_CLUSTERS__')
Bases: object
Settings for CLARAClusterer.
- Parameters:
metric – Metric name or instance. Default: "linearpairwise".
sampling_ratio – Fraction of the pool used per PAM instance. Must be in (0, 1]. Default: 0.1.
nb_pam_instances – Number of PAM runs on independent random samples. Default: 5.
n_clusters – Number of medoids per PAM run. Default: 2.
max_iter – Maximum SWAP iterations per PAM run. Default: 50.
random_state – Seed for the random number generator. None → non-reproducible.
cluster_column – Static-feature column name injected after fit().
- metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise'
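The constraints listed for CLARASettings can be pictured as a validated settings object. The sketch below is hypothetical: SketchCLARASettings is not tanat's class, the use of a standard-library dataclass is an assumption, and the positivity checks on the integer fields are assumed by analogy (only the (0, 1] range for sampling_ratio is stated above).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchCLARASettings:
    """Hypothetical settings holder mirroring the documented defaults."""

    metric: str = "linearpairwise"
    sampling_ratio: float = 0.1
    nb_pam_instances: int = 5
    n_clusters: int = 2
    max_iter: int = 50
    random_state: Optional[int] = None
    cluster_column: str = "__CLARA_CLUSTERS__"

    def __post_init__(self):
        # Documented constraint: sampling_ratio must lie in (0, 1].
        if not 0 < self.sampling_ratio <= 1:
            raise ValueError("sampling_ratio must be in (0, 1]")
        # Assumed constraints: counts must be strictly positive.
        for name in ("nb_pam_instances", "n_clusters", "max_iter"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be > 0")


settings = SketchCLARASettings(n_clusters=5, random_state=42)
print(settings.sampling_ratio, settings.n_clusters)
```

Validating at construction time means an invalid ratio is rejected before any sampling or distance computation starts.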
- class tanat.clustering.Cluster(cluster_id: int, items: list)
Bases: object
A cluster of items produced by a Clusterer.
Immutable value object: populated once after fit() and never mutated.
- Parameters:
cluster_id – Integer identifier for the cluster.
items – List of item IDs belonging to this cluster.
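One way to picture an immutable value object like Cluster is a frozen dataclass. The sketch below is an assumption about the pattern, not tanat's code: SketchCluster and clusters_from_labels are hypothetical names, and items is stored as a tuple here (the documented signature takes a list) so that membership cannot be edited in place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchCluster:
    """Hypothetical immutable cluster record."""

    cluster_id: int
    items: tuple  # tuple rather than list, so members cannot be appended


def clusters_from_labels(ids, labels):
    """Group item IDs by label and build one record per cluster."""
    groups = {}
    for item_id, label in zip(ids, labels):
        groups.setdefault(label, []).append(item_id)
    return [
        SketchCluster(label, tuple(members))
        for label, members in sorted(groups.items())
    ]


clusters = clusters_from_labels(["a", "b", "c"], [1, 0, 1])
print(clusters)

try:
    clusters[0].cluster_id = 99  # frozen dataclasses reject attribute assignment
except AttributeError as exc:  # FrozenInstanceError subclasses AttributeError
    print("rejected:", type(exc).__name__)
```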
- class tanat.clustering.Clusterer(settings=None)
Bases: SettingsMixin, Registrable, DisplayMixin, ABC
Abstract base class for all clustering algorithms.
- fit(pool: SequencePool | TrajectoryPool) → Self
Fit the clustering model to pool.
- Returns:
self, for method chaining.
- class tanat.clustering.HierarchicalClusterer(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, distance_threshold: float | None = None, linkage: str = 'complete', cluster_column: str = '__HCLUSTERS__')
Bases: Clusterer
Agglomerative hierarchical clustering (sklearn AgglomerativeClustering).
Consumes a precomputed DistanceMatrix with metric="precomputed".
Example:
    from tanat.clustering import HierarchicalClusterer

    # pool is an existing SequencePool or TrajectoryPool
    clusterer = HierarchicalClusterer(metric="linearpairwise", n_clusters=5)
    clusterer.fit(pool)
    clusterer.clusters  # list[Cluster]
- SETTINGS_CLASS
alias of HierarchicalSettings
- __init__(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, distance_threshold: float | None = None, linkage: str = 'complete', cluster_column: str = '__HCLUSTERS__') → None
Initialise with the given settings.
- class tanat.clustering.HierarchicalSettings(*, metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, distance_threshold: float | None = None, linkage: str = 'complete', cluster_column: str = '__HCLUSTERS__')
Bases: object
Settings for HierarchicalClusterer.
- Parameters:
metric – Metric name or instance.
n_clusters – Target number of clusters (ignored when distance_threshold is set).
distance_threshold – Cut-off distance for dendrogram trimming.
linkage – "complete", "average", "single", or "ward".
cluster_column – Static-feature column injected after fit().
- metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise'
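How n_clusters, distance_threshold and linkage interact can be shown with a small pure-Python agglomerative routine over a precomputed distance matrix. This is a naive sketch, not the scikit-learn code that HierarchicalClusterer relies on: agglomerate and linkage_distance are hypothetical names, "ward" is left out because it is not defined on an arbitrary distance matrix, and the stop rule (no merge at or above the threshold) follows scikit-learn's documented behaviour for distance_threshold.

```python
def linkage_distance(dist, a, b, linkage):
    """Distance between clusters a and b, given as lists of item indices."""
    pairs = [dist[i][j] for i in a for j in b]
    if linkage == "complete":
        return max(pairs)
    if linkage == "single":
        return min(pairs)
    if linkage == "average":
        return sum(pairs) / len(pairs)
    raise ValueError(f"unsupported linkage in this sketch: {linkage!r}")


def agglomerate(dist, n_clusters=2, distance_threshold=None, linkage="complete"):
    """Merge the closest clusters until a stopping rule applies."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        # n_clusters only applies when no distance threshold is given.
        if distance_threshold is None and len(clusters) <= n_clusters:
            break
        # Closest pair of clusters under the chosen linkage.
        d, p, q = min(
            (linkage_distance(dist, clusters[p], clusters[q], linkage), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        # With a threshold, stop once the closest pair is too far apart.
        if distance_threshold is not None and d >= distance_threshold:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return sorted(sorted(c) for c in clusters)


points = [0, 1, 2, 10, 11, 30]
dist = [[abs(a - b) for b in points] for a in points]
print(agglomerate(dist, n_clusters=2))
print(agglomerate(dist, n_clusters=2, distance_threshold=5))
```

In the second call the threshold decides the outcome and n_clusters has no effect, which matches the "ignored when distance_threshold is set" note above.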
- class tanat.clustering.MedoidMixin
Bases: object
Mixin for clusterers that expose medoids (representative objects).
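In general terms, a medoid is the member of a cluster whose total distance to the other members is smallest, so it is always an actual item of the data set. The sketch below shows that definition on a precomputed distance matrix; the function name medoid is hypothetical and nothing here describes how MedoidMixin stores or computes its medoids.

```python
def medoid(dist, members):
    """Return the member with the smallest total distance to all members."""
    return min(members, key=lambda c: sum(dist[c][j] for j in members))


points = [0, 2, 3, 10, 11]
dist = [[abs(a - b) for b in points] for a in points]

# Index of the most central point among all five items.
center = medoid(dist, [0, 1, 2, 3, 4])
print(center, points[center])
```

Unlike a mean, the medoid only needs pairwise distances, which is why it suits sequence or trajectory metrics where averaging items is not defined.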
- class tanat.clustering.PAMClusterer(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, max_iter: int = 50, cluster_column: str = '__PAM_CLUSTERS__')
Bases: MedoidMixin, Clusterer
Partition Around Medoids (PAM) clustering.
Two-phase algorithm:
BUILD: greedy initial medoid selection.
SWAP: iterative improvement by swapping medoid/non-medoid pairs.
Inner loops use Numba-compiled kernels for performance.
Example:
    from tanat.clustering import PAMClusterer

    # pool is an existing SequencePool or TrajectoryPool
    pam = PAMClusterer(metric="linearpairwise", n_clusters=3, max_iter=100)
    pam.fit(pool)
    pam.medoids  # list of representative item IDs
    pam.clusters  # list[Cluster]
- SETTINGS_CLASS
alias of PAMSettings
- __init__(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, max_iter: int = 50, cluster_column: str = '__PAM_CLUSTERS__') → None
Initialise with the given settings.
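The BUILD and SWAP phases described for PAMClusterer can be sketched as two small functions over a precomputed distance matrix. This is an unoptimised pure-Python illustration under assumptions, not the Numba kernels mentioned above: the names total_cost, build, swap and assign are hypothetical, and the SWAP step shown here applies the single best improving swap per iteration, which is one common variant.

```python
def total_cost(dist, medoids):
    """Sum over all items of the distance to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def build(dist, k):
    """BUILD: greedily add the medoid that reduces the total cost the most."""
    medoids = []
    for _ in range(k):
        candidates = [c for c in range(len(dist)) if c not in medoids]
        medoids.append(
            min(candidates, key=lambda c: total_cost(dist, medoids + [c]))
        )
    return medoids


def swap(dist, medoids, max_iter=50):
    """SWAP: repeatedly apply the best cost-reducing medoid/non-medoid swap."""
    medoids = list(medoids)
    for _ in range(max_iter):
        best_cost, best_trial = total_cost(dist, medoids), None
        for pos in range(len(medoids)):
            for c in range(len(dist)):
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < best_cost:
                    best_cost, best_trial = trial_cost, trial
        if best_trial is None:  # local optimum: no swap improves the cost
            break
        medoids = best_trial
    return medoids


def assign(dist, medoids):
    """Label each item with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda p: row[medoids[p]]) for row in dist
    ]


points = [0, 1, 2, 3, 4, 100, 101, 102, 103, 104]
dist = [[abs(a - b) for b in points] for a in points]
initial = build(dist, k=2)
medoids = swap(dist, initial)
labels = assign(dist, medoids)
print(initial, medoids, labels)
```

On this toy data BUILD alone leaves the first medoid at the edge of its group, and SWAP then moves it to the centre, which is why the second phase is needed.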
- class tanat.clustering.PAMSettings(*, metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, max_iter: int = 50, cluster_column: str = '__PAM_CLUSTERS__')
Bases: object
Settings for PAMClusterer.
- Parameters:
metric – Metric name or instance. Default: "linearpairwise".
n_clusters – Number of medoids to find. Must be > 0.
max_iter – Maximum number of SWAP iterations. Must be > 0.
cluster_column – Static-feature column name injected after fit().
- metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise'