tanat.clustering.type package
Submodules
tanat.clustering.type.clara module
CLARAClusterer: sampling-based PAM for large datasets.
- class tanat.clustering.type.clara.CLARAClusterer(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', sampling_ratio: float = 0.1, nb_pam_instances: int = 5, n_clusters: int = 2, max_iter: int = 50, random_state: int | None = None, cluster_column: str = '__CLARA_CLUSTERS__')
Bases: MedoidMixin, Clusterer
CLARA (Clustering Large Applications), sampling-based PAM.
Runs PAM on nb_pam_instances random sub-samples of the pool, evaluates each result on the full pool, and keeps the medoids with the best result.
Example:
    from tanat.clustering.type import CLARAClusterer

    # `pool` is assumed to be an existing pool object.
    clara = CLARAClusterer(
        metric="linearpairwise",
        n_clusters=5,
        sampling_ratio=0.1,
        nb_pam_instances=5,
        random_state=42,
    )
    clara.fit(pool)
    clara.medoids  # best medoids across all PAM instances
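The sampling scheme described above can be illustrated with a minimal, pure-Python sketch. It is not tanat's implementation: the function names, the list-of-lists distance matrix, and the sample-size rule are assumptions made for illustration, and the per-sample step uses an exhaustive medoid search as a stand-in for PAM so that the block stays short.

```python
import itertools
import random


def total_cost(dist, medoids):
    """Sum of each item's distance to its nearest medoid, over the full pool."""
    return sum(min(row[m] for m in medoids) for row in dist)


def best_medoids_in_sample(dist, sample, n_clusters):
    """Stand-in for PAM: exhaustive medoid search restricted to one sample."""
    def sample_cost(medoids):
        return sum(min(dist[i][m] for m in medoids) for i in sample)
    return min(itertools.combinations(sample, n_clusters), key=sample_cost)


def clara(dist, n_clusters=2, sampling_ratio=0.1, nb_pam_instances=5, random_state=None):
    """Toy CLARA on a precomputed distance matrix (hypothetical helper)."""
    rng = random.Random(random_state)
    n = len(dist)
    # Assumption: each sample holds at least n_clusters items.
    sample_size = min(n, max(n_clusters, round(sampling_ratio * n)))
    best, best_cost = None, float("inf")
    for _ in range(nb_pam_instances):
        sample = sorted(rng.sample(range(n), sample_size))
        medoids = best_medoids_in_sample(dist, sample, n_clusters)
        cost = total_cost(dist, medoids)  # evaluated on the full pool
        if cost < best_cost:
            best, best_cost = list(medoids), cost
    return best, best_cost


points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, cost = clara(dist, n_clusters=2, sampling_ratio=0.5,
                      nb_pam_instances=3, random_state=42)
```

Each candidate set of medoids is chosen from a sample only, but it is scored against every item, which is what lets the best of several cheap runs approximate a full PAM run.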
- SETTINGS_CLASS
alias of CLARASettings
- __init__(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', sampling_ratio: float = 0.1, nb_pam_instances: int = 5, n_clusters: int = 2, max_iter: int = 50, random_state: int | None = None, cluster_column: str = '__CLARA_CLUSTERS__') -> None
Initialise with the given settings.
- class tanat.clustering.type.clara.CLARASettings(*, metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', sampling_ratio: float = 0.1, nb_pam_instances: int = 5, n_clusters: int = 2, max_iter: int = 50, random_state: int | None = None, cluster_column: str = '__CLARA_CLUSTERS__')
Bases: object
Settings for CLARAClusterer.
Parameters:
    metric – Metric name or instance. Default: "linearpairwise".
    sampling_ratio – Fraction of the pool used per PAM instance. Must be in (0, 1]. Default: 0.1.
    nb_pam_instances – Number of PAM runs on independent random samples. Default: 5.
    n_clusters – Number of medoids per PAM run. Default: 2.
    max_iter – Maximum SWAP iterations per PAM run. Default: 50.
    random_state – Seed for the random number generator. None → non-reproducible.
    cluster_column – Static-feature column name injected after fit().
- metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise'
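The constraints listed for these settings can be sketched as a small validated dataclass. This is a hypothetical stand-in, not the actual CLARASettings class: the name CLARASettingsSketch is invented, metric is restricted to a string for simplicity, and the checks that nb_pam_instances, n_clusters, and max_iter are positive are assumptions (only the sampling_ratio range is stated above).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CLARASettingsSketch:
    """Hypothetical mirror of the documented fields and defaults."""
    metric: str = "linearpairwise"
    sampling_ratio: float = 0.1
    nb_pam_instances: int = 5
    n_clusters: int = 2
    max_iter: int = 50
    random_state: Optional[int] = None
    cluster_column: str = "__CLARA_CLUSTERS__"

    def __post_init__(self):
        # Documented constraint: sampling_ratio must lie in (0, 1].
        if not 0 < self.sampling_ratio <= 1:
            raise ValueError("sampling_ratio must be in (0, 1]")
        # Assumed constraints: counts must be strictly positive.
        for name in ("nb_pam_instances", "n_clusters", "max_iter"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be > 0")


settings = CLARASettingsSketch(n_clusters=5, random_state=42)
```

Validating at construction time means an out-of-range sampling_ratio fails before any distance computation starts.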
tanat.clustering.type.hierarchical module
HierarchicalClusterer: agglomerative hierarchical clustering via sklearn.
- class tanat.clustering.type.hierarchical.HierarchicalClusterer(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, distance_threshold: float | None = None, linkage: str = 'complete', cluster_column: str = '__HCLUSTERS__')
Bases: Clusterer
Agglomerative hierarchical clustering (sklearn AgglomerativeClustering).
Consumes a precomputed DistanceMatrix with metric="precomputed".
Example:
    from tanat.clustering.type import HierarchicalClusterer

    # `pool` is assumed to be an existing pool object.
    clusterer = HierarchicalClusterer(metric="linearpairwise", n_clusters=5)
    clusterer.fit(pool)
    clusterer.clusters  # list[Cluster]
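To make the idea of clustering from a precomputed distance matrix concrete, here is a minimal pure-Python sketch of agglomerative merging. It is a stand-in for illustration and not what sklearn or tanat runs internally: the function name agglomerate is hypothetical, only "complete" and "single" linkage are covered, and the stopping rule assumes that merging stops once the closest pair is at or above distance_threshold.

```python
def agglomerate(dist, n_clusters=2, distance_threshold=None, linkage="complete"):
    """Toy agglomerative clustering on a precomputed distance matrix."""
    combine = {"complete": max, "single": min}[linkage]
    clusters = [[i] for i in range(len(dist))]

    def gap(a, b):
        # Linkage distance between two clusters of item indices.
        return combine(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # n_clusters only applies when no threshold is given.
        if distance_threshold is None and len(clusters) <= n_clusters:
            break
        # Find the closest pair of clusters under the chosen linkage.
        d, x, y = min(
            (gap(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if distance_threshold is not None and d >= distance_threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    labels = [0] * len(dist)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
by_count = agglomerate(dist, n_clusters=2)               # [0, 0, 0, 1, 1, 1]
by_cutoff = agglomerate(dist, distance_threshold=1.5)    # [0, 0, 1, 2, 2, 3]
```

The second call shows why n_clusters is ignored when a threshold is set: the number of clusters falls out of where the merging stops.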
- SETTINGS_CLASS
alias of HierarchicalSettings
- __init__(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, distance_threshold: float | None = None, linkage: str = 'complete', cluster_column: str = '__HCLUSTERS__') -> None
Initialise with the given settings.
- class tanat.clustering.type.hierarchical.HierarchicalSettings(*, metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, distance_threshold: float | None = None, linkage: str = 'complete', cluster_column: str = '__HCLUSTERS__')
Bases: object
Settings for HierarchicalClusterer.
Parameters:
    metric – Metric name or instance.
    n_clusters – Target number of clusters (ignored when distance_threshold is set).
    distance_threshold – Cut-off distance for dendrogram trimming.
    linkage – "complete", "average", "single", or "ward".
    cluster_column – Static-feature column injected after fit().
- metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise'
tanat.clustering.type.pam module
PAMClusterer: Partition Around Medoids clustering with Numba-optimized kernels.
- class tanat.clustering.type.pam.MedoidMixin
Bases: object
Mixin for clusterers that expose medoids (representative objects).
- class tanat.clustering.type.pam.PAMClusterer(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, max_iter: int = 50, cluster_column: str = '__PAM_CLUSTERS__')
Bases: MedoidMixin, Clusterer
Partition Around Medoids (PAM) clustering.
Two-phase algorithm:
    1. BUILD: greedy initial medoid selection.
    2. SWAP: iterative improvement by swapping medoid/non-medoid pairs.
Inner loops use Numba-compiled kernels for performance.
Example:
    from tanat.clustering.type import PAMClusterer

    # `pool` is assumed to be an existing pool object.
    pam = PAMClusterer(metric="linearpairwise", n_clusters=3, max_iter=100)
    pam.fit(pool)
    pam.medoids   # list of representative item IDs
    pam.clusters  # list[Cluster]
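The two phases can be sketched in plain Python on a precomputed distance matrix. This is an unoptimised illustration under stated assumptions, not the Numba kernels the class uses: the function names and the list-of-lists matrix are hypothetical, and the SWAP phase here applies the single best exchange per iteration.

```python
def total_cost(dist, medoids):
    """Sum of each item's distance to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(dist, n_clusters, max_iter=50):
    """Toy PAM on a precomputed, symmetric distance matrix (list of lists)."""
    n = len(dist)
    # BUILD: start from the most central item, then greedily add the item
    # that lowers the total cost the most.
    medoids = [min(range(n), key=lambda c: sum(dist[c]))]
    while len(medoids) < n_clusters:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(dist, medoids + [c])))

    # SWAP: apply the best medoid/non-medoid exchange until none improves.
    cost = total_cost(dist, medoids)
    for _ in range(max_iter):
        best_cost, best_swap = cost, None
        for pos in range(len(medoids)):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < best_cost:
                    best_cost, best_swap = trial_cost, trial
        if best_swap is None:
            break
        medoids, cost = best_swap, best_cost

    medoids = sorted(medoids)
    # Assign every item to the position of its nearest medoid.
    labels = [min(range(len(medoids)), key=lambda k: dist[i][medoids[k]]) for i in range(n)]
    return medoids, labels, cost


points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels, cost = pam(dist, n_clusters=2)
print(medoids, labels, cost)  # [1, 4] [0, 0, 0, 1, 1, 1] 4
```

On this input BUILD picks items 2 and 4 (cost 5), and one SWAP iteration replaces item 2 with item 1 to reach cost 4, after which no exchange improves the result.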
- SETTINGS_CLASS
alias of PAMSettings
- __init__(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, max_iter: int = 50, cluster_column: str = '__PAM_CLUSTERS__') -> None
Initialise with the given settings.
- class tanat.clustering.type.pam.PAMSettings(*, metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, max_iter: int = 50, cluster_column: str = '__PAM_CLUSTERS__')
Bases: object
Settings for PAMClusterer.
Parameters:
    metric – Metric name or instance. Default: "linearpairwise".
    n_clusters – Number of medoids to find. Must be > 0.
    max_iter – Maximum number of SWAP iterations. Must be > 0.
    cluster_column – Static-feature column name injected after fit().
- metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise'
Module contents
Clustering subtypes.
- class tanat.clustering.type.CLARAClusterer(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', sampling_ratio: float = 0.1, nb_pam_instances: int = 5, n_clusters: int = 2, max_iter: int = 50, random_state: int | None = None, cluster_column: str = '__CLARA_CLUSTERS__')
Bases: MedoidMixin, Clusterer
CLARA (Clustering Large Applications), sampling-based PAM.
Runs PAM on nb_pam_instances random sub-samples of the pool, evaluates each result on the full pool, and keeps the medoids with the best result.
Example:
    from tanat.clustering.type import CLARAClusterer

    # `pool` is assumed to be an existing pool object.
    clara = CLARAClusterer(
        metric="linearpairwise",
        n_clusters=5,
        sampling_ratio=0.1,
        nb_pam_instances=5,
        random_state=42,
    )
    clara.fit(pool)
    clara.medoids  # best medoids across all PAM instances
- SETTINGS_CLASS
alias of CLARASettings
- __init__(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', sampling_ratio: float = 0.1, nb_pam_instances: int = 5, n_clusters: int = 2, max_iter: int = 50, random_state: int | None = None, cluster_column: str = '__CLARA_CLUSTERS__') -> None
Initialise with the given settings.
- class tanat.clustering.type.CLARASettings(*, metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', sampling_ratio: float = 0.1, nb_pam_instances: int = 5, n_clusters: int = 2, max_iter: int = 50, random_state: int | None = None, cluster_column: str = '__CLARA_CLUSTERS__')
Bases: object
Settings for CLARAClusterer.
Parameters:
    metric – Metric name or instance. Default: "linearpairwise".
    sampling_ratio – Fraction of the pool used per PAM instance. Must be in (0, 1]. Default: 0.1.
    nb_pam_instances – Number of PAM runs on independent random samples. Default: 5.
    n_clusters – Number of medoids per PAM run. Default: 2.
    max_iter – Maximum SWAP iterations per PAM run. Default: 50.
    random_state – Seed for the random number generator. None → non-reproducible.
    cluster_column – Static-feature column name injected after fit().
- metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise'
- class tanat.clustering.type.HierarchicalClusterer(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, distance_threshold: float | None = None, linkage: str = 'complete', cluster_column: str = '__HCLUSTERS__')
Bases: Clusterer
Agglomerative hierarchical clustering (sklearn AgglomerativeClustering).
Consumes a precomputed DistanceMatrix with metric="precomputed".
Example:
    from tanat.clustering.type import HierarchicalClusterer

    # `pool` is assumed to be an existing pool object.
    clusterer = HierarchicalClusterer(metric="linearpairwise", n_clusters=5)
    clusterer.fit(pool)
    clusterer.clusters  # list[Cluster]
- SETTINGS_CLASS
alias of HierarchicalSettings
- __init__(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, distance_threshold: float | None = None, linkage: str = 'complete', cluster_column: str = '__HCLUSTERS__') -> None
Initialise with the given settings.
- class tanat.clustering.type.HierarchicalSettings(*, metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, distance_threshold: float | None = None, linkage: str = 'complete', cluster_column: str = '__HCLUSTERS__')
Bases: object
Settings for HierarchicalClusterer.
Parameters:
    metric – Metric name or instance.
    n_clusters – Target number of clusters (ignored when distance_threshold is set).
    distance_threshold – Cut-off distance for dendrogram trimming.
    linkage – "complete", "average", "single", or "ward".
    cluster_column – Static-feature column injected after fit().
- metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise'
- class tanat.clustering.type.MedoidMixin
Bases: object
Mixin for clusterers that expose medoids (representative objects).
- class tanat.clustering.type.PAMClusterer(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, max_iter: int = 50, cluster_column: str = '__PAM_CLUSTERS__')
Bases: MedoidMixin, Clusterer
Partition Around Medoids (PAM) clustering.
Two-phase algorithm:
    1. BUILD: greedy initial medoid selection.
    2. SWAP: iterative improvement by swapping medoid/non-medoid pairs.
Inner loops use Numba-compiled kernels for performance.
Example:
    from tanat.clustering.type import PAMClusterer

    # `pool` is assumed to be an existing pool object.
    pam = PAMClusterer(metric="linearpairwise", n_clusters=3, max_iter=100)
    pam.fit(pool)
    pam.medoids   # list of representative item IDs
    pam.clusters  # list[Cluster]
- SETTINGS_CLASS
alias of PAMSettings
- __init__(metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, max_iter: int = 50, cluster_column: str = '__PAM_CLUSTERS__') -> None
Initialise with the given settings.
- class tanat.clustering.type.PAMSettings(*, metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise', n_clusters: int = 2, max_iter: int = 50, cluster_column: str = '__PAM_CLUSTERS__')
Bases: object
Settings for PAMClusterer.
Parameters:
    metric – Metric name or instance. Default: "linearpairwise".
    n_clusters – Number of medoids to find. Must be > 0.
    max_iter – Maximum number of SWAP iterations. Must be > 0.
    cluster_column – Static-feature column name injected after fit().
- metric: str | SequenceMetric | TrajectoryMetric = 'linearpairwise'