tanat.zeroing package#

Subpackages#

tanat.zeroing.type package

Submodules#

tanat.zeroing.base module#

T0Setter ABC + column constants shared across all zeroing strategies.

class tanat.zeroing.base.T0Setter(settings: Any)[source]#

Bases: ABC, Registrable

Base class for all T0 strategies.

Subclasses register themselves via register_name and live in the type/ subdirectory so that get_registered() can auto-discover them on first lookup.
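As a rough illustration of the self-registration pattern described above, the sketch below fills a name-to-class registry from `__init_subclass__`. The `Registrable` class here is a hypothetical stand-in, not tanat's own. This reference does not show how `register_name` is declared or how the type/ subdirectory is scanned, so the keyword argument and the `get_registered` signature are assumptions.

```python
from typing import Optional


class Registrable:
    """Hypothetical stand-in for a registrable mixin (not tanat's class)."""

    _registry = {}

    def __init_subclass__(cls, register_name: Optional[str] = None, **kwargs):
        super().__init_subclass__(**kwargs)
        if register_name is not None:
            # Assumption: a subclass opts in by passing register_name.
            Registrable._registry[register_name] = cls

    @classmethod
    def get_registered(cls, name: str) -> type:
        # The documented auto-discovery of the type/ subdirectory is
        # omitted here; only the lookup itself is shown.
        return Registrable._registry[name]


class PositionLike(Registrable, register_name="position"):
    """Toy strategy registered under the name 'position'."""


strategy_cls = Registrable.get_registered("position")
```

With a registry of this kind, a caller can obtain a strategy class by name without importing the module that defines it.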

The setter is the single T0 attribute on the pool. After set_t0() runs, _df holds the pre-computed [id_col, _T0] DataFrame and anchor exposes the hint from settings. The setter is set-once and never mutated again.

__init__(settings: Any) → None[source]#

property anchor: Literal['start', 'end', 'middle'] | None[source]#

Anchor hint from settings, or None if the strategy has no anchor field.

compute_from_sequence(target: SequencePool | Sequence) → pl.DataFrame[source]#

Compute T0 and store the result in df.

Template method. Handles the shared steps for every strategy:

Normalise anchor against the pool type for strategies that declare that field (via _guard_anchor()).
Collect the full ID list and ID column name from target.
Delegate to _compute_t0() for strategy-specific logic.
Left-join the partial result with all IDs so sequences with no match receive _T0_ = null.
Emit a UserWarning for every null (via _warn_nulls()).
Assign self._df and return it.

When target is a Sequence the result has exactly one row.
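The steps above can be sketched in plain Python. This is a minimal illustration of the template-method shape, not tanat's implementation: `ToySetter`, `FirstAbove`, and the dict-of-lists input are hypothetical, the anchor normalisation step is left out, and a list of tuples stands in for the polars DataFrame. Emitting one warning per null row is also an assumption.

```python
import warnings
from abc import ABC, abstractmethod


class ToySetter(ABC):
    """Hypothetical miniature of the template method; not tanat's T0Setter."""

    def __init__(self):
        self._df = None

    def compute_from_sequence(self, data):
        all_ids = list(data)              # collect every sequence ID
        partial = self._compute_t0(data)  # strategy-specific step
        # Left-join equivalent: keep every ID, unmatched ones get None.
        full = [(seq_id, partial.get(seq_id)) for seq_id in all_ids]
        for seq_id, t0 in full:
            if t0 is None:
                # Assumption: one warning per null row.
                warnings.warn(f"no T0 for sequence {seq_id!r}", UserWarning, stacklevel=2)
        self._df = full                   # store the result, then return it
        return full

    @abstractmethod
    def _compute_t0(self, data):
        """Return {seq_id: t0} only for the sequences that have a T0."""


class FirstAbove(ToySetter):
    """Toy strategy: T0 is the first timestamp strictly above a threshold."""

    def __init__(self, threshold):
        super().__init__()
        self.threshold = threshold

    def _compute_t0(self, data):
        out = {}
        for seq_id, times in data.items():
            hits = [t for t in times if t > self.threshold]
            if hits:
                out[seq_id] = hits[0]
        return out


setter = FirstAbove(threshold=5.0)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = setter.compute_from_sequence({"a": [1.0, 6.0, 9.0], "b": [2.0, 3.0]})
```

Sequence "b" has no timestamp above the threshold, so it stays in the result with a null T0 and triggers the warning. The subclass never has to handle missing IDs itself.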

compute_from_trajectory(target: TrajectoryPool | Trajectory, on: str | None = None) → pl.DataFrame[source]#

Compute T0 from a trajectory pool or a standalone trajectory.

Parameters:

target – The trajectory pool or standalone trajectory.
on – Alias of the reference sub-pool / sequence. None selects the first visible alias.

Returns:

Complete [id_col, _T0_] DataFrame stored in self._df.

classmethod default(*, is_event: bool = False) → T0Setter[source]#

Return the default T0 strategy: PositionT0Setter at position 0.

Uses the registry so no direct import of the subclass is needed. The position type is auto-discovered on first call.

Parameters:

is_event – True when the target has a single time column. Used to pre-set anchor so that _guard_anchor() does not emit a spurious warning on the implicit default.

property df: DataFrame | None[source]#

Pre-computed [id_col, _T0] DataFrame. None until set_t0() assigns it.

static normalize_anchor(anchor: Literal['start', 'end', 'middle'] | None, is_event: bool, pool_type: str = '', stacklevel: int = 3) → Literal['start', 'end', 'middle'] | None[source]#

Validate and normalise an anchor value against the pool type.

Parameters:

anchor – Raw anchor value to normalise.
is_event – True when the pool has a single time column.
pool_type – Registration name of the pool (used in the warning message). Pass "" when unknown.
stacklevel – Passed to warnings.warn() so the warning points to the user’s call site. Caller must count frames: 3 for a direct t0_data call, 5 when called from _guard_anchor → compute → set_t0 → user.
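The frame counting behind stacklevel can be illustrated with the standard warnings module alone. The sketch below is not normalize_anchor() itself: `normalize` and `wrapper` are hypothetical, and the rule that an anchor is dropped with a warning for single-time-column data is an assumption made only to have something to warn about.

```python
import warnings


def normalize(anchor, is_event, stacklevel=2):
    """Hypothetical helper: warn and drop the anchor for event data."""
    if is_event and anchor is not None:
        # stacklevel=2 attributes the warning to whoever called normalize().
        warnings.warn("anchor is ignored for event data", UserWarning, stacklevel=stacklevel)
        return None
    return anchor


def wrapper(anchor, is_event):
    # One extra frame sits between the user and normalize(), so the
    # stacklevel grows by one to keep pointing at the user's line.
    return normalize(anchor, is_event, stacklevel=3)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    direct = normalize("end", is_event=True)
    wrapped = wrapper("end", is_event=True)
    kept = normalize("end", is_event=False)
```

Both recorded warnings carry the line number of the call in the `with` block rather than a line inside the helpers. Each intermediate function adds one to the stacklevel needed to achieve that.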

property strategy_summary: str[source]#

Full strategy description, appending , on='<alias>' when the reference alias has been resolved (i.e. _on is not None).

Module contents#

Zeroing (T0) package.

Bases: T0Setter

T0 = a user-supplied scalar or per-sequence mapping.

Scalar: the same value is assigned to every sequence.
Dict ({seq_id: value}): per-sequence assignment. Keys not present in the pool emit a UserWarning and are ignored. Sequences with no matching key receive _t0 = null.
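The two input shapes can be pictured with a small helper. This is a sketch under assumptions: `expand_direct` is a hypothetical name, a plain dict stands in for the resulting DataFrame, and None stands in for null.

```python
import warnings


def expand_direct(direct, pool_ids):
    """Hypothetical helper: one T0 per pool ID from a scalar or a dict."""
    if isinstance(direct, dict):
        for key in direct:
            if key not in pool_ids:
                # Keys that match no sequence are reported, then ignored.
                warnings.warn(f"ID {key!r} is not in the pool; ignored", UserWarning, stacklevel=2)
        # Sequences without a matching key fall back to None (null).
        return {seq_id: direct.get(seq_id) for seq_id in pool_ids}
    # Scalar: the same value for every sequence.
    return {seq_id: direct for seq_id in pool_ids}


pool_ids = ["s1", "s2", "s3"]
scalar_t0 = expand_direct(10, pool_ids)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mapped_t0 = expand_direct({"s1": 5, "s3": 7, "s9": 1}, pool_ids)
```

Here "s9" is not in the pool, so it produces one warning and no row, while "s2" has no key and ends up with a null T0.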

_t0_nearest_rank is resolved as the last row where t_col ≤ T0 (floor semantics). null when _t0 = null or when no row satisfies the constraint.
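Floor semantics of this kind can be sketched with the standard bisect module. The helper name is hypothetical, and the sketch assumes the time values of one sequence are sorted ascending and that ranks are 0-based row indices; neither is stated in this reference.

```python
from bisect import bisect_right


def nearest_rank_floor(times, t0):
    """Index of the last entry <= t0, or None when there is none.

    Assumes `times` is sorted ascending and ranks are 0-based.
    """
    if t0 is None:
        return None
    # bisect_right returns the insertion point after any entries equal
    # to t0, so stepping back by one lands on the last entry <= t0.
    idx = bisect_right(times, t0) - 1
    return idx if idx >= 0 else None


times = [10, 20, 30]
between = nearest_rank_floor(times, 25)   # falls back to the row at 20
exact = nearest_rank_floor(times, 20)     # an exact hit is kept
too_early = nearest_rank_floor(times, 5)  # nothing at or before 5
missing = nearest_rank_floor(times, None)
```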

__init__(*, direct: datetime | date | int | float | None | dict[Any, datetime | date | int | float | None])[source]#

class tanat.zeroing.FeatureT0Setter(*, feature: str)[source]#

Bases: T0Setter

T0 = value of a static feature column, one value per sequence.

The named feature must exist in the pool’s static features and its dtype must exactly match the pool’s time index dtype.

__init__(*, feature: str)[source]#

compute_from_trajectory(target: TrajectoryPool, on: str | None = None) → pl.DataFrame[source]#

Read T0 from a trajectory-level static feature column.

Uses target.static_data (public API) so no coupling to trajectory store internals is needed.

Parameters:

target – The trajectory pool.
on – Ignored for the feature strategy (T0 comes from the trajectory-level static feature, not a sub-pool).

Returns:

Complete [id_col, _T0_] DataFrame stored in self._df.

Raises:

KeyError – If the feature does not exist in trajectory static features.
TypeError – If the feature dtype does not match the trajectory’s time index dtype.
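The two error conditions can be pictured with a plain-Python check. This is only a sketch: `read_feature_t0` is a hypothetical helper, a dict of dicts stands in for the static feature table, and Python types stand in for polars dtypes. The exact-type comparison mirrors the documented "must exactly match" rule, so a date value is rejected where datetime is expected even though datetime subclasses date.

```python
from datetime import date, datetime


def read_feature_t0(static_rows, feature, time_type):
    """Hypothetical helper mirroring the KeyError / TypeError contract."""
    out = {}
    for seq_id, features in static_rows.items():
        if feature not in features:
            raise KeyError(feature)
        value = features[feature]
        # Exact type match, no coercion between related types.
        if value is not None and type(value) is not time_type:
            raise TypeError(
                f"feature {feature!r} holds {type(value).__name__}, "
                f"expected {time_type.__name__}"
            )
        out[seq_id] = value
    return out


static = {
    "p1": {"admitted": datetime(2024, 1, 5, 8, 30)},
    "p2": {"admitted": datetime(2024, 2, 1, 14, 0)},
}
t0 = read_feature_t0(static, "admitted", datetime)

try:
    read_feature_t0({"p1": {"admitted": date(2024, 1, 5)}}, "admitted", datetime)
    mismatch_rejected = False
except TypeError:
    mismatch_rejected = True
```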

class tanat.zeroing.PositionT0Setter(*, position: int = 0, anchor: Literal['start', 'end', 'middle'] | None = None)[source]#

Bases: T0Setter

T0 = temporal value of the row at position for each sequence.

Supports 0-based positive indexing (position=0 → first row) and negative indexing (position=-1 → last row). Sequences shorter than abs(position) + 1 rows receive _t0 = null.

__init__(*, position: int = 0, anchor: Literal['start', 'end', 'middle'] | None = None)[source]#

class tanat.zeroing.QueryT0Setter(*, query: Expr, anchor: Literal['start', 'end', 'middle'] | None = None, use_first: bool = True)[source]#

Bases: T0Setter

T0 = timestamp of the first (or last) entity row matching query.

query: any pl.Expr that evaluates to a boolean Series on any column of the sequence data (time columns or entity features), e.g. pl.col("event") == "admission" or pl.col("t_start") > threshold.
use_first=True (default): earliest matching row per sequence.
use_first=False: latest matching row.
anchor: which time column to read ("start" / "end"). For event sequences the anchor is ignored.

Sequences with no matching row receive _t0 = null.
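The first-or-last selection can be sketched without polars. In the snippet below a plain callable stands in for the boolean pl.Expr, `query_t0` is a hypothetical helper, and the rows of a sequence are assumed to be sorted by time already.

```python
def query_t0(rows, predicate, time_key="t_start", use_first=True):
    """Hypothetical helper: time of the first (or last) matching row.

    rows: list of dicts for one sequence, assumed sorted by time.
    predicate: callable standing in for the boolean expression.
    """
    matches = [row[time_key] for row in rows if predicate(row)]
    if not matches:
        return None  # no matching row, so T0 is null
    return matches[0] if use_first else matches[-1]


rows = [
    {"t_start": 1, "event": "triage"},
    {"t_start": 4, "event": "admission"},
    {"t_start": 9, "event": "admission"},
]


def is_admission(row):
    return row["event"] == "admission"


earliest = query_t0(rows, is_admission)
latest = query_t0(rows, is_admission, use_first=False)
no_match = query_t0(rows, lambda row: row["event"] == "discharge")
```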

__init__(*, query: Expr, anchor: Literal['start', 'end', 'middle'] | None = None, use_first: bool = True)[source]#

class tanat.zeroing.T0Setter(settings: Any)[source]#

Bases: ABC, Registrable

Base class for all T0 strategies.

Subclasses register themselves via register_name and live in the type/ subdirectory so that get_registered() can auto-discover them on first lookup.

__init__(settings: Any) → None[source]#

property anchor: Literal['start', 'end', 'middle'] | None[source]#

Anchor hint from settings, or None if the strategy has no anchor field.

compute_from_sequence(target: SequencePool | Sequence) → pl.DataFrame[source]#

Compute T0 and store the result in df.

Template method. Handles the shared steps for every strategy:

Normalise anchor against the pool type for strategies that declare that field (via _guard_anchor()).
Collect the full ID list and ID column name from target.
Delegate to _compute_t0() for strategy-specific logic.
Left-join the partial result with all IDs so sequences with no match receive _T0_ = null.
Emit a UserWarning for every null (via _warn_nulls()).
Assign self._df and return it.

When target is a Sequence the result has exactly one row.

compute_from_trajectory(target: TrajectoryPool | Trajectory, on: str | None = None) → pl.DataFrame[source]#

Compute T0 from a trajectory pool or a standalone trajectory.

Parameters:

target – The trajectory pool or standalone trajectory.
on – Alias of the reference sub-pool / sequence. None selects the first visible alias.

Returns:

Complete [id_col, _T0_] DataFrame stored in self._df.

classmethod default(*, is_event: bool = False) → T0Setter[source]#

Return the default T0 strategy: PositionT0Setter at position 0.

Uses the registry so no direct import of the subclass is needed. The position type is auto-discovered on first call.

Parameters:

is_event – True when the target has a single time column. Used to pre-set anchor so that _guard_anchor() does not emit a spurious warning on the implicit default.

property df: DataFrame | None[source]#

Pre-computed [id_col, _T0] DataFrame. None until set_t0() assigns it.

static normalize_anchor(anchor: Literal['start', 'end', 'middle'] | None, is_event: bool, pool_type: str = '', stacklevel: int = 3) → Literal['start', 'end', 'middle'] | None[source]#

Validate and normalise an anchor value against the pool type.

Parameters:

anchor – Raw anchor value to normalise.
is_event – True when the pool has a single time column.
pool_type – Registration name of the pool (used in the warning message). Pass "" when unknown.
stacklevel – Passed to warnings.warn() so the warning points to the user’s call site. Caller must count frames: 3 for a direct t0_data call, 5 when called from _guard_anchor → compute → set_t0 → user.

property strategy_summary: str[source]#

Full strategy description, appending , on='<alias>' when the reference alias has been resolved (i.e. _on is not None).