canari.model#

Hybrid LSTM-SSM model that combines Bayesian Long Short-Term Memory (LSTM) neural networks and State-Space Models (SSM).

This model supports a flexible architecture where multiple components are assembled to define a structured state-space model.

On time series data, this model can:

  • Provide forecasts with associated uncertainties.

  • Decompose original time series data into unobserved hidden states, providing mean values and associated uncertainties for these hidden states.

  • Train its Bayesian LSTM network component.

  • Support forecasting, filtering, and smoothing operations.

  • Generate synthetic time series data, including synthetic anomaly injection.
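
For example, a typical end-to-end workflow might look like the following sketch (train_set and val_set are assumed to be dictionaries with keys ‘x’ and ‘y’; the component settings are illustrative):

>>> from canari import Model
>>> from canari.component import LocalTrend, Periodic, WhiteNoise
>>> model = Model(
...     LocalTrend(mu_states=[1, 0.5], var_states=[1, 0.5]),
...     Periodic(mu_states=[1, 1], var_states=[2, 2], period=52),
...     WhiteNoise(std_error=0.04168),
... )
>>> mu_train, std_train, states = model.filter(train_set)
>>> states = model.smoother()
>>> mu_val, std_val, states = model.forecast(val_set)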

References

Vuong, V.D., Nguyen, L.H., and Goulet, J.-A. (2025). Coupling LSTM neural networks and state-space models through analytically tractable inference. International Journal of Forecasting, 41(1), 128–140.

class canari.model.Model(*components: BaseComponent)[source]#

Bases: object

Model class for the Hybrid LSTM/SSM model.

Parameters:

*components (BaseComponent) – One or more instances of classes derived from BaseComponent.

Examples

>>> from canari.component import LocalTrend, Periodic, WhiteNoise
>>> from canari import Model
>>> # Components
>>> local_trend = LocalTrend(mu_states=[1, 0.5], var_states=[1, 0.5])
>>> periodic = Periodic(mu_states=[1, 1], var_states=[2, 2], period=52)
>>> residual = WhiteNoise(std_error=0.04168)
>>> # Define model
>>> model = Model(local_trend, periodic, residual)
components#

Dictionary to save model components’ configurations.

Type:

Dict[str, BaseComponent]

num_states#

Number of hidden states.

Type:

int

states_names#

Names of hidden states.

Type:

list[str]

mu_states#

Mean vector for the hidden states \(X_{t|t}\) at time step t.

Type:

np.ndarray

var_states#

Covariance matrix for the hidden states \(X_{t|t}\) at time step t.

Type:

np.ndarray

mu_states_prior#

Prior mean vector for the hidden states \(X_{t+1|t}\) at time step t+1.

Type:

np.ndarray

var_states_prior#

Prior covariance matrix for the hidden states \(X_{t+1|t}\) at time step t+1.

Type:

np.ndarray

mu_states_posterior#

Posterior mean vector for the hidden states \(X_{t+1|t+1}\) at time step t+1. In case of missing data (NaN observation), it takes the same values as mu_states_prior.

Type:

np.ndarray

var_states_posterior#

Posterior covariance matrix for the hidden states \(X_{t+1|t+1}\) at time step t+1. In case of missing data (NaN observation), it takes the same values as var_states_prior.

Type:

np.ndarray

states#

Container for storing prior, posterior, and smoothed values of hidden states over time.

Type:

StatesHistory

mu_obs_predict#

Means for observation predictions at time step t+1.

Type:

np.ndarray

var_obs_predict#

Variances for observation predictions at time step t+1.

Type:

np.ndarray

observation_matrix#

Global observation matrix constructed from all components.

Type:

np.ndarray

transition_matrix#

Global transition matrix constructed from all components.

Type:

np.ndarray

process_noise_matrix#

Global process noise matrix constructed from all components.

Type:

np.ndarray

# LSTM-related attributes

Only used when an LstmNetwork component is present.

lstm_net#

LSTM neural network that is generated from the LstmNetwork component, if present. It is a pytagi.Sequential instance.

Type:

pytagi.Sequential

lstm_output_history#

Container for saving a rolling history of LSTM output over a fixed look-back window.

Type:

LstmOutputHistory

# Early stopping attributes

Only used when training an LstmNetwork component.

early_stop_metric#

Best value associated with the metric being monitored.

Type:

float

early_stop_metric_history#

Logged history of metric values across epochs.

Type:

List[float]

early_stop_lstm_param#

LSTM weight and bias parameters for pytagi.Sequential at the optimal epoch.

Type:

Dict

early_stop_init_mu_states#

Copy of mu_states at time step t=0 of the optimal epoch.

Type:

np.ndarray

early_stop_init_var_states#

Copy of var_states at time step t=0 of the optimal epoch.

Type:

np.ndarray

optimal_epoch#

Epoch at which the metric being monitored was best.

Type:

int

stop_training#

Flag indicating whether training has stopped due to early stopping or reaching the maximum number of epochs.

Type:

bool

# Optimization attribute
metric_optim#

Metric used for optimization in model_optimizer.

Type:

float

auto_initialize_baseline_states(data: ndarray)[source]#

Automatically assign initial means and variances for baseline hidden states (level, trend, and acceleration) from input data using time series decomposition defined in decompose_data().

Parameters:

data (np.ndarray) – Time series data.

Examples

>>> train_set, val_set, test_set, all_data = dp.get_splits()
>>> model.auto_initialize_baseline_states(train_set["y"][0:52])
backward(obs: float) Tuple[ndarray, ndarray, ndarray, ndarray][source]#

Update step in the Kalman filter for one time step.

This function operates at the one-time-step level. Internally calls backward() from common.

Parameters:

obs (float) – Observation value.

Returns:

A tuple of four np.ndarray objects.

Return type:

Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]
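
Examples

A one-time-step sketch; x_t and y_t are illustrative, and the names given to the four returned arrays are assumptions, not part of the API:

>>> mu_obs, var_obs, mu_prior, var_prior = model.forward(x_t)
>>> delta_mu, delta_var, mu_post, var_post = model.backward(obs=y_t)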

early_stopping(evaluate_metric: float, current_epoch: int, max_epoch: int, mode: str | None = 'min', patience: int | None = 20, skip_epoch: int | None = 5) Tuple[bool, int, float, list][source]#

Apply early stopping based on a monitored metric when training an LSTM neural network.

This method records evaluate_metric at each epoch. If the metric improves, it updates early_stop_metric, early_stop_init_mu_states, early_stop_init_var_states, early_stop_lstm_param, and optimal_epoch.

Sets stop_training to True when current_epoch reaches max_epoch, or when (current_epoch - optimal_epoch) >= patience.

When stop_training is True, sets mu_states = early_stop_init_mu_states, var_states = early_stop_init_var_states, and restores the LSTM parameters from early_stop_lstm_param.

Parameters:
  • current_epoch (int) – Current epoch

  • max_epoch (int) – Maximum number of epochs

  • evaluate_metric (float) – Current metric value for this epoch.

  • mode (Optional[str]) – Direction for early stopping: ‘min’ (default).

  • patience (Optional[int]) – Number of epochs without improvement before stopping. Defaults to 20.

  • skip_epoch (Optional[int]) – Number of initial epochs to ignore when looking for improvements. Defaults to 5.

Returns:

  • stop_training: True if training stops.

  • optimal_epoch: Epoch index at which the best metric value occurred.

  • early_stop_metric: Best evaluate_metric value.

  • early_stop_metric_history: History of evaluate_metric at all epochs.

Return type:

Tuple[bool, int, float, List[float]]

Examples

>>> model.early_stopping(evaluate_metric=mse, current_epoch=1, max_epoch=50)
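
A fuller sketch of an epoch loop (assuming train_set is a dictionary with keys ‘x’ and ‘y’; computing the metric with NumPy is illustrative, not required by the API):

>>> import numpy as np
>>> max_epoch = 50
>>> for epoch in range(1, max_epoch + 1):
...     mu_preds, std_preds, states = model.filter(train_set)
...     # Illustrative metric: mean squared one-step-ahead prediction error
...     mse = float(np.nanmean((mu_preds.flatten() - train_set["y"].flatten()) ** 2))
...     stop, best_epoch, best_metric, history = model.early_stopping(
...         evaluate_metric=mse, current_epoch=epoch, max_epoch=max_epoch
...     )
...     model.set_memory(states=model.states, time_step=0)
...     if stop:
...         break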
filter(data: Dict[str, ndarray], train_lstm: bool | None = True) Tuple[ndarray, ndarray, StatesHistory][source]#

Run the Kalman filter over an entire dataset, i.e., repeatedly apply the Kalman prediction and update steps over multiple time steps.

This function operates at the entire-dataset level. It repeatedly calls forward() and backward() from Model at the one-time-step level.

Parameters:
  • data (Dict[str, np.ndarray]) – Includes ‘x’ and ‘y’.

  • train_lstm (bool) – Whether to update LSTM’s parameter weights and biases. Defaults to True.

Returns:

A tuple containing:

  • mu_obs_preds (np.ndarray):

    The means for forecasts.

  • std_obs_preds (np.ndarray):

    The standard deviations for forecasts.

  • states:

    The history of hidden states over time.

Return type:

Tuple[np.ndarray, np.ndarray, StatesHistory]

Examples

>>> mu_preds_train, std_preds_train, states = model.filter(train_set)
forecast(data: Dict[str, ndarray]) Tuple[ndarray, ndarray, StatesHistory][source]#

Perform a multi-step-ahead forecast over an entire dataset by recursively making one-step-ahead predictions, i.e., repeatedly apply the Kalman prediction step over multiple time steps.

This function operates at the entire-dataset level. It repeatedly calls forward() from Model at the one-time-step level.

Parameters:

data (Dict[str, np.ndarray]) – A dictionary containing the key ‘x’ (input covariates); if ‘y’ (real observations) is present, it is not used.

Returns:

A tuple containing:

  • mu_obs_preds (np.ndarray):

    The means for forecasts.

  • std_obs_preds (np.ndarray):

    The standard deviations for forecasts.

  • states:

    The history of hidden states over time.

Return type:

Tuple[np.ndarray, np.ndarray, StatesHistory]

Examples

>>> mu_preds_val, std_preds_val, states = model.forecast(val_set)
forward(input_covariates: ndarray | None = None, mu_lstm_pred: ndarray | None = None, var_lstm_pred: ndarray | None = None) Tuple[ndarray, ndarray, ndarray, ndarray][source]#

Make a one-step-ahead prediction using the prediction step of the Kalman filter. If there are no input covariates for the LSTM, pass an empty np.ndarray. Internally calls forward() from common.

This function operates at the one-time-step level.

Parameters:
  • input_covariates (Optional[np.ndarray]) – Input covariates for LSTM at time t.

  • mu_lstm_pred (Optional[np.ndarray]) – Predicted mean from the LSTM at time t+1; provided when we do not want the LSTM to make predictions, but want to use LSTM predictions computed elsewhere.

  • var_lstm_pred (Optional[np.ndarray]) – Predicted variance from the LSTM at time t+1; provided when we do not want the LSTM to make predictions, but want to use LSTM predictions computed elsewhere.

Returns:

A tuple containing:

  • mu_obs_predict (np.ndarray):

    The predictive mean of the observation at t+1.

  • var_obs_predict (np.ndarray):

    The predictive variance of the observation at t+1.

  • mu_states_prior (np.ndarray):

    The prior mean of the hidden state at t+1.

  • var_states_prior (np.ndarray):

    The prior variance of the hidden state at t+1.

Return type:

Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]
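
Examples

A one-step-ahead prediction sketch; as noted above, an empty array is passed when the LSTM takes no input covariates:

>>> import numpy as np
>>> mu_obs, var_obs, mu_prior, var_prior = model.forward(np.empty(0))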

generate_time_series(num_time_series: int, num_time_steps: int, sample_from_lstm_pred=True, time_covariates=None, time_covariate_info=None, add_anomaly=False, anomaly_mag_range=None, anomaly_begin_range=None, anomaly_type='trend') ndarray[source]#

Generate synthetic time series data based on the model components, with optional synthetic anomaly injection.

Parameters:
  • num_time_series (int) – Number of independent series to generate.

  • num_time_steps (int) – Number of timesteps per generated series.

  • sample_from_lstm_pred (bool, optional) – If False, zeroes out LSTM-derived variance so that the generation ignores the LSTM uncertainty. Defaults to True.

  • time_covariates (np.ndarray of shape (num_time_steps, cov_dim), optional) – Time-varying covariates to include in generation. If provided, these will be standardized using time_covariate_info and passed through the model each step. Defaults to None.

  • time_covariate_info (dict, optional) – Required if time_covariates is not None. Must contain:

      • “initial_time_covariate” (np.ndarray): the starting covariate vector

      • “mu” (np.ndarray): means for standardization

      • “std” (np.ndarray): standard deviations for standardization

  • add_anomaly (bool, optional) – Whether to inject a synthetic anomaly into each series. Defaults to False.

  • anomaly_mag_range (tuple of float, optional) – (min, max) range for random anomaly magnitudes. Required if add_anomaly=True. Defaults to None.

  • anomaly_begin_range (tuple of int, optional) – (min, max) range of timestep indices at which anomaly may start. Required if add_anomaly=True. Defaults to None.

  • anomaly_type (str, optional) – Type of injected anomaly: “trend”: a growing linear drift after anomaly starts, “level”: a constant shift after anomaly starts. Defaults to “trend”.

Returns:

  • generated series (np.ndarray):

    Generated series with the shape (num_time_series, num_time_steps).

  • input_covariates (np.ndarray):

    The input covariates used.

  • anomaly magnitudes (List[float]):

    Anomaly magnitudes per series.

  • anomaly start timesteps (List[int]):

    Anomaly start timesteps per series.

Return type:

Tuple[np.ndarray, np.ndarray, List[float], List[int]]
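
Examples

A sketch with synthetic anomaly injection; the magnitude and start ranges are illustrative values:

>>> series, covariates, anomaly_mags, anomaly_starts = model.generate_time_series(
...     num_time_series=10,
...     num_time_steps=100,
...     add_anomaly=True,
...     anomaly_mag_range=(0.1, 0.5),
...     anomaly_begin_range=(30, 70),
... )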

get_dict() dict[source]#

Export model attributes into a serializable dictionary.

Returns:

Serializable model dictionary containing the necessary attributes.

Return type:

dict

Examples

>>> saved_dict = model.get_dict()
get_states_index(states_name: str)[source]#

Retrieve index of a state in the state vector.

Parameters:

states_name (str) – The name of the state.

Returns:

Index of the state, or None if not found.

Return type:

int or None

Examples

>>> lstm_index = model.get_states_index("lstm")
>>> level_index = model.get_states_index("level")
initialize_states_history()[source]#

Reinitialize prior, posterior, and smoothed values for hidden states in states with empty lists.
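
Examples

>>> model.initialize_states_history()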

initialize_states_with_smoother_estimates()[source]#

Set the hidden states mu_states and var_states using the smoothed estimates for the hidden states at the first time step t=1 stored in states. These new hidden states act as the initial hidden states at t=0 in the next epoch.
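
Examples

A sketch of use between epochs, after running the smoother:

>>> states = model.smoother()
>>> model.initialize_states_with_smoother_estimates()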

static load_dict(save_dict: dict)[source]#

Reconstruct a model instance from a saved dictionary.

Parameters:

save_dict (dict) – Dictionary containing saved model structure and parameters.

Returns:

An instance of Model generated from the input dictionary.

Return type:

Model

Examples

>>> saved_dict = model.get_dict()
>>> loaded_model = Model.load_dict(saved_dict)
lstm_train(train_data: Dict[str, ndarray], validation_data: Dict[str, ndarray], white_noise_decay: bool | None = True, white_noise_max_std: float | None = 5, white_noise_decay_factor: float | None = 0.9) Tuple[ndarray, ndarray, StatesHistory][source]#

Train the LstmNetwork component on the provided training set, then evaluate on the validation set. Optionally apply an exponential decay to the white noise standard deviation over epochs.

At the end of this function, set_memory is used to reset the memory to t=0.

Parameters:
  • train_data (Dict[str, np.ndarray]) – Dictionary with keys ‘x’ and ‘y’ for training inputs and targets.

  • validation_data (Dict[str, np.ndarray]) – Dictionary with keys ‘x’ and ‘y’ for validation inputs and targets.

  • white_noise_decay (bool, optional) – If True, apply an exponential decay on the white noise standard deviation over epochs, if a white noise component exists. Defaults to True.

  • white_noise_max_std (float, optional) – Upper bound on the white-noise standard deviation when decaying. Defaults to 5.

  • white_noise_decay_factor (float, optional) – Multiplicative decay factor applied to the white-noise standard deviation each epoch. Defaults to 0.9.

Returns:

A tuple containing:

  • mu_obs_preds (np.ndarray):

    The means for multi-step-ahead predictions for the validation set.

  • std_obs_preds (np.ndarray):

    The standard deviations for multi-step-ahead predictions for the validation set.

  • states:

    The history of hidden states over time.

Return type:

Tuple[np.ndarray, np.ndarray, StatesHistory]

Examples

>>> mu_preds_val, std_preds_val, states = model.lstm_train(train_data=train_set, validation_data=val_set)
rts_smoother(time_step: int, matrix_inversion_tol: float | None = 1e-12, tol_type: str | None = 'relative')[source]#

Apply the RTS smoothing equations for a specific time step. As a result, the smoothed estimates for the hidden states at that time step are updated in states.

This function operates at the one-time-step level. Internally calls rts_smoother() from common.

Parameters:
  • time_step (int) – Target smoothing index.

  • matrix_inversion_tol (float) – Numerical stability threshold for matrix pseudoinversion (pinv). Defaults to 1E-12.
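
Examples

A sketch of the backward smoothing pass that smoother() runs internally; num_time_steps and the loop bounds are illustrative assumptions:

>>> for t in reversed(range(num_time_steps - 1)):
...     model.rts_smoother(time_step=t)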

set_memory(states: StatesHistory, time_step: int)[source]#

Set mu_states, var_states, and lstm_output_history with smoothed estimates from a specific time step stored in states. This prepares for the next analysis by ensuring the continuity of these variables, e.g., if the next analysis starts from time step t, the memory should be set to time step t.

If t=0, also set the means and variances for the cell and hidden states of lstm_net to zeros. If t is not 0, the cell and hidden states must be set externally using Model.lstm_net.set_lstm_states(lstm_cell_hidden_states).

Parameters:
  • states (StatesHistory) – Full history of hidden states over time.

  • time_step (int) – Index of timestep to restore.

Examples

>>> # If the next analysis starts from the beginning of the time series
>>> model.set_memory(states=model.states, time_step=0)
>>> # If the next analysis starts from t = 200
>>> model.set_memory(states=model.states, time_step=200)
set_states(new_mu_states: ndarray, new_var_states: ndarray)[source]#

Set new values for the hidden states, i.e., mu_states and var_states.

Parameters:
  • new_mu_states (np.ndarray) – Mean values to be set.

  • new_var_states (np.ndarray) – Covariance matrix to be set.
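
Examples

A minimal sketch; here the new values are illustratively taken as copies of the current states:

>>> new_mu = model.mu_states.copy()
>>> new_var = model.var_states.copy()
>>> model.set_states(new_mu_states=new_mu, new_var_states=new_var)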

smoother(matrix_inversion_tol: float | None = 1e-12, tol_type: str | None = 'relative') StatesHistory[source]#

Run the Kalman smoother over an entire time series, i.e., repeatedly apply the RTS smoothing equations over multiple time steps.

This function operates at the entire-dataset level. It repeatedly calls rts_smoother() from Model at the one-time-step level.

Parameters:

matrix_inversion_tol (float) – Numerical stability threshold for matrix pseudoinversion (pinv). Defaults to 1E-12.

Returns:

states: The history of hidden states over time.

Return type:

StatesHistory

Examples

>>> mu_preds_train, std_preds_train, states = model.filter(train_set)
>>> states = model.smoother()
white_noise_decay(epoch: int, white_noise_max_std: float, white_noise_decay_factor: float)[source]#

Apply exponential decay to white noise standard deviation over epochs, and modify the variance for the white noise component in process_noise_matrix. This decaying noise structure is intended to improve the training performance of TAGI-LSTM.

Parameters:
  • epoch (int) – Current training epoch.

  • white_noise_max_std (float) – Maximum allowed noise std.

  • white_noise_decay_factor (float) – Factor controlling decay rate.
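
Examples

A sketch using the default decay settings from lstm_train:

>>> model.white_noise_decay(epoch=10, white_noise_max_std=5, white_noise_decay_factor=0.9)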