Avoiding Timeseries Generator ‘Version Hell’

A Custom Sequence Generator for Keras

The need for a custom sequence generator for Keras arises because the world of AI is constantly evolving. The developers of building blocks for Machine Learning (ML) models, such as the TensorFlow and Keras libraries, keep extending and changing them. TensorFlow is an open-source deep-learning framework developed by the Google Brain team. Keras acts as a high-level API, a user-friendly interface on top of TensorFlow (and, since Keras 3, also on top of JAX and PyTorch). The name Keras is Greek for ‘horn’: the tip of the iceberg, so to speak ;D.

The Keras Timeseries Generator

Keras 3, released at the end of 2023 and the default Keras since TensorFlow 2.16 (2024), is a largely rewritten, multi-backend version of the library that can run on TensorFlow, JAX or PyTorch. In terms of versions, this is the transition from Keras 2.x to Keras 3. For modelers working on predictive timeseries LSTM projects in particular, this means a ‘version hell’ of Python and R libraries. Several heavily used modules and functions have changed drastically in Keras 3, such as keras.preprocessing.image and keras.models.load_model. The TimeseriesGenerator class (keras.preprocessing.sequence.TimeseriesGenerator) has even disappeared from Keras 3 altogether.

However, that does not mean that we must either stick to Keras 2 and the ‘old’ TimeseriesGenerator, or switch right away to the tf.data.Dataset-based approach of Keras 3. If we instead develop our own generator class in a smart way, our projects can remain more or less independent of the Keras version.

What is a Sequence Generator?

What do we mean by a generator? A generator must create time series ‘sequences’. An LSTM model expects data in 3 dimensions: batch_size, sequence_length and n_features. In our setup the series is used to predict itself, so the features are also the targets and n_targets would be just as apt a name. In predictive models, ‘predictors’ have to predict a ‘target’. In time series of financial assets, this primarily means that the sequence of past prices must predict the future price. A time series generator with a sequence length of 3 and a target of 1 therefore composes sequences of 3 consecutive values, each of which must predict the 4th value; these sequences are then grouped into batches. For a dataset such as [1, 2, 3, 4, 5, 6, 7] with length=3, this yields:

X (input sequence)    y (target)
[1, 2, 3]             4
[2, 3, 4]             5
[3, 4, 5]             6
[4, 5, 6]             7

The workflow is then as follows. First, create a sequence of the specified length at position 0. Then shift one position to the right (i.e. to position 1) and create the next sequence. Repeat this until all possible positions have been used. Every sequence, together with its associated target, is now available for training the model.
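The sliding-window procedure above can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not the generator itself; the helper name make_windows is hypothetical. It reproduces the [1, 2, 3, 4, 5, 6, 7] example and then adds the third (feature) dimension an LSTM expects.

```python
import numpy as np

def make_windows(series, length):
    """Hypothetical helper: every window of `length` values plus the value right after it."""
    series = np.asarray(series)
    n_windows = len(series) - length  # the last window must still have a target after it
    X = np.array([series[i : i + length] for i in range(n_windows)])
    y = np.array([series[i + length] for i in range(n_windows)])
    return X, y

X, y = make_windows([1, 2, 3, 4, 5, 6, 7], length=3)
print(X.tolist())  # [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
print(y.tolist())  # [4, 5, 6, 7]

# An LSTM expects 3D input: (samples, sequence_length, n_features);
# a batch is simply a slice of batch_size samples from this array
X_3d = X.reshape(X.shape[0], X.shape[1], 1)
print(X_3d.shape)  # (4, 3, 1)
```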

Using the Timeseries Generator

The Keras TimeseriesGenerator was very convenient when working with predictive timeseries models. With a single statement you could turn your data into the timeseries sequences needed to present it to your model as a supervised learning problem. The following code is taken from LSTM Models: Training the Model. We will adapt it to use our new custom sequence generator for Keras.

from keras.preprocessing.sequence import TimeseriesGenerator  # Keras 2.x only; removed in Keras 3

# create sequences: LSTMs expect data in 3 dimensions: [batch_size, sequence_length, n_features]
# create a sequence of the specified length at position 0, shift one position to the right (i.e. to 1) and create another sequence
# the process is repeated until all possible positions are used
self.train_generator = TimeseriesGenerator(self.train, self.train, length=self.SEQUENCE_SIZE, batch_size=self.BATCH_SIZE)
self.vali_generator  = TimeseriesGenerator(self.vali,  self.vali,  length=self.SEQUENCE_SIZE, batch_size=self.BATCH_SIZE)
self.test_generator  = TimeseriesGenerator(self.test,  self.test,  length=self.SEQUENCE_SIZE, batch_size=self.BATCH_SIZE)
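For comparison, the replacement Keras 3 itself offers is the tf.data-based utility keras.utils.timeseries_dataset_from_array (reachable as tf.keras.utils.timeseries_dataset_from_array in recent TensorFlow releases). The following is a minimal sketch on a toy series, assuming TensorFlow is installed; the constants only mirror SEQUENCE_SIZE and BATCH_SIZE from the excerpt above, and their values are illustrative.

```python
import numpy as np
import tensorflow as tf

SEQUENCE_SIZE = 3
BATCH_SIZE = 2

# Toy univariate series with shape (N, n_features) = (7, 1)
series = np.arange(1, 8, dtype="float32").reshape(-1, 1)

# Window i covers series[i : i + SEQUENCE_SIZE]; its target is series[i + SEQUENCE_SIZE].
# Leaving the last row out of `data` yields N - SEQUENCE_SIZE windows, the same number
# of samples as TimeseriesGenerator(series, series, length=SEQUENCE_SIZE).
dataset = tf.keras.utils.timeseries_dataset_from_array(
    data=series[:-1],
    targets=series[SEQUENCE_SIZE:],
    sequence_length=SEQUENCE_SIZE,
    batch_size=BATCH_SIZE,
)

for X, y in dataset:
    print(X.shape, y.shape)  # X: (2, 3, 1), y: (2, 1) for both batches
```

The result is a tf.data.Dataset rather than a Sequence object, which is exactly the switch a custom, Sequence-based generator lets us postpone.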

Inheriting from Keras’ Sequence class

Both Keras 2.x and Keras 3 have a built-in base class for exactly this purpose: keras.utils.Sequence (in Keras 3 it is an alias of keras.utils.PyDataset). This makes it possible to define our CustomSequenceGenerator class in a ‘smart’ way by inheriting from keras.utils.Sequence. This way we keep the connection with the past, Keras 2.x with the traditional TimeseriesGenerator, while remaining compatible with Keras 3.
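To see what inheriting from keras.utils.Sequence actually requires, here is a minimal sketch of a subclass that only serves pre-built (X, y) arrays batch by batch. The class name ArrayBatches and the toy data are hypothetical; the point is the protocol: implement __len__() and __getitem__(), and call the base constructor so that Keras 3 (where Sequence is PyDataset) is satisfied. The loop at the end is only a rough imitation of how fit() walks over a Sequence during one epoch.

```python
import math

import numpy as np
import tensorflow as tf

class ArrayBatches(tf.keras.utils.Sequence):
    """Hypothetical minimal Sequence: serves pre-built (X, y) arrays batch by batch."""

    def __init__(self, X, y, batch_size, **kwargs):
        # Keras 3 expects the base constructor to be called;
        # in Keras 2.x this is a harmless no-op when no kwargs are passed
        super().__init__(**kwargs)
        self.X, self.y, self.batch_size = X, y, batch_size

    def __len__(self):
        # number of batches per epoch, last partial batch included
        return math.ceil(len(self.X) / self.batch_size)

    def __getitem__(self, index):
        sl = slice(index * self.batch_size, (index + 1) * self.batch_size)
        return self.X[sl], self.y[sl]

X = np.random.rand(10, 3, 1).astype("float32")  # 10 samples, 3 time steps, 1 feature
y = np.random.rand(10, 1).astype("float32")
seq = ArrayBatches(X, y, batch_size=4)

# Rough imitation of one epoch: ask len(), fetch every batch, then call the epoch hook
for i in range(len(seq)):
    Xb, yb = seq[i]
    print(i, Xb.shape, yb.shape)  # (4, 3, 1) (4, 1) for batches 0 and 1, (2, 3, 1) (2, 1) for batch 2
seq.on_epoch_end()  # base-class hook; does nothing unless overridden

# fit() accepts the Sequence directly and takes the number of steps from len(seq)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3, 1)),
    tf.keras.layers.LSTM(4),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(seq, epochs=1, verbose=0)
```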

A big advantage of this approach is that Keras can automatically determine the number of steps per training epoch and for validation: by calling the __len__() method of our generator, Keras always ‘knows’ how many batches our train, validation and test datasets contain, and applies this in the model's fit(), evaluate() and predict() methods. Parameters such as steps_per_epoch and validation_steps therefore no longer have to be computed by hand.
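The batch count Keras obtains from __len__() is simple arithmetic. Below is a sketch with hypothetical helper names: n_batches follows the window logic of the CustomSequenceGenerator class (one window per start index that still has a target after it), and n_batches_timeseries_generator reproduces the batch count of the Keras 2.x TimeseriesGenerator with its default stride of 1, for comparison.

```python
import math

def n_batches(n_rows, length, batch_size, drop_last):
    """Hypothetical helper mirroring CustomSequenceGenerator.__len__ for n_rows of data."""
    n_windows = n_rows - length  # one window per start index that still has a target
    if drop_last:
        return n_windows // batch_size  # an incomplete final batch is skipped
    return math.ceil(n_windows / batch_size)

def n_batches_timeseries_generator(n_rows, length, batch_size):
    """Batch count of Keras 2.x TimeseriesGenerator(data, data, length, batch_size), stride=1."""
    start_index, end_index = length, n_rows - 1
    return (end_index - start_index + batch_size) // batch_size

print(n_batches(100, 10, 16, drop_last=True))       # 5
print(n_batches(100, 10, 16, drop_last=False))      # 6
print(n_batches_timeseries_generator(100, 10, 16))  # 6
```

The comparison exposes one practical difference: with its default drop_last=True the custom class skips an incomplete final batch, whereas TimeseriesGenerator keeps it; passing drop_last=False gives the same number of batches.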

The CustomSequenceGenerator Class

# create sequences: LSTMs expect data in 3 dimensions: [batch_size, sequence_length, n_features]
# create a sequence of the specified length at position 0, shift one position to the right (i.e. to 1) and create another sequence
# the process is repeated until all possible positions are used
# up to Keras 2.x: from keras.preprocessing.sequence import TimeseriesGenerator
import tensorflow as tf
import numpy as np

class CustomSequenceGenerator(tf.keras.utils.Sequence):
    """
    Custom generator to mimic the Keras TimeseriesGenerator: predicts the value at t+1
    from the sequence of `length` values that precedes it.
    Args:
        data (np.ndarray): 2D array of shape (N, num_features)
        length (int): length of each input sequence
        batch_size (int): number of sequences per batch
        shuffle (bool): whether to shuffle the sequence start indices at each epoch end
        drop_last (bool): whether to drop the last batch when it is incomplete
            (TimeseriesGenerator keeps it; drop_last=False matches its batch count)
        verbose (bool): print batch shapes for debugging
        **kwargs: passed to the base class (in Keras 3 e.g. workers, use_multiprocessing)
    Returns (from __getitem__):
        Tuple (X, y) where:
            X.shape = (batch_size, length, num_features)
            y.shape = (batch_size, num_features)
        __len__() = number of batches per epoch (replaces: calc_steps())
    """
    def __init__(self, data, length, batch_size, shuffle=False, drop_last=True, verbose=False, **kwargs):
        # Keras 3 (where Sequence is an alias of PyDataset) expects the base constructor
        # to be called; in Keras 2.x this is a harmless no-op
        super().__init__(**kwargs)
        data = np.asarray(data)  # accept array-likes (e.g. DataFrames) and index them positionally
        assert data.ndim == 2, f"Expected 2D array, got shape {data.shape}"
        self.data = data
        self.length = length
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.drop_last = drop_last
        self.verbose = verbose
        # one start index per window that still has a target value right after it
        self.indices = np.arange(len(data) - length)
        self.on_epoch_end()

    def __len__(self):
        # required by keras.utils.Sequence: number of batches per epoch (used as steps_per_epoch)
        n = len(self.indices)
        return n // self.batch_size if self.drop_last else int(np.ceil(n / self.batch_size))

    def __getitem__(self, index):
        # select the start indices of the windows in this batch
        batch_indices = self.indices[index * self.batch_size : (index + 1) * self.batch_size]
        batch_X = np.array([self.data[i : i + self.length] for i in batch_indices])
        batch_y = np.array([self.data[i + self.length] for i in batch_indices])
        batch_y = batch_y.reshape(-1, self.data.shape[1])  # Ensure shape = (batch_size, n_features)
        if self.verbose:
            print(f"[SEQUENCE] Batch {index}: X {batch_X.shape}, y {batch_y.shape}")
        return batch_X, batch_y

    def on_epoch_end(self):
        # called by Keras after every epoch; reshuffles the window order if requested
        if self.shuffle:
            np.random.shuffle(self.indices)