LSTM models: Training the Model

Prerequisites: Training Data & Optimized Model

Everything starts with data! For this we use Binance or Yahoo Finance. Earlier posts described how to gather training data (see for instance this post) and preprocess it into a standardized Pandas DataFrame suited for model training. We also showed a way to use this data to optimize the parameters of an LSTM model for training and forecasting. Once the best parameters have been established, we save them to disk for later reuse.

Now it’s time to finally train our model with these saved parameters and a completely up-to-date dataset. The more complex the model and the bigger the dataset, the more time-consuming this process becomes. Luckily there is no need to repeat this step very often. Once we are content with the model, we save it to disk for later use.

Model Training: preprocessing, building & training, displaying results, saving the model

Below is the complete code of the Python TrainLSTM class. We first load the libraries needed for data handling and plotting the results. Note: we recently updated the class to use our generic time-series sequencer instead of Keras’ old TimeseriesGenerator.

Because training is time- and resource-intensive and we want the machine and its user interface to remain responsive, the class runs in its own thread (it is a QThread) and uses a callback (in a helper class) together with signals to report progress to the main thread.

# Copyright (c) 2025 Hans De Weme
# Licensed under the MIT License (https://opensource.org/licenses/M
# Class: TrainLSTM
# Purpose: training a LSTM model using pre-saved hyperparameters on a timeseries dataframe containing the complete price history of a financial asset
#          plotting the results of the training and predictions 
"""
Imports the necessary libraries such as NumPy, Pandas, Scikit-learn, TensorFlow, Keras and Plotly.
Receives the price history of the asset as a dataframe.
Normalizes the price values between 0 and 1 using Scikit-learn's MinMaxScaler.
Divides the data into training, validation and test sets and sequences them with a generic alternative for Keras' TimeseriesGenerator.
Defines a Bidirectional LSTM model with five LSTM layers and one Dense layer.
Trains the LSTM model on the training set, using early stopping to prevent overfitting.
Plots the training loss.
Uses the trained LSTM model to make future price predictions.
Plots the predicted prices on a graph.
Prints the predicted prices.
"""
import os
import json
from   pathlib import Path   
from   datetime import datetime
import numpy as np
import pandas as pd
from   pandas.tseries.offsets  import DateOffset
from   sklearn.preprocessing   import MinMaxScaler
from   sklearn.model_selection import train_test_split
import tensorflow as tf
from   tensorflow      import keras
from   custom_keras    import CustomSequenceGenerator
from   keras.layers    import Bidirectional, Dense, LSTM, Dropout
from   keras.callbacks import EarlyStopping
from   keras.callbacks import LearningRateScheduler
from   PyQt6.QtWidgets import QFileDialog
from   PyQt6.QtCore    import QThread, pyqtSignal, pyqtSlot
import plotly.graph_objs as go
import plotly.io as pio
import warnings
warnings.filterwarnings("ignore")

class ProgressCallback(keras.callbacks.Callback):                       # Create a custom Keras Callback class for epoch end updates
    def __init__(self, progress_signal, epochs):
        super().__init__()
        self.progress_signal = progress_signal
        self.epochs = epochs

    def on_epoch_end(self, epoch, logs=None):
        progress = (epoch + 1) / self.epochs * 100
        message = f"Epoch {epoch + 1}/{self.epochs} completed - {progress:.2f}% (often a lot less needed, due to fast learning)"
        self.progress_signal.emit(message)                              # Emit the message through the signal

class TrainLSTM(QThread):       
    progress_signal     = pyqtSignal(str)                               # Signal to communicate progress (string message) back to the main thread        
    request_save_signal = pyqtSignal(str)                               # Signal to request saving the model
    response_signal     = pyqtSignal(bool)                              # Signal to receive the user's response        

    def __init__(self, asset, data, set, update_callback, parent=None): 
        super().__init__()                                              # necessary for QObject, needed for pyqtSignal  
        self.parent = parent
        self.update_callback = update_callback
        self.progress_callback = ProgressCallback(progress_signal=self.progress_signal, epochs=50)
        self.progress_signal.connect(parent.set_status_message)
        self.save_model_flag = None                                     # Variable to store the user's response (Yes/No)
        self.suc = True
        self.df  = pd.DataFrame(data)
        if self.df.empty:
            print('* * * Time Series Data missing  * * * ')
            self.suc = False
        self.MARKT  = asset                                             # USDT spot markt coin-pair to process 
        self.STOCK  = False
        self.settings = set
        stock_cols = 8
        if 'stock_columns' in self.settings:
            value = self.settings['stock_columns'] 
            if isinstance(value, int):
                stock_cols = value
        num_columns = self.df.shape[1]                
        if(num_columns < stock_cols):                                  # the asset is stock, following actions not needed!
            self.STOCK = True
        self.BATCH_SIZE    = 128                                       # number of sequences in a training batch (a power of 2 is customary, not required)
        self.SEQUENCE_SIZE = 36                                        # number of datapoints in a training sequence (we predict 12 hours, 0.5 day, so let's use a size of 1.5 days)
        self.N_INPUT = 12                                              # number of new datapoints to predict

Processing

Processing takes place in four steps. First we get the data ready: splitting it into separate training, validation and test sets, normalizing the numeric values and creating the sequences needed for processing time series. We then build the LSTM model layers using the previously saved parameters. The real action occurs during the actual training of this model. We use ‘early stopping’ to end the training when no further improvement occurs for a number of epochs. When the model is ready, we use it to make predictions, compare these to the test set and plot the results. Finally, we choose whether to save the model to disk.
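
To make the data preparation step more concrete, here is a minimal NumPy sketch of a chronological train/validation/test split followed by sliding-window sequencing. It is not the CustomSequenceGenerator used by the class (that code is not shown in this post); the helper names chrono_split and make_windows are assumptions for illustration, and the rounding of the split sizes may differ slightly from scikit-learn's.

```python
import numpy as np

def chrono_split(values, test_fraction=0.2):
    """Split an array in time order: the last `test_fraction` becomes the hold-out part."""
    n_test = int(round(len(values) * test_fraction))
    return values[:-n_test], values[-n_test:]

def make_windows(values, sequence_size):
    """Build overlapping input windows and the value that follows each window."""
    n_windows = len(values) - sequence_size
    x = np.stack([values[i:i + sequence_size] for i in range(n_windows)])
    y = values[sequence_size:]
    return x, y

prices = np.linspace(0.0, 1.0, 1000).reshape(-1, 1)   # stand-in for a scaled 'close' column
rest, test = chrono_split(prices)                      # 80% / 20% of the full series
train, vali = chrono_split(rest)                       # 64% / 16% of the full series
x_train, y_train = make_windows(train, sequence_size=36)

print(len(train), len(vali), len(test))                # 640 160 200
print(x_train.shape, y_train.shape)                    # (604, 36, 1) (604, 1)
```

Each window of 36 points is paired with the single point that follows it, which is why a set of 640 rows yields 604 training samples.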

The run() method, which we call from the Main Window, orchestrates the processing.

    def run(self):
        try:
            self.pre_process()
            self.do_trainLSTM()
            self.do_predictLSTM()
            self.request_save_signal.emit('LSTM')                       # Emit the signal to request model saving
            while self.save_model_flag is None:                         # Wait for the user's response
                self.msleep(100)                                        # Wait until the response is set            
            if self.save_model_flag:
                self.progress_signal.emit("Saving the model to disk...")
                self.save_model()
            else:
                self.progress_signal.emit("Model save skipped.") 
            self.progress_signal.emit("Training and Prediction completed. Ready!")          # Emit finished signal
        except Exception as e:
            self.update_callback(f"Error: {str(e)}")
            
    @pyqtSlot(bool)
    def set_save_model_flag(self, flag):                                # Slot to receive the user's response
        self.save_model_flag = flag

    def save_model(self):
        models_dir = Path(self.settings['models']).resolve()            # directory for saved models, taken from the settings
        filename   = self.MARKT + '_LSTM_' + self.time_stamp()
        full_path  = models_dir / filename
        self.model.save(full_path)
        # note: Keras 3 expects a file name ending in '.keras' (or '.h5'), e.g. self.model.save(str(full_path) + '.keras')
        self.progress_signal.emit("Model saved: " + str(full_path))
            
    def time_stamp(self):                                               # create timestamp as string: YYYYMMDDHHMM
        return datetime.now().strftime("%Y%m%d%H%M")

    def pre_process(self):
        TRAIN_SPLIT   = 0.2                                             # size of test data set apart from train data
        train_size = int(len(self.df) * (1-TRAIN_SPLIT)) 
        self.test_df = self.df.iloc[train_size:]
        plot_data = [go.Scatter(x=self.test_df.index, y=self.test_df['close'], name='price' )]
        plot_layout = go.Layout(title=self.MARKT+' Price Info Testset')
        fig = go.Figure(data=plot_data, layout=plot_layout)
        pio.show(fig)             
        self.total  = self.df                                           # normalize price values: total = df scaled 
        self.scaler = MinMaxScaler()
        self.scaler.fit(self.total)
        self.total = self.scaler.transform(self.total)
        print("\n* * * Normalized Data set info: {}".format(self.total))    # split total data in train - test sets
        self.train, self.test = train_test_split(self.total, test_size=TRAIN_SPLIT, shuffle=False)
        self.train, self.vali = train_test_split(self.train, test_size=TRAIN_SPLIT, shuffle=False) 
        # create sequences: LSTMs expect data in 3 dimensions: [batch_size, sequence_length, n_targets]
        # create a sequence of the specified length at position 0, shift one position to the right (e.g. 1) and create another sequence
        # the process is repeated until all possible positions are used
        self.train_generator = CustomSequenceGenerator(self.train, self.SEQUENCE_SIZE, self.BATCH_SIZE, shuffle=False)        
        self.vali_generator  = CustomSequenceGenerator(self.vali,  self.SEQUENCE_SIZE, self.BATCH_SIZE, shuffle=False)
        self.test_generator  = CustomSequenceGenerator(self.test,  self.SEQUENCE_SIZE, self.BATCH_SIZE, shuffle=False)
        print("\n* * * Preprocessed train set info: {}".format(self.train))
        print("\n* * * Preprocessed Test set and testset size: {}".format(self.test))
        print(len(self.test))
    
    def load_settings(self, pad):
        if not Path(pad).exists():
            print(f"File '{pad}' does not exist.")
            return False
        try:
            with open(pad, 'r') as file:
                self.hp = json.load(file)
                print("\nSettings File loaded successfully.")
            return True
        except json.JSONDecodeError:
            print("\nError: Settings File exists but contains invalid JSON.")
            return False
        except Exception as e:
            print(f"\nAn unexpected error occurred trying to read Settings File: {e}")
            return False

Training the Model

Once the pre-saved hyperparameters are loaded, they are used for training the model; if no such settings are found, the model is trained with more or less standard settings that have proved useful in practice.

    def do_trainLSTM(self):
        # replaced static learning rate for the optimizer = 0.001 with dynamic lr_schedule callback
        # activation:  'linear', 'elu', 'relu' or 'tanh': elu and tanh most used for crypto or stock; tanh often gives best results
        # loss function: 'mse' or 'mae'
        # optimizer: 'Adam', Nadam, RMSprop, SGD
        LOSS  = 'mse'                                                   # default values for training, replace with settings from hyperparameters tuning
        ACTIVATION  = 'elu'
        OPTIMIZER   = 'Nadam'                                                 
        self.N_TARGETS  = 1                                            # only 1 feature to predict: price (close)  
        hps = False
        pad = self.settings['models']         
        pad = Path(pad)
        dir = str(pad.resolve())    
        l0_units   = 128
        l1_units   = 128
        l2_units   = 192
        l3_units   = 32
        l4_units   = 128
        l0_drop    = 0.5
        l1_drop    = 0.1
        l2_drop    = 0.5
        l3_drop    = 0.3
        l4_drop    = 0.1
        loss       = LOSS
        opti       = OPTIMIZER
        activation = ACTIVATION                                                         
        params = [file for file in os.listdir(dir) if file.startswith('LSTM_hp')]
        if not params:
            print('No saved LSTM hyperparameter file(s) found. Train model using default values!')
        else:
            file_path, _ = QFileDialog.getOpenFileName(None, "Select a LSTM hyperparameter file", dir,"HP Files(LSTM_hp*.json)")
            if file_path:
                if self.load_settings(file_path) == True:
                    if 'units' in self.hp:
                        l0_units   = self.hp['units']
                    if 'dropout_1' in self.hp:
                        l0_drop    = self.hp['dropout_1']
                    if 'lstm_0_units' in self.hp:
                        l1_units   = self.hp['lstm_0_units']
                    if 'dropout_2' in self.hp:
                        l1_drop    = self.hp['dropout_2']
                    if 'lstm_1_units' in self.hp:
                        l2_units   = self.hp['lstm_1_units']
                    if 'dropout_3' in self.hp:
                        l2_drop    = self.hp['dropout_3']     
                    if 'lstm_2_units' in self.hp:
                        l3_units   = self.hp['lstm_2_units'] 
                    if 'dropout_4' in self.hp:
                        l3_drop    = self.hp['dropout_4']                        
                    if self.hp.get('n_layers') == 2 and 'lstm_1_units' in self.hp and 'dropout_3' in self.hp:   # guard against missing keys
                        l3_units   = self.hp['lstm_1_units'] 
                        l3_drop    = self.hp['dropout_3']                                         
                    if 'lstm_final_units' in self.hp:
                        l4_units   = self.hp['lstm_final_units']
                    if 'dropout_last' in self.hp:
                        l4_drop    = self.hp['dropout_last']
                    if 'loss' in self.hp:
                        loss       = self.hp['loss']
                    if 'optimizer' in self.hp:
                        opti       = self.hp['optimizer']
                    if 'activation' in self.hp:
                        activation = self.hp['activation']                    
            else:
                print("No file selected")
        # learning-rate schedule: keep the initial rate for the first 10 epochs, then decay it by a factor exp(-0.1) per epoch
        def scheduler(epoch, lr):
            if epoch < 10:
                return lr
            else:
                return float(lr * np.exp(-0.1))                         # return a plain Python float, as LearningRateScheduler expects
        lr_schedule = LearningRateScheduler(scheduler)
        # EarlyStopping stops the training when the validation loss has not improved for 'patience' epochs and restores the best weights
        earlystop   = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
        progress_callback = ProgressCallback(progress_signal=self.progress_signal, epochs=50)
        
        self.model = keras.Sequential()
        self.model.add(Bidirectional(LSTM(int(l0_units), activation=activation, return_sequences=True, input_shape=(self.SEQUENCE_SIZE, self.N_TARGETS))))
        self.model.add(Dropout(rate=(l0_drop)))
        self.model.add(Bidirectional(LSTM((int(l1_units)), return_sequences=True)))
        self.model.add(Dropout(rate=l1_drop))
        self.model.add(Bidirectional(LSTM(int(l2_units), return_sequences=True)))
        self.model.add(Dropout(rate=l2_drop))
        self.model.add(Bidirectional(LSTM(int(l3_units), return_sequences=True)))
        self.model.add(Dropout(rate=l3_drop))
        self.model.add(Bidirectional(LSTM(int(l4_units), return_sequences=False)))
        self.model.add(Dropout(rate=l4_drop))
        self.model.add(Dense(units=self.N_TARGETS)) 
        self.model.compile(opti, loss) 
        
        try:
            history = self.model.fit(self.train_generator, epochs=50, validation_data=self.vali_generator, callbacks=[earlystop, lr_schedule, progress_callback], verbose=1)
        except Exception as e:
            error_message = f"Error during training: {str(e)}"                              # Print the error message
            print(error_message)
            raise RuntimeError(error_message) from e                                        # stop here: 'history' is undefined; run() reports the error
        self.update_callback("Training completed!")
        
        print("\n* * * Model history: {}".format(history.history.keys()))                   # plot the result loss
        hist = pd.DataFrame(history.history)
        hist['epoch'] = history.epoch
        plot_data = [go.Scatter(x=hist['epoch'], y=hist['loss'], name='loss' ), go.Scatter(x=hist['epoch'], y=hist['val_loss'], name='val_loss')]
        plot_layout = go.Layout(title='Training loss')
        fig = go.Figure(data=plot_data, layout=plot_layout)
        pio.show(fig)

Effectiveness of Training

Here is an overview of how the training loss develops over the epochs.

training-loss

Predicting

After training the model on the training data, it is then used to predict the price development for the test period. The prediction is then matched against the actual test data.

    def do_predictLSTM(self):
        # because TimeseriesGenerator knows its own length and our generic code doesn't provide this, we must manually specify the number of steps
        # (number of batches = len(generator))
        # note: always use integer division //, not /, when computing steps: you cannot have "half" a step and Keras expects an integer number of steps
        self.test_generator  = CustomSequenceGenerator(self.test,  self.SEQUENCE_SIZE, self.BATCH_SIZE, shuffle=False)
        result=self.model.evaluate(self.test_generator)                                     # evaluate against the test set data
        print("\n* * * Evaluate training against the testset, results: {}".format(result))
        # Inspect batches (no len or indexing used!)
        for i, (x, y) in enumerate(self.test_generator):
            print(f"Asset: {self.MARKT} - Batch {i} - Input shape: {x.shape}, Output shape: {y.shape}")        
        print(f"Model output shape: {self.model.output_shape}")

        self.test_generator = CustomSequenceGenerator(self.test,  self.SEQUENCE_SIZE, self.BATCH_SIZE, shuffle=False)
        y_hat = self.model.predict(self.test_generator)                                     # test the model against original test set data 

        assert y_hat.shape[1] == self.N_TARGETS, "Prediction shape mismatch before inverse transform."
        y_hat_inverse = self.scaler.inverse_transform(y_hat)                                # input is scaled, reverse the output
        print("\nShape of test_df['close']: ", self.test_df['close'].shape)
        print("\nShape of y_hat_inverse: ", y_hat_inverse.shape)
        aligned_test_df = self.test_df.iloc[self.SEQUENCE_SIZE:]                            # Skip the initial points in test_df to align with y_hat_inverse
        if len(y_hat_inverse) < len(aligned_test_df):
            aligned_test_df = aligned_test_df.iloc[:len(y_hat_inverse)]
        plot_data = [
            go.Scatter(x=aligned_test_df.index, y=aligned_test_df['close'], mode='lines', name='Actual', line=dict(color='green')),
            go.Scatter(x=aligned_test_df.index, y=y_hat_inverse.flatten(), mode='lines', name='Predicted', line=dict(color='red'))
        ]
        plot_layout = go.Layout(title=self.MARKT+' Testset - Prediction', xaxis_title='Time', yaxis_title='Price')
        fig = go.Figure(data=plot_data, layout=plot_layout)                                 # take the predictions and plot the result
        fig.show()

        future_df = self.make_future_forecast(model=self.model, df=self.df, scaler=self.scaler, n_input=self.SEQUENCE_SIZE, n_steps=self.N_INPUT, n_targets=self.N_TARGETS)
        plot_data = [go.Scatter(x=future_df.index, y=future_df['Prediction'], name='Forecast', line=dict(color='green'))]
        layout = go.Layout(title=self.MARKT + ' Price Projection', width=1200, height=800)
        fig = go.Figure(data=plot_data, layout=layout)
        fig.show()        
        print(future_df)
        
    def make_future_forecast(self, model, df, scaler, n_input, n_steps, n_targets=1):
        """
        Predict n future steps from the end of df using the trained model.
        Args:
            model: Trained LSTM model
            df: Original unscaled dataframe (e.g. self.df)
            scaler: Previously fit MinMaxScaler
            n_input: Number of timesteps per input sequence
            n_steps: Number of steps to forecast
            n_targets: Number of output features (default=1)
        Returns:
            DataFrame with datetime index and predicted values
        """
        last_sequence = df[-n_input:]                                                               # Extract the last known input window (unscaled)                                                         
        last_scaled = scaler.transform(last_sequence)
        sequence = last_scaled.reshape((1, n_input, n_targets))

        pred_list = []
        for _ in range(n_steps):
            prediction = model.predict(sequence, verbose=0)
            pred_list.append(prediction[0])                                                         # Store raw prediction
            sequence = np.concatenate([sequence[:, 1:, :], prediction[:, np.newaxis, :]], axis=1)   # Update input sequence
        pred_array = np.array(pred_list)                                                            # Inverse transform the predictions
        pred_unscaled = scaler.inverse_transform(pred_array)                                    
        last_timestamp = df.index[-1]                                                               # Build a future datetime index            
        future_index = [last_timestamp + pd.DateOffset(hours=i + 1) for i in range(n_steps)]
        forecast_df = pd.DataFrame(pred_unscaled, index=future_index, columns=['Prediction'])
        return forecast_df

Results of Training

Here is the prediction on the test set for the same training run shown above, plotted against the actual prices.

actual-predicted

Finally we make a forecast beyond the test-set with the method make_future_forecast.

forecast

Functional and Technical Documentation for TrainLSTM Class

1. Overview

The TrainLSTM class is a PyQt6-based implementation designed to train a Long Short-Term Memory (LSTM) model on a time-series dataset representing the price history of a financial asset. The class includes functionality for:

  • Preprocessing and normalizing data.
  • Training a bidirectional LSTM model with specified hyperparameters.
  • Making predictions and plotting results.
  • Saving the trained model for future use.
  • Emitting progress updates via PyQt signals.

2. Functional Documentation

2.1 Key Features

  • Data Preprocessing: Loads and normalizes data, divides it into training and testing sets, and formats it for LSTM training.
  • Training Process: Implements a multi-layer bidirectional LSTM model trained using Keras with early stopping to prevent overfitting.
  • Prediction and Visualization: Evaluates the model and generates predictions plotted using Plotly.
  • Model Saving and Loading: Allows users to save the trained model and load hyperparameters from JSON files.
  • Progress Updates: Uses PyQt signals to communicate status updates to the main application.
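
The early stopping mentioned under Training Process can be illustrated without Keras. The sketch below is a simplified model of the idea (stop once the validation loss has not improved for `patience` consecutive epochs and remember the best epoch), not Keras' actual implementation; the function name and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-based, for a list of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # improvement: remember this epoch and reset the patience counter
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# illustrative validation losses; the last value is never reached
losses = [0.50, 0.30, 0.20, 0.21, 0.22, 0.25, 0.10]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)   # 5 2
```

With restore_best_weights=True, as used in the class, the model would end up with the weights of the best epoch (here epoch 2) and not those of the epoch at which training stopped.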

2.2 Inputs and Outputs

Input | Description
asset | The name of the financial asset being analyzed (e.g., “BTCUSDT”).
data | Pandas DataFrame containing time-series price data.
set | Dictionary containing model settings such as hyperparameters and file paths.
update_callback | Callback function to update status messages in the UI.

Output | Description
Progress Messages | Status updates on training progress via PyQt signals.
Model Checkpoints | Saved model files for later use.
Predicted Prices | Graphical plots of predicted vs. actual prices.
Log Messages | Console logs detailing preprocessing, training, and prediction steps.

2.3 User Interaction

  • Users are prompted to select a hyperparameter file (if available).
  • Progress updates are displayed during training.
  • Users can choose to save the trained model.

3. Technical Documentation

3.1 Class Definition

class TrainLSTM(QThread)

Inherits from QThread to allow training in a separate thread, preventing UI blocking.

3.2 Dependencies

import os
import json
from pathlib import Path
from datetime import datetime
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from custom_keras import CustomSequenceGenerator
from keras.layers import Bidirectional, LSTM, Dense, Dropout
from keras.callbacks import EarlyStopping, LearningRateScheduler
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import plotly.graph_objs as go
import plotly.io as pio
from PyQt6.QtWidgets import QFileDialog
from PyQt6.QtCore import QThread, pyqtSignal, pyqtSlot

3.3 Attributes

Attribute | Description
progress_signal | Signal for progress updates.
request_save_signal | Signal to request model saving.
response_signal | Signal to receive user’s response for saving the model.
df | The dataset used for training and testing.
scaler | MinMaxScaler object for data normalization.
train, vali, test | Train, validation, and test datasets.
train_generator, vali_generator, test_generator | Data generators for feeding data to the model.
model | Keras Sequential model.

3.4 Key Methods

run()

Handles the full pipeline:

  1. Calls pre_process() for data preparation.
  2. Calls do_trainLSTM() for training.
  3. Calls do_predictLSTM() for making predictions.
  4. Requests user confirmation for model saving.
  5. Calls save_model() if approved.
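
The request/response handshake in steps 4 and 5 can be sketched without Qt. The example below is a framework-free analogue, not the PyQt6 mechanism used by the class: a plain threading.Thread stands in for the QThread, a callback stands in for request_save_signal, and a threading.Event replaces the polling loop. The names Worker and ask_user are assumptions for illustration.

```python
import threading

class Worker(threading.Thread):
    """Stand-in for the training thread: asks a question, then waits for the answer."""

    def __init__(self, ask_user):
        super().__init__()
        self.ask_user = ask_user                 # plays the role of request_save_signal
        self.answer_ready = threading.Event()
        self.save_model_flag = None
        self.log = []

    def set_save_model_flag(self, flag):
        """Store the user's answer and wake up the waiting worker."""
        self.save_model_flag = flag
        self.answer_ready.set()

    def run(self):
        self.log.append("training done")
        self.ask_user(self)                      # request a decision
        self.answer_ready.wait(timeout=5)        # block until the answer arrives (or give up)
        self.log.append("saved" if self.save_model_flag else "skipped")

# the "user" answers Yes immediately in this sketch
worker = Worker(ask_user=lambda w: w.set_save_model_flag(True))
worker.start()
worker.join()
print(worker.log)                                # ['training done', 'saved']
```

Waiting on an event avoids waking up every 100 ms to check a flag, and the timeout keeps the worker from hanging forever if no answer arrives.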

pre_process()

  • Splits the dataset into train, validation, and test sets.
  • Normalizes the dataset using MinMaxScaler.
  • Generates time-series sequences for LSTM training.
  • Displays price history using Plotly.

do_trainLSTM()

  • Loads hyperparameters from a JSON file if available.
  • Defines and compiles the bidirectional LSTM model.
  • Implements early stopping and learning rate scheduling.
  • Trains the model using fit() and visualizes loss progression.
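
To see what the learning rate schedule does over a full run, the same rule can be replayed in plain Python. This is a sketch: the starting rate of 0.001 is taken from the code comment about the static rate that was replaced, and in practice early stopping will usually end the run well before epoch 50.

```python
import math

def scheduler(epoch, lr):
    """Keep the rate for the first 10 epochs, then shrink it by exp(-0.1) per epoch."""
    if epoch < 10:
        return lr
    return lr * math.exp(-0.1)

lr = 0.001                       # assumed starting rate, for illustration only
rates = []
for epoch in range(50):
    lr = scheduler(epoch, lr)    # Keras passes the current rate back in on every epoch
    rates.append(lr)

print(f"epoch 10: {rates[9]:.6f}, epoch 11: {rates[10]:.6f}, epoch 50: {rates[49]:.8f}")
```

Because the decay is applied to the current rate each epoch, it compounds: after the 40 decaying epochs the rate has dropped by a factor of exp(-4), roughly 1/55 of the starting value.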

do_predictLSTM()

  • Evaluates the model on the test set.
  • Generates and plots predicted prices against actual values.
  • Uses rolling predictions for forecasting future price movements.
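
The rolling prediction in the last bullet works by feeding each forecast back into the input window. The sketch below shows that loop in isolation; MeanModel is a hypothetical stand-in for the trained network (it just predicts the mean of its input window), so the numbers are meaningless and only the mechanics and array shapes matter.

```python
import numpy as np

class MeanModel:
    """Hypothetical stand-in for a trained model: predicts the mean of the input window."""

    def predict(self, sequence, verbose=0):
        return sequence.mean(axis=1)                     # shape (1, n_features)

def rolling_forecast(model, last_window, n_steps):
    """Forecast n_steps ahead by appending each prediction to the input window."""
    sequence = last_window.reshape(1, -1, 1)             # (batch, timesteps, features)
    predictions = []
    for _ in range(n_steps):
        step = model.predict(sequence, verbose=0)        # (1, 1)
        predictions.append(step[0, 0])
        # drop the oldest point and append the new prediction
        sequence = np.concatenate([sequence[:, 1:, :], step[:, np.newaxis, :]], axis=1)
    return np.array(predictions)

window = np.linspace(0.0, 1.0, 36)                       # stand-in for the last 36 scaled prices
forecast = rolling_forecast(MeanModel(), window, n_steps=12)
print(forecast.shape)                                    # (12,)
```

Since every step after the first is based partly on earlier predictions instead of real prices, errors can accumulate over the forecast horizon.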

save_model()

  • Saves the trained model to disk in a user-defined directory.

set_save_model_flag(flag: bool)

  • Receives and stores the user’s response to save the model.

load_settings(pad: str) -> bool

  • Loads hyperparameter settings from a JSON file.
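
Loading tuned values with a fallback to defaults can be condensed into a dictionary merge. The sketch below is an alternative formulation, not the author's load_settings(); it uses a subset of the key names and default values from the class, while the function name load_hyperparameters and the file names are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path

# subset of the defaults used in the class
DEFAULTS = {"units": 128, "dropout_1": 0.5, "loss": "mse", "optimizer": "Nadam", "activation": "elu"}

def load_hyperparameters(path, defaults=DEFAULTS):
    """Return the defaults, overridden by any matching keys found in the JSON file."""
    merged = dict(defaults)
    file = Path(path)
    if not file.exists():
        return merged
    try:
        with file.open("r", encoding="utf-8") as handle:
            loaded = json.load(handle)
    except (json.JSONDecodeError, OSError):
        return merged                                    # unreadable file: keep the defaults
    if isinstance(loaded, dict):
        merged.update({key: loaded[key] for key in defaults if key in loaded})
    return merged

with tempfile.TemporaryDirectory() as folder:
    hp_file = Path(folder) / "LSTM_hp_example.json"      # hypothetical file name
    hp_file.write_text(json.dumps({"units": 64, "loss": "mae", "unknown": 1}), encoding="utf-8")
    tuned = load_hyperparameters(hp_file)
    fallback = load_hyperparameters(Path(folder) / "missing.json")

print(tuned["units"], tuned["loss"], tuned["optimizer"])   # 64 mae Nadam
print(fallback == DEFAULTS)                                 # True
```

Keys that are not in the defaults are ignored, so a stray entry in the file cannot introduce an unexpected setting.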

3.5 Model Architecture

Layer | Type | Units / Dropout rate
1 | Bidirectional LSTM | 128
2 | Dropout | 0.5
3 | Bidirectional LSTM | 128
4 | Dropout | 0.1
5 | Bidirectional LSTM | 192
6 | Dropout | 0.5
7 | Bidirectional LSTM | 32
8 | Dropout | 0.3
9 | Bidirectional LSTM | 128
10 | Dropout | 0.1
11 | Dense | 1 (Prediction Output)

3.6 Hyperparameters

Parameter | Default Value
BATCH_SIZE | 128
SEQUENCE_SIZE | 36
N_INPUT | 12
LOSS | ‘mse’
OPTIMIZER | ‘Nadam’
ACTIVATION | ‘elu’
EPOCHS | 50
TRAIN_SPLIT | 0.2

4. Conclusion

The TrainLSTM class is a robust implementation for training and predicting financial time-series data using LSTM networks. It offers flexibility through hyperparameter tuning, supports interactive model saving, and provides real-time feedback via PyQt signals.
