Hyperparameter Optimization with Keras Tuner for LSTM

Keras Tuner

This Python class uses Keras Tuner to tune the hyperparameters of an LSTM time series model. See the TensorFlow homepage for background information, and the earlier post lstm-models-tuning-hyperparameters for an introduction to the subject. The class takes as input a pandas DataFrame with the price history of a financial asset such as Bitcoin (BTC) or Apple (AAPL) that has been preprocessed with preprocess-a-timeseries-csv-as-pandas-dataframe.

LSTMTuner class

The class starts by importing the necessary libraries: pandas, scikit-learn, TensorFlow/Keras, Keras Tuner and Plotly. Note: we recently updated the class to use our generic time series sequencer instead of the old Keras TimeseriesGenerator.

The libraries needed for multi-threading and for communicating with the main window of a PyQt6 GUI application are loaded as well. Note: the class can also be used as part of a command-line application or a Jupyter Notebook.

# Copyright (c) 2024, 2025 Hans De Weme
# Licensed under the MIT License (https://opensource.org/licenses/MIT).
# Class LSTMTuner
# Purpose: finding the best hyperparameters for an LSTM time series model to train on a pandas dataframe 
# of previously collected and preprocessed historical data 
"""
Imports the necessary libraries such as Pandas, Scikit-learn, TensorFlow, Keras, Plotly
Receives the time series historical data (from Binance or Yahoo Finance) as a pandas dataframe
Normalizes the price values between 0 and 1 using Scikit-learn's MinMaxScaler.
Divides the data into training, validation and test sets using our generic alternative for Keras' TimeseriesGenerator.
Defines the LSTM model with Keras Tuner
Searches for the best parameter settings using RandomSearch (Hyperband or BayesianOptimization can be swapped in)
Prints and saves the results as a JSON settings file for later re-use

"""
import json
from   pathlib import Path
import shutil
import pandas as pd
# from   keras.preprocessing.sequence import TimeseriesGenerator
from   custom_keras import CustomSequenceGenerator
from   sklearn.model_selection import train_test_split
from   sklearn.preprocessing  import MinMaxScaler
from   tensorflow      import keras
from   keras.layers    import Bidirectional, Dense, LSTM, Dropout
from   keras.callbacks import EarlyStopping
# note: keras_tuner should be at least at version 1.3.5 to function correctly with both Keras 2.x and Keras 3!
from   keras_tuner     import RandomSearch
from   PyQt6.QtCore    import QThread, pyqtSignal
import plotly.graph_objs as go
import plotly.io as pio
import warnings
warnings.filterwarnings("ignore")

Initializing

The class connects with the calling parent and initializes things like the global settings and the DataFrame to use. The main window can then start the thread (typically via start()), which executes the run() method that orchestrates the processing.

class LSTMTuner(QThread):       
    progress_signal  = pyqtSignal(str)                                  # Signal to communicate progress (string message) back to the main thread            
    
    def __init__(self, asset, data, settings, parent=None): 
        super().__init__()                                              # necessary for QObject, needed for pyqtSignal  
        self.parent = parent
        if parent is not None:                                          # allow use without a GUI main window
            self.progress_signal.connect(parent.set_status_message)
        self.suc = True
        self.df  = pd.DataFrame(data)
        if self.df.empty:
            print('* * * Time Series Data missing  * * * ')
            self.suc = False
        self.MARKT  = asset                                             # USDT spot market coin pair (or stock ticker) to process 
        self.STOCK  = False
        self.settings = settings
        stock_cols = 8
        if 'stock_columns' in self.settings:
            value = self.settings['stock_columns'] 
            if isinstance(value, int):
                stock_cols = value
        num_columns = self.df.shape[1]                
        if(num_columns < stock_cols):                                  # the asset is stock, following actions not needed!
            self.STOCK = True
        self.N_INPUT       = 12                                        # number of new datapoints to predict
        self.N_FEATURES    = 1                                         # only 1 feature to predict: price (close)  
        self.BATCH_SIZE    = 128                                       # number of sequences in a training batch (a power of 2 by convention)
        self.TRAIN_SPLIT   = 0.1                                       # fraction held out as test set (the same fraction of the rest becomes the validation set)
        self.SEQUENCE_SIZE = 36                                        # number of datapoints in a training sequence (we predict 12 hours, 0.5 day, so let's use a size of 1.5 days)
        
    def run(self):
        try:
            self.pre_process()
            self.progress_signal.emit("Preprocessing completed, starting Hyperparameter Search...")
            self.do_tuning()
        except Exception as e:
            self.progress_signal.emit(f"Error: {str(e)}")        

Processing

The actual processing in the worker thread takes place in three steps. First, the data to be used for optimizing the model is preprocessed. Then a framework for the LSTM model is prepared, bidirectional or not, with a variable number of LSTM layers, Dropout and a Dense output layer. Finally, Keras Tuner is applied to this framework to find the parameters best suited for prediction on the provided time series.

    def pre_process(self):
        train_size = int(len(self.df) * (1-self.TRAIN_SPLIT))          # save original test data for later comparison
        self.testo = self.df.iloc[train_size:]
        print("\n* * * Test Data set info:")
        self.testo.info()                                              # info() prints its own summary and returns None
        plot_data = [go.Scatter(x=self.testo.index, y=self.testo['close'], name='price' )]
        plot_layout = go.Layout(title=self.MARKT+' Price Info Testset')
        fig = go.Figure(data=plot_data, layout=plot_layout)
        pio.show(fig)
        total = self.df
        scaler = MinMaxScaler()
        scaler.fit(total)
        total = scaler.transform(total)
        print("\n* * * Normalized Data set info: {}".format(total))
        self.train, self.test = train_test_split(total, test_size=self.TRAIN_SPLIT, shuffle=False) # split total data in train - validation - test sets
        self.train, self.vali = train_test_split(self.train, test_size=self.TRAIN_SPLIT, shuffle=False) 
        self.train_generator = CustomSequenceGenerator(self.train, self.SEQUENCE_SIZE, self.BATCH_SIZE, shuffle=False)  
        self.vali_generator  = CustomSequenceGenerator(self.vali,  self.SEQUENCE_SIZE, self.BATCH_SIZE, shuffle=False)   
        self.test_generator  = CustomSequenceGenerator(self.test, self.SEQUENCE_SIZE, self.BATCH_SIZE, shuffle=False)
        print("\n* * * Preprocessed train set info: {}".format(self.train)) 
        print("\n* * * Preprocessed Test set and testset size: {}".format(self.test))
        print(len(self.test))
        self.suc = True 

    def create_lstm_model(self, hp):
        model = keras.Sequential()
        bidirectional = hp.Boolean('bidirectional')         # Define whether the LSTM layer should be bidirectional or not
        if bidirectional:
            model.add(Bidirectional(LSTM(units=hp.Int('units', min_value=32, max_value=256, step=32), 
                    activation=hp.Choice('activation', values=['elu', 'relu', 'tanh', 'sigmoid']),
                    input_shape=(self.SEQUENCE_SIZE, self.N_FEATURES),
                    return_sequences=True)))  # True, because more LSTM layers follow
            model.add(Dropout(rate=hp.Float('dropout_1', min_value=0.1, max_value=0.5, step=0.1)))
        else: 
            model.add(LSTM(units=hp.Int('units', min_value=32, max_value=256, step=32), 
                    activation=hp.Choice('activation', values=['elu', 'relu', 'tanh', 'sigmoid']),
                    input_shape=(self.SEQUENCE_SIZE, self.N_FEATURES),
                    return_sequences=True))  # True, because more LSTM layers follow
            model.add(Dropout(rate=hp.Float('dropout_1', min_value=0.1, max_value=0.5, step=0.1)))
        
        for i in range(hp.Int('n_layers', 2, 4) - 1):
            if bidirectional:
                model.add(Bidirectional(LSTM(units=hp.Int(f'lstm_{i}_units', min_value=32, max_value=256, step=32),
                            return_sequences=True)))  # Intermediate LSTM layers should pass sequences
                model.add(Dropout(rate=hp.Float(f'dropout_{i+2}', min_value=0.1, max_value=0.5, step=0.1)))
            else:
                model.add(LSTM(units=hp.Int(f'lstm_{i}_units', min_value=32, max_value=256, step=32),
                            return_sequences=True))  # Intermediate LSTM layers should pass sequences
                model.add(Dropout(rate=hp.Float(f'dropout_{i+2}', min_value=0.1, max_value=0.5, step=0.1)))

        # Last LSTM layer
        if bidirectional:
            model.add(Bidirectional(LSTM(units=hp.Int('lstm_final_units', min_value=32, max_value=256, step=32),
                        return_sequences=False)))  # No sequence output needed for the final layer
            model.add(Dropout(rate=hp.Float('dropout_last', min_value=0.1, max_value=0.5, step=0.1)))

            model.add(Dense(units=self.N_FEATURES))
            model.compile(loss=hp.Choice('loss', values=['mean_squared_error', 'mean_absolute_error']),
                        optimizer=hp.Choice('optimizer', values=['SGD', 'RMSprop', 'Adam']),
                        metrics=['mae'])
        else:
            model.add(LSTM(units=hp.Int('lstm_final_units', min_value=32, max_value=256, step=32),
                        return_sequences=False))  # No sequence output needed for the final layer
            model.add(Dropout(rate=hp.Float('dropout_last', min_value=0.1, max_value=0.5, step=0.1)))

            model.add(Dense(units=self.N_FEATURES))
            model.compile(loss=hp.Choice('loss', values=['mean_squared_error', 'mean_absolute_error']),
                        optimizer=hp.Choice('optimizer', values=['SGD', 'RMSprop', 'Adam']),
                        metrics=['mae'])  
        return model

    def do_tuning(self):
        directory = Path("untitled_project")                            # Remove the old Keras Tuner working directory (portable relative path)
        if directory.exists() and directory.is_dir():
            shutil.rmtree(str(directory))  
            
        earlystop   = EarlyStopping(monitor='val_loss', min_delta=0.01, patience=3, restore_best_weights=True)
        tuner = RandomSearch(                                           # or Hyperband or BayesianOptimization 
            self.create_lstm_model,
            objective='val_loss',
            max_trials = 4,                                             # Adjust this depending on the computational resources
            executions_per_trial = 3,                                   # Average the results of n runs to reduce variance
            overwrite=True,                                             # This will ensure the tuner ignores old data
            directory=".",                                              # base dir
            project_name="untitled_project",                            # project folder inside directory            
        )
        tuner.search(
            self.train_generator,
            epochs=20,
            batch_size=self.BATCH_SIZE,
            validation_data=self.vali_generator,
            callbacks=[earlystop]
        )
        for trial in tuner.oracle.get_best_trials(num_trials=4):
            self.progress_signal.emit(f"Trial {trial.trial_id} completed with score: {trial.score}")
        self.progress_signal.emit("Tuning completed!")
        best_hyperparameters = tuner.get_best_hyperparameters(num_trials=1)[0]
        for key, value in best_hyperparameters.values.items():
            print(f"{key}: {value}")
        hp_dict = dict(best_hyperparameters.values.items())
        pad = self.settings['models']         
        pad = Path(pad)
        filename = 'LSTM_hp_'+self.MARKT+'.json' 
        full_path = pad / filename
        full_path.write_text(json.dumps(hp_dict, indent=4))
        print("Hyperparameters saved: "+filename)               
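To make the effect of the EarlyStopping settings in do_tuning() concrete, here is a small, dependency-free sketch of the stopping rule. It is a simplified model of the callback, not the Keras implementation, and the function name `stopped_epoch` and the loss values are made up for illustration.

```python
def stopped_epoch(val_losses, min_delta=0.01, patience=3):
    """Return the 0-based epoch at which training would stop, or None.

    Simplified model of early stopping on a loss that should decrease:
    an epoch only counts as an improvement when the loss drops by more
    than min_delta below the best value seen so far.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss          # real improvement: remember it and reset the counter
            wait = 0
            continue
        wait += 1                # no (sufficient) improvement this epoch
        if wait >= patience:
            return epoch
    return None


# Hypothetical validation losses that keep shrinking, but slowly.
losses = [0.050, 0.030, 0.025, 0.022, 0.021, 0.0205]
print(stopped_epoch(losses))                 # default settings from do_tuning()
print(stopped_epoch(losses, min_delta=0.0))  # any decrease counts as improvement
```

Because the data is scaled to the range 0 to 1, validation losses can be small, and a min_delta of 0.01 may then end a trial while the loss is still slowly improving. Whether that is desirable depends on the data, so treat the value as something to experiment with.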

Functional & Technical Documentation

Overview

This script uses Keras Tuner to optimize hyperparameters for an LSTM model used for time series forecasting of financial assets such as cryptocurrencies or stocks. It preprocesses historical price data, performs hyperparameter tuning, and saves the best configuration for future use.

Key Features

  • Data preprocessing: Normalizes time series data.
  • Data splitting: Divides data into training, validation, and test sets.
  • Sequence generation: Uses CustomSequenceGenerator, our generic replacement for Keras’s TimeseriesGenerator.
  • Model building: Defines LSTM architecture with tunable hyperparameters.
  • Hyperparameter optimization: Uses Keras Tuner with RandomSearch.
  • Result visualization: Displays test set trends.
  • Result saving: Stores the best hyperparameter settings as a JSON file.
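As an illustration of the sequence generation step in the list above, here is a minimal, dependency-free sketch of sliding-window sequencing. The function name `make_windows` and the choice of the next value as target are assumptions for illustration; the actual CustomSequenceGenerator is not shown in this post and may work differently.

```python
def make_windows(series, sequence_size):
    """Turn a 1-D series into (window, next_value) pairs.

    Hypothetical helper, not the CustomSequenceGenerator used by the class.
    """
    if sequence_size <= 0:
        raise ValueError("sequence_size must be positive")
    pairs = []
    # The last usable window ends one step before the final value,
    # because that final value is needed as a target.
    for start in range(len(series) - sequence_size):
        window = series[start:start + sequence_size]
        target = series[start + sequence_size]
        pairs.append((window, target))
    return pairs


prices = [10.0, 11.0, 12.0, 13.0, 14.0]
for window, target in make_windows(prices, sequence_size=3):
    print(window, "->", target)
```

Under this assumption a series of n values yields n - sequence_size training pairs, so with SEQUENCE_SIZE = 36 the first 36 rows of each set only ever appear as inputs.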

User Workflow

  1. The script receives a standardized Pandas DataFrame containing historical time series data.
  2. It normalizes price values between 0 and 1 using MinMaxScaler.
  3. Data is split into training, validation, and test sets.
  4. Timeseries sequences are created for LSTM training.
  5. The script defines an LSTM model with tunable hyperparameters.
  6. Keras Tuner optimizes hyperparameters using RandomSearch.
  7. The best hyperparameters are printed and saved in a JSON file.
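The normalization in step 2 follows the usual min-max formula x' = (x - min) / (max - min). The sketch below works it out in plain Python on made-up prices; the helper names are hypothetical, and the class itself relies on scikit-learn's MinMaxScaler rather than on this code.

```python
def fit_min_max(values):
    """Return the (minimum, maximum) that define the scaling."""
    return min(values), max(values)


def scale(values, lo, hi):
    """Map values into [0, 1] with x' = (x - lo) / (hi - lo)."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]     # constant series: map everything to 0.0
    return [(v - lo) / span for v in values]


def unscale(scaled, lo, hi):
    """Invert the scaling, e.g. to turn predictions back into prices."""
    return [s * (hi - lo) + lo for s in scaled]


close = [100.0, 150.0, 200.0, 125.0]     # made-up closing prices
lo, hi = fit_min_max(close)
scaled = scale(close, lo, hi)
restored = unscale(scaled, lo, hi)
print(scaled)
print(restored)
```

Note that pre_process() fits the scaler on the complete DataFrame. Fitting it on the training rows only and re-using that minimum and maximum for validation and test data is a common alternative when the test set should stay fully unseen.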

Inputs & Outputs

  • Input: A Pandas DataFrame containing historical time series data (e.g., from Binance or Yahoo Finance).
  • Output: A JSON file storing the best hyperparameter values for future model training.

Dependencies

  • Python libraries: NumPy, Pandas, Scikit-learn, TensorFlow/Keras, Keras Tuner, Plotly
  • PyQt6 for signal-slot communication (if used in a GUI application)

Class: LSTMTuner

Handles data preprocessing, model training, and hyperparameter tuning.

Attributes:

  • df: The Pandas DataFrame containing historical asset price data.
  • N_INPUT: Number of future time steps to predict.
  • N_FEATURES: Number of targets to predict (price only).
  • BATCH_SIZE: Number of sequences in each training batch.
  • TRAIN_SPLIT: Fraction of the data (0.1) held out for testing; the same fraction of the remainder is held out for validation.
  • SEQUENCE_SIZE: Number of time steps per input sequence.
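To see how these attributes interact, the following sketch works out the set sizes and batches per epoch for a hypothetical DataFrame of 10,000 rows. Two assumptions are built in: the held-out part of an unshuffled split is sized as ceil(fraction * rows), which is how scikit-learn treats a float test_size, and the sequencer produces one window per start position, as Keras' TimeseriesGenerator did.

```python
import math

N_ROWS = 10_000        # hypothetical number of rows in the DataFrame
TRAIN_SPLIT = 0.1
SEQUENCE_SIZE = 36
BATCH_SIZE = 128


def holdout_split(n_rows, fraction):
    """Return (kept, held_out) row counts for an unshuffled split."""
    held_out = math.ceil(fraction * n_rows)
    return n_rows - held_out, held_out


def batches_per_epoch(n_rows, sequence_size, batch_size):
    """Batches needed to cover all windows (assumes n_rows - sequence_size windows)."""
    windows = max(n_rows - sequence_size, 0)
    return math.ceil(windows / batch_size)


# Same order as pre_process(): first split off the test set, then the validation set.
rest, n_test = holdout_split(N_ROWS, TRAIN_SPLIT)
n_train, n_vali = holdout_split(rest, TRAIN_SPLIT)

sizes = {"train": n_train, "validation": n_vali, "test": n_test}
for name, rows in sizes.items():
    print(name, rows, batches_per_epoch(rows, SEQUENCE_SIZE, BATCH_SIZE))
```

Applying TRAIN_SPLIT twice therefore gives roughly an 81% / 9% / 10% division into train, validation and test data.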

Methods:

  • pre_process(): Normalizes the data, splits it into train/validation/test sets, and creates sequences for LSTM.
  • create_lstm_model(hp): Defines an LSTM model with tunable hyperparameters.
  • do_tuning(): Runs Keras Tuner to optimize hyperparameters.
  • run(): Executes preprocessing and hyperparameter tuning.

Hyperparameter Tuning Details

  • Tuning Method: RandomSearch
  • Search Space:
    • LSTM Units: 32 to 256 (step 32)
    • Activation Function: elu, relu, tanh, sigmoid
    • Dropout: 0.1 to 0.5 (step 0.1)
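    • Optimizer: SGD, RMSprop, Adam
    • Loss: mean_squared_error, mean_absolute_error
    • n_layers: 2 to 4 (as coded, the model stacks one first, n_layers - 1 intermediate and one final LSTM layer, so 3 to 5 LSTM layers in total)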
    • Bidirectionality: True/False
  • Early Stopping: Monitors validation loss and stops a run after 3 epochs without an improvement of at least 0.01, restoring the best weights.
  • Best Parameters Storage: Results saved as JSON.
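For a sense of scale, the sketch below counts the distinct configurations in the search space defined by create_lstm_model(). It is plain arithmetic on the ranges used in the code; the constant and function names are only for illustration.

```python
UNITS = len(range(32, 256 + 1, 32))   # 32, 64, ..., 256
DROPOUT = 5                           # 0.1, 0.2, 0.3, 0.4, 0.5
ACTIVATIONS = 4                       # elu, relu, tanh, sigmoid
BIDIRECTIONAL = 2                     # True / False
LOSSES = 2                            # mean_squared_error, mean_absolute_error
OPTIMIZERS = 3                        # SGD, RMSprop, Adam


def configurations(n_layers):
    """Number of distinct models for one value of the n_layers hyperparameter."""
    first = UNITS * ACTIVATIONS * DROPOUT
    # The loop in create_lstm_model() adds n_layers - 1 intermediate layers,
    # each with its own units and dropout choice.
    intermediate = (UNITS * DROPOUT) ** (n_layers - 1)
    final = UNITS * DROPOUT
    return BIDIRECTIONAL * first * intermediate * final * LOSSES * OPTIMIZERS


total = sum(configurations(n) for n in (2, 3, 4))
print(f"{total:,} distinct configurations")
```

With max_trials = 4, RandomSearch samples only four points from this space, so the saved result is better read as a reasonable starting point than as a true optimum. Raising max_trials, or switching to Hyperband or BayesianOptimization, explores more of it at the cost of compute time.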

Output Example (JSON Format)

{
    "bidirectional": true,
    "units": 160,
    "activation": "tanh",
    "dropout_1": 0.1,
    "n_layers": 2,
    "lstm_0_units": 64,
    "dropout_2": 0.30000000000000004,
    "lstm_final_units": 64,
    "dropout_last": 0.4,
    "loss": "mean_squared_error",
    "optimizer": "SGD",
    "lstm_1_units": 128,
    "dropout_3": 0.1
}
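A saved file like this can be read back with the standard library. The sketch below is one possible way to do that; the helper names, the asset name BTCUSDT and the temporary directory standing in for settings['models'] are assumptions for the demo. It also filters out keys that the model builder would not read for the stored n_layers: in the example above n_layers is 2, yet lstm_1_units and dropout_3 are present, presumably because Keras Tuner records every hyperparameter it has encountered during the search.

```python
import json
import tempfile
from pathlib import Path


def load_hyperparameters(models_dir, asset):
    """Read LSTM_hp_<asset>.json, the file name pattern used by do_tuning()."""
    path = Path(models_dir) / f"LSTM_hp_{asset}.json"
    return json.loads(path.read_text())


def active_hyperparameters(hp):
    """Keep only the keys create_lstm_model() reads for hp['n_layers']."""
    keys = ["bidirectional", "units", "activation", "dropout_1", "n_layers",
            "lstm_final_units", "dropout_last", "loss", "optimizer"]
    for i in range(hp["n_layers"] - 1):          # same loop bounds as the model builder
        keys += [f"lstm_{i}_units", f"dropout_{i + 2}"]
    return {key: hp[key] for key in keys if key in hp}


# The example values from this post, written to a temporary models directory.
example = {
    "bidirectional": True, "units": 160, "activation": "tanh", "dropout_1": 0.1,
    "n_layers": 2, "lstm_0_units": 64, "dropout_2": 0.30000000000000004,
    "lstm_final_units": 64, "dropout_last": 0.4, "loss": "mean_squared_error",
    "optimizer": "SGD", "lstm_1_units": 128, "dropout_3": 0.1,
}
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "LSTM_hp_BTCUSDT.json").write_text(json.dumps(example, indent=4))
    loaded = load_hyperparameters(tmp, "BTCUSDT")

active = active_hyperparameters(loaded)
print("ignored keys:", sorted(set(loaded) - set(active)))
```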

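Conclusion

This script automates the search for LSTM hyperparameters for financial time series forecasting with Keras Tuner and stores the best configuration it finds for later training. With max_trials = 4 the search samples only a small part of the search space, so raise that value when computational resources allow.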
