Feature engineering is the process of transforming raw data into inputs that machine learning models can use. In other words, feature engineering is the process of creating the features of a predictive model. In Machine Learning (ML), supervised learning uses ‘independent variables’ to model a ‘dependent variable’.
The independent variables are called features, predictors or explanatory variables. In time series, features can be:
- Lagged variables (past values of the same series or related ones);
- Derived indicators (technical indicators or transformations);
- Exogenous variables (sentiment indices, macroeconomic data).
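As a minimal sketch of the first item, lagged variables can be built with the pandas `shift` method. The helper name `add_lags` and the toy `close` column are illustrative assumptions, not part of any library.

```python
import pandas as pd

def add_lags(df: pd.DataFrame, column: str, lags: list[int]) -> pd.DataFrame:
    """Return a copy of df with one extra column per requested lag of `column`."""
    out = df.copy()
    for lag in lags:
        # shift(lag) moves values down, so row t holds the value observed at t - lag
        out[f"{column}_lag{lag}"] = out[column].shift(lag)
    return out

# Toy data, for illustration only
prices = pd.DataFrame({"close": [100.0, 101.0, 103.0, 102.0, 105.0]})
lagged = add_lags(prices, "close", lags=[1, 2])
print(lagged)
```

The first rows of each lagged column are NaN because no earlier observation exists for them.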
Though the terms are often used interchangeably, ‘feature’ is more ML-centric, ‘predictor’ is more statistical and ‘explanatory variable’ suggests a causal intuition (though not necessarily proven).
The dependent variable, also called the outcome or target, is what we’re trying to model. The task can be either regression → continuous target (e.g., future price, return) or classification → categorical target (e.g., up/down, bull/bear, volatility regimes). Time series models often convert a continuous future variable (like returns) into discrete classes (e.g., +1 if return > 0, else -1) so that the problem can be treated as classification.
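The conversion from a continuous return to a +1 / -1 class could be sketched as follows. The function name `make_direction_target` and the `horizon` parameter are our own illustrative choices.

```python
import numpy as np
import pandas as pd

def make_direction_target(close: pd.Series, horizon: int = 1) -> pd.Series:
    """Label each row +1 if the return over the next `horizon` periods is > 0, else -1."""
    # Forward return: price at t + horizon relative to price at t
    future_return = close.shift(-horizon) / close - 1.0
    labels = pd.Series(np.where(future_return > 0, 1, -1), index=close.index, dtype="float")
    # The last `horizon` rows have no future price yet, so their label is undefined
    labels[future_return.isna()] = np.nan
    return labels

# Toy data, for illustration only
close = pd.Series([100.0, 102.0, 101.0, 101.0, 104.0])
target = make_direction_target(close, horizon=1)
print(target)
```

Note that a return of exactly zero falls into the -1 class under this rule, and that the unlabeled rows at the end should be dropped before training.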
Feature Engineering for Financial Time Series
Feature engineering is both an art and a science—particularly in time series where temporal dependencies, autocorrelation, and non-stationarity dominate. A fundamental distinction in the nature of features is that between internal and external variables.
📈 Internal Features (Endogenous)
These are features derived directly from the asset’s historical price or volume:
Momentum-Based
- RSI (Relative Strength Index) – recent gains vs. losses.
- Stochastic Oscillator (SO) – compares close to high-low range.
- Rate of Change (RoC) – percent change over n periods.
- Williams %R – similar to SO but inverted scale.
Trend-Following / Smoothing
- Simple Moving Average (SMA) / Exponential Moving Average (EMA)
- MACD – differential of two EMAs.
- Schaff Trend Cycle (STC) – combines MACD and cycle concepts.
Volatility-Based
- Bollinger Bands – moving average ± k standard deviations.
- Keltner Channel – uses ATR instead of standard deviation.
These indicators often introduce lags but can be de-noised or combined for richer signals.
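To make two of the definitions above concrete, here is a pandas-only sketch of the Rate of Change and Bollinger Bands. The function names and the use of a population standard deviation (`ddof=0`) are assumptions for illustration; library implementations may differ in such details.

```python
import pandas as pd

def rate_of_change(close: pd.Series, n: int) -> pd.Series:
    """Percent change of `close` over n periods."""
    return (close / close.shift(n) - 1.0) * 100.0

def bollinger_bands(close: pd.Series, window: int = 20, k: float = 2.0) -> pd.DataFrame:
    """Rolling mean plus/minus k rolling standard deviations."""
    mid = close.rolling(window).mean()
    std = close.rolling(window).std(ddof=0)  # population std: an assumption of this sketch
    return pd.DataFrame({"bb_low": mid - k * std, "bb_mid": mid, "bb_high": mid + k * std})

# Toy data with a short window so the numbers are easy to follow
close = pd.Series([10.0, 12.0, 14.0, 16.0])
roc = rate_of_change(close, n=1)
bands = bollinger_bands(close, window=2, k=2.0)
print(roc)
print(bands)
```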
🌍 External Features (Exogenous)
Market Sentiment
- Fear & Greed Index – aggregates several behavioral metrics.
- VIX (Volatility Index) – expected volatility of S&P500, often a “risk-off” proxy.
Macro/Meso Indicators
- S&P500 Index – reflects broad market sentiment.
- Interest Rates, Inflation, or Exchange Rates (macro layer).
These help contextualize the local dynamics of an asset within broader economic/psychological regimes. In this post we discuss and show you how to get some of these indices.
📊 Feature Engineering Best Practices
Several techniques can be used to enhance the predictive value of features. These include:
⚖️ Normalization
- Crucial for distance-based models or regularized regression.
- Scale time series features with rolling z-score, min-max, or robust scaling.
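A rolling z-score could look like the sketch below. The function name `rolling_zscore` is hypothetical. The window ends at the current observation, so only current and past values are used for scaling.

```python
import pandas as pd

def rolling_zscore(series: pd.Series, window: int) -> pd.Series:
    """Standardise each value with the mean and std of the trailing window ending at it."""
    mean = series.rolling(window).mean()
    std = series.rolling(window).std()
    # Mask zero-variance windows so we return NaN instead of dividing by zero
    std = std.where(std > 0)
    return (series - mean) / std

# Toy data, for illustration only
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
z = rolling_zscore(values, window=3)
print(z)
```

Scaling with statistics of the whole series instead would let later observations influence earlier rows, which is a form of the data leakage discussed further on.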
🔄 Stationarity Checks
- Differencing or log-transforms help with non-stationary inputs.
- Always check for autocorrelation and unit roots (ADF/KPSS tests).
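The sketch below illustrates the effect of a log-transform followed by differencing on a synthetic random walk. It only compares lag-1 autocorrelations; it is not a formal unit-root test. For ADF and KPSS, statsmodels provides `adfuller` and `kpss` in `statsmodels.tsa.stattools`, which are not run here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
# Synthetic geometric random walk standing in for a price series (illustrative data only)
log_price = np.cumsum(rng.normal(loc=0.0, scale=0.01, size=500))
price = pd.Series(100.0 * np.exp(log_price))

# Log-transform, then take the first difference: this yields log returns
log_returns = np.log(price).diff().dropna()

# Lag-1 autocorrelation: high for the price level, near zero for the returns
level_autocorr = price.autocorr(lag=1)
returns_autocorr = log_returns.autocorr(lag=1)
print(f"levels: {level_autocorr:.3f}, log returns: {returns_autocorr:.3f}")
```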
🧩 Interaction Features
- Combine indicators: e.g., RSI * Bollinger Width, or relative ratios between short- and long-term MAs.
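Both kinds of interaction could be sketched like this. The column names `RSI`, `BB_Width` and `close`, the helper `add_interactions` and the short window lengths are assumptions chosen to keep the toy example small.

```python
import pandas as pd

def add_interactions(df: pd.DataFrame, short_window: int = 3, long_window: int = 5) -> pd.DataFrame:
    """Add a product feature and a short/long moving-average ratio to a copy of df."""
    out = df.copy()
    # Product of a momentum feature and a volatility feature
    out["RSI_x_BBW"] = out["RSI"] * out["BB_Width"]
    # Ratio of short- to long-term simple moving averages
    sma_short = out["close"].rolling(short_window).mean()
    sma_long = out["close"].rolling(long_window).mean()
    out["SMA_ratio"] = sma_short / sma_long
    return out

# Toy values, for illustration only
frame = pd.DataFrame({
    "close": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    "RSI": [50.0, 55.0, 60.0, 65.0, 70.0, 75.0],
    "BB_Width": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
})
enriched = add_interactions(frame)
print(enriched)
```

A ratio above 1 means the short-term average sits above the long-term one, which is the same information a moving-average crossover rule uses.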
🔀 Lookahead Bias & Data Leakage
- Avoid using future information.
- All engineered features should be strictly based on information available up to time t.
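One way to test a feature for lookahead bias is a truncation check: compute the feature on the full series and again on the series cut off at time t, then compare the two values at t. The helper `uses_only_past` below is a hypothetical sketch of that idea, not a library function.

```python
import math
import pandas as pd

def uses_only_past(feature_fn, series: pd.Series, t: int) -> bool:
    """True if feature_fn's value at position t is unchanged when data after t is removed."""
    full = feature_fn(series).iloc[t]
    truncated = feature_fn(series.iloc[: t + 1]).iloc[t]
    both_nan = bool(pd.isna(full)) and bool(pd.isna(truncated))
    return both_nan or math.isclose(full, truncated)

def trailing_mean(s: pd.Series) -> pd.Series:
    return s.rolling(3).mean()  # looks back only

def centered_mean(s: pd.Series) -> pd.Series:
    return s.rolling(3, center=True).mean()  # also uses the next observation

# Toy data, for illustration only
close = pd.Series([10.0, 11.0, 13.0, 12.0, 15.0, 14.0, 16.0])
print(uses_only_past(trailing_mean, close, t=4))
print(uses_only_past(centered_mean, close, t=4))
```

The trailing mean passes the check, while the centered mean fails it because its window reaches one step into the future.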
Our post on ARIMA – SARIMAX Models Optimizing for Training uses some of these methods.
🛠 Feature Engineering Example: Binary Classification of Price Movement
Target: Will the price go up in the next 3 hours?
The feature vector at time t might include:
- RSI_14[t], SO_14[t], BB_Width[t]
- Price[t] / SMA_50[t] – a normalized trend feature
- SP500_return[t], VIX[t]
- FearGreedIndex[t]
- HourOfDay[t], DayOfWeek[t]
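Part of such a feature vector can be assembled with pandas alone. The sketch below covers only the normalized trend feature and the calendar features; the indicator and external columns would come from the sources described above. The helper `build_feature_frame` and the short demo window are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def build_feature_frame(df: pd.DataFrame, sma_window: int = 50) -> pd.DataFrame:
    """Build trend and calendar features from a frame with a DatetimeIndex and a 'close' column."""
    feats = pd.DataFrame(index=df.index)
    # Price relative to its own moving average: a scale-free trend feature
    feats["Price_over_SMA"] = df["close"] / df["close"].rolling(sma_window).mean()
    feats["HourOfDay"] = df.index.hour
    feats["DayOfWeek"] = df.index.dayofweek  # Monday = 0
    # Drop the warm-up rows where the moving average is not yet defined
    return feats.dropna()

# Eight hourly bars starting Monday 2025-01-06 00:00 (toy data)
idx = pd.Timestamp("2025-01-06") + pd.to_timedelta(np.arange(8), unit="h")
bars = pd.DataFrame({"close": 100.0 + np.arange(8)}, index=idx)
features = build_feature_frame(bars, sma_window=4)
print(features)
```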
We feed this into:
- Logistic Regression
- Tree-based models (XGBoost, LightGBM)
- LSTM (with temporal dependencies encoded)
The MakeFeatures Class for Feature Engineering
The boundary between a feature and an assumption can be subtle in time series. When you select certain technical indicators, you are also embedding a hypothesis about how markets behave (e.g., trend-following, mean-reversion). Feature engineering, therefore, is not just a mechanical task—it’s an epistemological act that encodes your beliefs about market structure into a mathematical form.
The MakeFeatures class is a utility module used in technical analysis and in predictive modelling. We use it to engineer the internal features derived directly from the asset’s historical price. It is ready for use in a PyQt6 GUI application, but you can just as easily use it in a command-line script or a Jupyter Notebook.
Things start with loading the necessary libraries such as pandas, pandas_ta, and the ta library. On initialization we check that the dataframe is not empty.
The class has one public entry point, do_make_features(), which orchestrates the feature engineering and cleans up the resulting dataframe before returning it to the caller.
Class MakeFeatures
# Copyright (c) 2025 Hans De Weme
# Licensed under the MIT License (https://opensource.org/licenses/MIT).
# Class MakeFeatures
# Purpose: Engineering Internal Features derived directly from the asset's historical price
"""
Imports necessary libraries such as pandas, pandas_ta, ta
Loads preprocessed data set
Calculates predictors / features:
- Momentum-Based
• RSI (Relative Strength Index) – recent gains vs. losses.
• Stochastic Oscillator (SO) – compares close to high-low range.
• Rate of Change (RoC) – percent change over n periods.
• Williams %R – similar to SO but inverted scale.
- Trend-Following / Smoothing
• Simple Moving Average (SMA) / Exponential Moving Average (EMA)
• MACD – differential of two EMAs.
• Schaff Trend Cycle (STC) – combines MACD and cycle concepts.
- Volatility-Based
• Bollinger Bands – moving average ± k standard deviations.
• Keltner Channel – uses ATR instead of standard deviation.
Cleans up the data frame before returning the results
"""
import pandas as pd
import pandas_ta as pta
from ta.volatility import BollingerBands
from ta.trend import STCIndicator
from ta import momentum
from PyQt6.QtCore import QObject
import warnings
warnings.filterwarnings("ignore")

class MakeFeatures(QObject):
    def __init__(self, data):
        super().__init__()  # necessary for QObject, needed for pyqtSignal
        self.df = pd.DataFrame(data)
        if self.df.empty:
            print('* * * Time Series Data missing * * * ')
            return

    def do_make_features(self):
        D = self.df
        print('calculate RSI over complete dataset')
        # pandas_ta names the lookback 'length'; an unknown 'window' keyword would be silently ignored
        D['RSI'] = pta.rsi(close=D['close'], length=14)
        print('calculate SO over complete dataset')
        # Stochastic Oscillator: the returned columns are assigned in order to SO and SO3
        D[['SO', 'SO3']] = pta.stoch(D['high'], D['low'], D['close'], k=14, d=3, smooth_k=3)
        print('calculate RoC over complete dataset')
        D['RoC'] = pta.roc(D['close'], length=14)  # calculate Rate of Change
        print('calculate Wil over complete dataset')
        D['Wil'] = momentum.williams_r(D['high'], D['low'], D['close'], lbp=14)  # calculate Williams %R
        self.calc_macd()
        self.calc_BB()
        self.calc_STC()
        self.do_kelter()
        self.clean()
        return self.df

    def calc_macd(self):
        print('calculate MACD over complete dataset')
        self.df['20_day_EM'] = self.df['close'].ewm(span=20, adjust=False).mean()
        self.df['50_day_EM'] = self.df['close'].ewm(span=50, adjust=False).mean()
        self.df['MACD'] = self.df['20_day_EM'] - self.df['50_day_EM']
        self.df['Signal_Line'] = self.df['MACD'].ewm(span=7, adjust=False).mean()

    def calc_BB(self):
        print('calculate BB over complete dataset')
        STD_DEV = 2
        SMA_PERIOD = 28  # Exchange is open 5 * 4 = 20 days per month, crypto exchanges 7 * 4 = 28
        indicator_bb = BollingerBands(close=self.df['close'], window=SMA_PERIOD, window_dev=STD_DEV)
        self.df['BB_mid'] = indicator_bb.bollinger_mavg()
        self.df['BB_high'] = indicator_bb.bollinger_hband()
        self.df['BB_low'] = indicator_bb.bollinger_lband()

    # KELTNER CHANNEL CALCULATION
    def get_kc(self, high, low, close, kc_lookback, multiplier, atr_lookback):
        # True range: the largest of the three candidate ranges for each row
        tr1 = pd.DataFrame(high - low)
        tr2 = pd.DataFrame(abs(high - close.shift()))
        tr3 = pd.DataFrame(abs(low - close.shift()))
        frames = [tr1, tr2, tr3]
        tr = pd.concat(frames, axis=1, join='inner').max(axis=1)
        atr = tr.ewm(alpha=1/atr_lookback).mean()
        # Note: the first positional argument of ewm() is 'com' (center of mass), not 'span';
        # it is spelled out here so the smoothing that is actually applied is explicit
        kc_middle = close.ewm(com=kc_lookback).mean()
        kc_upper = kc_middle + multiplier * atr
        kc_lower = kc_middle - multiplier * atr
        return kc_middle, kc_upper, kc_lower

    def do_kelter(self):
        print('calculate ATR over complete dataset')
        # Convert columns to numeric to avoid string operations
        self.df['high'] = pd.to_numeric(self.df['high'], errors='coerce')
        self.df['low'] = pd.to_numeric(self.df['low'], errors='coerce')
        self.df['close'] = pd.to_numeric(self.df['close'], errors='coerce')
        # dropna() returns a new frame, so the result must be assigned back to take effect;
        # only rows where the price conversion failed are dropped here
        self.df = self.df.dropna(subset=['high', 'low', 'close'])
        self.df['kc_middle'], self.df['kc_upper'], self.df['kc_lower'] = self.get_kc(
            self.df['high'], self.df['low'], self.df['close'], 20, 2, 10)

    def calc_STC(self):
        # The Schaff Trend Cycle (STC) indicator identifies market trends and potential buy or sell signals.
        stc_window_slow = 50  # around 50 periods: a 'smoother' trend, less sensitive to price changes
        stc_window_fast = 23  # around 23 periods: captures the shorter-term price trends
        stc_cycle = 10  # sensitivity to market trends and cycles, default = 10: higher values for volatile markets, lower for sideways markets
        indicator_stc = STCIndicator(close=self.df['close'], window_slow=stc_window_slow,
                                     window_fast=stc_window_fast, cycle=stc_cycle, smooth1=3, smooth2=3)
        # Add features
        self.df['STC'] = indicator_stc.stc()

    def clean(self):
        # drop columns not needed
        self.df.drop(['20_day_EM', '50_day_EM', 'Signal_Line'], axis='columns', inplace=True)
        self.df = self.df.dropna()  # drop NaN values