Fear & Greed and Sentiment Indices as Exogenous Features

Most of us are more or less familiar with sentiment indices such as the CBOE Volatility Index (VIX) or the CNN Fear and Greed Index. When building time series forecasting models, tuning the model’s internal parameters (the ‘hyperparameters’) can significantly improve performance. Exogenous variables such as these indices, on the other hand, are external factors that may also significantly influence the target variable (e.g. the price development of the asset we want to predict). By including such variables in a SARIMAX or regression model, we can help the model capture relationships that are not apparent from the target series alone.

Why Exogenous Features Are Important

Incorporating relevant exogenous features such as sentiment indices can improve forecast accuracy, particularly when external factors strongly affect the time series. In a sales forecasting model, for example, promotional events, holidays, and economic indicators provide valuable context. The same holds for market sentiment, which can and does influence the price development of financial assets such as stocks or crypto coins.

How to Select and Use Sentiment Indices

‘Feature selection’ is the process of deciding which external factors may influence the target variable and, no less important, whether and how the corresponding data is available. Use domain knowledge to identify candidate factors, and use techniques such as correlation analysis to guide your selection.
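
As a rough illustration of correlation-based screening, the sketch below ranks candidate features by how strongly their lagged changes correlate with the target's returns. The function name, the column names (close, signal, noise) and the synthetic data are assumptions made for this example; they are not part of the application discussed here.

```python
import numpy as np
import pandas as pd

def rank_features_by_correlation(df, target_col, feature_cols, lag=1):
    """Correlate target returns with lagged changes of each candidate feature."""
    returns = df[target_col].pct_change()
    scores = {}
    for col in feature_cols:
        # Shift the feature so that only information available before the
        # predicted step is used (avoids look-ahead).
        lagged_change = df[col].diff().shift(lag)
        scores[col] = returns.corr(lagged_change)
    # Strongest absolute correlation first.
    return pd.Series(scores).sort_values(key=np.abs, ascending=False)

# Synthetic demo data (hypothetical): 'signal' leads the price by one hour,
# 'noise' is unrelated to it.
rng = np.random.default_rng(0)
n = 500
idx = pd.date_range("2024-01-01", periods=n, freq="h")
signal_steps = rng.normal(size=n)
ret = 0.002 * np.roll(signal_steps, 1) + 0.0005 * rng.normal(size=n)
ret[0] = 0.0
df = pd.DataFrame(
    {
        "close": 100 * np.cumprod(1 + ret),
        "signal": signal_steps.cumsum(),
        "noise": rng.normal(size=n).cumsum(),
    },
    index=idx,
)

ranking = rank_features_by_correlation(df, "close", ["signal", "noise"])
print(ranking)
```

A high correlation on historical data is only a hint. It does not guarantee that a feature will improve out-of-sample forecasts.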

Data preparation. Before we can use the exogenous features, they must be aligned with the target variable in terms of time frequency (hourly, in our case). This may involve resampling or aggregating the data to a common temporal resolution. Data manipulations of this kind are sometimes called ‘feature engineering’.
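
To make the alignment step concrete, here is a minimal pandas sketch that brings a finer-grained series (15-minute data) and a coarser one (daily data) onto the same hourly grid. All names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical target index: the hourly timestamps of the asset's price series.
target_index = pd.date_range("2024-01-01 00:00", periods=48, freq="h")

# A finer-grained feature (15-minute data) is aggregated to hourly means.
fine = pd.Series(
    range(192),
    index=pd.date_range("2024-01-01 00:00", periods=192, freq="15min"),
    name="fine_feature",
    dtype=float,
)
fine_hourly = fine.resample("h").mean()

# A coarser feature (daily data) is spread over the hourly grid by carrying
# the last known value forward.
daily = pd.Series(
    [10.0, 20.0],
    index=pd.date_range("2024-01-01", periods=2, freq="D"),
    name="daily_feature",
)
daily_hourly = daily.reindex(target_index, method="ffill")

# Both features now share the target's hourly index.
aligned = pd.concat([fine_hourly, daily_hourly], axis=1).loc[target_index]
print(aligned.head())
```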

The getSentiment class discussed below has two methods. The first obtains the well-known Crypto Fear & Greed Index, which reflects overall sentiment on the crypto market on a scale from 0 to 100: low values indicate extreme fear (typical near the bottom of a bear market) and high values indicate extreme greed (an overheated bull market).

The second queries Yahoo Finance and can be used to obtain the S&P 500 index or the CBOE Volatility Index, which provide comparable information for the overall stock market.

Below we first present the getSentiment class, followed by a code fragment illustrating how it can be used to enrich an existing data frame that already contains the time series data for the target asset.

The getSentiment Class

# Copyright (c) 2024 Hans De Weme
# Licensed under the MIT License (https://opensource.org/licenses/MIT).
# Class getSentiment 
# Purpose: collect sentiment data (Crypto Fear & Greed Index, S&P 500 Index, CBOE Volatility Index) to be used as exogenous variables (extra features)
# when forecasting the price (development) of an asset with LSTM, SARIMAX or classification/regression Machine Learning models
import requests
import pandas as pd
import yfinance as yf
from   PyQt6.QtCore import QObject
from   time_handle import handleTime

# class init arguments: None

class getSentiment(QObject):       
    def __init__(self):    
        super().__init__()                                                  # necessary for QObject, needed for pyqtSignal (currently not used!) 
        self.df       = None
        self.time_handle = handleTime('settings.json')                      # needed for time-zone conversion
              
    def get_fagi_data(self):                                                # Crypto Fear & Greed data
        url = "https://api.alternative.me/fng/?limit=0"
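        try:
            response = requests.get(url, timeout=30)
            response.raise_for_status()                                     # treat HTTP error codes as failures
            payload = response.json()['data']
        except (requests.RequestException, ValueError, KeyError):
            print("Failed to retrieve Fear and Greed data!")
            return None
        df = pd.DataFrame(payload)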
        df['timestamp'] = pd.to_datetime(pd.to_numeric(df['timestamp']), unit='s', utc=True)
        # columns returned by the API: value, value_classification, timestamp, time_until_update
        df = df.drop(columns=['time_until_update'], errors='ignore')
        df = df.rename(columns={'value_classification': 'label', 'timestamp': 'date'})
        df['value'] = df['value'].astype(float)
        df = df.set_index('date').sort_index()
        # use a UTC-aware "now" so the end of the range matches the UTC index
        new_index = pd.date_range(start=df.index.min(), end=pd.Timestamp.now(tz='UTC').floor('h'), freq='h')
        df = df.reindex(new_index, method=None)
        df['value'] = df['value'].interpolate(method='linear')
        df['label'] = df['label'].ffill()  # Forward fill the labels        
        df.index.name = 'dt'
        df = self.time_handle.convert_dataframe_timezone(df, self.time_handle.tzone, original_tz='UTC')  # convert time-zone to preferred time-zone
        print(df.tail())
        return df

    def get_indices_data(self, asset, colnm):
        self.asset = asset 
        interval = '1h'        
        data = yf.download(self.asset, interval = interval, period = '2y') # historical data
        df1 = pd.DataFrame(data)
        if df1 is None or df1.empty:
            print("Failed to retrieve historical data")
            return None
        data = yf.download(self.asset, period='10d', interval='1h')
        df2 = pd.DataFrame(data)
        if df2 is None or df2.empty:
            print("Failed to retrieve current data")
            return None
        
        df_concat = pd.concat([df1, df2])                                 # combine and drop duplicate timestamps
        df = df_concat[~df_concat.index.duplicated(keep='first')].copy() 
        if isinstance(df.columns, pd.MultiIndex):                         # newer yfinance versions may return (field, ticker) columns
            df.columns = df.columns.get_level_values(0)
        
        columns_to_drop = ['Adj Close', 'Open', 'High', 'Low', 'Volume']
        df.drop(columns=[col for col in columns_to_drop if col in df.columns], inplace=True)
        if 'Close' in df.columns:
            df.rename(columns={'Close': colnm}, inplace=True)
        if colnm in df.columns:
            df[colnm] = df[colnm].astype(float)
        
        df.index.name = 'dt'
        df.dropna(inplace=True)
        # no drop_duplicates() here: it compares values and would discard hours with a repeated close
        df.index = df.index + pd.DateOffset(minutes=30)                 # shift half-hour bar timestamps to the full hour
        df.index = df.index.tz_convert(self.time_handle.tzone)          # convert to preferred time-zone
        df.index = df.index.tz_localize(None)                           # remove timezone from timestamp
        df = df.sort_index()    
      
        print(self.asset+' Index data: ')
        print(df)
        return df

Here’s a short example of how the methods of the above class might be used.

        # load extra sentiment data to use as exogenous features
        if self.EXO == True:
            self.progress_signal.emit('Getting Fear and Greed Sentiment Data')
            self.getdata= getSentiment()
            self.fag = True
            fg = self.getdata.get_fagi_data()
            if fg is None or fg.empty:
                self.progress_signal.emit('Failed to get Fear and Greed Sentiment Data')
                self.fag = False
            else:
                fg = fg.reindex(self.df.index, method='ffill')         # add fear and greed to features
                self.df = self.df.join(fg)      
                self.df.drop(columns=['label'], axis = 'columns', inplace = True) 
            self.progress_signal.emit('Getting Stock Index Sentiment Data') 
            index = '^GSPC'                                            # '^GSPC' S&P 500 index 
            colnm = 'gspc'
            self.gspc = True
            sp = self.getdata.get_indices_data(index, colnm)           # add S&P500 index to features
            if sp is None or sp.empty:
                self.progress_signal.emit('Failed to get S&P500 Sentiment Data')
                self.gspc = False
            else:           
                sp = sp.reindex(self.df.index, method='ffill')         # add GSPC to features
                self.df = self.df.join(sp, how='left')
                self.df['gspc'] = self.df['gspc'].ffill()
            index = '^VIX'                                             # '^VIX'  CBOE Volatility Index 
            colnm = 'vix'
            self.vix = True
            sp = self.getdata.get_indices_data(index, colnm)           # add volatility index to features
            if sp is None or sp.empty:
                self.progress_signal.emit('Failed to get CBOE Volatility Index Sentiment Data')
                self.vix = False
            else:             
                sp = sp.reindex(self.df.index, method='ffill')         # add VIX to features                                
                self.df = self.df.join(sp, how='left')
                self.df['vix'] = self.df['vix'].ffill()      
            if self.fag == False and self.gspc == False and self.vix == False:
                self.EXO = False
                self.progress_signal.emit('No Additional Features Available for Exogenous Factor with SARIMAX')
            else:
                print(self.df)

Class Overview

The getSentiment class enriches a time series dataset by collecting sentiment data and financial indices as exogenous variables. These external factors can improve the performance of forecasting models such as SARIMAX or classification/regression models by adding context to the target asset’s price movements.

The two main sentiment sources are:

Crypto Fear & Greed Index: Measures overall sentiment in the cryptocurrency market (0 = extreme fear, 100 = extreme greed).

Stock Market Indices:

  • S&P 500 Index (^GSPC): Represents stock market performance.
  • CBOE Volatility Index (^VIX): Measures stock market volatility.

The class is implemented in Python and uses external libraries such as requests, pandas, and yfinance.


Class Implementation

Initialization

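The getSentiment class extends QObject for potential PyQt6 integration. Currently, the QObject functionality is not active (e.g., no pyqtSignal in use). The constructor also creates a handleTime helper (from the time_handle module, configured through settings.json) that is used for time-zone conversion.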


Methods

get_fagi_data()

This method retrieves the Crypto Fear & Greed Index data from the alternative.me API.

Process:

  • Makes an API request to fetch historical sentiment data.
  • Converts the Unix timestamps to timezone-aware UTC datetimes.
  • Reindexes the daily readings to an hourly grid that runs up to the current hour.
  • Linearly interpolates missing values in the value column (sentiment scores) and forward-fills the label column.
  • Converts the index to the preferred time zone.

Returns:

A pandas.DataFrame indexed by datetime (dt) with the sentiment score (value, float) and its classification (label).

Error Handling:

Prints an error message and returns None if the API request fails.
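
The processing steps of get_fagi_data() can be tried offline with a small hand-made sample. The sketch below assumes records shaped like the fields the class reads (value, value_classification, timestamp). The numbers are invented, and the final time-zone conversion is left out because it depends on the handleTime helper.

```python
import pandas as pd

# Hypothetical records, newest first; the values are invented for illustration.
sample = [
    {"value": "30", "value_classification": "Fear", "timestamp": "1704153600"},
    {"value": "54", "value_classification": "Neutral", "timestamp": "1704067200"},
]

df = pd.DataFrame(sample)
# Unix seconds -> timezone-aware UTC datetimes.
df["date"] = pd.to_datetime(pd.to_numeric(df["timestamp"]), unit="s", utc=True)
df = (
    df.rename(columns={"value_classification": "label"})
      .astype({"value": float})
      .set_index("date")[["value", "label"]]
      .sort_index()
)

# Expand the daily readings to an hourly grid.
hourly_index = pd.date_range(df.index.min(), df.index.max(), freq="h")
hourly = df.reindex(hourly_index)
hourly["value"] = hourly["value"].interpolate(method="linear")  # fill scores
hourly["label"] = hourly["label"].ffill()                       # carry labels forward
hourly.index.name = "dt"
print(hourly.iloc[[0, 12, 24]])
```

Note that linear interpolation between two daily readings uses the later reading, so the hourly values in between contain information that was not yet known at that time. For strict backtesting, forward-filling the value column as well may be the safer choice.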

get_indices_data(asset, colnm)

This method downloads historical and current data for a specified financial index from Yahoo Finance.

Parameters:

  • asset (str): The financial index ticker (e.g., ^GSPC for the S&P 500).
  • colnm (str): Column name for the data in the output DataFrame.

Process:

  • Downloads hourly data twice, once for the past two years and once for the last 10 days, to ensure the dataset is up to date.
  • Combines both downloads, removes duplicate timestamps, and sorts by datetime.
  • Drops unnecessary columns (Open, High, Low, Adj Close, Volume) and renames Close to the user-defined colnm.
  • Shifts the timestamps by 30 minutes to full-hour values, converts them to the preferred time zone, and removes the timezone information.

Returns:

A pandas.DataFrame with the specified index data (colnm), indexed by timestamp (dt). If retrieval fails, the method prints an error message and returns None.
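
The combine-and-align logic of get_indices_data() can be sketched without a network call. The helper name combine_and_align, the synthetic bars and the chosen time zones are assumptions for this example. In the class itself the target time zone comes from the handleTime settings.

```python
import pandas as pd

def combine_and_align(history, recent, colnm, tz):
    """Merge two overlapping frames of hourly closes and align them to full hours."""
    df = pd.concat([history, recent])
    df = df[~df.index.duplicated(keep="first")]        # keep each timestamp once
    df = df[["Close"]].rename(columns={"Close": colnm}).dropna().sort_index()
    df.index = df.index + pd.Timedelta(minutes=30)     # e.g. 09:30 bar -> 10:00
    df.index = df.index.tz_convert(tz).tz_localize(None)  # local wall time, tz-naive
    df.index.name = "dt"
    return df

# Synthetic bars stamped at half past the hour (hypothetical values).
idx_hist = pd.date_range("2024-01-02 09:30", periods=4, freq="h", tz="America/New_York")
idx_new = pd.date_range("2024-01-02 11:30", periods=4, freq="h", tz="America/New_York")
history = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0], "Volume": 0}, index=idx_hist)
recent = pd.DataFrame({"Close": [3.0, 4.0, 5.0, 6.0], "Volume": 0}, index=idx_new)

result = combine_and_align(history, recent, "gspc", "Europe/Amsterdam")
print(result)
```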


Integration with Exogenous Features

When combined with an asset’s time series data:

  • Fear & Greed Index is forward-filled to match the asset’s timestamps.
  • S&P 500 Index and VIX are re-indexed and aligned with the same timestamps, ensuring compatibility.

The enriched dataset can be used for time series forecasting, allowing the model to account for market sentiment and broader financial indicators.


Technical Notes

  • Dependencies:
      • requests: For API calls to fetch the Fear & Greed Index.
      • pandas: Data handling and manipulation.
      • yfinance: For fetching stock market index data.
      • PyQt6.QtCore.QObject: Placeholder for potential PyQt6 integration.
  • Error Handling:
      • Graceful fallback with printed error messages if any data source fails.
  • Time Zones:
      • Converts timestamps to the preferred time zone (configured in settings.json) and, for the index data, removes timezone information to simplify downstream processing.
