Current Bitcoin or NVIDIA Data for Price Prediction

Current Data for Price Prediction

Stock price prediction is a hot topic, and so is predicting the Bitcoin price for the coming hours or days; a look at GeeksforGeeks, Medium or GitHub confirms it. Training machine learning models requires large amounts of data. This holds for advanced LSTM (Long Short-Term Memory) models, for the slightly less complicated but very effective models of the ARIMA (Autoregressive Integrated Moving Average) family, and for ensemble (Gradient Boosting, Random Forest) and linear (Logistic or Ridge regression) models. Time series of many tens of thousands of data points, covering years of hourly price history of a financial asset such as Ethereum (ETH), Microsoft (MSFT), gold or silver, are the rule rather than the exception. For an introduction to using AI models for technical analysis, see this post.


Once we have trained a model on an asset's price history from past years up to the present day, we can save it and reuse it for a certain period, as long as circumstances do not change significantly, to predict short-term price developments from a limited set of the most recent data. We download current Bitcoin or NVIDIA data, load a model previously trained and saved on this or a related asset, and use it to predict the price development of the asset for the coming hours. Compared to the original training, this is very fast: a matter of a few minutes instead of (sometimes several) hours.
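The train-once, reuse-often workflow can be sketched with a deliberately trivial stand-in model. Everything in this block is an assumption made for illustration: the "model" is just an average hourly price change stored as JSON, and the names fit_drift_model and predict are hypothetical. A real LSTM or ARIMA model would be saved and loaded with the tools of its own library.

```python
import json
import os
import tempfile


def fit_drift_model(prices):
    """Toy 'training' step: estimate the average hourly price change."""
    steps = [b - a for a, b in zip(prices, prices[1:])]
    return {"drift": sum(steps) / len(steps)}


def predict(model, recent_prices, horizon):
    """Project the last known price forward by `horizon` hours."""
    last = recent_prices[-1]
    return [last + model["drift"] * (h + 1) for h in range(horizon)]


# Expensive step, done once on the long history (shortened here).
history = [100.0, 101.0, 103.0, 102.0, 104.0]
model = fit_drift_model(history)

# Persist the fitted parameters ...
path = os.path.join(tempfile.gettempdir(), "toy_model.json")
with open(path, "w") as fh:
    json.dump(model, fh)

# ... and later reload them and predict from a short, current window only.
with open(path) as fh:
    reloaded = json.load(fh)
forecast = predict(reloaded, recent_prices=[105.0, 106.0], horizon=3)
os.remove(path)
print(forecast)
```

The point of the sketch is the split between the two phases: the prediction step needs only the stored parameters and a handful of recent values, which is why it is so much faster than training.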

Complete, up-to-date price data

For this, we need a module that quickly downloads the most current data and makes it available as a standardized Pandas DataFrame, in the form expected by the model we want to reuse. That is the purpose of the getCurrent class. It is designed to be used in a PyQt GUI application but can easily be used in a standalone Python script.

The getCurrent Class

# Copyright (c) 2024 Hans De Weme
# Licensed under the MIT License (https://opensource.org/licenses/MIT).
# Class getCurrent
# Purpose: collect the most recent 500 hourly data points for crypto or stock assets,
# trim fields not needed by the calling module, offer to manually add the most recent value if missing using a callback function,
# return the obtained time series as a Pandas DataFrame
import requests
import pandas as pd
import numpy as np
import os
from   datetime     import datetime, timedelta
import yfinance as yf
from   PyQt6.QtCore import QObject
from time_handle import handleTime

# class init arguments:
# asset    - crypto asset to collect data from Binance / stock ticker to collect from Yahoo Finance
# settings - json object used as Python dictionary
# Binance doc:  https://binance-docs.github.io/apidocs/spot/en/#kline-candlestick-data
# All timestamps from Binance's REST API are in UTC (milliseconds since epoch):
'''
[
  1499040000000,      // Open time (UTC in ms)
  "0.01634790",       // Open
  "0.80000000",       // High
  "0.01575800",       // Low
  "0.01577100",       // Close
  "148976.11427815",  // Volume
  1499644799999,      // Close time (UTC in ms)
  ...
]
'''


class getCurrent(QObject):       
    def __init__(self, kind, asset, trim, input_callback=None):    
        super().__init__()                                                  # necessary for QObject, needed for pyqtSignal (currently not used!)         
        self.callback = input_callback
        self.kind = kind
        self.trim = trim                                                    # trim data set down to just 'close' or keep open, high, low, volume, number trades
        self.FREQ = '1h'
        self.asset    = asset
        self.df       = None
        self.time_handle = handleTime('settings.json')
        
    def get_data(self, kind):
        if kind == 'C':
            self.MARKT    = self.asset+'USDT'
            suc, data = self.download_data(self.MARKT, self.FREQ)
            if not suc:
                print('* * * No Data obtained for this asset from Binance * * *')
                return False
        elif kind == 'S':
            suc, data = self.download_stock_data(self.asset, self.FREQ)
            if not suc:
                print('* * * No Data obtained for this asset from Yahoo Finance * * *')
                return False
        else:
            print('Kind unknown: '+str(kind))
            return False
        self.df  = pd.DataFrame(data)
        if self.df.empty:
            print('* * * Time Series Data missing  * * * ')
            return False
        if self.trim:
            self.df.drop(['open', 'high', 'low', 'volume', 'number_of_trades'], axis = 'columns', inplace = True)  # keep only 'close' price for LSTM and SARIMAX training
        return True

    def download_data(self, markt, interval):                           # Download Binance most recent data and store in dataframe - last 500 datapoints
        columns = ['open_time', 'open', 'high', 'low', 'close', 'volume', 'close_time', 'quote_asset_volume', 'number_of_trades', 'taker_buy_base_asset_volume', 'taker_buy_quote_asset_volume', 'ignore']
        print(f'Downloading data for {markt}. Interval {interval}.')
        # without a 'limit' parameter Binance returns the 500 most recent candles
        url  = 'https://api.binance.com/api/v3/klines?symbol='+markt+'&interval='+interval
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()                                 # an error reply (e.g. unknown symbol) is not kline data
            data = response.json()
        except (requests.RequestException, ValueError):                 # network error, HTTP error or invalid JSON
            return False, None                                          # always return a pair, the caller unpacks two values
        df   = pd.DataFrame(data)
        current = markt+'-'+datetime.now().strftime('%Y-%m-%d')+'.csv'
        df.to_csv(current, header=columns, index=False)                 # round trip through a temporary CSV converts Binance's string values to numbers
        print('\n* * * latest data collected from Binance')
        df = pd.read_csv(current)
        os.remove(current)
        df.columns = columns
        df['dt'] = pd.to_datetime(df['open_time'], unit='ms', origin='unix')
        df['dt'] = df['dt'].dt.tz_localize('UTC')  # <- this line is critical for time-zone conversion
        df.drop(['open_time', 'close_time', 'quote_asset_volume', 'taker_buy_base_asset_volume', 'taker_buy_quote_asset_volume','ignore'], axis = 'columns', inplace = True)
        df = df.dropna()                                                # drop incomplete rows (the former np.isnan mask left the frame unchanged)
        df = df.drop_duplicates()
        # set index
        df.set_index('dt', inplace=True)
        df = df.sort_index()
        df = self.time_handle.convert_dataframe_timezone(df, self.time_handle.tzone, original_tz='UTC')  # convert
        full_range = pd.date_range(start=df.index.min(), end=df.index.max(), freq='h', name='dt')  # reindex with full hourly range and check for missing hours
        df_full = df.reindex(full_range)
        missing_times = df_full[df_full.isnull().any(axis=1)].index
        if len(missing_times) > 0:
            print(f"\n* * * MISSING DATA AT: {missing_times}")
            print('Number Data Points missing: '+str(len(missing_times)))
            df = df_full.interpolate(method='spline', order=3)          # fill the reindexed frame; the original frame has no NaN rows to fill (needs SciPy)
            print('* * * Missing values filled in with spline order 3 * * * ')
        else:
            print("\n* * * No missing data detected. * * *")
        return True, df
    
    def download_stock_data(self, markt, interval):                     # Download Yahoo Finance most recent hourly data (last 100 days) and store in dataframe
        print(f'Downloading data for {markt}. Interval {interval}.')
        data = yf.download(markt, period='100d', interval=interval)
        if data is None or data.empty:
            print("Failed to retrieve current data from Yahoo Finance")
            return False, None
        if isinstance(data.columns, pd.MultiIndex):                     # some yfinance versions return (field, ticker) column pairs
            data.columns = data.columns.get_level_values(0)
        data.index.name = 'Datetime'
        data.drop(['Adj Close'], axis = 'columns', inplace = True, errors = 'ignore')  # drop not needed column, if present
        data.rename(columns={'Open': 'open', 'High': 'high', 'Low': 'low', 'Close': 'close', 'Volume': 'volume'}, inplace=True) # rename columns to generic names
        data['number_of_trades'] = 0.0                                  # placeholder so the column layout matches the Binance frame
        data.reset_index(inplace=True)                                  # reset index
        data.rename(columns={'Datetime': 'dt'}, inplace=True)
        data['dt'] = pd.to_datetime(data['dt'], utc=True)               # normalize to UTC, works for naive and tz-aware timestamps
        data['close'] = data['close'].astype(float)
        data.set_index('dt', inplace=True)
        df = data.dropna()                                              # clean up
        df = df[~df.index.duplicated(keep='first')]                     # drop repeated timestamps, not hours that happen to share prices
        df = self.time_handle.convert_dataframe_timezone(df, self.time_handle.tzone, original_tz='UTC')  # convert
        df = df.sort_index()
        return True, df

getCurrent Overview

The getCurrent class is designed to retrieve the most recent hourly data points for cryptocurrency or stock assets: 500 candles from Binance, or roughly the last 100 days from Yahoo Finance. It trims unnecessary fields and handles missing data. It also accepts a callback intended for manually adding the most recent value if it is missing; in the listing above the callback is stored but not yet called. The resulting dataset is made available as a Pandas DataFrame for use with pre-trained ML models.


Key Features

Supports Multiple Asset Types: Fetch data for cryptocurrency assets from Binance or stock assets from Yahoo Finance.

Field Trimming: Retain only essential fields such as the close price, or keep all fields for advanced analysis.

Missing Data Handling: Detects missing hourly records in the Binance data and fills them using spline interpolation.

Callback for User Input: Accepts a callback for manual entry of the most recent value when API latency causes a missing record (the listing above stores the callback but does not call it yet).

Output Format: Provides a cleaned and indexed Pandas DataFrame for easy integration with ML models.
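The missing-data feature can be illustrated on a small synthetic series. This is a minimal sketch under assumptions, not the class's exact code: the data is made up, and it uses time-based linear interpolation instead of the spline of order 3 so that it runs without SciPy.

```python
import pandas as pd

# Hourly closes with the 02:00 and 03:00 candles missing (synthetic data).
idx = pd.to_datetime(
    ["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 04:00"], utc=True
)
df = pd.DataFrame({"close": [100.0, 101.0, 104.0]}, index=idx)

# Reindex on the complete hourly range so that gaps become explicit NaN rows.
full_range = pd.date_range(df.index.min(), df.index.max(), freq="h")
df_full = df.reindex(full_range)
missing = df_full.index[df_full["close"].isna()]

# Fill the gaps on the reindexed frame. Interpolating the original frame
# would change nothing, because it contains no NaN rows to fill.
filled = df_full.interpolate(method="time")
print(len(missing), "missing hours filled")
print(filled)
```

The key design point is that detection and filling both operate on the reindexed frame: a missing hour is an absent row, not a NaN, until the index has been completed.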


Functional Details

Initialization

__init__(kind, asset, trim, input_callback=None)

Arguments:

kind (str): Type of asset (‘C’ for cryptocurrency, ‘S’ for stock).

asset (str): Asset identifier (e.g., “BTC” for cryptocurrency, “AAPL” for Apple stock).

trim (bool): Whether to trim the dataset to include only the ‘close’ column.

input_callback (callable, optional): A function to invoke for manual entry of the latest value if missing.


Methods

get_data(kind)

Fetches data for the specified asset type.

Arguments:

  • kind (str): Asset type (‘C’ or ‘S’).

Returns:

  • suc (bool): Success status of the data retrieval. The cleaned and indexed time series is stored in self.df rather than returned.

Process Flow:

  • Calls download_data for cryptocurrency or download_stock_data for stocks.
  • Handles trimming if self.trim is True.

download_data(markt, interval)

Fetches the most recent 500 hourly data points from Binance.

Arguments:

  • markt (str): The market symbol (e.g., “BTCUSDT”).
  • interval (str): Data frequency (e.g., “1h”).

Returns:

  • suc (bool): Success status.
  • df (DataFrame): Processed time series data.

Additional Features:

  • Saves data temporarily as a CSV file for processing.
  • Detects missing hourly records and interpolates values using spline order 3.
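Building the request and reading one kline can be sketched with the standard library alone. The helper names build_klines_url and parse_kline are hypothetical. The limit parameter is part of the Binance klines endpoint (default 500); passing it explicitly is a choice made here, not something the class does. The field order follows the columns list in the class, and the trailing values of the sample row are placeholders for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "https://api.binance.com/api/v3/klines"


def build_klines_url(symbol, interval="1h", limit=500):
    """Compose the request URL; urlencode takes care of escaping."""
    query = urlencode({"symbol": symbol, "interval": interval, "limit": limit})
    return BASE_URL + "?" + query


def parse_kline(row):
    """Convert one raw kline (ints and numeric strings) into typed values."""
    return {
        "dt": datetime.fromtimestamp(row[0] / 1000, tz=timezone.utc),
        "open": float(row[1]),
        "high": float(row[2]),
        "low": float(row[3]),
        "close": float(row[4]),
        "volume": float(row[5]),
        "number_of_trades": int(row[8]),
    }


# First seven fields as in the class's comment; the rest are placeholders.
sample = [1499040000000, "0.01634790", "0.80000000", "0.01575800",
          "0.01577100", "148976.11427815", 1499644799999,
          "0.0", 0, "0.0", "0.0", "0"]

url = build_klines_url("BTCUSDT")
parsed = parse_kline(sample)
print(url)
print(parsed["dt"].isoformat(), parsed["close"])
```

Converting the strings directly like this is an alternative to the temporary CSV file, which the class uses to let pandas infer numeric types on re-reading.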

download_stock_data(markt, interval)

Fetches roughly the last 100 days of hourly data points from Yahoo Finance.

Arguments:

  • markt (str): Asset ticker symbol.
  • interval (str): Data frequency (e.g., “1h”).

Returns:

  • suc (bool): Success status.
  • df (DataFrame): Processed time series data.

Additional Features:

  • Renames columns to standardized names.
  • Drops irrelevant fields such as ‘Adj Close’.
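The renaming and clean-up steps can be tried without network access on a synthetic frame shaped like a Yahoo Finance download. This is a sketch under assumptions: normalize_stock_frame is a hypothetical helper, the prices are invented, and the two-level column layout is one of the shapes a download may have.

```python
import pandas as pd


def normalize_stock_frame(raw):
    """Bring a Yahoo-style OHLCV frame into the generic column layout."""
    df = raw.copy()
    if isinstance(df.columns, pd.MultiIndex):
        # Keep only the field level, e.g. ('Close', 'NVDA') -> 'Close'.
        df.columns = df.columns.get_level_values(0)
    df = df.drop(columns=["Adj Close"], errors="ignore")
    df = df.rename(columns=str.lower)
    df["number_of_trades"] = 0.0  # placeholder: not provided by this source
    # utc=True converts tz-aware timestamps and treats naive ones as UTC.
    df.index = pd.to_datetime(df.index, utc=True)
    df.index.name = "dt"
    return df.sort_index()


# Synthetic stand-in for a download result, deliberately out of order.
idx = pd.DatetimeIndex(
    ["2024-01-02 10:30", "2024-01-02 09:30"], tz="America/New_York"
)
cols = pd.MultiIndex.from_product(
    [["Open", "High", "Low", "Close", "Volume"], ["NVDA"]]
)
raw = pd.DataFrame(
    [[2.0, 3.0, 1.0, 2.5, 10], [1.0, 2.0, 0.5, 1.5, 20]],
    index=idx,
    columns=cols,
)

clean = normalize_stock_frame(raw)
print(clean)
```

Using errors="ignore" for the ‘Adj Close’ column keeps the helper working whether or not the source frame contains that column.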


Dependencies

requests: For API calls to Binance.

pandas: For data manipulation and cleaning.

numpy: For numerical operations and handling missing values.

os: For file management.

datetime: For handling timestamps.

yfinance: For retrieving stock data from Yahoo Finance.

PyQt6.QtCore: For integrating with PyQt applications.

Class Attributes

FREQ (str): Frequency of the data, set to “1h”.

df (DataFrame): Holds the fetched and processed data.

Error Handling

API Failures: Prints error messages if data retrieval fails.

Missing Data: Identifies missing hourly records and fills them using spline interpolation.

User Input: input_callback is intended for manually entering missing values when necessary; the listing above stores it but does not invoke it.
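One way the callback could be wired in is sketched here. This is an assumption about the intended behaviour, not code from the class: append_latest_if_missing is a hypothetical helper that asks the callback for a value only when the candle of the current hour is absent from a trimmed, ‘close’-only frame.

```python
import pandas as pd


def append_latest_if_missing(df, now, callback=None):
    """Append the current hour's close via `callback` when it is absent."""
    current_hour = pd.Timestamp(now).floor("h")
    if callback is None or current_hour in df.index:
        return df
    value = float(callback())  # e.g. a value typed in by the user
    extra = pd.DataFrame(
        {"close": [value]},
        index=pd.DatetimeIndex([current_hour], name=df.index.name),
    )
    return pd.concat([df, extra]).sort_index()


# Three hourly closes; the 03:00 candle has not arrived yet (synthetic data).
idx = pd.date_range("2024-01-01 00:00", periods=3, freq="h", tz="UTC", name="dt")
df = pd.DataFrame({"close": [1.0, 2.0, 3.0]}, index=idx)

now = pd.Timestamp("2024-01-01 03:20", tz="UTC")
result = append_latest_if_missing(df, now, callback=lambda: 4.5)
unchanged = append_latest_if_missing(result, now, callback=lambda: 9.9)
print(result)
```

In a GUI application the lambda would be replaced by a function that opens an input dialog; the helper itself does not care where the value comes from.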

Output

The class processes data into a Pandas DataFrame with:

Index: Datetime.

Columns: ‘open’, ‘high’, ‘low’, ‘close’, ‘volume’, ‘number_of_trades’ (if not trimmed).
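Before handing the frame to a model it can be worth checking that it has the shape described above. The following is a minimal sketch; check_frame is a hypothetical helper and the rules it applies are assumptions derived from this description, not part of the class.

```python
import pandas as pd

EXPECTED = ["open", "high", "low", "close", "volume", "number_of_trades"]


def check_frame(df, trimmed):
    """Return a list of problems; an empty list means the frame looks usable."""
    problems = []
    if not isinstance(df.index, pd.DatetimeIndex) or df.index.tz is None:
        problems.append("index must be a timezone-aware DatetimeIndex")
    elif not df.index.is_monotonic_increasing:
        problems.append("index must be sorted in ascending order")
    wanted = ["close"] if trimmed else EXPECTED
    missing = [c for c in wanted if c not in df.columns]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    if "close" in df.columns and df["close"].isna().any():
        problems.append("'close' contains missing values")
    return problems


# Synthetic trimmed frame with two hourly closes.
idx = pd.date_range("2024-01-01", periods=2, freq="h", tz="UTC")
good = pd.DataFrame({"close": [1.0, 2.0]}, index=idx)

print(check_frame(good, trimmed=True))
print(check_frame(good, trimmed=False))
print(check_frame(good.reset_index(drop=True), trimmed=True))
```

Returning a list of messages instead of raising on the first problem makes it easy to show all issues at once, for example in a status bar of the GUI.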


Example Usage

# Example callback function
def user_input_callback():
    return float(input("Enter the most recent value: "))

# Initialize the class
getter = getCurrent(kind='C', asset='BTC', trim=True, input_callback=user_input_callback)

# Fetch data
data_retrieved = getter.get_data(kind='C')
if data_retrieved:
    print(getter.df.head())
else:
    print("Failed to retrieve data.")

 
