Synchronizing Data Frames from Different Time Zones

Synchronizing data frames from different time zones is a prerequisite for working with one consistent time reference, especially when (price) data comes from different sources. Imagine you download the current price data of Bitcoin via the Binance API, load it into a spreadsheet, and look at the most recent rows after converting the Unix ‘timestamp’ (the number of seconds since January 1, 1970, 00:00 UTC; Binance reports it in milliseconds) into a readable date and time. Then you notice that the numbers seem to be at least an hour old. You open CoinGecko, zoom in on Bitcoin, and it starts to dawn on you: you do have the latest numbers, but the time is ‘wrong’. A moment later you realize that the time is not ‘wrong’ at all: it is the time of a different time zone!
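
To make the effect concrete, here is a minimal sketch, assuming pandas is installed and using Europe/Amsterdam as a stand-in for your own zone. The timestamp value is an arbitrary example, not actual Binance data. It illustrates that one Unix timestamp is a single instant that merely reads differently per time zone.

```python
import pandas as pd

# Arbitrary example value: seconds since 1970-01-01 00:00 UTC
ts_seconds = 1_700_000_000

# Interpret the timestamp as UTC, then view the same instant in local time
utc_time = pd.Timestamp(ts_seconds, unit="s", tz="UTC")
local_time = utc_time.tz_convert("Europe/Amsterdam")

# The same instant expressed in milliseconds
utc_from_ms = pd.Timestamp(ts_seconds * 1000, unit="ms", tz="UTC")

print(utc_time)    # 2023-11-14 22:13:20+00:00
print(local_time)  # 2023-11-14 23:13:20+01:00
```

Both values compare as equal because they describe the same moment; only the wall-clock reading differs.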

Risk of Disinformation

Binance works with UTC by default, while you use CET or CEST, which makes a difference of 1 hour or, during summer time, 2 hours. The problem becomes really serious when you automatically join data frames that each have a datetime index without considering that the data may come from different time zones. The combined data then provides not information but disinformation! This must be prevented, and it can be done without too much effort.
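
The following sketch illustrates the risk with two tiny, made-up price series that cover the same three hours, one labeled in UTC and one in local (Amsterdam) time. The column names and prices are hypothetical. Joining on the naive index silently pairs the wrong rows, while localizing first aligns them correctly.

```python
import pandas as pd

one_hour = pd.Timedelta(hours=1)

# Same three hours, recorded with naive (zone-less) timestamps
utc_idx = pd.date_range("2024-01-15 10:00", periods=3, freq=one_hour)  # really UTC
cet_idx = pd.date_range("2024-01-15 11:00", periods=3, freq=one_hour)  # really CET

a = pd.DataFrame({"price_a": [100.0, 101.0, 102.0]}, index=utc_idx)
b = pd.DataFrame({"price_b": [100.5, 101.5, 102.5]}, index=cet_idx)

# Naive join: matches on wall-clock labels only, so rows are shifted by an hour
naive_join = a.join(b, how="inner")

# Aware join: state each source zone explicitly, then bring both to UTC
a_aware = a.tz_localize("UTC")
b_aware = b.tz_localize("Europe/Amsterdam").tz_convert("UTC")
aware_join = a_aware.join(b_aware, how="inner")

print(naive_join)  # 2 rows, price_a paired with the wrong hour of price_b
print(aware_join)  # 3 rows, correctly paired
```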

A Script to Sync

What we need is a Python script that can perform the following tasks:

  • Get the ‘preferred_time_zone’ from a settings.json file or an in-memory settings dict;
  • Check the current system time and time zone of the computer;
  • Compare these with a time server (NTP or similar) and reset the clock if necessary;
  • If the preferred_time_zone is missing, use the current system zone and save it back to the settings;
  • Convert a DataFrame’s datetime index (e.g., from UTC) to the preferred time zone (for example CET), respecting DST (Daylight Saving Time)!
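
As a small illustration of the last point, the sketch below converts two UTC timestamps on either side of the 2024 European switch to summer time. It assumes pandas and uses Europe/Amsterdam as an example zone. The UTC offset changes from one hour to two without any manual correction, because the zone rules take care of DST.

```python
import pandas as pd

# Noon UTC on the day before and the day of the 2024 switch to summer time
utc_index = pd.to_datetime(["2024-03-30 12:00", "2024-03-31 12:00"]).tz_localize("UTC")
local_index = utc_index.tz_convert("Europe/Amsterdam")

for utc_ts, local_ts in zip(utc_index, local_index):
    print(utc_ts, "->", local_ts)
# 2024-03-30 12:00:00+00:00 -> 2024-03-30 13:00:00+01:00
# 2024-03-31 12:00:00+00:00 -> 2024-03-31 14:00:00+02:00

# Offsets in seconds, to make the DST jump explicit
offsets = [ts.utcoffset().total_seconds() for ts in local_index]
```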

The handleTime Class for Synchronizing Data Frames from Different Time Zones

This is a dedicated class for synchronizing data frames from different time zones. As always, we start by importing the libraries we need: json to read and write the settings; os to communicate with the operating system; pytz, which knows about time zones; ntplib, which provides access to a time server; platform, which complements os by identifying the operating system; datetime for date and time handling; pandas for the data frames; and PyQt6 to integrate the class into a GUI application. The class can be used just as well in Jupyter or in a command-line application.

We use this class in all our data gathering and preprocessing.

import json
import os
import pytz
import ntplib
import platform
from datetime import datetime, timezone
import pandas as pd
from PyQt6.QtCore import QObject, pyqtSignal
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning, message="sipPyTypeDict.*")

Initializing

The class must be instantiated with settings information: either the path to a JSON file or a settings dict. A calling ‘parent’ can optionally be passed as well. The class loads the preferred_time_zone from the settings or, if it is missing, sets it to the current system time zone.

class handleTime(QObject):
    # Signal to communicate progress (string message) back to the main thread
    progress_signal = pyqtSignal(str)

    def __init__(self, settings, parent=None):
        super().__init__()
        if settings is None:
            # Fail loudly instead of returning a half-initialized object
            raise ValueError("No settings received")
        self.parent = parent
        self.tzone, self.settings = self.load_timezone_from_settings(settings)
        print(f"preferred_time_zone is set to: {self.tzone}")
        if self.check_and_sync_system_time():
            print('The current time settings are correct!')
        
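    def load_timezone_from_settings(self, settings):
        # Accept either a path to a JSON file or an already loaded dict
        if isinstance(settings, str) and os.path.isfile(settings):
            with open(settings, 'r') as f:
                settings = json.load(f)
        if not isinstance(settings, dict):
            raise ValueError("Settings must be a dict or a path to an existing JSON file")
        preferred_tz = settings.get("preferred_time_zone")
        if preferred_tz is None:
            # The tzinfo returned by astimezone() has no .zone attribute; tzname() gives
            # an abbreviation such as 'CET' or 'CEST', which pytz may not recognize
            local_name = datetime.now().astimezone().tzname()
            preferred_tz = local_name if local_name in pytz.all_timezones_set else "UTC"
            settings["preferred_time_zone"] = preferred_tz
            print(f"Preferred time zone not found. Set to: {preferred_tz}")
        return preferred_tz, settings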
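
For use outside a Qt application, the settings step can also be sketched as a plain function. This is a minimal illustration, not part of the class above: the name load_preferred_zone and the fallback parameter are hypothetical, and unlike the method above it also writes the fallback back to the file when the settings came from disk.

```python
import json
import os
import tempfile


def load_preferred_zone(settings, fallback="UTC"):
    """Return (zone_name, settings_dict) from a dict or a JSON file path."""
    path = None
    if isinstance(settings, str):
        path = settings
        with open(path, "r", encoding="utf-8") as f:
            settings = json.load(f)
    zone = settings.get("preferred_time_zone")
    if zone is None:
        # Key missing: use the fallback and store it in the settings
        zone = fallback
        settings["preferred_time_zone"] = zone
        if path is not None:
            with open(path, "w", encoding="utf-8") as f:
                json.dump(settings, f, indent=2)
    return zone, settings


# In-memory settings dict that already contains the key
zone, cfg = load_preferred_zone({"preferred_time_zone": "Europe/Amsterdam"})

# Settings file without the key: the fallback is written back to disk
with tempfile.TemporaryDirectory() as tmp:
    settings_path = os.path.join(tmp, "settings.json")
    with open(settings_path, "w", encoding="utf-8") as f:
        json.dump({"exchange": "binance"}, f)
    file_zone, _ = load_preferred_zone(settings_path)
    with open(settings_path, "r", encoding="utf-8") as f:
        saved = json.load(f)

print(zone, file_zone, saved)
```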

Check, Sync and Convert

You can now check your system clock against a time server and, if necessary, sync it.

What really matters is the convert_dataframe_timezone method, which you can use in your code to make data frames with datetime indices compatible. This is usually done in two steps:

  1. explicitly assign the current time zone to the index;
  2. convert it to the wanted time zone.

    def check_and_sync_system_time(self, allowed_skew_seconds=2):
        """Return True if the system clock is within tolerance or was corrected."""
        allowed_skew_seconds = float(allowed_skew_seconds)
        try:
            client = ntplib.NTPClient()
            response = client.request('pool.ntp.org', version=3)
            ntp_time = datetime.fromtimestamp(response.tx_time, tz=timezone.utc)
            system_time = datetime.now(timezone.utc)
            preferred_tz = pytz.timezone(self.tzone)  # self.tzone = 'CET' or 'Europe/Amsterdam', etc.
            now_local = datetime.now(preferred_tz)
            skew = abs((ntp_time - system_time).total_seconds())
            print(f"NTP time: {ntp_time}, System time: {system_time}, Skew: {skew:.3f} sec, Local time: {now_local}")
            if skew <= allowed_skew_seconds:
                return True
            if platform.system() == "Windows":
                print("Warning: Automatic time sync not supported via Python on Windows. Please sync manually.")
            elif os.geteuid() != 0:
                print("Warning: Need to be root to set system time. Skipping.")
            elif os.system(f'date -s "@{int(response.tx_time)}"') == 0:  # GNU date syntax (Linux)
                print("System time updated.")
                return True
            return False  # skew too large and not corrected
        except Exception as e:
            print(f"Time synchronization failed: {e}")
            return False

    def convert_dataframe_timezone(self, df, preferred_tz=None, original_tz=None):
        if preferred_tz is None:
            preferred_tz = self.tzone
        if not isinstance(df.index, pd.DatetimeIndex):
            raise ValueError("DataFrame index must be a DatetimeIndex")
        df = df.copy()
        if df.index.tz is None:
            # Step 1: label a naive index with its source zone (UTC unless stated otherwise)
            df.index = df.index.tz_localize(original_tz or 'UTC')
        # Step 2: convert to the preferred zone; DST follows from the zone rules
        df.index = df.index.tz_convert(preferred_tz)
        return df
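
One caveat with step 1 is worth a sketch: when the naive index holds local wall-clock times rather than UTC, some readings around the DST switch are ambiguous or do not exist at all. The example below assumes pandas and Europe/Amsterdam as the source zone. It uses the ambiguous and nonexistent arguments of tz_localize to mark such rows as NaT instead of raising an error; whether to drop, infer, or shift them is a choice that depends on your data.

```python
import pandas as pd

# Naive local wall-clock times around the 2024 DST switches in Amsterdam
naive_local = pd.to_datetime([
    "2024-10-27 01:30",  # unambiguous: still summer time (CEST)
    "2024-10-27 02:30",  # ambiguous: occurs twice when clocks go back
    "2024-03-31 02:30",  # nonexistent: skipped when clocks go forward
])

# Step 1: assign the source zone, marking problematic readings as NaT
localized = naive_local.tz_localize(
    "Europe/Amsterdam", ambiguous="NaT", nonexistent="NaT"
)

# Step 2: convert to the wanted zone (UTC here)
as_utc = localized.tz_convert("UTC")
print(as_utc)
```

Data that is already in UTC, such as the Binance data in this article, does not have this problem, because UTC has no DST.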

Example usage

Here is a short piece of code that shows how to use this class.

from time_handle import handleTime
.
.
      self.time_handle = handleTime('settings.json')
.
.
        df = pd.read_csv(current) 
        os.remove(current)
        df.columns = columns
        df['dt'] = pd.to_datetime(df['open_time'], unit='ms', origin='unix')  
        df['dt'] = df['dt'].dt.tz_localize('UTC')  # <- this line is critical for time-zone conversion 
        df.drop(['open_time', 'close_time', 'quote_asset_volume', 'taker_buy_base_asset_volume', 'taker_buy_quote_asset_volume','ignore'], axis = 'columns', inplace = True)                  
        df = df.dropna()                                                # drop rows with missing values
        df = df.drop_duplicates()
        df.set_index('dt', inplace=True)                                # set index
        df = df.sort_index()                                            # sort chronologically
        df = self.time_handle.convert_dataframe_timezone(df, self.time_handle.tzone, original_tz='UTC')  # convert
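
The same flow can be tried without the class or a Binance download. The sketch below assumes pandas and uses a hypothetical three-row frame with made-up prices; only the open_time column in milliseconds mirrors the Binance layout. Passing utc=True to pd.to_datetime combines parsing and UTC localization in one call.

```python
import pandas as pd

# Made-up kline-like rows; open_time is in milliseconds and deliberately unsorted
raw = pd.DataFrame({
    "open_time": [1700002800000, 1699999200000, 1700006400000],
    "close": [36500.0, 36450.0, 36520.0],
})

# Parse the timestamps and mark them as UTC in one step
raw["dt"] = pd.to_datetime(raw["open_time"], unit="ms", utc=True)

# Use the aware timestamps as a sorted index, then convert to local time
klines = raw.drop(columns="open_time").set_index("dt").sort_index()
klines_local = klines.tz_convert("Europe/Amsterdam")

print(klines_local)
```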