Synchronizing data frames from different time zones is a prerequisite for working with one uniform time reference, especially when (price) data comes from different sources. Imagine you download the current Bitcoin price data via the Binance API, load it into a spreadsheet, and look at the most recent rows after converting the Unix ‘timestamp’ (the number of seconds since January 1, 1970, 00:00 UTC) into a readable date and time. Then you notice that the numbers you are looking at seem to be at least an hour old. You open CoinGecko, zoom in on Bitcoin, and it starts to dawn on you: you do have the latest numbers, but the time is ‘wrong’. A moment later you realize that the time is not ‘wrong’ at all; it is the time of a different time zone!
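To make the effect concrete, here is a minimal sketch that renders one and the same Unix timestamp in UTC and in a local zone. The timestamp value and the zone Europe/Amsterdam are assumptions chosen for illustration. Note that the usage example further down reads the Binance timestamps with unit='ms', so those values would have to be divided by 1000 first.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical timestamp: seconds since 1970-01-01 00:00 UTC
ts = 1_700_000_000

# The same instant, once labelled in UTC and once in Amsterdam local time
as_utc = datetime.fromtimestamp(ts, tz=timezone.utc)
as_amsterdam = as_utc.astimezone(ZoneInfo("Europe/Amsterdam"))

print(as_utc.isoformat())
print(as_amsterdam.isoformat())
```

Both values describe the same moment; only the wall-clock label differs, which is exactly what makes the data look an hour old.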
Risk of Disinformation
Binance works with UTC by default. If you use CET or CEST, that makes a difference of 1 hour or, in summer time, 2 hours. The problem becomes really serious when you automatically combine data frames that each have a datetime index without considering that the data may come from different time zones. The combined data then provides not information but disinformation! This must be prevented, and it can be done without too much effort.
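The following sketch illustrates the risk with two small, made-up data frames. Both describe the same three hours, but one index is in naive UTC and the other in naive Amsterdam time. The column names and prices are hypothetical.

```python
import pandas as pd

# Source A: naive timestamps that are really UTC
a = pd.DataFrame(
    {"close_a": [100.0, 101.0, 102.0]},
    index=pd.to_datetime(["2024-01-15 10:00", "2024-01-15 11:00", "2024-01-15 12:00"]),
)
# Source B: the same three hours, but labelled in Amsterdam time (UTC+1 in January)
b = pd.DataFrame(
    {"close_b": [100.5, 101.5, 102.5]},
    index=pd.to_datetime(["2024-01-15 11:00", "2024-01-15 12:00", "2024-01-15 13:00"]),
)

# Naive join: rows are matched on wall-clock labels and end up one hour apart
naive = a.join(b, how="inner")

# Aware join: state each zone explicitly, then bring both to one zone
a_aware = a.tz_localize("UTC").tz_convert("Europe/Amsterdam")
b_aware = b.tz_localize("Europe/Amsterdam")
aligned = a_aware.join(b_aware, how="inner")

print(naive)
print(aligned)
```

The naive join silently keeps only two rows and pairs prices from different hours. The aware join keeps all three rows and pairs them correctly.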
A Script to Sync
What we need is a Python script that can perform the following tasks:
- Get the ‘preferred_time_zone’ from a settings.json or an in-memory settings dict;
- Check the current system time and time zone of the computer;
- Check these against a time server (NTP or similar) and reset the system clock if necessary;
- If the preferred_time_zone is missing, use the current system zone and save it back to the settings;
- Convert a DataFrame’s datetime index (e.g., from UTC) to the preferred time zone (for example CET), respecting DST (Daylight Saving Time)!
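The settings step in the list above could look roughly like the sketch below. The helper name load_preferred_time_zone is hypothetical, and the fallback zone is passed in by the caller instead of being detected from the system, to keep the example self-contained.

```python
import json
import os
import tempfile

def load_preferred_time_zone(path, fallback_zone):
    """Return the preferred zone from a JSON settings file.

    If the key is missing, store the fallback in the file and return it.
    """
    settings = {}
    if os.path.isfile(path):
        with open(path, "r") as f:
            settings = json.load(f)
    if not settings.get("preferred_time_zone"):
        settings["preferred_time_zone"] = fallback_zone
        with open(path, "w") as f:
            json.dump(settings, f, indent=2)  # save back
    return settings["preferred_time_zone"]

# Usage with a throwaway settings file
path = os.path.join(tempfile.mkdtemp(), "settings.json")
first = load_preferred_time_zone(path, "Europe/Amsterdam")  # missing, so it is saved
second = load_preferred_time_zone(path, "UTC")              # now read from the file
print(first, second)
```

The second call ignores its fallback because the first call has already written the zone to the file.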
The handleTime Class for Synchronizing data frames from different time zones
The handleTime class is dedicated to synchronizing data frames from different time zones. As always, we start by importing the libraries we need: json to read and write the settings, os to communicate with the operating system, pytz for its knowledge of time zones, ntplib for access to a time server, platform to identify the operating system, datetime for date and time handling, pandas for the data frames, and PyQt6 to integrate the class into a GUI application. The class can be used just as well in Jupyter or in a command-line application.
We use this class in all our data gathering and preprocessing.
import json
import os
import pytz
import ntplib
import platform
from datetime import datetime, timezone
import pandas as pd
from PyQt6.QtCore import QObject, pyqtSignal
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning, message="sipPyTypeDict.*")
Initializing
The class must be instantiated with settings information; optionally, a calling ‘parent’ can be passed as well. The class loads the preferred_time_zone from the settings or, if it is missing, sets it to the current system time zone.
class handleTime(QObject):
    # Signal to communicate progress (string message) back to the main thread
    progress_signal = pyqtSignal(str)

    def __init__(self, settings, parent=None):
        super().__init__()
        if settings is None:
            # Fail early instead of returning a half-initialized object
            raise ValueError("No settings received")
        self.parent = parent
        self.tzone, self.settings = self.load_timezone_from_settings(settings)
        print(f"preferred_time_zone is set to: {self.tzone}")
        if self.check_and_sync_system_time():
            print('The current time settings are correct!')

    def load_timezone_from_settings(self, settings):
        # Accept either a path to a JSON file or an in-memory dict
        if isinstance(settings, str) and os.path.isfile(settings):
            with open(settings, 'r') as f:
                settings = json.load(f)
        preferred_tz = settings.get("preferred_time_zone")
        if preferred_tz is None:
            # The tzinfo returned by astimezone() has no .zone attribute, so use
            # its name if pytz knows it and fall back to UTC otherwise
            local_name = datetime.now().astimezone().tzname()
            preferred_tz = local_name if local_name in pytz.all_timezones_set else "UTC"
            settings["preferred_time_zone"] = preferred_tz
            print(f"Preferred time zone not found. Set to current: {preferred_tz}")
        return preferred_tz, settings
Check, Sync and Convert
You can now check and sync your system clock against a time server.
What really matters is the convert_dataframe_timezone method, which you can use in your code to make data frames with datetime indices compatible. This is usually done in two steps:
- explicitly assign the current time zone to the index;
- convert it to the wanted time zone.
    def check_and_sync_system_time(self, allowed_skew_seconds=2):
        allowed_skew_seconds = float(allowed_skew_seconds)
        try:
            client = ntplib.NTPClient()
            response = client.request('pool.ntp.org', version=3)
            ntp_time = datetime.fromtimestamp(response.tx_time, tz=timezone.utc)
            system_time = datetime.now(timezone.utc)
            preferred_tz = pytz.timezone(self.tzone)  # self.tzone = 'CET' or 'Europe/Amsterdam', etc.
            now_local = datetime.now(preferred_tz)
            skew = abs((ntp_time - system_time).total_seconds())
            print(f"NTP time: {ntp_time}, System time: {system_time}, Skew: {skew:.3f} sec, Local time: {now_local}")
            if skew > allowed_skew_seconds:
                if platform.system() == "Windows":
                    print("Warning: Automatic time sync not supported via Python on Windows. Please sync manually.")
                    return False  # clock is off and was not corrected
                if os.geteuid() != 0:
                    print("Warning: Need to be root to set system time. Skipping.")
                    return False  # clock is off and was not corrected
                # GNU date syntax (Linux): set the clock from a Unix timestamp
                os.system(f'date -s "@{int(response.tx_time)}"')
                print("System time updated.")
            return True
        except Exception as e:
            print(f"Time synchronization failed: {e}")
            return False

    def convert_dataframe_timezone(self, df, preferred_tz=None, original_tz=None):
        if preferred_tz is None:
            preferred_tz = self.tzone
        if not isinstance(df.index, pd.DatetimeIndex):
            raise ValueError("DataFrame index must be a DatetimeIndex")
        df = df.copy()
        if df.index.tz is None:
            # Step 1: assign the source zone to a naive index (UTC unless original_tz is given)
            df.index = df.index.tz_localize(original_tz or 'UTC')
        # Step 2: convert to the preferred zone; the zone rules take care of DST
        df.index = df.index.tz_convert(preferred_tz)
        return df
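The two-step conversion can also be tried in isolation, without the class. The sketch below uses two made-up UTC timestamps, one in winter and one in summer, to check that DST is respected. The zone Europe/Amsterdam is an assumption; a region name like this is generally a safer choice than an abbreviation.

```python
import pandas as pd

# Two hypothetical UTC timestamps: one in winter, one in summer
idx = pd.to_datetime(["2024-01-15 12:00", "2024-07-15 12:00"])
df = pd.DataFrame({"close": [1.0, 2.0]}, index=idx)

# Step 1: declare what the naive index means. Step 2: convert it.
local = df.tz_localize("UTC").tz_convert("Europe/Amsterdam")

# UTC offset in hours for each row after conversion
offsets = [ts.utcoffset().total_seconds() / 3600 for ts in local.index]
print(local.index.hour.tolist(), offsets)
```

The winter row shifts by one hour and the summer row by two, so no manual DST bookkeeping is needed.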
Example usage
Here is a short piece of code that shows how to use this class.
import os
import pandas as pd
from time_handle import handleTime
.
.
self.time_handle = handleTime('settings.json')
.
.
df = pd.read_csv(current)
os.remove(current)
df.columns = columns
df['dt'] = pd.to_datetime(df['open_time'], unit='ms', origin='unix')
df['dt'] = df['dt'].dt.tz_localize('UTC') # <- this line is critical for time-zone conversion
df = df.drop(columns=['open_time', 'close_time', 'quote_asset_volume', 'taker_buy_base_asset_volume', 'taker_buy_quote_asset_volume', 'ignore'])
df = df.dropna() # drop rows with missing values
df = df.drop_duplicates()
df = df.set_index('dt') # set index
df = df.sort_index() # sort chronologically
df = self.time_handle.convert_dataframe_timezone(df, self.time_handle.tzone, original_tz='UTC') # convert