Pandas: Timestamp subtraction should work for differing timezones

Created on 28 Jan 2017  路  13Comments  路  Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd
In [2]: pd.Timestamp("2016-01-01", tz="Europe/Berlin") - pd.Timestamp("now", tz="UTC")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-691a8df26ecd> in <module>()
----> 1 pd.Timestamp("2016-01-01", tz="Europe/Berlin") - pd.Timestamp("now", tz="UTC")

pandas\tslib.pyx in pandas.tslib._Timestamp.__sub__ (pandas\tslib.c:23697)()

TypeError: Timestamp subtraction must have the same timezones or no timezones

Problem description

If both timestamps have a timezone specified, the result of this operation is perfectly well-defined. It's quite surprising that I have to riddle my code with lhs.tz_convert("UTC") - rhs.tz_convert("UTC") lines to get the difference of timestamps.

Expected Output

Timedelta('-393 days +06:29:07.057926')

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.1
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: 2.4.0
xlrd: None
xlwt: None
xlsxwriter: 0.9.6
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8.1
boto: None
pandas_datareader: None

API Design Timedelta Timeseries Timezones

Most helpful comment

these should all raise

I would say that is up for debate. We currently allow it, and I don't think there is anything ambiguous about what the result should be. Why breaking backwards compatibility to start erroring on this?

All 13 comments

we could certainly make this work

generally though if someone is trying to subtract different time zones it's an error (they meant to convert)

but maybe that's not typical

@jreback should it be implemented for 'addition operation' too?
Because it will raise the same error while adding two timestamps of different timezones
AND
which side of the operator should be given preference here, i.e. to subtract we ultimately need them to be in same timezones. So should tz=Europe/Berlin should be converted to UTC or UTC should be converted to Europe/Berlin?

addition of timestamps is not a meaningful operation

you will have to handle both sub and rsub
both are converted to UTC before subtraction

I am actually now a bit -0 on this as this is a touch too magical. This should be an explicit operation. In reality it doesn't come up that much, so making this a user-explicit operation should not be much of a burden.

But will leave the issue open for discussion / implementation.

closing this, we are moving towards things being very explicit.

I really don't see any magic here. Every non-naive Timestamp specifies a unique point on a common time-axis, completely independent of the timezone being used, so their difference is unambiguously defined as the distance on that time-axis.

Contrary to what you said before, subtracting timestamps with different timezones /is/ a valid action that can happen in particular in library code, e.g. if I provide a function time_to_some_event(ts), where event is given by some backend system's value (i.e. machine to machine communication, so usually UTC) when ts comes from user code, so it will probably have a non-UTC timezone attached.

If you want to be explicit, what you should not allow is subtraction of naive timestamps, as this gives you essentially random values if the naive timestamps don't happen to be UTC.

we raise for all comparisons between differing tz's (whether tz is UTC or naive or another zone).

this is complicated by the fact that we generally turn strings into naive timestamps, xref

xref #18435

subtracting in different timezones would be valid, but is just plain confusing, forcing folks to put things in the same timezone to subtract is not very burdensome and is much much more explicit.

we raise for all comparisons between differing tz's (whether tz is UTC or naive or another zone).

That is not really true, for comparisons we allow this:

In [13]: pd.Timestamp('2016-01-01', tz='Europe/Brussels') > pd.Timestamp('2017-01-01', tz='UTC')
Out[13]: False

In [14]: pd.DatetimeIndex([pd.Timestamp('2016-01-01', tz='Europe/Brussels')]) > pd.Timestamp('2017-01-01', tz='UTC')
Out[14]: array([False], dtype=bool)

In [15]: pd.Series([pd.Timestamp('2016-01-01', tz='Europe/Brussels')]) > pd.Timestamp('2017-01-01', tz='UTC')
Out[15]: 
0    False
dtype: bool

In [16]: pd.Series([pd.Timestamp('2016-01-01', tz='Europe/Brussels')]) > pd.Series([pd.Timestamp('2017-01-01', tz='UTC')])
Out[16]: 
0    False
dtype: bool

see my comment in the linked issue
these should all raise (on the list)

these should all raise

I would say that is up for debate. We currently allow it, and I don't think there is anything ambiguous about what the result should be. Why breaking backwards compatibility to start erroring on this?

Also, the actual issue in the linked comment is about tz-naive vs tz-aware timestamps. Of course neither comparison nor difference make sense if tz-naive timestamps are involved (even naive vs naive dubious, cf. pd.Timestamp("DST-Day 02:30") < pd.Timestamp("DST-Day 02:31")).

They are just time..and should always contain tz info natively,and so they could compare.

I know this is long closed, but if all of your Timestamps are in the same timezone, why are you squandering cycles on time zones at all? They're only relevant if you have different events in different time zones.

Was this page helpful?
0 / 5 - 0 ratings