In [1]: import pandas as pd
In [2]: pd.Timestamp("2016-01-01", tz="Europe/Berlin") - pd.Timestamp("now", tz="UTC")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-691a8df26ecd> in <module>()
----> 1 pd.Timestamp("2016-01-01", tz="Europe/Berlin") - pd.Timestamp("now", tz="UTC")
pandas\tslib.pyx in pandas.tslib._Timestamp.__sub__ (pandas\tslib.c:23697)()
TypeError: Timestamp subtraction must have the same timezones or no timezones
If both timestamps have a timezone specified, the result of this operation is perfectly well-defined. It's quite surprising that I have to riddle my code with lhs.tz_convert("UTC") - rhs.tz_convert("UTC") lines to get the difference of two timestamps, which then gives, e.g.:
Timedelta('-393 days +06:29:07.057926')
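For reference, a minimal sketch of the explicit workaround (a fixed right-hand instant is used here instead of "now" so the result is reproducible):

```python
import pandas as pd

lhs = pd.Timestamp("2016-01-01", tz="Europe/Berlin")  # same instant as 2015-12-31 23:00 UTC
rhs = pd.Timestamp("2016-06-15 12:00", tz="UTC")      # fixed instant instead of "now"

# Convert both sides to a common timezone before subtracting:
delta = lhs.tz_convert("UTC") - rhs.tz_convert("UTC")
print(delta)  # -167 days +11:00:00
```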
pd.show_versions()
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.1
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: 2.4.0
xlrd: None
xlwt: None
xlsxwriter: 0.9.6
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8.1
boto: None
pandas_datareader: None
We could certainly make this work.
Generally, though, if someone is subtracting across different timezones it's an error (they meant to convert first).
But maybe that's not typical.
@jreback should it be implemented for the addition operation too? It raises the same error when adding two timestamps with different timezones.
Also, which side of the operator should take preference here? To subtract, we ultimately need both in the same timezone, so should tz=Europe/Berlin be converted to UTC, or UTC to Europe/Berlin?
Addition of timestamps is not a meaningful operation.
You will have to handle both __sub__ and __rsub__.
Both operands are converted to UTC before subtraction.
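This convert-to-UTC-then-subtract rule could be sketched as follows (a hypothetical helper for illustration, not the actual tslib implementation; `subtract_aware` is an invented name):

```python
import pandas as pd

def subtract_aware(lhs: pd.Timestamp, rhs: pd.Timestamp) -> pd.Timedelta:
    """Hypothetical helper: normalize both tz-aware operands to UTC, then subtract."""
    if (lhs.tz is None) != (rhs.tz is None):
        # Mixing aware and naive stays an error -- there is no well-defined answer.
        raise TypeError("Timestamp subtraction must have the same timezones or no timezones")
    if lhs.tz is not None:
        lhs, rhs = lhs.tz_convert("UTC"), rhs.tz_convert("UTC")
    return lhs - rhs

# Midnight Berlin on 2016-01-01 is the same instant as 23:00 UTC the day before:
print(subtract_aware(pd.Timestamp("2016-01-01", tz="Europe/Berlin"),
                     pd.Timestamp("2015-12-31 23:00", tz="UTC")))  # 0 days 00:00:00
```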
I am actually now a bit -0 on this, as it is a touch too magical. This should be an explicit operation. In reality it doesn't come up that much, so making it a user-explicit operation should not be much of a burden.
But will leave the issue open for discussion / implementation.
Closing this; we are moving towards things being very explicit.
I really don't see any magic here. Every non-naive Timestamp specifies a unique point on a common time-axis, completely independent of the timezone being used, so their difference is unambiguously defined as the distance on that time-axis.
Contrary to what you said before, subtracting timestamps with different timezones *is* a valid action that can happen in particular in library code, e.g. if I provide a function time_to_some_event(ts), where the event is given by some backend system's value (i.e. machine-to-machine communication, so usually UTC) while ts comes from user code, so it will probably have a non-UTC timezone attached.
If you want to be explicit, what you should not allow is subtraction of naive timestamps, as that gives you essentially random values if the naive timestamps don't happen to be UTC.
we raise for all comparisons between differing tz's (whether tz is UTC or naive or another zone).
This is complicated by the fact that we generally turn strings into naive timestamps, xref #18435.
Subtracting across different timezones would be valid, but it is just plain confusing; forcing folks to put things in the same timezone before subtracting is not very burdensome and is much more explicit.
we raise for all comparisons between differing tz's (whether tz is UTC or naive or another zone).
That is not really true, for comparisons we allow this:
In [13]: pd.Timestamp('2016-01-01', tz='Europe/Brussels') > pd.Timestamp('2017-01-01', tz='UTC')
Out[13]: False
In [14]: pd.DatetimeIndex([pd.Timestamp('2016-01-01', tz='Europe/Brussels')]) > pd.Timestamp('2017-01-01', tz='UTC')
Out[14]: array([False], dtype=bool)
In [15]: pd.Series([pd.Timestamp('2016-01-01', tz='Europe/Brussels')]) > pd.Timestamp('2017-01-01', tz='UTC')
Out[15]:
0 False
dtype: bool
In [16]: pd.Series([pd.Timestamp('2016-01-01', tz='Europe/Brussels')]) > pd.Series([pd.Timestamp('2017-01-01', tz='UTC')])
Out[16]:
0 False
dtype: bool
see my comment in the linked issue
these should all raise (on the list)
these should all raise
I would say that is up for debate. We currently allow it, and I don't think there is anything ambiguous about what the result should be. Why breaking backwards compatibility to start erroring on this?
Also, the actual issue in the linked comment is about tz-naive vs tz-aware timestamps. Of course neither comparison nor difference makes sense if tz-naive timestamps are involved (even naive vs naive is dubious, cf. pd.Timestamp("DST-Day 02:30") < pd.Timestamp("DST-Day 02:31")).
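To make that DST ambiguity concrete (an illustrative sketch, picking Europe/Brussels, where the wall clock 2016-10-30 02:30 occurs twice because DST ends at 03:00 CEST; `tz_localize`'s `ambiguous` flag selects which occurrence is meant):

```python
import pandas as pd

# This naive wall-clock time exists twice in Brussels on DST-end day:
naive = pd.Timestamp("2016-10-30 02:30")

first = naive.tz_localize("Europe/Brussels", ambiguous=True)    # DST occurrence, UTC+02:00
second = naive.tz_localize("Europe/Brussels", ambiguous=False)  # standard time, UTC+01:00

# Identical naive wall clock, yet one hour apart on the real time-axis:
print(second - first)  # 0 days 01:00:00
```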
They are just points in time and should always carry tz info natively, so they could be compared.
I know this is long closed, but if all of your Timestamps are in the same timezone, why are you squandering cycles on time zones at all? They're only relevant if you have different events in different time zones.