Pandas: ENH: Timedelta cannot span more than 293 years => implementation limitation

Created on 12 Aug 2020 · 5 Comments · Source: pandas-dev/pandas

The pandas datetime arithmetic module is very useful, and it can be used for archeological research as well. Unfortunately, the limitation that a Timedelta object cannot span more than about 292 years is a real shame for this wonderful data science library. With this limitation, it is not possible to go back more than 300 years in history, let alone thousands.

>>> pd.to_timedelta('1Y')*5
Timedelta('1826 days 05:06:00')
>>> pd.to_timedelta('1Y')*292
Timedelta('106650 days 19:26:24')
>>> pd.to_timedelta('1Y')*293
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/tslibs/timedeltas.pyx", line 1347, in pandas._libs.tslibs.timedeltas.Timedelta.__mul__
  File "pandas/_libs/tslibs/timedeltas.pyx", line 1230, in pandas._libs.tslibs.timedeltas.Timedelta.__new__
  File "pandas/_libs/tslibs/timedeltas.pyx", line 180, in pandas._libs.tslibs.timedeltas.convert_to_timedelta64
  File "pandas/_libs/tslibs/timedeltas.pyx", line 308, in pandas._libs.tslibs.timedeltas.cast_from_unit
OverflowError: Python int too large to convert to C long

In practice, time resolution and time span always trade off against each other. Quantum physics nowadays often deals with times on the scale of nanoseconds (10^-9 s), picoseconds (10^-12 s), or even femtoseconds (10^-15 s), while archeology often deals with spans of thousands, millions, or even billions of years. If I remember correctly, pandas uses nanoseconds as the base unit of its time counter, so the span is short. One solution that caters for both high resolution and large span is to use a floating-point number, rather than a large integer, as the time counter. Floating point will be slightly slower, but on modern Intel CPUs floating-point arithmetic is almost as fast as integer arithmetic, especially when SIMD is used.
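To put rough numbers on the floating-point idea, the sketch below assumes the counter stays in nanoseconds but is stored as a 64-bit float (one possible reading of the proposal, not anything pandas implements). It uses `math.ulp` to print the gap between neighbouring representable values at a few spans; the year length of 365.2425 days is an assumption taken from the `'1Y'` output above.

```python
import math

# Assumption: a float64 counter that still counts nanoseconds.
NS_PER_YEAR = 365.2425 * 86_400 * 1e9   # 365.2425-day year, in nanoseconds

# float64 has a 53-bit significand: every integer up to 2**53 is
# representable exactly; beyond that, gaps between values appear.
exact_limit_days = 2**53 / 1e9 / 86_400

spacing_ns = {}
for years in (1, 292, 10_000, 1_000_000):
    # math.ulp returns the distance to the next representable float64
    spacing_ns[years] = math.ulp(years * NS_PER_YEAR)
    print(f"{years:>9,} years: neighbouring values are {spacing_ns[years]:,.0f} ns apart")

print(f"every nanosecond is representable only up to about {exact_limit_days:.0f} days")
```

Under these assumptions the span is effectively unlimited, but nanosecond resolution is kept only for short durations, which is the same resolution-versus-range tradeoff in a different form.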

Enhancement ExtensionArray Timedelta

All 5 comments

Criticism is good, but the last sentence goes beyond fair criticism imho... So may I ask you in return, why you haven't thought about that earlier, considering how foreseeing you are?

I have removed criticism and proposed a solution. Please take a look!

Timedeltas are backed by an int64, which gives a reasonable tradeoff between resolution (ns) and time range. This is the same issue as with Timestamp ranges, where there have been many discussions (search the repo).
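For a sense of scale, here is a back-of-the-envelope sketch of how far a signed 64-bit counter reaches at different resolutions. The 365.2425-day year is an assumption chosen to match the `'1Y'` output in the report above; the variable names are only illustrative.

```python
INT64_MAX = 2**63 - 1            # largest signed 64-bit value
SECONDS_PER_YEAR = 31_556_952    # assumption: 365.2425 days * 86_400 s

# Ticks per second for each candidate resolution.
TICKS_PER_SECOND = {"ns": 10**9, "us": 10**6, "ms": 10**3, "s": 1}

# Whole years that fit in an int64 counter at each resolution.
max_years = {
    unit: INT64_MAX // (ticks * SECONDS_PER_YEAR)
    for unit, ticks in TICKS_PER_SECOND.items()
}

for unit, years in max_years.items():
    print(f"{unit:>2}: about {years:,} years")
```

At nanosecond resolution this gives 292 whole years, which is why multiplying `'1Y'` by 293 overflows.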

If someone wants to implement a timedelta64[ms] extension array (and/or [s]) then this limitation can be avoided. This is a non-trivial project and would require some effort.
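As a rough illustration of the idea, plain NumPy already accepts coarser units for `timedelta64`. The sketch below is NumPy only, not the pandas extension array described above, and the 10,000-year span and 365.2425-day year are assumptions picked for the example.

```python
import numpy as np

SECONDS_PER_YEAR = 31_556_952   # assumption: 365.2425 days * 86_400 s

# With a coarser unit, the int64 counter holds seconds (or milliseconds)
# instead of nanoseconds, so a 10,000-year span fits comfortably.
span_s = np.timedelta64(10_000 * SECONDS_PER_YEAR, "s")
span_ms = span_s.astype("timedelta64[ms]")

# Dividing two timedelta64 values yields a plain float ratio.
span_days = span_s / np.timedelta64(1, "D")

print(span_s)
print(span_ms)
print(span_days)
```

The cost is that anything finer than the chosen unit cannot be stored, so the unit has to be picked per use case.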

But to reiterate, I find this limitation for timedeltas really no big deal. What exactly is the use case?

> But to reiterate, I find this limitation for timedeltas really no big deal. What exactly is the use case?

I was doing automated carbon dating at an archeological site. For every item the machine collects, the computer estimates its age by measuring the Carbon-14 isotope, so the range can go from a few years to a few thousand years back from now ("now" is different for different runs). Most people in the field still use MS Excel for manual computation. Honestly, I do not mind that timedelta has a limited span, but a span of only 292 years is really too small for much scientific work.

IMO a long-term solution would be to move to a default 128-bit (or 96-bit) type, which could then have both very precise resolution and a long time span. This is likely even more challenging than what @jreback suggested. FWIW this is not unprecedented, as MATLAB has gone down this route.
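To give a feel for what the wider types would buy, here is a quick sketch that keeps nanosecond resolution and only changes the integer width. It assumes a signed counter and a 365.2425-day year, and relies on Python's arbitrary-precision integers rather than any real 96-bit or 128-bit type.

```python
NS_PER_YEAR = 31_556_952 * 10**9   # assumption: 365.2425-day year, in nanoseconds

# Whole years representable by a signed nanosecond counter of each width.
years_by_width = {
    bits: (2 ** (bits - 1) - 1) // NS_PER_YEAR
    for bits in (64, 96, 128)
}

for bits, years in years_by_width.items():
    print(f"{bits:>3} bits: about {years:.3e} years")
```

Even the 96-bit case covers spans far beyond geological time at nanosecond resolution, under these assumptions.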

> But to reiterate, I find this limitation for timedeltas really no big deal. What exactly is the use case?

Astronomy, archeology, and geology are three places where the natural limit will be regularly hit.
