Pandas: BUG: to_timedelta drops decimals from input if precision is greater than nanoseconds

Created on 30 Sep 2020  ·  9 Comments  ·  Source: pandas-dev/pandas

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [ ] (optional) I have confirmed this bug exists on the master branch of pandas.


I am trying to convert times to timedeltas. The input data is in string format. When a value has more decimal places than nanosecond precision allows, all of its decimals are reset to zero.

import pandas as pd
d = {'Time':['8:53:08.26','8:53:08.71800000001', '8:53:09.729']}
df = pd.DataFrame(data=d)
pd.to_timedelta(df["Time"])

The above piece of code gives:

0   0 days 08:53:08.260000
1          0 days 08:53:08
2   0 days 08:53:09.729000
Name: Time, dtype: timedelta64[ns]

As can be seen, all the decimals of the second value are lost.

Problem description

With the present behavior, sorted input data yields an incorrect, unsorted result.

Expected Output

0   0 days 08:53:08.260000
1   0 days 08:53:08.718000
2   0 days 08:53:09.729000
Name: Time, dtype: timedelta64[ns]
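
Until the parser handles such input, one possible workaround is to trim the fractional seconds to nine digits before converting. This is a minimal sketch, assuming the strings always end in their fractional part; the helper name `trim_fraction` is hypothetical and not part of pandas.

```python
import re

# Hypothetical helper (not part of pandas): keep at most `digits`
# fractional-second digits so the string fits nanosecond resolution.
_FRACTION = re.compile(r"(\.\d+)$")


def trim_fraction(text, digits=9):
    """Truncate the trailing fractional part of a time string to `digits` digits."""
    # group(1) includes the leading dot, hence the `digits + 1` slice.
    return _FRACTION.sub(lambda m: m.group(1)[: digits + 1], text)


times = ['8:53:08.26', '8:53:08.71800000001', '8:53:09.729']
trimmed = [trim_fraction(t) for t in times]
print(trimmed)
# ['8:53:08.26', '8:53:08.718000000', '8:53:09.729']
```

The trimmed strings can then be passed to pd.to_timedelta. Note that this truncates rather than rounds, which is an assumption about the desired behavior.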

Output of pd.show_versions()

INSTALLED VERSIONS

commit : f2ca0a2665b2d169c97de87b8e778dbed86aea07
python : 3.8.5.final.0
python-bits : 64
OS : Windows
OS-release : 7
Version : 6.1.7601
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : es_ES.cp1252

pandas : 1.1.1
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.2
setuptools : 49.6.0.post20200925

Bug Timedelta

All 9 comments

I can't replicate the problem (see GitHub Gist https://gist.github.com/Tebinski/4900d3593bb0b9e9fd82ca983e68b200)

I have installed pandas version 1.1.1

Input

d ={'Time':['8:53:08.26','8:53:08.71800000001', '8:53:09.729']}
pd.DataFrame(data=d)

Output

                  Time
0           8:53:08.26
1  8:53:08.71800000001
2          8:53:09.729

Your example and your output don't match. Your output seems to be a Series while your code would produce a DataFrame. Furthermore, the dtype when running this code is object, not timedelta. Could you please look over your example?

I think the issue is the following:

In [1]: import pandas as pd

In [2]: d ={'Time':['8:53:08.26','8:53:08.71800000001', '8:53:09.729']}

In [3]: df = pd.DataFrame(data=d)

In [4]: pd.to_timedelta(df["Time"])
Out[4]:
0   08:53:08.260000
1          08:53:08
2   08:53:09.729000
Name: Time, dtype: timedelta64[ns]

Thanks @AlexS12, that is the complete example I was trying! I named the issue after the function I was using... and I left it out of the example.

Thx, could you please edit your initial post for clarity?

Looking at https://pandas.pydata.org/pandas-docs/stable/user_guide/timedeltas.html#timedelta-limitations, this is not a bug. It is a limitation of the Timedelta implementation.
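
For context, the limitation described in the linked documentation is that a Timedelta is stored as a 64-bit integer count of nanoseconds. The sketch below uses only the standard library to illustrate what that implies for range and precision; the variable names are illustrative, not pandas API.

```python
from datetime import timedelta

# From the linked docs: a Timedelta is a signed 64-bit count of
# nanoseconds, so one nanosecond is the smallest representable step.
MAX_NS = 2**63 - 1

# datetime.timedelta only has microsecond resolution, so drop the last
# three digits to get an approximate upper bound in days.
approx_max = timedelta(microseconds=MAX_NS // 1000)
print(approx_max.days)  # 106751

# The fraction in '8:53:08.71800000001' has 11 digits, two more than
# the 9 digits a nanosecond counter can store exactly.
fraction = "71800000001"
extra_digits = len(fraction) - 9
print(extra_digits)  # 2
```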

Thanks @phofl. I see that the documentation specifies the resolution and gives an example of the range. However, what's the reason for returning 8:53:08 instead of 8:53:08.718 for index one? I think that's the closest result within the Timedelta resolution. Otherwise, this function becomes less useful (harder to use) when dealing with decimal seconds, which normally suffer from floating point rounding error.

Moreover, to_datetime produces the expected result despite being constrained by the same limitation: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
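
To illustrate the floating point concern raised above, here is a small sketch of how ordinary arithmetic on decimal seconds can yield a string with far more than nine fractional digits. The time string built here is only an illustration, not data from this issue.

```python
# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
seconds = 0.1 + 0.2

# Formatting the raw float exposes the rounding error as extra digits.
text = f"0:00:{seconds!r}"
print(text)  # 0:00:0.30000000000000004

# Count the digits after the decimal point.
fraction_digits = len(text.rsplit(".", 1)[1])
print(fraction_digits > 9)  # True
```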

Thanks @AlexS12, the to_datetime result is what I would expect from to_timedelta.

import pandas as pd
d = {'Time':['8:53:08.26','8:53:08.71800000001', '8:53:09.729']}
df = pd.DataFrame(data=d)
pd.to_datetime(df["Time"])

Output

0   2020-10-01 08:53:08.260
1   2020-10-01 08:53:08.718
2   2020-10-01 08:53:09.729
Name: Time, dtype: datetime64[ns]

Sorry, I had not looked closely enough. I linked a PR to fix this.
