[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pandas.
[ ] (optional) I have confirmed this bug exists on the master branch of pandas.
I am trying to convert a time to a timedelta. The input data is in string format. When a value has too many decimal places, all of its decimals are reset to zero.
import pandas as pd
d = {'Time':['8:53:08.26','8:53:08.71800000001', '8:53:09.729']}
df = pd.DataFrame(data=d)
pd.to_timedelta(df["Time"])
The above piece of code gives:
0 0 days 08:53:08.260000
1 0 days 08:53:08
2 0 days 08:53:09.729000
Name: Time, dtype: timedelta64[ns]
As can be seen, all the decimals of the second value are lost.
With the present behavior, a sorted array of input data yields a wrong, unsorted result. Expected output:
0 0 days 08:53:08.260000
1 0 days 08:53:08.718000
2 0 days 08:53:09.729000
Name: Time, dtype: timedelta64[ns]
pd.show_versions()
commit : f2ca0a2665b2d169c97de87b8e778dbed86aea07
python : 3.8.5.final.0
python-bits : 64
OS : Windows
OS-release : 7
Version : 6.1.7601
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : es_ES.cp1252
pandas : 1.1.1
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.2
setuptools : 49.6.0.post20200925
I can't replicate the problem (see GitHub Gist https://gist.github.com/Tebinski/4900d3593bb0b9e9fd82ca983e68b200)
I have installed pandas version 1.1.1
Input
d ={'Time':['8:53:08.26','8:53:08.71800000001', '8:53:09.729']}
pd.DataFrame(data=d)
Output
Time
0 8:53:08.26
1 8:53:08.71800000001
2 8:53:09.729
Your example and your output don't match. Your output seems to be a Series while your code would produce a DataFrame. Furthermore, the dtype when running this code is object, not timedelta. Could you please look over your example?
I think the issue is the following:
In [1]: import pandas as pd
In [2]: d ={'Time':['8:53:08.26','8:53:08.71800000001', '8:53:09.729']}
In [3]: df = pd.DataFrame(data=d)
In [4]: pd.to_timedelta(df["Time"])
Out[4]:
0 08:53:08.260000
1 08:53:08
2 08:53:09.729000
Name: Time, dtype: timedelta64[ns]
Thanks @AlexS12 that is the complete example I was trying! I named the issue after the function I was using... and I left it out of the example.
Thx, could you please edit your initial post for clarity?
Looking into https://pandas.pydata.org/pandas-docs/stable/user_guide/timedeltas.html#timedelta-limitations, this is not a bug. It is a limitation of the Timedelta implementation.
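For reference, the limitation on that page follows from storing a duration as a signed 64-bit count of nanoseconds. The sketch below is plain Python arithmetic rather than a pandas call, and the variable names are only illustrative. It derives the largest representable duration and shows that the resolution is nine decimal digits of a second, which is fewer than the eleven digits in the second input value.

```python
# Assumption: durations are held as a signed 64-bit integer number of nanoseconds.
NS_PER_SECOND = 10**9
NS_PER_DAY = 86_400 * NS_PER_SECOND

max_ns = 2**63 - 1  # largest signed 64-bit integer

# Break the nanosecond count down into days, hours, minutes, seconds, nanoseconds.
days, rem = divmod(max_ns, NS_PER_DAY)
hours, rem = divmod(rem, 3_600 * NS_PER_SECOND)
minutes, rem = divmod(rem, 60 * NS_PER_SECOND)
seconds, nanos = divmod(rem, NS_PER_SECOND)

print(f"{days} days {hours:02d}:{minutes:02d}:{seconds:02d}.{nanos:09d}")

# The second input value carries 11 fractional digits; nanoseconds hold 9.
fraction = "8:53:08.71800000001".split(".")[1]
print(len(fraction), "fractional digits in the input")
```

The first line printed is 106751 days 23:47:16.854775807, roughly 292 years. Nothing in this arithmetic requires discarding the first nine fractional digits of an input that has more.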
Thanks @phofl. I see that the documentation specifies the resolution and gives an example of the range. However, what's the reason for returning 8:53:08 instead of 8:53:08.718 for index one? I think 8:53:08.718 is the closest result within the Timedelta resolution. Otherwise, this function becomes much harder to use with decimal seconds, which normally suffer from floating-point rounding error.
Moreover, to_datetime produces the expected result despite being constrained by the same limitation: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
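Until the parsing behavior changes, one possible caller-side workaround is to trim the fractional seconds to nine digits before conversion. This is a minimal sketch: `truncate_fraction` is a hypothetical helper, not part of pandas, and it assumes inputs shaped like `H:MM:SS.fff`. It truncates rather than rounds.

```python
import re


def truncate_fraction(value: str, digits: int = 9) -> str:
    """Keep at most `digits` fractional-second digits of an 'H:MM:SS.fff' string.

    Hypothetical helper for illustration only; it truncates, it does not round.
    """
    match = re.search(r"\.(\d+)$", value)
    if match is None:
        return value  # no fractional part, nothing to trim
    return value[: match.start()] + "." + match.group(1)[:digits]


times = ["8:53:08.26", "8:53:08.71800000001", "8:53:09.729"]
cleaned = [truncate_fraction(t) for t in times]
print(cleaned)
```

The cleaned strings can then be passed to pd.to_timedelta(cleaned). Whether this avoids the dropped decimals on pandas 1.1.1 has not been verified here; it only guarantees that no value exceeds nanosecond precision.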
Thanks @AlexS12, the to_datetime result is what I would expect from to_timedelta.
import pandas as pd
d = {'Time':['8:53:08.26','8:53:08.71800000001', '8:53:09.729']}
df = pd.DataFrame(data=d)
pd.to_datetime(df["Time"])
Output
0 2020-10-01 08:53:08.260
1 2020-10-01 08:53:08.718
2 2020-10-01 08:53:09.729
Name: Time, dtype: datetime64[ns]
Sorry, I did not look closely enough. I have linked a PR to fix this.