Hello,
Series.interpolate() doesn't fill NaT values in a datetime Series.
See http://stackoverflow.com/questions/33921795/fill-timestamp-nat-with-a-linear-interpolation/33922824#33922824
Here is a minimal example that shows the problem:
import pandas as pd
s = pd.Series(pd.date_range('2015-01-01', '2015-01-30'), name='t')
# blank out two runs of three consecutive days
s[3], s[4], s[5] = pd.NaT, pd.NaT, pd.NaT
s[13], s[14], s[15] = pd.NaT, pd.NaT, pd.NaT
print(s)
0 2015-01-01
1 2015-01-02
2 2015-01-03
3 NaT
4 NaT
5 NaT
6 2015-01-07
7 2015-01-08
8 2015-01-09
9 2015-01-10
10 2015-01-11
11 2015-01-12
12 2015-01-13
13 NaT
14 NaT
15 NaT
16 2015-01-17
17 2015-01-18
18 2015-01-19
19 2015-01-20
20 2015-01-21
21 2015-01-22
22 2015-01-23
23 2015-01-24
24 2015-01-25
25 2015-01-26
26 2015-01-27
27 2015-01-28
28 2015-01-29
29 2015-01-30
Name: t, dtype: datetime64[ns]
print(s.interpolate())
0 2015-01-01
1 2015-01-02
2 2015-01-03
3 NaT
4 NaT
5 NaT
6 2015-01-07
7 2015-01-08
8 2015-01-09
9 2015-01-10
10 2015-01-11
11 2015-01-12
12 2015-01-13
13 NaT
14 NaT
15 NaT
16 2015-01-17
17 2015-01-18
18 2015-01-19
19 2015-01-20
20 2015-01-21
21 2015-01-22
22 2015-01-23
23 2015-01-24
24 2015-01-25
25 2015-01-26
26 2015-01-27
27 2015-01-28
28 2015-01-29
29 2015-01-30
Name: t, dtype: datetime64[ns]
assert s.interpolate().isnull().sum() == 0
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-150-8a59e397a174> in <module>()
----> 1 assert s.interpolate().isnull().sum() == 0
AssertionError:
Kind regards
This is not implemented at the moment for datetimes. Pull requests are welcome.
What is your opinion of the last solution proposed by CT Zhu on StackOverflow?
df.ix[df.t.isnull(), 't'] = pd.to_datetime(pd.to_numeric(df.t).interpolate())[df.t.isnull()]
Isn't there a way to support NaN with integers without converting to float (which leads to precision issues)?
Shouldn't we look at, for example,
np.int64(pd.NaT)
which is -9223372036854775808
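For what it's worth, here is a small sketch of why the sentinel on its own doesn't help: NaT is stored as the smallest int64, so interpolating the raw integers would treat that huge negative number as real data. The three-element series is made up for illustration, and the cast goes through NumPy because I'm not certain np.int64(pd.NaT) behaves the same in every pandas version.

```python
import numpy as np
import pandas as pd

# NaT shares its bit pattern with the smallest int64 (checked via NumPy here).
sentinel = np.datetime64("NaT", "ns").astype("int64")
print(sentinel == np.iinfo(np.int64).min)

# Illustrative series with one missing timestamp.
s = pd.Series(pd.to_datetime(["2015-01-01", None, "2015-01-03"]))
raw = s.to_numpy().astype("datetime64[ns]").astype("int64")

# The missing slot holds the sentinel, so it has to be masked out
# before any arithmetic such as interpolation.
print(raw[1] == sentinel)
```

So an integer-only route would still need an explicit mask for the missing positions; the float detour gets that mask for free through NaN, at the cost of float64 not being able to represent every nanosecond timestamp exactly.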
I have the impression that interpolating NaT is still not possible in v0.20.3.
Any updates on this issue?
This is not very hard to actually do directly (and what .interpolate() should basically do, PRs welcome)
In [12]: s2 = s.astype('i8').astype('f8')
In [13]: s2[s.isnull()] = np.nan
In [14]: pd.to_datetime(s2.interpolate())
Out[14]:
0 2015-01-01
1 2015-01-02
2 2015-01-03
3 2015-01-04
4 2015-01-05
5 2015-01-06
6 2015-01-07
7 2015-01-08
8 2015-01-09
9 2015-01-10
10 2015-01-11
11 2015-01-12
12 2015-01-13
13 2015-01-14
14 2015-01-15
15 2015-01-16
16 2015-01-17
17 2015-01-18
18 2015-01-19
19 2015-01-20
20 2015-01-21
21 2015-01-22
22 2015-01-23
23 2015-01-24
24 2015-01-25
25 2015-01-26
26 2015-01-27
27 2015-01-28
28 2015-01-29
29 2015-01-30
Name: t, dtype: datetime64[ns]
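In case it is useful to anyone landing here, the three steps above can be wrapped into a small function. This is only a sketch: interpolate_datetimes is a hypothetical helper name, not a pandas API, and it assumes a timezone-naive datetime64 Series. The integer cast goes through NumPy, which is my own choice and not part of the snippet above.

```python
import numpy as np
import pandas as pd


def interpolate_datetimes(s, **kwargs):
    """Fill NaT gaps in a datetime64 Series by interpolating epoch nanoseconds.

    Hypothetical helper, not a pandas API. Extra keyword arguments are
    forwarded to Series.interpolate.
    """
    missing = s.isna().to_numpy()
    # Epoch nanoseconds as float64; NaT positions become NaN.
    ns = s.to_numpy().astype("datetime64[ns]").astype("int64").astype("float64")
    ns[missing] = np.nan
    filled = pd.Series(ns, index=s.index, name=s.name).interpolate(**kwargs)
    # Floats are read back as nanoseconds since the epoch; NaN turns into NaT.
    return pd.to_datetime(filled)


# Same data as in the original report.
s = pd.Series(pd.date_range("2015-01-01", "2015-01-30"), name="t")
s.iloc[3:6] = pd.NaT
s.iloc[13:16] = pd.NaT

result = interpolate_datetimes(s)
print(result.isna().sum())
```

Because the values pass through float64, sub-microsecond detail may be lost for present-day timestamps; for daily or second-resolution data that should not matter.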
@rinoc did some work on this issue in https://github.com/rinoc/pandas/commit/e77e4c8566db68c0ec144f9aeb01dc5225c971d6
But no PR has been sent.
Any news?
Looks to be a duplicate of https://github.com/pandas-dev/pandas/issues/11312