Describe the bug
Hi guys, cudf.Series.isna() does not recognize np.nan as a null value.
Steps/Code to reproduce bug
In [5]: s = cudf.DataFrame({'a':[12,3.0,4]})
...: s['a'][1] = np.nan
...: s.a
Out[5]:
0 12.0
1 NaN
2 4.0
Name: a, dtype: float64
In [6]: s.a.isna()
Out[6]:
0 False
1 False
2 False
Name: a, dtype: bool
Expected behavior
In [5]: s = cudf.DataFrame({'a':[12,3.0,4]})
...: s['a'][1] = np.nan
...: s.a
Out[5]:
0 12.0
1 NaN
2 4.0
Name: a, dtype: float64
In [6]: s.a.isna()
Out[6]:
0 False
1 True
2 False
Name: a, dtype: bool
Environment overview (please complete the following information)
In [6]: import cudf
In [7]: cudf.__version__
Out[7]: '0.16.0a+1961.g84557eac52'
@MikeChenfu Thanks for reporting. In general, we've been using null rather than NaN to indicate NA values, and operations like fillna, dropna, etc. handle nulls accordingly.
As a workaround, does setting s['a'][1] = None solve your issue?
cc @galipremsagar do we have a function to convert NaN --> null for a Series?
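To illustrate the null-versus-NaN distinction described above, here is a minimal pure-Python sketch. It is not cuDF's implementation; it only assumes that a nullable column keeps its values and a separate validity mask, and that isna() consults the mask alone. The `MaskedColumn` class is a hypothetical name used for illustration.

```python
import math


class MaskedColumn:
    """Hypothetical stand-in for a nullable float column (values + validity mask)."""

    def __init__(self, data):
        # None marks a null entry: store a placeholder value and record validity separately.
        self.values = [0.0 if v is None else float(v) for v in data]
        self.valid = [v is not None for v in data]

    def isna(self):
        # Only the validity mask is consulted, so a NaN stored as a value is not reported.
        return [not ok for ok in self.valid]


with_nan = MaskedColumn([12, math.nan, 4])
with_null = MaskedColumn([12, None, 4])

print(with_nan.isna())   # [False, False, False]
print(with_null.isna())  # [False, True, False]
```

Under this assumption, NaN is just another floating-point value, which matches the output reported in the issue.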
Thanks @kkraus14 for the quick reply. In my case, I use apply_grouped to implement a rolling window, and None does not seem to be supported inside the kernel function.
> cc @galipremsagar do we have a function to convert NaN --> null for a Series?
Yes, we have nans_to_nulls:
>>> import cudf
>>> import numpy as np
>>> s = cudf.DataFrame({'a':[12,3.0,4]})
>>> s['a'][1] = np.nan
>>> s.a.isna()
0 False
1 False
2 False
Name: a, dtype: bool
>>> s.to_pandas().a.isna()
0 False
1 True
2 False
Name: a, dtype: bool
>>> s.a.nans_to_nulls()
0 12.0
1 <NA>
2 4.0
Name: a, dtype: float64
>>> s.a.nans_to_nulls().isna()
0 False
1 True
2 False
Name: a, dtype: bool
@MikeChenfu Can you try this as a work-around?
Thanks @galipremsagar @kkraus14. It works. Appreciate it!
Going to leave this open as something we should probably just transparently handle underneath isna().
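As a rough sketch of what "transparently handling" NaN could mean, the pure-Python helper below reports an entry as NA when it is either null in the validity mask or a NaN value. This is an assumption about the intended behavior, not cuDF code; `isna_with_nans` and its `values`/`valid` parameters are hypothetical names.

```python
import math


def isna_with_nans(values, valid):
    """Hypothetical helper: an entry is NA if it is null (invalid) or holds NaN."""
    return [(not ok) or math.isnan(v) for v, ok in zip(values, valid)]


# A NaN stored as a regular (valid) value is now reported as NA.
nan_result = isna_with_nans([12.0, math.nan, 4.0], [True, True, True])

# A null entry (placeholder value, validity False) is still reported as NA.
null_result = isna_with_nans([12.0, 0.0, 4.0], [True, False, True])

print(nan_result)   # [False, True, False]
print(null_result)  # [False, True, False]
```

Both cases produce the result shown under "Expected behavior", which is also what pandas returns for the same data.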