Cudf: [BUG] isna() doesn't recognize np.nan

Created on 14 Oct 2020 · 5 Comments · Source: rapidsai/cudf

Describe the bug
Hi guys, cudf.Series.isna() does not recognize np.nan as a null value.

Steps/Code to reproduce bug

In [5]: s = cudf.DataFrame({'a':[12,3.0,4]}) 
   ...: s['a'][1] = np.nan 
   ...: s.a                                                                                  
Out[5]: 
0    12.0
1     NaN
2     4.0
Name: a, dtype: float64

In [6]: s.a.isna()                                                                           
Out[6]: 
0    False
1    False
2    False
Name: a, dtype: bool

Expected behavior

In [5]: s = cudf.DataFrame({'a':[12,3.0,4]}) 
   ...: s['a'][1] = np.nan 
   ...: s.a                                                                                  
Out[5]: 
0    12.0
1     NaN
2     4.0
Name: a, dtype: float64

In [6]: s.a.isna()                                                                           
Out[6]: 
0    False
1    True
2    False
Name: a, dtype: bool

Environment overview (please complete the following information)

  • Environment location: Docker
  • Method of cuDF install: conda
In [6]: import cudf

In [7]: cudf.__version__
Out[7]: '0.16.0a+1961.g84557eac52'

bug cuDF (Python)

All 5 comments

@MikeChenfu Thanks for reporting. In general, we've been using null rather than NaN to indicate NA values, and operations like fillna, dropna, etc. handle null accordingly.

As a workaround, if you do s['a'][1] = None, does that solve your issue?

cc @galipremsagar do we have a function to convert NaN --> null for a Series?
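
To illustrate the null-versus-NaN distinction described above, here is a minimal pure-Python sketch. It is a toy model, not cuDF's implementation: the class name MaskedFloatColumn and its layout are assumptions made for illustration. The idea is that a null is recorded in a separate validity mask, while NaN is an ordinary floating-point value, so a mask-only isna() reports NaN as not missing.

```python
import math


class MaskedFloatColumn:
    """Toy model of a nullable float column: values plus a validity mask.

    Hypothetical class for illustration only; it is not part of cuDF.
    """

    def __init__(self, values):
        # None marks a null slot; NaN stays an ordinary (valid) float value.
        self.valid = [v is not None for v in values]
        # Null slots get a 0.0 placeholder that is never read as data.
        self.values = [0.0 if v is None else float(v) for v in values]

    def isna(self):
        # Mask-only check: NaN values are still reported as "not missing".
        return [not ok for ok in self.valid]

    def nans_to_nulls(self):
        # Return a new column in which every valid NaN becomes a null.
        out = MaskedFloatColumn([])
        out.values = list(self.values)
        out.valid = [
            ok and not math.isnan(v) for ok, v in zip(self.valid, self.values)
        ]
        return out


col = MaskedFloatColumn([12.0, float("nan"), 4.0])
print(col.isna())                  # [False, False, False]
print(col.nans_to_nulls().isna())  # [False, True, False]
```

In this model, assigning None produces a null directly, whereas NaN only becomes a null after an explicit conversion step.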

Thanks @kkraus14 for the quick reply. In my case, I use apply_grouped to implement a rolling window. None does not seem to be supported in the kernel function.

cc @galipremsagar do we have a function to convert NaN --> null for a Series?

Yes, we have nans_to_nulls:

>>> import cudf
>>> import numpy as np
>>> s = cudf.DataFrame({'a':[12,3.0,4]}) 
>>> s['a'][1] = np.nan 
>>> s.a.isna()  
0    False
1    False
2    False
Name: a, dtype: bool
>>> s.to_pandas().a.isna()  
0    False
1     True
2    False
Name: a, dtype: bool
>>> s.a.nans_to_nulls()
0    12.0
1    <NA>
2     4.0
Name: a, dtype: float64
>>> s.a.nans_to_nulls().isna()
0    False
1     True
2    False
Name: a, dtype: bool

@MikeChenfu Can you try this as a work-around?

Thanks @galipremsagar @kkraus14. It works. Appreciate it!

Going to leave this open as something we should probably just transparently handle underneath isna().
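
Until isna() handles this itself, the workaround from this thread can be wrapped in a small helper. This is only a sketch: the name isna_including_nan is hypothetical and not a cuDF API. It assumes the argument is a cudf.Series, or any object exposing nans_to_nulls() and isna() as used in the session above.

```python
def isna_including_nan(series):
    """Treat both nulls and NaN values as missing.

    Hypothetical helper, not part of cuDF. `series` is assumed to be a
    cudf.Series (or any object with nans_to_nulls() and isna() methods).
    """
    # Convert NaN values to nulls first, then run the usual null check.
    return series.nans_to_nulls().isna()
```

With the DataFrame from the report, isna_including_nan(s.a) should match the output of s.a.nans_to_nulls().isna() shown earlier.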
