cuDF: [BUG] Comparison-based binaryops return nulls instead of True/False when comparing with nulls

Created on 9 Jul 2019 · 6 Comments · Source: rapidsai/cudf

In [4]: print(cudf.Series([1,2,3,4,5]) == cudf.Series([None, None, None, None, None]))                                                                        
0     
1     
2     
3     
4     
dtype: bool

In [5]: pd.Series([1,2,3,4,5]) == pd.Series([None, None, None, None, None])                                                                                   
Out[5]: 
0    False
1    False
2    False
3    False
4    False
dtype: bool

bug cuDF (Python)

kkraus14

All 6 comments

@harrism We can easily handle this at the Python side if you think it's appropriate to do so.

kkraus14 on 9 Jul 2019

One more example.

>>> pd.Series([None, None, None, None, None]) == pd.Series([None, None, None, None, None])
0    False
1    False
2    False
3    False
4    False
>>> pd.Series([None, None, None, None, None]) != pd.Series([None, None, None, None, None])
0    True
1    True
2    True
3    True
4    True

Basically, the result should always be True for __ne__ and always False for __eq__ and every other comparison operator, regardless of what None is compared against.
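
The rule described above can be sketched in plain Python. This is only an illustration: `compare_with_nulls` is a hypothetical helper, not part of cuDF or pandas, and it assumes nulls are represented as None in ordinary lists.

```python
import operator


def compare_with_nulls(op, left, right):
    """Element-wise comparison where None marks a null.

    Hypothetical helper for illustration only; not a cuDF or pandas API.
    If either operand is null, the result is True for != and False for
    every other comparison operator.
    """
    null_result = op is operator.ne  # True only when the operator is !=
    return [
        null_result if a is None or b is None else op(a, b)
        for a, b in zip(left, right)
    ]


values = [1, 2, 3, 4, 5]
nulls = [None] * 5

print(compare_with_nulls(operator.eq, values, nulls))  # [False, False, False, False, False]
print(compare_with_nulls(operator.ne, nulls, nulls))   # [True, True, True, True, True]
print(compare_with_nulls(operator.lt, values, [2, None, 3, None, 9]))
# [True, False, False, False, True]
```

Non-null pairs fall through to the ordinary operator, so only the rows that involve a null are affected.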

daxiongshu on 9 Jul 2019

@kkraus14 is this urgent for 0.9? Assigning @devavret but waiting to hear about urgency before putting it on the 0.9 board.

harrism on 22 Jul 2019

This is one of the places where SQL and pandas differ.

SELECT foo = NULL FROM bar

will return one NULL for each row in bar. I personally would prefer to see this fixed on the Python side. We could also add separate operations or separate options for NULL handling in binary ops.
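
For reference, the SQL behavior can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch: the table contents are made up for illustration, and the last step only shows one possible way a Python-side layer could map NULL results to False. It is not how cuDF handles it.

```python
import sqlite3

# In-memory database with a table named bar and a column named foo,
# matching the query above. The inserted values are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo INTEGER)")
conn.executemany("INSERT INTO bar (foo) VALUES (?)", [(1,), (2,), (None,)])

# Comparing against NULL with = yields NULL for every row;
# sqlite3 surfaces SQL NULL as Python None.
rows = conn.execute("SELECT foo = NULL FROM bar").fetchall()
print(rows)  # [(None,), (None,), (None,)]

# Assumed Python-side post-processing: treat NULL results as False,
# which matches the pandas output for == shown earlier in the thread.
filled = [False if value is None else bool(value) for (value,) in rows]
print(filled)  # [False, False, False]

conn.close()
```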