Cudf: [BUG] Inconsistent null handling in cudf binary operations

Created on 14 Oct 2019 · 1 comment · Source: rapidsai/cudf

One of these things is not like the other.

>>> cudf.Series([None]) == cudf.Series([None])
0    False
dtype: bool

>>> cudf.Series([None]).eq(cudf.Series([None]))
0    False
dtype: bool

>>> cudf.Series([None]).eq(cudf.Series([None]), fill_value=None)
0    False
dtype: bool

>>> cudf.Series([None]).eq(cudf.Series([None]), fill_value=5.0)
0    null
dtype: bool

When a fill_value is given, the result is null. That is the opposite of what you might expect, especially since in the other cases null comparisons simply return False (which, ironically, feels like it should be None).

False is consistent with pandas if we treat np.nan as null, but inconsistent if we treat np.nan as an existing but "invalid" value.
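For reference, the False result matches IEEE 754 floating-point semantics, which pandas follows for np.nan: NaN compares unequal to every value, including itself. The plain-Python illustration below assumes a one-element list can stand in for the one-element Series, so it needs neither cuDF nor pandas.

```python
import math

# Element-wise comparison, mirroring Series([nan]) == Series([nan]).
left = [math.nan]
right = [math.nan]
result = [a == b for a, b in zip(left, right)]
print(result)  # [False]: IEEE 754 NaN is unequal to everything, itself included

# The complementary comparison is True for the same reason.
print(math.nan != math.nan)  # True
```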

The question is, which one is the bug?

In my opinion, we should give null comparisons null results.
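A minimal sketch of the proposed null-propagating semantics, in the spirit of SQL three-valued logic, where NULL = NULL evaluates to NULL rather than false. The helper name null_eq is hypothetical and is not a cuDF API; Python lists with None stand in for nullable columns.

```python
from typing import List, Optional, Sequence


def null_eq(
    left: Sequence[Optional[float]], right: Sequence[Optional[float]]
) -> List[Optional[bool]]:
    """Element-wise equality where a null (None) on either side yields null.

    Hypothetical helper illustrating SQL-style three-valued comparison.
    """
    if len(left) != len(right):
        raise ValueError("operands must have the same length")
    # Propagate null instead of collapsing it to False.
    return [None if a is None or b is None else a == b for a, b in zip(left, right)]


print(null_eq([None], [None]))                     # [None]
print(null_eq([1.0, None, 3.0], [1.0, 2.0, 4.0]))  # [True, None, False]
```

Under this rule, all three calls shown above would return null instead of mixing False and null.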

bug cuDF (Python)

Source

cwharris

All comments

For comparison, the equivalent pandas call gives:

>>> pd.Series([np.nan]).eq(pd.Series([np.nan]), fill_value=5.0)
0    False
dtype: bool

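According to the pandas documentation, when both corresponding values are missing, the result of filling stays missing, so this should return a missing value (None). The actual behavior differs. I have filed a bug with pandas to determine what the expected behavior should be.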
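A sketch of the fill_value rule as the pandas docs describe it. A missing value on one side is replaced with fill_value before comparing, but if both sides are missing the result stays missing. eq_with_fill is a hypothetical helper, not pandas' or cuDF's implementation. Lists with None stand in for Series, and nulls left over when no fill_value is given are assumed to propagate, as the issue proposes.

```python
from typing import List, Optional, Sequence


def eq_with_fill(
    left: Sequence[Optional[float]],
    right: Sequence[Optional[float]],
    fill_value: Optional[float] = None,
) -> List[Optional[bool]]:
    """Hypothetical element-wise equality with a pandas-style fill_value rule."""
    if len(left) != len(right):
        raise ValueError("operands must have the same length")
    out: List[Optional[bool]] = []
    for a, b in zip(left, right):
        if a is None and b is None:
            # Both sides missing: filling does not apply, result stays missing.
            out.append(None)
            continue
        if fill_value is not None:
            # At most one side is missing here: substitute fill_value for it.
            a = fill_value if a is None else a
            b = fill_value if b is None else b
        # Any null still present (no fill_value given) propagates to the result.
        out.append(None if a is None or b is None else a == b)
    return out


print(eq_with_fill([None], [None], fill_value=5.0))            # [None]
print(eq_with_fill([None, 1.0], [5.0, None], fill_value=5.0))  # [True, False]
print(eq_with_fill([None, 1.0], [5.0, None]))                  # [None, None]
```

Under this reading, the null that cuDF returns for fill_value=5.0 matches the documented rule, while the False from pandas does not.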