Cudf: [BUG] String binop doesn't update nulls

Created on 27 Feb 2020 · 6 Comments · Source: rapidsai/cudf

Describe the bug
When performing a string binop such as ==, cudf leaves the null rows as null, while pandas returns True/False for those rows based on the condition.

Steps/Code to reproduce bug

import cudf
ser = cudf.Series(['a',None,'a',None])
ser == 'a'

0    True
1    null
2    True
3    null
dtype: bool

Expected behavior

import pandas as pd
ser = pd.Series(['a',None,'a',None])
ser == 'a'

0     True
1    False
2     True
3    False
dtype: bool

Environment overview (please complete the following information)

Environment location: Docker
Method of cuDF install: Dev-nightly
- Cudf @ 4460093650932fbf476dfd16194c51b07755325e

Additional context
This may go away once string ops are ported to the new API.

A simple workaround is to fill the nulls in the result with True or False, depending on the condition.
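
A minimal sketch of that workaround follows. It assumes that pandas' nullable "string" dtype is an acceptable stand-in for a cudf string Series (it also propagates nulls through comparisons, so it runs without a GPU), and that the same `fillna` call applies to the cudf result. `compare_fill_nulls` is a hypothetical helper, not part of cudf or pandas, and the fill rule (True only for !=) is my reading of the object-dtype pandas behavior shown above.

```python
import operator

import pandas as pd


def compare_fill_nulls(ser, other, op):
    """Hypothetical helper: compare, then replace null results with a boolean.

    Assumption: object-dtype pandas treats a null row as True for != and
    as False for every other comparison.
    """
    result = op(ser, other)
    return result.fillna(op is operator.ne)


# Stand-in for cudf.Series(['a', None, 'a', None]).
ser = pd.Series(["a", None, "a", None], dtype="string")

raw = ser == "a"  # nulls propagate into rows 1 and 3
eq_filled = compare_fill_nulls(ser, "a", operator.eq)
ne_filled = compare_fill_nulls(ser, "a", operator.ne)

print(raw)
print(eq_filled)
print(ne_filled)
```

Choosing the fill value per operator keeps the raw null-propagating result available for callers that expect it, and applies the pandas-style filling only at the Python layer.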

bug cuDF (Python)

ayushdg

All 6 comments

Note that Spark expects the current behavior of returning null for a binary op on a null input. If libcudf is changed to implement the pandas behavior directly, then the null input behavior needs to be controllable by the caller.

jlowe on 27 Feb 2020

@jlowe this would be handled on the Python side. It was decided that on the C++ side nulls will always propagate through binary ops.

kkraus14 on 27 Feb 2020

Was this fixed by #4503?

kkraus14 on 18 Mar 2020

This hasn't been fixed by #4503. Tested against cudf at 9827c095e0eb2eec5f863ca83a3f84a2dee0b5e6.

ayushdg on 18 Mar 2020

Thanks @ayushdg!

kkraus14 on 18 Mar 2020

@ayushdg @kkraus14 If I am not wrong, we only need to handle the EQUAL, NOT EQUAL, LESS THAN, LESS THAN OR EQUAL, GREATER THAN, and GREATER THAN OR EQUAL cases.