Describe the bug
When performing a string binop such as ==, cudf leaves the null rows as null, while pandas returns True or False based on the comparison.
Steps/Code to reproduce bug
import cudf
ser = cudf.Series(['a',None,'a',None])
ser == 'a'
0 True
1 null
2 True
3 null
dtype: bool
Expected behavior
import pandas as pd
ser = pd.Series(['a',None,'a',None])
ser == 'a'
0 True
1 False
2 True
3 False
dtype: bool
Additional context
This will likely go away once string ops are ported to the new API.
A simple workaround is to fill the nulls with True or False based on the condition.
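The workaround can be sketched as below. This uses pandas' nullable string dtype purely for illustration, since it reproduces the null-propagating comparison result that cudf currently returns; the same fillna pattern applies to a cudf Series.

```python
import pandas as pd

# pandas' nullable "string" dtype mirrors the cudf behavior reported above:
# comparing against null yields <NA> rather than False.
ser = pd.Series(['a', None, 'a', None], dtype='string')

mask = ser == 'a'            # boolean dtype: [True, <NA>, True, <NA>]
filled = mask.fillna(False)  # fill nulls to match pandas object-dtype semantics
```

For != the fill value would be True instead of False, since in pandas a null value is considered unequal to everything.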
Note that Spark expects the current behavior of returning null for a binary op on a null input. If libcudf is changed to implement the pandas behavior directly, then the null-input behavior needs to be controllable by the caller.
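For contrast, the Spark/SQL-style three-valued semantics mentioned here can be sketched with a plain Python function (illustrative only, not cudf or Spark code):

```python
# Sketch of SQL-style null propagation: if either operand is null,
# the comparison result is null, never True or False.
def null_propagating_eq(lhs, rhs):
    if lhs is None or rhs is None:
        return None
    return lhs == rhs

results = [null_propagating_eq(x, 'a') for x in ['a', None, 'a', None]]
# [True, None, True, None]
```

This is exactly the behavior cudf currently exhibits, which is why the issue proposes filling nulls on the Python side rather than changing libcudf.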
@jlowe this would be handled on the Python side. It was decided on the C++ side nulls will always propagate through binary ops.
Was this fixed by #4503 ?
This hasn't been fixed by #4503. Tested against cudf at 9827c095e0eb2eec5f863ca83a3f84a2dee0b5e6.
Thanks @ayushdg!
@ayushdg @kkraus14 If I am not wrong, we only need to handle the EQUAL, NOT EQUAL, LESS THAN, LESS THAN OR EQUAL, GREATER THAN, and GREATER THAN OR EQUAL scenarios.
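The six comparison scenarios listed above could be handled with one shared fill step. A hypothetical sketch (names and the helper are illustrative, not cudf internals; pandas' nullable string dtype stands in for a cudf column):

```python
import operator
import pandas as pd

# The six comparison ops that would need pandas-compatible null handling.
COMPARISONS = {
    'eq': operator.eq, 'ne': operator.ne,
    'lt': operator.lt, 'le': operator.le,
    'gt': operator.gt, 'ge': operator.ge,
}

def compare_with_null_fill(ser, op_name, other):
    # Null rows come back as <NA> from the comparison; fill them to match
    # pandas object-dtype semantics, where null != anything, so every op
    # except NOT EQUAL yields False on a null input.
    mask = COMPARISONS[op_name](ser, other)
    return mask.fillna(op_name == 'ne').astype(bool)

ser = pd.Series(['a', None, 'b'], dtype='string')
```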