Cudf: [BUG] RuntimeError in `cudf::strings::starts_with`, `cudf::strings::ends_with` and `cudf::strings::find` when `target=''`

Created on 13 Mar 2020 · 3 Comments · Source: rapidsai/cudf

Describe the bug
The cudf::strings::starts_with, cudf::strings::ends_with and cudf::strings::find APIs need to accept an empty string ("") as the target. They currently raise a RuntimeError. The following code samples show the exception alongside the pandas result for the same input.

Steps/Code to reproduce bug

>>> import cudf
>>> s = cudf.Series(["", '', "abc", " \t"])
>>> s.str.startswith("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/conda/envs/cudf/lib/python3.7/site-packages/cudf/core/column/string.py", line 1505, in startswith
    cpp_startswith(self._column, Scalar(pat, "str")), **kwargs
  File "cudf/_libxx/strings/find.pyx", line 70, in cudf._libxx.strings.find.startswith
RuntimeError: cuDF failure at: /cudf/cpp/src/strings/find.cu:174: Parameter target must not be empty.
>>> s.to_pandas().str.startswith("")
0    True
1    True
2    True
3    True
dtype: bool



>>> s.str.endswith("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/conda/envs/cudf/lib/python3.7/site-packages/cudf/core/column/string.py", line 1477, in endswith
    cpp_endswith(self._column, Scalar(pat, "str")), **kwargs
  File "cudf/_libxx/strings/find.pyx", line 51, in cudf._libxx.strings.find.endswith
RuntimeError: cuDF failure at: /cudf/cpp/src/strings/find.cu:174: Parameter target must not be empty.
>>> s.to_pandas().str.endswith("")
0    True
1    True
2    True
3    True
dtype: bool




>>> s = cudf.Series(["abc", "1213753765", "       ", ""])
>>> s.to_pandas().str.find("")
0    0
1    0
2    0
3    0
dtype: int64
>>> s.str.find("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/conda/envs/cudf/lib/python3.7/site-packages/cudf/core/column/string.py", line 1509, in find
    cpp_find(self._column, Scalar(sub, "str"), start, end), **kwargs
  File "cudf/_libxx/strings/find.pyx", line 94, in cudf._libxx.strings.find.find
RuntimeError: cuDF failure at: /cudf/cpp/src/strings/find.cu:61: Parameter target must not be empty.

Additionally, I would like to know whether we can support None as input. Here is the pandas behavior for comparison:

>>> s.to_pandas().str.endswith(None)
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64
>>> s.str.endswith(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/conda/envs/cudf/lib/python3.7/site-packages/cudf/core/column/string.py", line 1477, in endswith
    cpp_endswith(self._column, Scalar(pat, "str")), **kwargs
  File "cudf/_libxx/strings/find.pyx", line 51, in cudf._libxx.strings.find.endswith
RuntimeError: cuDF failure at: /cudf/cpp/src/strings/find.cu:174: Parameter target must not be empty.

Expected behavior
cuDF should align with the pandas behavior shown above: an empty target matches every string, and find returns 0.
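As a quick reference for the expected results, Python's built-in str methods already treat an empty target this way, and the pandas output in the reproduction steps agrees with them. The sketch below uses only built-ins (no cuDF or pandas), so it illustrates the target semantics rather than any cuDF code path; the sample strings are taken from the reproduction steps.

```python
# Reference semantics for an empty target, using Python's built-in str methods.
# This does not call cuDF; it only shows the results cuDF is expected to match.
samples = ["", "abc", " \t", "1213753765"]

# Every string starts and ends with the empty string.
starts = [s.startswith("") for s in samples]
ends = [s.endswith("") for s in samples]

# The empty string is found at position 0 of every string, including "".
finds = [s.find("") for s in samples]

print(starts)
print(ends)
print(finds)
```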

Environment overview (please complete the following information)

  • Environment location: Docker
  • Method of cuDF install: from source

Environment details
Output of the cudf/print_env.sh script here: env.txt

Additional context
This can be tested with current changes in branch-0.13

bug cuDF (Python) libcudf strings

All 3 comments

@galipremsagar in the short term I believe those should always return True and we can special case on the Python side.
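A minimal sketch of what such a Python-side special case might look like is given below. It is not the actual cuDF implementation: `strict_startswith` is a hypothetical stand-in for a backend that rejects empty targets, `startswith_with_empty_guard` is an illustrative name, and plain lists stand in for string columns. Leaving null rows as None is also an assumption made here for illustration.

```python
from typing import Callable, List, Optional


def strict_startswith(values: List[Optional[str]], target: str) -> List[Optional[bool]]:
    """Hypothetical stand-in for a backend that rejects an empty target."""
    if target == "":
        raise RuntimeError("Parameter target must not be empty.")
    return [None if v is None else v.startswith(target) for v in values]


def startswith_with_empty_guard(
    values: List[Optional[str]],
    target: str,
    backend: Callable[[List[Optional[str]], str], List[Optional[bool]]] = strict_startswith,
) -> List[Optional[bool]]:
    """Return True for every non-null row when target is empty; otherwise defer to the backend."""
    if target == "":
        # Assumption: null rows stay null instead of becoming True.
        return [None if v is None else True for v in values]
    return backend(values, target)


print(startswith_with_empty_guard(["", "abc", " \t", None], ""))
print(startswith_with_empty_guard(["abc", "xbc"], "a"))
```

The guard sits in front of the backend call, so it can be removed without touching callers once the backend accepts empty targets itself.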

@galipremsagar I believe all of this is fixed now. But none of these libcudf functions will be returning float columns. And only target=None/invalid should return a runtime error now.

@davidwendt Yes, this seems to be fixed on the libcudf side. However, I'll now have to remove the Python stop-gaps previously added to handle these scenarios, so I'm assigning this to myself.
