Describe the bug
The cudf::strings::starts_with and cudf::strings::ends_with APIs need to accept an empty string ("") as the target. They currently raise a RuntimeError. The following code sample reproduces the exception.
Steps/Code to reproduce bug
>>> import cudf
>>> s = cudf.Series(["", '', "abc", " \t"])
>>> s.str.startswith("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/conda/envs/cudf/lib/python3.7/site-packages/cudf/core/column/string.py", line 1505, in startswith
    cpp_startswith(self._column, Scalar(pat, "str")), **kwargs
  File "cudf/_libxx/strings/find.pyx", line 70, in cudf._libxx.strings.find.startswith
RuntimeError: cuDF failure at: /cudf/cpp/src/strings/find.cu:174: Parameter target must not be empty.
>>> s.to_pandas().str.startswith("")
0 True
1 True
2 True
3 True
dtype: bool
>>> s.str.endswith("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/conda/envs/cudf/lib/python3.7/site-packages/cudf/core/column/string.py", line 1477, in endswith
    cpp_endswith(self._column, Scalar(pat, "str")), **kwargs
  File "cudf/_libxx/strings/find.pyx", line 51, in cudf._libxx.strings.find.endswith
RuntimeError: cuDF failure at: /cudf/cpp/src/strings/find.cu:174: Parameter target must not be empty.
>>> s.to_pandas().str.endswith("")
0 True
1 True
2 True
3 True
dtype: bool
>>> s = cudf.Series(["abc", "1213753765", " ", ""])
>>> s.to_pandas().str.find("")
0 0
1 0
2 0
3 0
dtype: int64
>>> s.str.find("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/conda/envs/cudf/lib/python3.7/site-packages/cudf/core/column/string.py", line 1509, in find
    cpp_find(self._column, Scalar(sub, "str"), start, end), **kwargs
  File "cudf/_libxx/strings/find.pyx", line 94, in cudf._libxx.strings.find.find
RuntimeError: cuDF failure at: /cudf/cpp/src/strings/find.cu:61: Parameter target must not be empty.
Additionally, I would like to know whether we can support None as input. Here is the pandas behavior for comparison, followed by the current cuDF behavior:
>>> s.to_pandas().str.endswith(None)
0 NaN
1 NaN
2 NaN
3 NaN
dtype: float64
>>> s.str.endswith(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/conda/envs/cudf/lib/python3.7/site-packages/cudf/core/column/string.py", line 1477, in endswith
    cpp_endswith(self._column, Scalar(pat, "str")), **kwargs
  File "cudf/_libxx/strings/find.pyx", line 51, in cudf._libxx.strings.find.endswith
RuntimeError: cuDF failure at: /cudf/cpp/src/strings/find.cu:174: Parameter target must not be empty.
Expected behavior
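cuDF should align with pandas here: with an empty-string target, startswith and endswith return True for every row and find returns 0, instead of raising a RuntimeError.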
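As a rough reference for the pandas results shown in the session above, the expected values can be modeled in plain Python. This is only a sketch: the function names are hypothetical, nulls are represented as None rather than NaN, and the None-target case mirrors only the endswith output reported in this issue (other pandas versions may behave differently).

```python
def expected_startswith(values, target):
    # Every string starts with "", so an empty target yields True for each row.
    return [None if v is None else v.startswith(target) for v in values]


def expected_endswith(values, target):
    # Assumption taken from the session above: a None target gives a null per row.
    if target is None:
        return [None] * len(values)
    return [None if v is None else v.endswith(target) for v in values]


def expected_find(values, target):
    # str.find("") is 0 for any string, matching the pandas output above.
    return [None if v is None else v.find(target) for v in values]


if __name__ == "__main__":
    print(expected_startswith(["", "", "abc", " \t"], ""))
    print(expected_endswith(["", "", "abc", " \t"], ""))
    print(expected_find(["abc", "1213753765", " ", ""], ""))
```

A model like this could serve as the comparison baseline in a test, independent of which pandas version is installed.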
Environment overview
Environment details
Output of the cudf/print_env.sh script here: env.txt
Additional context
This can be tested with the current changes in branch-0.13.
@galipremsagar In the short term, I believe startswith and endswith with an empty target should always return True, and we can special-case that on the Python side.
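A Python-side special case of this kind might look roughly like the sketch below. It does not use cuDF at all: `strict_startswith` is a hypothetical stand-in for a backend that rejects empty targets, and `startswith` is an illustrative wrapper, not the actual code in cudf/core/column/string.py. Null handling is left out for brevity.

```python
def strict_startswith(values, target):
    # Hypothetical stand-in for a backend that refuses an empty target.
    if target == "":
        raise RuntimeError("Parameter target must not be empty.")
    return [v.startswith(target) for v in values]


def startswith(values, target, backend=strict_startswith):
    # Stop-gap: every string starts with "", so answer True for each row
    # without calling the backend at all.
    if target == "":
        return [True] * len(values)
    return backend(values, target)


if __name__ == "__main__":
    print(startswith(["", "", "abc", " \t"], ""))
    print(startswith(["abc", "xbc"], "a"))
```

Once the backend accepts empty targets itself, the early return becomes redundant and can be deleted without changing results.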
@galipremsagar I believe all of this is fixed now. Note, however, that none of these libcudf functions will return float columns, and only target=None/invalid should raise a runtime error now.
@davidwendt Yes, this seems to be fixed on the libcudf side. However, I now have to remove the Python stop-gaps that were added earlier to handle these scenarios, so I am assigning this to myself.