Describe the bug
The new cudf::strings::slice_strings API seems to be returning incorrect results. I have listed the combinations in which I see issues.
Steps/Code to reproduce bug
>>> import cudf
>>> s = cudf.Series(['abc', 'xyz', 'a', 'ab', '123', '097'])
>>> s.str.slice(start=1)
0 bc
1 yz
2 a
3 b
4 23
5 97
dtype: object
>>> s.to_pandas().str.slice(start=1)
0 bc
1 yz
2
3 b
4 23
5 97
dtype: object
>>> s.str.slice(start=3)
0 c
1 z
2 a
3 b
4 3
5 7
dtype: object
>>> s.to_pandas().str.slice(start=3)
0
1
2
3
4
5
dtype: object
>>> import cudf
>>> s = cudf.Series(['koala', 'fox', 'chameleon'])
>>> s.str.slice(start=10)
0 a
1 x
2 n
dtype: object
>>> s.to_pandas().str.slice(start=10)
0
1
2
dtype: object
>>> import cudf
>>> s = cudf.Series(['abc', 'xyz', 'a', 'ab', '123', '097'])
>>> s.str.slice(start=1, stop=10)
0 bc
1 yz
2 a
3 b
4 23
5 97
dtype: object
>>> s.to_pandas().str.slice(start=1, stop=10)
0 bc
1 yz
2
3 b
4 23
5 97
dtype: object
>>> s.str.slice(start=10, stop=19)
0 c
1 z
2 a
3 b
4 3
5 7
dtype: object
>>> s.to_pandas().str.slice(start=10, stop=19)
0
1
2
3
4
5
dtype: object
>>> import cudf
>>> s = cudf.Series(['abcdefghij', '0123456789', '9876543210', None, 'accénted', ''])
>>> s.str.slice(start=10, stop=19, step=9)
0 j
1 9
2 0
3 None
4 d
5
dtype: object
>>> s.to_pandas().str.slice(start=10, stop=19, step=9)
0
1
2
3 None
4
5
dtype: object
Expected behavior
Behavior should be in line with the pandas output.
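For reference, the pandas results above match what plain Python slicing gives for each element: a start position at or beyond the end of a string yields an empty string, and nulls pass through. Below is a minimal sketch of that expectation, assuming per-element Python slice semantics. `expected_slice` is a hypothetical helper written for this illustration and is not part of cudf or pandas.

```python
def expected_slice(values, start=None, stop=None, step=None):
    """Hypothetical reference helper: slice each string the way Python does.

    None entries are passed through unchanged, mirroring the null row in
    the pandas output above.
    """
    window = slice(start, stop, step)
    return [None if v is None else v[window] for v in values]


short = ['abc', 'xyz', 'a', 'ab', '123', '097']
animals = ['koala', 'fox', 'chameleon']
mixed = ['abcdefghij', '0123456789', '9876543210', None, '']

# start inside some strings, past the end of others
print(expected_slice(short, start=1))

# start at or past the end of every string: all results are empty
print(expected_slice(short, start=3))
print(expected_slice(animals, start=10))

# out-of-range start with stop and step: still empty, nulls preserved
print(expected_slice(mixed, start=10, stop=19, step=9))
```

Comparing these lists with the cudf output in the reproduction steps shows the pattern: cudf returns the last character wherever the expected result is an empty string.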
Environment overview (please complete the following information)
Environment details
Output of the cudf/print_env.sh script here:
env.txt
Additional context
.str.get internally uses cudf::strings::slice_strings, so its Python side also needs to be enabled.
Fortunately these all appear to be the same problem.
You can retry this now that #4324 is merged.
cc @galipremsagar 😄
This works now; I enabled the code in #4407.
Resolved by #4324