Cudf: [BUG] slice libcudf incorrect results for some scenarios

Created on 4 Mar 2020 · 5 Comments · Source: rapidsai/cudf

Describe the bug
The new cudf::strings::slice_strings API seems to be returning incorrect results. The combinations in which I see issues are listed below.

Steps/Code to reproduce bug

>>> import cudf
>>> s = cudf.Series(['abc', 'xyz', 'a', 'ab', '123', '097'])
>>> s.str.slice(start=1)
0    bc
1    yz
2     a
3     b
4    23
5    97
dtype: object
>>> s.to_pandas().str.slice(start=1)
0    bc
1    yz
2      
3     b
4    23
5    97
dtype: object

>>> s.str.slice(start=3)
0    c
1    z
2    a
3    b
4    3
5    7
dtype: object
>>> s.to_pandas().str.slice(start=3)
0    
1    
2    
3    
4    
5    
dtype: object
>>> import cudf
>>> s = cudf.Series(['koala', 'fox', 'chameleon'])
>>> s.str.slice(start=10)
0    a
1    x
2    n
dtype: object
>>> s.to_pandas().str.slice(start=10)
0    
1    
2    
dtype: object
>>> import cudf
>>> s = cudf.Series(['abc', 'xyz', 'a', 'ab', '123', '097'])
>>> s.str.slice(start=1, stop=10)
0    bc
1    yz
2     a
3     b
4    23
5    97
dtype: object
>>> s.to_pandas().str.slice(start=1, stop=10)
0    bc
1    yz
2      
3     b
4    23
5    97
dtype: object



>>> s.str.slice(start=10, stop=19)
0    c
1    z
2    a
3    b
4    3
5    7
dtype: object
>>> s.to_pandas().str.slice(start=10, stop=19)
0    
1    
2    
3    
4    
5    
dtype: object
>>> import cudf
>>> s = cudf.Series(['abcdefghij', '0123456789', '9876543210', None, 'accénted', ''])
>>> s.str.slice(start=10, stop=19, step=9)
0       j
1       9
2       0
3    None
4       d
5        
dtype: object
>>> s.to_pandas().str.slice(start=10, stop=19, step=9)
0        
1        
2        
3    None
4        
5        
dtype: object
>>> 

Expected behavior
Behavior should match the pandas output.
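
As a rough reference for the expected results, pandas `str.slice(start, stop, step)` follows ordinary Python slice semantics, so the expected column can be reproduced without a GPU. The helper name `expected_slice` below is hypothetical and only serves as an illustration; it is not part of cuDF or pandas.

```python
def expected_slice(values, start=None, stop=None, step=None):
    """Reference result using plain Python slicing; null entries stay None.

    Hypothetical helper for illustration only (not a cuDF or pandas API).
    """
    return [None if v is None else v[start:stop:step] for v in values]


data = ['abc', 'xyz', 'a', 'ab', '123', '097']

# A start at or past the end of a string yields an empty string.
print(expected_slice(data, start=1))
print(expected_slice(data, start=3))
print(expected_slice(['koala', 'fox', 'chameleon'], start=10))
```

Comparing this list against `s.str.slice(...).to_pandas().tolist()` for the same arguments would flag each of the mismatches reported above.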

Environment overview (please complete the following information)

  • Environment location: Docker
  • Method of cuDF install: from source

Environment details
Output of the cudf/print_env.sh script here:
env.txt

Additional context

  • [x] Needs a fix on the libcudf strings side.
  • [x] Once the libcudf API is fixed, please let me know so that I can re-enable the disabled Python plumbing and tests.
    Note: Since .str.get internally uses cudf::strings::slice_strings, its Python side also needs to be re-enabled.
Labels: bug, cuDF (Python), libcudf, strings

All 5 comments

Fortunately these appear to be all the same problem.
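
One way to see that the reports share a cause: in every mismatching row of the transcripts above, `start` is at or past the end of a non-empty string, and the reported output is that string's last character. The sketch below is only a model of the observed output for the `start`-only cases; the function name is hypothetical and this is not the libcudf implementation.

```python
def observed_buggy_slice(s, start):
    """Model of the output reported in this issue (start-only cases).

    Hypothetical illustration, NOT libcudf code: when start is at or past
    the end of a non-empty string, the last character comes back instead
    of an empty string.
    """
    if s is None:
        return None
    if s and start >= len(s):
        return s[-1]
    return s[start:]


data = ['abc', 'xyz', 'a', 'ab', '123', '097']
for start in (1, 3):
    for s in data:
        got, want = observed_buggy_slice(s, start), s[start:]
        if got != want:
            print(f"start={start} {s!r}: got {got!r}, expected {want!r}")
```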

You can retry this now that #4324 is merged.

cc @galipremsagar 😄

This works now; I enabled the code in #4407.

Resolved by #4324
