Cudf: [BUG] Can't pass StringMethods object to string methods

Created on 14 Jun 2019 · 4 Comments · Source: rapidsai/cudf

Not too pressing, but cuDF should either allow users to pass a .str StringMethods object to .str methods, or raise an informative error. This works in pandas, whereas cuDF fails silently and returns an empty Series.

import cudf
import pandas as pd

df = cudf.DataFrame({'a':['s','t','r'], 'b':['a','b','c']})
pdf = df.to_pandas()

print(df.a.str.cat(df.b))
print(df.a.str.cat(df.b.str))

print(pdf.a.str.cat(pdf.b))
print(pdf.a.str.cat(pdf.b.str))
0    sa
1    tb
2    rc
dtype: object
<empty Series of dtype=float64>
0    sa
1    tb
2    rc
Name: a, dtype: object
0    sa
1    tb
2    rc
Name: a, dtype: object
bug cuDF (Python)

All 4 comments

I was looking into this issue and observed that pandas concatenates only the largest string in the series, whereas cuDF, after the required changes, concatenates every string in the series.

Which is the more appropriate behaviour?

import pandas as pd
import cudf

arr = ["AbC", "de", "FGHI", "j", "kLLLm"]

ps = pd.Series(arr)
expect = ps.str.cat(others=ps.str)
print(expect)

gs = cudf.Series(arr)
got = gs.str.cat(others=gs.str)
print(got)


Out[16]:
0           NaN
1           NaN
2           NaN
3           NaN
4    kLLLmkLLLm
dtype: object

Out[18]:
0        AbCAbC
1          dede
2      FGHIFGHI
3            jj
4    kLLLmkLLLm
dtype: object

Hmm, that behavior discrepancy feels like a bug in pandas to me. If we decide to support passing StringMethods objects, we should probably not silently concatenate only the largest string.

I've submitted an issue in the pandas-dev GitHub repo:
https://github.com/pandas-dev/pandas/issues/28277

Thanks @AK-ayush. Given @TomAugspurger's comment in the pandas thread linked above, I'm inclined to change this issue later today to be about raising an exception here, if there are no additional opinions. @kkraus14, let me know if you think we should still support this usage pattern.
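
For illustration, here is a rough sketch of what "raise an exception" could look like. It is plain Python and does not use cuDF or pandas: `ToyStringMethods` is a hypothetical stand-in for a `.str` accessor, and both the error message and the choice of `TypeError` are assumptions, not what either library actually does.

```python
class ToyStringMethods:
    """Hypothetical stand-in for a `.str` accessor (not cuDF's real class)."""

    def __init__(self, values):
        self._values = list(values)

    def cat(self, others=None, sep=""):
        # Reject another accessor up front instead of silently
        # returning an empty or partially filled result.
        if isinstance(others, ToyStringMethods):
            raise TypeError(
                "others must be a Series or list-like of strings, not a "
                "StringMethods object; pass the Series itself "
                "(e.g. df.b instead of df.b.str)"
            )
        if others is None:
            # No others given: join the caller's own strings.
            return sep.join(self._values)
        others = list(others)
        if len(others) != len(self._values):
            raise ValueError("others must have the same length as the caller")
        # Element-wise concatenation of the two sequences.
        return [left + sep + right for left, right in zip(self._values, others)]


a = ToyStringMethods(["s", "t", "r"])
b = ToyStringMethods(["a", "b", "c"])

print(a.cat(["a", "b", "c"]))  # ['sa', 'tb', 'rc']

try:
    a.cat(b)  # passing the accessor rather than the data
except TypeError as exc:
    print(f"TypeError: {exc}")
```

Checking the argument type before doing any work keeps the failure loud and tells the user to pass `df.b` rather than `df.b.str`.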
