cuDF: [BUG] Serializing StringColumns slows down when an offset is encountered

Created on 22 Apr 2020 · 7 Comments · Source: rapidsai/cudf

Describe the bug

When serializing a StringColumn with an offset, I am noticing that a fair bit of time is spent adjusting for the offset. In particular, this line shows up during profiling, though the lines after it are likely also expensive.

https://github.com/rapidsai/cudf/blob/65bbee7920182f709a269b8f0aba4b2434f4beb3/python/cudf/cudf/core/column/string.py#L1906

Digging a bit deeper, it appears a fair bit of time is spent here, where a device-to-host transfer occurs. A host-to-device transfer is then likely needed for later operations. Avoiding these transfers would likely help, but there may be other ways to pursue this.

https://github.com/rapidsai/cudf/blob/65bbee7920182f709a269b8f0aba4b2434f4beb3/python/cudf/cudf/core/column/column.py#L417

Steps/Code to reproduce bug

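import cudf

# Build a small string column that includes a null entry.
s = cudf.Series(["abc", "def", None, "ghi"])
# Slicing gives the underlying StringColumn a non-zero offset.
s = s[1:]

# Serializing the sliced column triggers the offset adjustment.
s.serialize()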
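
To see the slowdown rather than just trigger it, the reproducer can be wrapped in a small timing harness. The sketch below is only an illustration: `best_time` is a hypothetical helper (not part of cuDF), the column size of 400,000 rows is an arbitrary assumption, and wall-clock timing of GPU work is only a rough measure. The cuDF part runs only when cuDF can be imported.

```python
import time


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time (in seconds) over `repeats` calls to fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


try:
    import cudf  # needs a CUDA-capable GPU and a cuDF install
except ImportError:
    cudf = None

if cudf is not None:
    # Assumed size; pick whatever is large enough to show a difference.
    full = cudf.Series(["abc", "def", None, "ghi"] * 100_000)
    sliced = full[1:]  # same data, but the column now carries an offset

    print("unsliced serialize:", best_time(full.serialize))
    print("sliced serialize:  ", best_time(sliced.serialize))
```

If the sliced case is consistently slower than the unsliced one, that is in line with the offset adjustment described above being the extra cost.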

Environment overview

Environment location: DGX-1 or DGX-2
Method of cuDF install: Conda

Environment details
Standard cuDF install with Conda.

Additional context
NA

cc @VibhuJawa @rgsl888prabhu @kkraus14

bug cuDF (Python) libcudf

jakirkham

All 7 comments

FYI just to give some context here, this is only expensive for sending sliced columns where we need to grab the scalar value of the offset based on the slice, and run a binaryop against a device scalar constructed from that scalar.
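
As a rough illustration of what "adjusting for the offset" involves, here is a host-only sketch of a strings column stored as a chars buffer plus an offsets array. It is a conceptual model, not cuDF's implementation: `rebase_sliced_strings` is a hypothetical name, the null mask is left out, and everything lives in ordinary Python objects. In cuDF the buffers are on the device, which is why reading the first offset as a scalar means a device-to-host copy.

```python
def rebase_sliced_strings(chars, offsets, start, stop):
    """Return (chars, offsets) for rows [start, stop) as a standalone column.

    chars   -- bytes holding all string data back to back
    offsets -- len(rows) + 1 integers; row i spans chars[offsets[i]:offsets[i + 1]]
    """
    first = offsets[start]  # the scalar that has to be fetched for a sliced column
    last = offsets[stop]    # the end of the sliced character data
    # The "binaryop": subtract the first offset so the new offsets start at 0.
    new_offsets = [o - first for o in offsets[start:stop + 1]]
    new_chars = chars[first:last]
    return new_chars, new_offsets


# Same data as the reproducer: "abc", "def", <null>, "ghi".
chars = b"abcdefghi"
offsets = [0, 3, 6, 6, 9]  # the null row has zero length

new_chars, new_offsets = rebase_sliced_strings(chars, offsets, 1, 4)
print(new_chars, new_offsets)  # b'defghi' [0, 3, 3, 6]
```

The subtraction is cheap. In this model, the cost discussed in the thread comes from needing `first` and `last` as host values when the offsets live on the GPU.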

I believe there's ongoing work to allow building a device scalar without having to do a D --> H copy or synchronizing the stream here: #4900

kkraus14 on 23 Apr 2020

Awesome! Thanks for sharing Keith 😄

jakirkham on 23 Apr 2020

@kkraus14 but you would still need to do a D->H copy at the end, while creating the char column, to get the size of the char column and to offset its start.

rgsl888prabhu on 23 Apr 2020

Quoting kkraus14: "I believe there's ongoing work to allow building a device scalar without having to do a D --> H copy or synchronizing the stream here: #4900"

In get_element (#4900), a D --> H copy happens for string_view and dictionary columns.
For fixed_width scalars, there is no D --> H copy.
In all cases, there is a kernel launch (to copy both data and validity).

karthikeyann on 24 Apr 2020

These are always integer columns, so that kernel launch should be a lot cheaper than the D --> H --> D round trip that we currently have 😄.

Getting the size of the char column is still going to require a D --> H though.

kkraus14 on 24 Apr 2020

Tested locally on an internal benchmark with the latest cudf build plus PR #4900: the run time drops from 22.42 s to 11.03 s, roughly a 2x speedup.

rgsl888prabhu on 1 May 2020

Closing this as we have merged #5072

rgsl888prabhu on 8 May 2020
