Cudf: [BUG] `read_orc_metadata` docstring example is incorrect

Created on 5 Dec 2020 · 6 Comments · Source: rapidsai/cudf

Describe the bug
The example code in the read_orc_metadata docstring is incorrect: it runs without error but silently returns unexpected results.
https://github.com/rapidsai/cudf/blob/598a14d820d47b7c3bfcb2bb3341b97a85317646/python/cudf/cudf/utils/ioutils.py#L259

Steps/Code to reproduce bug

import cudf
cudf.DataFrame({'a':[1,2,3]*10000000}).to_orc('test.orc')

path = 'test.orc'
num_rows, stripes, names = cudf.io.read_orc_metadata(path)
stripes
6

cudf.read_orc(path, stripe=1)
    a
0   1
1   2
2   3
3   1
4   2
... ...
29999995    2
29999996    3
29999997    1
29999998    2
29999999    3
30000000 rows × 1 columns

df = [cudf.read_orc(path, stripe=i) for i in range(stripes)]
df = cudf.concat(df)
df
    a
0   1
1   2
2   3
3   1
4   2
... ...
29999995    2
29999996    3
29999997    1
29999998    2
29999999    3
180000000 rows × 1 columns

Expected behavior

df = [cudf.read_orc(path, stripe=i) for i in range(stripes)]
df = cudf.concat(df)
df
    a
0   1
1   2
2   3
3   1
4   2
... ...
29999995    2
29999996    3
29999997    1
29999998    2
29999999    3
30000000 rows × 1 columns

Environment overview (please complete the following information)

  • Environment location: Docker
  • Method of cuDF install: conda
  • Environment details: cudf 0.17.0a201202 cuda_10.1_py37_ga2d2726f7f_370 rapidsai-nightly
Labels: bug, cuDF (Python), cuIO, doc

All 6 comments

Hi @MikeChenfu, it looks like you misspelled the parameter name: it's actually stripes. Please retry with the correct name.

Hi @vuule, I checked the I/O documentation in RAPIDS. Here is the original code. Let me know if I'm missing something. Thanks.

df = [cudf.read_orc(fname, stripe=i) for i in range(stripes)]

@MikeChenfu can you link to the doc you're referring to? The API docs indicate that the kwarg is stripes as well: https://docs.rapids.ai/api/cudf/nightly/api.html#cudf.io.orc.read_orc

@kkraus14 Thanks for the link.
The code above is from the example for cudf.io.orc.read_orc_metadata: https://docs.rapids.ai/api/cudf/nightly/api.html#cudf.io.read_orc_metadata

Thanks! I'm going to update this issue to fix the example in that docstring.

Thanks @kkraus14 @vuule. stripes is working.
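
The row counts reported above (180000000 instead of 30000000, with 6 stripes) are consistent with the misspelled keyword stripe being accepted and ignored, so that every call read the whole file. The following is a minimal pure-Python sketch of that failure mode, not cuDF code. The function read_file_sketch, its fixed stripe size, and the assumption that unknown keyword arguments are swallowed by **kwargs are all hypothetical; passing stripes as a list is likewise an assumption about the API shape that may differ between cuDF versions.

```python
def read_file_sketch(rows, stripes=None, **kwargs):
    """Hypothetical stand-in for a reader that swallows unknown kwargs."""
    stripe_size = 2  # rows per stripe; an illustrative value only
    if stripes is None:
        # No stripe selection was recognized, so the whole file is returned.
        return list(rows)
    selected = []
    for s in stripes:
        selected.extend(rows[s * stripe_size:(s + 1) * stripe_size])
    return selected


rows = [1, 2, 3, 1, 2, 3]   # a tiny stand-in for the ORC file contents
num_stripes = 3             # 6 rows / 2 rows per stripe

# Misspelled keyword: 'stripe' lands in **kwargs and is ignored,
# so each iteration returns all rows.
wrong = [r for i in range(num_stripes)
         for r in read_file_sketch(rows, stripe=i)]

# Correct keyword: each iteration returns only the requested stripe.
right = [r for i in range(num_stripes)
         for r in read_file_sketch(rows, stripes=[i])]

print(len(wrong))  # num_stripes times the file length
print(len(right))  # equal to the file length
```

In this sketch the misspelled call yields num_stripes copies of the data, mirroring the 6x row count in the report, while the correctly spelled call reassembles the original rows exactly once.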
