>>> df = cudf.read_orc('data/000000_0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/conda/lib/python3.7/site-packages/cudf-0.9.0a0+1073.g73e0849-py3.7-linux-x86_64.egg/cudf/io/orc.py", line 45, in read_orc
    filepath_or_buffer, columns, stripe, skip_rows, num_rows, use_index
  File "cudf/bindings/orc.pyx", line 25, in cudf.bindings.orc.cpp_read_orc
  File "cudf/bindings/orc.pyx", line 86, in cudf.bindings.orc.cpp_read_orc
  File "/conda/lib/python3.7/site-packages/cudf-0.9.0a0+1073.g73e0849-py3.7-linux-x86_64.egg/cudf/dataframe/column.py", line 130, in from_mem_views
    return columnops.build_column(data_buf, data_mem.dtype, mask=mask)
  File "/conda/lib/python3.7/site-packages/cudf-0.9.0a0+1073.g73e0849-py3.7-linux-x86_64.egg/cudf/dataframe/columnops.py", line 244, in build_column
    data=buffer, dtype=dtype, mask=mask, name=name
  File "/conda/lib/python3.7/site-packages/cudf-0.9.0a0+1073.g73e0849-py3.7-linux-x86_64.egg/cudf/dataframe/numerical.py", line 42, in __init__
    super(NumericalColumn, self).__init__(**kwargs)
  File "/conda/lib/python3.7/site-packages/cudf-0.9.0a0+1073.g73e0849-py3.7-linux-x86_64.egg/cudf/dataframe/columnops.py", line 36, in __init__
    super(TypedColumnBase, self).__init__(**kwargs)
  File "/conda/lib/python3.7/site-packages/cudf-0.9.0a0+1073.g73e0849-py3.7-linux-x86_64.egg/cudf/dataframe/column.py", line 158, in __init__
    self._update_null_count(null_count)
  File "/conda/lib/python3.7/site-packages/cudf-0.9.0a0+1073.g73e0849-py3.7-linux-x86_64.egg/cudf/dataframe/column.py", line 177, in _update_null_count
    nnz = count_nonzero_mask(self._mask.mem, size=len(self))
  File "cudf/bindings/cudf_cpp.pyx", line 489, in cudf.bindings.cudf_cpp.count_nonzero_mask
  File "cudf/bindings/cudf_cpp.pyx", line 499, in cudf.bindings.cudf_cpp.count_nonzero_mask
RuntimeError: CUDA error encountered at: /cudf/cpp/src/bitmask/legacy/bitmask_ops.cu:147: 9 cudaErrorInvalidConfiguration invalid configuration argument
>>> import pyarrow.orc as orc
>>> with open('data/000000_0', 'rb') as file:
...     data = orc.ORCFile(file)
...     df = data.read().to_pandas()
...
>>> df
      _col0  _col1
0         0      1
1  65975644     12
2   1251275     13
3  69856449     11
@mlahir1: can we use this file as part of automated regression testing? The failure looks like it may be due to the file being written with snappy compression but not containing any compressed data blocks (the file is too small to gain from compression, so the data blocks are actually stored uncompressed).
This is a bit odd: it looks like the decoding went fine, but when we get to the point of initializing the column and counting the nonzero bits in the mask (which, by the way, is entirely redundant since parquet/orc already initialize the null_count field), cudaGetLastError() returns 9. A dummy call to cudaGetLastError() clears the error state, and the output is then correct. I will dig a bit deeper; it could be related to the use of __launch_bounds__ in ORC.
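For anyone following along, here is a toy Python model of why an error can surface far away from the call that caused it. It is only a sketch of the "last error" slot that cudaGetLastError() reads and resets; `ToyRuntime`, `record_failure` and `successful_call` are hypothetical names made up for illustration and are not part of CUDA or cudf.

```python
CUDA_SUCCESS = 0
CUDA_ERROR_INVALID_CONFIGURATION = 9  # the code reported in the traceback


class ToyRuntime:
    """Hypothetical stand-in for the runtime's 'last error' slot."""

    def __init__(self):
        self._last_error = CUDA_SUCCESS

    def record_failure(self, code):
        # A failing call stores its error code in the slot.
        self._last_error = code

    def successful_call(self):
        # A later successful call does not clear the stored error.
        pass

    def get_last_error(self):
        # Mirrors cudaGetLastError(): return the stored code, then reset it.
        err, self._last_error = self._last_error, CUDA_SUCCESS
        return err


rt = ToyRuntime()
rt.record_failure(CUDA_ERROR_INVALID_CONFIGURATION)  # something fails early on
rt.successful_call()                                 # unrelated work succeeds

first = rt.get_last_error()   # the stale error shows up at the first check
second = rt.get_last_error()  # the slot was reset by the previous read
print(first, second)
```

In this model the first check reports the stale code and the second reports success, which matches the observation that a dummy cudaGetLastError() call makes the problem disappear without fixing its cause.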
Ah, yup, that was indeed due to the number of compressed blocks being zero: we launch a kernel with a grid size of 0x0, which is harmless but does result in cudaGetLastError() returning 9.
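One common way to avoid this class of problem is to skip the launch entirely when there is nothing to process. The sketch below shows that guard in plain Python. It is an illustration under assumptions, not the change that went into cudf: `blocks_needed`, `launch_guarded` and the `launch` callback are hypothetical names.

```python
def blocks_needed(num_items, threads_per_block):
    """Ceiling division: number of thread blocks needed to cover num_items."""
    return (num_items + threads_per_block - 1) // threads_per_block


def launch_guarded(num_items, threads_per_block, launch):
    """Call launch(grid, block) only when the grid would be non-empty.

    `launch` is a hypothetical stand-in for a real kernel launch.
    Returns True if the launch happened, False if it was skipped.
    """
    grid = blocks_needed(num_items, threads_per_block)
    if grid == 0:
        # A zero-sized grid is an invalid launch configuration, so skip it.
        return False
    launch(grid, threads_per_block)
    return True


calls = []
record = lambda grid, block: calls.append((grid, block))

launched_empty = launch_guarded(0, 128, record)    # zero compressed blocks
launched_some = launch_guarded(300, 128, record)   # 300 items need 3 blocks
print(launched_empty, launched_some, calls)
```

With zero items the guard returns early and no launch is recorded, so no error code is left behind for a later cudaGetLastError() check to pick up.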
@OlivierNV yes, you can use the file for your testing.