Describe the bug
I'm filing this bug following the advice of a contributor to the original library. The original issue is NVIDIA/NVTabular#102.
While using cudf._lib.parquet.ParquetWriter.write_table() to write several large chunks of data into one .parquet file, I encountered the following error.
RuntimeError: CUDA error encountered at: /cudf/cpp/src/io/parquet/writer_impl.cu:341: 700 cudaErrorIllegalAddress an illegal memory access was encountered
This issue occurs when running optimize_criteo.ipynb. Each DataFrame was loaded from part of a large CSV file by cudf.read_csv() with the byte_range option (https://github.com/NVIDIA/NVTabular/blob/master/nvtabular/io.py#L220-L226). The stack trace is below, and the cuda-memcheck result is in the original issue (https://github.com/NVIDIA/NVTabular/issues/102#issuecomment-648154896). In addition, the program sometimes hangs with 100% GPU usage.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-1-2af48634df31> in <module>
47 del gdf
48 path_out = '/data/criteo/parquet/'
---> 49 file_to_pq(train_set, 'csv', output_folder=path_out, cols=cols, dtypes=dtypes)
<ipython-input-1-2af48634df31> in file_to_pq(target_files, file_type, output_folder, cols, dtypes)
43 if file_path != old_file_path:
44 writer = ParquetWriter(path)
---> 45 writer.write_table(gdf)
46 old_file_path = file_path
47 del gdf
cudf/_lib/parquet.pyx in cudf._lib.parquet.ParquetWriter.write_table()
RuntimeError: CUDA error encountered at: /cudf/cpp/src/io/parquet/writer_impl.cu:341: 700 cudaErrorIllegalAddress an illegal memory access was encountered
Steps/Code to reproduce bug
optimize_criteo.ipynb
Note that, according to the bug analysis in the original repo, this issue happens only on Pascal GPUs. It doesn't happen on Volta GPUs.
Expected behavior
No error and no hang.
Environment overview (please complete the following information)
sudo docker run --gpus=all --rm -it -v $(pwd):/ws -v /path/to/data/:/data -p 8888:8888 -p 8797:8787 -p 8796:8786 --ipc=host --cap-add SYS_PTRACE nvcr.io/nvidia/nvtabular:0.1 /bin/bash
Environment details
***git***
Not inside a git repository
***OS Information***
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.4 LTS"
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
Linux f15fcfd05d6b 4.15.0-99-generic #100-Ubuntu SMP Wed Apr 22 20:32:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
***GPU Information***
Sat Jun 27 02:03:54 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00 Driver Version: 440.64.00 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN X (Pascal) On | 00000000:06:00.0 Off | N/A |
| 23% 32C P8 16W / 250W | 122MiB / 12192MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN X (Pascal) On | 00000000:0A:00.0 Off | N/A |
| 23% 25C P8 8W / 250W | 12MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
***CPU***
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz
Stepping: 1
CPU MHz: 1200.191
CPU max MHz: 4000.0000
CPU min MHz: 1200.0000
BogoMIPS: 7195.06
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 15360K
NUMA node0 CPU(s): 0-11
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d
***CMake***
/conda/envs/rapids/bin/cmake
cmake version 3.17.0
CMake suite maintained and supported by Kitware (kitware.com/cmake).
***g++***
/usr/bin/g++
g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
***nvcc***
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
***Python***
/conda/envs/rapids/bin/python
Python 3.7.6
***Environment Variables***
PATH : /conda/envs/rapids/bin:/conda/condabin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/conda/bin
LD_LIBRARY_PATH : /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64:/usr/local/lib
NUMBAPRO_NVVM : /usr/local/cuda/nvvm/lib64/libnvvm.so
NUMBAPRO_LIBDEVICE : /usr/local/cuda/nvvm/libdevice/
CONDA_PREFIX : /conda/envs/rapids
PYTHON_PATH :
***conda packages***
/conda/condabin/conda
# packages in environment at /conda/envs/rapids:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_llvm conda-forge
absl-py 0.9.0 pypi_0 pypi
aiohttp 3.6.2 pypi_0 pypi
alabaster 0.7.12 py_0 conda-forge
appdirs 1.4.3 py_1 conda-forge
arrow-cpp 0.15.0 py37h090bef1_2 conda-forge
astor 0.8.1 pypi_0 pypi
async-timeout 3.0.1 pypi_0 pypi
attrs 19.3.0 py_0 conda-forge
babel 2.8.0 py_0 conda-forge
backcall 0.1.0 py_0 conda-forge
beautifulsoup4 4.9.0 pypi_0 pypi
black 19.10b0 py37_0 conda-forge
blas 1.0 mkl
bleach 3.1.5 pyh9f0ad1d_0 conda-forge
blis 0.4.1 pypi_0 pypi
bokeh 1.4.0 pypi_0 pypi
boost-cpp 1.70.0 h8e57a91_2 conda-forge
bottleneck 1.3.2 pypi_0 pypi
brotli 1.0.7 he1b5a44_1001 conda-forge
brotlipy 0.7.0 py37h8f50634_1000 conda-forge
bzip2 1.0.8 h516909a_2 conda-forge
c-ares 1.15.0 h516909a_1001 conda-forge
ca-certificates 2020.1.1 0
cachetools 4.1.0 pypi_0 pypi
catalogue 1.0.0 pypi_0 pypi
certifi 2020.4.5.1 py37_0
cffi 1.14.0 py37hd463f26_0 conda-forge
cfgv 3.1.0 py_0 conda-forge
chardet 3.0.4 py37hc8dfbb8_1006 conda-forge
clang 8.0.1 hc9558a2_2 conda-forge
clang-tools 8.0.1 hc9558a2_2 conda-forge
clangxx 8.0.1 2 conda-forge
click 7.1.2 pyh9f0ad1d_0 conda-forge
cloudpickle 1.4.1 py_0 conda-forge
cmake 3.17.0 h28c56e5_0 conda-forge
cmake_setuptools 0.1.3 py_0 rapidsai
commonmark 0.9.1 py_0 conda-forge
coverage 5.1 pypi_0 pypi
cryptography 2.9.2 py37hb09aad4_0 conda-forge
cudatoolkit 10.2.89 hfd86e86_1
cudf 0.14.0a0+4291.g6bae03d40 pypi_0 pypi
cudnn 7.6.5 cuda10.2_0
cupy 7.4.0 py37h940342b_2 conda-forge
cycler 0.10.0 pypi_0 pypi
cymem 2.0.3 pypi_0 pypi
cython 0.29.17 py37h3340039_0 conda-forge
cytoolz 0.10.1 py37h516909a_0 conda-forge
dask 2.16.0+8.g3573b2dd pypi_0 pypi
dask-cudf 0.14.0a0+4291.g6bae03d40 pypi_0 pypi
decorator 4.4.2 py_0 conda-forge
defusedxml 0.6.0 py_0 conda-forge
distributed 2.16.0+2.gf899a994 pypi_0 pypi
dlpack 0.2 he1b5a44_1 conda-forge
docutils 0.16 py37hc8dfbb8_1 conda-forge
double-conversion 3.1.5 he1b5a44_2 conda-forge
editdistance 0.5.3 py37h3340039_0 conda-forge
entrypoints 0.3 py37hc8dfbb8_1001 conda-forge
expat 2.2.9 he1b5a44_2 conda-forge
fastai 1.0.61 pypi_0 pypi
fastavro 0.23.3 py37h8f50634_0 conda-forge
fastprogress 0.2.3 pypi_0 pypi
fastrlock 0.4 py37h3340039_1001 conda-forge
flake8 3.8.1 pyh9f0ad1d_0 conda-forge
flatbuffers 1.12.0 he1b5a44_0 conda-forge
freetype 2.10.2 he06d7ca_0 conda-forge
fsspec 0.7.3 py_0 conda-forge
future 0.18.2 py37hc8dfbb8_1 conda-forge
gast 0.2.2 pypi_0 pypi
gflags 2.2.2 he1b5a44_1002 conda-forge
glog 0.4.0 h49b9bf7_3 conda-forge
gmp 6.2.0 he1b5a44_2 conda-forge
google-auth 1.14.3 pypi_0 pypi
google-auth-oauthlib 0.4.1 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpc-cpp 1.23.0 h18db393_0 conda-forge
grpcio 1.24.3 pypi_0 pypi
h5py 2.10.0 pypi_0 pypi
heapdict 1.0.1 py_0 conda-forge
hypothesis 5.13.1 py_0 conda-forge
icu 64.2 he1b5a44_1 conda-forge
identify 1.4.15 pyh9f0ad1d_0 conda-forge
idna 2.9 py_1 conda-forge
imagesize 1.2.0 py_0 conda-forge
importlib-metadata 1.6.0 py37hc8dfbb8_0 conda-forge
importlib_metadata 1.6.0 0 conda-forge
intel-openmp 2020.1 217
ipykernel 5.2.1 py37h43977f1_0 conda-forge
ipython 7.14.0 py37hc8dfbb8_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
isort 4.3.21 py37hc8dfbb8_1 conda-forge
jedi 0.17.0 py37hc8dfbb8_0 conda-forge
jinja2 2.11.2 pyh9f0ad1d_0 conda-forge
jpeg 9c h14c3975_1001 conda-forge
json5 0.9.4 py_0
jsonschema 3.2.0 py37hc8dfbb8_1 conda-forge
jupyter-server-proxy 1.4.0 pypi_0 pypi
jupyter_client 6.1.3 py_0 conda-forge
jupyter_core 4.6.3 py37hc8dfbb8_1 conda-forge
jupyterlab 1.2.6 pyhf63ae98_0
jupyterlab-nvdashboard 0.2.1 pypi_0 pypi
jupyterlab_server 1.1.1 py_0
keras-applications 1.0.8 pypi_0 pypi
keras-preprocessing 1.1.1 pypi_0 pypi
kiwisolver 1.2.0 pypi_0 pypi
krb5 1.17.1 h2fd8d38_0 conda-forge
ld_impl_linux-64 2.34 h53a641e_0 conda-forge
libcurl 7.69.1 hf7181ac_0 conda-forge
libedit 3.1.20170329 hf8c457e_1001 conda-forge
libevent 2.1.10 h72c5cf5_0 conda-forge
libffi 3.2.1 he1b5a44_1007 conda-forge
libgcc-ng 9.2.0 h24d8f2e_2 conda-forge
libgfortran-ng 7.3.0 hdf63c60_5 conda-forge
libllvm8 8.0.1 hc9558a2_0 conda-forge
libopenblas 0.3.9 h5ec1e0e_0 conda-forge
libpng 1.6.37 hed695b0_1 conda-forge
libprotobuf 3.8.0 h8b12597_0 conda-forge
librmm 0.14.0a200513 cuda10.2_340 rapidsai-nightly
libsodium 1.0.17 h516909a_0 conda-forge
libssh2 1.8.2 h22169c7_2 conda-forge
libstdcxx-ng 9.2.0 hdf63c60_2 conda-forge
libtiff 4.1.0 hfc65ed5_0 conda-forge
libuv 1.34.0 h516909a_0 conda-forge
llvm-openmp 10.0.0 hc9558a2_0 conda-forge
llvmlite 0.31.0 py37h5202443_1 conda-forge
locket 0.2.0 py_2 conda-forge
lz4-c 1.8.3 he1b5a44_1001 conda-forge
markdown 3.2.2 pypi_0 pypi
markupsafe 1.1.1 py37h8f50634_1 conda-forge
matplotlib 3.2.1 pypi_0 pypi
mccabe 0.6.1 py_1 conda-forge
mistune 0.8.4 py37h8f50634_1001 conda-forge
mkl 2020.1 217
mkl-service 2.3.0 py37he904b0f_0
mkl_fft 1.0.15 py37ha843d7b_0
mkl_random 1.1.0 py37hd6b4f25_0
more-itertools 8.2.0 py_0 conda-forge
msgpack-python 1.0.0 py37h99015e2_1 conda-forge
multidict 4.7.5 pypi_0 pypi
murmurhash 1.0.2 pypi_0 pypi
mypy_extensions 0.4.3 py37hc8dfbb8_1 conda-forge
nbconvert 5.6.1 py37hc8dfbb8_1 conda-forge
nbformat 5.0.6 py_0 conda-forge
nbsphinx 0.7.0 pyh9f0ad1d_0 conda-forge
nccl 2.6.4.1 hc6a2c23_0 conda-forge
ncurses 6.1 hf484d3e_1002 conda-forge
ninja 1.9.0 py37hfd86e86_0
nodeenv 1.3.5 py_0 conda-forge
nodejs 10.13.0 he6710b0_0
notebook 6.0.3 py37hc8dfbb8_0 conda-forge
numba 0.48.0 py37hb3f55d8_0 conda-forge
numexpr 2.7.1 pypi_0 pypi
numpy 1.18.1 py37h4f9e942_0
numpy-base 1.18.1 py37hde5b4d6_1
numpydoc 0.9.2 py_0 conda-forge
nvidia-ml-py3 7.352.0 pypi_0 pypi
nvstrings-cuda102 0.0.0.dev0 pypi_0 pypi
nvtabular 0.1.dev0 dev_0 <develop>
oauthlib 3.1.0 pypi_0 pypi
olefile 0.46 py_0 conda-forge
onnx 1.7.0 pypi_0 pypi
openssl 1.1.1g h7b6447c_0
opt-einsum 3.2.1 pypi_0 pypi
packaging 20.1 py_0 conda-forge
pandas 0.25.3 py37hb3f55d8_0 conda-forge
pandoc 1.19.2 0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
parquet-cpp 1.5.1 2 conda-forge
parso 0.7.0 pyh9f0ad1d_0 conda-forge
partd 1.1.0 py_0 conda-forge
pathspec 0.8.0 pyh9f0ad1d_0 conda-forge
pexpect 4.8.0 py37hc8dfbb8_1 conda-forge
pickleshare 0.7.5 py37hc8dfbb8_1001 conda-forge
pillow 7.1.2 py37h718be6c_0 conda-forge
pip 20.1 pyh9f0ad1d_0 conda-forge
plac 1.1.3 pypi_0 pypi
pluggy 0.13.1 py37hc8dfbb8_1 conda-forge
pre-commit 2.4.0 py37hc8dfbb8_0 conda-forge
pre_commit 2.4.0 0 conda-forge
preshed 3.0.2 pypi_0 pypi
prometheus_client 0.7.1 py_0 conda-forge
prompt-toolkit 3.0.5 py_0 conda-forge
protobuf 3.11.3 pypi_0 pypi
psutil 5.7.0 py37h8f50634_1 conda-forge
ptyprocess 0.6.0 py_1001 conda-forge
py 1.8.1 py_0 conda-forge
pyarrow 0.15.0 py37h8b68381_1 conda-forge
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pycodestyle 2.6.0 pyh9f0ad1d_0 conda-forge
pycparser 2.20 py_0 conda-forge
pyflakes 2.2.0 pyh9f0ad1d_0 conda-forge
pygments 2.6.1 py_0 conda-forge
pynvml 8.0.4 pypi_0 pypi
pyopenssl 19.1.0 py_1 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyrsistent 0.16.0 py37h8f50634_0 conda-forge
pysocks 1.7.1 py37hc8dfbb8_1 conda-forge
pytest 5.4.2 py37hc8dfbb8_0 conda-forge
pytest-cov 2.8.1 pypi_0 pypi
python 3.7.6 h8356626_5_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python_abi 3.7 1_cp37m conda-forge
pytorch 1.5.0 py3.7_cuda10.2.89_cudnn7.6.5_0 pytorch
pytz 2020.1 pyh9f0ad1d_0 conda-forge
pyyaml 5.3.1 py37h8f50634_0 conda-forge
pyzmq 19.0.1 py37hac76be4_0 conda-forge
rapidjson 1.1.0 he1b5a44_1002 conda-forge
re2 2020.05.01 he1b5a44_0 conda-forge
readline 8.0 hf8c457e_0 conda-forge
recommonmark 0.6.0 py_0 conda-forge
regex 2020.5.13 py37h8f50634_0 conda-forge
requests 2.23.0 pyh8c360ce_2 conda-forge
requests-oauthlib 1.3.0 pypi_0 pypi
rhash 1.3.6 h14c3975_1001 conda-forge
rmm 0.14.0a200513 py37_314 rapidsai-nightly
rsa 4.0 pypi_0 pypi
scipy 1.4.1 pypi_0 pypi
send2trash 1.5.0 py_0 conda-forge
setuptools 46.3.0 py37hc8dfbb8_0 conda-forge
simpervisor 0.3 pypi_0 pypi
six 1.14.0 py_1 conda-forge
snappy 1.1.8 he1b5a44_1 conda-forge
snowballstemmer 2.0.0 py_0 conda-forge
sortedcontainers 2.1.0 py_0 conda-forge
soupsieve 2.0 pypi_0 pypi
spacy 2.2.4 pypi_0 pypi
spdlog 1.5.0 hc9558a2_0 conda-forge
sphinx 3.0.3 py_0 conda-forge
sphinx-markdown-tables 0.0.14 pypi_0 pypi
sphinx_rtd_theme 0.4.3 py_0 conda-forge
sphinxcontrib-applehelp 1.0.2 py_0 conda-forge
sphinxcontrib-devhelp 1.0.2 py_0 conda-forge
sphinxcontrib-htmlhelp 1.0.3 py_0 conda-forge
sphinxcontrib-jsmath 1.0.1 py_0 conda-forge
sphinxcontrib-qthelp 1.0.3 py_0 conda-forge
sphinxcontrib-serializinghtml 1.1.4 py_0 conda-forge
sphinxcontrib-websupport 1.2.2 pyh9f0ad1d_0 conda-forge
sqlite 3.30.1 hcee41ef_0 conda-forge
srsly 1.0.2 pypi_0 pypi
streamz 0.5.3 pypi_0 pypi
tblib 1.6.0 py_0 conda-forge
tensorboard 2.1.1 pypi_0 pypi
tensorflow-estimator 2.1.0 pypi_0 pypi
tensorflow-gpu 2.1.0 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
terminado 0.8.3 py37hc8dfbb8_1 conda-forge
testpath 0.4.4 py_0 conda-forge
tfdlpack-gpu 0.1.3 pypi_0 pypi
thinc 7.4.0 pypi_0 pypi
thrift-cpp 0.12.0 hf3afdfd_1004 conda-forge
tk 8.6.10 hed695b0_0 conda-forge
toml 0.10.0 py_0 conda-forge
toolz 0.10.0 py_0 conda-forge
torchvision 0.6.0 py37_cu102 pytorch
tornado 6.0.4 py37h8f50634_1 conda-forge
tqdm 4.46.0 pypi_0 pypi
traitlets 4.3.3 py37hc8dfbb8_1 conda-forge
typed-ast 1.4.1 py37h516909a_0 conda-forge
typing_extensions 3.7.4.2 py_0 conda-forge
uriparser 0.9.3 he1b5a44_1 conda-forge
urllib3 1.25.9 py_0 conda-forge
virtualenv 16.7.5 py_0 conda-forge
wasabi 0.6.0 pypi_0 pypi
wcwidth 0.1.9 pyh9f0ad1d_0 conda-forge
webencodings 0.5.1 py_1 conda-forge
werkzeug 1.0.1 pypi_0 pypi
wheel 0.34.2 py_1 conda-forge
wrapt 1.12.1 pypi_0 pypi
xz 5.2.5 h516909a_0 conda-forge
yaml 0.2.4 h516909a_0 conda-forge
yarl 1.4.2 pypi_0 pypi
zeromq 4.3.2 he1b5a44_2 conda-forge
zict 2.0.0 pypi_0 pypi
zipp 3.1.0 py_0 conda-forge
zlib 1.2.11 h516909a_1006 conda-forge
zstd 1.4.3 h3b9ef0a_0 conda-forge
Additional info:
Docker version 19.03.8, build afacb8b7f0
Additional context
N/A.
I've tested on a P100 and an RTX 2060. The issue reproduces only on the P100, so it appears to be strongly related to GPU generation.
From the location of the fault, it looks like the GPU fault is happening in either gpu::BuildChunkDictionaries() or gpu::InitEncoderPages(). Adding a stream synchronize between the two calls should narrow it down, though it's most likely the former (if it's a race condition, it may be more likely to show up on Pascal than on Volta).
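Here's a cudf-only repro:
import cudf
from cudf.io.parquet import ParquetWriter
# Criteo column layout: label, 13 continuous columns (I1-I13), 26 categorical columns (C1-C26)
cont_names = ["I" + str(x) for x in range(1, 14)]
cat_names = ["C" + str(x) for x in range(1, 27)]
cols = ["label"] + cont_names + cat_names
# Read only the first 840,000,000 bytes of the tab-separated day_0 file
df = cudf.read_csv("day_0", sep="\t", names=cols, byte_range=(0, 840000000))
# Keep only the label and the categorical columns
df = df.drop(columns=cont_names)
writer = ParquetWriter("/tmp/baremetal_out_0.parquet")
# write_table() is the call that fails in the traceback above
writer.write_table(df)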
Also, I added a device synchronize after each call in init_page_fragments, gather_fragment_statistics, and build_chunk_dictionaries, and it now seems to break reliably after BuildChunkDictionaries.
I've been able to fix this and also have an idea of why the fix works, but I'm finding it hard to confirm.
The fix is to add a __syncthreads() after this block: https://github.com/rapidsai/cudf/blob/4c9efa37a88b3beea9bbb9d459507200ba1e44d8/cpp/src/io/parquet/page_enc.cu#L185-L194
I think this was broken because the scratch_red value read in nz_pos += s->scratch_red[(t - 32) >> 5] is written by the previous warp, which may reach https://github.com/rapidsai/cudf/blob/4c9efa37a88b3beea9bbb9d459507200ba1e44d8/cpp/src/io/parquet/page_enc.cu#L169-L172 in the next iteration of the loop before this warp has read it.
I just don't know how to test this.
Yup, that makes sense; there should be a __syncthreads() before the next iteration of the loop. There was a similar issue in deflate, with a race condition involving two points quite far apart. For some reason, the error tends to show up on Maxwell/Pascal but much less often on Turing/Volta (presumably longer mem latencies).
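To make the suspected interleaving concrete, here is a toy Python model of it. This is not the cuDF kernel: the two-warp setup, the replay helper, and the per-iteration counts are made up for illustration. The only thing taken from the discussion above is that one warp reads a scratch slot written by the previous warp inside a loop.

```python
# Toy model of the suspected race (illustrative only, not cuDF code).
# Each loop iteration, every warp writes a count to its scratch slot, and
# warp 1 then reads warp 0's slot to compute its output offset.

COUNTS = [[3, 7], [5, 2]]  # COUNTS[warp][iteration], made-up values


def replay(schedule):
    """Replay (op, warp, iteration) events; return what warp 1 read from warp 0's slot."""
    scratch = [0, 0]
    seen = []
    for op, warp, iteration in schedule:
        if op == "write":
            scratch[warp] = COUNTS[warp][iteration]
        else:  # "read": pick up the previous warp's value
            seen.append(scratch[warp - 1])
    return seen


# With a barrier after the read, iteration 1 writes cannot start early.
WITH_BARRIER = [
    ("write", 0, 0), ("write", 1, 0), ("read", 1, 0),
    ("write", 0, 1), ("write", 1, 1), ("read", 1, 1),
]

# Without it, warp 0 may run ahead and overwrite its slot before warp 1 reads.
WITHOUT_BARRIER = [
    ("write", 0, 0), ("write", 1, 0), ("write", 0, 1),
    ("read", 1, 0), ("write", 1, 1), ("read", 1, 1),
]

print(replay(WITH_BARRIER))     # [3, 7]
print(replay(WITHOUT_BARRIER))  # [7, 7]
```

In the second schedule, warp 1 picks up iteration 1's count while it is still working on iteration 0, which is the kind of stale read a trailing __syncthreads() would rule out.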
> I just don't know how to test this.
If you were able to build a minimal reproducer that triggers sometimes on Pascal / Maxwell then it's worth adding it as a test that someone can run on a loop for example.
If we don't have a minimal reproducer I'd say we just check in a fix without a test and move on.
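One way to "run it on a loop" is a small driver that launches the repro in a fresh process each time. A fresh process matters because an illegal-address error leaves the CUDA context unusable until the process is restarted, and a timeout covers the hangs mentioned in the report. A rough sketch, where run_until_failure, the attempt count, and the timeout are placeholders rather than anything from cuDF:

```python
import subprocess
import sys


def run_until_failure(cmd, attempts=20, timeout_s=600):
    """Run cmd in a fresh process up to `attempts` times.

    Returns (attempt_index, reason) for the first failure, or None if every run passed.
    """
    for i in range(attempts):
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # The child is killed by subprocess.run before this is raised.
            return i, "timeout"
        if proc.returncode != 0:
            return i, "exit code %d" % proc.returncode
    return None
```

For example, run_until_failure([sys.executable, "repro.py"]) would loop over a hypothetical repro.py holding the cudf-only repro above.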
> If you were able to build a minimal reproducer that triggers sometimes on Pascal / Maxwell then it's worth adding it as a test that someone can run on a loop for example.
I have a minimal reproducer that depends upon this 47GB file (well, actually only the beginning 800MB is enough to trigger this). Plus, tests are supposed to not take too long to run. This will take as long to run as a typical benchmark does.
> I just don't know how to test this.
I mean that I don't have concrete evidence that the fix I suggest will definitely fix this bug, because I couldn't print dict_index data and compare it with and without __syncthreads(). There's too much data to print. I just have a strong suspicion and I do believe that there should be a __syncthreads() between the two code blocks regardless of this bug.
Is there anything sensitive / non-public in the data? One of the things we've done from the Python side at least is have tests that only run if the file is found and skip otherwise. We could take that approach for this file along with adding a comment with a link to download the data, so that if someone wants / needs to reproduce it down the line they can.
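A sketch of that skip-if-missing approach using the standard library's unittest. The CRITEO_DAY0_PATH variable, the default path, and the test name are assumptions for illustration, and the body just mirrors the cudf-only repro from earlier in the thread:

```python
import os
import tempfile
import unittest

# Hypothetical knob: where the Criteo day_0 file lives on this machine.
DAY0_PATH = os.environ.get("CRITEO_DAY0_PATH", "day_0")


class CriteoParquetWriterTest(unittest.TestCase):
    @unittest.skipUnless(os.path.isfile(DAY0_PATH), "Criteo day_0 file not found")
    def test_write_first_840mb(self):
        # Imported here so the module can be loaded on machines without cudf.
        import cudf
        from cudf.io.parquet import ParquetWriter

        cont_names = ["I" + str(x) for x in range(1, 14)]
        cat_names = ["C" + str(x) for x in range(1, 27)]
        cols = ["label"] + cont_names + cat_names

        df = cudf.read_csv(DAY0_PATH, sep="\t", names=cols, byte_range=(0, 840000000))
        df = df.drop(columns=cont_names)

        with tempfile.TemporaryDirectory() as tmp:
            writer = ParquetWriter(os.path.join(tmp, "out.parquet"))
            # Passes if no CUDA error is raised, as in the repro.
            writer.write_table(df)
            # close() is assumed to be available on this ParquetWriter version.
            writer.close()
```

Run with python -m unittest; on a machine without the file, the test is reported as skipped rather than failed.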