Cudf: [BUG] Error loading files (possible issues with RMM)

Created on 11 Oct 2019  ·  6 Comments  ·  Source: rapidsai/cudf

Describe the bug
Error loading files (possible issues with RMM)

Steps/Code to reproduce bug
1) Download this file https://github.com/rapidsai/cudf/blob/branch-0.11/python/cudf/cudf/tests/data/orc/TestOrcFile.test1.orc

2) Run this script

import cudf
path = "/code/cudf/python/cudf/cudf/tests/data/orc/TestOrcFile.test1.orc"
gdf = cudf.read_orc(path)
print(gdf)

3) See results at the end:

boolean1 byte1 short1 int1 long1 ... list.int1 list.string1 map map.int1 map.string1
0 False 1 1024 65536 9223372036854775807 ... 4 chani 5 chani
1 True 100 2048 65536 9223372036854775807 ... 2147483647 mauddib 1 mauddib

[2 rows x 16 columns]
terminate called after throwing an instance of 'thrust::system::system_error'
what(): rmm_allocator::deallocate(): RMM_FREE: initialization error
Aborted

Expected behavior
Should be able to read the file without exceptions.

Environment overview (please complete the following information)

  • Environment location: Bare-metal
  • Method of cuDF install: [conda]

Environment details
Please run and paste the output of the cudf/print_env.sh script here, to gather any other relevant environment details:
`(n103) percy@pctabz:~/Blazing/projects/code/cudf$ ./print_env.sh

 ***git***
 commit 64588be99736aef7b3f84eba0698a009e1e815e6 (HEAD -> branch-0.10, origin/branch-0.10, origin/HEAD)
 Merge: 335f10f 2c3cfd3
 Author: Ray Douglass
 Date:   Fri Oct 11 14:53:50 2019 -0400

 Merge pull request #3058 from beckernick/docs/udf-doc-markdown-formatting

 [REVIEW] Fix UDF doc markdown formatting
 ***git submodules***
 -b165e1fb11eeea64ccf95053e40f2424312599cc thirdparty/cub
 -63f644be44201467e3938d59ed9d89cc8725c35d thirdparty/jitify

 ***OS Information***
 DISTRIB_ID=Ubuntu
 DISTRIB_RELEASE=16.04
 DISTRIB_CODENAME=xenial
 DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
 NAME="Ubuntu"
 VERSION="16.04.4 LTS (Xenial Xerus)"
 ID=ubuntu
 ID_LIKE=debian
 PRETTY_NAME="Ubuntu 16.04.4 LTS"
 VERSION_ID="16.04"
 HOME_URL="http://www.ubuntu.com/"
 SUPPORT_URL="http://help.ubuntu.com/"
 BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
 VERSION_CODENAME=xenial
 UBUNTU_CODENAME=xenial
 Linux pctabz 4.13.0-36-generic #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

 ***GPU Information***
 Fri Oct 11 16:12:37 2019
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |===============================+======================+======================|
 |   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
 | N/A   45C    P8    N/A /  N/A |    133MiB /  4042MiB |      0%      Default |
 +-------------------------------+----------------------+----------------------+

 +-----------------------------------------------------------------------------+
 | Processes:                                                       GPU Memory |
 |  GPU       PID   Type   Process name                             Usage      |
 |=============================================================================|
 |    0     10587      C   ./testing-libgdf                             123MiB |
 +-----------------------------------------------------------------------------+

 ***CPU***
 Architecture:          x86_64
 CPU op-mode(s):        32-bit, 64-bit
 Byte Order:            Little Endian
 CPU(s):                8
 On-line CPU(s) list:   0-7
 Thread(s) per core:    2
 Core(s) per socket:    4
 Socket(s):             1
 NUMA node(s):          1
 Vendor ID:             GenuineIntel
 CPU family:            6
 Model:                 158
 Model name:            Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
 Stepping:              9
 CPU MHz:               2800.000
 CPU max MHz:           3800,0000
 CPU min MHz:           800,0000
 BogoMIPS:              5616.00
 Virtualization:        VT-x
 L1d cache:             32K
 L1i cache:             32K
 L2 cache:              256K
 L3 cache:              6144K
 NUMA node0 CPU(s):     0-7
 Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti retpoline intel_pt rsb_ctxsw tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp

 ***CMake***
 /home/percy/Applications/anaconda/conda/envs/n103/bin/cmake
 cmake version 3.15.4

 CMake suite maintained and supported by Kitware (kitware.com/cmake).

 ***g++***
 /usr/bin/g++
 g++ (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
 Copyright (C) 2015 Free Software Foundation, Inc.
 This is free software; see the source for copying conditions.  There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


 ***nvcc***

 ***Python***
 /home/percy/Applications/anaconda/conda/envs/n103/bin/python
 Python 3.7.4

 ***Environment Variables***
 PATH                            : /home/percy/Applications/anaconda/conda/envs/n103/bin:/home/percy/Applications/anaconda/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/home/percy/Applications/anaconda/conda/bin
 LD_LIBRARY_PATH                 :
 NUMBAPRO_NVVM                   :
 NUMBAPRO_LIBDEVICE              :
 CONDA_PREFIX                    : /home/percy/Applications/anaconda/conda/envs/n103
 PYTHON_PATH                     :

 ***conda packages***
 /home/percy/Applications/anaconda/conda/condabin/conda
 # packages in environment at /home/percy/Applications/anaconda/conda/envs/n103:
 #
 # Name                    Version                   Build  Channel
 _libgcc_mutex             0.1                        main
 alsa-lib                  1.1.5             h516909a_1001    conda-forge
 arrow-cpp                 0.14.1           py37h5ac5442_4    conda-forge
 blazingdb-protocol        1.0                       dev_0    <develop>
 blazingsql-toolchain      0.4.4                         0    blazingsql
 bokeh                     1.3.4                    py37_0    conda-forge
 boost-cpp                 1.70.0               h8e57a91_2    conda-forge
 brotli                    1.0.7             he1b5a44_1000    conda-forge
 bzip2                     1.0.8                h516909a_1    conda-forge
 c-ares                    1.15.0            h516909a_1001    conda-forge
 ca-certificates           2019.9.11            hecc5488_0    conda-forge
 certifi                   2019.9.11                py37_0    conda-forge
 chardet                   3.0.4                    pypi_0    pypi
 click                     7.0                        py_0    conda-forge
 cloudpickle               1.2.2                      py_0    conda-forge
 cmake                     3.15.4               hf94ab9c_0    conda-forge
 cudatoolkit               10.0.130                      0
 cudf                      0.10.0a191011         py37_2577    rapidsai-nightly/label/cuda10.0
 curl                      7.65.3               hbc83047_0
 cython                    0.29.13          py37he1b5a44_0    conda-forge
 cytoolz                   0.10.0           py37h516909a_0    conda-forge
 dask                      2.5.2                      py_0    conda-forge
 dask-core                 2.5.2                      py_0    conda-forge
 dask-cudf                 0.10.0a191011         py37_2577    rapidsai-nightly/label/cuda10.0
 distributed               2.5.2                      py_0    conda-forge
 dlpack                    0.2                  he1b5a44_1    conda-forge
 double-conversion         3.1.5                he1b5a44_1    conda-forge
 et-xmlfile                1.0.1                    pypi_0    pypi
 expat                     2.2.5             he1b5a44_1003    conda-forge
 fastavro                  0.22.5           py37h516909a_0    conda-forge
 flatbuffers               1.11                     pypi_0    pypi
 fontconfig                2.13.1            h86ecdb6_1001    conda-forge
 freetype                  2.10.0               he983fc9_1    conda-forge
 fsspec                    0.5.2                      py_0    conda-forge
 gflags                    2.2.2             he1b5a44_1001    conda-forge
 giflib                    5.1.7                h516909a_1    conda-forge
 gitdb2                    2.0.6                    pypi_0    pypi
 gitpython                 3.0.3                    pypi_0    pypi
 glog                      0.4.0                he1b5a44_1    conda-forge
 gmock                     1.10.0                        0    conda-forge
 grpc-cpp                  1.23.0               h18db393_0    conda-forge
 gtest                     1.10.0               hc9558a2_0    conda-forge
 heapdict                  1.0.1                      py_0    conda-forge
 icu                       64.2                 he1b5a44_1    conda-forge
 idna                      2.8                      pypi_0    pypi
 jdcal                     1.4.1                    pypi_0    pypi
 jinja2                    2.10.3                     py_0    conda-forge
 jpeg                      9c                h14c3975_1001    conda-forge
 krb5                      1.16.1               h173b8e3_7
 lcms2                     2.9                  h2e4bb80_0    conda-forge
 libblas                   3.8.0               13_openblas    conda-forge
 libcblas                  3.8.0               13_openblas    conda-forge
 libcudf                   0.10.0a191011     cuda10.0_2577    rapidsai-nightly/label/cuda10.0
 libcurl                   7.65.3               h20c2e04_0
 libedit                   3.1.20181209         hc058e9b_0
 libevent                  2.1.10               h72c5cf5_0    conda-forge
 libffi                    3.2.1                hd88cf55_4
 libgcc-ng                 9.1.0                hdf63c60_0
 libgfortran-ng            7.3.0                hdf63c60_0
 libiconv                  1.15              h516909a_1005    conda-forge
 liblapack                 3.8.0               13_openblas    conda-forge
 libnvstrings              0.10.0a191011     cuda10.0_2577    rapidsai-nightly/label/cuda10.0
 libopenblas               0.3.7                h6e990d7_1    conda-forge
 libpng                    1.6.37               hed695b0_0    conda-forge
 libprotobuf               3.8.0                h8b12597_0    conda-forge
 librmm                    0.10.0a191011      cuda10.0_169    rapidsai-nightly/label/cuda10.0
 libssh2                   1.8.2                h22169c7_2    conda-forge
 libstdcxx-ng              9.1.0                hdf63c60_0
 libtiff                   4.0.10            h57b8799_1003    conda-forge
 libuuid                   2.32.1            h14c3975_1000    conda-forge
 libuv                     1.32.0               h516909a_0    conda-forge
 libxcb                    1.13              h14c3975_1002    conda-forge
 libxml2                   2.9.9                hee79883_5    conda-forge
 llvmlite                  0.29.0           py37hfd453ef_1    conda-forge
 locket                    0.2.0                      py_2    conda-forge
 lz4-c                     1.8.3             he1b5a44_1001    conda-forge
 markupsafe                1.1.1            py37h14c3975_0    conda-forge
 maven                     3.6.0                         0    conda-forge
 msgpack-python            0.6.2            py37hc9558a2_0    conda-forge
 ncurses                   6.1                  he6710b0_1
 numba                     0.45.1           py37hb3f55d8_0    conda-forge
 numpy                     1.17.2           py37h95a1406_0    conda-forge
 nvstrings                 0.10.0a191011         py37_2577    rapidsai-nightly/label/cuda10.0
 olefile                   0.46                       py_0    conda-forge
 openjdk                   8.0.192           h14c3975_1003    conda-forge
 openpyxl                  3.0.0                    pypi_0    pypi
 openssl                   1.1.1c               h516909a_0    conda-forge
 packaging                 19.2                       py_0    conda-forge
 pandas                    0.24.2           py37hb3f55d8_0    conda-forge
 parquet-cpp               1.5.1                         2    conda-forge
 partd                     1.0.0                      py_0    conda-forge
 pillow                    6.2.0            py37h34e0f95_0
 pip                       19.2.3                   py37_0
 psutil                    5.6.3            py37h516909a_0    conda-forge
 pthread-stubs             0.4               h14c3975_1001    conda-forge
 py4j                      0.10.7                     py_1    conda-forge
 pyarrow                   0.14.1           py37h8b68381_2    conda-forge
 pyblazing                 0.1                       dev_0    <develop>
 pydrill                   0.3.4                    pypi_0    pypi
 pymysql                   0.9.3                    pypi_0    pypi
 pynvml                    8.0.3                    pypi_0    pypi
 pyparsing                 2.4.2                      py_0    conda-forge
 pyspark                   2.4.3                      py_0    conda-forge
 python                    3.7.4                h265db76_1
 python-dateutil           2.8.0                      py_0    conda-forge
 pytz                      2019.3                     py_0    conda-forge
 pyyaml                    5.1.2            py37h516909a_0    conda-forge
 rapidjson                 1.1.0             he1b5a44_1002    conda-forge
 re2                       2019.09.01           he1b5a44_0    conda-forge
 readline                  7.0                  h7b6447c_5
 requests                  2.22.0                   pypi_0    pypi
 rhash                     1.3.6             h14c3975_1001    conda-forge
 rmm                       0.10.0a191011          py37_169    rapidsai-nightly/label/cuda10.0
 setuptools                41.4.0                   py37_0
 six                       1.12.0                py37_1000    conda-forge
 smmap2                    2.0.5                    pypi_0    pypi
 snappy                    1.1.7             he1b5a44_1002    conda-forge
 sortedcontainers          2.1.0                      py_0    conda-forge
 sqlite                    3.30.0               h7b6447c_0
 tblib                     1.4.0                      py_0    conda-forge
 thrift-cpp                0.12.0            hf3afdfd_1004    conda-forge
 tk                        8.6.8                hbc83047_0
 toolz                     0.10.0                     py_0    conda-forge
 tornado                   6.0.3            py37h516909a_0    conda-forge
 uriparser                 0.9.3                he1b5a44_1    conda-forge
 urllib3                   1.25.6                   pypi_0    pypi
 wheel                     0.33.6                   py37_0
 xorg-fixesproto           5.0               h14c3975_1002    conda-forge
 xorg-inputproto           2.3.2             h14c3975_1002    conda-forge
 xorg-kbproto              1.0.7             h14c3975_1002    conda-forge
 xorg-libx11               1.6.9                h516909a_0    conda-forge
 xorg-libxau               1.0.9                h14c3975_0    conda-forge
 xorg-libxdmcp             1.1.3                h516909a_0    conda-forge
 xorg-libxext              1.3.4                h516909a_0    conda-forge
 xorg-libxfixes            5.0.3             h516909a_1004    conda-forge
 xorg-libxi                1.7.10               h516909a_0    conda-forge
 xorg-libxrender           0.9.10            h516909a_1002    conda-forge
 xorg-libxtst              1.2.3             h14c3975_1002    conda-forge
 xorg-recordproto          1.14.2            h14c3975_1002    conda-forge
 xorg-renderproto          0.11.1            h14c3975_1002    conda-forge
 xorg-xextproto            7.3.0             h14c3975_1002    conda-forge
 xorg-xproto               7.0.31            h14c3975_1007    conda-forge
 xz                        5.2.4                h14c3975_4
 yaml                      0.1.7             h14c3975_1001    conda-forge
 zict                      1.0.0                      py_0    conda-forge
 zlib                      1.2.11               h7b6447c_3
 zstd                      1.4.0                h3b9ef0a_0    conda-forge

`

CUDA 10.0

bug

All 6 comments

FWIW, I'm seeing the same thing when running test_parquet.py from cudf python tests in branch-0.11, not specific to orc.

Yep, same on batched reads of a big CSV. Seems to happen on Python exit of an otherwise ok kernel:

    import cudf

    # infile, outfile, step, offset and col are defined earlier in the original script (not shown).
    combined_dfs = None
    cols = [str(c) for c in cudf.io.read_csv(infile, nrows=1).columns]
    print('columns: %s' % cols)
    while True:
        print("Step", str(offset))
        if offset > 100000:
            print('EARLY EXIT FOR 1M+ ROWS')
            break
        # Read the next batch of rows, reusing the column names from the header.
        df = cudf.io.read_csv(infile, nrows=step, skiprows=offset, names=cols)
        if combined_dfs is None:
            combined_dfs = df
        else:
            combined_dfs = cudf.concat([combined_dfs, df]).drop_duplicates([col])
        offset += step  # advance to the next batch (assumed; not in the posted snippet)

    print('combined after lines: %s' % len(combined_dfs))
    print('done, saving to disk')
    combined_dfs.to_pandas().to_csv(outfile, index=False, chunksize=100000, header=False)
    print('done, exiting.')

=>

...
Step 200001
EARLY EXIT FOR 1M+ ROWS
done, saving to disk
done, exiting.
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  rmm_allocator::deallocate(): RMM_FREE: initialization error

real    0m33.327s
user    0m0.028s
sys     0m0.027s

This is on rapidsai-nightly (Docker) on a P100, and the CSV was only taking 20% of GPU RAM. It happens with both cuda9.2-ubuntu16.04 and cuda10.0-ubuntu18.04.

(and no error on rapidsai/rapidsai:cuda10.0-runtime-ubuntu18.04)
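
For anyone who wants to check whether the abort really happens at interpreter exit rather than during the read itself, here is a minimal sketch using only the Python standard library. It runs a script in a fresh interpreter and reports whether the process was killed by a signal. The helper name `run_and_classify` and the stand-in child script are hypothetical; the child just prints and calls `os.abort()` to imitate the "terminate called ... Aborted" ending, so no GPU or cuDF is needed. The negative-return-code convention assumed here applies to POSIX systems only.

```python
import signal
import subprocess
import sys


def run_and_classify(code):
    """Run `code` in a fresh interpreter and report how the process ended."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    rc = proc.returncode
    # On POSIX, a negative return code means the child was killed by that signal.
    sig = signal.Signals(-rc).name if rc < 0 else None
    return {"returncode": rc, "signal": sig,
            "stdout": proc.stdout, "stderr": proc.stderr}


# Stand-in for the real reproduction script (assumption: the real script
# finishes its own work and only aborts afterwards, as in the logs above).
CHILD = "import os, sys; print('done, exiting.'); sys.stdout.flush(); os.abort()"

result = run_and_classify(CHILD)
print("last output:", result["stdout"].strip())
print("killed by:", result["signal"])
```

If the script's own final print shows up in `stdout` and the process still ends with SIGABRT, the failure is happening after the user code has finished, which matches what is described above.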

Should be fixed in branch-0.10 by PR #3061 (but currently blocked by PR #3089 in branch-0.11)

Great! Hope we can have the fix soon in the nightly release for 0.10.

I can confirm the issue is gone! Thank you!
