Describe the bug
If I create a DataFrame with cudf.read_parquet() and then delete it, not all of the GPU memory is freed.
Steps/Code to reproduce bug
Run the following script while watching GPU memory in another terminal with watch -n1 nvidia-smi.
Memory consumption keeps rising after every iteration.
import time
import cudf

if __name__ == "__main__":
    filepath = "/home/user/someDecentlySizedFile.parquet"
    for i in range(0, 10):
        gdf = cudf.read_parquet(filepath)
        del gdf
        print("iteration %s completed!" % (i))
        print("it's time to wait for 3secs")
        time.sleep(3)
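Instead of watching nvidia-smi by eye, the growth can be recorded per iteration. The following is a minimal sketch, not part of cudf: `measure_growth` and `nvml_used_bytes` are hypothetical helper names, and the second one assumes pynvml is installed. Note that NVML reports device-wide usage, so other processes on the same GPU will also show up in the numbers.

```python
import gc


def measure_growth(step, read_used_bytes, iterations=10):
    """Run step() repeatedly and return used-memory deltas relative to the start.

    step            -- callable doing the work under test (e.g. read, then drop)
    read_used_bytes -- callable returning the memory currently in use, in bytes
    """
    baseline = read_used_bytes()
    deltas = []
    for _ in range(iterations):
        step()
        gc.collect()  # rule out Python garbage that is merely not yet collected
        deltas.append(read_used_bytes() - baseline)
    return deltas


def nvml_used_bytes(device_index=0):
    """Read used device memory through pynvml (assumed to be installed)."""
    import pynvml  # imported lazily so measure_growth works without a GPU

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        return pynvml.nvmlDeviceGetMemoryInfo(handle).used
    finally:
        pynvml.nvmlShutdown()
```

With cudf this could be called as `measure_growth(lambda: cudf.read_parquet(filepath), nvml_used_bytes)`. A list of deltas that keeps climbing points toward a leak, while a single jump after the first iteration is more likely one-time overhead such as CUDA context creation.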
Expected behavior
When a DataFrame is deleted, its memory should be completely freed.
Environment overview (please complete the following information)
Installed cudf using conda. Currently running:
cudf 0.12.0b200124 py37_2002 rapidsai-nightly
dask-cudf 0.12.0b200124 py37_2007 rapidsai-nightly
libcudf 0.12.0b200128 cuda10.0_2007 rapidsai-nightly
Environment details
./print_env.sh
***git***
commit 03d22b30c7315096d8d7d3b94c172b896a81595d (HEAD -> branch-0.12, upstream/branch-0.12)
Merge: 0fa88df 3225a83
Author: Ray Douglass <[email protected]>
Date: Fri Jan 24 11:31:29 2020 -0500
Merge pull request #3906 from raydouglass/pandas-version
[REVIEW] Update cudf meta for pandas version #3486 [skip-ci]
***git submodules***
b165e1fb11eeea64ccf95053e40f2424312599cc thirdparty/cub (v1.7.1)
bcd545071c7a5ddb28cb6576afc6399eb1286c43 thirdparty/jitify (heads/cudf)
cdcda484d0c7db114ea29c3b33429de5756ecfd8 thirdparty/libcudacxx (0.8.1-99-gcdcda48)
a97a7380c76346c22bb67b93695bed19592afad2 thirdparty/libcudacxx/libcxx (heads/rapidsai-interop)
***OS Information***
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.5 LTS"
NAME="Ubuntu"
VERSION="16.04.5 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.5 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Linux william-Lith 4.4.0-140-generic #166-Ubuntu SMP Wed Nov 14 20:09:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
***GPU Information***
Thu Jan 30 16:43:38 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48 Driver Version: 410.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 106... Off | 00000000:17:00.0 Off | N/A |
| 0% 54C P2 31W / 120W | 2MiB / 3019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 106... Off | 00000000:65:00.0 On | N/A |
| 31% 56C P0 34W / 120W | 1199MiB / 3016MiB | 9% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 1 1240 G /usr/lib/xorg/Xorg 582MiB |
| 1 1817 G compiz 100MiB |
| 1 56976 G ...quest-channel-token=9815290313812846948 86MiB |
| 1 56986 G ...uest-channel-token=13177755846581590175 200MiB |
| 1 64364 G ...uest-channel-token=11432194168798043902 227MiB |
+-----------------------------------------------------------------------------+
***CPU***
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz
Stepping: 4
CPU MHz: 1699.824
CPU max MHz: 4000.0000
CPU min MHz: 1200.0000
BogoMIPS: 7007.90
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 8448K
NUMA node0 CPU(s): 0-11
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt ssbd ibrs ibpb stibp kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req flush_l1d
***CMake***
/home/william/miniconda3/envs/blazingsql2/bin/cmake
cmake version 3.16.2
CMake suite maintained and supported by Kitware (kitware.com/cmake).
***g++***
/usr/bin/g++
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
***nvcc***
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
***Python***
/home/william/miniconda3/envs/blazingsql2/bin/python
Python 3.7.5
***Environment Variables***
PATH : /home/william/miniconda3/envs/blazingsql2/bin:/home/william/miniconda3/condabin:/usr/local/cuda/bin:/home/william/bin:/home/william/.local/bin:/opt/maven/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
LD_LIBRARY_PATH : /home/william/miniconda3/envs/blazingsql2/lib
NUMBAPRO_NVVM :
NUMBAPRO_LIBDEVICE :
CONDA_PREFIX : /home/william/miniconda3/envs/blazingsql2
PYTHON_PATH :
***conda packages***
/home/william/miniconda3/condabin/conda
# packages in environment at /home/william/miniconda3/envs/blazingsql2:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
arrow-cpp 0.15.0 py37h090bef1_2 conda-forge
blazingsql 0.6 pypi_0 pypi
bokeh 1.4.0 py37_0 conda-forge
boost-cpp 1.70.0 h8e57a91_2 conda-forge
brotli 1.0.7 he1b5a44_1000 conda-forge
bsql-engine 0.6 pypi_0 pypi
bsql-rapids-thirdparty 0.11 4 blazingsql-nightly
bsql-toolchain 0.11 4 blazingsql-nightly
bsql-toolchain-aws-cpp 0.11 0 blazingsql-nightly
bsql-toolchain-gcp-cpp 0.11 0 blazingsql-nightly
bzip2 1.0.8 h516909a_2 conda-forge
c-ares 1.15.0 h516909a_1001 conda-forge
ca-certificates 2019.11.28 hecc5488_0 conda-forge
certifi 2019.11.28 py37_0 conda-forge
cffi 1.13.2 py37h8022711_0 conda-forge
chardet 3.0.4 py37_1003 conda-forge
click 7.0 py_0 conda-forge
cloudpickle 1.2.2 py_1 conda-forge
cmake 3.16.2 h28c56e5_0 conda-forge
cppzmq 4.4.1 hc9558a2_0 conda-forge
cryptography 2.8 py37h72c5cf5_1 conda-forge
cudatoolkit 10.0.130 0 nvidia
cudf 0.12.0b200124 py37_2002 rapidsai-nightly
cudnn 7.6.0 cuda10.0_0 nvidia
cupy 6.6.0 py37h809cb0f_1 conda-forge
curl 7.65.3 hf8cf82a_0 conda-forge
cyrus-sasl 2.1.27 he38ecfd_0 conda-forge
cython 0.29.14 py37he1b5a44_0 conda-forge
cytoolz 0.10.1 py37h516909a_0 conda-forge
dask 2.9.1 py_0 conda-forge
dask-core 2.9.1 py_0 conda-forge
dask-cuda 0.12.0a200128 py37_48 rapidsai-nightly
dask-cudf 0.12.0b200124 py37_2007 rapidsai-nightly
distributed 2.9.1 py_0 conda-forge
dlpack 0.2 he1b5a44_1 conda-forge
double-conversion 3.1.5 he1b5a44_2 conda-forge
et-xmlfile 1.0.1 pypi_0 pypi
expat 2.2.5 he1b5a44_1004 conda-forge
fastavro 0.22.9 py37h516909a_0 conda-forge
fastrlock 0.4 py37he1b5a44_1000 conda-forge
freetype 2.10.0 he983fc9_1 conda-forge
fsspec 0.6.2 py_0 conda-forge
future 0.18.2 py37_0 conda-forge
gflags 2.2.2 he1b5a44_1002 conda-forge
gitdb2 2.0.6 pypi_0 pypi
gitpython 3.0.5 pypi_0 pypi
glog 0.4.0 he1b5a44_1 conda-forge
gmock 1.10.0 1 conda-forge
grpc-cpp 1.23.0 h18db393_0 conda-forge
gtest 1.10.0 hc9558a2_1 conda-forge
heapdict 1.0.1 py_0 conda-forge
icu 64.2 he1b5a44_1 conda-forge
idna 2.8 py37_1000 conda-forge
jdcal 1.4.1 pypi_0 pypi
jinja2 2.10.3 py_0 conda-forge
jpeg 9c h14c3975_1001 conda-forge
jpype1 0.7 py37h9de70de_0 conda-forge
krb5 1.16.4 h173b8e3_0
libblas 3.8.0 14_openblas conda-forge
libcblas 3.8.0 14_openblas conda-forge
libcudf 0.12.0b200128 cuda10.0_2007 rapidsai-nightly
libcurl 7.65.3 hda55be3_0 conda-forge
libedit 3.1.20181209 hc058e9b_0
libevent 2.1.10 h72c5cf5_0 conda-forge
libffi 3.2.1 hd88cf55_4
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_2 conda-forge
liblapack 3.8.0 14_openblas conda-forge
libllvm8 8.0.1 hc9558a2_0 conda-forge
libntlm 1.4 h14c3975_1002 conda-forge
libnvstrings 0.12.0b200128 cuda10.0_2007 rapidsai-nightly
libopenblas 0.3.7 h5ec1e0e_6 conda-forge
libpng 1.6.37 hed695b0_0 conda-forge
libprotobuf 3.8.0 h8b12597_0 conda-forge
librmm 0.12.0a200128 cuda10.0_194 rapidsai-nightly
libsodium 1.0.17 h516909a_0 conda-forge
libssh2 1.8.2 h22169c7_2 conda-forge
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.1.0 hfc65ed5_0 conda-forge
libuv 1.34.0 h516909a_0 conda-forge
llvmlite 0.30.0 py37h8b12597_1 conda-forge
locket 0.2.0 py_2 conda-forge
lz4-c 1.8.3 he1b5a44_1001 conda-forge
markupsafe 1.1.1 py37h516909a_0 conda-forge
maven 3.6.0 0 conda-forge
msgpack-python 0.6.2 py37hc9558a2_0 conda-forge
nccl 2.4.6.1 cuda10.0_0 nvidia
ncurses 6.1 he6710b0_1
netifaces 0.10.9 py37h516909a_1000 conda-forge
numba 0.46.0 py37hb3f55d8_1 conda-forge
numpy 1.17.3 py37h95a1406_0 conda-forge
nvstrings 0.12.0b200128 py37_2007 rapidsai-nightly
olefile 0.46 py_0 conda-forge
openjdk 8.0.192 h14c3975_1003 conda-forge
openpyxl 3.0.3 pypi_0 pypi
openssl 1.1.1d h516909a_0 conda-forge
packaging 20.0 py_0 conda-forge
pandas 0.24.2 py37hb3f55d8_1 conda-forge
parquet-cpp 1.5.1 2 conda-forge
partd 1.1.0 py_0 conda-forge
pillow 5.3.0 py37h00a061d_1000 conda-forge
pip 19.3.1 py37_0
psutil 5.6.7 py37h516909a_0 conda-forge
py4j 0.10.7 py_1 conda-forge
pyarrow 0.15.0 py37h8b68381_1 conda-forge
pycparser 2.19 py37_1 conda-forge
pydrill 0.3.4 pypi_0 pypi
pyhive 0.6.1 py37_0
pymysql 0.9.3 pypi_0 pypi
pynvml 8.0.3 py_0 conda-forge
pyopenssl 19.1.0 py37_0 conda-forge
pyparsing 2.4.6 py_0 conda-forge
pysocks 1.7.1 py37_0 conda-forge
pyspark 2.4.3 py_0 conda-forge
python 3.7.5 h0371630_0
python-dateutil 2.8.1 py_0 conda-forge
pytz 2019.3 py_0 conda-forge
pyyaml 5.2 py37h516909a_0 conda-forge
rapidjson 1.1.0 he1b5a44_1002 conda-forge
re2 2020.01.01 he1b5a44_0 conda-forge
readline 7.0 h7b6447c_5
requests 2.22.0 py37_1 conda-forge
rhash 1.3.6 h14c3975_1001 conda-forge
rmm 0.12.0a200128 py37_194 rapidsai-nightly
sasl 0.2.1 py37he1b5a44_1001 conda-forge
setuptools 44.0.0 py37_0
six 1.13.0 py37_0 conda-forge
smmap2 2.0.5 pypi_0 pypi
snappy 1.1.7 he1b5a44_1003 conda-forge
sortedcontainers 2.1.0 py_0 conda-forge
sqlalchemy 1.3.10 py37h516909a_0 conda-forge
sqlite 3.30.1 h7b6447c_0
tblib 1.6.0 py_0 conda-forge
thrift 0.11.0 py37he1b5a44_1001 conda-forge
thrift-cpp 0.12.0 hf3afdfd_1004 conda-forge
thrift_sasl 0.3.0 py37h516909a_1001 conda-forge
tk 8.6.8 hbc83047_0
toolz 0.10.0 py_0 conda-forge
tornado 6.0.3 py37h516909a_0 conda-forge
uriparser 0.9.3 he1b5a44_1 conda-forge
urllib3 1.25.7 py37_0 conda-forge
wheel 0.33.6 py37_0
xz 5.2.4 h14c3975_4
yaml 0.2.2 h516909a_1 conda-forge
zeromq 4.3.2 he1b5a44_2 conda-forge
zict 1.0.0 py_0 conda-forge
zlib 1.2.11 h7b6447c_3
zstd 1.4.3 h3b9ef0a_0 conda-forge
Additional context
The following file reproduces the issue. Not every file seems to trigger it:
https://drive.google.com/file/d/1tUJvxDU_eGJ2zUkaFl0OIaGGgjxYF9xf/view?usp=sharing
The dtypes of that file are:
o_orderkey int64
o_custkey int32
o_orderstatus object
o_totalprice float64
o_orderdate datetime64[ms]
o_orderpriority object
o_clerk object
o_shippriority object
o_comment object
dtype: object
@OlivierNV does anything in those dtypes stick out to you? The Cython uses pretty standard utilities here so I don't think we're leaking at that layer.
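One way to narrow down which dtype is responsible is to read the file one column at a time and compare how much memory stays in use afterwards. This is only a sketch under assumptions: `growth_per_column` is a hypothetical helper, and the reader and the memory probe are passed in as callables so the logic does not depend on a GPU being present.

```python
def growth_per_column(columns, read_column, read_used_bytes, repeats=5):
    """Return {column: bytes still in use after reading it `repeats` times}."""
    growth = {}
    for name in columns:
        before = read_used_bytes()
        for _ in range(repeats):
            read_column(name)  # the result is discarded immediately
        growth[name] = read_used_bytes() - before
    return growth
```

For the file above, `read_column` could be `lambda name: cudf.read_parquet(filepath, columns=[name])`, with `read_used_bytes` supplied by whatever memory probe is at hand. Columns whose growth scales with `repeats` are the ones worth a closer look.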
@kkraus14 I can reproduce a memory leak with this example:
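import numpy as np
import cudf

n_rows = 100_000_000
n_cols = 4
for i in range(50):
    # Rebinding df drops the previous DataFrame on every outer iteration.
    df = cudf.DataFrame()
    for i in range(n_cols):
        df[str(i)] = np.ones(n_rows)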
Memory usage increases slightly with every iteration, and after many iterations (50-100) the difference is noticeable.
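For scale, the payload of one such DataFrame can be worked out directly. A small sketch, assuming the columns are float64 (the default dtype of `np.ones`) and ignoring null masks and the index; `frame_bytes` is a hypothetical helper name.

```python
FLOAT64_BYTES = 8  # np.ones() defaults to float64


def frame_bytes(n_rows, n_cols, itemsize=FLOAT64_BYTES):
    """Payload size of a dense numeric frame, ignoring masks and index."""
    return n_rows * n_cols * itemsize


size = frame_bytes(100_000_000, 4)
print(f"{size} bytes = {size / 2**30:.2f} GiB")
# prints "3200000000 bytes = 2.98 GiB"
```

This gives a baseline against which any leftover memory can be compared.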
@OlivierNV it looks like the mask pointer can be non-null when you're giving us a pointer to an NVStrings object when dtype is GDF_STRING, is this expected / the issue? We are currently not freeing it.
@ayushdg you have nested loops with the same variable, is that intended?
Yes. I can probably do away with the inner loop and just work with one column. But I intended to overwrite the same variable to see the memory usage at the end of the whole process.
In my case, after deleting df (outside the loop), about 3GB of memory is still unaccounted for.
Yeah, strings can have a separate null mask (I suppose it's redundant with the nullptr in the string pairs). Not sure if we expect it to be freed -> I'll have to take a closer look at the code.
[Edit] Oh, I see, it's on the cython side, the legacy libcudf reader just returns a normal gdf_column with both data and null mask.
@ayushdg thanks for the repro, we found a separate second memory leak that affects many float columns. Both should be fixed by the PR.
Yeah, the thing here is that data is a pointer to an NVStrings instance, which encapsulates nulls, so we weren't expecting to have to free the mask pointer. No worries though, as this is a quick fix and it will go away as part of the refactoring.
Thanks for the eyes @OlivierNV!
Ah, yeah, the reader initially populates data with string {ptr, len} pairs along with a null mask. Then, before returning the gdf_column, data is swapped with the corresponding nvstring instance, but the null mask is left untouched, so we end up with nvstring + null_mask.
Can't wait till we switch to the libcudf++ readers :)
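The ownership mix-up described above can be illustrated with a toy model. Everything here is hypothetical: `ToyAllocator`, `ToyColumn`, and the release functions are stand-ins written for illustration, not the actual libcudf or Cython code, and freeing the intermediate pairs after the swap is an assumption of the model. `release_fixed` shows one plausible remedy and is not necessarily what the fix PR does.

```python
from dataclasses import dataclass
from typing import Optional


class ToyAllocator:
    """Hands out integer handles and tracks which ones are still live."""

    def __init__(self):
        self._next = 1
        self.live = set()

    def alloc(self):
        handle = self._next
        self._next += 1
        self.live.add(handle)
        return handle

    def free(self, handle):
        self.live.remove(handle)


@dataclass
class ToyColumn:
    dtype: str
    data: int             # handle to the values, or to the strings object
    valid: Optional[int]  # handle to the null mask, if any


def read_string_column(allocator):
    """Mimic the reader: build pairs + mask, then swap data for a strings object."""
    pairs = allocator.alloc()
    mask = allocator.alloc()
    strings = allocator.alloc()  # strings object built from the pairs
    allocator.free(pairs)        # assumed: pairs are dropped after the swap
    return ToyColumn(dtype="string", data=strings, valid=mask)


def release_buggy(allocator, column):
    """Assumes a strings object carries its own nulls, so the mask is skipped."""
    allocator.free(column.data)
    if column.dtype != "string" and column.valid is not None:
        allocator.free(column.valid)


def release_fixed(allocator, column):
    """Frees the mask whenever one is present, regardless of dtype."""
    allocator.free(column.data)
    if column.valid is not None:
        allocator.free(column.valid)


def leaked_after(release, reads=3):
    """Read and release a string column `reads` times; count live allocations."""
    allocator = ToyAllocator()
    for _ in range(reads):
        release(allocator, read_string_column(allocator))
    return len(allocator.live)


print("buggy:", leaked_after(release_buggy))  # prints "buggy: 3"
print("fixed:", leaked_after(release_fixed))  # prints "fixed: 0"
```

In the model, one mask is leaked per read of a string column, which matches the pattern of memory growing with every iteration.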
Fixed by #4017