What is your question?
I was trying to load data larger than GPU memory, and I am getting the following error.
Function: load_data
args: (125000000)
kwargs: {}
Exception: CudaAPIError(1, 'Call to cuMemcpyHtoD results in CUDA_ERROR_INVALID_VALUE')
ERROR: RMM call in line 68 of file /conda/envs/gdf/conda-bld/libcudf_1558047478243/work/cpp/src/hash/managed_allocator.cuh failed with result RMM_ERROR_OUT_OF_MEMORY (4) Attempted to allocate: 4000000000 bytes.
distributed.worker - WARNING - Compute Failed
Function: groupby
args: (
Exception: MemoryError('std::bad_alloc')
ERROR: RMM call in line 68 of file /conda/envs/gdf/conda-bld/libcudf_1558047478243/work/cpp/src/hash/managed_allocator.cuh failed with result RMM_ERROR_OUT_OF_MEMORY (4) Attempted to allocate: 4000000000 bytes.
distributed.worker - WARNING - Compute Failed
Function: groupby
Is it possible to avoid this error with any tool or framework?
i.e., intelligently load the data in chunks, spill over to CPU RAM/disk, and reduce it on the GPU or CPU, as Spark does?
I tried this with dask-cudf, and it doesn't seem to be equipped to handle this scenario yet.
Any suggestions?
I have the same problem.
Reading ~5 GB of CSV data using dask-cudf. I have 2x 11 GB of GPU memory.
import os
import gc
import timeit

import cudf as cu
import dask_cudf as dkcu

# ~5 GB CSV: three string key columns, five int32 columns, one float64 column
x = dkcu.read_csv(
    "data/G1_1e8_1e2_0_0.csv",
    header=0,
    dtype=["str", "str", "str", "int32", "int32", "int32", "int32", "int32", "float64"],
)
print(len(x.index), flush=True)  # forces the lazy read to actually execute
# terminate called after throwing an instance of 'thrust::system::system_error'
#   what():  parallel_for failed: out of memory
# Aborted
Could you please point me to the documentation that was requested in this issue?
https://github.com/rapidsai/cudf/issues/2277 seems to be a related issue.
You could try using RMM managed memory. You can enable that by setting the RMM allocator through a Dask client.run call:
from dask.distributed import Client
import cudf

client = Client(cluster)  # sample client connecting to an existing `cluster` object
client.run(cudf.set_allocator, "managed")  # uses managed memory instead of "default"
By doing this, you'll be using CUDA unified memory, which takes care of moving data between GPU and CPU automatically when the GPU runs out of memory.
EDIT: If you're not using Dask, you can simply use cudf.set_allocator("managed").
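For a self-contained illustration, here is a minimal single-GPU sketch of that non-Dask path; the file path and the use of read_csv are placeholders for whatever workload hits the OOM, not something specific to this thread:
import cudf

# Switch RMM to CUDA managed (unified) memory. This must run before any
# device allocations so that all subsequent cuDF buffers can page to host RAM.
cudf.set_allocator("managed")

# "large.csv" is an assumed placeholder; a file larger than GPU memory
# should now load, at the cost of page-migration overhead.
df = cudf.read_csv("large.csv")
print(len(df))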
Thanks @pentschev, going to close this issue. @jangorecki, if this doesn't work please re-open.
@datametrician I will try this soon. Regarding the GH issue status, I think it is more appropriate to keep this issue open until the requested info lands somewhere in the documentation. GitHub comments are fine, but the end goal should be a complete working example in the documentation.
Agreed, but there's another issue, as you pointed out above, about improving documentation that @taureandyernv is working on right now. We're trying to reduce the number of issues pointing to the same problem, but feel free to reopen. Our goal is to improve the community experience.
@datametrician I am unable to re-open closed issues here. #2277 is related but does not mention anything about managed memory usage, so I will put my comment there so it doesn't get missed.
There are newly published docs for dask-cuda here: https://dask-cuda.readthedocs.io/
In the docs there is a section on device memory spilling: https://dask-cuda.readthedocs.io/en/latest/specializations.html#spilling-from-device
This is handled at the Dask worker level, as opposed to directly within dask-cudf.
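For completeness, here is a minimal sketch of enabling that spilling with dask-cuda's LocalCUDACluster; the device_memory_limit value is an assumption chosen for an 11 GB card, not a recommendation from the docs:
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask_cudf

# Workers start spilling device memory to host once they cross
# device_memory_limit; "10GB" is an illustrative threshold here.
cluster = LocalCUDACluster(device_memory_limit="10GB")
client = Client(cluster)

df = dask_cudf.read_csv("data/G1_1e8_1e2_0_0.csv")  # file from the report above
print(len(df.index))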