What is your question?
I was trying to load data larger than GPU memory, and I am getting the following error.
Function: load_data
args: (125000000)
kwargs: {}
Exception: CudaAPIError(1, 'Call to cuMemcpyHtoD results in CUDA_ERROR_INVALID_VALUE')
ERROR: RMM call in line 68 of file /conda/envs/gdf/conda-bld/libcudf_1558047478243/work/cpp/src/hash/managed_allocator.cuh failed with result RMM_ERROR_OUT_OF_MEMORY (4) Attempted to allocate: 4000000000 bytes.
distributed.worker - WARNING - Compute Failed
Function: groupby
args: (
Exception: MemoryError('std::bad_alloc')
ERROR: RMM call in line 68 of file /conda/envs/gdf/conda-bld/libcudf_1558047478243/work/cpp/src/hash/managed_allocator.cuh failed with result RMM_ERROR_OUT_OF_MEMORY (4) Attempted to allocate: 4000000000 bytes.
distributed.worker - WARNING - Compute Failed
Function: groupby
Is it possible to avoid this error with any tool or framework?
i.e., intelligently load the data in chunks, spill over to CPU RAM/disk, and reduce it on the GPU or CPU, as Spark does?
I tried this with dask-cudf, and it doesn't seem to be equipped to handle this scenario yet.
Any suggestions?
I have the same problem.
Reading ~5 GB of CSV data using dask-cudf. I have 2x 11 GB of GPU memory.
import os
import gc
import timeit

import cudf as cu
import dask_cudf as dkcu

# ~5 GB CSV: three string key columns, five int32 columns, one float64 column
x = dkcu.read_csv(
    "data/G1_1e8_1e2_0_0.csv",
    header=0,
    dtype=["str", "str", "str", "int32", "int32", "int32", "int32", "int32", "float64"],
)
print(len(x.index), flush=True)  # forces the lazy read to actually execute
# terminate called after throwing an instance of 'thrust::system::system_error'
#   what():  parallel_for failed: out of memory
# Aborted
Could you please point me to the documentation that was requested in this issue?
https://github.com/rapidsai/cudf/issues/2277 seems to be a related issue.
You could try using RMM managed memory. You can enable that by setting the RMM allocator through a Dask client.run call:
from dask.distributed import Client
import cudf

client = Client(cluster)  # sample client connecting to an existing `cluster` object
client.run(cudf.set_allocator, "managed")  # uses managed memory instead of "default"
By doing this, you'll be using CUDA unified memory, which takes care of moving data between GPU and CPU automatically when the GPU runs out of memory.
EDIT: If you're not using Dask, you can simply use cudf.set_allocator("managed").
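For a self-contained illustration, here is a minimal single-GPU sketch of that non-Dask path; the file path and the use of read_csv are placeholders for whatever workload hits the OOM, not something specific to this thread:
import cudf

# Switch RMM to CUDA managed (unified) memory. This must run before any
# device allocations so that all subsequent cuDF buffers can page to host RAM.
cudf.set_allocator("managed")

# "large.csv" is an assumed placeholder; a file larger than GPU memory
# should now load, at the cost of page-migration overhead.
df = cudf.read_csv("large.csv")
print(len(df))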
Thanks @pentschev, going to close this issue. @jangorecki, if this doesn't work please re-open.
@datametrician I will try this soon. Regarding the GH issue status, I think it is more appropriate to keep this issue open until the requested info lands somewhere in the documentation. GitHub comments are fine, but the end goal should be a complete working example in the documentation.
Agreed, but there's another issue, as you pointed out above, about improving documentation that @taureandyernv is working on right now. We're trying to reduce the number of issues pointing to the same problem, but feel free to reopen. Our goal is to improve the community experience.
@datametrician I am unable to re-open closed issues here. #2277 is related but does not mention anything about managed memory usage, so I will put my comment there so it doesn't get missed.
There are newly published docs for dask-cuda here: https://dask-cuda.readthedocs.io/
In the docs there is a section on device memory spilling: https://dask-cuda.readthedocs.io/en/latest/specializations.html#spilling-from-device
This is handled at the Dask worker level, as opposed to directly within dask-cudf.
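For completeness, here is a minimal sketch of enabling that spilling with dask-cuda's LocalCUDACluster; the device_memory_limit value is an assumption chosen for an 11 GB card, not a recommendation from the docs:
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask_cudf

# Workers start spilling device memory to host once they cross
# device_memory_limit; "10GB" is an illustrative threshold here.
cluster = LocalCUDACluster(device_memory_limit="10GB")
client = Client(cluster)

df = dask_cudf.read_csv("data/G1_1e8_1e2_0_0.csv")  # file from the report above
print(len(df.index))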