ENVIRONMENT
conda create -n rapids-0.16 -c rapidsai -c nvidia -c conda-forge -c defaults rapids=0.16 python=3.7 cudatoolkit=10.2
CODE
train_data = xgboost.DMatrix(data=X_train, label=y_train)
test_data = xgboost.DMatrix(data=X_test, label=y_test)
These two lines are a couple of cells apart; they are not executed together.
ERROR
---------------------------------------------------------------------------
XGBoostError Traceback (most recent call last)
<ipython-input-25-7bd66d4fabf4> in <module>
1 #train = xgboost.DMatrix(data=X, label=y) #ORIGINAL
----> 2 test_data = xgboost.DMatrix(data=X_test, label=y_test)
~/anaconda3/envs/rapids/lib/python3.7/site-packages/xgboost/core.py in __init__(self, data, label, weight, base_margin, missing, silent, feature_names, feature_types, nthread, enable_categorical)
448 feature_names=feature_names,
449 feature_types=feature_types,
--> 450 enable_categorical=enable_categorical)
451 assert handle is not None
452 self.handle = handle
~/anaconda3/envs/rapids/lib/python3.7/site-packages/xgboost/data.py in dispatch_data_backend(data, missing, threads, feature_names, feature_types, enable_categorical)
543 if _is_cudf_df(data):
544 return _from_cudf_df(data, missing, threads, feature_names,
--> 545 feature_types)
546 if _is_cudf_ser(data):
547 return _from_cudf_df(data, missing, threads, feature_names,
~/anaconda3/envs/rapids/lib/python3.7/site-packages/xgboost/data.py in _from_cudf_df(data, missing, nthread, feature_names, feature_types)
400 ctypes.c_float(missing),
401 ctypes.c_int(nthread),
--> 402 ctypes.byref(handle)))
403 return handle, feature_names, feature_types
404
~/anaconda3/envs/rapids/lib/python3.7/site-packages/xgboost/core.py in _check_call(ret)
184 """
185 if ret != 0:
--> 186 raise XGBoostError(py_str(_LIB.XGBGetLastError()))
187
188
XGBoostError: [12:32:18] /opt/conda/envs/rapids/conda-bld/xgboost_1603491651651/work/src/c_api/../data/../common/device_helpers.cuh:400: Memory allocation error on worker 0: std::bad_alloc: CUDA error at: ../include/rmm/mr/device/cuda_memory_resource.hpp:68: cudaErrorMemoryAllocation out of memory
- Free memory: 1539047424
- Requested memory: 3091258960
Stack trace:
[bt] (0) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(+0x13674f) [0x7fad04f7274f]
[bt] (1) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(dh::detail::ThrowOOMError(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long)+0x3ad) [0x7fad05190b0d]
[bt] (2) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(dh::detail::XGBDefaultDeviceAllocatorImpl<xgboost::Entry>::allocate(unsigned long)+0x1df) [0x7fad051ac11f]
[bt] (3) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(thrust::detail::vector_base<xgboost::Entry, dh::detail::XGBDefaultDeviceAllocatorImpl<xgboost::Entry> >::fill_insert(thrust::detail::normal_iterator<thrust::device_ptr<xgboost::Entry> >, unsigned long, xgboost::Entry const&)+0x26d) [0x7fad051d0d0d]
[bt] (4) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(xgboost::HostDeviceVector<xgboost::Entry>::Resize(unsigned long, xgboost::Entry)+0xc9) [0x7fad051d1cc9]
[bt] (5) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(xgboost::data::SimpleDMatrix::SimpleDMatrix<xgboost::data::CudfAdapter>(xgboost::data::CudfAdapter*, float, int)+0x3df) [0x7fad052259cf]
[bt] (6) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(xgboost::DMatrix* xgboost::DMatrix::Create<xgboost::data::CudfAdapter>(xgboost::data::CudfAdapter*, float, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long)+0x133) [0x7fad051f3aa3]
[bt] (7) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(XGDMatrixCreateFromArrayInterfaceColumns+0xc6) [0x7fad0518c286]
[bt] (8) /home/ubuntu/anaconda3/envs/rapids/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7fae60078630]
CODE 2
If I do a clean restart of the notebook and execute them together in one cell:
train_data = xgboost.DMatrix(data=X_train, label=y_train)
test_data = xgboost.DMatrix(data=X_test, label=y_test)
ERROR 2
---------------------------------------------------------------------------
XGBoostError Traceback (most recent call last)
<ipython-input-20-f0c3710678a8> in <module>
1 #train = xgboost.DMatrix(data=X, label=y) #ORIGINAL
2 train_data = xgboost.DMatrix(data=X_train, label=y_train)
----> 3 test_data = xgboost.DMatrix(data=X_test, label=y_test)
~/anaconda3/envs/rapids/lib/python3.7/site-packages/xgboost/core.py in __init__(self, data, label, weight, base_margin, missing, silent, feature_names, feature_types, nthread, enable_categorical)
448 feature_names=feature_names,
449 feature_types=feature_types,
--> 450 enable_categorical=enable_categorical)
451 assert handle is not None
452 self.handle = handle
~/anaconda3/envs/rapids/lib/python3.7/site-packages/xgboost/data.py in dispatch_data_backend(data, missing, threads, feature_names, feature_types, enable_categorical)
543 if _is_cudf_df(data):
544 return _from_cudf_df(data, missing, threads, feature_names,
--> 545 feature_types)
546 if _is_cudf_ser(data):
547 return _from_cudf_df(data, missing, threads, feature_names,
~/anaconda3/envs/rapids/lib/python3.7/site-packages/xgboost/data.py in _from_cudf_df(data, missing, nthread, feature_names, feature_types)
400 ctypes.c_float(missing),
401 ctypes.c_int(nthread),
--> 402 ctypes.byref(handle)))
403 return handle, feature_names, feature_types
404
~/anaconda3/envs/rapids/lib/python3.7/site-packages/xgboost/core.py in _check_call(ret)
184 """
185 if ret != 0:
--> 186 raise XGBoostError(py_str(_LIB.XGBGetLastError()))
187
188
XGBoostError: [15:20:36] /opt/conda/envs/rapids/conda-bld/xgboost_1603491651651/work/src/c_api/../data/../common/device_helpers.cuh:400: Memory allocation error on worker 0: std::bad_alloc: CUDA error at: ../include/rmm/mr/device/cuda_memory_resource.hpp:68: cudaErrorMemoryAllocation out of memory
- Free memory: 3015442432
- Requested memory: 3091258960
Stack trace:
[bt] (0) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(+0x13674f) [0x7f7eea73674f]
[bt] (1) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(dh::detail::ThrowOOMError(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long)+0x3ad) [0x7f7eea954b0d]
[bt] (2) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(dh::detail::XGBDefaultDeviceAllocatorImpl<xgboost::Entry>::allocate(unsigned long)+0x1df) [0x7f7eea97011f]
[bt] (3) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(thrust::detail::vector_base<xgboost::Entry, dh::detail::XGBDefaultDeviceAllocatorImpl<xgboost::Entry> >::fill_insert(thrust::detail::normal_iterator<thrust::device_ptr<xgboost::Entry> >, unsigned long, xgboost::Entry const&)+0x26d) [0x7f7eea994d0d]
[bt] (4) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(xgboost::HostDeviceVector<xgboost::Entry>::Resize(unsigned long, xgboost::Entry)+0xc9) [0x7f7eea995cc9]
[bt] (5) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(xgboost::data::SimpleDMatrix::SimpleDMatrix<xgboost::data::CudfAdapter>(xgboost::data::CudfAdapter*, float, int)+0x3df) [0x7f7eea9e99cf]
[bt] (6) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(xgboost::DMatrix* xgboost::DMatrix::Create<xgboost::data::CudfAdapter>(xgboost::data::CudfAdapter*, float, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long)+0x133) [0x7f7eea9b7aa3]
[bt] (7) /home/ubuntu/anaconda3/envs/rapids/lib/libxgboost.so(XGDMatrixCreateFromArrayInterfaceColumns+0xc6) [0x7f7eea950286]
[bt] (8) /home/ubuntu/anaconda3/envs/rapids/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f8044f8d630]
Ultimately this is just an out of memory error: cudaErrorMemoryAllocation out of memory
I would suggest trying a larger GPU with 32GB of GPU memory.
@kkraus14
Ultimately this is just an out of memory error: cudaErrorMemoryAllocation out of memory
I would suggest trying a larger GPU with 32GB of GPU memory.
3 091 258 960 bytes requested -> ~3 GB
3 015 442 432 bytes free -> ~3 GB
And this GPU has 16 GB of VRAM.
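As a quick sanity check, the byte counts reported in the two tracebacks convert to GiB like this. The point is that what matters is the free memory at the moment of the allocation, not the card's total 16 GB:

```python
# Byte counts copied from the two XGBoostError messages above (1 GiB = 2**30 bytes).
requested = 3_091_258_960   # "Requested memory" in both errors
free_err1 = 1_539_047_424   # "Free memory" in ERROR 1
free_err2 = 3_015_442_432   # "Free memory" in ERROR 2

print(round(requested / 2**30, 2))  # ~2.88 GiB requested
print(round(free_err1 / 2**30, 2))  # ~1.43 GiB free -> allocation fails
print(round(free_err2 / 2**30, 2))  # ~2.81 GiB free -> still short of the request, fails
```

So even in ERROR 2 the allocation is short by roughly 75 MB, despite most of the 16 GB being occupied by other objects rather than the DMatrix being built.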
For ERROR 2: It looks like you have at least X_train, X_test, and train_data in GPU memory when you try to create test_data which causes the OOM. Add in needing some temporary space for calculations and you can very quickly hit the 16GB limit.
@kkraus14
Q1.) How can I delete things from GPU memory from jupyter lab to have enough space for the next cell?
Q2.) Is it OK to keep the data in a Dask dataframe and only use the 16 GB of VRAM at training and testing time?
How can I delete things from GPU memory from jupyter lab to have enough space for the next cell?
Generally, you just want to make sure you don't have Python variables referring to GPU-backed objects lying around. If there are, Python can't garbage collect them and we can't free the GPU memory. Additionally, in Jupyter, instead of just doing a, do print(a). Doing just a causes Jupyter to hold a reference to the variable a, which prevents garbage collection.
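A minimal CPU-side sketch of the reference-counting point. The `Buf` class here is a stand-in for a GPU-backed object such as a cudf.DataFrame; a weak reference lets us observe when the object actually becomes collectable:

```python
import gc
import weakref

class Buf:
    """Stand-in for a GPU-backed object such as a cudf.DataFrame."""
    pass

buf = Buf()
ref = weakref.ref(buf)   # observe the object without keeping it alive

del buf                  # drop the last strong reference
gc.collect()             # CPython usually frees on del already; collect() is belt-and-braces

print(ref() is None)     # True: nothing references the object, so its memory can be reclaimed
```

Note that displaying a bare `buf` in a Jupyter cell would have added a hidden reference via the output cache, keeping the object (and its GPU memory) alive even after `del buf`.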
Is it OK to keep the data in a Dask dataframe and only use the 16 GB of VRAM at training and testing time?
That will do all of the dataframe computation on the CPU instead of GPU.
@kkraus14
Generally, you just want to make sure you don't have Python variables referring to GPU-backed objects lying around. If there are, Python can't garbage collect them and we can't free the GPU memory. Additionally, in Jupyter, instead of just doing a, do print(a). Doing just a causes Jupyter to hold a reference to the variable a, which prevents garbage collection.
I just want to make sure I understand it correctly.
EXAMPLE
The numbered cells below are Jupyter notebook cells.
1.cell
data = cudf.read_csv('X_train.csv.txt', delimiter=',', skiprows=1, names=colNames, dtype=['float64', 'float64', 'float64'])
2.cell
X = data[1:500,:]
y = data[0,:]
3.cell
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size = 0.25, random_state = 0, shuffle=True)
4.cell
At this point, how can I clean out 'data' from the GPU memory?
5.cell
train_data = xgboost.DMatrix(data=X_train, label=y_train)
6.cell
MACHINE LEARNING
7.cell
At this point, how can I clean out 'train_data' from the GPU memory?
8.cell
test_data = xgboost.DMatrix(data=X_test, label=y_test)
....
4.
You could do something like:
data = X = y = None
or
del data
del X
del y
A similar approach would be taken in cell 7 to clear the other variables.
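Mapping that advice onto the notebook layout above, cells 4 and 7 might look like the sketch below. The `object()` placeholders stand in for the real GPU-backed objects (cudf DataFrames and the DMatrix), which are not available outside the notebook:

```python
import gc

# Stand-ins for the GPU-backed objects from the example cells (hypothetical values)
data, X, y = object(), object(), object()
train_data = object()

# Cell 4: the train/test split exists now, so the full dataset can be released
del data, X, y
gc.collect()

# Cell 7: training is done, so drop the training DMatrix before building test_data
del train_data
gc.collect()

# The names no longer exist, so nothing keeps those objects (or their memory) alive
print("data" in globals(), "train_data" in globals())  # False False
```

With real cudf/XGBoost objects, dropping the last reference and collecting is what lets RMM return the GPU allocations, freeing room for the next cell's DMatrix.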