OS: Ubuntu 16.04.5
CUDA: release 9.2, V9.2.148
xgboost: 0.80
I'm using skopt.gp_minimize to optimize xgboost's parameters on multiple GPUs. After some iterations I get an OOM error like this:
terminate called after throwing an instance of 'thrust::system::detail::bad_alloc'
terminate called after throwing an instance of 'thrust::system::detail::bad_alloc'
what(): what(): std::bad_alloc: out of memory
std::bad_alloc: out of memory
ion.py", line 404, in _send_bytes
self._send(header + buf)
File "/home/wuyh/anaconda3/lib/python3.7/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/wuyh/anaconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/wuyh/anaconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/wuyh/anaconda3/lib/python3.7/multiprocessing/pool.py", line 132, in worker
put((job, i, (False, wrapped)))
File "/home/wuyh/anaconda3/lib/python3.7/site-packages/sklearn/externals/joblib/pool.py", line 386, in put
return send(obj)
File "/home/wuyh/anaconda3/lib/python3.7/site-packages/sklearn/externals/joblib/pool.py", line 372, in send
self._writer.send_bytes(buffer.getvalue())
File "/home/wuyh/anaconda3/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/home/wuyh/anaconda3/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
self._send(header + buf)
File "/home/wuyh/anaconda3/lib/python3.7/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Can you try using version 0.81? We have a bug fix related to memory usage: #3635.
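(If it helps, you can confirm which version the Python process actually loads with:)

import xgboost
print(xgboost.__version__)  # expect '0.81' after upgrading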
@hcho3 thanks for your advice. However, after upgrading xgboost from 0.80 to 0.81, a different error is thrown:
terminate called after throwing an instance of 'dmlc::Error'
terminate called after throwing an instance of 'dmlc::Error'
what(): [10:12:10] /workspace/include/xgboost/./../../src/common/common.h:41: /workspace/src/tree/updater_gpu_hist.cu: 279: invalid argument
Stack trace returned 7 entries:
[bt] (0) /home/wuyh/anaconda3/xgboost/libxgboost.so(dmlc::StackTrace()+0x3d) [0x7fa131bc0b0d]
[bt] (1) /home/wuyh/anaconda3/xgboost/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x18) [0x7fa131bc0f08]
[bt] (2) /home/wuyh/anaconda3/xgboost/libxgboost.so(+0x34faa0) [0x7fa131e10aa0]
[bt] (3) /home/wuyh/anaconda3/xgboost/libxgboost.so(+0x3517bb) [0x7fa131e127bb]
[bt] (4) /home/wuyh/anaconda3/bin/../lib/libgomp.so.1(+0x11bef) [0x7fa138207bef]
[bt] (5) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fa1cec0c6ba]
[bt] (6) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fa1ce94241d]
what(): [10:12:10] /workspace/include/xgboost/./../../src/common/common.h:41: /workspace/src/tree/updater_gpu_hist.cu: 279: invalid argument
Stack trace returned 10 entries:
[bt] (0) /home/wuyh/anaconda3/xgboost/libxgboost.so(dmlc::StackTrace()+0x3d) [0x7fa131bc0b0d]
[bt] (1) /home/wuyh/anaconda3/xgboost/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x18) [0x7fa131bc0f08]
[bt] (2) /home/wuyh/anaconda3/xgboost/libxgboost.so(+0x34faa0) [0x7fa131e10aa0]
[bt] (3) /home/wuyh/anaconda3/xgboost/libxgboost.so(+0x3517bb) [0x7fa131e127bb]
[bt] (4) /home/wuyh/anaconda3/xgboost/libxgboost.so(void dh::ExecuteShards
[bt] (5) /home/wuyh/anaconda3/xgboost/libxgboost.so(xgboost::tree::GPUHistMaker::BuildHistLeftRight(int, int, int)+0x249) [0x7fa131e28599]
[bt] (6) /home/wuyh/anaconda3/xgboost/libxgboost.so(xgboost::tree::GPUHistMaker::UpdateTree(xgboost::HostDeviceVector
[bt] (7) /home/wuyh/anaconda3/xgboost/libxgboost.so(xgboost::tree::GPUHistMaker::Update(xgboost::HostDeviceVector
[bt] (8) /home/wuyh/anaconda3/xgboost/libxgboost.so(xgboost::gbm::GBTree::BoostNewTrees(xgboost::HostDeviceVector
[bt] (9) /home/wuyh/anaconda3/xgboost/libxgboost.so(xgboost::gbm::GBTree::DoBoost(xgboost::DMatrix, xgboost::HostDeviceVector
Can you post the full script?
OK. The server has 4 GPUs (Tesla V100), and the training dataset's shape is (888096, 60), which is 410.088649 MB in size.
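For context, one way to check that figure with pandas; the zero-filled frame below is just a stand-in for the real data:

import numpy as np
import pandas as pd

# A float64 frame of the reported shape holds 888096 * 60 * 8 bytes,
# roughly 406 MiB -- in the same ballpark as the ~410 MB quoted above.
train = pd.DataFrame(np.zeros((888096, 60)))
print('%.2f MiB' % (train.memory_usage(deep=True).sum() / 1024 ** 2))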
The key xgboost-related script snippet:
import gc
from collections import deque

import numpy as np
from sklearn.model_selection import cross_val_score
from skopt import gp_minimize
from skopt.space import Integer, Real
from skopt.utils import use_named_args
from xgboost import XGBRegressor

# get_dataframe, preporcessing, PurgedTimeSeriesSplit, MyCallback and
# base_url are defined elsewhere in the full script.

if __name__ == "__main__":
    train = get_dataframe(base_url)  # get training datasets
    preporcessing(train)  # do feature engineering
    pp_xgb = {'predictor': 'cpu_predictor', 'tree_method': 'gpu_hist',
              'n_gpus': -1, 'gpu_id': 0, 'n_jobs': 2, 'max_bin': 63}
    reg = XGBRegressor(**pp_xgb)
    space = [Integer(5, 25, name='max_depth'),
             Real(.005, .1, 'log-uniform', name='learning_rate'),
             Integer(800, 1000, name='n_estimators'),
             Real(.05, 1, 'log-uniform', name='gamma'),
             Real(1e-9, 1., 'log-uniform', name='reg_alpha'),
             Real(1e-9, 1000, 'log-uniform', name='reg_lambda'),
             Real(.6, 1., 'log-uniform', name='colsample_bytree'),
             Real(.6, 1., 'log-uniform', name='subsample')]
    feature_selected = train.columns[:-6]
    X = train[feature_selected]
    q = deque(maxlen=15)
    for period in [7, 14, 21]:
        q.extend([1000000] * 50)  # init deque for each period
        RET_AF = 'Y{:d}'.format(period)
        y = train[RET_AF]
        cv = PurgedTimeSeriesSplit(n_splits=2, period=period)  # self-defined CV

        @use_named_args(space)
        def objective(**params):
            reg.set_params(**params)
            return -np.mean(cross_val_score(reg, X, y, cv=cv, n_jobs=-1,
                                            verbose=1, pre_dispatch=1,
                                            scoring="neg_mean_squared_error"))

        # optimizing
        mycallback = MyCallback(50)
        res_gp = gp_minimize(objective, space, n_calls=50, callback=mycallback)
        # logging infos
        gc.enable()
        del res_gp
        gc.collect()
You are using a max_depth range of 5-25. Every one-step increase in max_depth doubles the size of the tree, essentially doubling the memory requirements of the algorithm.
We've seen this problem before, and I'd recommend we deprecate that parameter in favor of a power-of-two maximum leaf count that makes the growth in computation cost/memory/model size explicit, or at least make it very clear that changing this setting from something like 10 to 15 is _not_ a 50% increase in resources but rather a 32x increase (1024 leaves vs. 32768).
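To make the arithmetic concrete (plain Python, nothing from the thread): a complete binary tree of depth d has 2**d leaves, so resources scale exponentially with depth, not linearly.

# Leaf count of a complete binary tree is 2**depth.
for depth in (10, 15, 25):
    print(depth, 2 ** depth)
# 10 ->     1024
# 15 ->    32768  (32x the leaves of depth 10, not +50%)
# 25 -> 33554432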
@thvasilo thanks, maybe max_depth is the key to this problem rather than xgboost itself; I'll try it out later...
@wuyunhua I'd say the general recommendation is to not go above 10 for this parameter. It's probably preferable to add more trees (iterations) if you find that your model is underfitting, though given the sensitivity of the algorithm that's unlikely (it's much easier to over-fit than to under-fit).
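For illustration, a minimal sketch of how the search space above might be tightened along these lines; the bounds are hypothetical, not taken from this thread:

from skopt.space import Integer, Real

# Hypothetical bounds: cap max_depth at 10 and widen n_estimators instead.
space = [Integer(3, 10, name='max_depth'),         # was Integer(5, 25, ...)
         Integer(800, 2000, name='n_estimators'),  # add trees rather than depth
         Real(.005, .1, 'log-uniform', name='learning_rate')]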