XGBoost: GPU models do not release memory after training

Created on 17 Jan 2018 · 14 comments · Source: dmlc/xgboost

XGBoost doesn't release GPU memory after training/predicting the model on large data.
Every further rerun of .fit allocates more GPU memory, until the kernel eventually crashes because the GPU runs out of memory.

Environment info

Operating System: Ubuntu 16.04 on PowerPC

Compiler:

Package used (python/R/jvm/C++): python

xgboost version used:

If installing from source, please provide

  1. The commit hash (git rev-parse HEAD) 84ab74f3a56739829b03161fb9c249f3a760a518

If you are using python package, please provide

  1. The python version and distribution: Python 2.7.12

Steps to reproduce

The following code should not cause any issues, but it runs out of memory if you run it twice. You may have to decrease the repeat factor for the data depending on how much GPU memory you have (16 GB in my case).

import numpy as np
import xgboost as xgb
from sklearn import datasets
from sklearn.cross_validation import train_test_split
from sklearn.datasets import dump_svmlight_file
from sklearn.externals import joblib
from sklearn.metrics import precision_score

iris = datasets.load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# use DMatrix for xgboost
dtrain = xgb.DMatrix(X_train.repeat(300000,axis=0), label=y_train.repeat(300000))
dtest = xgb.DMatrix(X_test.repeat(300000,axis=0), label=y_test.repeat(300000))

# set xgboost params
param = {
    'tree_method': 'gpu_exact',
    'max_depth': 3,  # the maximum depth of each tree
    'eta': 0.3,  # the training step for each iteration
    'silent': 1,  # logging mode - quiet
    'objective': 'multi:softprob',  # multiclass objective that outputs class probabilities
    'num_class': 3,  # the number of classes that exist in this dataset
    'n_jobs': 10}
num_round = 20  # the number of training iterations

#------------- numpy array ------------------
# training and testing - numpy matrices
bst = xgb.train(param, dtrain, num_round)
preds = bst.predict(dtest)
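
For clarity, the failure is triggered simply by repeating the training/prediction call in the same process, e.g.:

# running training a second time allocates additional GPU memory instead of
# reusing or releasing the previous allocation, eventually hitting an OOM
bst = xgb.train(param, dtrain, num_round)
preds = bst.predict(dtest)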

All 14 comments

Can you try calling bst.__delete__() after each round? Python is garbage collected so it may keep the booster object around. If the error persists after this then it may be a bug.

Closing as no response. Can reopen if the issue persists.

Sorry, I haven't gotten around to testing it on the original system yet. I will give it a try and see what happens.

Okay. I have called bst.__del__() which seems to work. Two things to note:

  • If bst.__del__() is called before .predict(), the kernel dies and the core is dumped (it makes sense that this won't work, but I assume the kernel death could be prevented by a check).
  • The Booster keeps the training data on the GPU until you call __del__(), which means that if your training + inference data together exceed GPU memory, you get an OOM even though each dataset might fit on its own. That seems limiting, since there is no need to keep the training data in GPU memory after training is complete. The .predict() method, on the other hand, purges its data after the call.

This raises a question - is there any way to purge the data off the GPU but keep the trained model?

P.S. I'm by no means an expert in how things are handled in this amazing package, and I will understand if it is necessary to keep the training data around after .fit is complete.

Saving the model, deleting the booster, and then loading the model again should achieve this.
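
A minimal sketch of that workaround with the Python Booster API (the file name model.bin is arbitrary) could look roughly like this:

import gc

bst.save_model('model.bin')   # persist the trained model to disk
del bst                       # drop the booster that holds the GPU allocations
gc.collect()                  # make sure Python actually frees the object

bst = xgb.Booster()
bst.load_model('model.bin')   # reload the model without the training data
preds = bst.predict(dtest)    # prediction purges its data after the call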

Sounds good, thanks for the help!

I am having what appears to be the same problem, but using R. I'm not sure what the equivalent of "deleting the booster" in R would be, since what is returned in R is considered a model object. There also does not appear to be a close match to the bst.__del__() call in Python. Any suggestions for what might work in a similar manner to purge the data off the GPU would be much appreciated.

Since this is a closely-related issue, I'm hoping to piggyback on this ticket rather than opening a nearly-duplicate ticket.

@jpbowman01 "deleting the booster" in R would be

rm(bst)
gc()

I have the same problem.
I tried the delete trick, but it does not work:
bst.__delete__()

'Booster' object has no attribute '__delete__'

@aliyesilkanat Typo above; it needs to be bst.__del__().

Nonetheless, this is not working for me. Single process, applying .__del__(), and I can also see in nvidia-smi that the GPU memory is being cleared. Still, I always run into this issue, even predictably. I have compiled with different NVIDIA drivers, GCCs, Linux headers, and CMake versions. I don't understand why this issue is closed.

@se-I, I had the same problem and was able to solve it by calling gc.collect() after the del command.
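
In code, roughly (re-using the booster from the reproduction snippet above):

import gc

bst = xgb.train(param, dtrain, num_round)
preds = bst.predict(dtest)

del bst        # drop the booster object...
gc.collect()   # ...and force a collection so its GPU memory is released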

I also have this problem on a Windows machine, with xgboost 0.7 and tree_method='gpu_hist'. The GPU memory does not get released if, for example, xgbRegressor.fit finishes successfully but some post-processing results in a Python error.

del xgbRegressor
gc.collect()

does not seem to release the GPU memory (but a kernel restart does :).

Trying to call bst.__del__(), I get an exception:

'XGBRegressor' object has no attribute '__del__'

I run my models with {'predictor': 'cpu_predictor'} (in part due to issue #3756), and so I would like to free GPU memory as soon as training is finished. That way I would be able to test more hyper-parameter sets in parallel.
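
For the sklearn wrapper discussed above, a hedged variant of the same save/delete/reload idea is sketched below. It assumes a version where XGBRegressor.get_booster() is available and that the wrapper accepts tree_method as a keyword; the file name is arbitrary, and the X_train/X_test arrays from the reproduction snippet are re-used purely for illustration.

import gc
import xgboost as xgb

reg = xgb.XGBRegressor(tree_method='gpu_hist')
reg.fit(X_train, y_train)

# persist the underlying booster, then drop the wrapper so its GPU
# allocations can be freed before the next hyper-parameter run
reg.get_booster().save_model('reg.bin')
del reg
gc.collect()

# reload into a plain Booster and predict on the CPU
bst = xgb.Booster({'predictor': 'cpu_predictor'})
bst.load_model('reg.bin')
preds = bst.predict(xgb.DMatrix(X_test))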
