Xgboost: Allow releasing GPU memory

Created on 17 Jul 2019 · 11 comments · Source: dmlc/xgboost

One common piece of feedback we receive about the GPU algorithms is that memory is not released after training. It may be possible to release memory by deleting the booster object but this is not a great user experience.

See #4018, #3083, #2663 and #3045.
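For reference, the delete-the-booster approach mentioned above looks roughly like this from the Python side (a minimal sketch with illustrative data; it relies on Python actually collecting the object, which is exactly the poor user experience described):

    import gc

    import numpy as np
    import xgboost as xgb

    # Illustrative data; any DMatrix works.
    X = np.random.rand(1000, 20)
    y = np.random.randint(2, size=1000)
    dtrain = xgb.DMatrix(X, label=y)

    booster = xgb.train({'tree_method': 'gpu_hist', 'objective': 'binary:logistic'},
                        dtrain, num_boost_round=100)

    # Dropping every reference and forcing a collection is currently the only
    # way to give the GPU memory back.
    del booster
    gc.collect()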

The reason we have not implemented this already is that the internal C++ code does not actually know when training is finished. The language bindings call each training iteration one by one, and I don't believe we have any information inside the GPU training code to say whether another training iteration is expected.

I see a few solutions:
1) We try to use some heuristic internally to decide if it is a good time to free all memory from inside GBTree.
2) We implement an API function for cleanup - this function can be specific to GPU memory or it can just be a general hint for xgboost to delete any working memory or temporary data structures. I do not like this option as it will propagate through the entire code base - the learner, booster, updaters and predictor will all have to implement these methods.
3) We implement a method in the language bindings where the booster object serializes itself and then deserializes from disk. Doing this will clear all temporary data structures and should leave the booster in a usable state to resume training or do prediction.

I am leaning towards option 3) but I think it relies on #3980 to make sure all parameters are correctly saved. Maybe it's still possible to do this with current serialization and not have any unexpected side-effects due to parameters not all being saved.
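For illustration, option 3) would look something like the following from the Python binding's point of view (a rough sketch only, not the proposed implementation; the helper name is made up, and it assumes the model round-trips cleanly through save_model/load_model, which is exactly the caveat about #3980 above):

    import os
    import tempfile

    import xgboost as xgb

    def reset_booster(booster):
        # Round-trip the booster through disk so the temporary GPU buffers held
        # by the updaters and predictor are dropped. The reloaded booster should
        # remain usable for prediction or continued training.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, 'model.bin')
            booster.save_model(path)
            fresh = xgb.Booster()
            fresh.load_model(path)
        return fresh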

@trivialfis @sriramch @rongou

feature-request

All 11 comments

Or we could pass num_boost_round to C++?

For those looking for a quick workaround until this is fixed properly, check my solution here.

@seanthegreat7 Thanks. That's actually an interesting workaround.
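(The linked solution is not reproduced in the thread, but judging from the later comment about predicting in a subprocess, the idea is to run training and prediction in a separate process so the GPU memory is freed when that process exits. A minimal sketch of that idea, with illustrative function names:)

    import multiprocessing as mp

    import xgboost as xgb

    def _fit_and_predict(params, X_train, y_train, X_test, queue):
        # Runs in a child process; all GPU memory is released when it exits.
        clf = xgb.XGBClassifier(**params)
        clf.fit(X_train, y_train)
        queue.put(clf.predict(X_test))

    def fit_predict_in_subprocess(params, X_train, y_train, X_test):
        ctx = mp.get_context('spawn')
        queue = ctx.Queue()
        proc = ctx.Process(target=_fit_and_predict,
                           args=(params, X_train, y_train, X_test, queue))
        proc.start()
        preds = queue.get()
        proc.join()
        return preds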

None of the workarounds seem to be working on Windows 10. Tried deleting and loading the booster object (still crashed).

Tried predicting in a subprocess similar to @seanthegreat7 (but in R instead of Python). The subprocess just ran indefinitely without finishing.

Would indeed be greatly appreciated if you provided a solution for this issue!

I'm finding this very difficult, especially when performing a wide parameter search in a loop of some kind.

For example:

    import xgboost as xgb

    exp_models = []
    for _ in range(200):
        clf = xgb.XGBClassifier(booster='gbtree', objective='binary:logistic',
                                tree_method='gpu_hist', n_gpus=1, gpu_id=1,
                                n_estimators=30)
        trained_model = clf.fit(X_train, y_train, verbose=False)
        # Each fitted model keeps its GPU memory allocated, so this
        # eventually runs out of memory.
        exp_models.append(trained_model)

This will crash, since I guess the trained_model hangs around on the GPU indefinitely. Alternatively, if I exp_models.append(trained_model.get_booster().copy()) all is well.

However, I'm also running into the same issue when submitting numerous jobs via a Dask scheduler (note not dask-xgboost).

In both cases I eventually get:

    terminate called after throwing an instance of 'thrust::system::system_error'
      what():  parallel_for failed: out of memory

I don't have a view on the best solution, but would love to resolve.

My hack is to do this

    import pickle
    import tempfile

    import xgboost

    xgbPredictor = xgboost.XGBRegressor(**self.xgb_params)
    xgbPredictor.fit(Xs, ys)

    # This hack should only be used if tree_method is gpu_hist or gpu_exact.
    if self.xgb_params['tree_method'][:3] == 'gpu':
        # Round-trip through pickle so the reloaded model no longer holds the
        # GPU buffers allocated during training.
        with tempfile.TemporaryFile() as dump_file:
            pickle.dump(xgbPredictor, dump_file)
            dump_file.seek(0)
            self.predictor_ = pickle.load(dump_file)
    else:
        self.predictor_ = xgbPredictor

and it has solved my GPU mem-leak

Wouldn't it be easier to implement the function as in PyTorch?
Like:

    torch.cuda.empty_cache()

It wouldn't be easier, but that's an option.

@trivialfis Do you (or someone else) plan to fix this problem at all?
Is this not an issue in Dask and Spark?

I am running into this same issue when training many small gpu_hist models.

Could you please open a new issue?

