When it is required to do a forward propagation at first, and then do another one in the same process,
if gpu memory malloced by the first propagation can not be released, the second one will not have enough memory to use.
@DuCheng2018 Thank you for the feature request.
@mxnet-label-bot add [Feature Request]
I am getting multiple (more than 5) requests similar to this one. I think we need a formal C API as well as a Python API so users can explicitly release the GPU memory pool without waiting for the program to exit.
+1, for some applications it is necessary to release GPU memory.
+1, yes, this is a badly needed feature
+1
Hi there.
Thanks to @vladoovtcharov for adding this API for GPU memory release.
The API is mx.gpu(0).empty_cache().
First, we should set the environment variable MXNET_GPU_MEM_POOL_TYPE to select the GPU memory pool type.
Here is an example:
# MXNET_GPU_MEM_POOL_TYPE=Round python
Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet as mx
>>> a = mx.nd.zeros((10000, 30000), ctx=mx.gpu(0)) + 1
[14:56:53] src/storage/storage.cc:110: Using GPUPooledRoundedStorageManager.
>>> del a
>>> mx.gpu(0).empty_cache()
In my test, most of the GPU memory is released after calling this API, although around 600 MiB remains cached.
This API only releases the cache in the GPU memory pool. It doesn't release the NDArray cache held by CachedOp
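The distinction above can be modeled with a small, self-contained sketch (this is not MXNet's actual implementation, just a conceptual simulation): a caching allocator keeps freed blocks in a pool for reuse, so the device still reports them as allocated after `del`, and only an explicit `empty_cache()` returns the cached blocks, while memory still held by live objects (e.g. by CachedOp) is untouched.

```python
# Conceptual model of a caching GPU allocator (NOT MXNet internals).
class PooledAllocator:
    def __init__(self):
        self.in_use = {}       # handle -> size of live allocations
        self.cache = []        # sizes of freed blocks kept for reuse
        self.device_bytes = 0  # bytes currently claimed from the "device"

    def alloc(self, handle, size):
        if size in self.cache:      # reuse a cached block if one fits
            self.cache.remove(size)
        else:                       # otherwise claim new device memory
            self.device_bytes += size
        self.in_use[handle] = size

    def free(self, handle):
        # Freeing only moves the block into the cache; the device still
        # counts this memory as allocated (what a tool like nvidia-smi shows).
        self.cache.append(self.in_use.pop(handle))

    def empty_cache(self):
        # Return cached blocks to the device; live allocations are untouched.
        self.device_bytes -= sum(self.cache)
        self.cache.clear()
```

For example, after `alloc("a", 100)` followed by `free("a")`, `device_bytes` is still 100; only `empty_cache()` brings it back to 0, while any handle still in `in_use` keeps its memory.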
+1
Closing this issue as #14252 is merged; please feel free to ping me if you have other thoughts.