Hi,
I have several networks, but I will train them sequentially. I am a little confused about how memory gets cleared after one network finishes its forward and backward passes. For example, I have two networks, net1 and net2. First I bind net1 to create an executor, run forward with is_train=True, and run backward. Then I need to create another executor by binding net2. Since the first executor takes a lot of memory (especially the intermediate feature maps), I am wondering whether this will cause any trouble. What I want is that after I am done with net1, its memory is freed to make room for net2. Could anyone explain how this works? Then I will be able to manage my memory more efficiently. Thanks a lot.
You can bind multiple executors that share memory as long as you never use them in parallel. Look at Executor.reshape for an example. If your use case is applying the same network to different input shapes, you can use reshape() directly. Otherwise, you need to bind shared-memory executors yourself.
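For reference, a minimal sketch of the reshape route, assuming net is one of your bound symbols; the exact keyword arguments of reshape may differ across MXNet versions:

import mxnet as mx

# Bind once, then derive a second executor for a smaller input shape.
# reshape() returns a new executor that shares memory with the first,
# so no extra GPU buffers are allocated for the intermediate outputs.
executor = net.simple_bind(ctx=mx.gpu(0), data=(1, 3, 224, 224), grad_req="write")
small_executor = executor.reshape(data=(1, 3, 112, 112))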
My case is not the same network with different input shapes but different networks. I just want to know: if I bind another network, how can I clear the previous one, or does it get cleared automatically?
I mean, if I first set executor = net1.bind(...) and then set executor = net2.bind(...), since it is the same variable, does that mean the first executor object gets released and cleared, and the newly created one is assigned to the name? So I don't need to clear it manually, is that right? Just want to make sure. Thanks.
It depends on the behavior of Python's garbage collector. The memory is freed when the Python executor object is destructed. You can refer to Python's documentation for that.
I tried an example: I used VGG-16, defined its symbol as net1, and defined the same configuration as net2.
If I run this:
executor = net1.simple_bind(ctx=mx.gpu(0), data=(1,3,224,224), grad_req="write")
executor = net2.simple_bind(ctx=mx.gpu(0), data=(1,3,224,224), grad_req="write")
I get an out-of-memory error. If I only run the first line, everything is fine and it takes only a bit over 2 GB of memory. But when I also run the second line, the memory from the first executor does not seem to be released, so the GPU memory gets overrun. This is what I described earlier: how can I manage the memory? As I understand Python, as long as there is no reference to an object, it should be cleared. However, I don't know how to handle this situation. Any advice?
I think it is necessary to be able to bind different symbols dynamically. Because GPU memory is limited, we cannot keep several networks bound in parallel, so binding them dynamically and assigning the corresponding parameters is necessary. How can I do this without running out of memory? To be specific, how can I clear the memory and then set up a new network afterwards? @piiswrong
You can try deleting the executor.
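For instance, a minimal sketch with gc, assuming net1 and net2 are the symbols from the example above. Note that simply reassigning the variable is not enough: the right-hand side is evaluated before the name is rebound, so both executors are briefly alive at the same time.

import gc
import mxnet as mx

executor = net1.simple_bind(ctx=mx.gpu(0), data=(1, 3, 224, 224), grad_req="write")
# ... forward/backward with net1 ...

# Drop the old reference before binding the next network; otherwise
# net2's buffers are allocated while net1's executor is still alive.
del executor
gc.collect()
executor = net2.simple_bind(ctx=mx.gpu(0), data=(1, 3, 224, 224), grad_req="write")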
Do you mean "del executor"? I tried that, but it seems the GPU memory is not released instantly, while it looks fine with mx.cpu(). Is there a difference between releasing CPU memory and GPU memory due to their different behavior? By the way, I always get a segmentation fault when doing the simple_bind. I saw some previous issues discussing this. Has it been resolved yet?
I think I found a solution: just make the executor a local variable inside a function. When the function returns, the executor gets released. Now I want to understand the difference between bind and simple_bind. Since simple_bind only needs the shape of the input, unlike bind, which requires the grad_arrays to be passed in, does simple_bind need extra time to allocate memory for the grad_arrays? Is that time trivial, or do I have to take it into account? If I simple_bind and create a new executor for each batch, do I have to worry about this cost? Thanks a lot @piiswrong
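For illustration, a rough sketch of the function-scope approach; run_one_network and data_batch are hypothetical names, and parameter and label arrays are omitted for brevity:

import mxnet as mx

def run_one_network(net, data_batch, ctx=mx.gpu(0)):
    # The executor only exists inside this function; when it returns,
    # the last reference is dropped and the memory can be reclaimed.
    executor = net.simple_bind(ctx=ctx, data=data_batch.shape, grad_req="write")
    executor.arg_dict["data"][:] = data_batch
    executor.forward(is_train=True)
    # Assumes the symbol ends in a loss layer (e.g. SoftmaxOutput),
    # so backward() needs no explicit head gradients.
    executor.backward()
    return [out.asnumpy() for out in executor.outputs]

As for bind vs simple_bind: simple_bind infers the shapes and allocates the arg, grad, and aux arrays itself, whereas bind expects you to pass pre-allocated NDArrays; either way the allocation happens once per bind, which is why the next reply suggests sharing and caching executors.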
If you are creating executors frequently, I strongly recommend using a mechanism similar to executor.reshape to create shared-memory executors. You can also cache them to mitigate the construction cost.
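A rough sketch of the caching idea, assuming you cycle through a handful of networks and input shapes; the cache dict and the get_executor helper are hypothetical, and true memory sharing between executors goes through the shared-executor arguments to bind, whose exact form depends on the MXNet version:

import mxnet as mx

_executor_cache = {}

def get_executor(net, name, data_shape, ctx=mx.gpu(0)):
    # Bind each (network, shape) combination once and reuse the executor,
    # so repeated batches do not pay the construction cost again.
    key = (name, data_shape)
    if key not in _executor_cache:
        _executor_cache[key] = net.simple_bind(ctx=ctx, data=data_shape,
                                               grad_req="write")
    return _executor_cache[key]

Note that without memory sharing this keeps every cached executor's buffers allocated, so it trades GPU memory for construction time; the shared-executor route is what avoids that trade-off.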
This worked for me:
import gc

del mod
gc.collect()
# memory should be freed
I tried this too, but my GPU VRAM usage still does not go down.