Hi,
I have several networks, but I will train them sequentially. I am a little confused about how memory gets cleared after one network finishes its forward and backward passes. For example, I have two networks, net1 and net2. First I bind net1 to create an executor, run forward with is_train=True, and run backward. Then I need to create another executor by binding net2. Since the first executor takes a lot of memory (especially the intermediate feature maps), I am wondering whether this will cause any trouble. What I want is that after I am done with net1, its memory is freed to make room for net2. Could anyone explain how this works? Then I will be able to manage my memory more efficiently. Thanks a lot.
You can bind multiple executors that share memory as long as you never use them in parallel. Look at Executor.reshape for an example. If your use case is applying the same network to different input shapes, you can use reshape() directly. Otherwise, you need to bind shared-memory executors yourself.
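For reference, a minimal sketch of the reshape route, assuming net is one of your bound symbols; the exact keyword arguments of reshape may differ across MXNet versions:

import mxnet as mx

# Bind once, then derive a second executor for a smaller input shape.
# reshape() returns a new executor that shares memory with the first,
# so no extra GPU buffers are allocated for the intermediate outputs.
executor = net.simple_bind(ctx=mx.gpu(0), data=(1, 3, 224, 224), grad_req="write")
small_executor = executor.reshape(data=(1, 3, 112, 112))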
My case is not the same network with different input shapes but different networks. I just want to know: if I bind another network, how can I clear the previous one, or does it get cleared automatically?
I mean, if I first set executor = net1.bind(...) and then set executor = net2.bind(...), since it is the same variable, does that mean the first executor object gets released and cleared, and the newly created one is assigned to the name? So I don't need to clear it manually, is that right? Just want to make sure. Thanks.
It depends on the behavior of Python's garbage collector. The memory is freed when the Python executor object is destructed. You can refer to Python's documentation for that.
I tried an example: I used VGG-16, defined its symbol as net1, and defined the same configuration as net2.
If I run this:
executor = net1.simple_bind(ctx=mx.gpu(0), data=(1,3,224,224), grad_req="write")
executor = net2.simple_bind(ctx=mx.gpu(0), data=(1,3,224,224), grad_req="write")
I get an out-of-memory error. If I only run the first line, everything is fine and it takes only a bit over 2 GB of memory. But when I also run the second line, the memory from the first executor does not seem to be released, so the GPU memory gets overrun. This is what I described earlier: how can I manage the memory? As I understand Python, as long as there is no reference to an object, it should be cleared. However, I don't know how to handle this situation. Any advice?
I think it is necessary to be able to bind different symbols dynamically. Because GPU memory is limited, we cannot keep several networks bound in parallel, so binding them dynamically and assigning the corresponding parameters is necessary. How can I do this without running out of memory? To be specific, how can I clear the memory and then set up a new network afterwards? @piiswrong
You can try deleting the executor.
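For instance, a minimal sketch with gc, assuming net1 and net2 are the symbols from the example above. Note that simply reassigning the variable is not enough: the right-hand side is evaluated before the name is rebound, so both executors are briefly alive at the same time.

import gc
import mxnet as mx

executor = net1.simple_bind(ctx=mx.gpu(0), data=(1, 3, 224, 224), grad_req="write")
# ... forward/backward with net1 ...

# Drop the old reference before binding the next network; otherwise
# net2's buffers are allocated while net1's executor is still alive.
del executor
gc.collect()
executor = net2.simple_bind(ctx=mx.gpu(0), data=(1, 3, 224, 224), grad_req="write")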
Do you mean "del executor"? I tried that, but it seems the GPU memory is not released instantly, while it looks fine with mx.cpu(). Is there a difference between releasing CPU memory and GPU memory due to their different behavior? By the way, I always get a segmentation fault when doing the simple_bind. I saw some previous issues discussing this. Has it been resolved yet?
I think I found a solution: just make the executor a local variable inside a function. When the function returns, the executor gets released. Now I want to understand the difference between bind and simple_bind. Since simple_bind only needs the shape of the input, unlike bind, which requires the grad_arrays to be passed in, does simple_bind need extra time to allocate memory for the grad_arrays? Is that time trivial, or do I have to take it into account? If I simple_bind and create a new executor for each batch, do I have to worry about this cost? Thanks a lot @piiswrong
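For illustration, a rough sketch of the function-scope approach; run_one_network and data_batch are hypothetical names, and parameter and label arrays are omitted for brevity:

import mxnet as mx

def run_one_network(net, data_batch, ctx=mx.gpu(0)):
    # The executor only exists inside this function; when it returns,
    # the last reference is dropped and the memory can be reclaimed.
    executor = net.simple_bind(ctx=ctx, data=data_batch.shape, grad_req="write")
    executor.arg_dict["data"][:] = data_batch
    executor.forward(is_train=True)
    # Assumes the symbol ends in a loss layer (e.g. SoftmaxOutput),
    # so backward() needs no explicit head gradients.
    executor.backward()
    return [out.asnumpy() for out in executor.outputs]

As for bind vs simple_bind: simple_bind infers the shapes and allocates the arg, grad, and aux arrays itself, whereas bind expects you to pass pre-allocated NDArrays; either way the allocation happens once per bind, which is why the next reply suggests sharing and caching executors.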
If you are creating executors frequently, I strongly recommend using a mechanism similar to executor.reshape to create shared-memory executors. You can also cache them to mitigate the construction cost.
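A rough sketch of the caching idea, assuming you cycle through a handful of networks and input shapes; the cache dict and the get_executor helper are hypothetical, and true memory sharing between executors goes through the shared-executor arguments to bind, whose exact form depends on the MXNet version:

import mxnet as mx

_executor_cache = {}

def get_executor(net, name, data_shape, ctx=mx.gpu(0)):
    # Bind each (network, shape) combination once and reuse the executor,
    # so repeated batches do not pay the construction cost again.
    key = (name, data_shape)
    if key not in _executor_cache:
        _executor_cache[key] = net.simple_bind(ctx=ctx, data=data_shape,
                                               grad_req="write")
    return _executor_cache[key]

Note that without memory sharing this keeps every cached executor's buffers allocated, so it trades GPU memory for construction time; the shared-executor route is what avoids that trade-off.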
This worked for me:
import gc

del mod
gc.collect()
# memory should be freed
I tried this too, but my GPU VRAM usage still does not go down.