Hello,
When I run a training with learn.py, the process allocates all of the GPU's memory.
Is there a way to avoid this and make it take only what it needs?
Thanks
I don't think the problem is with TensorFlow or ml-agents (when I start training it uses about 20 MB of VRAM).
You should check whether your game is doing what you think it does. If you have some kind of memory leak, remember that the game is played at up to 100 times normal speed, which can amplify the problem.
Hi @r-lipton, is this using one of our sample environments or your own? Generally, as @Hengoo and @MarcoMeter pointed out, we haven't noticed this on our environments.
I have seen this problem with OpenAI Baselines when invoking a 2nd training run. Setting gpu_options.allow_growth = True fixed it for me.
Replace line 212 of trainer_controller.py (with tf.Session() as sess:) with:
config = tf.ConfigProto()
# Allocate GPU memory on demand instead of grabbing it all up front
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
Update: I tested this today and was able to run multiple training runs concurrently on a single GPU.
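If you'd rather set a hard cap than let memory grow on demand, TF1's GPUOptions also supports per_process_gpu_memory_fraction. A minimal sketch, with 0.3 as a purely illustrative value:

import tensorflow as tf

config = tf.ConfigProto()
# Cap this process at roughly 30% of the GPU's memory (illustrative value)
config.gpu_options.per_process_gpu_memory_fraction = 0.3
with tf.Session(config=config) as sess:
    pass  # run training here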
It's my own custom environment.
@Sohojoe's solution worked for me, thanks!
Hi all. I've made a PR for this, and it will be added to the v0.5 release. https://github.com/Unity-Technologies/ml-agents/pull/1192
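For anyone reading this on TensorFlow 2.x, where ConfigProto and Session no longer apply, the equivalent setting is per-GPU memory growth. A minimal sketch:

import tensorflow as tf

# Must run before any GPU is initialized (i.e. before building models)
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)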