Hello,
When I run a training with learn.py, the process allocates all of the GPU's memory.
Is there a way to avoid this and make it take only what it needs?
Thanks
I don't think the problem is with TensorFlow or ml-agents (when I start training it uses about 20 MB of VRAM).
You should check whether your game is doing what you think it does. If you have some kind of memory leak, remember that the game is played at up to 100 times normal speed, which can amplify the problem.
Hi @r-lipton, is this using one of our sample environments or your own? Generally, as @Hengoo and @MarcoMeter pointed out, we haven't noticed this on our environments.
I have seen this problem with OpenAI Baselines when invoking a 2nd training run. Setting gpu_options.allow_growth = True fixed it for me.
Replace line 212 of trainer_controller.py (with tf.Session() as sess:) with:
config = tf.ConfigProto()
# Allocate GPU memory on demand instead of grabbing it all up front
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
Update: I tested this today and was able to run multiple training runs concurrently on a single GPU.
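If you'd rather set a hard cap than let memory grow on demand, TF1's GPUOptions also supports per_process_gpu_memory_fraction. A minimal sketch, with 0.3 as a purely illustrative value:

import tensorflow as tf

config = tf.ConfigProto()
# Cap this process at roughly 30% of the GPU's memory (illustrative value)
config.gpu_options.per_process_gpu_memory_fraction = 0.3
with tf.Session(config=config) as sess:
    pass  # run training here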
It's my own custom environment.
@Sohojoe's solution worked for me, thanks!
Hi all. I've made a PR for this, and it will be added to the v0.5 release. https://github.com/Unity-Technologies/ml-agents/pull/1192
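For anyone reading this on TensorFlow 2.x, where ConfigProto and Session no longer apply, the equivalent setting is per-GPU memory growth. A minimal sketch:

import tensorflow as tf

# Must run before any GPU is initialized (i.e. before building models)
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)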