ml-agents: Memory Leak on Linux

Created on 5 Nov 2018 · 22 comments · Source: Unity-Technologies/ml-agents

Hi,

I have been using Unity ml-agents on my Mac to create environments. I then build the environment for Linux and upload it to git in order to pull the environment down on a Linux machine. When I use the environment on the Linux machine and run the learn.py script, it continues to use more and more memory until it runs out and crashes.

I am running on Ubuntu 16.04 with ml-agents 0.4.0b and Unity 2017.2.0f1.

I know this was an issue with textures in the past, but the version of ml-agents that I have is after this issue was fixed.

bug

Most helpful comment

Located the problem. Will see if I can get to the bottom of this and submit a PR.

All 22 comments

My ml-agents version 0.5 also has this problem on Linux. The memory usage keeps increasing and increasing...

This is a known issue that is on our to fix bug list.

Resources.UnloadUnusedAssets();

@xiaomaogy has this bug been fixed in version 0.6?

@arixlin Not yet.

@xiaomaogy Thanks!

Any pointers to what the cause of this is?

In my opinion it could either be an issue on Unity's end with the visual observations, or an issue with TensorFlow itself. I have seen a few reports saying that TensorFlow has a memory leak like this. I don't have the issue with vector observations though, so I'm not sure.

@atapley Thanks for the fast reply. Yes, I suspected TF too, but will need to do some profiling to confirm that. The unity process (executable) itself seems stable, the python side is doing horrible things. Will post updates.

Using ml-agents 0.7 with TF 1.13.1 results in the same memory leak. Could it be a TF usage error: memory not being freed from the session when it should be?

Onwards...

The reason I say it may be a usage error is that using Rainbow (Dopamine) with the same environment does not produce the memory leak. So it can't be TF alone; well, it could be, but that seems unlikely.

@xiaomaogy please let me know what is already known, including where this bug is being tracked; I would like to keep an eye on progress and provide input.

From what I can tell, it has to do with the trainer's accumulation of experiences. The trainer is the only thing I can see that is holding onto memory; I haven't located its limit yet, or where it bounds the amount of memory it uses.

It doesn't matter what dimensions the observations are, that just determines how quickly you'll run out of memory.
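To illustrate why observation size only changes how fast memory runs out, here is a minimal Python sketch (hypothetical names, not the actual ml-agents code) of an experience buffer with no cap: each step retains a full copy of the observation, so memory grows linearly with the number of steps kept.

```python
# Hypothetical sketch of an unbounded experience buffer. This is NOT the
# ml-agents implementation; it only demonstrates the growth pattern
# described above.
class ExperienceBuffer:
    """Accumulates (observation, action, reward) tuples with no size limit."""

    def __init__(self):
        self.experiences = []

    def append(self, observation, action, reward):
        # Every call retains a full copy of the observation; nothing is
        # ever evicted, so memory grows linearly with steps taken.
        self.experiences.append((observation, action, reward))


buffer = ExperienceBuffer()
obs = [0.0] * (84 * 84 * 3)  # e.g. one 84x84 RGB visual observation

for step in range(1000):
    buffer.append(list(obs), 0, 0.0)

# Retained memory is roughly (observation size) x (steps retained):
# larger observations just hit the same wall sooner.
print(len(buffer.experiences))
```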

Located the problem. Will see if I can get to the bottom of this and submit a PR.

This bug is mainly being tracked on this issue. We also have an internal Trello board, which just links to this issue.

@xiaomaogy It's not the most desirable fix, but it does the job.

Ideally the training and eval would be completely decoupled, but that's a larger task than I have time for at present.

This PR ultimately stops the buffer from being filled at all during evaluation, hence no leak. I believe this is also desirable for efficiency.
Alternatively, control flow could have been added to clear the buffer during eval after it fills, but that would be fruitless.

Ah, ignore the first commit (be5e0ea). I split some coupled changes, the patch is in 6fc56f7

Hence no leak during evaluation. But training will still have the memory leak, am I correct? @tjad

Training won't have a leak, as far as I remember; I'll confirm that. The way training works clears the buffer, but during eval it wasn't training, and hence wasn't clearing the buffer.
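The fix described above can be sketched as follows. This is a hypothetical Python illustration (the class and method names are illustrative, not the actual ml-agents API): experiences are only stored while training, and the training update consumes and clears the buffer, which is why the leak only surfaced during long evaluation-only runs.

```python
# Hypothetical sketch of the described fix: skip buffer writes entirely
# when not training. Names are illustrative, not the ml-agents API.
class Trainer:
    def __init__(self, is_training):
        self.is_training = is_training
        self.buffer = []

    def add_experience(self, experience):
        if not self.is_training:
            return  # eval mode: nothing stored, so nothing can leak
        self.buffer.append(experience)

    def update_policy(self):
        # Training consumes and clears the buffer after each update,
        # which is why long training runs were fine while long
        # evaluation runs leaked.
        self.buffer.clear()


eval_trainer = Trainer(is_training=False)
for step in range(10_000):
    eval_trainer.add_experience({"obs": step})

# The eval-mode buffer stays empty no matter how many steps run.
print(len(eval_trainer.buffer))
```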

Probably why it's not been such a major issue :-)

When I looked into this issue, I suspected there may have been a lack of direction as to where the actual memory leak was. I trained fine for hours on end; evaluation wasn't happening as easily.

It looks like this issue has been solved, so I'm going to close it. Thanks, and feel free to reopen if needed.
