ML-Agents: Hyperparameters

Created on 14 Mar 2018 · 6 Comments · Source: Unity-Technologies/ml-agents

I read through the best practices page on hyperparameters, but I'm still not sure how I should be thinking about adjusting these. Specifically, how should I think about buffer/batch size? The best practices page says that increasing the buffer size will lead to "more stable" training. What does this mean though? Slower? More likely to actually find a good policy? Less likely to try new policies (I thought this was what entropy was for).

I guess what I'm asking for is a more in-depth discussion of each of the hyperparameters and how to think about adjusting them.
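One way to build intuition for "more stable": each collected experience gives only a noisy estimate of the true policy gradient, and the update averages over a batch of them, so a larger buffer/batch reduces the variance of each update. A toy sketch (plain Python, not ml-agents code; the gradient values here are made up for illustration):

```python
# Toy illustration of why a larger buffer/batch gives "more stable" updates:
# each experience contributes a noisy gradient estimate, and averaging more
# of them shrinks the spread of the resulting update direction.
import random
import statistics

random.seed(0)
TRUE_GRADIENT = 1.0   # hypothetical "true" gradient component
NOISE_STD = 2.0       # hypothetical per-experience sampling noise

def noisy_gradient_estimate():
    """One experience's contribution: the true gradient plus noise."""
    return random.gauss(TRUE_GRADIENT, NOISE_STD)

def update_spread(batch_size, n_updates=500):
    """Std-dev of the averaged update across many simulated updates."""
    updates = [
        statistics.mean(noisy_gradient_estimate() for _ in range(batch_size))
        for _ in range(n_updates)
    ]
    return statistics.stdev(updates)

for size in (8, 64, 512):
    print(f"batch_size={size:4d}  update spread ~ {update_spread(size):.3f}")
```

The spread falls roughly as 1/sqrt(batch size): updates point in more consistent directions, which is the stability the docs refer to, at the cost of each update reflecting older data.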

discussion

All 6 comments

I, too, would like to see a more thorough description of the hyperparameters. In particular, it would help to have a detailed description of the relationship between episode, experience, step, epoch, batch, buffer, and other parameters.

If there was a simple flowchart (like this one) of the training process that showed when the experience counter gets incremented, when the network weights get updated, when data gets saved, etc., it would really help guide people's intuition about how to set the hyperparameters.
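In lieu of a flowchart, the relationship between those counters can be sketched as a simplified PPO-style loop (this is NOT ml-agents' actual implementation, just a hand-written illustration of the general on-policy pattern):

```python
# Simplified sketch of a PPO-style training loop, showing when the common
# counters move: the step/experience count grows as the agent acts, and the
# network weights are only updated once the buffer holds buffer_size
# experiences, via num_epoch passes over minibatches of batch_size.
import random

BUFFER_SIZE = 2048   # experiences collected before each update
BATCH_SIZE = 256     # minibatch size within an update
NUM_EPOCH = 3        # passes over the buffer per update
MAX_STEPS = 4096     # total experiences to collect in this demo

def collect_experience():
    """Stand-in for one agent step in the environment."""
    return {"obs": random.random(), "reward": random.random()}

def train_on_minibatch(batch):
    """Stand-in for one gradient step on a minibatch."""
    pass

step_count = 0
update_count = 0
buffer = []

while step_count < MAX_STEPS:
    buffer.append(collect_experience())   # experience counter increments here
    step_count += 1
    if len(buffer) >= BUFFER_SIZE:        # weights update only when buffer is full
        for _ in range(NUM_EPOCH):
            random.shuffle(buffer)
            for i in range(0, len(buffer), BATCH_SIZE):
                train_on_minibatch(buffer[i:i + BATCH_SIZE])
                update_count += 1
        buffer.clear()                    # on-policy: discard stale experiences

print(f"steps={step_count}, gradient updates={update_count}")
```

With these numbers the buffer fills twice, and each fill triggers 3 epochs of 8 minibatches, so 48 gradient updates occur over 4096 experiences; changing buffer_size, batch_size, or num_epoch shifts that ratio.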

Tweaking hyperparameters is tricky, because they work in conjunction with each other and with the network architecture. So there's likely no straightforward recipe for tuning individual parameters to achieve a specific behavior.
Here's my attempt to automate hyperparameter optimization:
https://github.com/mbaske/ml-agents-hyperparams
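For anyone curious about what automated tuning looks like in its simplest form, here is a generic random-search sketch (this is my own illustration, not the linked project's implementation; `run_training` is a hypothetical stand-in for launching a training run and returning its mean reward):

```python
# Minimal random-search sketch for hyperparameter tuning. Each trial samples
# a configuration from the search space, "trains" with it, and the
# best-scoring configuration is kept.
import math
import random

SEARCH_SPACE = {
    "batch_size": [64, 128, 256, 512],
    "buffer_size": [2048, 4096, 8192],
    "learning_rate": (1e-5, 1e-3),   # sampled log-uniformly
}

def sample_config(rng):
    """Draw one hyperparameter configuration from the search space."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "batch_size": rng.choice(SEARCH_SPACE["batch_size"]),
        "buffer_size": rng.choice(SEARCH_SPACE["buffer_size"]),
        "learning_rate": math.exp(rng.uniform(math.log(lo), math.log(hi))),
    }

def run_training(config):
    """Hypothetical stand-in: a real version would launch an ml-agents
    training run with `config` and return mean cumulative reward.
    Here it returns a dummy score purely for illustration."""
    return random.random()

def random_search(n_trials=10, seed=0):
    """Run n_trials and return the best configuration and its score."""
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = run_training(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score

best_config, best_score = random_search()
print("best config:", best_config)
```

Real search tools add parallel trial execution and early stopping of poor runs, but the sample-train-compare loop above is the core idea.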

Hi all, as mentioned above, the effects of hyperparameters can often be complex, and intuition unfortunately only goes so far. That being said, we are always working to improve our documentation, so if there is some aspect of our algorithm which might be clearer in a way that would benefit end-users, we'd be happy to explain it better.

@mbaske This project seems really great. For reasons that you know all too well, something like the project you are working on would likely benefit many users who want to quickly do many experiments and find the best results. If you are interested, I would be happy to chat some about how it might be possible to adapt what you are working on to provide hyperparameter search for a wider group of our users. We are currently working to build out a more robust/scalable training architecture, and a search feature like yours could fit in nicely.

Thanks Arthur, I'd be happy to contribute. Let me know how I can help. Email is probably best, since I'm 9 time zones away from you guys.

Thank you for the discussion. We are closing this issue due to inactivity. Feel free to reopen it if you'd like to continue the discussion though.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
