ML-Agents: Hyperparameters

Created on 14 Mar 2018 · 6 Comments · Source: Unity-Technologies/ml-agents

I read through the best practices page on hyperparameters, but I'm still not sure how I should be thinking about adjusting these. Specifically, how should I think about buffer/batch size? The best practices page says that increasing the buffer size will lead to "more stable" training. What does this mean though? Slower? More likely to actually find a good policy? Less likely to try new policies (I thought this was what entropy was for).

I guess what I'm asking for is a more in-depth discussion of each of the hyperparameters and how to think about adjusting them.
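One way to build intuition for "more stable": each collected experience gives only a noisy estimate of the true policy gradient, and the update averages over a batch of them, so a larger buffer/batch reduces the variance of each update. A toy sketch (plain Python, not ml-agents code; the gradient values here are made up for illustration):

```python
# Toy illustration of why a larger buffer/batch gives "more stable" updates:
# each experience contributes a noisy gradient estimate, and averaging more
# of them shrinks the spread of the resulting update direction.
import random
import statistics

random.seed(0)
TRUE_GRADIENT = 1.0   # hypothetical "true" gradient component
NOISE_STD = 2.0       # hypothetical per-experience sampling noise

def noisy_gradient_estimate():
    """One experience's contribution: the true gradient plus noise."""
    return random.gauss(TRUE_GRADIENT, NOISE_STD)

def update_spread(batch_size, n_updates=500):
    """Std-dev of the averaged update across many simulated updates."""
    updates = [
        statistics.mean(noisy_gradient_estimate() for _ in range(batch_size))
        for _ in range(n_updates)
    ]
    return statistics.stdev(updates)

for size in (8, 64, 512):
    print(f"batch_size={size:4d}  update spread ~ {update_spread(size):.3f}")
```

The spread falls roughly as 1/sqrt(batch size): updates point in more consistent directions, which is the stability the docs refer to, at the cost of each update reflecting older data.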

discussion

All 6 comments

I, too, would like to see a more thorough description of the hyperparameters. In particular, it would help to have a detailed description of the relationship between episode, experience, step, epoch, batch, buffer, and other parameters.

If there was a simple flowchart (like this one) of the training process that showed when the experience counter gets incremented, when the network weights get updated, when data gets saved, etc., it would really help guide people's intuition about how to set the hyperparameters.
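In lieu of a flowchart, the relationship between those counters can be sketched as a simplified PPO-style loop (this is NOT ml-agents' actual implementation, just a hand-written illustration of the general on-policy pattern):

```python
# Simplified sketch of a PPO-style training loop, showing when the common
# counters move: the step/experience count grows as the agent acts, and the
# network weights are only updated once the buffer holds buffer_size
# experiences, via num_epoch passes over minibatches of batch_size.
import random

BUFFER_SIZE = 2048   # experiences collected before each update
BATCH_SIZE = 256     # minibatch size within an update
NUM_EPOCH = 3        # passes over the buffer per update
MAX_STEPS = 4096     # total experiences to collect in this demo

def collect_experience():
    """Stand-in for one agent step in the environment."""
    return {"obs": random.random(), "reward": random.random()}

def train_on_minibatch(batch):
    """Stand-in for one gradient step on a minibatch."""
    pass

step_count = 0
update_count = 0
buffer = []

while step_count < MAX_STEPS:
    buffer.append(collect_experience())   # experience counter increments here
    step_count += 1
    if len(buffer) >= BUFFER_SIZE:        # weights update only when buffer is full
        for _ in range(NUM_EPOCH):
            random.shuffle(buffer)
            for i in range(0, len(buffer), BATCH_SIZE):
                train_on_minibatch(buffer[i:i + BATCH_SIZE])
                update_count += 1
        buffer.clear()                    # on-policy: discard stale experiences

print(f"steps={step_count}, gradient updates={update_count}")
```

With these numbers the buffer fills twice, and each fill triggers 3 epochs of 8 minibatches, so 48 gradient updates occur over 4096 experiences; changing buffer_size, batch_size, or num_epoch shifts that ratio.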

Tweaking hyperparameters is tricky, because they work in conjunction with each other and with the network architecture. So there's likely no straightforward recipe for tuning individual parameters to achieve a specific behavior.
Here's my attempt to automate hyperparameter optimization:
https://github.com/mbaske/ml-agents-hyperparams
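For anyone curious about what automated tuning looks like in its simplest form, here is a generic random-search sketch (this is my own illustration, not the linked project's implementation; `run_training` is a hypothetical stand-in for launching a training run and returning its mean reward):

```python
# Minimal random-search sketch for hyperparameter tuning. Each trial samples
# a configuration from the search space, "trains" with it, and the
# best-scoring configuration is kept.
import math
import random

SEARCH_SPACE = {
    "batch_size": [64, 128, 256, 512],
    "buffer_size": [2048, 4096, 8192],
    "learning_rate": (1e-5, 1e-3),   # sampled log-uniformly
}

def sample_config(rng):
    """Draw one hyperparameter configuration from the search space."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "batch_size": rng.choice(SEARCH_SPACE["batch_size"]),
        "buffer_size": rng.choice(SEARCH_SPACE["buffer_size"]),
        "learning_rate": math.exp(rng.uniform(math.log(lo), math.log(hi))),
    }

def run_training(config):
    """Hypothetical stand-in: a real version would launch an ml-agents
    training run with `config` and return mean cumulative reward.
    Here it returns a dummy score purely for illustration."""
    return random.random()

def random_search(n_trials=10, seed=0):
    """Run n_trials and return the best configuration and its score."""
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = run_training(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score

best_config, best_score = random_search()
print("best config:", best_config)
```

Real search tools add parallel trial execution and early stopping of poor runs, but the sample-train-compare loop above is the core idea.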

Hi all, as mentioned above, the effects of hyperparameters can often be complex, and intuition unfortunately only goes so far. That being said, we are always working to improve our documentation, so if there is some aspect of our algorithm which might be clearer in a way that would benefit end-users, we'd be happy to explain it better.

@mbaske This project seems really great. For reasons that you know all too well, something like the project you are working on would likely benefit many users who want to quickly do many experiments and find the best results. If you are interested, I would be happy to chat some about how it might be possible to adapt what you are working on to provide hyperparameter search for a wider group of our users. We are currently working to build out a more robust/scalable training architecture, and a search feature like yours could fit in nicely.

Thanks Arthur, I'd be happy to contribute. Let me know how I can help. Email is probably best, since I'm 9 time zones away from you guys.

Thank you for the discussion. We are closing this issue due to inactivity. Feel free to reopen it if you'd like to continue the discussion though.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
