Stable-baselines: Regularization in baselines

Created on 18 Aug 2020 · 8 Comments · Source: hill-a/stable-baselines

How can I do regularization (such as L1/L2 or dropout) in baselines?

duplicate question

All 8 comments

Duplicate of #817 and #403.

See the docs on custom policies. For L1/L2 ("weight decay") regularization you may need to modify the loss function, and that has to be done manually in the algorithm's code.

Thx Miffyli! Could you point me to the place in the code where I can add regularization? I'm using the MlpLstmPolicy specifically.

Something like this could do the trick, which you then add to the loss. Losses are computed in the algorithm, e.g. PPO2 here. You may close the issue if there are no other bugs/issues to raise related to stable-baselines.
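As a rough illustration of "compute a penalty, then add it to the loss", here is a minimal plain-Python sketch. The function names (`l1_penalty`, `l2_penalty`, `regularized_loss`) and the coefficients are hypothetical and not part of stable-baselines. In the actual TF1 code the equivalent ingredients would most likely be `tf.trainable_variables()` and `tf.nn.l2_loss`, applied to the loss tensor built inside the algorithm.

```python
# Illustrative only: nested lists stand in for the policy's trainable weights.

def l1_penalty(weights):
    """Sum of absolute values over all weight entries."""
    return sum(abs(w) for layer in weights for w in layer)


def l2_penalty(weights):
    """Half the sum of squares (the same convention as tf.nn.l2_loss)."""
    return 0.5 * sum(w * w for layer in weights for w in layer)


def regularized_loss(base_loss, weights, l1_coef=0.0, l2_coef=0.0):
    """Add weighted L1/L2 penalties to an already computed loss value."""
    return base_loss + l1_coef * l1_penalty(weights) + l2_coef * l2_penalty(weights)


# Hypothetical weights for two layers and a made-up base loss.
weights = [[0.5, -1.0], [2.0]]
total = regularized_loss(base_loss=1.0, weights=weights, l1_coef=0.01, l2_coef=0.1)
```

Biases are often left out of the penalty, so in practice you may want to filter the variable list before summing.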

A good reference for this is the coinrun repository (https://github.com/openai/coinrun). Maybe it would be easy to introduce into the master code, with few side effects.

https://github.com/openai/coinrun/blob/523704f3a203dcaad84caf96ea92799452dc902f/coinrun/ppo2.py#L105

One thing I forgot to mention: This is much easier in the PyTorch version of stable-baselines, where you can add L2 regularization via the weight_decay parameter of the optimizers. Note to self: We should probably expose this there.
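To make concrete what a weight_decay setting does, here is a small plain-Python sketch of one SGD step. It assumes the behaviour of `torch.optim.SGD`, where `weight_decay * w` is added to each parameter's gradient, which matches an L2 penalty of `0.5 * weight_decay * w**2`. The `sgd_step` helper is hypothetical and only mimics that rule; it is not a PyTorch or stable-baselines function.

```python
def sgd_step(weights, grads, lr, weight_decay=0.0):
    """One plain SGD update; weight_decay adds weight_decay * w to each gradient."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


weights = [1.0, -2.0]
grads = [0.0, 0.0]  # zero task gradient isolates the effect of the decay term

# With no task gradient, each weight shrinks toward zero by lr * weight_decay * w.
decayed = sgd_step(weights, grads, lr=0.1, weight_decay=0.5)
```

With a real PyTorch optimizer the same effect comes from passing `weight_decay` when constructing it, so no change to the loss code is needed.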

Thx! One question: is stable-baselines moving to stable-baselines3 or is stable-baselines3 just a PyTorch version of this repo?

Our main focus is now on stable-baselines3, and we plan to include mostly bug fixes and small adjustments in this library. This one will continue to exist though, and we do not intend to abandon it completely, at least not until the underlying libraries break (i.e. support for TF1.x ends).

Got it!
