Stable-baselines: A2C hyperparameters for MuJoCo

Created on 28 Mar 2019 · 6 Comments · Source: hill-a/stable-baselines

Hi,

I was wondering whether you have A2C hyperparameters for MuJoCo that reproduce results close to those in the PPO paper [PPO paper, Figure 3, results for A2C], or any A2C hyperparameters that work for MuJoCo?

Thanks.

question

rasoolfa

All 6 comments

Hello,

Please wait a bit or use the gail-test branch (see PR #206), which will be merged into master soon.
The master branch has a tricky bug in A2C with continuous actions; fortunately, it is easy to fix (see https://github.com/hill-a/stable-baselines/pull/206/commits/689afd16f5b07d2fead1fa5e8474a8efa2826a64 for the fix).

For the hyperparameters, I recommend taking a look at the rl baselines zoo on the add-trpo branch. It has hyperparameters for PyBullet envs, which are similar to the MuJoCo ones and a bit harder.
From what I remember, the default hyperparameters were working quite well for A2C.

EDIT: it seems that A2C needs some hyperparameter tuning for MuJoCo (I'm currently running some experiments)
EDIT: the branch is now merged with master ;)

araffin on 28 Mar 2019

Hi,

Thanks.
It would be very helpful if you could share the A2C hyperparameters when you have them. It seems A2C needs different hyperparameters for MuJoCo than for Atari.
Thanks again for your help.

rasoolfa on 28 Mar 2019

Hey,
here are the best hyperparameters found so far (using the add-trpo branch of the rl baselines zoo) with stable-baselines v2.5.0 (please upgrade ;)):

HalfCheetahBulletEnv-v0:
  normalize: true
  n_envs: 8
  n_timesteps: !!float 2e6
  policy: 'MlpPolicy'
  ent_coef: 0.0
  n_steps: 32
  vf_coef: 0.5
  lr_schedule: 'linear'
  gamma: 0.99
  learning_rate: 0.0013

araffin on 29 Mar 2019
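
For readers who want to try the config above outside the rl baselines zoo, here is a minimal Python sketch of how the keys might map onto stable-baselines v2. It is not the zoo's actual training script: `ZOO_CONFIG`, `RUN_KEYS`, `split_config` and `train` are hypothetical names, and the choices of `DummyVecEnv` and of `VecNormalize` with default settings are assumptions. The sketch treats `normalize`, `n_envs`, `n_timesteps` and `policy` as run-level settings and passes the remaining keys to the `A2C` constructor.

```python
# Hyperparameters copied from the comment above.
ZOO_CONFIG = {
    "normalize": True,
    "n_envs": 8,
    "n_timesteps": 2e6,
    "policy": "MlpPolicy",
    "ent_coef": 0.0,
    "n_steps": 32,
    "vf_coef": 0.5,
    "lr_schedule": "linear",
    "gamma": 0.99,
    "learning_rate": 0.0013,
}

# Keys assumed to be consumed by the training script rather than by A2C itself.
RUN_KEYS = ("normalize", "n_envs", "n_timesteps", "policy")


def split_config(config):
    """Return (run_settings, a2c_kwargs) from a zoo-style config dict."""
    run_settings = {key: config[key] for key in RUN_KEYS}
    a2c_kwargs = {key: value for key, value in config.items() if key not in RUN_KEYS}
    return run_settings, a2c_kwargs


def train(config, env_id="HalfCheetahBulletEnv-v0"):
    """Train A2C on env_id; requires stable-baselines v2, gym and pybullet."""
    import gym
    import pybullet_envs  # noqa: F401  (importing registers the Bullet envs with gym)
    from stable_baselines import A2C
    from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

    run, a2c_kwargs = split_config(config)
    # One environment copy per worker, all stepped in the same process.
    env = DummyVecEnv([lambda: gym.make(env_id) for _ in range(run["n_envs"])])
    if run["normalize"]:
        # Default normalization settings; an assumption, not taken from the thread.
        env = VecNormalize(env)
    model = A2C(run["policy"], env, verbose=1, **a2c_kwargs)
    model.learn(total_timesteps=int(run["n_timesteps"]))
    return model
```

Calling `train(ZOO_CONFIG)` would start the 2e6-step run; `split_config` can be checked on its own without any RL dependencies installed.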

👍2
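
To put the numbers in the config into perspective, the short Python sketch below works out how many transitions each A2C update sees and what a linear learning-rate schedule implies. It assumes that `lr_schedule: 'linear'` means the learning rate decays linearly from `learning_rate` to zero over the whole run; `linear_lr` is a hypothetical helper, not a library function.

```python
N_ENVS = 8
N_STEPS = 32
N_TIMESTEPS = int(2e6)
INITIAL_LR = 0.0013

# Each update consumes one rollout of n_steps transitions from every environment.
transitions_per_update = N_ENVS * N_STEPS
n_updates = N_TIMESTEPS // transitions_per_update


def linear_lr(progress, initial_lr=INITIAL_LR):
    """Learning rate at a fraction of training (0.0 = start, 1.0 = end).

    Assumes a linear decay to zero, as described in the lead-in.
    """
    return initial_lr * (1.0 - progress)


print(transitions_per_update)  # 256
print(n_updates)               # 7812
print(linear_lr(0.5))          # 0.00065
```

With 256 transitions per update, the batches are much larger than with a 5-step rollout, which is one plausible reason the tuned settings differ from Atari-style ones.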

Thanks a lot. I really appreciate the update.

rasoolfa on 29 Mar 2019