Ray: Details about the hyperparameters in the PPO algorithm?

Created on 6 Apr 2020 · 2 comments · Source: ray-project/ray

Hi, I want to tune the hyperparameters for the PPO algorithm, but I ran into difficulties when reading the docs about the configs, so I'd like to ask you here:

  1. What is the format of lr_schedule in the PPO algorithm? Suppose my starting learning rate is 'lr': 1e-4 and I want to decay it to 0 during training.
  2. Is it possible to set the hidden layer sizes in the PPO algorithm? If yes, what is the corresponding config? I didn't find it in the documentation (I found such a config in the SAC documentation but not in PPO).

Thank you very much guys! I really appreciate your help 😄

question

Most helpful comment

Yeah, sorry, it's not clearly documented. Here are the answers; we'll add this to the docs.
1) You are basically configuring a PiecewiseSchedule.
So lr_schedule: [[0, 0.01], [1000, 0.0005]] means the learning rate decays linearly from lr=0.01 at ts=0 to lr=0.0005 at ts=1000; after ts=1000 it stays at 0.0005. The config key "lr" is ignored when lr_schedule is set.
2) You can set, e.g., config["model"]["fcnet_hiddens"] = [16, 32, 64], and change the activation via config["model"]["fcnet_activation"] ("tanh", "relu", or "linear").
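For concreteness, here is a minimal sketch that puts both answers together for the setup from the question (start at lr=1e-4, decay to 0), assuming the RLlib API around the time of this issue (Ray 0.8.x). The environment, timestep budget, and use of tune.run are illustrative assumptions, not part of the original answer:

```python
import ray
from ray import tune

ray.init()

tune.run(
    "PPO",
    stop={"timesteps_total": 200000},  # hypothetical training budget
    config={
        "env": "CartPole-v0",  # placeholder environment
        # Linearly decay the learning rate from 1e-4 at ts=0 to 0 at
        # ts=200000; it stays at 0 afterwards. "lr" is ignored once
        # "lr_schedule" is set.
        "lr_schedule": [[0, 1e-4], [200000, 0.0]],
        # Hidden layer sizes and activation of the default fully
        # connected network.
        "model": {
            "fcnet_hiddens": [16, 32, 64],
            "fcnet_activation": "tanh",
        },
    },
)
```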

All 2 comments


Yeah, sorry, it's not clearly documented. Here are the answers; we'll add this to the docs.

  1. You are basically configuring a PiecewiseSchedule.
    So lr_schedule: [[0, 0.01], [1000, 0.0005]] means the learning rate decays linearly from lr=0.01 at ts=0 to lr=0.0005 at ts=1000; after ts=1000 it stays at 0.0005. The config key "lr" is ignored when lr_schedule is set.
  2. You can set, e.g., config["model"]["fcnet_hiddens"] = [16, 32, 64], and change the activation via config["model"]["fcnet_activation"] ("tanh", "relu", or "linear").
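To make the schedule semantics concrete, here is a small standalone sketch of the linear interpolation described in point 1. This only illustrates the behavior; it is not RLlib's actual PiecewiseSchedule implementation:

```python
def lr_at(ts, start_ts=0, start_lr=0.01, end_ts=1000, end_lr=0.0005):
    """Learning rate at timestep ts for lr_schedule [[0, 0.01], [1000, 0.0005]]."""
    if ts <= start_ts:
        return start_lr
    if ts >= end_ts:
        # Held constant at the last endpoint after the schedule ends.
        return end_lr
    # Linear interpolation between the two schedule endpoints.
    frac = (ts - start_ts) / (end_ts - start_ts)
    return start_lr + frac * (end_lr - start_lr)

print(lr_at(0))     # 0.01
print(lr_at(500))   # 0.00525 -- halfway between 0.01 and 0.0005
print(lr_at(2000))  # 0.0005  -- stays at 0.0005 after ts=1000
```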

Thank you so much for your help!!! It helps me a lot with my project 😄
