Stable-baselines: PPO2 learning rate schedule

Created on 14 Oct 2019 · 7 Comments · Source: hill-a/stable-baselines

Hi,

I am having trouble finding examples that use a learning rate schedule with the PPO2 algorithm, although it seems to be possible:

https://github.com/hill-a/stable-baselines/blob/4a5f8d886953a94e7b0a0433e6fbe147fd11163a/stable_baselines/ppo2/ppo2.py#L309

Would you have any hints on how to use it, or an existing example?

Many thanks in advance

RTFM question

All 7 comments

Hello,

two things:

  • as written in the documentation: "learning_rate – (float or callable) The learning rate, it can be a function". It is a function of the progress (from 0 to 1) and returns the learning rate
  • you can take a look at the RL zoo for a concrete example

EDIT: the docstring is a bit better for SAC which uses the exact same mechanism
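
For readers looking for the shape of such a function, here is a minimal sketch in the spirit of the RL zoo helper. The name `linear_schedule` and the value `2.5e-4` are illustrative, not taken from this thread. It assumes the argument is the remaining fraction of training (1 at the first update, falling toward 0 at the last), which is how the SAC docstring phrases it; check the direction against your installed version.

```python
def linear_schedule(initial_value):
    """Return a callable that maps training progress to a learning rate."""
    initial_value = float(initial_value)

    def schedule(progress):
        # Assumption: progress is the remaining fraction of training,
        # 1.0 at the first update and close to 0.0 at the last one.
        return progress * initial_value

    return schedule


lr_fn = linear_schedule(2.5e-4)
print(lr_fn(1.0))  # learning rate at the start of training
print(lr_fn(0.1))  # learning rate near the end of training

# Assumed usage (requires stable-baselines, not executed here):
# model = PPO2("MlpPolicy", env, learning_rate=linear_schedule(2.5e-4))
```

The point is only that `learning_rate` receives a plain function, not a schedule object.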

Thank you :)

I am trying to apply a linear schedule to the learning rate, but I am receiving an AssertionError.

I couldn't find any examples using schedules in the documentation. If there are some examples, please point me to them.

An example of the code is shown below:

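# Imports assumed from the stable-baselines 2.x package layout
from stable_baselines import PPO2
from stable_baselines.common.policies import register_policy
from stable_baselines.common.schedules import LinearSchedule
from stable_baselines.common.vec_env import DummyVecEnv

# Env_Tester and CustomPolicyDetailed are defined elsewhere in the script

if __name__ == '__main__':

    tensorboard_log_location = '.\\tensorboard\\'

    # Register the policy, it will check that the name is not already taken
    register_policy('CustomPolicy', CustomPolicyDetailed)

    env = Env_Tester()
    env = DummyVecEnv([lambda: env])

    TIMESTEPS = 1000000
    sched_LR = LinearSchedule(TIMESTEPS, 0.005, 0.00001)

    model = PPO2(policy='CustomPolicy',
                 env=env,
                 verbose=1,
                 vf_coef=1.0,
                 noptepochs=5,
                 ent_coef=0.005,
                 learning_rate=sched_LR,  # the schedule object itself is passed here
                 tensorboard_log=tensorboard_log_location,
                 n_steps=8192,
                 nminibatches=128)

    model.learn(total_timesteps=TIMESTEPS)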

I am receiving the following error:

  File "C:\Users\xxx\source\repos\Stable_Baseline_testing\PPO2_Single.py", line 101, in <module>
    model.learn(total_timesteps=1000000)
  File "F:\anaconda3\envs\envTensorflow\lib\site-packages\stable_baselines\ppo2\ppo2.py", line 309, in learn
    self.learning_rate = get_schedule_fn(self.learning_rate)
  File "F:\anaconda3\envs\envTensorflow\lib\site-packages\stable_baselines\ppo2\ppo2.py", line 531, in get_schedule_fn
    assert callable(value_schedule)
AssertionError

Current versions:
Python == 3.6
Tensorflow == 1.14
stable-baselines == 2.9.0

As mentioned above, please take a look at the RL zoo.
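
The traceback shows where this goes wrong: `get_schedule_fn` asserts that the value it receives is callable. The sketch below is an approximation reconstructed from the traceback, not the library source. The branch for plain numbers is an assumption, and `ScheduleObject` is a hypothetical stand-in for an object that has a `.value()` method but cannot be called itself.

```python
def get_schedule_fn_sketch(value_schedule):
    """Approximation of the check seen in the traceback (not the library source)."""
    if isinstance(value_schedule, (float, int)):
        constant = float(value_schedule)
        # Assumed handling of plain numbers: wrap them in a constant function.
        return lambda _progress: constant
    # This is the assertion that fails in the traceback.
    assert callable(value_schedule)
    return value_schedule


class ScheduleObject:
    """Hypothetical stand-in: exposes .value() but defines no __call__."""

    def value(self, step):
        return 0.001


sched = ScheduleObject()
print(callable(sched))        # False: the object itself cannot be called
print(callable(sched.value))  # True: its bound method can

get_schedule_fn_sketch(sched.value)   # accepted
try:
    get_schedule_fn_sketch(sched)     # rejected, like the error in this thread
except AssertionError:
    print("AssertionError")
```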

You need to pass learning_rate = sched_LR.value to PPO2.
For example, this works for me:

model = PPO2(policy='CustomPolicy',
             env=env,
             verbose=1,
             vf_coef=1.0,
             noptepochs=5,
             ent_coef=0.005,
             learning_rate=sched_LR.value,
             tensorboard_log=tensorboard_log_location,
             n_steps=8192,
             nminibatches=128)
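
One caveat, hedged because it rests on library internals not shown in this thread: if PPO2 calls the schedule with a progress fraction while `LinearSchedule.value` interpolates on a step count, passing `.value` directly avoids the assertion but leaves the rate almost constant. The sketch below uses a stand-in class, `LinearScheduleSketch`, that assumes this behavior, and a hypothetical adapter, `as_progress_fn`, that converts the remaining fraction back into a step count. Also note that if the constructor takes `final_p` before `initial_p`, as the keyword names used in the DDPG snippet in this thread suggest, then `LinearSchedule(TIMESTEPS, 0.005, 0.00001)` would ramp the rate up, not down; keyword arguments avoid that ambiguity.

```python
class LinearScheduleSketch:
    """Stand-in assumed to mirror LinearSchedule: value(step) works on step counts."""

    def __init__(self, schedule_timesteps, final_p, initial_p=1.0):
        self.schedule_timesteps = schedule_timesteps
        self.final_p = final_p
        self.initial_p = initial_p

    def value(self, step):
        # Fraction of the schedule completed, capped at 1.0.
        fraction = min(float(step) / self.schedule_timesteps, 1.0)
        return self.initial_p + fraction * (self.final_p - self.initial_p)


def as_progress_fn(schedule, total_timesteps):
    """Hypothetical adapter: remaining fraction (1 -> 0) to a step-based schedule."""

    def lr_fn(progress_remaining):
        # Convert the remaining fraction into an elapsed step count.
        step = (1.0 - progress_remaining) * total_timesteps
        return schedule.value(step)

    return lr_fn


sched = LinearScheduleSketch(1000000, final_p=1e-5, initial_p=5e-3)
lr_fn = as_progress_fn(sched, total_timesteps=1000000)

# Called with a fraction, .value treats it as "step 1 of 1,000,000":
print(sched.value(1.0))
# The adapter spans the full range from initial_p to final_p:
print(lr_fn(1.0), lr_fn(0.5), lr_fn(0.0))
```

Under these assumptions, `learning_rate=lr_fn` would decay as intended, whereas `learning_rate=sched.value` would stay near `initial_p` for the whole run.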

I can't get a learning rate schedule to work for DDPG either. I couldn't find examples in the documentation or in the RL zoo.

lr_schedule = LinearSchedule(total_steps, final_p=1e-4, initial_p=1e-2)
self.agent = DDPG(MlpPolicy, env,
                  actor_lr=lr_schedule.value,
                  critic_lr=lr_schedule.value)

gives me TypeError: unsupported operand type(s) for *: 'method' and 'float'; also when just passing lr_schedule without .value.
This makes sense, since the docs say it expects a float. But how would I use a learning rate schedule then?

The learning rate schedule is not available for all algorithms (as you mentioned, the docs list actor_lr and critic_lr as floats for DDPG). I would recommend using SB3 (https://github.com/DLR-RM/stable-baselines3), which is more consistent and has that feature for all algorithms (or using TD3, which is an improved DDPG).
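
As a rough sketch of what that could look like in SB3, where `learning_rate` may be a function of the remaining progress (1 at the start, 0 at the end): the helper name `linear_interp_schedule` is illustrative, and the endpoints reuse the 1e-2 and 1e-4 values from the DDPG snippet above. The TD3 call is left as a comment because it needs SB3 and an environment.

```python
def linear_interp_schedule(initial_lr, final_lr):
    """Return a callable interpolating from initial_lr down to final_lr."""

    def schedule(progress_remaining):
        # progress_remaining goes from 1.0 (start) to 0.0 (end of training).
        return final_lr + progress_remaining * (initial_lr - final_lr)

    return schedule


lr_fn = linear_interp_schedule(1e-2, 1e-4)
print(lr_fn(1.0))  # rate at the start of training
print(lr_fn(0.0))  # rate at the end of training

# Assumed usage (requires stable-baselines3, not executed here):
# from stable_baselines3 import TD3
# model = TD3("MlpPolicy", env, learning_rate=lr_fn)
```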
