Hi,
I'm having trouble finding examples that use a learning rate schedule with the PPO2 algorithm, although it seems to be supported.
Do you have any hints on how to use it, or an existing example?
Many thanks in advance.
Hello,
two things:
EDIT: the docstring is a bit better for SAC which uses the exact same mechanism
Thank you :)
I am trying to apply a LinearSchedule to the learning rate, but I am receiving an AssertionError.
I couldn't find any examples using schedules in the documentation. If there are some examples, please point me to them.
An example of the code is shown below:
if __name__ == '__main__':
    tensorboard_log_location = '.\\tensorboard\\'
    # Register the policy, it will check that the name is not already taken
    register_policy('CustomPolicy', CustomPolicyDetailed)
    env = Env_Tester()
    env = DummyVecEnv([lambda: env])
    tensorboard_log_location = '.\\tensorboard\\'
    TIMESTEPS = 1000000
    sched_LR = LinearSchedule(TIMESTEPS, 0.005, 0.00001)
    model = PPO2(policy='CustomPolicy',
                 env=env,
                 verbose=1,
                 vf_coef=1.0,
                 noptepochs=5,
                 ent_coef=0.005,
                 learning_rate=sched_LR,
                 tensorboard_log=tensorboard_log_location,
                 n_steps=8192,
                 nminibatches=128)
    model.learn(total_timesteps=TIMESTEPS)
I am receiving the following error:
  File "C:\Users\xxx\source\repos\Stable_Baseline_testing\PPO2_Single.py", line 101, in <module>
    model.learn(total_timesteps=1000000)
  File "F:\anaconda3\envs\envTensorflow\lib\site-packages\stable_baselines\ppo2\ppo2.py", line 309, in learn
    self.learning_rate = get_schedule_fn(self.learning_rate)
  File "F:\anaconda3\envs\envTensorflow\lib\site-packages\stable_baselines\ppo2\ppo2.py", line 531, in get_schedule_fn
    assert callable(value_schedule)
AssertionError
Current versions:
Python == 3.6
Tensorflow == 1.14
stable-baselines == 2.9.0
As mentioned above, please take a look at the rl zoo.
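For context on the AssertionError above: the traceback shows get_schedule_fn asserting that the value is callable, and a schedule object that only exposes a .value(step) method is not itself callable. The snippet below is a rough, self-contained sketch of that check. StepSchedule is a hypothetical stand-in (not the stable-baselines class), and the float-wrapping branch of get_schedule_fn is my assumption about how constants are handled, not copied from the library.

```python
class StepSchedule:
    """Hypothetical stand-in for a schedule object that only exposes .value(step)."""

    def __init__(self, total_steps, initial_p, final_p):
        self.total_steps = total_steps
        self.initial_p = initial_p
        self.final_p = final_p

    def value(self, step):
        # Linear interpolation from initial_p to final_p over total_steps.
        fraction = min(float(step) / self.total_steps, 1.0)
        return self.initial_p + fraction * (self.final_p - self.initial_p)


def get_schedule_fn(value_schedule):
    """Sketch of the check seen in the traceback (the float branch is an assumption)."""
    if isinstance(value_schedule, (float, int)):
        constant = float(value_schedule)
        return lambda _progress: constant
    # This is the line that fails when the schedule object itself is passed.
    assert callable(value_schedule)
    return value_schedule


sched = StepSchedule(1000000, 0.005, 0.00001)
object_is_callable = callable(sched)        # the object has no __call__
method_is_callable = callable(sched.value)  # the bound method is callable
```

So passing the object trips the assertion, while passing a function (or a bound method) gets past it.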
You need to pass learning_rate = sched_LR.value to PPO2.
For example, this works for me:
model = PPO2(policy='CustomPolicy',
             env=env,
             verbose=1,
             vf_coef=1.0,
             noptepochs=5,
             ent_coef=0.005,
             learning_rate=sched_LR.value,
             tensorboard_log=tensorboard_log_location,
             n_steps=8192,
             nminibatches=128)
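One caveat, as far as I understand it from the rl zoo: PPO2 seems to call the learning_rate function with the remaining progress (1 at the start of training, 0 at the end), not with a timestep count. If that is right, a step-based .value method would pass the assertion but stay very close to the initial value for the whole run. Below is a minimal sketch of a progress-based schedule. linear_schedule is a hypothetical helper name modeled on the rl zoo, and I have not verified it against stable-baselines 2.9.0.

```python
def linear_schedule(initial_value, final_value=0.0):
    """Return a function mapping remaining progress (1 -> 0) to a learning rate."""

    def schedule(progress):
        # Assumption: progress is 1.0 at the start of training and 0.0 at the end.
        return final_value + progress * (initial_value - final_value)

    return schedule


lr_fn = linear_schedule(0.005, 0.00001)
start_lr = lr_fn(1.0)  # learning rate at the beginning of training
half_lr = lr_fn(0.5)   # learning rate halfway through
end_lr = lr_fn(0.0)    # learning rate at the end of training
```

With this, the constructor call would use learning_rate=linear_schedule(0.005, 0.00001) in place of sched_LR.value.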
I can't get a learning rate schedule to work for DDPG either. I couldn't find examples in the documentation nor in the RL zoo.
lr_schedule = LinearSchedule(total_steps, final_p=1e-4, initial_p=1e-2)
self.agent = DDPG(MlpPolicy, env,
                  actor_lr=lr_schedule.value,
                  critic_lr=lr_schedule.value
                  )
This gives me TypeError: unsupported operand type(s) for *: 'method' and 'float', and it also fails when I pass lr_schedule without .value.
This makes sense, since the docs say it expects a float. But how would I use a learning rate schedule then?
The learning rate schedule is not available for all algorithms (as you mentioned, in the docs actor_lr and critic_lr are floats for DDPG). I would recommend using SB3 (https://github.com/DLR-RM/stable-baselines3), which is more consistent and has that feature for all algorithms (or using TD3, which is an improved DDPG).