I'm trying to change the learning rate of my policies inside the training loop:

```python
while True:
    rest = trainer.train()
    # here I want to change the learning rate based on environment statistics
```

I tried to use the `reset_config` function, but it doesn't work:
```python
def gen_policy(GENV, lr):
    config = {
        "model": {
            "custom_model": 'GomokuModel',
            "custom_options": {"use_symmetry": True, "reg_loss": 0},
        },
        "cur_lr": lr,
        "custom_action_dist": Categorical,
    }
    return (None, GENV.observation_space, GENV.action_space, config)

new_config = trainer.get_config()
new_config['multi_agent']['policy_0'] = gen_policy(GENV, 0.0123)
resss = trainer.reset_config(new_config)
```
Hi, can you try this? I've been able to change the LR of a PPO policy in this manner. Keep in mind that the policy you're using almost certainly has an LR schedule, so that's why I end up setting the schedule with the value that I want.
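A minimal sketch of that idea, assuming a policy-generation function like the one above (`GENV` is the same placeholder as in the question, and the LR value is arbitrary; RLlib's `lr_schedule` entries are `[timestep, lr]` pairs, so a single entry at timestep 0 pins the learning rate):

```python
def gen_policy(GENV, lr=0.005):
    config = {
        "lr": lr,
        # a one-entry schedule keeps the learning rate constant at `lr`
        "lr_schedule": [[0, lr]],
    }
    return (None, GENV.observation_space, GENV.action_space, config)
```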
Thank you very much, @kiddyboots216! Your idea of changing the learning-rate schedule was very helpful. I changed the policy generation function:
```python
def gen_policy(GENV, lr=0.005):
    config = {
        "model": {
            "custom_model": 'GomokuModel',
            "custom_options": {"use_symmetry": True, "reg_loss": 0},
        },
        "cur_lr": lr,
        "lr_schedule": [[lr]],  # single-entry schedule keeps the LR constant
        "custom_action_dist": Categorical,
    }
    return (None, GENV.observation_space, GENV.action_space, config)
```
and in the loop, after each training iteration, I do:
```python
new_config = trainer.get_config()
new_config['multiagent']['policies']['policy_0'] = gm.gen_policy(GENV, lr=0.0123)
trainer._setup(new_config)
```
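Put together, the loop looks roughly like this (the reward threshold is only a placeholder for whatever environment statistic you actually want to key off):

```python
while True:
    rest = trainer.train()
    # placeholder condition: swap in your own environment statistic
    if rest["episode_reward_mean"] > 0.9:
        new_config = trainer.get_config()
        new_config['multiagent']['policies']['policy_0'] = gm.gen_policy(GENV, lr=0.0123)
        trainer._setup(new_config)
```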
Strangely, even though during training:

```python
while True:
    rest = trainer.train()
    print("First policy learning rate={}".format(rest['info']['learner']['policy_0']['cur_lr']))
    print("Second policy learning rate={}".format(rest['info']['learner']['policy_1']['cur_lr']))
```
the learning rate that gets printed is exactly what I set, the actual learning rate somehow didn't change. I tested the approach on a 5-in-a-row game and set the learning rate of the first player to 0. I expected the second player to win almost all of the games (since the first player's play should remain random), but after a hundred iterations the first player had actually learned how to win (despite its learning rate being set to 0).
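One way to check what the optimizer actually uses is to read the LR variable straight off the local worker's policy; a sketch, assuming a TF policy and a Ray version where `trainer.workers` exists:

```python
# the value held in the worker's graph, as opposed to the number echoed
# back in rest['info']['learner']
policy = trainer.workers.local_worker().get_policy("policy_0")
print(policy.get_session().run(policy.cur_lr))
```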
As I only need to turn training on or off, I found a workaround:
```python
new_config = trainer.get_config()
new_config['multiagent']["policies_to_train"] = ["policy_0"]
trainer._setup(new_config)
```
And this actually works as expected.
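Wrapped in a small helper, the same trick toggles training per policy (a sketch; `set_trainable` is just a name chosen for illustration):

```python
def set_trainable(trainer, policy_ids):
    # rebuild the workers with a new list of trainable policies
    new_config = trainer.get_config()
    new_config['multiagent']['policies_to_train'] = policy_ids
    trainer._setup(new_config)

set_trainable(trainer, ["policy_0"])               # freeze policy_1
set_trainable(trainer, ["policy_0", "policy_1"])   # train both again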
I suppose the learning rate is set on each worker at worker-initialization time. Maybe each worker holds a TensorFlow graph that we can't change without re-initializing all the workers.
It seems the best option, if you want to change some hyper-parameters, is to do it this way:
```python
state = trainer.save()     # checkpoint the current weights
trainer.stop()             # stop the trainer and its workers
trainer.restore(state)     # restore from the checkpoint
```
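Spelled out, the pattern amounts to checkpointing, rebuilding the trainer with the new config, and restoring the weights; a sketch, where `PPOTrainer`, `"gomoku"` and `make_config` stand in for your own trainer class, environment, and config builder:

```python
from ray.rllib.agents.ppo import PPOTrainer

checkpoint = trainer.save()      # write weights and optimizer state to disk
trainer.stop()                   # tear down the old workers

# rebuild with the hyper-parameters you want, then continue from the checkpoint
trainer = PPOTrainer(env="gomoku", config=make_config(lr=0.0123))
trainer.restore(checkpoint)
```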