Ray: Cannot change learning rate after initialization

Created on 1 Sep 2019 · 5 Comments · Source: ray-project/ray

# System information

  • OS Platform and Distribution: Ubuntu 18.04
  • Ray installed from (source or binary): binary
  • Ray version: 0.7.3
  • Python version: 3.6.8
  • Exact command to reproduce:

I'm trying to change the learning rate of my policies inside the training loop:

```
while True:
    rest = trainer.train()
    # here I want to change the learning rate based on environment statistics
```

I tried to use the reset_config function, but it doesn't work:

```
def gen_policy(GENV, lr):
    config = {
        "model": {
            "custom_model": 'GomokuModel',
            "custom_options": {"use_symmetry": True, "reg_loss": 0},
        },
        "cur_lr": lr,
        "custom_action_dist": Categorical,
    }
    return (None, GENV.observation_space, GENV.action_space, config)
```

```
new_config = trainer.get_config()
new_config['multiagent']['policies']['policy_0'] = gen_policy(GENV, 0.0123)
resss = trainer.reset_config(new_config)
```

All 5 comments

Hi, can you try this? I've been able to change the LR of a PPO policy in this manner. Keep in mind that the policy you're using almost certainly has an LR schedule, so that's why I end up setting the schedule with the value that I want.
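(The snippet from this comment wasn't preserved in the thread; the block below is a minimal sketch of the idea, assuming Ray 0.7.x, where PPO's `lr_schedule` config option takes a list of `[timestep, value]` pairs.)

```
# Minimal sketch (not the original snippet): pin a PPO policy's learning
# rate by overriding its schedule. Assumes Ray 0.7.x, where "lr_schedule"
# is a list of [timestep, value] pairs.
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

new_lr = 0.0123
trainer = PPOTrainer(env="CartPole-v0", config={
    "lr": new_lr,
    # A single entry at t=0 keeps the LR constant at new_lr; without it,
    # an existing schedule can silently anneal the value you set.
    "lr_schedule": [[0, new_lr]],
})
```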

Thank you very much, @kiddyboots216! Your idea of changing the learning rate schedule was very helpful! I changed the policy-generation function:

```
def gen_policy(GENV, lr=0.005):
    config = {
        "model": {
            "custom_model": 'GomokuModel',
            "custom_options": {"use_symmetry": True, "reg_loss": 0},
        },
        "cur_lr": lr,
        # lr_schedule entries are [timestep, value] pairs; a single
        # entry at t=0 keeps the learning rate constant at lr
        "lr_schedule": [[0, lr]],
        "custom_action_dist": Categorical,
    }
    return (None, GENV.observation_space, GENV.action_space, config)
```

and in the loop, after each training iteration, I do:

```
new_config = trainer.get_config()
new_config['multiagent']['policies']['policy_0'] = gm.gen_policy(GENV, lr=0.0123)
trainer._setup(new_config)
```

Strangely, even though during training

```
while True:
    rest = trainer.train()
    print("First policy learning rate={}".format(rest['info']['learner']['policy_0']['cur_lr']))
    print("Second policy learning rate={}".format(rest['info']['learner']['policy_1']['cur_lr']))
```

the learning rate that is printed is exactly what I want it to be, the actual learning rate somehow didn't change. I tested the approach on a 5-in-a-row game and set the learning rate of the first player to 0. I expected the second player to win almost all of the games (as the first player's play would remain random), but after a hundred iterations the first player had actually learned how to win (despite its learning rate being set to 0).
As I only need to turn training on or off, I found a workaround:

```
new_config = trainer.get_config()
new_config['multiagent']["policies_to_train"] = ["policy_0"]
trainer._setup(new_config)
```

And this actually works as expected.
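For convenience, the same workaround can be wrapped in a small helper (a sketch; `set_trainable_policies` is just an illustrative name, not an RLlib API):

```
# Rebuild the trainer's workers with a restricted "policies_to_train"
# list. Note that _setup is a private API (Ray 0.7.x) and re-creates
# the workers, so this is a relatively heavy operation.
def set_trainable_policies(trainer, policy_ids):
    new_config = trainer.get_config()
    new_config["multiagent"]["policies_to_train"] = policy_ids
    trainer._setup(new_config)

# Freeze policy_1 (only policy_0 keeps learning), then unfreeze it later:
set_trainable_policies(trainer, ["policy_0"])
# ... train some iterations ...
set_trainable_policies(trainer, ["policy_0", "policy_1"])
```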

I suppose the learning rate is set for each worker at worker-initialization time. Perhaps each worker holds a TensorFlow graph that we can't change without reinitializing all the workers.

It seems the best approach, if you want to change some hyperparameters, is the following:
```
state = trainer.save()
trainer.stop()

# re-initialize the trainer here with the new hyperparameters

trainer.restore(state)
```
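Fleshed out, the full cycle might look like this (a sketch assuming a PPOTrainer; `make_trainer` is a hypothetical factory that builds a fresh trainer from a config dict):

```
from ray.rllib.agents.ppo import PPOTrainer

def make_trainer(config):
    # hypothetical helper: build a fresh trainer from a config dict
    return PPOTrainer(env="CartPole-v0", config=config)

trainer = make_trainer({"lr": 0.005})
# ... train for a while ...

state = trainer.save()    # persist weights and optimizer state
trainer.stop()            # tear down the old workers

# Re-create the trainer with the new hyperparameters, then restore.
trainer = make_trainer({"lr": 0.0123, "lr_schedule": [[0, 0.0123]]})
trainer.restore(state)
```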
