I'm trying to change the learning rate of my policies inside the training loop:

```python
while True:
    rest = trainer.train()
    # here I want to change the learning rate based on environment statistics
```

I tried to use the `reset_config` function, but it doesn't work:
```python
def gen_policy(GENV, lr):
    config = {
        "model": {
            "custom_model": 'GomokuModel',
            "custom_options": {"use_symmetry": True, "reg_loss": 0},
        },
        "cur_lr": lr,
        "custom_action_dist": Categorical,
    }
    return (None, GENV.observation_space, GENV.action_space, config)

new_config = trainer.get_config()
new_config['multi_agent']['policy_0'] = gen_policy(GENV, 0.0123)
resss = trainer.reset_config(new_config)
```
Hi, can you try this? I've been able to change the LR of a PPO policy in this manner. Keep in mind that the policy you're using almost certainly has an LR schedule, so that's why I end up setting the schedule with the value that I want.
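A minimal sketch of that idea, assuming a policy-generation function like the one above (`GENV` is the same placeholder as in the question, and the LR value is arbitrary; RLlib's `lr_schedule` entries are `[timestep, lr]` pairs, so a single entry at timestep 0 pins the learning rate):

```python
def gen_policy(GENV, lr=0.005):
    config = {
        "lr": lr,
        # a one-entry schedule keeps the learning rate constant at `lr`
        "lr_schedule": [[0, lr]],
    }
    return (None, GENV.observation_space, GENV.action_space, config)
```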
Thank you very much, @kiddyboots216! Your idea of changing the learning-rate schedule was very helpful. I changed the policy generation function:
```python
def gen_policy(GENV, lr=0.005):
    config = {
        "model": {
            "custom_model": 'GomokuModel',
            "custom_options": {"use_symmetry": True, "reg_loss": 0},
        },
        "cur_lr": lr,
        "lr_schedule": [[lr]],  # single-entry schedule keeps the LR constant
        "custom_action_dist": Categorical,
    }
    return (None, GENV.observation_space, GENV.action_space, config)
```
and in the loop, after each training iteration, I do:
```python
new_config = trainer.get_config()
new_config['multiagent']['policies']['policy_0'] = gm.gen_policy(GENV, lr=0.0123)
trainer._setup(new_config)
```
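Put together, the loop looks roughly like this (the reward threshold is only a placeholder for whatever environment statistic you actually want to key off):

```python
while True:
    rest = trainer.train()
    # placeholder condition: swap in your own environment statistic
    if rest["episode_reward_mean"] > 0.9:
        new_config = trainer.get_config()
        new_config['multiagent']['policies']['policy_0'] = gm.gen_policy(GENV, lr=0.0123)
        trainer._setup(new_config)
```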
Strangely, even though during training:

```python
while True:
    rest = trainer.train()
    print("First policy learning rate={}".format(rest['info']['learner']['policy_0']['cur_lr']))
    print("Second policy learning rate={}".format(rest['info']['learner']['policy_1']['cur_lr']))
```
the learning rate that gets printed is exactly what I set, the actual learning rate somehow didn't change. I tested the approach on a 5-in-a-row game and set the learning rate of the first player to 0. I expected the second player to win almost all of the games (since the first player's play should remain random), but after a hundred iterations the first player had actually learned how to win (despite its learning rate being set to 0).
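One way to check what the optimizer actually uses is to read the LR variable straight off the local worker's policy; a sketch, assuming a TF policy and a Ray version where `trainer.workers` exists:

```python
# the value held in the worker's graph, as opposed to the number echoed
# back in rest['info']['learner']
policy = trainer.workers.local_worker().get_policy("policy_0")
print(policy.get_session().run(policy.cur_lr))
```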
As I only need to turn training on or off, I found a workaround:
```python
new_config = trainer.get_config()
new_config['multiagent']["policies_to_train"] = ["policy_0"]
trainer._setup(new_config)
```
And this actually works as expected.
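Wrapped in a small helper, the same trick toggles training per policy (a sketch; `set_trainable` is just a name chosen for illustration):

```python
def set_trainable(trainer, policy_ids):
    # rebuild the workers with a new list of trainable policies
    new_config = trainer.get_config()
    new_config['multiagent']['policies_to_train'] = policy_ids
    trainer._setup(new_config)

set_trainable(trainer, ["policy_0"])               # freeze policy_1
set_trainable(trainer, ["policy_0", "policy_1"])   # train both again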
I suppose the learning rate is set on each worker at worker-initialization time. Maybe each worker holds a TensorFlow graph that we can't change without re-initializing all the workers.
It seems the best option, if you want to change some hyper-parameters, is to do it this way:
```python
state = trainer.save()     # checkpoint the current weights
trainer.stop()             # stop the trainer and its workers
trainer.restore(state)     # restore from the checkpoint
```
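Spelled out, the pattern amounts to checkpointing, rebuilding the trainer with the new config, and restoring the weights; a sketch, where `PPOTrainer`, `"gomoku"` and `make_config` stand in for your own trainer class, environment, and config builder:

```python
from ray.rllib.agents.ppo import PPOTrainer

checkpoint = trainer.save()      # write weights and optimizer state to disk
trainer.stop()                   # tear down the old workers

# rebuild with the hyper-parameters you want, then continue from the checkpoint
trainer = PPOTrainer(env="gomoku", config=make_config(lr=0.0123))
trainer.restore(checkpoint)
```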