Stable-baselines: Continuing training on a previously trained model

Created on 4 Dec 2019 · 11 Comments · Source: hill-a/stable-baselines

Hi, I have trained an agent using PPO2 for 10000 steps and saved the model. I feel that the model can be improved by letting it train for more episodes, so I want to load this model and continue training from where it stopped after those 10000 steps. I have gone through the documentation but could not find anything related to this. Is there a feature for this in stable-baselines currently?

RTFM question

venkatesh-chinni

Most helpful comment

I have tried with

model = PPO2(CustomPolicy,env,gamma=1, n_steps=132, ent_coef=0.01,
             learning_rate=2.5e-4, vf_coef=0.5, max_grad_norm=0.5, lam=0.95,
             nminibatches=4, noptepochs=4, cliprange=0.2, cliprange_vf=None,
             verbose=0, tensorboard_log="./03_12_2019_logs/", _init_setup_model=True,
             policy_kwargs=None, full_tensorboard_log=False)

model.learn(total_timesteps=10000,callback=None, seed=None,
            log_interval=1, tb_log_name="Logs", reset_num_timesteps=False)

model.save("agent_03_12_2019")

model = PPO2.load("agent_03_12_2019")

n_cpu = 8
env = DummyVecEnv([lambda: AHUenv() for i in range(n_cpu)])
model.set_env(env)


model.learn(total_timesteps=2000000,callback=None, seed=None,
            log_interval=1, tb_log_name="Logs", reset_num_timesteps=False)

model.save("agent_03_12_2019_continued_training_1")

With this, the continued training runs, but the TensorBoard graphs are not being updated. I have manually set reset_num_timesteps to False, but the TensorBoard graphs are still not updated.

venkatesh-chinni on 28 Dec 2019

👍4

All 11 comments

Please read the documentation more carefully ;)
You have an example of what you are looking for here: https://stable-baselines.readthedocs.io/en/master/guide/examples.html and here: https://stable-baselines.readthedocs.io/en/master/guide/examples.html#continual-learning

araffin on 4 Dec 2019

I am training a model and saving it

model = PPO2(CustomPolicy,env,gamma=1, n_steps=132, ent_coef=0.01,
             learning_rate=2.5e-4, vf_coef=0.5, max_grad_norm=0.5, lam=0.95,
             nminibatches=4, noptepochs=4, cliprange=0.2, cliprange_vf=None,
             verbose=0, tensorboard_log="./03_12_2019_logs/", _init_setup_model=True,
             policy_kwargs=None, full_tensorboard_log=False)

model.learn(total_timesteps=10000,callback=None, seed=None,
            log_interval=1, tb_log_name="Logs", reset_num_timesteps=True)

model.save("agent_03_12_2019")

Now, if I wish to continue training on the same environment as before, I am supposed to load the model and continue training, like this:

model = PPO2.load("agent_03_12_2019")
model.learn(total_timesteps=10000,callback=None, seed=None,
            log_interval=1, tb_log_name="Logs", reset_num_timesteps=True)

Will this continue the training?

venkatesh-chinni on 4 Dec 2019

Okay, I need to call model.set_env(env) before model.learn. Thanks.

One thing I found missing, though: when I trained my model for the first time I integrated it with TensorBoard, but now that I am continuing the training, the TensorBoard graphs are not updated for the new timesteps. Am I missing something?

venkatesh-chinni on 4 Dec 2019

👍1
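
The load, set_env, learn sequence described above can be sketched as a small helper. This is only a sketch: continue_training is a hypothetical name, and the algorithm class is passed in as a parameter so the snippet itself does not need stable-baselines installed. It assumes the stable-baselines 2.x methods that appear in this thread (load, set_env, learn with reset_num_timesteps).

```python
def continue_training(algo_cls, model_path, env, extra_timesteps, tb_log_name="Logs"):
    """Load a saved model, attach an environment, and train it further.

    algo_cls is the algorithm class (for example PPO2); it is a parameter
    here only to keep this sketch free of hard dependencies.
    """
    # load() restores the saved hyperparameters and network weights.
    model = algo_cls.load(model_path)
    # A loaded model has no environment yet, so attach one before learning.
    model.set_env(env)
    # reset_num_timesteps=False keeps the existing step counter instead of
    # restarting it from zero.
    model.learn(total_timesteps=extra_timesteps,
                tb_log_name=tb_log_name,
                reset_num_timesteps=False)
    return model


# Usage with the names from this thread (requires stable-baselines 2.x and
# the poster's own AHUenv class):
#
# from stable_baselines import PPO2
# from stable_baselines.common.vec_env import DummyVecEnv
# env = DummyVecEnv([lambda: AHUenv() for _ in range(8)])
# model = continue_training(PPO2, "agent_03_12_2019", env, 2000000)
# model.save("agent_03_12_2019_continued_training_1")
```

Passing total_timesteps here means "this many additional steps", since the model continues from the loaded weights rather than starting over.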

the tensorboard graphs are not updated for the new timesteps

Again, please read the doc about TensorBoard integration; we cover that issue.

EDIT: you may need to set num_timesteps manually to continue properly the graphs

araffin on 4 Dec 2019

@araffin

Thanks a lot for your answer!

I must be missing something though - reading the code, I don't see how set_env makes it learn continuously. It seems like set_env in base_class.py only changes self.envs and self.env, and reading the code for Runner and learn (in particular ppo2.py), I can't find the code that makes it learn continuously instead of starting fresh.

Could you elaborate on which part of the code makes it learn continuously? Is it related to the variable _init_setup_model somewhere?

Thanks in advance!

matthew-hsr on 10 Dec 2019

I'm working on some similar code, but I am having issues. I believe the vectorized environments are not being closed or reinitialized properly:

  File "C:\Anaconda3\envs\envTF1\lib\site-packages\stable_baselines\ppo2\ppo2.py", line 442, in __init__
    super().__init__(env=env, model=model, n_steps=n_steps)
  File "C:\Anaconda3\envs\envTF1\lib\site-packages\stable_baselines\common\runners.py", line 19, in __init__
    self.obs[:] = env.reset()
  File "C:\Anaconda3\envs\envTF1\lib\site-packages\stable_baselines\common\vec_env\subproc_vec_env.py", line 111, in reset
    remote.send(('reset', None))
  File "C:\Anaconda3\envs\envTF1\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "C:\Anaconda3\envs\envTF1\lib\multiprocessing\connection.py", line 280, in _send_bytes
    ov, err = _winapi.WriteFile(self._handle, buf, overlapped=True)
BrokenPipeError: [WinError 232] The pipe is being closed

A sample of my code is shown below:

    env = SubprocVecEnv(env_list)

    model = PPO2(policy ='CustomPolicy', env = env, verbose = 1, 
                 vf_coef = VF_COEFF,
                 noptepochs = EPOCHS,
                 ent_coef = ENT_COEFF,
                 learning_rate = LEARNING_RATE,
                 tensorboard_log = tensorboard_log_location,
                 n_steps = NSTEPS,
                 nminibatches = MINIBATCHES)

    model.save(results_folder + run_name)

    # Training the model
    for i in range(number_training_steps):
        logname = run_name + '_' + str(i)
        model.learn(total_timesteps = int((total_timesteps/number_training_steps)),
                    reset_num_timesteps = False,
                    tb_log_name = logname)

        env.close()

        path = results_folder + logname
        model.save(path)


        if i < number_training_steps:
            env = SubprocVecEnv(env_list)
            model.load(load_path=path, env=env)

The first training run completes, but when the model attempts to execute the learn method on the second iteration, the error BrokenPipeError: [WinError 232] The pipe is being closed is thrown.

I am not sure what this error means or how to resolve it. Pointing me towards documentation or pointing out coding mistakes would be appreciated.

Configuration:
python 3.6
stable-baselines: 2.8
tensorflow: 1.14

EDIT:
I resolved this problem by using model.set_env(env)

cevans3098 on 30 Dec 2019
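
A likely reason the set_env fix works: in stable-baselines 2.x, load is a classmethod that returns a new model, so calling model.load(...) without assigning the result leaves the original model attached to the environment that was just closed. The loop above can be sketched with set_env instead. train_in_chunks and make_env are hypothetical names introduced for illustration; make_env stands for any function that builds a fresh vectorized environment, such as lambda: SubprocVecEnv(env_list).

```python
def train_in_chunks(model, make_env, n_chunks, steps_per_chunk, save_prefix):
    """Train in several chunks, using a fresh environment for each chunk.

    make_env is a hypothetical factory returning a new vectorized
    environment; the model is saved after every chunk.
    """
    for i in range(n_chunks):
        # Attach a fresh environment to the *same* model object, so the
        # model never points at an environment that has been closed.
        env = make_env()
        model.set_env(env)

        name = "{}_{}".format(save_prefix, i)
        model.learn(total_timesteps=steps_per_chunk,
                    reset_num_timesteps=False,
                    tb_log_name=name)
        model.save(name)

        # Close the worker processes only after learning and saving.
        env.close()
    return model


# Usage with the names from the comment above (requires stable-baselines 2.x):
#
# model = train_in_chunks(model,
#                         lambda: SubprocVecEnv(env_list),
#                         number_training_steps,
#                         int(total_timesteps / number_training_steps),
#                         results_folder + run_name)
```

If reloading from disk is preferred over set_env, the returned object has to be kept, for example model = PPO2.load(path, env=env).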

Hello, have you solved the problem that the tensorboard is not updated? Thanks!

yjc765 on 11 Feb 2020

@araffin I note that you mentioned:
"EDIT: you may need to set num_timesteps manually to continue properly the graphs"

Where do you set this parameter? Obviously it is not an argument to the PPO learn.

jtromans on 8 Apr 2020

I could solve the tensorboard problem. When loading the model, set tensorboard_log too:
model = PPO2.load(model_path, tensorboard_log="some_name")
After that I could see my logs again

SheilaGLZ on 28 May 2020

👍3
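
The fix above, combined with reset_num_timesteps=False from earlier in the thread, can be sketched as follows. resume_with_tensorboard is a hypothetical helper name, and the algorithm class is passed in so the snippet has no hard dependency on stable-baselines. The comment above suggests that the log directory is not restored from the saved file, which is why it is passed again at load time.

```python
def resume_with_tensorboard(algo_cls, model_path, env, log_dir,
                            extra_timesteps, tb_log_name="Logs"):
    """Reload a model with TensorBoard logging enabled and keep training."""
    # Pass tensorboard_log again when loading; env can be given here too,
    # which replaces a separate set_env call.
    model = algo_cls.load(model_path, env=env, tensorboard_log=log_dir)
    # Keep the step counter so the new run continues on the same x-axis
    # instead of restarting at zero.
    model.learn(total_timesteps=extra_timesteps,
                tb_log_name=tb_log_name,
                reset_num_timesteps=False)
    return model


# Usage with the names from this thread (requires stable-baselines 2.x):
#
# from stable_baselines import PPO2
# model = resume_with_tensorboard(PPO2, "agent_03_12_2019", env,
#                                 "./03_12_2019_logs/", 2000000)
```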

Hi,

My case is this: I have a changing environment, an RNN, and the reinforcement learning agent controls whether the input should be masked (input = input*mask), where the mask is discrete, either 0 or 1.

What I do is train the RNN for some epochs; the RNN provides the agent's observation.

The agent then trains with model.learn(5000), with the RNN in inference mode while the agent is being trained.

Then I go back and train the RNN, using model.predict(deterministic=True) to predict the mask.

I am not sure whether the model will sample from the updated RNN environment.

anguyenbus on 12 Sep 2020
