Hi, I have trained an agent using PPO2 for 10000 steps and saved the model. I feel that the model can be improved by letting it train for more episodes, so I want to load this model and continue training from the loaded model, which has already been trained for 10000 steps. I have gone through the documentation but couldn't find anything related to this. Is such a feature currently available in stable baselines?
Please read the documentation more carefully ;)
You have an example of what you are looking for here: https://stable-baselines.readthedocs.io/en/master/guide/examples.html and here: https://stable-baselines.readthedocs.io/en/master/guide/examples.html#continual-learning
I am training a model and saving it
model = PPO2(CustomPolicy,env,gamma=1, n_steps=132, ent_coef=0.01,
learning_rate=2.5e-4, vf_coef=0.5, max_grad_norm=0.5, lam=0.95,
nminibatches=4, noptepochs=4, cliprange=0.2, cliprange_vf=None,
verbose=0, tensorboard_log="./03_12_2019_logs/", _init_setup_model=True,
policy_kwargs=None, full_tensorboard_log=False)
model.learn(total_timesteps=10000,callback=None, seed=None,
log_interval=1, tb_log_name="Logs", reset_num_timesteps=True)
model.save("agent_03_12_2019")
So now if I wish to continue with the training on the same environment as earlier, I am supposed to load the model and continue with the training. Say like
model = PPO2.load("agent_03_12_2019")
model.learn(total_timesteps=10000,callback=None, seed=None,
log_interval=1, tb_log_name="Logs", reset_num_timesteps=True)
So will this continue the training ?
Okay, I need to call model.set_env(env) before model.learn. Thanks.
But one thing I found missing: when I trained my model for the first time I integrated it with TensorBoard, and now when I do the continual training, the TensorBoard graphs are not updated for the new timesteps. Am I missing something?
"the tensorboard graphs are not updated for the new timesteps"
Again, please read the documentation on TensorBoard integration; we cover that issue.
EDIT: you may need to set num_timesteps manually to continue properly the graphs
@araffin
Thanks a lot for your answer!
I must be missing something though - reading the code, I don't see how set_env makes it learn continuously. It seems like set_env in base_class.py only changes self.envs and self.env, and reading the code for Runner and learn (in particular ppo2.py), I can't find the code that makes it learn continuously instead of starting fresh.
Could you elaborate on which part of the code makes it learn continuously? Is it related to the variable _init_setup_model somewhere?
Thanks in advance!
I have tried with
model = PPO2(CustomPolicy,env,gamma=1, n_steps=132, ent_coef=0.01,
learning_rate=2.5e-4, vf_coef=0.5, max_grad_norm=0.5, lam=0.95,
nminibatches=4, noptepochs=4, cliprange=0.2, cliprange_vf=None,
verbose=0, tensorboard_log="./03_12_2019_logs/", _init_setup_model=True,
policy_kwargs=None, full_tensorboard_log=False)
model.learn(total_timesteps=10000,callback=None, seed=None,
log_interval=1, tb_log_name="Logs", reset_num_timesteps=False)
model.save("agent_03_12_2019")
model = PPO2.load("agent_03_12_2019")
n_cpu = 8
env = DummyVecEnv([lambda: AHUenv() for i in range(n_cpu)])
model.set_env(env)
model.learn(total_timesteps=2000000,callback=None, seed=None,
log_interval=1, tb_log_name="Logs", reset_num_timesteps=False)
model.save("agent_03_12_2019_continued_training_1")
With this, the continual training happens, but the TensorBoard graphs are not being updated. I manually set reset_num_timesteps to False, but the graphs are still not updated.
I'm working on some similar code, but I am having issues. I believe the vectorized environments are not being closed correctly or reinitialized properly:
File "C:\Anaconda3\envs\envTF1\lib\site-packages\stable_baselines\ppo2\ppo2.py", line 442, in __init__
super().__init__(env=env, model=model, n_steps=n_steps)
File "C:\Anaconda3\envs\envTF1\lib\site-packages\stable_baselines\common\runners.py", line 19, in __init__
self.obs[:] = env.reset()
File "C:\Anaconda3\envs\envTF1\lib\site-packages\stable_baselines\common\vec_env\subproc_vec_env.py", line 111, in reset
remote.send(('reset', None))
File "C:\Anaconda3\envs\envTF1\lib\multiprocessing\connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "C:\Anaconda3\envs\envTF1\lib\multiprocessing\connection.py", line 280, in _send_bytes
ov, err = _winapi.WriteFile(self._handle, buf, overlapped=True)
BrokenPipeError: [WinError 232] The pipe is being closed
A sample of my code is shown below:
env = SubprocVecEnv(env_list)
model = PPO2(policy ='CustomPolicy', env = env, verbose = 1,
vf_coef = VF_COEFF,
noptepochs = EPOCHS,
ent_coef = ENT_COEFF,
learning_rate = LEARNING_RATE,
tensorboard_log = tensorboard_log_location,
n_steps = NSTEPS,
nminibatches = MINIBATCHES)
model.save(results_folder + run_name)
# Training the model
for i in range(number_training_steps):
    logname = run_name + '_' + str(i)
    model.learn(total_timesteps = int((total_timesteps/number_training_steps)),
                reset_num_timesteps = False,
                tb_log_name = logname)
    env.close()
    path = results_folder + logname
    model.save(path)
    if i < number_training_steps:
        env = SubprocVecEnv(env_list)
        model.load(load_path=path, env=env)
The first training completes, but when the model attempts to execute the learn method on the second iteration, the error BrokenPipeError: [WinError 232] The pipe is being closed is thrown.
I am not sure what this error means or how to resolve the problem. Pointing me towards documentation or pointing out coding mistakes would be appreciated.
Configuration:
python 3.6
stable-baselines: 2.8
tensorflow: 1.14
EDIT:
I resolved this problem by using model.set_env(env)
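A possible explanation for why set_env helped, offered as a reading of the snippet above rather than a confirmed diagnosis: in stable-baselines 2.x, load is a classmethod that returns a new model, so calling model.load(...) without keeping the result leaves the existing model attached to the environment that was just closed. The sketch below reproduces that pattern with a hypothetical ToyModel class (not part of stable-baselines; the environments are plain strings) so it runs without the library.

```python
class ToyModel:
    """Hypothetical stand-in for a model class; not part of stable-baselines."""

    def __init__(self, env=None, weights=0):
        self.env = env
        self.weights = weights

    @classmethod
    def load(cls, saved_weights, env=None):
        # A classmethod builds and returns a NEW object; it never modifies
        # the instance it happens to be called on.
        return cls(env=env, weights=saved_weights)

    def set_env(self, env):
        # An instance method updates the existing object in place.
        self.env = env


model = ToyModel(env="closed_env", weights=1)

# Return value discarded: `model` still points at the closed environment.
model.load(saved_weights=1, env="fresh_env")
env_after_discarded_load = model.env

# Option 1: attach the new environment to the existing model.
model.set_env("fresh_env")
env_after_set_env = model.env

# Option 2: keep the object returned by the classmethod.
reloaded = ToyModel.load(saved_weights=1, env="fresh_env")
```

In the training loop above, the equivalent changes would be either model.set_env(env), as in the EDIT, or assigning the result of PPO2.load(...) back to model.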
Hello, have you solved the problem that the tensorboard is not updated? Thanks!
@araffin I note that you mentioned:
"EDIT: you may need to set num_timesteps manually to continue properly the graphs"
Where do you set this parameter? Obviously it is not an argument to the PPO learn.
I could solve the tensorboard problem. When loading the model, set tensorboard_log too:
model = PPO2.load(model_path, tensorboard_log="some_name")
After that I could see my logs again
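Putting the suggestions from this thread together, a minimal sketch of resuming training could look like the following. It assumes stable-baselines 2.x is installed. The function name continue_training and its arguments are made up for illustration, and setting num_timesteps as a plain attribute is my reading of the "set num_timesteps manually" hint, not something confirmed in this thread.

```python
def continue_training(model_path, env, log_dir, steps_done, extra_steps):
    """Resume PPO2 training so the TensorBoard curves extend the earlier run.

    Sketch only: assumes stable-baselines 2.x and an `env` compatible with
    the one used for the first training run.
    """
    # Imported here so the function can be defined without the library.
    from stable_baselines import PPO2

    # Pass tensorboard_log again at load time, as suggested above.
    model = PPO2.load(model_path, env=env, tensorboard_log=log_dir)

    # Assumption: the step counter is a plain attribute that must be
    # restored by hand so the x-axis continues from the earlier run.
    model.num_timesteps = steps_done

    # reset_num_timesteps=False keeps the counter instead of zeroing it.
    model.learn(total_timesteps=extra_steps, tb_log_name="Logs",
                reset_num_timesteps=False)
    return model
```

With the names used earlier in the thread, a call might be continue_training("agent_03_12_2019", env, "./03_12_2019_logs/", 10000, 2000000).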
Hi,
my case is this: I have a changing environment, an RNN network, and the reinforcement learning agent controls whether I should mask the input (input - input*mask), where the mask is discrete, taking values in {0, 1}.
What I do is train the RNN for some epochs; the RNN provides the observation for the agent.
The agent then trains with model.learn(5000), and the RNN is in inference mode while the agent is trained.
Then I go back and train the RNN, using model.predict(deterministic=True) to predict the mask.
I am not sure whether the agent will sample from the updated RNN environment?