Stable-baselines: Question about training a model after changing the environment?

Created on 11 Dec 2019 · 1 Comment · Source: hill-a/stable-baselines

I'm using Stable Baselines to train a policy in an environment. My setup is that I train a TRPO policy on the environment for N timesteps, update the environment (the dynamics change slightly), and then rerun the training. I'm worried that the internal state of the Adam optimizer used to train the TRPO policy is not reset when I call learn() again. Can someone confirm whether this would be a problem? Since the distribution of states in the updated environment changes the second time I run learn(), shouldn't the optimizer's internal state be reset, especially since the optimizer relies heavily on momentum? This isn't necessarily a bug, but I'm concerned about the way I'm currently using the learn() function in my work. I'd appreciate any thoughts on this. Thanks! :)

Code example

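from stable_baselines import TRPO
from stable_baselines.common.policies import MlpPolicy

# env is an already-created Gym environment; env.update() stands for the change in its dynamics
model = TRPO(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=25000)
env.update()
model.learn(total_timesteps=25000)  # continues from the same model and optimizer objects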

question

Source

HareshKarnan

Most helpful comment

Hello,

I would first try and ask afterward ;)
If you want to reset the state of the optimizer, the best way is to save the model and load it again (the optimizer state is currently not saved, cf. issue #301).

araffin on 11 Dec 2019

👍2
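The save-and-load suggestion above could be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name `reload_with_fresh_optimizer` and its `path` argument are hypothetical, and it assumes the model exposes `save(path)` and a `load(path, env=...)` classmethod, as Stable Baselines models do. Whether the optimizer really starts fresh depends on its state not being stored in the saved file, which is what the comment above reports.

```python
import os
import tempfile


def reload_with_fresh_optimizer(model, env, path=None):
    """Save `model` to disk and load it back as a new object.

    Hypothetical helper. Per the comment above, the optimizer state is
    not part of the saved file, so the reloaded model should start with
    a fresh optimizer while keeping the learned parameters.
    """
    if path is None:
        # Assumption: a throwaway checkpoint location is acceptable.
        path = os.path.join(tempfile.mkdtemp(), "checkpoint")
    model.save(path)
    # Call load() on the model's own class (e.g. TRPO) and attach the env.
    return type(model).load(path, env=env)


# Intended usage with the snippet from the question (not executed here):
#   model = TRPO(MlpPolicy, env, verbose=1)
#   model.learn(total_timesteps=25000)
#   env.update()
#   model = reload_with_fresh_optimizer(model, env)
#   model.learn(total_timesteps=25000)
```

Comparing a run that uses this helper with one that simply calls learn() twice would show whether the reset matters for a given environment change, in the spirit of "try first, ask afterward".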

