Stable-baselines: Possibility to resume training

Created on 17 Feb 2020 · 8 comments · Source: hill-a/stable-baselines

Is there a way to resume training, for example if our PC crashes or we face memory issues?

Could saving the "model" object as a pickle file at every step and then calling the "learn" function be a way to resume training? (If so, can I make a pull request for it?)

question

All 8 comments

I am not sure if I follow here. Yes, you can save models at any point of training (via callbacks), load models and resume training, as shown in this example.
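The example linked above is not reproduced here, so the following is only a minimal sketch of the save-via-callback and resume pattern, not the linked code. It assumes a stable-baselines release that ships `stable_baselines.common.callbacks.CheckpointCallback` and saves models as `.zip` archives. `newest_checkpoint`, `train_or_resume`, `make_env` and the `./checkpoints` directory are hypothetical names chosen for illustration.

```python
import glob
import os


def newest_checkpoint(directory):
    """Return the most recently modified .zip file in `directory`, or None."""
    candidates = glob.glob(os.path.join(directory, "*.zip"))
    return max(candidates, key=os.path.getmtime) if candidates else None


def train_or_resume(make_env, checkpoint_dir="./checkpoints", total_timesteps=50000):
    """Start a SAC run, or continue from the newest checkpoint if one exists."""
    # Imported here so the helper above stays usable without TensorFlow installed.
    from stable_baselines import SAC
    from stable_baselines.common.callbacks import CheckpointCallback

    env = make_env()  # hypothetical factory returning a gym environment
    last = newest_checkpoint(checkpoint_dir)
    if last is None:
        model = SAC("MlpPolicy", env, verbose=1)
    else:
        # Restores the saved parameters, not the full training state.
        model = SAC.load(last, env=env)

    # Write a checkpoint every 10000 steps so a crash loses at most that much.
    checkpoint = CheckpointCallback(
        save_freq=10000, save_path=checkpoint_dir, name_prefix="sac_model"
    )
    model.learn(total_timesteps=total_timesteps, callback=checkpoint)
    return model
```

Picking the checkpoint by modification time rather than by file name is a deliberate choice in this sketch: it avoids relying on how the step counter appears in the file names after a restart.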

So I figured out that with the callback function we can save the model parameters. I was just not sure whether training then resumes exactly as it would have continued had it never been stopped. I resume training a model that was saved with the "save" function like this:

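from stable_baselines import SAC

model = SAC.load("sac_model")
# set_env() checks the environment against the saved spaces; assigning model.env directly skips that
model.set_env(some_physics_based_gym_environment())
model.learn(total_timesteps=50000, callback=resume_callback)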

Does this continue the training exactly as if it had not been stopped at the point where it was saved?
My final question is whether the tensorboard log also continues where it left off, or whether it gets reinitialized without showing the history of the earlier training.

Thanks.

Ah yes, this is a valid question.

The answer is no: it does not continue _exactly_ as it would have without saving and loading. Most notably, optimizer parameters are not stored along with the model, and schedulers for learning rates and the like start from zero again on each new call to learn.
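To make the scheduler point concrete, here is a toy illustration in plain Python rather than the library's own code. It assumes a linear schedule that depends on the fraction of progress remaining within the current learn call; `linear_schedule` and `learning_rates` are hypothetical helpers.

```python
def linear_schedule(initial_value):
    """Return a function mapping remaining progress (1.0 down to 0.0) to a learning rate."""
    def schedule(progress_remaining):
        return initial_value * progress_remaining
    return schedule


def learning_rates(schedule, total_timesteps, every):
    """Learning rates seen during one learn() call, sampled every `every` steps."""
    return [
        schedule(1.0 - step / total_timesteps)
        for step in range(0, total_timesteps, every)
    ]


lr = linear_schedule(3e-4)

# One uninterrupted run of 100 steps: the rate decays steadily from the initial value.
uninterrupted = learning_rates(lr, total_timesteps=100, every=25)

# The same budget split in two: stop after 50 steps, reload, then learn 50 more.
# The second call measures progress from zero again, so the rate jumps back up.
resumed = learning_rates(lr, total_timesteps=50, every=25)

print("rate at step 50, uninterrupted:", uninterrupted[2])
print("rate at step 50, after resume: ", resumed[0])
```

Under these assumptions the resumed run restarts at the initial learning rate, while the uninterrupted run has already decayed to half of it by the same point.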

As for Tensorboard being updated: I have not tried this, but others seem to have issues with it (e.g. #599). I am not sure how the code is supposed to behave when you re-use the same name.

I think the only way is to save the complete model object, right? Then, probably with some changes to the "learn" function, one could resume from a previous training run. Do you think this could be worth a pull request?

Edit:
Saving the complete model object doesn't seem very easy with pickle; it gives this error:
TypeError: can't pickle _thread.lock objects

It would not be that easy, as models contain a bunch of unpicklable objects, and Tensorflow variables are not included in the pickling process by default. Also, as mentioned earlier, the learn method does some initialization on every call, which could also cause differences, not to mention all the non-determinism that could crop up.

We could design the next version of stable-baselines to support this "continue as if it was never stopped" behavior, which should be easier with eager-style computation and graphs.

Edit: Yup, pickling or serialization in general picks specific variables to store for this reason.
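As a rough illustration of why serialization stores only selected variables, the sketch below uses a plain dict as a stand-in for a model's attributes. This is not stable-baselines code; the attribute names and `SAVED_KEYS` are hypothetical. Pickling everything fails on the lock, while pickling a chosen subset works.

```python
import pickle
import threading

# Toy stand-in for a model's attributes (hypothetical names).
model_state = {
    "learning_rate": 3e-4,
    "gamma": 0.99,
    "weights": [0.1, -0.2, 0.3],
    "lock": threading.Lock(),  # runtime-only resource that cannot be pickled
}


def try_pickle(obj):
    """Return (payload, None) on success or (None, error) if pickling fails."""
    try:
        return pickle.dumps(obj), None
    except TypeError as err:
        return None, err


# Pickling the whole state fails because of the lock.
_, whole_error = try_pickle(model_state)
print("whole object:", whole_error)

# Storing only selected, picklable variables succeeds.
SAVED_KEYS = ("learning_rate", "gamma", "weights")
selected = {key: model_state[key] for key in SAVED_KEYS}
payload, selected_error = try_pickle(selected)
restored = pickle.loads(payload)
print("selected keys restored:", sorted(restored))
```

Anything left out of the selection, such as the lock here or optimizer state in a real model, has to be rebuilt after loading, which is one reason a reloaded run is not identical to an uninterrupted one.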

Related #301

This example may help you.

Closing this one in favor of #301.
