Well, there are a couple of other issues (#4636, #5139) discussing the same feature, but I can't find any clear instructions or solutions.
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1.md
Two questions:
Is this already possible? If so, could clear instructions be added to the documentation?
I'm using TF1 and am particularly interested in instructions for TF1, but documentation for both TF1 and TF2 would be ideal.
Maybe.
Right now I use TF2. To save the checkpoint, I modified `model_main_tf2.py` by adding `checkpoint_every_n=FLAGS.num_train_steps` to the `model_lib_v2.train_loop` call. This way the checkpoint is saved once training has run for the number of steps I selected. Also, in `model_lib_v2.py` the `train_loop` function has a `checkpoint_max_to_keep` parameter with a default value of 7, which you can increase if you want to keep more checkpoints. I hope this helps.
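To see why this yields a single checkpoint at the end, here is a small pure-Python simulation. It does not call TensorFlow: the helper `simulate_checkpoints` and the save rule inside it (save once the steps since the last save reach `checkpoint_every_n`, keep only the newest `max_to_keep`) are my assumptions about how the loop behaves, not code taken from `model_lib_v2.py`.

```python
from collections import deque


def simulate_checkpoints(num_train_steps, checkpoint_every_n, max_to_keep=7):
    """Return (save_counter, global_step) pairs still on disk after training.

    Hypothetical model of the save loop (an assumption, not library code):
    a checkpoint is written whenever global_step - last_saved_step reaches
    checkpoint_every_n, and only the newest max_to_keep checkpoints are kept.
    max_to_keep=None keeps every checkpoint.
    """
    kept = deque(maxlen=max_to_keep)  # oldest entries are dropped automatically
    last_saved_step = 0
    save_counter = 0
    for global_step in range(1, num_train_steps + 1):
        if global_step - last_saved_step >= checkpoint_every_n:
            save_counter += 1  # file suffix would be ckpt-<save_counter>
            kept.append((save_counter, global_step))
            last_saved_step = global_step
    return list(kept)


# Frequent saves: ten checkpoints are written, only ckpt-4 .. ckpt-10 survive.
print(simulate_checkpoints(10000, 1000))
# The workaround above: one checkpoint, written at the final step.
print(simulate_checkpoints(10000, 10000))  # → [(1, 10000)]
```

Passing `max_to_keep=None` keeps everything, which mirrors how `tf.train.CheckpointManager` treats `max_to_keep=None`.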
+1
I also noticed that the saved checkpoints do not correspond to the actual training steps; instead, they are numbered 1, 2, 3, and so on, based on the number of times the checkpoint criterion has been met. To export the right model at the end of training, it would be better if the checkpoint number matched the training step.
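Until the naming changes, a rough way to recover the step is to multiply the suffix by the save interval. This is a minimal sketch, assuming training started at `initial_step`, was never resumed, and wrote a checkpoint exactly every `checkpoint_every_n` steps; `step_for_checkpoint` is a hypothetical helper. A cleaner fix, which I haven't tested in this repo, might be to pass the step to `tf.train.CheckpointManager.save(checkpoint_number=...)` so the suffix is the training step itself.

```python
import re


def step_for_checkpoint(prefix, checkpoint_every_n, initial_step=0):
    """Estimate the training step behind a counter-numbered checkpoint.

    Hypothetical helper. Assumes checkpoint N was written exactly
    N * checkpoint_every_n steps after initial_step, i.e. training was not
    resumed and the save interval never changed.
    """
    match = re.search(r"ckpt-(\d+)$", prefix)  # e.g. "training/ckpt-3"
    if match is None:
        raise ValueError(f"not a ckpt-<N> prefix: {prefix!r}")
    save_counter = int(match.group(1))
    return initial_step + save_counter * checkpoint_every_n


print(step_for_checkpoint("training/ckpt-3", 1000))  # → 3000
```

If training was resumed or the interval changed, this estimate is wrong, which is exactly why step-numbered checkpoints would be preferable.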