I'm using a modified model_main.py to train a faster_rcnn_inception_v2_coco model.
I changed this line:
config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir)
to add keep_checkpoint_max, save_summary_steps, and log_step_count_steps:
config = tf.estimator.RunConfig(
    model_dir=FLAGS.model_dir,
    keep_checkpoint_max=0,
    save_summary_steps=1000,
    log_step_count_steps=10)
(I also tried keep_checkpoint_max = 9999.)
However, the keep_checkpoint_max parameter doesn't seem to have any effect:
training still keeps only the last 5 checkpoints. The command I run is:
python.exe model_main.py --pipeline_config_path=../../Data/Config/faster_rcnn_inception_v2_coco.config --num_train_steps=100000

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
What is the top-level directory of the model you are using
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce
updated
Any update on this issue? The keep_checkpoint_max setting used to work before recent updates. A fix would be appreciated.
Adding keep_checkpoint_max to tf.estimator.RunConfig in model_main.py and also max_to_keep to tf.train.Saver in model_lib.py worked for me, so setting both should fix this.
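For anyone looking for the concrete change, here is a minimal sketch of the max_to_keep part. The likely cause is that tf.train.Saver defaults to max_to_keep=5, and when a Saver is handed to the Estimator through a tf.train.Scaffold, that Saver's setting is used instead of keep_checkpoint_max from RunConfig. The helper name build_scaffold and its default values are hypothetical and only for illustration; the exact Saver arguments in your copy of model_lib.py may differ, so treat this as a pattern rather than the actual upstream code.

```python
import tensorflow as tf

# TF1-style API; assumes a TensorFlow version that provides tf.compat.v1.
tf1 = tf.compat.v1


def build_scaffold(max_to_keep=0, keep_checkpoint_every_n_hours=10000.0):
    """Hypothetical helper: a Scaffold whose Saver has an explicit max_to_keep.

    Without max_to_keep, the Saver falls back to its default of 5,
    regardless of keep_checkpoint_max in RunConfig.
    """
    saver = tf1.train.Saver(
        sharded=True,
        max_to_keep=max_to_keep,  # 0 or None: no checkpoints are deleted
        keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours,
        save_relative_paths=True)
    tf1.add_to_collection(tf1.GraphKeys.SAVERS, saver)
    return tf1.train.Scaffold(saver=saver)


with tf.Graph().as_default():
    # A Saver needs at least one variable in the graph.
    tf1.get_variable("w", shape=[1])
    scaffold = build_scaffold(max_to_keep=0)
    kept = scaffold.saver.saver_def.max_to_keep
```

In model_lib.py the equivalent edit is to pass max_to_keep=... to the existing tf.train.Saver(...) call. Keep in mind that keeping every checkpoint over 100000 steps can use a lot of disk space.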
Thank you! max_to_keep does the trick.