Models: [Solved] keep_checkpoint_max in model_main.py not working?

Created on 26 Mar 2019 · 5 Comments · Source: tensorflow/models

I'm using a modified model_main.py to train a faster_rcnn_inception_v2_coco model.
I modified this line:

config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir)

The modified version adds keep_checkpoint_max, save_summary_steps, and log_step_count_steps:

config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir,
                                keep_checkpoint_max=0,
                                save_summary_steps=1000,
                                log_step_count_steps=10)

(I also tried keep_checkpoint_max = 9999.)

However, the keep_checkpoint_max parameter doesn't seem to have any effect: during training, only the last 5 checkpoints are kept.
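Both RunConfig's keep_checkpoint_max and tf.train.Saver's max_to_keep default to 5, and for both a value of 0 or None means no checkpoint files are deleted. Exactly 5 surviving checkpoints therefore suggests that the saver doing the deletion was built with its default rather than from the RunConfig value. The pure-Python sketch below only models that rolling-window policy; simulate_retention is a hypothetical helper, not part of TensorFlow.

```python
from collections import deque


def simulate_retention(num_saves, max_to_keep=5):
    """Hypothetical model of a rolling checkpoint window.

    Follows the documented tf.train.Saver(max_to_keep=...) policy: keep only
    the newest `max_to_keep` checkpoints; None or 0 means delete nothing.
    Returns the step numbers of the checkpoints still on disk.
    """
    keep_all = not max_to_keep  # None or 0 means "never delete"
    kept = deque()
    for step in range(1, num_saves + 1):
        kept.append(step)
        if not keep_all and len(kept) > max_to_keep:
            kept.popleft()  # the oldest checkpoint file is deleted
    return list(kept)


# A saver left at its default keeps only the last 5, matching the symptom.
print(simulate_retention(10))  # [6, 7, 8, 9, 10]
# A saver built with max_to_keep=0 keeps every checkpoint.
print(simulate_retention(10, max_to_keep=0))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```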

System information

What is the top-level directory of the model you are using: tensorflow/models/blob/master/research/object_detection/model_main.py
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): only added the 3 params mentioned.
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
TensorFlow installed from (source or binary): 1.13.1
TensorFlow version (use command below): 1.13.1
Bazel version (if compiling from source): N/A
CUDA/cuDNN version: 10
GPU model and memory: GTX 1070
Exact command to reproduce:

python.exe model_main.py --pipeline_config_path=../../Data/Config/faster_rcnn_inception_v2_coco.config --num_train_steps=100000


JoseLuisFriedrich



All 5 comments

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
What is the top-level directory of the model you are using
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce

tensorflowbutler on 27 Mar 2019


Updated.

JoseLuisFriedrich on 30 Mar 2019

Any update on this issue? The keep_checkpoint_max flag used to work before the recent updates. A fix would be appreciated.

kulkarnivishal on 8 May 2019

Adding keep_checkpoint_max to tf.estimator.RunConfig in model_main.py and max_to_keep to tf.train.Saver in model_lib.py worked for me, so this approach should work.

kulkarnivishal on 8 May 2019



Thank you! max_to_keep does the trick.

JoseLuisFriedrich on 9 May 2019
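Why the fix needs both settings can be explained by how a TF 1.x Estimator picks its saver: if the model_fn returns an EstimatorSpec whose tf.train.Scaffold already carries a saver, that saver is used and RunConfig.keep_checkpoint_max is not consulted; only when no saver is supplied does the Estimator build one from keep_checkpoint_max. Assuming model_lib.py's model_fn passes such a saver without max_to_keep (which the working fix in this thread suggests), the Saver default of 5 wins. The sketch below models only that precedence in plain Python; FakeSaver, FakeRunConfig, and saver_used_for_training are hypothetical stand-ins, not TensorFlow classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeSaver:
    """Hypothetical stand-in for tf.train.Saver; only retention is modeled."""
    max_to_keep: Optional[int] = 5  # tf.train.Saver's documented default


@dataclass
class FakeRunConfig:
    """Hypothetical stand-in for tf.estimator.RunConfig."""
    keep_checkpoint_max: Optional[int] = 5


def saver_used_for_training(config, scaffold_saver=None):
    # Assumed Estimator behavior: a saver supplied through the Scaffold is
    # used as-is; only without one is a saver built from keep_checkpoint_max.
    if scaffold_saver is not None:
        return scaffold_saver
    return FakeSaver(max_to_keep=config.keep_checkpoint_max)


config = FakeRunConfig(keep_checkpoint_max=0)

# Before the fix: the Saver in model_lib.py has no max_to_keep,
# so the RunConfig value is never consulted.
before = saver_used_for_training(config, scaffold_saver=FakeSaver())
print(before.max_to_keep)  # 5

# After the fix: the same value is also passed to the Saver as max_to_keep.
after = saver_used_for_training(
    config, scaffold_saver=FakeSaver(max_to_keep=config.keep_checkpoint_max))
print(after.max_to_keep)  # 0
```

In practice this corresponds to passing the desired retention value (for example, the same number used for keep_checkpoint_max) as max_to_keep wherever model_lib.py constructs its tf.train.Saver.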
