Models: [Solved] keep_checkpoint_max in model_main.py not working?

Created on 26 Mar 2019 · 5 Comments · Source: tensorflow/models

I'm using a modified model_main.py to train a faster_rcnn_inception_v2_coco model.
I changed this line:

config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir)

to add keep_checkpoint_max, save_summary_steps, and log_step_count_steps:

config = tf.estimator.RunConfig(
    model_dir=FLAGS.model_dir,
    keep_checkpoint_max=0,
    save_summary_steps=1000,
    log_step_count_steps=10)

(I also tried keep_checkpoint_max = 9999.)

But the keep_checkpoint_max parameter doesn't seem to have any effect: during training, only the last 5 checkpoints are kept.

System information

  • What is the top-level directory of the model you are using: tensorflow/models/blob/master/research/object_detection/model_main.py
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): only added the 3 params mentioned.
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): windows 10
  • TensorFlow installed from (source or binary): 1.13.1
  • TensorFlow version (use command below): 1.13.1
  • Bazel version (if compiling from source): N/A
  • CUDA/cuDNN version: 10
  • GPU model and memory: GTX 1070
  • Exact command to reproduce:

python.exe model_main.py --pipeline_config_path=../../Data/Config/faster_rcnn_inception_v2_coco.config --num_train_steps=100000

Most helpful comment

Adding keep_checkpoint_max to tf.estimator.RunConfig in model_main.py and also adding max_to_keep to tf.train.Saver in model_lib.py worked for me, so setting both should work.
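The two changes described in this comment might look roughly like the sketch below. It assumes the TF 1.x API used in this thread (1.13.1). The model_dir path and the build_saver helper are placeholders for illustration and are not code from model_lib.py; the real tf.train.Saver call there takes other arguments, which should be kept, with max_to_keep simply added alongside them.

```python
import tensorflow as tf  # assumes the TF 1.x API (the thread uses 1.13.1)

# 0 (or None) is documented to mean "do not delete old checkpoints".
KEEP_ALL = 0

# model_main.py side: the RunConfig change from the original post.
config = tf.estimator.RunConfig(
    model_dir="/tmp/example_model_dir",  # placeholder path, not from the thread
    keep_checkpoint_max=KEEP_ALL,
    save_summary_steps=1000,
    log_step_count_steps=10,
)


def build_saver():
    """Hypothetical helper standing in for the Saver construction in model_lib.py.

    Call it only after the model's variables have been created, because
    tf.train.Saver raises an error when the graph has no variables to save.
    """
    return tf.train.Saver(max_to_keep=KEEP_ALL)
```

A Saver built without max_to_keep falls back to its own default of 5, which matches the symptom reported in the question.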

All 5 comments

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.

  • What is the top-level directory of the model you are using
  • Have I written custom code
  • OS Platform and Distribution
  • TensorFlow installed from
  • TensorFlow version
  • Bazel version
  • CUDA/cuDNN version
  • GPU model and memory
  • Exact command to reproduce

updated

Any update on this issue? The keep_checkpoint_max setting used to work before recent updates. A fix would be appreciated.

Adding keep_checkpoint_max to tf.estimator.RunConfig in model_main.py and also adding max_to_keep to tf.train.Saver in model_lib.py worked for me, so setting both should work.

Thank you! max_to_keep does the trick.
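To illustrate why max_to_keep matters here, the following is a small pure-Python simulation of the documented retention rule: keep the N most recent checkpoints, with 0 or None disabling deletion. It is not TensorFlow code. The function name and the step values are made up for the example, and time-based retention (keep_checkpoint_every_n_hours) is ignored.

```python
from collections import deque


def retained_checkpoints(saved_steps, max_to_keep=5):
    """Return the checkpoint steps still on disk after saving saved_steps in order.

    Simulates the rule "keep the max_to_keep most recent checkpoints";
    a value of 0 or None means nothing is ever deleted.
    """
    if not max_to_keep:  # 0 or None: never delete old checkpoints
        return list(saved_steps)
    recent = deque(maxlen=max_to_keep)
    for step in saved_steps:
        recent.append(step)  # once full, the deque drops its oldest entry
    return list(recent)


steps = list(range(1000, 9000, 1000))  # eight saves: 1000, 2000, ..., 8000

print(retained_checkpoints(steps))                 # default limit of 5
print(retained_checkpoints(steps, max_to_keep=0))  # keep everything
```

With the default limit, only the five newest steps survive, which is the behavior reported in the question; passing 0 keeps all eight.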
