Models: No variables to save error using tensorflow object detection API

Created on 20 Jan 2018  ·  48 Comments  ·  Source: tensorflow/models

System information

  • What is the top-level directory of the model you are using: /tensorflow/models/research
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):

    DISTRIB_ID=Ubuntu
    DISTRIB_RELEASE=16.04
    DISTRIB_CODENAME=xenial
    DISTRIB_DESCRIPTION="Ubuntu 16.04.3 LTS"

  • TensorFlow installed from (source or binary):
    binary

  • TensorFlow version (use command below):
    ('v1.3.0-rc1-7384-gc9437c1', '1.6.0-dev20180119')
  • Bazel version (if compiling from source): Not Applicable
  • CUDA/cuDNN version: Not Applicable
  • GPU model and memory: Not Applicable
  • Exact command to reproduce:

    python object_detection/train.py --logtostderr
    --pipeline_config_path=train_mydata/models/model/ssd_mobilenet_v1_coco.config
    --train_dir=train_mydata/models/train

Describe the problem

Successfully created the necessary files for training on my custom data, as described at https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_locally.md

Executing the above command complained about the absence of tkinter, which was then installed with apt-get install. TensorFlow was updated to the latest nightly build.

Getting 'No variables to save' after running the above training command.

Source code / logs

---Config file

  # SSD with Mobilenet v1 configuration for MSCOCO Dataset.


model {
  ssd {
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
          anchorwise_output: true
        }
      }
      localization_loss {
        weighted_smooth_l1 {
          anchorwise_output: true
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 15
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "/tensorflow/models/research/train_mydata/data/mobilenet_v1_1.0_224.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 300
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/tensorflow/models/research/train_mydata/data/train.record"
  }
  label_map_path: "/tensorflow/models/research/train_mydata/data/object_label.pbtxt"
}

eval_config: {
  num_examples: 160
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/tensorflow/models/research/train_mydata/data/test.record"
  }
  label_map_path: "/tensorflow/models/research/train_mydata/data/object_label.pbtxt"
  shuffle: false
  num_readers: 1
  num_epochs: 1
}

Errors

Traceback (most recent call last):
  File "object_detection/train.py", line 163, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "object_detection/train.py", line 159, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "build/bdist.linux-x86_64/egg/object_detection/trainer.py", line 255, in train
    init_saver = tf.train.Saver(available_var_map)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1288, in __init__
    self.build()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1297, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1322, in _build
    raise ValueError("No variables to save")
ValueError: No variables to save
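
For reference, the traceback shows trainer.py building init_saver = tf.train.Saver(available_var_map); when that variable map comes back empty (no graph variables matched the names in fine_tune_checkpoint), the Saver constructor raises exactly this error. A minimal sketch, assuming TensorFlow 1.x graph mode, that reproduces the failure mode:

import tensorflow as tf

# An empty variable map is what trainer.py ends up with when no graph variable
# names match the names stored in the fine_tune_checkpoint file.
available_var_map = {}

try:
    init_saver = tf.train.Saver(available_var_map)
except ValueError as e:
    print(e)  # prints: No variables to save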

Most helpful comment

Solved for me using the solution from @NinjaWendy for training ssd_mobilenet_v1_fpn_coco with the stock config file that did not contain a from_detection_checkpoint line.

To my config file I added the line:

from_detection_checkpoint: true

Immediately after the fine_tune_checkpoint line. Worked fine.

All 48 comments

Any ideas @jch1 @tombstone ?

Getting the same issue. I was able to bypass the error by using other models (faster_rcnn_inception_v2_coco_2017_11_08).

I'd love to bypass this error by using other models, but those models give another error, for which there is already an issue:

Expected int32, got range(0, 3) of type 'range' instead.

So, I've so far been unsuccessful at using ANY model with TF object detection.

Solution for issue #3443 might help you, @jazoom.

With resnet_v1_101, following #3443 fixed the "Expected int32, got range(0, 3) of type 'range' instead" error, but then the "ValueError: No variables to save" error appeared. resnet_v2_101 has the same errors.

@yuanyuanxiang Have all of your problems been solved? Please help.

@shuli163love This is probably an issue with the pretrained model; just switch to a different one. Even if the model name is the same, it's worth trying a checkpoint with a different release date.

I'm getting the same errors with different models, including one I trained yesterday.
Yesterday I didn't get any errors. I terminated the training script (I'm using the stock train.py from models/research/object_detection, which doesn't stop itself), then presumably did something wrong and started getting these variable errors.
I wrote no custom code and was trying to train Inception v2 and MobileNet v1 & v2 models.

Before raising the variable exception, it prints warnings like these:

WARNING:root:Variable [MobilenetV1/Conv2d_9_depthwise/depthwise_weights/RMSProp] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_9_depthwise/depthwise_weights/RMSProp_1] is not available in checkpoint
WARNING:root:Variable [MobilenetV1/Conv2d_9_pointwise/BatchNorm/beta] is not available in checkpoint

which looks strange, because I used different models but MobilenetV1 is mentioned every time.

Speaking of Stack Overflow, there is some custom code being discussed there, and it's hard to figure out how those solutions could be applied to the stock code.
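
As a diagnostic for those "is not available in checkpoint" warnings, one option is to list what the checkpoint actually contains and compare it with the names the training graph expects. A small sketch, assuming TensorFlow 1.x; the checkpoint path is simply the one from this issue's config:

import tensorflow as tf

# Prefix from the config's fine_tune_checkpoint field.
ckpt = "/tensorflow/models/research/train_mydata/data/mobilenet_v1_1.0_224.ckpt"

# Print every variable name and shape stored in the checkpoint file.
for name, shape in tf.train.list_variables(ckpt):
    print(name, shape)

If none of the printed names line up with the scopes the detection graph uses (the warnings above mention MobilenetV1/... variables), every graph variable is skipped, the restore map ends up empty, and that is what triggers "No variables to save".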

I don't know why, but setting this variable to false solved the issue for me:
from_detection_checkpoint: false

@yuanyuanxiang is right. I hit the same problem and finally succeeded using ssd_mobilenet_v1_coco_2017_11_17.

#2025 should help @ddurgaprasad

from_detection_checkpoint: true should be changed to from_detection_checkpoint: false

Did anyone find any other solutions to this problem? I am trying to run train.py using the ssd_mobilenet_v1_fpn model and I am running into the same error:

ValueError: No variables to save.

There is no from_detection_checkpoint line in the ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync.config file, unfortunately. I tried adding it to the config file and ran into the same error.

those "from checkpoint=false" tricks never worked for me....
though, it's always useful to ensure whether your TF version matches the version that was used for exporting the model
models from the zoo was from v. 1.5 (last time i've checked) when the last TF-GPU is 1.8 or so

Just run the model export before training; it should help.

I get the same error with ssd_mobilenet_v1_fpn_coco and ssd_resnet_50_fpn_coco as well. There is also no from_detection_checkpoint in those config files. I'm using Ubuntu 16.04 and TensorFlow 1.9 (GPU version). What can I do to bypass the error?

The opposite happened for me. Setting from_detection_checkpoint: true helped me resolve the error; I had it set to false previously. This is for ssd_resnet101_fpn_ct_coco.config.

@GeorgiaA did you find a solution to the problem? It's similar to what I'm facing too.

@karansomaiah sadly I didn't. I couldn't find any solution to it. I was only testing out various models, so I didn't spend much time fretting over the problem. If you find anything, please let me know.

I was getting the same error for the past few hours. I started again with a clean TensorFlow repository and the custom scripts mentioned in the following blog, and it worked.
https://becominghuman.ai/tensorflow-object-detection-api-tutorial-training-and-evaluating-custom-object-detector-ed2594afcf73

Earlier it was not finding model.ckpt, as there is no file in the folder with exactly that name, but somehow it's working now.
The value of the variable in the config file is:
fine_tune_checkpoint: "path_to_model_dir/model.ckpt"
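
One related detail: "model.ckpt" is a checkpoint prefix rather than a literal file on disk (the data lives in model.ckpt.index and model.ckpt.data-* files), which is why the folder appears to contain no file with that exact name. A quick sketch, assuming TensorFlow 1.x and reusing the illustrative path from the comment above, to verify the prefix resolves:

import tensorflow as tf

prefix = "path_to_model_dir/model.ckpt"
# True if the .index/.data files for this prefix exist on disk.
print(tf.train.checkpoint_exists(prefix))
# Newest checkpoint prefix recorded in the directory's state file, or None.
print(tf.train.latest_checkpoint("path_to_model_dir"))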

Hey @goravkaul thanks for your reply. I did get it working by changing the value of "from_detection_checkpoint" to "true". Thank you for your suggestions.

@GeorgiaA I did get it working by changing the above values in the config file. Let me know if anyone is still facing issues. Also, I was facing this specifically with the FPN models in SSD, so I'm training them with the train.py file in the legacy folder rather than model_main.py.

If you're using multiple GPUs, make sure to change "sync_replicas" to "true" as well.

@ravikantgupta9 Same here. I was using ssd_resnet50_v1_fpn and setting from_detection_checkpoint to true resolved it. I added it separately as it wasn't there in the config file.

I got it working with

from_detection_checkpoint: true

for faster_rcnn_inception_resnet_v2_atrous_oid.config, when pre-training that model.

@karansomaiah @nareshmungpara Hello! I'm DS.

I use the model config "ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync.config" and hit the same error before.

Could you tell me where the "from_detection_checkpoint" option is? I edited the config file "ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync.config" but can't find it.

Thank you, and I wish you a nice day.

Cheers,
DS

I received the same error using Mobilenet_V2. Within the config file there was no from_detection_checkpoint line. I added the following (the addition is the middle line) and it worked:

fine_tune_checkpoint: "/home/Machine_Learning_Verstening/scripts/training/model.ckpt"
from_detection_checkpoint: true
#fine_tune_checkpoint_type: "detection"

Solved for me using the solution from @NinjaWendy for training ssd_mobilenet_v1_fpn_coco with the stock config file that did not contain a from_detection_checkpoint line.

To my config file I added the line:

from_detection_checkpoint: true

Immediately after the fine_tune_checkpoint line. Worked fine.

@NinjaWendy has addressed this thread well, concerning 'detection-checkpoint'.

Training ssd_mobilenet_v1_fpn with a GPU, same problem.
Add from_detection_checkpoint: true after the checkpoint path in pipeline.config. If train.py still raises
ValueError: No variables to save
then set replicas_to_aggregate: 1.

I don't know why, but setting this variable to false solved the issue for me:
from_detection_checkpoint: false

For me it's the opposite: setting this to false raises the "No variables to save" error.

Adding the following line below fine_tune_checkpoint solved it for me:
from_detection_checkpoint: true

train_config: {
  fine_tune_checkpoint: "path/model.ckpt"
  from_detection_checkpoint: true

I have the same error when running the faster_rcnn_resnet152_coco model and get ValueError: No variables to save. I tried changing from_detection_checkpoint: true, but it doesn't work. The solution is to switch to another model, or to the same model with a different release date.

@karansomaiah @nareshmungpara Hello! I'm DS.

I use the model config "ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync.config" and hit the same error before.

Could you tell me where the "from_detection_checkpoint" option is? I edited the config file "ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync.config" but can't find it.

Thank you, and I wish you a nice day.

Cheers,
DS

Just add it manually! You will find it solves the problem.

@karansomaiah did you get it working for the ssd_mobilenet_v1_fpn model? Can you share which lines in the config file you changed? I manually added from_detection_checkpoint: true but it still doesn't work.

Did anyone find any other solutions to this problem? I am trying to run train.py using the ssd_mobilenet_v1_fpn model and I am running into the same error:

ValueError: No variables to save.

There is no from_detection_checkpoint line in the ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync.config file, unfortunately. I tried adding it to the config file and ran into the same error.

Add the line below at line 157 of pipeline.config for the ssd_mobilenet_v1_fpn model:
---> from_detection_checkpoint: true

@karansomaiah did you get it working for the ssd_mobilenet_v1_fpn model? Can you share which lines in the config file you changed? I manually added from_detection_checkpoint: true but it still doesn't work.

Add the line below at line 157 of pipeline.config for the ssd_mobilenet_v1_fpn model:
---> from_detection_checkpoint: true

ValueError: No variables to save
I am facing the "No variables to save" issue. Does anyone have a solution for this?

Same problem. I tried adding "from_detection_checkpoint: false" and it didn't work; I tried adding "from_detection_checkpoint: true" and it started stepping but eventually didn't work either.
Using tensorflow-gpu == 1.14 with the ssd_mobilenet_v2_oid_v4_2018_12_12 model.

Any solution would be great!

I had the same error, and after two days I finally found a solution: look at the comment shown in the YouTube screenshot below. I hope it works for you!

[screenshot of a YouTube comment suggesting to comment out the fine_tune_checkpoint line]

Hey @goravkaul thanks for your reply. I did get it working by changing the value of "from_detection_checkpoint" to "true". Thank you for your suggestions.

@GeorgiaA I did get it working by changing the above values in the config file. Let me know if anyone is still facing issues. Also, I was facing this specifically with the FPN models in SSD, so I'm training them with the train.py file in the legacy folder rather than model_main.py.

If you're using multiple GPUs, make sure to change "sync_replicas" to "true" as well.

Where is sync_replicas?

@nika9774, THANK YOU!!!!!

I had the same error, and after two days I finally found a solution: look at the comment shown in the YouTube screenshot below. I hope it works for you!

[screenshot of a YouTube comment suggesting to comment out the fine_tune_checkpoint line]

Hey! That was my solution too. I was stuck on that problem for two days. My question is why? Why did we have to comment this out to make it work?

@nika9774 thank you!!! Your answer was my solution. My question is: why did we have to comment this line out in order to make it work? Isn't the training process supposed to use this pretrained model?
I'm asking because, as far as I know, the accuracy of our result will be better if we use a pretrained file.

Hey guys, @nika9774's solution will truly fix this issue for you, but it's a short-lived fix that will come back to haunt you as you proceed. The best thing is to understand what's wrong and look for the actual solution.

Check that the name attribute in your label_map.txt matches your XML files, among other things. If you have more than one class, make sure you specify it in your label map and also in your <model>.config file.

I also initially removed it and was able to get past this stage, but in the long run I had other issues to solve and had to uncomment it to finish the whole process.

Do NOT comment out the from_detection_checkpoint property.

@SirPhemmiey I am training right now with fine_tune_checkpoint commented out. I will check the results, then I'll write them here.
My question is: what is the purpose of the fine_tune_checkpoint option?
I've been reading around, and they say training can either be done from scratch or by updating the weights of a pretrained neural network. So basically what I think fine_tune_checkpoint does is point the training at a pretrained checkpoint, so that its weights are updated based on your custom data.
I'm not sure if I'm right...

Yes, you're correct.
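
That understanding can be illustrated with a small sketch in plain TensorFlow 1.x (not the Object Detection API's actual internals): fine-tuning means graph variables are seeded from a pretrained checkpoint instead of random initialization, and training then continues from those weights. All names below are made up for the example:

import os
import tempfile
import tensorflow as tf

ckpt_dir = tempfile.mkdtemp()

# 1) "Pretraining": create a variable and save it to a checkpoint.
with tf.Graph().as_default(), tf.Session() as sess:
    with tf.variable_scope("feat"):
        w = tf.get_variable("w", initializer=tf.ones([2, 2]))
    sess.run(tf.global_variables_initializer())
    prefix = tf.train.Saver().save(sess, os.path.join(ckpt_dir, "model.ckpt"))

# 2) "Fine-tuning": rebuild the graph and seed it from the checkpoint instead
#    of from random values; variable names must match or nothing is restored.
with tf.Graph().as_default(), tf.Session() as sess:
    with tf.variable_scope("feat"):
        w = tf.get_variable("w", shape=[2, 2])
    tf.train.init_from_checkpoint(prefix, {"feat/": "feat/"})
    sess.run(tf.global_variables_initializer())
    print(sess.run(w))  # the ones saved above, not random values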

@SirPhemmiey I found the problem. The config file I was using to train my own model was different from the configuration file of the pretrained model, the one I pointed fine_tune_checkpoint at. So I took that file, applied the proper modifications (paths, batch size, num_classes, ...), and now it's working properly! Thank you for your suggestion. @nika9774 I hope this info helps you.

That's a good one, man!
I'm glad you dug it out. Good luck with your ML project :)

Did anyone manage to solve this problem besides trying the solutions above?
I tried from_detection_checkpoint = False/True and fine_tune_checkpoint_type = "detection", but for several weight-config combinations I'm still facing this same problem. I don't have any more clues for solving it besides going deep into the code (which will certainly cost me a lot of time). Anyone who managed to solve this, please help. It's happening with so many combinations.

Parameters I tried:

# Helper around the legacy train.py. OD_PATH, train_path, pipeline_file_path,
# train_logs_file, pipeline_config, current_logs_dir, weight_file_name and
# pipeline_file_name are defined earlier in the full script.
import subprocess

def start_training(f):
    return subprocess.run(["python", f"{OD_PATH}/train.py",
                                "--train_dir", train_path,
                                "--pipeline_config_path", pipeline_file_path],
                                stdout=f, stderr=f)

with open(train_logs_file, "wb") as f:
    proc_result = start_training(f)
    # Retry with different checkpoint-related settings. This assumes
    # pipeline_config is re-serialized to pipeline_file_path between runs
    # (done elsewhere in the full script).
    if proc_result.returncode != 0:
        pipeline_config.train_config.from_detection_checkpoint = False
        proc_result = start_training(f)
    if proc_result.returncode != 0:
        pipeline_config.train_config.from_detection_checkpoint = True
        proc_result = start_training(f)
    if proc_result.returncode != 0:
        pipeline_config.train_config.fine_tune_checkpoint_type = "detection"
        proc_result = start_training(f)

with open(f"{current_logs_dir}/general-train-results.txt", "a") as f:
    if proc_result.returncode != 0:
        f.write(f"problem during train of model {weight_file_name} using configs {pipeline_file_name}\n")
    else:
        f.write(f"succesfully trained model {weight_file_name} using configs {pipeline_file_name}\n")

Combinations I checked so far:

succesfully trained model faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28 using configs faster_rcnn_inception_resnet_v2_atrous_coco.config
succesfully trained model faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco_2018_01_28 using configs faster_rcnn_inception_resnet_v2_atrous_coco.config
problem during train of model faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28 using configs faster_rcnn_inception_resnet_v2_atrous_cosine_lr_coco.config
problem during train of model faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco_2018_01_28 using configs faster_rcnn_inception_resnet_v2_atrous_cosine_lr_coco.config
problem during train of model faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28 using configs faster_rcnn_inception_resnet_v2_atrous_oid_v4.config
problem during train of model faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco_2018_01_28 using configs faster_rcnn_inception_resnet_v2_atrous_oid_v4.config
problem during train of model faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28 using configs faster_rcnn_inception_resnet_v2_atrous_oid.config
problem during train of model faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco_2018_01_28 using configs faster_rcnn_inception_resnet_v2_atrous_oid.config
succesfully trained model faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28 using configs faster_rcnn_inception_resnet_v2_atrous_pets.config
succesfully trained model faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco_2018_01_28 using configs faster_rcnn_inception_resnet_v2_atrous_pets.config
succesfully trained model faster_rcnn_inception_v2_coco_2018_01_28 using configs faster_rcnn_inception_resnet_v2_atrous_pets.config
succesfully trained model faster_rcnn_inception_v2_coco_2018_01_28 using configs faster_rcnn_inception_v2_coco.config
succesfully trained model faster_rcnn_inception_v2_coco_2018_01_28 using configs faster_rcnn_inception_v2_pets.config
succesfully trained model faster_rcnn_nas_coco_2018_01_28 using configs faster_rcnn_nas_coco.config
succesfully trained model faster_rcnn_nas_lowproposals_coco_2018_01_28 using configs faster_rcnn_nas_coco.config
problem during train of model faster_rcnn_resnet101_coco_2018_01_28 using configs faster_rcnn_resnet101_atrous_coco.config
problem during train of model faster_rcnn_nas_lowproposals_coco_2018_01_28 using configs faster_rcnn_resnet101_atrous_coco.config
problem during train of model faster_rcnn_resnet101_coco_2018_01_28 using configs faster_rcnn_resnet101_ava_v2.1.config
problem during train of model faster_rcnn_nas_lowproposals_coco_2018_01_28 using configs faster_rcnn_resnet101_ava_v2.1.config
succesfully trained model faster_rcnn_resnet101_coco_2018_01_28 using configs faster_rcnn_resnet101_coco.config
problem during train of model faster_rcnn_nas_lowproposals_coco_2018_01_28 using configs faster_rcnn_resnet101_coco.config
problem during train of model faster_rcnn_resnet101_coco_2018_01_28 using configs faster_rcnn_resnet101_fgvc.config
problem during train of model faster_rcnn_nas_lowproposals_coco_2018_01_28 using configs faster_rcnn_resnet101_fgvc.config
succesfully trained model faster_rcnn_resnet101_coco_2018_01_28 using configs faster_rcnn_resnet101_kitti.config
problem during train of model faster_rcnn_nas_lowproposals_coco_2018_01_28 using configs faster_rcnn_resnet101_kitti.config
succesfully trained model faster_rcnn_resnet101_coco_2018_01_28 using configs faster_rcnn_resnet101_pets.config
succesfully trained model faster_rcnn_nas_lowproposals_coco_2018_01_28 using configs faster_rcnn_resnet101_pets.config
succesfully trained model faster_rcnn_resnet101_coco_2018_01_28 using configs faster_rcnn_resnet101_voc07.config
problem during train of model faster_rcnn_nas_lowproposals_coco_2018_01_28 using configs faster_rcnn_resnet101_voc07.config
succesfully trained model faster_rcnn_resnet50_coco_2018_01_28 using configs faster_rcnn_resnet50_coco.config
succesfully trained model faster_rcnn_resnet50_lowproposals_coco_2018_01_28 using configs faster_rcnn_resnet50_coco.config
problem during train of model faster_rcnn_resnet50_coco_2018_01_28 using configs faster_rcnn_resnet50_fgvc.config
problem during train of model faster_rcnn_resnet50_lowproposals_coco_2018_01_28 using configs faster_rcnn_resnet50_fgvc.config
succesfully trained model faster_rcnn_resnet50_coco_2018_01_28 using configs faster_rcnn_resnet50_pets.config
succesfully trained model faster_rcnn_resnet101_coco_2018_01_28 using configs rfcn_resnet101_coco.config
succesfully trained model faster_rcnn_resnet101_lowproposals_coco_2018_01_28 using configs rfcn_resnet101_coco.config
succesfully trained model faster_rcnn_resnet101_coco_2018_01_28 using configs rfcn_resnet101_pets.config
succesfully trained model faster_rcnn_resnet101_lowproposals_coco_2018_01_28 using configs rfcn_resnet101_pets.config
succesfully trained model ssd_inception_v2_coco_2018_01_28 using configs ssd_inception_v2_coco.config
succesfully trained model ssd_inception_v2_coco_2018_01_28 using configs ssd_inception_v2_pets.config
problem during train of model ssd_mobilenet_v1_0.75_depth_300x300_coco14_sync_2018_07_03 using configs ssd_mobilenet_v1_0.75_depth_300x300_coco14_sync.config
problem during train of model ssd_mobilenet_v1_0.75_depth_quantized_300x300_coco14_sync_2018_07_18 using configs ssd_mobilenet_v1_0.75_depth_quantized_300x300_coco14_sync.config
problem during train of model ssd_mobilenet_v1_0.75_depth_quantized_300x300_coco14_sync_2018_07_18 using configs ssd_mobilenet_v1_0.75_depth_quantized_300x300_pets_sync.config
problem during train of model ssd_mobilenet_v1_coco_2018_01_28 using configs ssd_mobilenet_v1_300x300_coco14_sync.config
succesfully trained model ssd_mobilenet_v1_coco_2018_01_28 using configs ssd_mobilenet_v1_coco.config
succesfully trained model ssd_mobilenet_v1_coco_2018_01_28 using configs ssd_mobilenet_v1_pets.config
problem during train of model ssd_mobilenet_v1_quantized_300x300_coco14_sync_2018_07_18 using configs ssd_mobilenet_v1_quantized_300x300_coco14_sync.config
succesfully trained model ssd_mobilenet_v2_coco_2018_03_29 using configs ssd_mobilenet_v2_coco.config
problem during train of model ssd_mobilenet_v2_coco_2018_03_29 using configs ssd_mobilenet_v2_fpnlite_quantized_shared_box_predictor_256x256_depthmultiplier_75_coco14_sync.config
problem during train of model ssd_mobilenet_v2_coco_2018_03_29 using configs ssd_mobilenet_v2_fullyconv_coco.config
problem during train of model ssd_mobilenet_v2_coco_2018_03_29 using configs ssd_mobilenet_v2_oid_v4.config
problem during train of model ssd_mobilenet_v2_coco_2018_03_29 using configs ssd_mobilenet_v2_pets_keras.config
problem during train of model ssd_mobilenet_v1_quantized_300x300_coco14_sync_2018_07_18 using configs ssd_mobilenet_v2_quantized_300x300_coco.config
problem during train of model ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03 using configs ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync.config
problem during train of model ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03 using configs ssdlite_mobilenet_edgetpu_320x320_coco_quant.config
problem during train of model ssd_mobilenet_v2_coco_2018_03_29 using configs ssdlite_mobilenet_edgetpu_320x320_coco.config
succesfully trained model ssdlite_mobilenet_v2_coco_2018_05_09 using configs ssdlite_mobilenet_v2_coco.config

I didn't check whether all these failures are due to the same error, but every log I checked (about 7 or 8) showed it.

If anyone is interested in the script I used to run these combinations, here it is. The combinations file.

Usage:

python train.py --input-images ./data/train/images/ --input-annotations ./data/train/annotations-4/ --num-steps 1 --config-weights-relation-file config-weights-relation.csv

I was getting this error and, like other people, I couldn't find an option to change from_detection_checkpoint to False/True, but I found instead that I had a checkpoint file in my training folder. When I deleted that checkpoint file the error went away, and a new one is created when you start training. Hope this helps someone else!
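
If you want to see what that stale file refers to before deleting it, the "checkpoint" file in the training folder is just a small state file listing checkpoint prefixes. A hedged sketch, assuming TensorFlow 1.x; the train_dir below is simply the one from this issue's command:

import tensorflow as tf

train_dir = "train_mydata/models/train"  # your --train_dir
state = tf.train.get_checkpoint_state(train_dir)
if state:
    # Prefixes the trainer will try to resume from; stale entries from an old
    # run can point at files that no longer exist.
    print(state.model_checkpoint_path)
    print(state.all_model_checkpoint_paths)
else:
    print("no checkpoint state file in", train_dir)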

