Models: SSD model with MobileNet feature extractor is not converging

Created on 11 Aug 2017 · 6 Comments · Source: tensorflow/models

Hi, I am trying to train SSD-MobileNet in order to detect 13 classes. I also trained a Faster R-CNN with ResNet-101.

Most of my training images have a resolution of 265 × 450, and each class has 400 images.

Then something weird happened: Faster R-CNN converged faster, with a batch size of 1.

But my SSD didn't. It's not converging at all. Here are the two graphs:

Purple - Faster Rcnn | resnet101
Blue - SSD | mobilenet

I didn't change the configuration files. Both Faster R-CNN with ResNet-101 and SSD with MobileNet use the original sample configs unchanged.

Then I checked the histograms. It's clearly a vanishing-gradient problem: the weights are saturated, and all of them are in the near-zero range.


I have seen that some people have trained SSD with MobileNet for single-class or two-class object detection problems. I'd like to ask them:

How fast did it converge?
Do you have any validation curves?


shamanez

All 6 comments

@datitran

shamanez on 11 Aug 2017

This question is better asked on StackOverflow since it is not a bug or feature request. There is also a larger community that reads questions there. Thanks!

skye on 16 Aug 2017


If you didn't change the configs, then here is the difference:

Faster Rcnn has batch_size 1 - https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/faster_rcnn_inception_v2_pets.config#L85
SSD MobileNet has batch_size 24 - https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v1_pets.config#L141

To make SSD MobileNet train faster, change 24 to 1. It will also stop eating your CPU.
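The suggested change is a one-value edit to `train_config` in the pipeline config, and editing the file by hand works fine. Purely as an illustration, here is a hypothetical helper (`set_batch_size`, not part of the Object Detection API) that rewrites the first `batch_size` value in a config's text with a regular expression. It does not parse the protobuf, so it assumes `batch_size` appears once, as in the sample configs linked above.

```python
import re


def set_batch_size(config_text: str, batch_size: int) -> str:
    """Replace the first `batch_size: N` entry in pipeline config text.

    Hypothetical helper: plain text substitution, no protobuf parsing.
    """
    new_text, count = re.subn(
        r"(batch_size:\s*)\d+", rf"\g<1>{batch_size}", config_text, count=1
    )
    if count == 0:
        raise ValueError("no batch_size field found")
    return new_text


if __name__ == "__main__":
    # Abbreviated, illustrative train_config fragment.
    sample = "train_config: {\n  batch_size: 24\n}\n"
    print(set_batch_size(sample, 1))
```

Keep in mind that a smaller batch also changes how many images each training step sees, so per-step curves from runs with different batch sizes are not directly comparable.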

So it seems to be a bug in the training scripts...

@skye what do you think?

Also see: https://github.com/tensorflow/models/issues/5719#issuecomment-437323963

anonym24 on 9 Nov 2018


Hi, I am also training an SSD MobileNet model. Can I ask whether changing the batch_size affects the accuracy of the model? And how many steps does it need in order to become accurate?
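The thread doesn't answer how many steps are needed, but one relevant fact is that a training step processes `batch_size` images, so the same step count means very different amounts of training at batch size 1 versus 24. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the 5,200-image figure assumes (hypothetically) one class per image for the original poster's 13 classes × 400 images.

```python
import math


def steps_per_epoch(num_images: int, batch_size: int) -> int:
    """Number of training steps needed to see every image once."""
    if num_images <= 0 or batch_size <= 0:
        raise ValueError("num_images and batch_size must be positive")
    return math.ceil(num_images / batch_size)


def epochs_completed(steps: int, batch_size: int, num_images: int) -> float:
    """How many passes over the dataset a given number of steps covers."""
    return steps * batch_size / num_images


if __name__ == "__main__":
    num_images = 13 * 400  # assumes one class per image
    for bs in (1, 24):
        # batch 1: 5200 steps/epoch; batch 24: 217 steps/epoch
        print(bs, steps_per_epoch(num_images, bs),
              round(epochs_completed(10_000, bs, num_images), 2))
```

Comparing runs by epochs (or images seen) rather than raw step count gives a fairer picture when batch sizes differ.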

Twikkxx on 4 Mar 2019

@shamanez can you advise how you got the histograms from the SSD network to show up in TensorBoard? They don't show up during my training; the Histograms tab is missing.

I am using the following command to train:

python object_detection/model_main.py \
      --pipeline_config_path=<> \
      --model_dir=<> \
      --log_dir=<>

tispratik on 22 Apr 2020

Figured it out. I added the following change to get the histograms and distributions:

--- a/python/object_detection/model_lib.py
+++ b/python/object_detection/model_lib.py
@@ -334,6 +334,7 @@ def create_model_fn(detection_model_fn, configs, hparams, use_tpu=False):
         for var in optimizer_summary_vars:
           tf.summary.scalar(var.op.name, var)
       summaries = [] if use_tpu else None
+      summaries = ['gradients', 'gradient_norm', 'global_gradient_norm']
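Assuming this `summaries` list is forwarded to TF 1.x's `tf.contrib.layers.optimize_loss`, `'gradients'` adds histogram summaries of each variable's gradient (which is what populates the Histograms and Distributions tabs), while `'gradient_norm'` and `'global_gradient_norm'` add scalar norms. The sketch below shows, in plain Python, what those two norms measure: each variable's L2 norm, and the global norm as the square root of the sum of squared per-variable norms. The function names and example variable names are hypothetical and are not TensorFlow APIs.

```python
import math
from typing import Dict, List


def l2_norm(values: List[float]) -> float:
    """Euclidean (L2) norm of a flattened gradient tensor."""
    return math.sqrt(sum(v * v for v in values))


def gradient_norms(grads: Dict[str, List[float]]) -> Dict[str, float]:
    """Per-variable gradient norms, analogous to 'gradient_norm'."""
    return {name: l2_norm(g) for name, g in grads.items()}


def global_gradient_norm(grads: Dict[str, List[float]]) -> float:
    """Square root of the sum of squared per-variable norms."""
    return math.sqrt(sum(n * n for n in gradient_norms(grads).values()))


if __name__ == "__main__":
    grads = {"conv1/weights": [3.0, 4.0], "conv2/weights": [0.0, 12.0]}
    print(gradient_norms(grads))        # {'conv1/weights': 5.0, 'conv2/weights': 12.0}
    print(global_gradient_norm(grads))  # 13.0
```

A global gradient norm that collapses toward zero early in training would support the vanishing-gradient reading of the histograms.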

tispratik on 2 May 2020
