Models: API ObjectDetection size of input images issues

Created on 6 Jul 2017 · 43 comments · Source: tensorflow/models

GPU : GeForce GTX 1080 Ti/PCIe/SSE2 (11 GB)
Tensorflow version: 1.1.0
Python version: 2.7.12
Model checkpoint : ssd_mobilenet_v1_coco_11_06_2017

Context: After reading the ObjectDetection tutorial, I converted my own image dataset to tfRecord files by creating an *.xml file for each image. Since the images in my dataset are very large (about 4000 pixels wide and 2000 pixels high each), the train.tfRecord is about 55G.

Issues: When I began to train with train.py using the lightest model, ssd_mobilenet_v1_coco_11_06_2017, it crashed with an OOM error after 4-5 steps.
The error message is below:
It seems the OOM error happened when allocating a tensor with shape [1,2969,3546,3].
Since my GPU has 11 GB of memory, I don't understand why this problem occurs.

awaiting model gardener bug

Most helpful comment

Hi @scotthong,
I really cannot remember how I solved this problem, but I can share some experience which may not be universally correct but worked for me.

  1. I used fixed_shape_resizer.
  2. A total loss around 4.0 is OK; try evaluating your model and see how it works with your validation set.
  3. If the image is too small after resizing, it is hard to detect small objects.

With regards!
Yeephycho

All 43 comments

Hi @chenyuZha - These resolutions are problematic because we keep an entire queue of images in memory, not just the batch that you are currently training on. See e.g. the queue_capacity and min_after_dequeue parameters in https://github.com/tensorflow/models/blob/master/object_detection/protos/input_reader.proto
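
For example, lowering those queue parameters in the train_input_reader reduces how many full-resolution images are buffered in memory at once. A rough sketch of that part of the config (the paths and values below are only illustrative; see input_reader.proto for the defaults):

    train_input_reader {
      tf_record_input_reader {
        input_path: "train.record"
      }
      label_map_path: "label_map.pbtxt"
      queue_capacity: 100       # fewer images held in RAM (illustrative value)
      min_after_dequeue: 50     # must stay below queue_capacity
    }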

Note, though, that SSD typically resizes images immediately to 300x300. Given this, it makes sense to simply resize your input images to be smaller.
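
If you go that route, a minimal sketch of pre-shrinking the source images before generating the TFRecords could look like the following (assuming Pillow; the 1024 px target and directory layout are only examples, and any absolute pixel coordinates in your *.xml annotations would need to be scaled by the same factor):

    # Downscale large source images so the longer side is at most max_side.
    import os
    from PIL import Image

    def downscale_images(src_dir, dst_dir, max_side=1024):
        if not os.path.isdir(dst_dir):
            os.makedirs(dst_dir)
        for name in os.listdir(src_dir):
            if not name.lower().endswith(('.jpg', '.jpeg', '.png')):
                continue
            img = Image.open(os.path.join(src_dir, name))
            scale = float(max_side) / max(img.size)
            if scale < 1.0:
                new_size = (int(img.size[0] * scale), int(img.size[1] * scale))
                img = img.resize(new_size, Image.BILINEAR)
            img.save(os.path.join(dst_dir, name))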

@jch1
Thanks for your response !

Hi @jch1 .
I'm training SSD on my own dataset. The result, I think, is good but not as good as I expected, so I'm trying to fine-tune the model to improve the performance.
I noticed that SSD resizes the image to 300 by 300 using tf.image.resize_images(),
which, according to the official documentation,

Resized images will be distorted if their original aspect ratio is not the same as size.

I did some experiments, and it's true.

So I changed the resize API to resize_image_with_crop_or_pad().
The result is that the loss gets very high and is very hard to converge.

My question is: did you take resize distortion into consideration during the development of the object detection API?
If you did, will the mismatch in aspect ratio of the original images affect the final result?

With thanks and regards!

@yeephycho you should first resize the image while keeping the original aspect ratio, e.g. to 300 px on the larger side, and then pad it to a square with resize_image_with_crop_or_pad(). Code for aspect-ratio-preserving resize with TF ops can be found in the magenta repository, where neural style transfer is done.
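
Something along these lines is a sketch of that idea (TF 1.x ops; the function name and the 300 px target are just examples, and the ground-truth boxes would need to be rescaled consistently):

    import tensorflow as tf

    def resize_keep_aspect_then_pad(image, target_size=300):
        # Scale so the longer side equals target_size, preserving aspect ratio.
        shape = tf.shape(image)
        height = tf.to_float(shape[0])
        width = tf.to_float(shape[1])
        scale = tf.to_float(target_size) / tf.maximum(height, width)
        new_size = tf.stack([tf.to_int32(tf.round(height * scale)),
                             tf.to_int32(tf.round(width * scale))])
        resized = tf.image.resize_images(image, new_size)
        # Pad (never crop, since the longer side already fits) to a square.
        return tf.image.resize_image_with_crop_or_pad(resized, target_size, target_size)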

Hi @yeephycho,
I am also trying to train an object detector using my own images with various sizes and aspect ratios. I tried the following two image_resizer configurations in the pipeline.config with a pre-trained model from the model zoo, and it seems the model is not converging, with the TotalLoss fluctuating around 4.0. Which image_resizer configuration did you use to make the model converge faster? Or did you pre-scale the dataset and bounding boxes, as suggested by @Luonic, to train your object detector?

Thanks!

    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 300
        max_dimension: 300
      }
    }

Hi @scotthong,
I really cannot remember how I solved this problem, but I can share some experience which may not be universally correct but worked for me.

  1. I used fixed_shape_resizer.
  2. A total loss around 4.0 is OK; try evaluating your model and see how it works with your validation set.
  3. If the image is too small after resizing, it is hard to detect small objects.

With regards!
Yeephycho

Hi all,

Maybe someone can help me with this issue:

https://github.com/tensorflow/models/issues/3196

Regards,
Hao

Hi @yeephycho @scotthong ,
I am encountering the same problem: the TotalLoss fluctuates around a high value (4.0). Is there anything I can do to avoid missing small objects when training with high-resolution images?

Regards,
Lee

Hi @syndec
Try to study the network and especially the file "core/preprocessor.py". You should be able to find clues and determine if it is possible to resolve the problem you are facing.

Hello @yeephycho and @syndec ,
Is there a definition of "small objects"?
For example, if the ratio of the object's size to the whole image is 0.2, do we consider it a "small object"?
I am facing the same issue: the total loss fluctuates around 4.0.
Also, the localization_loss/classification_loss is affected by the number of ground-truth boxes in the image.
For example, the loss increases when the number of ground-truth boxes in an image increases.
Because those ground-truth boxes are small, I wonder if I should exclude them from my training set.
I use the pre-trained SSD-InceptionV2-Coco model to train with my dataset for vehicle detection.
The performance of the model trained with my dataset is worse than the pre-trained SSD-InceptionV2-Coco.

Thank you for your precious time on my question.

I am facing similar issues but only when I use the ssd_random_crop data augmentation option in the pipeline config. My total loss converges much lower when I don't use the crop preprocessing. Also, I'm finding that the position of the object in the image has a huge impact on score - the detector behaves like it has blind spots. My training images are 1920x1200 and I'm using the fixed_shape_resizer to 300x300. My training runs have about 100 images and it converges to a total loss of about 0.26 after 10K steps. When I add the ssd_random_crop the total loss is around 2.5 even after 20K steps. I've tried variations with random_crop_image as well with no luck.
Edit: the same training set with ssd_random_crop produced a lower mAP, even with twice as many steps. This doesn't seem right; is there a bug in the cropping algorithm?
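
For reference, the option in question is enabled under the train_config's data_augmentation_options; roughly like this in the pipeline config, with no extra parameters set (check preprocessor.proto for the tunable fields):

    data_augmentation_options {
      ssd_random_crop {
      }
    }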

Hello @funkysandman ,
Thank you for sharing the "ssd_random_crop" idea; unfortunately, it does not resolve my issue.
My total loss still remains around 4.

Would you mind sharing what counts as "good data" for SSD-InceptionV2 training, in your opinion?

  1. Does "resolution" matter? (My images from Udacity are kind of blurred.)
  2. Will the number of objects per image affect the Box predictor?

Because SSD-InceptionV2 has limitations in detecting small objects,
I think providing 300x300 images in which the target object is "big" enough should allow the model to learn the features.
So I prepared the image data from Udacity:

  1. 300x300 (crop the car object from the Udacity image dataset).
  2. The object to be classified occupies at least 10% of the image (to make sure the car is big enough).
  3. Only one object per image.

The total loss is close to 1.0 after 200k iterations.

But after training with this kind of data,
the trained model cannot detect any object in my evaluation set.

I am working on the root cause.
So I wonder what the good data should be for SSD-InceptionV2's training.

P.S.
My TensorFlow is 1.8.0 and the TensorFlow Object Detection repo was pulled on 2018.05.03.

Thank you.

Make sure your pipeline.config matches the model you are using perfectly; I've accidentally run training with the wrong config file. Maybe share your pipeline.config file? Also, the detection program must be tailored to the specific model as well, and the images must be normalized properly for the detector, or you may get weird results.

Hello @funkysandman ,
I made a mistake: in the label map the item's "name" is capitalized, but in my TFRecord conversion script I used the non-capitalized version.
Now I am working on the bad performance of SSD-InceptionV2.

Question: Say all the train and test images from the raw data are of size (512, 7000). Can anyone who works on the object detection API tell me if it is OK to leave the image_resizer set to 300x300? Or should I change it to the following to match the dimensions of my images? I'm not sure what the purpose of that piece of code is. I'm currently training a model with the setting as:

    image_resizer {
      fixed_shape_resizer {
        height: 512
        width: 7000
      }
    }

Hello @cmbowyer13 ,

If you use SSD from the model zoo,

you need to either resize your images to 300x300 before feeding them to the model, or let the model do it for you.

Good luck!

I don't know what you're saying, @willSapgreen. Why can't I do as I am doing, or why can't the model accept arbitrarily sized images?

You can provide any size image for detection. I would not resize it so small that you cannot see the features you're after (this depends on what your objects are). During training the images are resized to 300x300. If the images you are using to train your model lose their features at that resolution, you can increase it to something like 600x600 in the pipeline.config file. This has an effect on training time as more memory is needed. Resizing images at detection time can help speed things up.

But what does resizing to 300x300 in the config file do to all my images during training, which are of size (512x7000)? You're saying it's fine to leave it at 300x300, and then I'll be able to detect any size image I want as long as it contains the same class of objects? I will also need to run detection on matrices of raw data in real time. Are there any guidelines for doing real-time object detection with our trained object detection model? Any good links or references are appreciated. Also, do you know the best way to convert the model to a C++ format once it has been created and performs as I like?

@cmbowyer13 I would guess you can leave it at 512x7000 for training, as long as you have the memory + gpu to handle it. As long as the model loss converges it should be ok. When it comes to using the trained model, you can give it pictures in their original size I believe. You can resize them if it is too slow.

Hi. I have a huge set of images for training that range in size from 300x300 to 1184x1410 (different sizes). I'm training a Faster R-CNN model with the pipeline configuration file resizing images to 600x1024.

My question is: is it better to resize all the images to 600x1024 before training, using the preprocessing script found in https://github.com/tensorflow/models/blob/f87a58cd96d45de73c9a8330a06b2ab56749a7fa/research/object_detection/core/preprocessor.py#L1426, or is it OK to train with different sizes? And is 600x1024 too high to resize to when I have small images (300x300)?

Also, will this resizing affect the bounding boxes in the annotations, or will they be resized accordingly?

Thanks

Are you sure it resizes to 600x1024? I thought it was resizing so that the longest side is no smaller than 600 and no bigger than 1024. In the pipeline.config it just says min_dimension=600, max_dimension=1024, without being specific to width or height. So your 300x300 would be resized to 600x600 and 1184x1410 would be resized to 864x1024. I could be wrong.

@funkysandman Thanks for your reply, and yes, you are right, it's about the min and max dimension.
My question is: does this resizing adjust the bounding boxes (annotations) accordingly in the training set, or do I have to preprocess them before training?

I would leave the images as they are; training should respect the bounding boxes that you specify in the XML. The bounding boxes are converted to percentages, I think, so they're not affected by image resizing.
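
That matches what the usual TFRecord conversion scripts do: box corners are written as fractions of the image size, so a later resize keeps them valid. A minimal sketch (the pixel values below are just an example; the feature keys are the ones the object detection API's dataset tools expect):

    import tensorflow as tf

    # Example: one box in a 1920x1200 image, given in absolute pixels.
    image_width, image_height = 1920.0, 1200.0
    left, right, top, bottom = 450.0, 720.0, 300.0, 560.0

    # Stored normalized to [0, 1], so the values survive any later resize.
    feature = {
        'image/object/bbox/xmin': tf.train.Feature(
            float_list=tf.train.FloatList(value=[left / image_width])),
        'image/object/bbox/xmax': tf.train.Feature(
            float_list=tf.train.FloatList(value=[right / image_width])),
        'image/object/bbox/ymin': tf.train.Feature(
            float_list=tf.train.FloatList(value=[top / image_height])),
        'image/object/bbox/ymax': tf.train.Feature(
            float_list=tf.train.FloatList(value=[bottom / image_height])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))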

Thanks @funkysandman

I'm also questioning whether it is better to resize prior to creating the bounding boxes, as opposed to using larger images (sometimes much larger), creating the bounding boxes, and then allowing training to resize the photos to 300x300. In my use case, the photos are also resized in the same manner to 300x300 before being sent through the model for object detection in production.
Has anyone done any testing to determine which is better? Also, if anyone knows of a white paper that discusses this issue, please post the link.

I've tested this, and it seems that whatever size I train it on is the size it works best on... when I cropped my large pics down to 300 px, the model was lousy at detecting in the larger images. I ended up including original and resized images to ensure better detection. I'm still experimenting. I've also found that reducing the batch size can impact the overall training accuracy. I've tried batch sizes of 5, 10, 12, 20, 24... 5 seems to work pretty well for my data.

I am failing to wrap my head around why resizing would be an issue to begin with, unless the object is too small to detect. In all other cases, won't the distortion due to resizing be taken care of because we would be resizing the test image too?

The original pics I'm using are approximately 4,000 x 3,000. Drawing bounding boxes on such large images and then training works well, but the training is very slow and requires a lot of memory. If I resize the same images to 300x300 and then draw the bounding boxes, like funkysandman, I found that the detection was terrible. I ended up cropping the images to approximately 800x800 (some smaller and some larger) around my objects and then drew the bounding boxes. The objects I am looking for fill most, if not all, of the 800x800 cropped pictures. This works well. However, it seems like I need more negative space. I was thinking of including pics with just negative space (no known objects) just to help the training.
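
A rough sketch of that cropping step, assuming the image is larger than the crop and the box coordinates are absolute pixels (the function and variable names are just illustrative):

    def crop_window_around_box(image_w, image_h, box, crop_size=800):
        """Return a crop window centered on a box, clipped to the image,
        plus the box coordinates shifted into that window."""
        xmin, ymin, xmax, ymax = box
        cx = (xmin + xmax) / 2.0
        cy = (ymin + ymax) / 2.0
        left = int(min(max(cx - crop_size / 2.0, 0), image_w - crop_size))
        top = int(min(max(cy - crop_size / 2.0, 0), image_h - crop_size))
        window = (left, top, left + crop_size, top + crop_size)
        shifted_box = (xmin - left, ymin - top, xmax - left, ymax - top)
        return window, shifted_box

    # Example: a car box in a 4000x3000 photo.
    print(crop_window_around_box(4000, 3000, (1900, 1400, 2300, 1700)))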

"However it seems like I need more negative space. I was thinking of including pics with just negative space (no known objects) just to help the training."

This is something that I was thinking about also, I would like to know the answer to this.

I am using labelImg for annotating the images of traffic lights for training on TensorFlow using the ssd_inception_coco model. How can I add the color of the traffic lights while creating the XML file, since that option is not available in labelImg? Kindly help.

@bharat77s when you create a rectangle (shortcut: w), a list of labels comes up. You can just add the name of your label there. So, basically, the color of the traffic light can be a label.
Alternatively, you can add your label names to data/predefined_classes.txt so that your labels will appear in the list of labels in labelImg.
I hope that answers your question.

But is this the only way, as I would have to create thousands of such labels to train my desired model?


I have a query regarding the image size used for training. I'm trying to train SSD_inception_v2 on a training dataset. I have both the original images (1280x720) and cropped images (640x640) in the training set. My model does not seem to converge, and the losses fluctuate. Should I train the network only on the crops?

@funkysandman @skhater It is my understanding, from reading docs and forums, that it resizes the image to the min dimension and then makes sure the other dimension is less than the max. Ex: keeping the aspect ratio, 300x300 would be resized to 600x600 and 1184x1410 would be resized to 600x714.
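
A rough sketch of that sizing rule (the real implementation lives in core/preprocessor.py and may round slightly differently):

    def keep_aspect_ratio_size(width, height, min_dimension=600, max_dimension=1024):
        # Scale so the shorter side reaches min_dimension...
        scale = min_dimension / float(min(width, height))
        # ...unless that would push the longer side past max_dimension.
        if max(width, height) * scale > max_dimension:
            scale = max_dimension / float(max(width, height))
        return int(round(width * scale)), int(round(height * scale))

    print(keep_aspect_ratio_size(300, 300))    # -> (600, 600)
    print(keep_aspect_ratio_size(1184, 1410))  # -> (600, 715) with this rounding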

@giridhar13 I would suggest having your training images as close as possible to the dimensions of the images you plan to run through the model. Ex: if you are going to test/utilize cropped images, train on cropped images.

@bharat77s Yes, I would use 4 different labels: red, yellow, green, off. Make sure you edit the label map and config file to reflect the changes. If this presents issues, maybe train a model to pull out stop lights and then retrain on the cropped images to detect the color.
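
A sketch of what the label_map.pbtxt for those four classes might look like (the names are examples and must match exactly what your TFRecord conversion script writes):

    item {
      id: 1
      name: 'red'
    }
    item {
      id: 2
      name: 'yellow'
    }
    item {
      id: 3
      name: 'green'
    }
    item {
      id: 4
      name: 'off'
    }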

Does anyone know how fixed_shape_resizer interacts with ssd_random_crop? Does it take a random crop of size defined in fixed_shape_resizer?

These SSD512 pre-trained models would help you start with larger images, though.
https://github.com/lambdal/lambda-deep-learning-demo
OR (this is what I use...)
https://github.com/balancap/SSD-Tensorflow

From all this I conclude that the original images to be labeled must have proportions close to 1:1 (ssd_mobilenet model ...), and their resolutions must be, for example, 900x1000 or 1200x1300, etc.
Is that right?

Hi There,
We are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.

Hi everyone, I was just wondering if the image resizer param from the config file has an impact only on the training phase or also on the inference phase. By that I mean: once trained, will my model resize each image to 300x300 (for example, if I use SSD) before its first convolution layers, or not?

Thanks !

Hi everyone, I am facing a problem which could have a similar reason: https://github.com/tensorflow/tensorflow/issues/45148
