Hi @AlexeyAB
My testing and training image size is 320 x 240 px. Because of the limited computing power of the processor (Atom E3845 - quad core - 1.91 GHz), I have to reduce the network size to 160 x 160 to reduce the detection time. I use the tiny-yolo configuration for my network; would this affect the accuracy of the trained model?
Thank you so much!!!
I am new to YOLO. If you need more information about this question, please leave a comment.
Thank you so much
Hi @AlexeyAB !!!
Could you please show me how to reduce the detection time while maintaining the accuracy of the model?
Thank you so much!!!
@trannhutle Hi,
You can use width=320 height=224 in yolov3-tiny.cfg to achieve high speed without an accuracy drop.
If you use random=1 in the cfg-file, then you should use only this repository for training; any repository can be used for detection.
If you use width=160 height=160, it will lead to a slight loss of accuracy.
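For reference, a minimal sketch of where these parameters live in the cfg-file (the surrounding values are illustrative, not a complete config):

```
[net]
# network resolution; both values must be multiples of 32
width=320
height=224

...

[yolo]
# random=1 enables multi-scale training: the network is randomly
# resized every few iterations (this is why training then requires
# this repository)
random=1
```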
@AlexeyAB Hi Alexey,
Thank you so much for your answer. It increased the accuracy of the model so much!!!
Because of the limitations of the processor, the largest network resolution I could set for detection is width=192 and height=192. Could you give me some advice on configuring training and detection without a drop in detection accuracy?
Does the image resolution for training have to be bigger than, or the same as, the network resolution?
Do we have to use the same image resolution for both training and testing?
Thank you so much for your help!!!
@AlexeyAB Hi Alexey,
To speed up recognition, when I build libdarknet.so I set AVX=1 in the Makefile. It does not work; do you know how to fix it?
Thank you so much !
@AlexeyAB Hi Alexey,
I used your modified cfg file (tiny model with 3 yolo layers: https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3-tiny_3l.cfg). The result is amazing, but it takes over 5 seconds to detect objects. Could you please show me how to change the cfg file to reduce the computation time without losing accuracy? Thank you so much!!!! Alexey!!!
@trannhutle Hi,
To speed up detection on the CPU, set OPENMP=1, or OPENMP=1 AVX=1, in the Makefile.
Try to train with width=192 and height=160.
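For example, the relevant flags at the top of the Makefile look roughly like this (the exact list of flags may differ between versions of the repository):

```
GPU=0
CUDNN=0
OPENCV=1
AVX=1      # AVX vector instructions; only on CPUs that support AVX
OPENMP=1   # parallelize across CPU cores
LIBSO=1    # build the shared library
```

After changing the flags, rebuild with make clean && make.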
To speed up recognition, when I build libdarknet.so I set AVX=1 in the Makefile. It does not work; do you know how to fix it?
Can you show a screenshot?
What CPU do you use?
@AlexeyAB Hi Alexey,
I use width=224 and height=192 for detection.
There is no error when I build with OPENMP=1. With OPENMP=1 AVX=1 the .so lib also builds without errors; however, when I initialize the network, it shows:
Try to load cfg: ./config/cfg/test_so.cfg, weights: ./config/weights/test_so.weights, clear = 0
Illegal instruction
This is my CPU: Atom E3845 - quad core - 1.91 GHz.
Thank you so much!!!
Hi @AlexeyAB ,
I do not actually understand the meaning of network resolution; could you please point me to some documentation about it? Thank you so much!!!
@trannhutle Hi,
width= and height= in the cfg-file are the network resolution.
The Atom E3845 doesn't support AVX/AVX2, since it is an old CPU: https://ark.intel.com/content/www/ru/ru/ark/products/78475/intel-atom-processor-e3845-2m-cache-1-91-ghz.html
So you should compile with OPENMP=1 AVX=0.
Hi @AlexeyAB ,
I have changed the configuration and it works really well.
I have another question: does the background of the training images (everything outside the bounding boxes) affect the learning of YOLO? Or is learning affected only by the region inside the bounding box?
Regarding overexposed and underexposed detection images: how can we train the model (including how we capture the images) to deal with overexposure and underexposure?
What if the network just learns objects by color, and several classes have the same color (apple, cucumber, avocado, green capsicum, ...)? How can we deal with that kind of problem?
Thank you so much for your strong support!!!
Hi @AlexeyAB ,
Why is it that when I train Tiny Yolo for 4 objects, with around 160 images per object, the accuracy is very low, while training with the same configuration for 14 objects works better?
What factors affect the trained model?
Hi @AlexeyAB,
Regarding your comment in https://github.com/AlexeyAB/darknet/issues/3001#issuecomment-485773915: does the background affect training even though it does not include the objects?
I have another question: does the background of the training images (everything outside the bounding boxes) affect the learning of YOLO? Or is learning affected only by the region inside the bounding box?
The background of the images does affect the learning of Yolo.
Regarding overexposed and underexposed detection images: how can we train the model (including how we capture the images) to deal with overexposure and underexposure?
Use data augmentation: set exposure=3.0 in the cfg: https://github.com/AlexeyAB/darknet/wiki/CFG-Parameters-in-the-%5Bnet%5D-section
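For example, the color-augmentation parameters in the [net] section could look like this (exposure=3.0 is the value suggested above; the other values are only illustrative defaults):

```
[net]
# each parameter defines a random range applied to training images
saturation = 1.5   # random saturation change up to 1.5x
exposure = 3.0     # random brightness change up to 3x, in both directions
hue = .1           # random hue shift
```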
What if the network just learns objects by color, and several classes have the same color (apple, cucumber, avocado, green capsicum, ...)? How can we deal with that kind of problem?
What is the problem?
Regarding your comment in #3001 (comment): does the background affect training even though it does not include the objects?
Yes.
Hi @AlexeyAB,
What I would like to do next is capture background images, and crop the objects at different angles and locations. Next I would paste those cropped objects onto the different backgrounds. Does that help improve the accuracy of the trained network?
Thanks for your quick response!!!
@AlexeyAB,
Although I know that the color reflected from the background affects the object, does pasting in the cropped objects improve training and detection?
Someone says it would not help the network learn more features of the object. Could you please give me your thoughts on that?
Thank you so much Alexey!
Although I know that the color reflected from the background affects the object, does pasting in the cropped objects improve training and detection?
No (in this case).
Next I would paste those cropped objects onto the different backgrounds. Does that help improve the accuracy of the trained network?
It can improve accuracy.
Although I know that the color reflected from the background affects the object, does pasting in the cropped objects improve training and detection?
No (in this case).
By "in this case", do you mean that it increases the training and detection time, or something else? I don't quite understand.
Thank you so much Alexey!!!
Hi @AlexeyAB ,
About this function in image.c:
void draw_detections(image im, int num, float thresh, box *boxes, float **probs, char **names, image **alphabet, int classes)
How can I use it in Python? Right now, when I get the detection result, the bounding boxes I draw look bad.
If it can be used from Python, how do I call it? What parameters do I have to pass?
Thank you so much @AlexeyAB
@trannhutle
Use this in Python: https://github.com/AlexeyAB/darknet/blob/c9129c207823a96f0a1b3a840883a6c510073347/darknet_video.py#L18-L33
Or this: https://github.com/AlexeyAB/darknet/blob/c9129c207823a96f0a1b3a840883a6c510073347/darknet.py#L413-L424
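If it helps, here is a minimal Python sketch of drawing boxes with OpenCV, assuming detections in the (label, confidence, (center_x, center_y, width, height)) pixel format that detect() in darknet.py returns; the function name here is illustrative, not from the repo:

```python
import cv2

def draw_boxes(img, detections):
    # detections: list of (label, confidence, (cx, cy, w, h)) in pixels,
    # with (cx, cy) the box center
    for label, confidence, (cx, cy, w, h) in detections:
        if isinstance(label, bytes):   # darknet.py may return bytes
            label = label.decode()
        left, top = int(cx - w / 2), int(cy - h / 2)
        right, bottom = int(cx + w / 2), int(cy + h / 2)
        cv2.rectangle(img, (left, top), (right, bottom), (0, 255, 0), 2)
        cv2.putText(img, "{}: {:.2f}".format(label, float(confidence)),
                    (left, top - 5), cv2.FONT_HERSHEY_SIMPLEX,
                    0.5, (0, 255, 0), 1)
    return img
```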
Next I would paste those cropped objects onto the different backgrounds. Does that help improve the accuracy of the trained network?
It can improve accuracy.
@AlexeyAB
So cropping the positive rectangle and putting it randomly on a different background does not hurt accuracy?
There will be strong borders, and the region inside the box will be totally different from the outside. It would allow us to reduce labeling errors, but I am not sure whether it is beneficial.
What if there are many annotations? Or what if I leave some padding inside the box before moving it to a new background?
For example, we use a pseudo-labeler to detect objects and put them on a random background, or on their own cleaned background, and there are claims in the team that this hurts accuracy.
@isgursoy
Can you show examples?
Cropped objects that are inserted into another image increase accuracy - this is known as CutMix: https://arxiv.org/pdf/1905.04899v2.pdf
Also read: https://github.com/AlexeyAB/darknet/issues/4264
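As an illustration only (not code from this repository), a minimal Python sketch of this kind of copy-paste augmentation, pasting a cropped object onto a background and producing the corresponding YOLO-format label; all names are hypothetical:

```python
import random

def paste_object(background, obj_crop):
    """Paste an object crop onto a background at a random position.

    background, obj_crop: HxWx3 uint8 numpy arrays (e.g. from cv2.imread).
    Returns the composited image and a YOLO-style (cx, cy, w, h) label
    normalized to [0, 1].
    """
    bh, bw = background.shape[:2]
    oh, ow = obj_crop.shape[:2]
    x = random.randint(0, bw - ow)  # assumes the crop fits the background
    y = random.randint(0, bh - oh)
    out = background.copy()
    out[y:y + oh, x:x + ow] = obj_crop
    # YOLO labels: box center and size relative to the image size
    label = ((x + ow / 2) / bw, (y + oh / 2) / bh, ow / bw, oh / bh)
    return out, label
```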

We will be back with examples from our case in a few hours. Thanks for your time.
In addition to isgursoy's post:
Does putting a cropped object onto a different background improve the model? By cropping an object we mean taking the object out of its original background by its bounding box; we don't mean a technique like CutMix. In the case of a human-detection problem, we mean cropping the entire human and putting it onto a different background. My question is about three cases:

1) Does it improve the model to put the cropped human onto a completely different background?

2) We automatically detected humans and labelled them in a pseudo way. Then we cropped them and put the detected boxes back onto a specified generic background that is slightly different from the original one. Does this affect accuracy?

3) Original image sizes may differ from the network size. For example, the image size can be 512x512 (square) while the network size is 416x416 (square), so they are proportional. What if the image size is rectangular and the network size is square, or vice versa? Does that affect accuracy?
@ekarabulut
If we believe the results of the article https://arxiv.org/pdf/1905.04899v2.pdf , yes, it increases accuracy.
Yes, it increases accuracy.
If the network size is 416x416 and the image size was 640x480 during both training and detection, then this is normal.
It's bad when objects have different aspect ratios during training and detection after the image is resized to the network size; for example, a 1000x100 training image but a 100x1000 detection image.
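To make the resize question concrete, a small Python sketch of the two common strategies (stretching to the network size, which distorts the aspect ratio, versus letterboxing, which preserves it by padding); this is an illustration, not the repository's code:

```python
import cv2
import numpy as np

def stretch_resize(img, net_w, net_h):
    # stretches to the network size; distorts the aspect ratio
    return cv2.resize(img, (net_w, net_h))

def letterbox_resize(img, net_w, net_h, pad_value=127):
    # scales to fit, then pads; preserves the aspect ratio
    h, w = img.shape[:2]
    scale = min(net_w / float(w), net_h / float(h))
    new_w, new_h = int(w * scale), int(h * scale)
    resized = cv2.resize(img, (new_w, new_h))
    canvas = np.full((net_h, net_w, 3), pad_value, dtype=np.uint8)
    top, left = (net_h - new_h) // 2, (net_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas
```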
In addition to @ekarabulut 's post:
1-2) @AlexeyAB The strong gradient makes me think: might the model learn the borders and use that trick? In my opinion, a small padding around the positive box would make me feel better.
3) Our images come in varying sizes and many aspect ratios.
@AlexeyAB First off, thanks for the quick reply.
In CutMix, a part of a bounding box (e.g. a human's leg) is inserted into another bounding box. In example (1) above, the whole bounding box is put onto another background (i.e. the context around the box is removed or replaced). Is your comment still valid for this situation?
@ekarabulut
It depends on your task.
In general it improves accuracy, since any added variety improves accuracy.
But to be more precise, for your training dataset:
https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
For each object that you want to detect, there must be at least 1 similar object in the training dataset with about the same shape, side of the object, relative size, angle of rotation, tilt, and illumination. So it is desirable that your training dataset include images with objects at different scales, rotations, lightings, from different sides, and on different backgrounds. You should preferably have 2000 different images per class or more, and you should train for 2000*classes iterations or more.
@isgursoy @ekarabulut
1-2) @AlexeyAB The strong gradient makes me think: might the model learn the borders and use that trick? In my opinion, a small padding around the positive box would make me feel better.
3) Our images come in varying sizes and many aspect ratios.
Yes, a model can simply be overfitted to the boundaries (the strong gradient); in the end it will just look for sharp boundaries instead of the objects themselves, which will degrade accuracy.
Maybe later I will add something like this with blending using pyramids (if OPENCV=1): https://docs.opencv.org/master/dc/dff/tutorial_py_pyramids.html
I added this issue: #4378
Regarding different aspect ratios, there are pros and cons to the different resize approaches: #232 (comment)
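For readers unfamiliar with the linked tutorial, a rough Python sketch of Laplacian-pyramid blending, which would soften the pasted object's borders; this only illustrates the idea and is not the repository's implementation:

```python
import cv2
import numpy as np

def pyramid_blend(obj_img, bg_img, mask, levels=4):
    """Blend obj_img into bg_img using Laplacian pyramids.

    obj_img, bg_img: same-size HxWx3 float32 images.
    mask: HxWx3 float32 in [0, 1]; 1 where the object should show.
    """
    # Gaussian pyramids of both images and the mask
    gp_o, gp_b, gp_m = [obj_img], [bg_img], [mask]
    for _ in range(levels):
        gp_o.append(cv2.pyrDown(gp_o[-1]))
        gp_b.append(cv2.pyrDown(gp_b[-1]))
        gp_m.append(cv2.pyrDown(gp_m[-1]))
    out = None
    for i in range(levels, -1, -1):
        if i == levels:
            # coarsest level: blend the Gaussian tops directly
            out = gp_m[i] * gp_o[i] + (1 - gp_m[i]) * gp_b[i]
        else:
            # blend the Laplacian (detail) levels, then reconstruct upward
            size = gp_o[i].shape[1::-1]  # (width, height)
            lap_o = gp_o[i] - cv2.pyrUp(gp_o[i + 1], dstsize=size)
            lap_b = gp_b[i] - cv2.pyrUp(gp_b[i + 1], dstsize=size)
            out = cv2.pyrUp(out, dstsize=size) + \
                  gp_m[i] * lap_o + (1 - gp_m[i]) * lap_b
    return np.clip(out, 0, 255).astype(np.uint8)
```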
What do you think about leaving some padding around the positive box when cropping and moving it? What would change in that case, in your opinion?
Thanks.