Darknet: Deeper Training Questions/Theory

Created on 6 Feb 2018 · 6 comments · Source: AlexeyAB/darknet

Hello Alexey, anybody,

Thanks for your previous answers. Hopefully you can help.

I have a bigger question this time. I'm working with roughly 20,000 classes; for simplicity, let's say 500.

Given the following challenge (picture 1 below) and the following solution (picture 2 below):

Question 1
I have to be able to see the difference between 1A-1B-1C (3 different objects), but I also have to be able to see the difference between 2A-2B-2C (3 different objects, but distinguished by a different kind of feature).
[I've marked it with a blue line so you can see what the distinguishing difference is.]

I have to be able to see that 3A-3B-3C (the same object) are exactly the same object.

Can this be done by darknet/yolo? Can this all be done in one model? I think it is possible. My plan would be to train using the template backgrounds (second picture) and then use the chosen background in the production environment.

Question 2

Which background (1-10) would be best to train on?

Of course, in the production environment I will then use the same background as the one used for training. Of course, in training I will try to make around 1000-2000 pictures per class on the same background.

Question 3

Does color or color contrast of the background make a difference? Should I use a black background with white lines, or just grey/wood?

Question 4

Do I use softmax to do object detection between superclasses (i.e. separate Knife vs Scissors) and then image classification between subclasses (i.e. separate Knife 1 vs Knife 2)?

[Image 1: challenge]

[Image 2: template]


@SJRogue Hi,

  1. Yes, it can be done with one model. Your training dataset should contain each object at all scales, rotations and lighting conditions, and from every side from which yolo should detect it.
  • If the object scale and the shooting point are the same for all images in the training and detection datasets, then train with these params in your cfg-file:
jitter=0
random=0
  • If not, then use:
jitter=0.3
random=1
  • Also, if in your task you must distinguish objects by color (i.e. if you should detect objects with the same shape but different colors as different classes), then train with:
saturation = 0
exposure = 1.5
hue=0
  (See the cfg sketch after this list for where these lines go.)

  2. I think it is better to use background 1.

Other backgrounds are needed only if you want to know the real size of an object (not only its proportions), but:

  • if the shooting point can vary: backgrounds 2-4
  • if the shooting point can vary and the object can be rotated a lot: backgrounds 8-10

  3. It is important that the object does not merge with the background; you should clearly see the outline of the object.
  • For example, if some of your objects are white and some are black, i.e. you cannot pick one uniform background color on which all objects are equally well visible, then use backgrounds 5-7 (with contrast lines).

  4. In general, in this case it is better to use yolo9000.cfg with softmax-tree (superclasses and subclasses) https://github.com/AlexeyAB/darknet#using-yolo9000, but this model is more difficult to train. You can try to train both and compare results:

Based on:

  • yolo-voc.2.0.cfg: darknet.exe detector train data/obj.data yolo-obj.cfg darknet19_448.conv.23
  • yolo9000.cfg: darknet.exe detector train data/obj9k.data yolo-obj.cfg yolo9000.conv.22
    where yolo9000.conv.22 can be obtained using: darknet.exe partial yolo9000.cfg yolo9000.weights yolo9000.conv.22 22
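
For orientation, here is a minimal sketch of where the parameters from point 1 sit in a cfg-file. The section layout follows yolo-voc.2.0.cfg; the values shown are the fixed-scale, color-sensitive variant suggested above, so swap in the alternatives if your setup differs:

```
[net]
# ... batch, subdivisions, width, height, etc. ...
# color augmentation: saturation/hue disabled so that color alone
# can separate classes (third bullet of point 1)
saturation = 0
exposure = 1.5
hue = 0

# ... convolutional layers ...

[region]
# ... anchors, classes, num, etc. ...
# geometric augmentation: off, because scale and shooting point are
# fixed between training and detection (first bullet of point 1)
jitter = 0
# random=0 keeps the training input resolution fixed at width x height
random = 0
```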
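The data/obj.data file referenced by the first command is not shown in this thread; as a sketch, in this fork it is a plain key = value file along these lines (all paths here are hypothetical placeholders):

```
# number of object classes
classes = 500
# text file listing training image paths, one per line
train = data/train.txt
# text file listing validation image paths
valid = data/valid.txt
# file with one class name per line, in class-id order
names = data/obj.names
# directory where weight snapshots are saved during training
backup = backup/
```

Remember that the classes= value in the cfg's [region] section and the filters= of the last convolutional layer have to match this class count, as described in the repository's how-to-train instructions.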

Alexey, a big thank you.

I need time to respond with more questions but for now:


Q1:
For detection, I can control scale, resolution and shooting point (distance, angle). I cannot always control lighting; every environment will be different.

For me, if shape + size + proportions are the same, then color does not matter: it is the same object.
That is, a scissor with a red handle = a scissor with a blue handle if the shape and the real-life size/proportions are the same.


Q2:
In detection: I cannot control the rotation of objects placed on the background by the users. They will not always place the instruments precisely.


Q3:
Users will place one object at a time (in version 1).
So I'm thinking I need a background with high contrast (background color vs. line/grid/circle colors), and I need to pick 2 colors that never blend with the instruments.

Or, maybe as you say, since I can control distance, shooting point and angle, I can take background 1. (Are you sure this is not a problem for determining size/proportions?)


Q4:

Exactly! I think I need yolo9000, because if I do this without a softmax-tree it's going to be a real problem to categorize and differentiate.


  1. So use in your cfg-file:
    in the [net] section:
saturation = 1.5
exposure = 1.5
hue=.1

    in the [region] section (jitter=0 or 0.05):
jitter=0
random=0
  2. So, if you cannot control the rotation of the placed object, then your training dataset should contain every rotation of each object that can occur at the detection stage.

  3. background-1 isn't a problem for proportions, and it isn't a problem for size (if the scale and shooting point are the same).
    It can be a problem if some objects have the same color as background-1.

  4. Yes, softmax is used in both cases, yoloV2 and yolo9000; but yolo9000 also has a softmax-tree, so it will use a different softmax for each group of subclasses. This is very suitable for classifying a large number of classes. Compare the relevant lines in the source (a toy tree sketch follows below):

  • yolo9000: https://github.com/AlexeyAB/darknet/blob/64aa0180bb74e84a75958b3da0061a9f5615729d/src/region_layer.c#L165
  • yoloV2: https://github.com/AlexeyAB/darknet/blob/64aa0180bb74e84a75958b3da0061a9f5615729d/src/region_layer.c#L172
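
To make the softmax-tree from point 4 concrete for the knife/scissors example: yolo9000 reads a tree file, referenced from the [region] section of yolo9000.cfg (tree=data/9k.tree in the stock file), where each line is a label followed by the zero-based line index of its parent, with -1 meaning a root node. A toy sketch with hypothetical label names:

```
knife -1
scissors -1
knife_1 0
knife_2 0
scissors_1 1
scissors_2 1
```

Softmax is then applied separately over the children of each node, so knife_1 and knife_2 compete only with each other, not with the scissors subclasses; that is what keeps a very large class count tractable.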

Thanks, Alexey, for taking the time and understanding the environment I'm describing. I think you gave me what I need to know.

I will have questions about:
- capture-stage, training-stage & detection-stage resolution
- detection-stage signal registration (when is a signal emitted/interrupted)
  -> this is already for version 2 of my platform [not soon], where I will be moving objects into and out of the camera's field of view, trying to avoid the same object being detected twice, while still registering two different objects of the same class as 2 different entries of 1 class (possibly belonging to a different group, but that will be handled on my level once I understand registration)

I will refer to you, no doubt, in my graduation project : )

Interesting application :+1: Good luck with your project.

@sivagnanamn

Thank you for the support; it's going to take trial and error. Whatever results I get, I will report back.
