Hello,
I have some images in which the target object might be as small as 20x20 pixels. I changed the two places in the code that you noted in the comments, namely
mmod_options options(face_boxes_train, 40, 40); to
mmod_options options(face_boxes_train, 20, 20);
and
cropper.set_min_object_size(40, 40); to
cropper.set_min_object_size(20, 20);
but the model does not converge: the average loss does not fall below 1.0.
With the default values (40, 40), the model converges and the average loss falls to around 0.16, but the model cannot detect small objects.
That network in the example has 3 downsampling layers. Think about what
that means for the resolution of the detector. It makes it stride the
detection window 8 pixels at a time. So that network isn't going to be
able to detect really small objects due to that resolution loss. You need
to change the network if you want to detect small objects.
Or you could upsample your image and leave the network alone.
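To make the stride argument concrete, here is a small arithmetic sketch. It assumes each downsampling layer halves the resolution (stride 2), which matches the 8 pixel figure quoted above for 3 layers. The helper names are hypothetical and are not part of dlib.

```cpp
#include <cassert>

// Assumption: every downsampling layer has stride 2, so the overall
// detector stride is 2^num_downsampling_layers pixels.
inline int detector_stride(int num_downsampling_layers)
{
    assert(num_downsampling_layers >= 0);
    return 1 << num_downsampling_layers;
}

// How many detector steps fit across one side of an object.
// Fewer steps means coarser localization of that object.
inline double steps_across_object(int object_side_px, int stride_px)
{
    assert(stride_px > 0);
    return static_cast<double>(object_side_px) / stride_px;
}
```

With 3 downsampling layers, a 40 pixel object spans 5 detector steps while a 20 pixel object spans only 2.5. Dropping to 2 downsampling layers, or doubling the image size, brings a 20 pixel object back to 5 steps.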
I think I have to change the network, because if I upsample the images, the network will be trained on the bigger objects and will not detect the smaller ones in the test images, if I am right.
Where should I look to learn how to change the network? Is there something in the examples?
You could also upsample the images when testing. Then it would be the same.
It is the same concept, since we still make the objects bigger, something like focusing. Is there no way to change the model?
No, it isn't. If you upsampled all images, training and testing, by 2x
then you could use the example as is, since your objects would all be 40x40
instead of 20x20. That would work fine. The only issue would then be the
additional computation involved.
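As a rough sketch of that trade-off, the arithmetic below shows what a 2x upsample does to the object size and to the number of pixels the network has to process. The type and function names are hypothetical, and the cost estimate only counts pixels. It is not a measurement of dlib's actual runtime.

```cpp
#include <cassert>

// Hypothetical helper type: width and height in pixels.
struct Size
{
    long width;
    long height;
};

// Size of an object (or a whole image) after upsampling by an integer factor.
inline Size upsampled_size(Size s, long factor)
{
    assert(factor >= 1);
    return Size{s.width * factor, s.height * factor};
}

// The pixel count grows with the square of the upsampling factor,
// which is a first-order proxy for the extra computation.
inline long pixel_count_ratio(long factor)
{
    assert(factor >= 1);
    return factor * factor;
}
```

For a factor of 2, a 20x20 object becomes 40x40, which matches the example's default minimum size, and the image has 4 times as many pixels.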
Yes, you are right for images, but let me explain the problem by an example.
Assume that we have a fixed camera installed to track a tennis ball or a ping pong ball on a field. If we "zoom in" to make the objects bigger, then we lose or limit the view of the field.
Do you have a suggestion?
I'm not talking about zooming in. Just upsample the image. Call
dlib::pyramid_up() for instance.
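The difference between upsampling and zooming can be seen in a minimal sketch: every source pixel is kept and simply enlarged, so the field of view is unchanged. This is a plain nearest-neighbour stand-in written with the standard library only. It is not how dlib::pyramid_up() is implemented, and the Image alias and function name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical grayscale image type: rows of 8-bit pixels.
using Image = std::vector<std::vector<unsigned char>>;

// Nearest-neighbour upsampling by an integer factor. Every source pixel
// appears in the output as a factor x factor block, so nothing is cropped.
Image upsample_nearest(const Image& src, std::size_t factor)
{
    assert(factor >= 1);
    Image dst;
    dst.reserve(src.size() * factor);
    for (const auto& row : src)
    {
        // Repeat each pixel horizontally.
        std::vector<unsigned char> wide;
        wide.reserve(row.size() * factor);
        for (unsigned char px : row)
            wide.insert(wide.end(), factor, px);
        // Repeat the widened row vertically.
        for (std::size_t i = 0; i < factor; ++i)
            dst.push_back(wide);
    }
    return dst;
}
```

The output is twice as wide and twice as tall for a factor of 2, and a 20x20 object in the source occupies 40x40 pixels in the result, while the rest of the scene is still present.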
Okay, thank you very much. I got the point. That's a good trick.
I was thinking of writing external code for this, but since dlib provides functions such as dlib::pyramid_up(), we can use that.
You have also used it here:
    // Now lets run the detector on the testing images and look at the outputs.
    image_window win;
    for (auto&& img : images_test)
    {
        pyramid_up(img);
        auto dets = net(img);
        win.clear_overlay();
        win.set_image(img);
        for (auto&& d : dets)
            win.add_overlay(d);
        cin.get();
    }
    return 0;
Warning: this issue has been inactive for 216 days and will be automatically closed on 2018-09-07 if there is no further activity.
Notice: this issue has been closed because it has been inactive for 220 days. You may reopen this issue if it has been closed in error.