Hello Alexey, Mikey, everyone,
I've got some more questions.
If I'm asking things that are already known or documented, feel free to just point me to the reference.
Question 1:
[EDIT: I know we can resume training from a specific iteration, using a saved .weights file in the backup folder, without starting from 0, but this question is about adding objects]
How stackable/modular is the training? Say we train a model to recognize 5 objects and all goes well, and later we want to add 3 more objects. Do we then have to train a new model on all 8 objects, or is there a way to train only the 3 new objects and insert them into the 5-object model?
Can the same be done for YOLO9000 with the softmax tree? Insertion on both levels (super & sub)?
Question 2:
Can we tweak the framework enough to do the following? Let's say that at runtime we want to put groups of items back together, and our catalogue has 10,000 classes where the differences are sometimes tiny (#377). This is going to get annoying and prone to errors at runtime detection.
If the following is possible, can I get some pointers to the code so I know which classes are the crucial ones:
Can we:
-Tell the framework: look, I'm going to reassemble a group of items and I'm expecting items f, h, m, o, p, v, x, z, and only those. Please look for these items, and if you find an item which you think is item "a" (not in the selected group) but you also have a rate of 40% or more for one of the items in the group, please suggest the group item.
Or:
-Can we go further and limit the knowledge, telling the framework to really use only a part of its trained model? Only the selected group of items.
Can we do all this without relaunching the app? It's crucial that at runtime I can process multiple groups of items: finish one group, then move on to the next.
I'm drowning in C++, but if this is possible or already provided in our framework, I'd like to do it.
Question 3
Can we fetch the detection video stream (the one with the bounding boxes, so NOT THE SOURCE video from the IP cam) that is used in the darknet_net_cam_coco example and show it live in another app? More specifically, can we send this video to a Java webapp? The frontend framework should not matter too much, probably Vaadin, JSF, or Angular 4. The concern is whether there is a way to capture the detection video in real time and pass it along while YOLO is running.
Question 4
Alexey, for the problem described in #377 I'm thinking: a white or black background with 3 dots on it. 2 dots in the upper corners, one left, one right (with a bit of padding from the edges, of course), and 1 dot in the lower middle.
That, or 4 dots (one in each corner of the background).
Does that sound solid/correct? This way I won't pollute my object bounding boxes with pieces of lines, grids, or circles, but I will still have a means of measurement in the overall view/full image for optimizing proportion differentiation during training and detection.
Thank you very much for being there.
Question 1: It is best to train from the beginning when adding new classes. Check this for more info: https://github.com/AlexeyAB/darknet/issues/20#issuecomment-277525791
Question 2: Not currently possible. It would need to be added as a new feature.
Question 3: You need to customise the demo code, e.g. make your own stream function so the stream with detections is sent to a webapp. I've done this in a hacky way where I use the detections through the Python bindings and have everything else processed with OpenCV. I think you can create the function in demo.c, but AlexeyAB will probably be able to support you better (I'm a C++ scrub).
Thanks
I've read that post where you did it in Python, I think :) Sounds nice. Or maybe that's something different. This will be a nice bit of work :D
Hi,
/* Groups of class IDs that can be made active at runtime. */
int group1[] = { 0, 5, 7 };
int group2[] = { 14, 6, 8 };
int group3[] = { 1, 4, 9 };
int *current_group;
int current_group_len;

void switch_group() {
    int key = cvWaitKey(3);  /* non-blocking; returns -1 if no key pressed */
    if (key >= '0' && key <= '9') {
        switch (key) {
        case '1':
            current_group = group1;
            current_group_len = sizeof(group1) / sizeof(group1[0]);
            printf("group-1\n");
            break;
        case '2':
            current_group = group2;
            current_group_len = sizeof(group2) / sizeof(group2[0]);
            printf("group-2\n");
            break;
        case '3':
            current_group = group3;
            current_group_len = sizeof(group3) / sizeof(group3[0]);
            printf("group-3\n");
            break;
        }
    }
}
/* Return the best class from the current group if its probability exceeds
   0.4; otherwise fall back to the global argmax over all classes. */
int max_group_index(float *probs, int classes)
{
    int class = -1;
    float max_prob = 0;
    int k;
    for (k = 0; k < current_group_len; ++k) {
        float cur_prob = probs[current_group[k]];
        if (cur_prob > 0.4 && cur_prob > max_prob) {
            max_prob = cur_prob;
            class = current_group[k];
        }
    }
    if (class < 0) class = max_index(probs, classes);
    return class;
}
And change the line int class = max_index(probs[i], classes); to int class = max_group_index(probs[i], classes); in both functions:
- draw_detections() for images: https://github.com/AlexeyAB/darknet/blob/a1af57d8d60b50e8188f36b7f74752c8cc124177/src/image.c#L190
- draw_detections_cv() for video: https://github.com/AlexeyAB/darknet/blob/a1af57d8d60b50e8188f36b7f74752c8cc124177/src/image.c#L248
And add a call to switch_group(); in the same two places.
So you can press and hold 1, 2, or 3 on the keyboard while the window is active, and the group will change.
In the groupX[] = { ... } arrays you can write the required class IDs of the objects for each group.
Good Afternoon Alexey,
Thank you so very much for providing an answer. Soon I have to start programming (not much time left), so when I have results or further elaborations on this I will post back.
2.
Sweet. The code you put down seems pretty solid and flexible; if I'm not mistaken, it lets me work inside one model, only switching between groups of class IDs within it. This is perfect. Does this influence GPU/CPU usage? This should keep performance nice and clean, because YOLO will ignore a lot of classes.
--> Very nice. I will try this hardcoded version, and then in a second phase I will have the groups parametrised through an ini-file or another type of file, so YOLO can read them at launch time. In a third phase I will inject the groups at runtime through HTTP requests or some other way.
You can stream the detection output over HTTP with the -http_port flag:

darknet.exe detector demo data/voc.data yolo-voc.cfg yolo-voc.weights test.mp4 -i 0 -http_port 8090

Examples:
- localhost:8090 from the same computer
- <ip-address>:8090 from another computer

You can also read the resulting MJPEG stream from OpenCV:

cv::Mat frame;
cv::VideoCapture cap(filename);
while (cap.isOpened() && cap.read(frame)) {
    cv::imshow("window name", frame);
    cv::waitKey(5);
}
So now you can get an HTTP-MJPEG or RTSP video stream from the camera, detect objects, and send the resulting HTTP-MJPEG video stream to a web browser or another application, using a command like this: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights http://192.168.0.80:8080/video?dummy=param.mjpg -i 0 -http_port 8090