Hello Alexey, Mikey, everyone,
I've got some more questions.
If I'm asking things that are already known or documented, feel free to just point me to the reference.
Question 1:
[EDIT: I know we can resume training from a specific iteration, using a saved .weights file in the backup folder, without starting from 0, but this question is about adding objects]
How stackable/modular is the training? Say we train a model to recognize 5 objects and all goes well, and later we want to add 3 more objects. Do we then have to train a new model on all 8 objects, or is there a way to train only the 3 new objects and insert them into the 5-object model?
Can the same be done for YOLO9000 with the softmax tree? Insertion on both levels (super & sub)?
Question 2:
Can we tweak the framework enough to do the following? Let's say that at runtime we want to put groups of items back together, and our catalogue has 10,000 classes where the differences are sometimes tiny (#377). This is going to get annoying and prone to errors at runtime detection.
If the following is possible, can I get some pointers to the code so I know which classes are the crucial ones:
Can we:
-Tell the framework: look, I'm going to reassemble a group of items and I'm expecting items f, h, m, o, p, v, x, z, and only those. Please look for these items, and if you find an item which you think is item "a" (not in the selected group) but you also have a rate of 40% or more for one of the items in the group, please suggest the group item.
Or:
-Can we go further and limit the knowledge, telling the framework to really use only a part of its trained model? Only the selected group of items.
Can we do all this without relaunching the app? It's crucial that at runtime I can process multiple groups of items: finish one group, then move on to the next.
I'm drowning in C++, but if this is possible or already provided in our framework, I'd like to do it.
Question 3
Can we fetch the detection video stream (the one with the bounding boxes, so NOT THE SOURCE video from the IP cam) that is used in the darknet_net_cam_coco example and show it live in another app? More specifically, can we send this video to a Java webapp? The frontend framework should not matter too much, probably Vaadin, JSF, or Angular 4. The concern is whether there is a way to capture the detection video in real time and pass it along while YOLO is running.
Question 4
Alexey, for the problem described in #377 I'm thinking: a white or black background with 3 dots on it. 2 dots in the upper corners, one left, one right (with a bit of padding from the edges, of course), and 1 dot in the lower middle.
That, or 4 dots (one in each corner of the background).
Does that sound solid/correct? This way I won't pollute my object bounding boxes with pieces of lines, grids, or circles, but I will still have a means of measurement in the overall view/full image for optimizing proportion differentiation during training and detection.
Thank you very much for being there.
Question 1: It is best to train from the beginning when adding new classes. Check this for more info: https://github.com/AlexeyAB/darknet/issues/20#issuecomment-277525791
Question 2: Not currently possible. It would need to be added as a new feature.
Question 3: You need to customise the demo code, e.g. make your own stream function so the stream with detections is sent to a webapp. I've done this in a hacky way where I use the detections through the Python bindings and have everything else processed with OpenCV. I think you can create the function in demo.c, but AlexeyAB will probably be able to support you better (I'm a C++ scrub).
Thanks
I've read that post where you did it in Python, I think :) Sounds nice. Or maybe that's something different. This will be a nice bit of work :D
Hi,
/* Groups of class IDs that can be made active at runtime. */
int group1[] = { 0, 5, 7 };
int group2[] = { 14, 6, 8 };
int group3[] = { 1, 4, 9 };
int *current_group;
int current_group_len;

void switch_group() {
    int key = cvWaitKey(3);  /* non-blocking; returns -1 if no key pressed */
    if (key >= '0' && key <= '9') {
        switch (key) {
        case '1':
            current_group = group1;
            current_group_len = sizeof(group1) / sizeof(group1[0]);
            printf("group-1\n");
            break;
        case '2':
            current_group = group2;
            current_group_len = sizeof(group2) / sizeof(group2[0]);
            printf("group-2\n");
            break;
        case '3':
            current_group = group3;
            current_group_len = sizeof(group3) / sizeof(group3[0]);
            printf("group-3\n");
            break;
        }
    }
}
/* Return the best class from the current group if its probability exceeds
   0.4; otherwise fall back to the global argmax over all classes. */
int max_group_index(float *probs, int classes)
{
    int class = -1;
    float max_prob = 0;
    int k;
    for (k = 0; k < current_group_len; ++k) {
        float cur_prob = probs[current_group[k]];
        if (cur_prob > 0.4 && cur_prob > max_prob) {
            max_prob = cur_prob;
            class = current_group[k];
        }
    }
    if (class < 0) class = max_index(probs, classes);
    return class;
}
And change the line int class = max_index(probs[i], classes); to int class = max_group_index(probs[i], classes); in both functions:
- draw_detections() for images: https://github.com/AlexeyAB/darknet/blob/a1af57d8d60b50e8188f36b7f74752c8cc124177/src/image.c#L190
- draw_detections_cv() for video: https://github.com/AlexeyAB/darknet/blob/a1af57d8d60b50e8188f36b7f74752c8cc124177/src/image.c#L248
And add a call to switch_group(); in the same two places.
So you can press and hold 1, 2, or 3 on the keyboard while the window is active, and the group will change.
In the groupX[] = { ... } arrays you can write the required class IDs of the objects for each group.
Good Afternoon Alexey,
Thank you so very much for providing an answer. Soon I have to start programming (not much time left), so when I have results or further elaborations on this I will post back.
2.
Sweet. The code you put down seems pretty solid and flexible; if I'm not mistaken, it lets me work inside one model, only switching between groups of class IDs within it. This is perfect. Does this influence GPU/CPU usage? This should keep performance nice and clean, because YOLO will ignore a lot of classes.
--> Very nice. I will try this hardcoded version, and then in a second phase I will have the groups parametrised through an ini-file or another type of file, so YOLO can read them at launch time. In a third phase I will inject the groups at runtime through HTTP requests or some other way.
You can stream the detection output over HTTP with the -http_port flag:

darknet.exe detector demo data/voc.data yolo-voc.cfg yolo-voc.weights test.mp4 -i 0 -http_port 8090

Examples:
- localhost:8090 from the same computer
- <ip-address>:8090 from another computer

You can also read the resulting MJPEG stream from OpenCV:

cv::Mat frame;
cv::VideoCapture cap(filename);
while (cap.isOpened() && cap.read(frame)) {
    cv::imshow("window name", frame);
    cv::waitKey(5);
}
So now you can get an HTTP-MJPEG or RTSP video stream from the camera, detect objects, and send the resulting HTTP-MJPEG video stream to a web browser or another application, using a command like this: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights http://192.168.0.80:8080/video?dummy=param.mjpg -i 0 -http_port 8090