Darknet: Some problems with tracking

Created on 2 Apr 2019 · 11 Comments · Source: AlexeyAB/darknet

I am trying to track vehicles with a recent version of yolo_console_dll, but I am facing some problems:

  • The detection results from darknet.exe detector demo are much better than the results in tracking. I realize that in tracking, detection is not performed on every frame. Is it possible to make it run detection on more frames?

  • Sometimes the tracking box stays even after the object has left the scene.

These are the detection results:
ezgif com-video-to-gif(1)

But the tracking looks like this:
ezgif com-video-to-gif

  • How to reduce the size of colored boxes containing labels of the detected objects? Is it possible to make them transparent?

  • The fps of the resulting video is the same as the input video, but the resulting video is a bit slower.

Thanks

All 11 comments

In my case it was really fast: the tracking video was faster than the normal input video. What frame story and max distance are you using?

How long is your video?

@buzdarbalooch I changed std::max(35, video_fps) to std::max(1, video_fps) here:
https://github.com/AlexeyAB/darknet/blob/0543278a5bd7064fae6538afd1761b06b10f73ee/src/yolo_console_dll.cpp#L296

The resulting video that you see while running the program is different than the one that is saved as result.avi. The one you see is faster.
My fps is 15 and the video is about 15 minutes long.

@hadi-ghnd

  • The optical-flow tracker should be used for a video stream from a camera rather than a video file, if your GPU can't process each frame from the camera. Optical flow allows you to achieve higher FPS than the neural network alone, because it tracks objects between detections.





@AlexeyAB thank you for your helpful response.
I tried Kalman filter as you suggested and the results got better:
ezgif com-video-to-gif(2)

But I still have some questions:

  • As you can see there are double detections for some objects. Can this be solved by tracking or increasing nms?

  • Sometimes the tracking box stays behind the object. I think this may be because of the Kalman filter. Is there an option to fix this like the one you mentioned for TRACK_OPTFLOW?

  • My last question: Is there an option to count the objects of each class or should I add it myself?

As you can see there are double detections for some objects. Can this be solved by tracking or increasing nms?

Sometimes the tracking box stays behind the object. I think this may be because of the Kalman filter. Is there an option to fix this like the one you mentioned for TRACK_OPTFLOW?

No. You should train your model better. Collect many more images from videos like this one, with the same point of view and the same relative sizes of objects.
And read: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
Or you should use a better model, for example yolov3-spp.cfg / weights


You should implement it yourself. Some hints:

                    track_id_state_id_time[i].track_id = ++track_id_vec[result_vec_pred[i].obj_id];
                    result_vec_pred[i].track_id = track_id_vec[result_vec_pred[i].obj_id];

..... etc
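Building on the hint above, per-class counting could be sketched roughly as follows. This is only an illustration, not code from yolo_console_dll.cpp: TrackedBox and ClassCounter are hypothetical names, TrackedBox stands in for the obj_id and track_id fields of the detection results, and it assumes that track_id is assigned per class (as in the hint) and that a track_id of 0 means the box has no track yet.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for the two detection fields used for counting.
struct TrackedBox {
    unsigned int obj_id;    // class index
    unsigned int track_id;  // id assigned by the tracker; assumed 0 when untracked
};

// Remembers every distinct track id seen for each class across all frames.
class ClassCounter {
public:
    // Call once per frame with the tracked detections of that frame.
    void update(const std::vector<TrackedBox>& boxes) {
        for (const TrackedBox& b : boxes) {
            if (b.track_id == 0) continue;  // assumption: 0 means "no track assigned"
            seen_[b.obj_id].insert(b.track_id);
        }
    }

    // Number of distinct tracked objects seen so far for the given class.
    std::size_t count(unsigned int obj_id) const {
        const auto it = seen_.find(obj_id);
        return it == seen_.end() ? 0 : it->second.size();
    }

private:
    std::map<unsigned int, std::set<unsigned int>> seen_;
};
```

Calling update() on each frame's results and then count(obj_id) gives the number of distinct tracks per class. The count is only as reliable as the track ids themselves, so an object that loses its track and gets a new id would be counted twice.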

Dear @AlexeyAB and @hadi-ghnd, I would be grateful if you could help me debug the tracking issues I am facing.

I have experimented on two football (soccer) datasets; the AP of both datasets is shown below. The first dataset has four classes, the second has seven.
debug1
debug2

Below are the results when I try to run tracking:
LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH ./uselib data/obj2.names cfg/yolov2.cfg Backup1/yolov2_3700.weights cornerkick1.mov

frame_id = 184
track_id = 441, obj_id = 3, x = 57, y = 235, w = 6, h = 143, prob = 0.316

track_id = 446, obj_id = 3, x = 244, y = 103, w = 3, h = 268, prob = 0.273

track_id = 276, obj_id = 3, x = 51, y = 116, w = 0, h = 156, prob = 0.254

track_id = 297, obj_id = 3, x = 55, y = 333, w = 0, h = 167, prob = 0.254

track_id = 443, obj_id = 3, x = 251, y = 174, w = 67, h = 149, prob = 0.253

track_id = 540, obj_id = 3, x = 244, y = 9, w = 0, h = 245, prob = 0.252

track_id = 263, obj_id = 1, x = 251, y = 301, w = 0, h = 120, prob = 0.262

track_id = 409, obj_id = 3, x = 154, y = 356, w = 0, h = 119, prob = 0.247

track_id = 270, obj_id = 1, x = 152, y = 172, w = 0, h = 250, prob = 0.285

track_id = 277, obj_id = 1, x = 156, y = 272, w = 0, h = 178, prob = 0.292

track_id = 535, obj_id = 3, x = 152, y = 69, w = 0, h = 250, prob = 0.227

track_id = 1503, obj_id = 0, x = 1168, y = 77, w = 167, h = 0, prob = 0.271

track_id = 542, obj_id = 3, x = 933, y = 32, w = 0, h = 208, prob = 0.221

track_id = 572, obj_id = 3, x = 464, y = 186, w = 175, h = 0, prob = 0.221

track_id = 95, obj_id = 3, x = 251, y = 57, w = 0, h = 282, prob = 0.217

track_id = 1516, obj_id = 0, x = 867, y = 131, w = 168, h = 0, prob = 0.314

track_id = 1509, obj_id = 0, x = 767, y = 131, w = 164, h = 0, prob = 0.295

track_id = 444, obj_id = 3, x = 48, y = 81, w = 0, h = 118, prob = 0.212

track_id = 1671, obj_id = 0, x = 760, y = 408, w = 73, h = 79, prob = 0.23

track_id = 543, obj_id = 3, x = 146, y = 38, w = 0, h = 212, prob = 0.211

track_id = 573, obj_id = 3, x = 245, y = 429, w = 0, h = 83, prob = 0.208

track_id = 1795, obj_id = 0, x = 48, y = 188, w = 224, h = 0, prob = 0.214

track_id = 1052, obj_id = 0, x = 1077, y = 635, w = 130, h = 35, prob = 0.415

track_id = 1733, obj_id = 0, x = 1042, y = 186, w = 187, h = 0, prob = 0.218

track_id = 266, obj_id = 2, x = 1039, y = 11, w = 0, h = 145, prob = 0.201

track_id = 278, obj_id = 1, x = 648, y = 233, w = 0, h = 135, prob = 0.278

As you can see, the system is producing wrong bounding boxes for objects, even at very low probability.
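Many of the boxes in the log above have w = 0 or h = 0 and a probability below 0.3. A post-filter along the following lines would hide them from the output. This is only a sketch: Box and filter_boxes are hypothetical names, the thresholds are arbitrary, and filtering treats the symptom rather than whatever causes the model to emit such boxes.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in mirroring the fields printed in the log above.
struct Box {
    unsigned int x, y, w, h;
    float prob;
    unsigned int obj_id;
};

// Returns only the boxes with prob >= min_prob and both sides >= min_side pixels.
std::vector<Box> filter_boxes(std::vector<Box> boxes, float min_prob, unsigned int min_side) {
    assert(min_prob >= 0.0f && min_prob <= 1.0f);
    boxes.erase(std::remove_if(boxes.begin(), boxes.end(),
                               [=](const Box& b) {
                                   return b.prob < min_prob || b.w < min_side || b.h < min_side;
                               }),
                boxes.end());
    return boxes;
}
```

For example, filter_boxes(result_vec, 0.3f, 5) would drop every box that is narrower or shorter than 5 pixels or has a probability below 0.3, where result_vec is assumed to hold the detections of one frame.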

Another error I caught is shown below, when I try to run tracking on an image. Note that the weights file in the command I execute and the weights that are actually loaded are different: the weights are loaded from the previous model I trained. See the line Loading weights from backup/yolo_10100.weights...

akhan@tensorflow-System-Product-Name:~/darknet-master$ LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH ./uselib data/obj2.names cfg/yolov2.cfg Backup1/yolov2_3700.weights experi.jpg
Used GPU 0
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BF
1 max 2 x 2 / 2 416 x 416 x 32 -> 208 x 208 x 32 0.006 BF
2 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BF
3 max 2 x 2 / 2 208 x 208 x 64 -> 104 x 104 x 64 0.003 BF
4 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
5 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
6 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
7 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128 0.001 BF
8 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
9 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
10 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
11 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256 0.001 BF
12 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
13 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
14 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
15 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
16 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
17 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512 0.000 BF
18 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
19 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
20 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
21 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
22 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
23 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BF
24 conv 1024 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x1024 3.190 BF
25 route 16
26 conv 64 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 64 0.044 BF
27 reorg / 2 26 x 26 x 64 -> 13 x 13 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 13 x 13 x1280 -> 13 x 13 x1024 3.987 BF
30 conv 45 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 45 0.016 BF
31 detection
mask_scale: Using default '1.000000'
Total BFLOPS 29.343
Loading weights from backup/yolo_10100.weights...
seen 64
Done!
input image or video filename: Time: 0.200338 sec
: cannot connect to X server

@AlexeyAB Thank you for your suggestions. The modifications you mentioned were enough to count instances of different classes.

@AlexeyAB I have a dataset with a bunch of videos with linear movements (the camera is hovering above and facing the ground, moving in a linear pattern), so I gave the Kalman Filter a try and it seems to work pretty well.

False Positives are greatly reduced and box sizes are more consistent and stable. Currently I'm using your C++ library through yolo_console_dll.cpp and I'm wondering if Kalman Filtering is available through the Python API or if I must implement it myself using OpenCV?

Last question: are the changes I made to yolo_console_dll.cpp to enable Kalman Filtering passed on to yolo_cpp_dll.dll (through this API: yolo_v2_class.hpp), i.e. when using the Detector class, are track_ids tracked by the Kalman Filter?

@laclouis5

False Positives are greatly reduced and box sizes are more consistent and stable. Currently I'm using your C++ library through yolo_console_dll.cpp and I'm wondering if Kalman Filtering is available through the Python API or if I must implement it myself using OpenCV?

You should implement it yourself using OpenCV.
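To give an idea of what such an implementation involves, here is a dependency-free sketch of a constant-velocity Kalman filter for a single coordinate (for example the x centre of one tracked box). It is not the filter from yolo_console_dll.cpp and it does not use OpenCV; with OpenCV one would configure cv::KalmanFilter with equivalent matrices instead. The class name Kalman1D, the noise values and the initial covariance are assumptions chosen for illustration.

```cpp
#include <cassert>

// Minimal 1-D constant-velocity Kalman filter: state = (position, velocity).
// Illustrative only; names and default values are assumptions.
class Kalman1D {
public:
    Kalman1D(double process_noise, double measurement_noise)
        : q_(process_noise), r_(measurement_noise) {
        assert(measurement_noise > 0.0);
    }

    // Time update for one frame (dt = 1): x <- F x, P <- F P F^T + Q,
    // with F = [[1, 1], [0, 1]] and Q = q * I.
    void predict() {
        pos_ += vel_;
        const double p00 = p00_ + p01_ + p10_ + p11_ + q_;
        const double p01 = p01_ + p11_;
        const double p10 = p10_ + p11_;
        const double p11 = p11_ + q_;
        p00_ = p00; p01_ = p01; p10_ = p10; p11_ = p11;
    }

    // Measurement update with an observed position z (H = [1, 0]).
    void correct(double z) {
        const double s = p00_ + r_;    // innovation covariance
        const double k0 = p00_ / s;    // Kalman gain for position
        const double k1 = p10_ / s;    // Kalman gain for velocity
        const double y = z - pos_;     // innovation
        pos_ += k0 * y;
        vel_ += k1 * y;
        // P <- (I - K H) P
        const double p00 = (1.0 - k0) * p00_;
        const double p01 = (1.0 - k0) * p01_;
        const double p10 = p10_ - k1 * p00_;
        const double p11 = p11_ - k1 * p01_;
        p00_ = p00; p01_ = p01; p10_ = p10; p11_ = p11;
    }

    double position() const { return pos_; }
    double velocity() const { return vel_; }

private:
    double q_, r_;
    double pos_ = 0.0, vel_ = 0.0;
    // Large initial uncertainty so the first measurements dominate the estimate.
    double p00_ = 100.0, p01_ = 0.0, p10_ = 0.0, p11_ = 100.0;
};
```

One filter per coordinate (x, y, w, h) and per track would be needed; on each frame, call predict() and then correct() with the detected value whenever the detector found the object.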

Last question: are the changes I made to yolo_console_dll.cpp to enable Kalman Filtering passed on to yolo_cpp_dll.dll (through this API: yolo_v2_class.hpp), i.e. when using the Detector class, are track_ids tracked by the Kalman Filter?

No. File yolo_console_dll.cpp is used only for ./uselib. It isn't used in yolo_cpp_dll.dll
