Darkflow: Increase the framerate for real time detection.

Created on 23 Mar 2017 · 20 Comments · Source: thtrieu/darkflow

Hi, first of all, thanks for the wonderful work!

I want to use it for real-time webcam detection. The framerate is currently around 4 fps. Is there any way to increase it? I noticed that in the original paper it can reach 60 fps or even higher. Thank you so much.

help wanted

lliu25

All 20 comments

@lliu25, can you share your computer hardware details?
For higher fps you need to run on the GPU; pass the argument --gpu 1.0. If you are using an Nvidia video card, its compute capability should be 3.0 or higher (also make sure you install all prerequisites like CUDA, cuDNN, etc.).
If you want to get 60 fps, you'll need an Nvidia GTX 1080 or similar.

Prakash19921206 on 24 Mar 2017

The speed can be much, much better than it is now if I or someone else writes the post-processing in Cython instead of the current numpy.

thtrieu on 24 Mar 2017

@Prakash19921206 I'm running it on an Alienware laptop with a GTX 1060, 6 GB GPU RAM. I'm using CUDA 8.0 and cuDNN 5.1. I set --gpu 0.9. I feel like the fps should be much higher.

lliu25 on 24 Mar 2017

@thtrieu so is it normal to have such a low fps, at least for now? Is numpy the bottleneck?

lliu25 on 24 Mar 2017

Yes. In detail:

There are two stages to go from image to bounding boxes:

The first one is the deep net forwarding, which is done by tensorflow, so there is no way for us to optimize this part.
The second one is post-processing, which is done with numpy operations, Python for loops, etc. It can be sped up significantly (to approximately the speed of darknet) using Cython.

By forwarding the images to output files, you can actually see how much time it took for each stage. The second stage is usually very expensive.

thtrieu on 24 Mar 2017
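
The two-stage breakdown described above (network forwarding, then post-processing) can be timed separately to see where the time goes. Below is a minimal sketch under stated assumptions: `fake_forward` and `fake_postprocess` are hypothetical stand-ins, not darkflow functions, and `time_stages` and `report` are illustrative helpers that only mimic the style of the log lines quoted in this thread.

```python
import time


def time_stages(inputs, forward, postprocess):
    """Run both stages over all inputs and time each stage separately."""
    t0 = time.perf_counter()
    raw_outputs = [forward(x) for x in inputs]            # stage 1: network forwarding
    t1 = time.perf_counter()
    results = [postprocess(out) for out in raw_outputs]   # stage 2: post-processing
    t2 = time.perf_counter()
    return results, {"forward_s": t1 - t0, "postprocess_s": t2 - t1}


def report(name, seconds, n):
    """Format a timing line in the 'inputs per second' style used in the thread."""
    ips = n / seconds if seconds > 0 else float("inf")
    return f"{name}: {seconds:.6f}s / {n} inps = {ips:.2f} ips"


# Hypothetical stand-ins for the real stages (assumptions for illustration only).
def fake_forward(x):
    return [x * 0.5 for _ in range(1000)]


def fake_postprocess(out):
    # A pure-Python filtering loop, the kind of per-element work that is slow in Python.
    return [v for v in out if v > 0.3]


inputs = list(range(11))
results, timings = time_stages(inputs, fake_forward, fake_postprocess)
print(report("Forwarding", timings["forward_s"], len(inputs)))
print(report("Post processing", timings["postprocess_s"], len(inputs)))
```

Swapping the stand-ins for a real forward pass and a real box-decoding function would show which stage dominates on a given machine.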

@thtrieu I see. I did try a Windows version, which is C-based, and it reaches 30 fps under the same settings. Conversion to Cython seems kind of tricky. May I kindly ask if there are any near-term plans to use Cython?

lliu25 on 24 Mar 2017

I do, but I'd need to take one or two days off completely to finish that (which I cannot do now). The main plan is simply to reuse code from darknet.

I see the following repo has a Cython implementation of the post-processing: https://github.com/longcw/yolo2-pytorch. One can surely reuse theirs and give appropriate credit.

thtrieu on 24 Mar 2017

👍1

@thtrieu sounds good. C is so much faster than Python :) Again, thank you for your wonderful work!

lliu25 on 24 Mar 2017

@lliu25 are you running on Linux or Windows? What version of tensorflow are you using? I am having trouble getting the webcam demo working on my Windows 10 desktop setup with a GTX 1060. I am using tensorflow 1.0.1. It works with the CPU version but not the GPU version.

strickon on 24 Mar 2017

@strickon I'm running tensorflow on Linux. It is the latest version. It works with cuda 8.0 and cudnn 5.1. Did you make sure your gpu is set up correctly? When you run nvidia-smi you should see the info of your gpu.

lliu25 on 24 Mar 2017

@lliu25 There appears to be some sort of issue on windows with tensorflow gpu. I may have to install linux. There isn't a problem with my gpu setup. Some gpu functions work. There is something with this particular cnn that is causing issues.

E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windowstensorflow\stream_executor\cuda\cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windowstensorflow\stream_executor\cuda\cuda_dnn.cc:404] error retrieving driver version: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windowstensorflow\stream_executor\cuda\cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windowstensorflow\core\kernels\conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

I even tried running without cudnn but no luck since conv2d requires it with the gpu version of tensorflow.

I have also tried this yolo implementation and it works on my GPU, but their setup is for yolo v1.
https://github.com/gliese581gg/YOLO_tensorflow

So it could still be a bug with darkflow.

I believe it works on the CPU either because the CPU path is more tolerant of whatever error is happening, or because the conv2d GPU implementation is bad on Windows.

strickon on 25 Mar 2017

I've started work on this. Here are the preliminary results after cythonizing the findboxes function in yolo2->test

These results are for tiny-yolo-voc, on a Titan Xp. Once I've debugged the code I'll put in a pull request.

Before cythonizing

using tfnet.predict()

Forwarding 11 inputs ...
Total time = 0.478069067001s / 11 inps = 23.0092276603 ips
Post processing 11 inputs ...
Total time = 0.64615893364s / 11 inps = 17.0236754881 ips

using tfnet.return_predict(dog_im)

[Time to output return predict]    0.0479979515076 s

After cythonizing

using tfnet.predict()

Forwarding 11 inputs ...
Total time = 0.468509197235s / 11 inps = 23.4787279842 ips
Post processing 11 inputs ...
Total time = 0.192795038223s / 11 inps = 57.05541025 ips

using tfnet.return_predict(dog_im)

[Time to output return predict]    0.00974702835083 s

Edit: Submitted the pull request

Dhruv-Mohan on 28 Mar 2017

👍5 🎉2
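
The before/after numbers quoted above can be turned into speedup factors with a few lines of arithmetic. This is a small worked sketch using only the timings from the comment; the combined throughput figure assumes the two stages run one after the other with no other overhead, which is an assumption and not something the logs state.

```python
# Timings copied from the tfnet.predict() and tfnet.return_predict() logs above.
n_inputs = 11
before = {"forward_s": 0.478069067001, "post_s": 0.64615893364,
          "return_predict_s": 0.0479979515076}
after = {"forward_s": 0.468509197235, "post_s": 0.192795038223,
         "return_predict_s": 0.00974702835083}


def combined_ips(t):
    """Inputs per second if forwarding and post-processing run back to back (assumption)."""
    return n_inputs / (t["forward_s"] + t["post_s"])


post_speedup = before["post_s"] / after["post_s"]
return_predict_speedup = before["return_predict_s"] / after["return_predict_s"]

print(f"post-processing speedup: {post_speedup:.2f}x")
print(f"return_predict speedup:  {return_predict_speedup:.2f}x")
print(f"combined throughput:     {combined_ips(before):.2f} -> {combined_ips(after):.2f} ips")
```

On these figures post-processing gets roughly 3.4 times faster and return_predict roughly 4.9 times faster, while the forwarding time is essentially unchanged, which matches the earlier point that only the second stage is being optimized.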

@Dhruv-Mohan this looks very promising. Did you get a chance to test on a webcam? Thank you for the implementation!

lliu25 on 29 Mar 2017

@lliu25 you can pull the new code now for cythonizing v2!

thtrieu on 29 Mar 2017

@lliu25 Sorry, I haven't tested it on a webcam.
Constructing the boxes and doing NMS takes about 400 µs with the Cython update, so I don't think you'll have a problem reaching a higher framerate now.

Dhruv-Mohan on 29 Mar 2017
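
For readers unfamiliar with the NMS (non-maximum suppression) step mentioned above, here is a plain-Python sketch of the generic greedy algorithm. It is not darkflow's code or the Cython version from the pull request; the function names, the (x1, y1, x2, y2) box format, and the 0.5 threshold are assumptions chosen for illustration. The nested per-box comparisons are the kind of loop that tends to be slow in pure Python and benefits from Cython.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring boxes, drop boxes that overlap a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # prints [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed, while the third box does not overlap anything and is kept.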

@Dhruv-Mohan I will try it out asap. Thank you for the great work! @thtrieu thank you for the update as well!

lliu25 on 29 Mar 2017

😄1

@lliu25 what version of Linux are you using? I just installed elementary Loki, which uses Ubuntu 16.04. I installed the CUDA drivers as well as cuDNN. I tried to run the webcam demo and got the exact same crash as on Windows. This is a GTX 1060, so it is very similar to your setup.

strickon on 8 Apr 2017

@strickon I have Ubuntu 14, I believe. The crash might be due to an out-of-memory error? Have you checked the details of the error message?

lliu25 on 8 Apr 2017

First of all, thanks for the splendid work, @thtrieu.
@lliu25 @Dhruv-Mohan @thtrieu What framerate were you guys able to achieve?