Darkflow: Increase the framerate for real time detection.

Created on 23 Mar 2017 · 20 Comments · Source: thtrieu/darkflow

Hi, first of all, thanks for the wonderful work!

I want to use it for real-time webcam detection. The framerate is currently around 4 fps. Is there any way to increase it? I noticed that in the original paper it can reach 60 fps or even higher. Thank you so much.

help wanted

All 20 comments

@lliu25, can you share your computer hardware details?
For higher fps you need to run on the GPU: pass the argument --gpu 1.0. If you are using an Nvidia video card, its compute capability should be greater than 3.0 (also make sure you install all prerequisites such as CUDA, cuDNN, etc.).
If you want to get 60 fps, you'll need an Nvidia GTX 1080 or similar.

The speed could be much, much better if I or someone else rewrote the post-processing in Cython instead of the current numpy.

@Prakash19921206 I'm running it on an Alienware laptop with a GTX 1060, 6 GB GPU RAM. I'm using CUDA 8.0, cuDNN 5.1. I set --gpu 0.9. I feel like the fps should be much higher.

@thtrieu so is it normal to have this low fps at least for now? The bottleneck is numpy?

Yes. In detail:

There are two stages to go from image to bounding boxes:

  • The first is the deep net forwarding, which is done by tensorflow, so there is no way for us to optimize this part.
  • The second is post-processing, which is done with numpy operations, python for loops, etc. This can be significantly sped up (to approximately the speed of darknet) using Cython.

By forwarding the images to output files, you can actually see how much time it took for each stage. The second stage is usually very expensive.
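To see which stage dominates on a given machine, each one can be timed on its own. The following is only a minimal sketch: `forward` and `postprocess` are hypothetical stand-ins written for this example, not darkflow functions, and the work they do is arbitrary.

```python
import time

def forward(batch):
    # Hypothetical stand-in for the network forward pass (TensorFlow in darkflow).
    return [[0.1 * i for i in range(1000)] for _ in batch]

def postprocess(outputs):
    # Hypothetical stand-in for box decoding + NMS, written as plain Python loops.
    return [sum(1 for v in out if v > 50.0) for out in outputs]

def timed(stage, arg):
    """Run stage(arg) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = stage(arg)
    return result, time.perf_counter() - start

batch = list(range(11))  # a small batch of dummy inputs
outputs, t_forward = timed(forward, batch)
boxes, t_post = timed(postprocess, outputs)

for name, t in (("Forwarding", t_forward), ("Post processing", t_post)):
    ips = len(batch) / t if t > 0 else float("inf")
    print(f"{name}: {t:.6f}s / {len(batch)} inps = {ips:.1f} ips")
```

Replacing the two stand-ins with the real forward call and the real post-processing call gives per-stage numbers of the same kind darkflow prints when writing predictions to files.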

@thtrieu I see. I did try a Windows version, which is C based, and it reaches 30 fps under the same settings. Conversion to Cython seems kind of tricky, though. May I kindly ask if there is any near-term plan to use Cython?

I do, but I'd need to take one or two days off completely to finish that (which I cannot do now). The main plan is simply to reuse code from darknet.

I see the following repo has Cython implementation for post-processing: https://github.com/longcw/yolo2-pytorch, one can surely reuse theirs and give appropriate credit.

@thtrieu sounds good. C is so much faster than python:) again thank you for your wonderful work!

@lliu25 are you running on Linux or Windows? What version of tensorflow are you using? I am having trouble getting the webcam demo working on my Windows 10 desktop setup with a GTX 1060. I am using tensorflow 1.0.1. It works with the CPU version but not the GPU version.

@strickon I'm running tensorflow on Linux. It is the latest version. It works with cuda 8.0 and cudnn 5.1. Did you make sure your gpu is set up correctly? When you run nvidia-smi you should see the info of your gpu.
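The nvidia-smi check can also be scripted. This is a small sketch that only assumes `nvidia-smi` may or may not be on the PATH; the helper name `nvidia_smi_report` is made up for this example. Note that a successful run only shows that the driver sees the GPU; it says nothing about whether CUDA or cuDNN are set up correctly.

```python
import shutil
import subprocess

def nvidia_smi_report(timeout=10):
    """Return nvidia-smi's text output, or None if it is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        done = subprocess.run([exe], capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return done.stdout if done.returncode == 0 else None

report = nvidia_smi_report()
print(report if report is not None else "nvidia-smi not found or failed; check the driver install")
```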

@lliu25 There appears to be some sort of issue on windows with tensorflow gpu. I may have to install linux. There isn't a problem with my gpu setup. Some gpu functions work. There is something with this particular cnn that is causing issues.

E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windowstensorflow\stream_executor\cuda\cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windowstensorflow\stream_executor\cuda\cuda_dnn.cc:404] error retrieving driver version: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windowstensorflow\stream_executor\cuda\cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windowstensorflow\core\kernels\conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

I even tried running without cudnn but no luck since conv2d requires it with the gpu version of tensorflow.

I have also tried this YOLO implementation and it works on my GPU, but their setup is for YOLO v1.
https://github.com/gliese581gg/YOLO_tensorflow

So it could still be a bug with darkflow.

I believe it works on the CPU either because the CPU path is more tolerant of whatever error is happening, or because the conv2d GPU implementation is bad on Windows.

I've started work on this. Here are the preliminary results after cythonizing the findboxes function in yolo2->test.

These results are for tiny-yolo-voc, on a Titan Xp. Once I've debugged the code I'll put in a pull request.

Before cythonizing

using tfnet.predict()

Forwarding 11 inputs ...
Total time = 0.478069067001s / 11 inps = 23.0092276603 ips
Post processing 11 inputs ...
Total time = 0.64615893364s / 11 inps = 17.0236754881 ips

using tfnet.return_predict(dog_im)

[Time to output return predict]    0.0479979515076 s

After cythonizing

using tfnet.predict()

Forwarding 11 inputs ...
Total time = 0.468509197235s / 11 inps = 23.4787279842 ips
Post processing 11 inputs ...
Total time = 0.192795038223s / 11 inps = 57.05541025 ips

using tfnet.return_predict(dog_im)

[Time to output return predict]    0.00974702835083 s

Edit: Submitted the pull request
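Reading the numbers above: forward time is essentially unchanged, so the whole gain comes from post-processing. The short calculation below makes that explicit. It is only a sketch, and it assumes the two stages run strictly one after the other with no overlap, which the logs do not state.

```python
# Timings copied from the tiny-yolo-voc logs above (seconds for 11 inputs).
N = 11
before = {"forward": 0.478069067001, "post": 0.64615893364}
after = {"forward": 0.468509197235, "post": 0.192795038223}

def combined_ips(t):
    # Assumes the two stages run back to back with no overlap.
    return N / (t["forward"] + t["post"])

post_speedup = before["post"] / after["post"]
single_speedup = 0.0479979515076 / 0.00974702835083  # return_predict timings

print(f"combined before: {combined_ips(before):.1f} ips")
print(f"combined after:  {combined_ips(after):.1f} ips")
print(f"post-processing speedup: {post_speedup:.2f}x")
print(f"return_predict speedup:  {single_speedup:.2f}x")
```

Under that assumption the combined throughput goes from about 9.8 to about 16.6 inputs per second, with post-processing roughly 3.35x faster and return_predict roughly 4.9x faster.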

@Dhruv-Mohan this looks very promising. Did you get a chance to test on a webcam? Thank you for the implementation!

@lliu25 you can pull the new code now for cythonizing v2!

@lliu25 Sorry, I haven't tested it on a webcam.
Constructing the boxes and doing NMS takes about 400 µs with the Cython update, so I don't think you'll have a problem reaching a higher framerate now.
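For readers unfamiliar with the NMS (non-maximum suppression) step mentioned here, this is a plain-Python sketch of the greedy variant. It is not darkflow's Cython code; the (x1, y1, x2, y2) box format and the 0.5 IoU threshold are assumptions chosen for illustration. The nested loops over boxes are the kind of work that is slow in pure Python and benefits from Cython.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop others that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # -> [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed; the third box does not overlap anything and is kept.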

@Dhruv-Mohan I will try it out asap. Thank you for the great work! @thtrieu thank you for the update as well!

@lliu25 what version of Linux are you using? I just installed elementary Loki, which uses Ubuntu 16.04, and I installed the CUDA drivers as well as cuDNN. I tried to run the webcam demo and got the exact same crash as on Windows. This is a GTX 1060, so it is very similar to your setup.

@strickon I have Ubuntu 14, I believe. Might the crash be due to an out-of-memory error? Have you checked the details of the error message?

First of all, thanks for the splendid work, @thtrieu.
@lliu25 @Dhruv-Mohan @thtrieu What framerate were you guys able to achieve?

Hi, I am using Windows 10 and a GeForce 920M card (compute capability 3.5). I am getting a frame rate of 1.6 fps. Is there any way to take it to 10 fps?
