Inference with the Tensorflow Object Detection API is >3x slower than comparable Tensorflow implementations.
The paper "Speed/accuracy trade-offs for modern convolutional object detectors" states:
"Postprocessing can take up the bulk of the running time for the fastest models at ∼40ms and currently caps our maximum framerate to 25 frames per second."
What justifies such heavy postprocessing? Could you please make it optional or faster?
Benchmark: SSD300 models on an Nvidia GTX 1080, Ubuntu 16.04 (inference speed and COCO test mAP):

| Model | FPS | COCO test mAP |
|---|---|---|
| ssd_mobilenet_v1_coco | 15.55 | 21 |
| ssd_inception_v2_coco | 14.07 | 24 |
| https://github.com/balancap/SSD-Tensorflow (VGG) | 55.55 | 25.1 |
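FPS was measured with a plain timing loop over the exported frozen graph. A minimal sketch of that loop (assuming TF 1.x, a dummy 300x300 input, and the standard tensor names the API exports):

```python
import time
import numpy as np
import tensorflow as tf

# Load the exported frozen inference graph (path is an example).
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

with tf.Session(graph=graph) as sess:
    image_tensor = graph.get_tensor_by_name('image_tensor:0')
    fetches = [graph.get_tensor_by_name(n + ':0') for n in
               ('detection_boxes', 'detection_scores',
                'detection_classes', 'num_detections')]
    dummy = np.zeros((1, 300, 300, 3), dtype=np.uint8)

    # Warm-up run so graph/CUDA initialization is not timed.
    sess.run(fetches, feed_dict={image_tensor: dummy})

    n_runs = 100
    start = time.time()
    for _ in range(n_runs):
        sess.run(fetches, feed_dict={image_tensor: dummy})
    print('FPS: %.2f' % (n_runs / (time.time() - start)))
```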
@jch1 Here's some feedback on the performance of the object_detection API.
Hi @speyside42 --- we have changed the tf.image.non_max_suppression op to be significantly faster since that paper was written. That said, it still runs on the CPU; future work might be to move some of this work to the GPU.
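You can verify the placement yourself with device placement logging; a minimal sketch that runs NMS on random boxes:

```python
import tensorflow as tf

# With log_device_placement=True, TF prints the device each op is
# assigned to; the NMS op (NonMaxSuppressionV2/V3, depending on the
# TF version) shows up on the CPU.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    boxes = tf.random_uniform([100, 4])
    scores = tf.random_uniform([100])
    keep = tf.image.non_max_suppression(boxes, scores, max_output_size=10)
    sess.run(keep)
```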
@jch1, I am also unable to reproduce the numbers listed by the Google team in the model zoo. With the Resnet101 COCO model, I am seeing ~3x slower times on a Titan X with TF 1.4. I've created a Stack Overflow question.
What I experienced with the API is very low GPU usage during detection/inference.
Is there any option to optimize that? Is there a way to switch to a GPU mode?
For example, here is the performance I got detecting an OpenCV webcam stream with SSD MobileNet:
My project repo is: https://github.com/GustavZ/realtime_object_detection
I would appreciate any hints on how to increase performance!
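For reference, the core of my detection loop looks roughly like this (a simplified sketch; the repo has the full version):

```python
import cv2
import numpy as np
import tensorflow as tf

# Load the frozen graph exported by the Object Detection API
# (path is an example).
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

cap = cv2.VideoCapture(0)  # default webcam
with tf.Session(graph=graph) as sess:
    image_tensor = graph.get_tensor_by_name('image_tensor:0')
    fetches = [graph.get_tensor_by_name(n + ':0') for n in
               ('detection_boxes', 'detection_scores',
                'detection_classes', 'num_detections')]
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # The model expects a batched RGB uint8 image; OpenCV gives BGR.
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        boxes, scores, classes, num = sess.run(
            fetches, feed_dict={image_tensor: np.expand_dims(rgb, 0)})
        # ... draw the boxes on `frame` and display it here ...
cap.release()
```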
As GustavZ has found out in the meantime, you can simply raise the score threshold in the non-max-suppression section of the .config file from 1e-8 to something like 0.5 to drastically improve speed without losing much accuracy. Then just export your model as a frozen graph again; you will get the timings they present in the model zoo table (still slower than balancap).
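Concretely, the relevant block in the pipeline .config looks like this (the values below are the stock SSD defaults, with only score_threshold raised):

```
post_processing {
  batch_non_max_suppression {
    score_threshold: 0.5    # default is 1e-8; this drops low-confidence boxes before NMS
    iou_threshold: 0.6
    max_detections_per_class: 100
    max_total_detections: 100
  }
  score_converter: SIGMOID
}
```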
GustavZ has introduced more tricks to speed up inference; I recommend checking out his repo.
I'm disappointed that this is not the default configuration and that issues are mostly ignored even when simple fixes exist.