Inference with the Tensorflow Object Detection API is >3x slower than comparable Tensorflow implementations.
The paper "Speed/accuracy trade-offs for modern convolutional object detectors" states:
"Postprocessing can take up the bulk of the running time for the fastest models at ∼40ms and currently caps our maximum framerate to 25 frames per second."
What justifies such heavy postprocessing? Could you please make it optional or faster?
Benchmark: SSD300 models on an Nvidia GTX 1080, Ubuntu 16.04 (inference speed and COCO test mAP):

| Model | FPS | COCO test mAP |
|---|---|---|
| ssd_mobilenet_v1_coco | 15.55 | 21 |
| ssd_inception_v2_coco | 14.07 | 24 |
| https://github.com/balancap/SSD-Tensorflow (VGG) | 55.55 | 25.1 |
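FPS was measured with a plain timing loop over the exported frozen graph. A minimal sketch of that loop (assuming TF 1.x, a dummy 300x300 input, and the standard tensor names the API exports):

```python
import time
import numpy as np
import tensorflow as tf

# Load the exported frozen inference graph (path is an example).
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

with tf.Session(graph=graph) as sess:
    image_tensor = graph.get_tensor_by_name('image_tensor:0')
    fetches = [graph.get_tensor_by_name(n + ':0') for n in
               ('detection_boxes', 'detection_scores',
                'detection_classes', 'num_detections')]
    dummy = np.zeros((1, 300, 300, 3), dtype=np.uint8)

    # Warm-up run so graph/CUDA initialization is not timed.
    sess.run(fetches, feed_dict={image_tensor: dummy})

    n_runs = 100
    start = time.time()
    for _ in range(n_runs):
        sess.run(fetches, feed_dict={image_tensor: dummy})
    print('FPS: %.2f' % (n_runs / (time.time() - start)))
```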
@jch1 Here's some feedback on the performance of the object_detection API.
Hi @speyside42 --- we have changed the tf.image.non_max_suppression op to be significantly faster since that paper was written. That said, it still runs on the CPU; future work might be to move some of this work to the GPU.
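You can verify the placement yourself with device placement logging; a minimal sketch that runs NMS on random boxes:

```python
import tensorflow as tf

# With log_device_placement=True, TF prints the device each op is
# assigned to; the NMS op (NonMaxSuppressionV2/V3, depending on the
# TF version) shows up on the CPU.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    boxes = tf.random_uniform([100, 4])
    scores = tf.random_uniform([100])
    keep = tf.image.non_max_suppression(boxes, scores, max_output_size=10)
    sess.run(keep)
```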
@jch1, I am also unable to reproduce the numbers listed by the Google team in the model zoo. With the Resnet101 COCO model, I am seeing ~3x slower times on a Titan X with TF 1.4. I've created a Stack Overflow question.
What I experienced with the API is very low GPU usage during detection/inference.
Is there any option to optimize that? Is there a way to switch to a GPU mode?
For example, here is the performance I got detecting an OpenCV webcam stream with SSD MobileNet:
My project repo is: https://github.com/GustavZ/realtime_object_detection
I would appreciate any hints on how to increase performance!
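For reference, the core of my detection loop looks roughly like this (a simplified sketch; the repo has the full version):

```python
import cv2
import numpy as np
import tensorflow as tf

# Load the frozen graph exported by the Object Detection API
# (path is an example).
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

cap = cv2.VideoCapture(0)  # default webcam
with tf.Session(graph=graph) as sess:
    image_tensor = graph.get_tensor_by_name('image_tensor:0')
    fetches = [graph.get_tensor_by_name(n + ':0') for n in
               ('detection_boxes', 'detection_scores',
                'detection_classes', 'num_detections')]
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # The model expects a batched RGB uint8 image; OpenCV gives BGR.
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        boxes, scores, classes, num = sess.run(
            fetches, feed_dict={image_tensor: np.expand_dims(rgb, 0)})
        # ... draw the boxes on `frame` and display it here ...
cap.release()
```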
As GustavZ has found out in the meantime, you can simply raise the score threshold in the non-max-suppression section of the .config file from 1e-8 to something like 0.5 to drastically improve speed without losing much accuracy. Then just export your model as a frozen graph again; you will get the timings they present in the model zoo table (still slower than balancap).
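Concretely, the relevant block in the pipeline .config looks like this (the values below are the stock SSD defaults, with only score_threshold raised):

```
post_processing {
  batch_non_max_suppression {
    score_threshold: 0.5    # default is 1e-8; this drops low-confidence boxes before NMS
    iou_threshold: 0.6
    max_detections_per_class: 100
    max_total_detections: 100
  }
  score_converter: SIGMOID
}
```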
GustavZ has introduced more tricks to speed up inference; I recommend checking out his repo.
I'm disappointed that this is not the default configuration and that issues are mostly ignored even when simple fixes exist.