The main changes that give high accuracy:
We use a calibration set of 1000 images, randomly sampled from our training set.
Only 6 of the 80 MS COCO classes are used, a custom network size of 608x352, and custom anchors.
Only layers supported by TensorRT are used: ReLU and a custom upsample layer, to avoid FP32->INT8->FP32 conversions.
Or just use ReLU instead of Leaky-ReLU
Or use Leaky_ReLU = 2 scale-layers + ReLU + shortcut-layer:
if (x >= 0) out = x*a + x*(1-a) = x
if (x < 0)  out = x*a + x*0 = x*a
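As a sanity check, here is a minimal numpy sketch of that decomposition (the slope a = 0.1 is an assumption matching darknet's default leaky slope): scale the input by a, scale the ReLU output by (1 - a), and add them with a shortcut.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def leaky_relu(x, a=0.1):
    return np.where(x >= 0, x, a * x)

# Leaky_ReLU as 2 scale-layers + ReLU + shortcut:
#   out = a*x + (1 - a)*relu(x)
#   x >= 0: a*x + (1 - a)*x = x
#   x <  0: a*x + 0         = a*x
def leaky_via_scale_relu_shortcut(x, a=0.1):
    return a * x + (1.0 - a) * relu(x)

x = np.linspace(-3.0, 3.0, 13)
assert np.allclose(leaky_relu(x), leaky_via_scale_relu_shortcut(x))
```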

@laclouis5
You can change the https://github.com/AlexeyAB/darknet/blob/63396082d7e77f4b460bdb2540469f5f1a3c7c48/cfg/yolov3-spp.cfg model:
set width=608 height=352 https://github.com/AlexeyAB/darknet/blob/63396082d7e77f4b460bdb2540469f5f1a3c7c48/cfg/yolov3-spp.cfg#L8-L9
replace all activation=leaky with activation=relu in the cfg-file (see the sketch below)
extract only 6 classes from the MS COCO dataset: person, car, bicycle, motorbike, bus, truck
train this cfg-file using this repository: https://github.com/AlexeyAB/darknet
quantize and run this model on TensorRT: https://news.developer.nvidia.com/deepstream-sdk-4-now-available/
You should get approximately the same result.
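For the cfg edits above (width=608, height=352, leaky -> relu), a minimal Python sketch, assuming a local clone of the repo; the output filename is hypothetical, and filtering the COCO label files down to the 6 classes is a separate step not shown here:

```python
import re
from pathlib import Path

# Paths are placeholders - adjust to your local clone of AlexeyAB/darknet.
src = Path("cfg/yolov3-spp.cfg")
dst = Path("cfg/yolov3-spp-relu-608x352.cfg")

cfg = src.read_text()

# Custom network size used in this thread: width=608, height=352.
cfg = re.sub(r"(?m)^width=\d+", "width=608", cfg)
cfg = re.sub(r"(?m)^height=\d+", "height=352", cfg)

# Replace every leaky activation with plain ReLU (TensorRT/INT8-friendly).
cfg = cfg.replace("activation=leaky", "activation=relu")

dst.write_text(cfg)
print(f"wrote {dst}")
```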
@AlexeyAB Ok thanks, so changing to ReLU, training, and then quantizing with TensorRT should improve network latency for a small accuracy drop?
@laclouis5 Yes.
You will get the same relative improvement over the default yolov3.cfg (Leaky & FP32).
To get the absolute speed/accuracy as in the paper:
set width=608 height=352 for training - it will be 2x faster than 608x608.

@AlexeyAB
What do you mean by "quantize and run this model on TensorRT: https://news.developer.nvidia.com/deepstream-sdk-4-now-available/"?
@AlexeyAB
set width=608 height=352 for training - will it affect the accuracy?
Hi, I'm currently running yolov3-tiny on Xavier. My input size is 576x352 and I have already converted YOLO to TensorRT. As JetNet's paper mentioned, we can achieve a 60% speed-up if we change LeakyReLU to ReLU. However, I don't see any difference between them in my tests. In my case the speed for yolov3-tiny at FP16 is about 13ms and at INT8 is about 10ms. As I understand it, yolov3-tiny is several times faster than yolov3, which means yolov3-tiny should take about 3-6ms in TensorRT at INT8. Is anyone else seeing the same problem? Any help in speeding up yolov3-tiny in TensorRT is welcome. Thanks!
Hi, how did you convert YOLO to TensorRT, please?
@Kmarconi Hi, I referred to this repo: https://github.com/lewes6369/TensorRT-Yolov3. Basically it converts the darknet model to Caffe and uses TensorRT to parse the Caffe model.
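For reference, a rough sketch of the TensorRT half of that pipeline using the pre-TensorRT-8 Python API (trt.CaffeParser plus builder mode flags); the prototxt/caffemodel names and the output blob names are assumptions, not taken from that repo, and the INT8 path additionally needs an IInt8EntropyCalibrator2 fed with calibration images (e.g. the ~1000 mentioned above):

```python
import tensorrt as trt  # pre-8.x API: CaffeParser and builder mode flags

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Hypothetical file/blob names produced by a darknet -> Caffe conversion step.
DEPLOY = "yolov3-tiny.prototxt"
MODEL = "yolov3-tiny.caffemodel"
OUTPUT_BLOBS = ["yolo-out-1", "yolo-out-2"]  # assumed output blob names

with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network() as network, \
     trt.CaffeParser() as parser:
    # Parse the Caffe model into a TensorRT network definition.
    tensors = parser.parse(deploy=DEPLOY, model=MODEL,
                           network=network, dtype=trt.float32)
    for name in OUTPUT_BLOBS:
        network.mark_output(tensors.find(name))

    builder.max_batch_size = 1
    builder.max_workspace_size = 1 << 30
    builder.fp16_mode = True
    # For INT8 instead: builder.int8_mode = True and
    # builder.int8_calibrator = <IInt8EntropyCalibrator2 over calibration images>
    engine = builder.build_cuda_engine(network)

    with open("yolov3-tiny-fp16.engine", "wb") as f:
        f.write(engine.serialize())
```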