Darknet: CUDNN_HALF=1 slow performance on Xavier and Volta V100

Created on 14 Jan 2020 · 8 Comments · Source: AlexeyAB/darknet

Hi, I see a decrease in performance with CUDNN_HALF=1.

Latest git code (Date: Tue Jan 14 00:21:39 2020 +0300)

Xavier

GPU=1
CUDNN=1
CUDNN_HALF=1
OPENCV=0
AVX=0
OPENMP=1
LIBSO=0
ZED_CAMERA=0

ARCH= -gencode arch=compute_72,code=[sm_72,compute_72]
and
ARCH= -gencode arch=compute_70,code=[sm_70,compute_70]

Xavier

CUDNN_HALF=0

Loading weights from /usr/local/webeye/yolov3/yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
./data/dog.jpg: Predicted in 62.901000 milli-seconds.

CUDNN_HALF=1

Loading weights from /usr/local/webeye/yolov3/yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
./data/dog.jpg: Predicted in 314.281000 milli-seconds.

Volta V100
NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2

CUDNN_HALF=0

Loading weights from yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
./data/dog.jpg: Predicted in 17.511000 milli-seconds.

CUDNN_HALF=1

Loading weights from yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
./data/dog.jpg: Predicted in 24.970000 milli-seconds.
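
To put the numbers above in proportion, here is a minimal Python sketch that computes how much slower the CUDNN_HALF=1 build is on each device. The timings are copied from the logs in this post; the `slowdown` helper and the dictionary layout are only illustrative names, not part of Darknet.

```python
# Single-image prediction times in milliseconds, copied from the logs above.
timings_ms = {
    "Xavier": {"half0": 62.901, "half1": 314.281},
    "V100": {"half0": 17.511, "half1": 24.970},
}


def slowdown(half0_ms, half1_ms):
    """Return how many times slower the CUDNN_HALF=1 run is than CUDNN_HALF=0."""
    return half1_ms / half0_ms


for device, t in timings_ms.items():
    ratio = slowdown(t["half0"], t["half1"])
    print(f"{device}: CUDNN_HALF=1 takes {ratio:.2f}x the time of CUDNN_HALF=0")
```

With these figures the Xavier run is roughly 5x slower and the V100 run roughly 1.4x slower. Note that a single-image timing can include one-time setup cost, so it is only a rough indicator.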

Solved

All 8 comments

Show the AVG_FPS for each case by using this command:
./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights test.mp4 -benchmark
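
If the output of that command is saved to a text file, the final AVG_FPS value can be pulled out with a small script. This is only a sketch: it assumes the status line has the form `FPS:13.0 AVG_FPS:13.0`, as reported in this thread, and `last_avg_fps` is a hypothetical helper name, not a Darknet tool.

```python
import re

# Matches the AVG_FPS field of a status line such as "FPS:13.0 AVG_FPS:13.0".
AVG_FPS_PATTERN = re.compile(r"AVG_FPS:\s*([0-9]+(?:\.[0-9]+)?)")


def last_avg_fps(log_text):
    """Return the last AVG_FPS value found in the captured output, or None."""
    matches = AVG_FPS_PATTERN.findall(log_text)
    return float(matches[-1]) if matches else None


# Example log text; the line format is taken from the outputs in this thread.
sample_log = (
    "video file: out.mp4\n"
    "Video stream: 1280 x 720\n"
    "FPS:12.7 AVG_FPS:0.0\n"
    "FPS:13.0 AVG_FPS:13.0\n"
)
print(last_avg_fps(sample_log))
```

Taking the last match rather than the first matters because the average is updated while the video plays.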

Hi, I have the same issue on my Jetson AGX Xavier.
Latest git code (Date: Tue Jan 7 01:17:28 2020 +0300)

Results from the darknet benchmark:
CUDNN_HALF=0
yolov3.cfg size=416 - FPS:15.8
csdarknet53-panet-spp.cfg size=416 - FPS:15.5

CUDNN_HALF=1
yolov3.cfg size=416 - FPS:13.1
csdarknet53-panet-spp.cfg size=416 - FPS:13.2

Hi

Xavier

CUDNN_HALF=1

Loading weights from /usr/local/webeye/yolov3/yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
video file: out.mp4
Video stream: 1280 x 720
Objects:

FPS:13.0 AVG_FPS:13.0

CUDNN_HALF=0

Loading weights from /usr/local/webeye/yolov3/yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
video file: out.mp4
Video stream: 1280 x 720
Objects:

FPS:15.5 AVG_FPS:15.5

V100 (I will add the benchmark as soon as possible)

@ggenny @vitotsai I fixed it. Download the latest Darknet version.

Hi AlexeyAB, thanks a lot for your reply!
Test results using the latest version:
CUDNN_HALF=0
yolov3.cfg size=416 - FPS:14.6
csdarknet53-panet-spp.cfg size=416 - FPS:14.4

CUDNN_HALF=1
yolov3.cfg size=416 - FPS:21.6
csdarknet53-panet-spp.cfg size=416 - FPS:19.3

CUDNN_HALF=1 now runs faster than CUDNN_HALF=0!

But CUDNN_HALF=0 performance seems slower than before (FPS 15.8 -> 14.6).
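
For reference, the relative changes can be worked out from the FPS values reported in this thread (size=416, Jetson AGX Xavier). This is a small sketch; `percent_change` and the dictionary keys are illustrative names only.

```python
def percent_change(before, after):
    """Return the relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# FPS values copied from the benchmark results in this thread.
# "old" is the Jan 7 build, "new" is the build after the fix.
fps = {
    "yolov3.cfg": {"old_half0": 15.8, "new_half0": 14.6, "new_half1": 21.6},
    "csdarknet53-panet-spp.cfg": {"old_half0": 15.5, "new_half0": 14.4, "new_half1": 19.3},
}

for cfg, v in fps.items():
    gain = percent_change(v["new_half0"], v["new_half1"])
    regression = percent_change(v["old_half0"], v["new_half0"])
    print(f"{cfg}: HALF=1 vs HALF=0 {gain:+.1f}%, HALF=0 new vs old {regression:+.1f}%")
```

So CUDNN_HALF=1 is now about 48% faster for yolov3.cfg and about 34% faster for csdarknet53-panet-spp.cfg, while CUDNN_HALF=0 dropped by about 7 to 8% compared with the earlier build.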

Hi AlexeyAB, you are amazing.

It now works perfectly on both GPU architectures.

Hello, is the cfg file that you use csresnext50-panet-spp, or a different one?

Hi @AlexeyAB, I am having the same issue on Jetson Xavier AGX with JetPack 4.3 (latest).

Building with CUDNN_HALF=0 or CUDNN_HALF=1 gives the same AVG_FPS of 14.8 when using the demo with -benchmark.

Built using the latest repo, cloned today. Note that if I build with CMake, it compiles with CUDNN_HALF=0, so I deleted the repo and compiled again with make, adjusting the Makefile as below.

Any ideas for a fix would be greatly appreciated. I see the FPS performance is exactly the same as @vitotsai's CUDNN_HALF=0 performance.

Make with:

GPU=1
CUDNN=1
CUDNN_HALF=0 or 1
OPENCV=1
AVX=0
OPENMP=1
LIBSO=0
ZED_CAMERA=0 # ZED SDK 3.0 and above
ZED_CAMERA_v2_8=0 # ZED SDK 2.X

ARCH= -gencode arch=compute_72,code=[sm_72,compute_72]

CUDNN_HALF=0
FPS:14.8 AVG_FPS:14.8

./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights cartest.mp4 -benchmark
CUDA-version: 10000 (10000), cuDNN: 7.6.3, GPU count: 1
OpenCV version: 4.1.1
Demo
compute_capability = 720, cudnn_half = 0
net.optimized_memory = 0
mini_batch = 1, batch = 1, time_steps = 1, train = 0
.....
Total BFLOPS 65.879
avg_outputs = 532444
Allocate additional workspace_size = 52.43 MB
Loading weights from yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
video file: cartest.mp4
Video stream: 1280 x 720

CUDNN_HALF=1:
FPS:14.8 AVG_FPS:14.7

./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights cartest.mp4 -benchmark
CUDA-version: 10000 (10000), cuDNN: 7.6.3, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 4.1.1
Demo
compute_capability = 720, cudnn_half = 1
net.optimized_memory = 0
mini_batch = 1, batch = 1, time_steps = 1, train = 0
....
[yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00
Total BFLOPS 65.879
avg_outputs = 532444
Allocate additional workspace_size = 52.43 MB
Loading weights from yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
video file: cartest.mp4
Video stream: 1280 x 720
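
Since both builds give the same FPS, it can help to confirm from the startup output which mode each binary actually reports. The sketch below parses the `compute_capability = ..., cudnn_half = ...` line shown in the logs above; `reported_build_mode` is a hypothetical helper, and the check only reflects what the binary prints, not which cuDNN kernels are really used.

```python
import re

# Matches the startup line, e.g. "compute_capability = 720, cudnn_half = 1".
STARTUP_PATTERN = re.compile(
    r"compute_capability\s*=\s*(\d+),\s*cudnn_half\s*=\s*(\d+)"
)


def reported_build_mode(log_text):
    """Return (compute_capability, half_enabled) from the log, or None if absent."""
    match = STARTUP_PATTERN.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)) == 1


# Example lines copied from the CUDNN_HALF=1 log above.
sample_log = (
    "Demo\n"
    "compute_capability = 720, cudnn_half = 1\n"
    "net.optimized_memory = 0\n"
)
print(reported_build_mode(sample_log))
```

Running this on the two logs above would show that the flag is picked up in both builds (cudnn_half = 0 and cudnn_half = 1), so the identical FPS is not explained by the Makefile setting being ignored.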
