Hi, I am seeing a decrease in performance with CUDNN_HALF=1.
Latest git code (Date: Tue Jan 14 00:21:39 2020 +0300)
Xavier
GPU=1
CUDNN=1
CUDNN_HALF=1
OPENCV=0
AVX=0
OPENMP=1
LIBSO=0
ZED_CAMERA=0
ARCH= -gencode arch=compute_72,code=[sm_72,compute_72]
and
ARCH= -gencode arch=compute_70,code=[sm_70,compute_70]
Xavier
CUDNN_HALF=0
Loading weights from /usr/local/webeye/yolov3/yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
./data/dog.jpg: Predicted in 62.901000 milli-seconds.
CUDNN_HALF=1
Loading weights from /usr/local/webeye/yolov3/yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
./data/dog.jpg: Predicted in 314.281000 milli-seconds.
Volta V100
NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2
CUDNN_HALF=0
Loading weights from yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
./data/dog.jpg: Predicted in 17.511000 milli-seconds.
CUDNN_HALF=1
Loading weights from yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
./data/dog.jpg: Predicted in 24.970000 milli-seconds.
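To put a number on the regression reported above, here is a minimal Python sketch that pulls the "Predicted in ... milli-seconds" value out of a darknet log line and computes the ratio between the CUDNN_HALF=1 and CUDNN_HALF=0 runs. The function names `predicted_ms` and `slowdown` are made up for this illustration, and the regular expression is based only on the log lines quoted in this thread.

```python
import re

# Matches the timing line as it appears in the logs quoted in this thread.
PREDICTED_RE = re.compile(r"Predicted in ([0-9.]+) milli-seconds")


def predicted_ms(log_text):
    """Return the inference time in milliseconds from a darknet log, or None."""
    match = PREDICTED_RE.search(log_text)
    return float(match.group(1)) if match else None


def slowdown(fp32_log, fp16_log):
    """Ratio of CUDNN_HALF=1 time to CUDNN_HALF=0 time (> 1 means HALF is slower)."""
    return predicted_ms(fp16_log) / predicted_ms(fp32_log)


# Log lines copied from the report above.
xavier_fp32 = "./data/dog.jpg: Predicted in 62.901000 milli-seconds."
xavier_fp16 = "./data/dog.jpg: Predicted in 314.281000 milli-seconds."
v100_fp32 = "./data/dog.jpg: Predicted in 17.511000 milli-seconds."
v100_fp16 = "./data/dog.jpg: Predicted in 24.970000 milli-seconds."

print(f"Xavier: CUDNN_HALF=1 is {slowdown(xavier_fp32, xavier_fp16):.2f}x slower")
print(f"V100:   CUDNN_HALF=1 is {slowdown(v100_fp32, v100_fp16):.2f}x slower")
```

With the figures from this report, CUDNN_HALF=1 comes out roughly 5x slower on Xavier and about 1.4x slower on the V100. These are single-image timings, so they may include one-time warm-up cost.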
Show the AVG_FPS for each case by using this command:
./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights test.mp4 -benchmark
Hi, I got the same issue on my Jetson AGX Xavier.
Latest git code (Date: Tue Jan 7 01:17:28 2020 +0300)
Results using the darknet benchmark:
CUDNN_HALF=0
yolov3.cfg size=416 - FPS:15.8
csdarknet53-panet-spp.cfg size=416 - FPS:15.5
CUDNN_HALF=1
yolov3.cfg size=416 - FPS:13.1
csdarknet53-panet-spp.cfg size=416 - FPS:13.2
Hi
Xavier
CUDNN_HALF=1
Loading weights from /usr/local/webeye/yolov3/yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
video file: out.mp4
Video stream: 1280 x 720
Objects:
CUDNN_HALF=0
Loading weights from /usr/local/webeye/yolov3/yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
video file: out.mp4
Video stream: 1280 x 720
Objects:
V100 (I will add the benchmark as soon as possible)
@ggenny @vitotsai I fixed it. Download the latest Darknet version.
Hi AlexeyAB, thanks a lot for your reply!
Test results using the latest version:
CUDNN_HALF=0
yolov3.cfg size=416 - FPS:14.6
csdarknet53-panet-spp.cfg size=416 - FPS:14.4
CUDNN_HALF=1
yolov3.cfg size=416 - FPS:21.6
csdarknet53-panet-spp.cfg size=416 - FPS:19.3
CUDNN_HALF=1 now runs faster than CUDNN_HALF=0!
But CUDNN_HALF=0 performance seems slower than before (FPS 15.8 -> 14.6).
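The before/after figures in this thread can be compared as relative changes. The sketch below is only arithmetic on the FPS values quoted above for the Jetson AGX Xavier at size=416. The table name `REPORTED_FPS` and the helper `percent_change` are made up for this illustration.

```python
# FPS figures reported in this thread (Jetson AGX Xavier, size=416).
# Keys are (cfg, CUDNN_HALF); values are (FPS before the fix, FPS after the fix).
REPORTED_FPS = {
    ("yolov3.cfg", 0): (15.8, 14.6),
    ("yolov3.cfg", 1): (13.1, 21.6),
    ("csdarknet53-panet-spp.cfg", 0): (15.5, 14.4),
    ("csdarknet53-panet-spp.cfg", 1): (13.2, 19.3),
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


for (cfg, half), (before, after) in REPORTED_FPS.items():
    change = percent_change(before, after)
    print(f"{cfg:28s} CUDNN_HALF={half}: {before} -> {after} FPS ({change:+.1f}%)")
```

On these numbers, CUDNN_HALF=1 gains roughly 65% (yolov3.cfg) and 46% (csdarknet53-panet-spp.cfg), while CUDNN_HALF=0 drops by about 7 to 8%. The thread does not establish whether that smaller drop is a real regression or run-to-run variation.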
Hi AlexeyAB, you are amazing.
It now works perfectly on both GPU architectures.
Hi, I got the same issue on my Jetson AGX Xavier.
Latest git code (Date: Tue Jan 7 01:17:28 2020 +0300)
Results using the darknet benchmark:
CUDNN_HALF=0
yolov3.cfg size=416 - FPS:15.8
csdarknet53-panet-spp.cfg size=416 - FPS:15.5
CUDNN_HALF=1
yolov3.cfg size=416 - FPS:13.1
csdarknet53-panet-spp.cfg size=416 - FPS:13.2
Hello, is the cfg file that you use csresnext50-panet-spp, or a different one?
Hi @AlexeyAB, I am having the same issue on a Jetson AGX Xavier with JetPack 4.3 (latest).
Building with CUDNN_HALF=0 or CUDNN_HALF=1 gives the same AVG_FPS of about 14.8 when running the demo with -benchmark.
I built from the latest repo, cloned today. Note that if I build with CMake, it compiles with CUDNN_HALF=0, so I deleted the repo and compiled again with make, adjusting the Makefile as shown below.
Any ideas for a fix would be greatly appreciated. The FPS I see is almost exactly the same as @vitotsai's CUDNN_HALF=0 result.
Makefile settings:
GPU=1
CUDNN=1
CUDNN_HALF=0 or 1
OPENCV=1
AVX=0
OPENMP=1
LIBSO=0
ZED_CAMERA=0 # ZED SDK 3.0 and above
ZED_CAMERA_v2_8=0 # ZED SDK 2.X
ARCH= -gencode arch=compute_72,code=[sm_72,compute_72]
CUDNN_HALF=0
FPS:14.8 AVG_FPS:14.8
./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights cartest.mp4 -benchmark
CUDA-version: 10000 (10000), cuDNN: 7.6.3, GPU count: 1
OpenCV version: 4.1.1
Demo
compute_capability = 720, cudnn_half = 0
net.optimized_memory = 0
mini_batch = 1, batch = 1, time_steps = 1, train = 0
.....
Total BFLOPS 65.879
avg_outputs = 532444
Allocate additional workspace_size = 52.43 MB
Loading weights from yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
video file: cartest.mp4
Video stream: 1280 x 720
CUDNN_HALF=1:
FPS:14.8 AVG_FPS:14.7
./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights cartest.mp4 -benchmark
CUDA-version: 10000 (10000), cuDNN: 7.6.3, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 4.1.1
Demo
compute_capability = 720, cudnn_half = 1
net.optimized_memory = 0
mini_batch = 1, batch = 1, time_steps = 1, train = 0
....
[yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00
Total BFLOPS 65.879
avg_outputs = 532444
Allocate additional workspace_size = 52.43 MB
Loading weights from yolov3.weights...
seen 64, trained: 32013 K-images (500 Kilo-batches_64)
Done! Loaded 107 layers from weights-file
video file: cartest.mp4
Video stream: 1280 x 720
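Since the CMake build mentioned above reportedly compiled with CUDNN_HALF=0, it can help to check what the binary itself prints at startup rather than trusting the Makefile. The following Python sketch extracts the `compute_capability` and `cudnn_half` values and the last `AVG_FPS` reading from captured darknet output. The function name `parse_darknet_log` is hypothetical, and the patterns are assumptions based only on the log lines quoted in this thread; other darknet versions may print these lines differently.

```python
import re


def parse_darknet_log(log_text):
    """Extract compute capability, cudnn_half flag, and AVG_FPS from darknet output.

    Returns a dict containing only the fields that were found in the text.
    """
    info = {}
    flags = re.search(r"compute_capability\s*=\s*(\d+),\s*cudnn_half\s*=\s*(\d)", log_text)
    if flags:
        info["compute_capability"] = int(flags.group(1))
        info["cudnn_half"] = int(flags.group(2))
    fps_values = re.findall(r"AVG_FPS:\s*([0-9.]+)", log_text)
    if fps_values:
        # Keep the last reading, since the average is updated while the demo runs.
        info["avg_fps"] = float(fps_values[-1])
    return info


# Lines copied from the CUDNN_HALF=1 run above.
sample = """\
CUDA-version: 10000 (10000), cuDNN: 7.6.3, CUDNN_HALF=1, GPU count: 1
 compute_capability = 720, cudnn_half = 1
FPS:14.8 AVG_FPS:14.7
"""

result = parse_darknet_log(sample)
print(result)
if result.get("cudnn_half") != 1:
    print("Warning: this binary does not report cudnn_half = 1")
```

In the logs above, the second run does report `cudnn_half = 1`, so the flag was compiled in and the unchanged AVG_FPS has to come from somewhere else.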