Darknet: Training fails only if cuDNN enabled with RTX3090

Created on 19 Oct 2020 · 8 Comments · Source: AlexeyAB/darknet

Bug Overview

  • Layer loading is extremely slow, taking about 30 minutes
  • After layer loading, training fails: avg loss = nan
  • The bug occurs only if darknet is built with CUDNN=1

Reproduction

  1. Created a Dockerfile to reproduce the bug and built the Docker image on the workstation (see the Environment section); a rough sketch is shown after this list.
  2. Prepared my custom dataset and set it up on the workstation. I uploaded the configuration file here.
  3. Ran a Docker container with the dataset folder mounted and a GPU provided.
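
Roughly, the Dockerfile looks like this (simplified; the base image tag, package list, and paths here are illustrative, not the exact file I used):

FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
# Build tools plus Ubuntu 18.04's OpenCV 3.2 packages
RUN apt-get update && apt-get install -y git build-essential libopencv-dev
WORKDIR /app
RUN git clone https://github.com/AlexeyAB/darknet.git
WORKDIR /app/darknet
# Enable GPU, cuDNN, half precision, and OpenCV in the Makefile, then build
RUN sed -i 's/GPU=0/GPU=1/; s/CUDNN=0/CUDNN=1/; s/CUDNN_HALF=0/CUDNN_HALF=1/; s/OPENCV=0/OPENCV=1/' Makefile && make
ENTRYPOINT ["./darknet"]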

Log

$ docker run -v ~/darknet/mydata:/app/darknet/data -it --gpus device=0 myname/darknet:1.0.3 detector train -dont_show ./data/obj.data ./data/yolo4.cfg ./data/backup/yolov4.conv.137
 CUDA-version: 10020 (11010), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1  
 CUDNN_HALF=1 
 OpenCV version: 3.2.0
valid: Using default 'data/train.txt'
yolo4
 0 : compute_capability = 860, cudnn_half = 1, GPU: GeForce RTX 3090 
net.optimized_memory = 0 
mini_batch = 2, batch = 64, time_steps = 1, train = 1 
   layer   filters  size/strd(dil)      input                output
   0 
### HANG UP HERE FOR 30 MINUTES ###
conv     32       3 x 3/ 1    768 x 768 x   3 ->  768 x 768 x  32 1.019 BF
   1 conv     64       3 x 3/ 2    768 x 768 x  32 ->  384 x 384 x  64 5.436 BF
   2 conv     64       1 x 1/ 1    384 x 384 x  64 ->  384 x 384 x  64 1.208 BF
   3 route  1                                  ->  384 x 384 x  64 
   4 conv     64       1 x 1/ 1    384 x 384 x  64 ->  384 x 384 x  64 1.208 BF
   5 conv     32       1 x 1/ 1    384 x 384 x  64 ->  384 x 384 x  32 0.604 BF
   6 conv     64       3 x 3/ 1    384 x 384 x  32 ->  384 x 384 x  64 5.436 BF
   7 Shortcut Layer: 4,  wt = 0, wn = 0, outputs: 384 x 384 x  64 0.009 BF
   8 conv     64       1 x 1/ 1    384 x 384 x  64 ->  384 x 384 x  64 1.208 BF
   9 route  8 2                                ->  384 x 384 x 128 
### ... ###
Total BFLOPS 203.057 
avg_outputs = 1670200 
 Allocate additional workspace_size = 52.43 MB 
Loading weights from ./data/backup/yolov4.conv.137...
 seen 64, trained: 0 K-images (0 Kilo-batches_64) 
Done! Loaded 137 layers from weights-file 
Learning Rate: 0.001, Momentum: 0.949, Decay: 0.0005
 Detection layer: 139 - type = 28 
 Detection layer: 150 - type = 28 
 Detection layer: 161 - type = 28 
Resizing, random_coef = 1.40 

 1120 x 1120 
 Create 6 permanent cpu-threads 
 try to allocate additional workspace_size = 52.43 MB 
 CUDA allocate done! 
Loaded: 0.000061 seconds
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.509620, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 15572.158203, iou_loss = 0.000000, total_loss = 15572.158203 
### ... ###
 1: -nan, -nan avg loss, 0.000000 rate, 5.492178 seconds, 64 images, -1.000000 hours left
### ... ###
 2: -nan, -nan avg loss, 0.000000 rate, 5.540169 seconds, 128 images, 45.767176 hours left

See here for the full log.

Environment

Docker container on the Workstation

Workstation

  • Ubuntu 20.04.1 LTS
  • 2x RTX 3090 installed
  • Docker version 19.03.13, build 4484c46d9d

Client (for investigation)

  • Windows 10 build 19608
  • 1x GTX 1080 installed
  • CUDA 10.0
  • cuDNN 7.6.0

Investigation

Validation of dataset

Apart from the workstation, I installed darknet using vcpkg on the client (Windows) and trained the same dataset with the GPU. It worked well and the trained model seems to be fine, so I think there is nothing wrong with the dataset.

Identification of causes

I changed the compile options and tested each combination.

  • CUDNN=0 CUDNN_HALF=0: fine (of course it's slow, since cuDNN is disabled)
  • CUDNN=1 CUDNN_HALF=0: bug occurs
  • CUDNN=1 CUDNN_HALF=1: bug occurs

Thus, I concluded that this bug occurs only if darknet is built with CUDNN=1.
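
For each combination I rebuilt darknet roughly like this (illustrative commands, assuming the stock Makefile where GPU=1 is already set and the flags are plain variables at the top; sed is just one way to toggle them):

# CUDNN=0 CUDNN_HALF=0 -> training is fine (slow)
sed -i 's/^CUDNN=.*/CUDNN=0/; s/^CUDNN_HALF=.*/CUDNN_HALF=0/' Makefile
make clean && make

# CUDNN=1 CUDNN_HALF=0 -> avg loss = nan
sed -i 's/^CUDNN=.*/CUDNN=1/' Makefile
make clean && make

# CUDNN=1 CUDNN_HALF=1 -> avg loss = nan
sed -i 's/^CUDNN_HALF=.*/CUDNN_HALF=1/' Makefile
make clean && make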

I also tested several CUDA versions.

  • CUDA 10.0: bug occurs
  • CUDA 10.1: bug occurs
  • CUDA 10.2: bug occurs
  • CUDA 11.0: compile fails with "Unsupported gpu architecture 'compute_30'"

The CUDA version does not matter. The compile failure with CUDA 11 appears to be a separate issue.

I then tested some older darknet versions. I'm sorry that I forget the exact versions I tested, but it still failed with a version from around 2020.03, so I don't think this is caused by a recent change. It might be a compatibility issue with the RTX 3090, Docker, or cuDNN 7.6.5.

All 8 comments

I'm facing the exact same issue, also running on an RTX 3090. It should not be a Docker-related issue, since cuDNN works fine on AWS GPU-accelerated instances (using Tesla cards) through Docker.

I tested cuDNN with a TITAN X and it worked well. Maybe darknet does not fully support the RTX 3090...

I tried:
  • CUDNN=1 CUDNN_HALF=1
  • cuDNN 8.0.4
  • CUDA 11.1
  • OpenCV 4.5

In the Makefile, comment out the compute_30 line and add:
-gencode arch=compute_86,code=[sm_86,compute_86]

Build is OK!
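
In Makefile terms, the change amounts to roughly the following (illustrative; the exact set of -gencode lines differs between darknet revisions, but the key points are dropping compute_30, which CUDA 11 no longer supports, and adding an Ampere target for the RTX 3090):

# removed: -gencode arch=compute_30,code=sm_30   (rejected by CUDA 11)
ARCH= -gencode arch=compute_52,code=[sm_52,compute_52] \
      -gencode arch=compute_61,code=[sm_61,compute_61] \
      -gencode arch=compute_86,code=[sm_86,compute_86]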

@takashide Thanks for the solution. Works great on a CUDA 11.1 cuDNN Docker image with your suggested modifications.

Thank you @takashide 👍

Do you have inference performance metrics with the RTX 3090?

In my experience, it's more than twice as fast as a 1080 Ti.

@AlgirdasKartavicius I recently compared an RTX 3090 against an MSI GS65 laptop running an RTX 2070.

Inference:
RTX 3090 - 0.047 seconds
RTX 2070 laptop card - 0.11 seconds

I'm planning to compare it with the 3070, 3080, and any other NVIDIA cards I can get my hands on, since it's hard to find good comparisons for deep learning, and for YOLO specifically.
