avg loss = nan (CUDNN=1)

$ docker run -v ~/darknet/mydata:/app/darknet/data -it --gpus device=0 myname/darknet:1.0.3 detector train -dont_show ./data/obj.data ./data/yolo4.cfg ./data/backup/yolov4.conv.137
CUDA-version: 10020 (11010), cuDNN: 7.6.5, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1
OpenCV version: 3.2.0
valid: Using default 'data/train.txt'
yolo4
0 : compute_capability = 860, cudnn_half = 1, GPU: GeForce RTX 3090
net.optimized_memory = 0
mini_batch = 2, batch = 64, time_steps = 1, train = 1
layer filters size/strd(dil) input output
0
### HANGS HERE FOR 30 MINUTES ###
conv 32 3 x 3/ 1 768 x 768 x 3 -> 768 x 768 x 32 1.019 BF
1 conv 64 3 x 3/ 2 768 x 768 x 32 -> 384 x 384 x 64 5.436 BF
2 conv 64 1 x 1/ 1 384 x 384 x 64 -> 384 x 384 x 64 1.208 BF
3 route 1 -> 384 x 384 x 64
4 conv 64 1 x 1/ 1 384 x 384 x 64 -> 384 x 384 x 64 1.208 BF
5 conv 32 1 x 1/ 1 384 x 384 x 64 -> 384 x 384 x 32 0.604 BF
6 conv 64 3 x 3/ 1 384 x 384 x 32 -> 384 x 384 x 64 5.436 BF
7 Shortcut Layer: 4, wt = 0, wn = 0, outputs: 384 x 384 x 64 0.009 BF
8 conv 64 1 x 1/ 1 384 x 384 x 64 -> 384 x 384 x 64 1.208 BF
9 route 8 2 -> 384 x 384 x 128
### ... ###
Total BFLOPS 203.057
avg_outputs = 1670200
Allocate additional workspace_size = 52.43 MB
Loading weights from ./data/backup/yolov4.conv.137...
seen 64, trained: 0 K-images (0 Kilo-batches_64)
Done! Loaded 137 layers from weights-file
Learning Rate: 0.001, Momentum: 0.949, Decay: 0.0005
Detection layer: 139 - type = 28
Detection layer: 150 - type = 28
Detection layer: 161 - type = 28
Resizing, random_coef = 1.40
1120 x 1120
Create 6 permanent cpu-threads
try to allocate additional workspace_size = 52.43 MB
CUDA allocate done!
Loaded: 0.000061 seconds
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.509620, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 15572.158203, iou_loss = 0.000000, total_loss = 15572.158203
### ... ###
1: -nan, -nan avg loss, 0.000000 rate, 5.492178 seconds, 64 images, -1.000000 hours left
### ... ###
2: -nan, -nan avg loss, 0.000000 rate, 5.540169 seconds, 128 images, 45.767176 hours left
Apart from the workstation, I installed darknet using vcpkg on a Windows client and trained the same dataset with GPU. It worked well and the trained model seems fine, so I think there is nothing wrong with the dataset.
I changed the compile options and tested:
- CUDNN=0 CUDNN_HALF=0: fine (of course it's slow, since cuDNN is disabled)
- CUDNN=1 CUDNN_HALF=0: bug occurs
- CUDNN=1 CUDNN_HALF=1: bug occurs
Thus, I concluded this bug occurs only if darknet is built with CUDNN=1.
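For reference, these three configurations can be rebuilt from the darknet source tree roughly like this (a sketch assuming the standard AlexeyAB Makefile, whose GPU/CUDNN/CUDNN_HALF variables can be overridden on the make command line):

    $ make clean && make GPU=1 CUDNN=0 CUDNN_HALF=0    # fine, but slow
    $ make clean && make GPU=1 CUDNN=1 CUDNN_HALF=0    # bug occurs
    $ make clean && make GPU=1 CUDNN=1 CUDNN_HALF=1    # bug occurs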
Also, I tested several CUDA versions:
- CUDA 10.0: bug occurs
- CUDA 10.1: bug occurs
- CUDA 10.2: bug occurs
- CUDA 11.0: compile fails with "Unsupported gpu architecture 'compute_30'"
So the CUDA version does not matter. The compile failure with CUDA 11 seems to be a separate issue.
I then tested some older darknet versions. Sorry, I forget exactly which versions I tried, but it still failed with a version from around March 2020, so I don't think this is caused by a recent change. It might be a compatibility issue with the RTX 3090, Docker, or cuDNN 7.6.5.
I'm facing the exact same issue, also running on an RTX 3090. It shouldn't be a Docker-related issue, since cuDNN works fine through Docker on AWS GPU-accelerated instances (using Tesla cards).
I tested cuDNN with a TITAN X and it worked well. Maybe darknet does not fully support the RTX 3090...
I tried:
- CUDNN=1 CUDNN_HALF=1
- cuDNN 8.0.4
- CUDA 11.1
- OpenCV 4.5
- commented out the compute_30 gencode line in the Makefile
- added: -gencode arch=compute_86,code=[sm_86,compute_86]
Build is OK!
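For anyone applying the same fix, the Makefile edit looks roughly like this (a sketch; the exact default ARCH lines vary between darknet revisions, and compute_86 corresponds to the RTX 3090's Ampere compute capability 8.6):

    # In darknet's Makefile, comment out the Kepler entry that CUDA 11 rejects:
    #   -gencode arch=compute_30,code=sm_30
    # and add an Ampere entry instead:
    ARCH= -gencode arch=compute_86,code=[sm_86,compute_86]

Then rebuild with make clean && make.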
@takashide Thanks for the solution. Works great on CUDA 11.1 CUDNN Docker Image, with your suggested modifications
Thank you @takashide 👍
Do you have inference performance metrics with the RTX 3090?
In my experience, it's more than twice as fast as a 1080 Ti.
@AlgirdasKartavicius I recently compared an RTX 3090 vs an MSI GS65 laptop running an RTX 2070.
Inference:
RTX 3090 - 0.047 seconds
RTX 2070 Laptop card - 0.11 seconds
Planning to compare it also with the 3070, 3080, and any other NVIDIA cards I can get my hands on, since it's hard to find good comparisons for deep learning, and YOLO specifically.