Darknet: Resizing 448 darknet: ./src/network.c:392: resize_network: Assertion `0' failed.

Created on 21 Jan 2018 · 12 Comments · Source: pjreddie/darknet

Can anyone help me solve this issue?
I changed yolo.2.0cfg and added one CNN layer.

Most helpful comment

I had the same problem and solved it in two steps:

  1. Edit the Makefile and rebuild the project:
    ARCH= -gencode arch=compute_30,code=sm_30 \
          -gencode arch=compute_35,code=sm_35 \
          -gencode arch=compute_50,code=[sm_50,compute_50] \
          -gencode arch=compute_52,code=[sm_52,compute_52] \
          -gencode arch=compute_60,code=sm_60 \
          -gencode arch=compute_61,code=sm_61
    (my GPU is a GTX 1080, and its corresponding compute capability is 6.1)
  2. Edit src/network.c and comment out this statement:
    if(l.workspace_size > 2000000000) assert(0);

After these two steps, the problem was solved.
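
For anyone who would rather keep the guard than remove it, here is a minimal sketch of a check that reports the requested size instead of aborting with a bare assert(0). The helper name `workspace_fits` and the constant name `WORKSPACE_LIMIT` are hypothetical and not part of darknet; only the 2000000000-byte threshold comes from the statement quoted above.

```c
#include <stddef.h>
#include <stdio.h>

/* Threshold taken from the check quoted above; the macro name is hypothetical. */
#define WORKSPACE_LIMIT ((size_t)2000000000)

/* Hypothetical helper, not a darknet function.
 * Returns 1 if the requested workspace is within the limit.
 * Otherwise prints the requested size to stderr and returns 0. */
static int workspace_fits(size_t workspace_size)
{
    if (workspace_size > WORKSPACE_LIMIT) {
        fprintf(stderr,
                "workspace of %zu bytes exceeds the limit of %zu bytes\n",
                workspace_size, WORKSPACE_LIMIT);
        return 0;
    }
    return 1;
}
```

A caller could skip the resize or exit with this message when the helper returns 0, which at least shows how large the requested buffer was.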

All 12 comments

The way you ask for help is funny. You didn't post your changes, so only a ghost knows what you changed.

I have the same issue on an NVIDIA V100 (I chose -gencode arch=compute_70,code=sm_70), while everything works well on an NVIDIA 1080 Ti:

Region Avg IOU: 0.141527, Class: 0.026888, Obj: 0.652683, No Obj: 0.566986, Avg Recall: 0.000000, count: 4
10: 363.796783, 444.599030 avg, 0.000000 rate, 0.045145 seconds, 10 images
Resizing
544
darknet: ./src/network.c:392: resize_network: Assertion `0' failed.
Aborted (core dumped)

@Ahagpp @christopher5106 You can try this fork; I fixed excessive memory allocation for several (unfortunate) network sizes: https://github.com/AlexeyAB/darknet

Also, if you use a V100 GPU, you can use Tensor Cores for mixed-precision calculations (mixed precision is now supported for both single-GPU and multi-GPU). How to use it: https://github.com/AlexeyAB/darknet/issues/407

Sounds good, it works well on a DGX Station with V100. On Power9 with V100, I have a problem when using CUDNN=1 with cuDNN 7.0:

27 reorg / 2 26 x 26 x 64 -> 13 x 13 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 13 x 13 x1280 -> 13 x 13 x1024
30 conv 125 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 125
31 detection
Loading weights from darknet19_448.conv.23...
seen 32

After that it freezes. Without CUDNN it works well, but I cannot benefit from half precision.

@christopher5106 To narrow down the problem, a few questions:

  • Does it freeze only during training, or during detection too?
  • Does it work with GPU=1 CUDNN=0 in the Makefile?
  • Does it work with GPU=0 CUDNN=0?
  • Do you use OpenCV?
  • Did you try mixed precision (-DCUDNN_HALF in the Makefile) to train on the V100? (It now supports multi-GPU for DGX.)
  • Do you use little-endian 64-bit Linux?
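
The last question can be answered from a small C program if you are not sure how the system is configured. This is only a sketch using standard C; the helper names `is_little_endian` and `pointer_bits` are made up for illustration and do not exist in darknet.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 otherwise.
 * Looks at the lowest-addressed byte of a 32-bit integer holding 1. */
static int is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Returns the pointer width of this build in bits (64 on a 64-bit build). */
static int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}
```

Printing both values from a main function built with the same compiler and flags as darknet shows what the build actually targets.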

It seems there was a performance issue. We did a complete reinstall and the problem seems to have disappeared. Thanks a lot, I'll tell you more about this next week.

On some runs, I get:

Region Avg IOU: nan, Class: nan, Obj: -nan, No Obj: -nan, Avg Recall: 0.000000, count: 6
78444: -nan, -nan avg, 0.000010 rate, 0.180000 seconds, 78444 images

Is that normal? When I rerun it, it is OK.

@christopher5106

Region Avg IOU: nan, Class: nan, Obj: -nan, No Obj: -nan, Avg Recall: 0.000000, count: 6

If lines like this occur only occasionally, this is normal.
If at some point all of these lines contain nan, then the training went wrong.


78444: -nan, -nan avg, 0.000010 rate, 0.180000 seconds, 78444 images

A line like this always means the training went wrong.
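
The rule for the iteration line can be written as a small check. This is a sketch only: the helper name `training_diverged` is hypothetical, and it assumes you have the current loss and the running average loss available as floats, as printed in the iteration line.

```c
#include <math.h>

/* Hypothetical helper, not a darknet function.
 * Returns 1 if the loss or the running average loss is NaN,
 * which is the "training went wrong" case described above.
 * Returns 0 otherwise. */
static int training_diverged(float loss, float avg_loss)
{
    return (isnan(loss) || isnan(avg_loss)) ? 1 : 0;
}
```

Calling this once per iteration and stopping when it returns 1 avoids training for thousands of further iterations on a run that cannot recover.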

The training did go wrong indeed... Is that normal?

@christopher5106 No, this is not normal. Something is wrong in the dataset, the model, or the source code.

@Ahagpp I had the same problem. Increase subdivisions in the cfg file. That solved the problem for me.
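
As a rough illustration of why this can help: darknet divides `batch` by `subdivisions` to get the number of images processed in one pass, so a larger `subdivisions` value means fewer images in GPU memory at once. The sketch below shows only that arithmetic. The helper name `images_per_pass` is hypothetical, and the values in the comments are example settings, not taken from this thread.

```c
/* Hypothetical helper, not a darknet function.
 * Returns the number of images processed in one pass when a batch is
 * split into the given number of subdivisions (integer division).
 * Returns 0 if either argument is not positive. */
static int images_per_pass(int batch, int subdivisions)
{
    if (batch <= 0 || subdivisions <= 0) {
        return 0;
    }
    /* Example: batch=64 with subdivisions=8 gives 8 images per pass;
     * raising subdivisions to 16 gives 4. */
    return batch / subdivisions;
}
```

Whether this is enough to get under the workspace limit depends on the network and the resize dimensions, so treat it as something to try rather than a guaranteed fix.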
