Darknet: resize_network causes crash on GPU

Created on 15 Jan 2019 · 8 comments · Source: AlexeyAB/darknet

Hello @AlexeyAB

After calling resize_network to fit the image resolution (following the multiple-of-32 rule), the network crashes on network_predict. The GPU has enough memory, and I've tried with VGA resolution as well. The same code works on the pjreddie/darknet branch or with GPU=0.

Thanks,
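For reference, here is a minimal sketch of the call sequence that triggers the crash from Python. It assumes darknet is built as libdarknet.so with resize_network exported (see the discussion below); the binding names follow the usual darknet.py wrapper, and the cfg/weights paths and resize dimensions match the log below.

from ctypes import CDLL, RTLD_GLOBAL, Structure, POINTER, c_char_p, c_int, c_float, c_void_p

# image struct as declared in darknet.py
class IMAGE(Structure):
    _fields_ = [("w", c_int), ("h", c_int), ("c", c_int), ("data", POINTER(c_float))]

lib = CDLL("./libdarknet.so", RTLD_GLOBAL)

load_net = lib.load_network                          # cfg, weights, clear
load_net.argtypes = [c_char_p, c_char_p, c_int]
load_net.restype = c_void_p

load_image = lib.load_image_color                    # filename, w, h (0, 0 = keep native size)
load_image.argtypes = [c_char_p, c_int, c_int]
load_image.restype = IMAGE

predict_image = lib.network_predict_image
predict_image.argtypes = [c_void_p, IMAGE]
predict_image.restype = POINTER(c_float)

resize_network = lib.resize_network                  # not exported in darknet.h by default
resize_network.argtypes = [c_void_p, c_int, c_int]
resize_network.restype = c_int

net = load_net(b"darknet/models/YOLOv3-coco/network.cfg",
               b"darknet/models/YOLOv3-coco/network.weights", 0)

resize_network(net, 1920, 1088)                      # both dimensions are multiples of 32

img = load_image(b"test.jpg", 0, 0)                  # placeholder test image
predict_image(net, img)                              # crashes here with "CUDA Error: invalid argument"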

 Try to load cfg: darknet/models/YOLOv3-coco/network.cfg, weights: darknet/models/YOLOv3-coco/network.weights, clear = 0 
layer     filters    size              input                output
[New Thread 0x7fff416c4700 (LWP 1775)]
[New Thread 0x7fff40ec3700 (LWP 1776)]
   0 conv     32  3 x 3 / 1   640 x 640 x   3   ->   640 x 640 x  32 0.708 BF
   1 conv     64  3 x 3 / 2   640 x 640 x  32   ->   320 x 320 x  64 3.775 BF
   2 conv     32  1 x 1 / 1   320 x 320 x  64   ->   320 x 320 x  32 0.419 BF
   3 conv     64  3 x 3 / 1   320 x 320 x  32   ->   320 x 320 x  64 3.775 BF
   4 Shortcut Layer: 1
   5 conv    128  3 x 3 / 2   320 x 320 x  64   ->   160 x 160 x 128 3.775 BF
   6 conv     64  1 x 1 / 1   160 x 160 x 128   ->   160 x 160 x  64 0.419 BF
   7 conv    128  3 x 3 / 1   160 x 160 x  64   ->   160 x 160 x 128 3.775 BF
   8 Shortcut Layer: 5
   9 conv     64  1 x 1 / 1   160 x 160 x 128   ->   160 x 160 x  64 0.419 BF
  10 conv    128  3 x 3 / 1   160 x 160 x  64   ->   160 x 160 x 128 3.775 BF
  11 Shortcut Layer: 8
  12 conv    256  3 x 3 / 2   160 x 160 x 128   ->    80 x  80 x 256 3.775 BF
  13 conv    128  1 x 1 / 1    80 x  80 x 256   ->    80 x  80 x 128 0.419 BF
  14 conv    256  3 x 3 / 1    80 x  80 x 128   ->    80 x  80 x 256 3.775 BF
  15 Shortcut Layer: 12
  16 conv    128  1 x 1 / 1    80 x  80 x 256   ->    80 x  80 x 128 0.419 BF
  17 conv    256  3 x 3 / 1    80 x  80 x 128   ->    80 x  80 x 256 3.775 BF
  18 Shortcut Layer: 15
  19 conv    128  1 x 1 / 1    80 x  80 x 256   ->    80 x  80 x 128 0.419 BF
  20 conv    256  3 x 3 / 1    80 x  80 x 128   ->    80 x  80 x 256 3.775 BF
  21 Shortcut Layer: 18
  22 conv    128  1 x 1 / 1    80 x  80 x 256   ->    80 x  80 x 128 0.419 BF
  23 conv    256  3 x 3 / 1    80 x  80 x 128   ->    80 x  80 x 256 3.775 BF
  24 Shortcut Layer: 21
  25 conv    128  1 x 1 / 1    80 x  80 x 256   ->    80 x  80 x 128 0.419 BF
  26 conv    256  3 x 3 / 1    80 x  80 x 128   ->    80 x  80 x 256 3.775 BF
  27 Shortcut Layer: 24
  28 conv    128  1 x 1 / 1    80 x  80 x 256   ->    80 x  80 x 128 0.419 BF
  29 conv    256  3 x 3 / 1    80 x  80 x 128   ->    80 x  80 x 256 3.775 BF
  30 Shortcut Layer: 27
  31 conv    128  1 x 1 / 1    80 x  80 x 256   ->    80 x  80 x 128 0.419 BF
  32 conv    256  3 x 3 / 1    80 x  80 x 128   ->    80 x  80 x 256 3.775 BF
  33 Shortcut Layer: 30
  34 conv    128  1 x 1 / 1    80 x  80 x 256   ->    80 x  80 x 128 0.419 BF
  35 conv    256  3 x 3 / 1    80 x  80 x 128   ->    80 x  80 x 256 3.775 BF
  36 Shortcut Layer: 33
  37 conv    512  3 x 3 / 2    80 x  80 x 256   ->    40 x  40 x 512 3.775 BF
  38 conv    256  1 x 1 / 1    40 x  40 x 512   ->    40 x  40 x 256 0.419 BF
  39 conv    512  3 x 3 / 1    40 x  40 x 256   ->    40 x  40 x 512 3.775 BF
  40 Shortcut Layer: 37
  41 conv    256  1 x 1 / 1    40 x  40 x 512   ->    40 x  40 x 256 0.419 BF
  42 conv    512  3 x 3 / 1    40 x  40 x 256   ->    40 x  40 x 512 3.775 BF
  43 Shortcut Layer: 40
  44 conv    256  1 x 1 / 1    40 x  40 x 512   ->    40 x  40 x 256 0.419 BF
  45 conv    512  3 x 3 / 1    40 x  40 x 256   ->    40 x  40 x 512 3.775 BF
  46 Shortcut Layer: 43
  47 conv    256  1 x 1 / 1    40 x  40 x 512   ->    40 x  40 x 256 0.419 BF
  48 conv    512  3 x 3 / 1    40 x  40 x 256   ->    40 x  40 x 512 3.775 BF
  49 Shortcut Layer: 46
  50 conv    256  1 x 1 / 1    40 x  40 x 512   ->    40 x  40 x 256 0.419 BF
  51 conv    512  3 x 3 / 1    40 x  40 x 256   ->    40 x  40 x 512 3.775 BF
  52 Shortcut Layer: 49
  53 conv    256  1 x 1 / 1    40 x  40 x 512   ->    40 x  40 x 256 0.419 BF
  54 conv    512  3 x 3 / 1    40 x  40 x 256   ->    40 x  40 x 512 3.775 BF
  55 Shortcut Layer: 52
  56 conv    256  1 x 1 / 1    40 x  40 x 512   ->    40 x  40 x 256 0.419 BF
  57 conv    512  3 x 3 / 1    40 x  40 x 256   ->    40 x  40 x 512 3.775 BF
  58 Shortcut Layer: 55
  59 conv    256  1 x 1 / 1    40 x  40 x 512   ->    40 x  40 x 256 0.419 BF
  60 conv    512  3 x 3 / 1    40 x  40 x 256   ->    40 x  40 x 512 3.775 BF
  61 Shortcut Layer: 58
  62 conv   1024  3 x 3 / 2    40 x  40 x 512   ->    20 x  20 x1024 3.775 BF
  63 conv    512  1 x 1 / 1    20 x  20 x1024   ->    20 x  20 x 512 0.419 BF
  64 conv   1024  3 x 3 / 1    20 x  20 x 512   ->    20 x  20 x1024 3.775 BF
  65 Shortcut Layer: 62
  66 conv    512  1 x 1 / 1    20 x  20 x1024   ->    20 x  20 x 512 0.419 BF
  67 conv   1024  3 x 3 / 1    20 x  20 x 512   ->    20 x  20 x1024 3.775 BF
  68 Shortcut Layer: 65
  69 conv    512  1 x 1 / 1    20 x  20 x1024   ->    20 x  20 x 512 0.419 BF
  70 conv   1024  3 x 3 / 1    20 x  20 x 512   ->    20 x  20 x1024 3.775 BF
  71 Shortcut Layer: 68
  72 conv    512  1 x 1 / 1    20 x  20 x1024   ->    20 x  20 x 512 0.419 BF
  73 conv   1024  3 x 3 / 1    20 x  20 x 512   ->    20 x  20 x1024 3.775 BF
  74 Shortcut Layer: 71
  75 conv    512  1 x 1 / 1    20 x  20 x1024   ->    20 x  20 x 512 0.419 BF
  76 conv   1024  3 x 3 / 1    20 x  20 x 512   ->    20 x  20 x1024 3.775 BF
  77 conv    512  1 x 1 / 1    20 x  20 x1024   ->    20 x  20 x 512 0.419 BF
  78 conv   1024  3 x 3 / 1    20 x  20 x 512   ->    20 x  20 x1024 3.775 BF
  79 conv    512  1 x 1 / 1    20 x  20 x1024   ->    20 x  20 x 512 0.419 BF
  80 conv   1024  3 x 3 / 1    20 x  20 x 512   ->    20 x  20 x1024 3.775 BF
  81 conv    255  1 x 1 / 1    20 x  20 x1024   ->    20 x  20 x 255 0.209 BF
  82 yolo
  83 route  79
  84 conv    256  1 x 1 / 1    20 x  20 x 512   ->    20 x  20 x 256 0.105 BF
  85 upsample            2x    20 x  20 x 256   ->    40 x  40 x 256
  86 route  85 61
  87 conv    256  1 x 1 / 1    40 x  40 x 768   ->    40 x  40 x 256 0.629 BF
  88 conv    512  3 x 3 / 1    40 x  40 x 256   ->    40 x  40 x 512 3.775 BF
  89 conv    256  1 x 1 / 1    40 x  40 x 512   ->    40 x  40 x 256 0.419 BF
  90 conv    512  3 x 3 / 1    40 x  40 x 256   ->    40 x  40 x 512 3.775 BF
  91 conv    256  1 x 1 / 1    40 x  40 x 512   ->    40 x  40 x 256 0.419 BF
  92 conv    512  3 x 3 / 1    40 x  40 x 256   ->    40 x  40 x 512 3.775 BF
  93 conv    255  1 x 1 / 1    40 x  40 x 512   ->    40 x  40 x 255 0.418 BF
  94 yolo
  95 route  91
  96 conv    128  1 x 1 / 1    40 x  40 x 256   ->    40 x  40 x 128 0.105 BF
  97 upsample            2x    40 x  40 x 128   ->    80 x  80 x 128
  98 route  97 36
  99 conv    128  1 x 1 / 1    80 x  80 x 384   ->    80 x  80 x 128 0.629 BF
 100 conv    256  3 x 3 / 1    80 x  80 x 128   ->    80 x  80 x 256 3.775 BF
 101 conv    128  1 x 1 / 1    80 x  80 x 256   ->    80 x  80 x 128 0.419 BF
 102 conv    256  3 x 3 / 1    80 x  80 x 128   ->    80 x  80 x 256 3.775 BF
 103 conv    128  1 x 1 / 1    80 x  80 x 256   ->    80 x  80 x 128 0.419 BF
 104 conv    256  3 x 3 / 1    80 x  80 x 128   ->    80 x  80 x 256 3.775 BF
 105 conv    255  1 x 1 / 1    80 x  80 x 256   ->    80 x  80 x 255 0.836 BF
 106 yolo
Total BFLOPS 155.891 
Loading weights from darknet/models/YOLOv3-coco/network.weights...
 seen 64 
Done!
[New Thread 0x7fff2cdfe700 (LWP 1777)]
[New Thread 0x7ffef4dfd700 (LWP 1778)]
[New Thread 0x7ffee8dfd700 (LWP 1779)]
[New Thread 0x7ffee0dfd700 (LWP 1780)]
[New Thread 0x7ffed4dfd700 (LWP 1781)]
[New Thread 0x7ffec5fff700 (LWP 1782)]
[New Thread 0x7ffec57fe700 (LWP 1783)]
Resized to 1920:1088
 try to allocate workspace = 150405121 * sizeof(float),  CUDA allocate done! 
CUDA Error: invalid argument
python: ./src/cuda.c:36: check_error: Assertion `0' failed.

Thread 20 "python" received signal SIGABRT, Aborted.
Bug fixed

All 8 comments

@kossolax Hi,

  • Yes, I'm calling this function from python.
  • Resizing from 1920:1088 to 640:480 seems to work. (Same as 640:640 to 640:480.)
  • I don't have a dataset ready to train yet; detection works correctly. ( ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./models/YOLOv3-coco/network.weights ~/test/test.jpg )

Did you add LIB_API int resize_network(network *net, int w, int h); to darknet.h? There is no resize_network API function in this repo:
https://github.com/AlexeyAB/darknet/blob/920d792a0c34c70c722caf1db23792e81e51af5b/include/darknet.h

Resizing from 1920:1088 to 640:480 seems to work. (Same as 640:640 to 640:480.)

So resize from 1920x1088 to 640:640 works. But resize from 640:640 to 1920x1088 doesn't work?

I guess python is linking to this: https://github.com/AlexeyAB/darknet/blob/master/src/network.h#L140 / https://github.com/AlexeyAB/darknet/blob/master/src/network.c#L393

self.lib.resize_network.argtypes = [c_void_p, c_int, c_int]
self.lib.resize_network.restype = c_int

network_height and network_width return the correct values after resizing the network.
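For example, a quick check with the same binding (the network_width/network_height bindings are assumed here, set up as in darknet.py, and self.net is the handle returned by load_network):

self.lib.network_width.argtypes = [c_void_p]
self.lib.network_width.restype = c_int
self.lib.network_height.argtypes = [c_void_p]
self.lib.network_height.restype = c_int

self.lib.resize_network(self.net, 1920, 1088)
print(self.lib.network_width(self.net), self.lib.network_height(self.net))   # prints 1920 1088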

So resize from 1920x1088 to 640:640 works. But resize from 640:640 to 1920x1088 doesn't work?

Resize from 1920x1088 to 640:640 works, but resize from 640:640 to 1920x1088 doesn't work.

Hi @AlexeyAB ,

Thanks for your commit https://github.com/AlexeyAB/darknet/commit/6e99e852ffce7d6cf9e9ec427ff3acd003cc8d5b, but sadly it doesn't fix my issue. I've pulled your latest commit and exposed input_pinned_cpu and input_pinned_cpu_flag in the network structure so I can dump their values from Python; input_pinned_cpu_flag = 1 both before and after the resize.

I've tried with and without CUDNN_HALF; the crash still occurs on network_predict.

Hi All
The main problem is incorrect memory allocation for input_state_gpu

network.c: parse_network_cfg_custom
            net.input_state_gpu = cuda_make_array(0, size);  <-- size from cfg file

In resize_network, input_state_gpu is not reallocated, so network_predict_gpu uses it with the old (smaller) size.
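For this cfg (640x640 input) the mismatch looks roughly like this, assuming batch = 1 (illustrative arithmetic only, not code from the repo):

cfg_size     = 640  * 640  * 3    # floats allocated for input_state_gpu at parse time
resized_size = 1920 * 1088 * 3    # floats network_predict_gpu copies after the resize
print(cfg_size, resized_size)     # 1228800 vs 6266880 -> the old buffer is about 5x too small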
A quick, but not yet tested or validated, solution: add the following code to resize_network:

int resize_network(network *net, int w, int h)
{
#ifdef GPU
    cuda_set_device(net->gpu_index);
    if(gpu_index >= 0){
        cuda_free(net->workspace);
        if (net->input_gpu) {
            cuda_free(*net->input_gpu);
            *net->input_gpu = 0;
            cuda_free(*net->truth_gpu);
            *net->truth_gpu = 0;
        }
// Insert start
        if (net->input_state_gpu) cuda_free(net->input_state_gpu);
        if (net->input_pinned_cpu) {   // CPU
            if (net->input_pinned_cpu_flag) cudaFreeHost(net->input_pinned_cpu);
            else free(net->input_pinned_cpu);
        }
// Insert end
    }
#endif
...

#ifdef GPU
    if(gpu_index >= 0){
        printf(" try to allocate workspace = %zu * sizeof(float), ", workspace_size / sizeof(float) + 1);
        net->workspace = cuda_make_array(0, workspace_size/sizeof(float) + 1);

// Insert start
        int size = get_network_input_size(*net) * net->batch;
        net->input_state_gpu = cuda_make_array(0, size);
        if (cudaSuccess == cudaHostAlloc(&net->input_pinned_cpu, size * sizeof(float), cudaHostRegisterMapped)) net->input_pinned_cpu_flag = 1;
        else net->input_pinned_cpu = calloc(size, sizeof(float));
// Insert end

        printf(" CUDA allocate done! \n");
    }else {
        free(net->workspace);
        net->workspace = calloc(1, workspace_size);
    }
#else
    free(net->workspace);
    net->workspace = calloc(1, workspace_size);
#endif
    //fprintf(stderr, " Done!\n");
    return 0;
}

@Napilnic Thanks! I fixed it.

I confirm it's fixed :tada: thanks.
