Hello @AlexeyAB
After calling resize_network to fit the image resolution (rounded up to a multiple of 32 pixels, per the usual rule), it crashes on network_predict. The GPU has enough memory; I've tried with VGA resolution as well. The same code works on the pjreddie/darknet branch, or with GPU=0.
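For reference, this is roughly how I drive it from Python via ctypes (a sketch, not my exact code; the binding names follow the usual darknet.py conventions and the file paths are placeholders):

from ctypes import CDLL, RTLD_GLOBAL, c_char_p, c_int, c_void_p

lib = CDLL("libdarknet.so", RTLD_GLOBAL)
lib.load_network.argtypes = [c_char_p, c_char_p, c_int]
lib.load_network.restype = c_void_p
lib.resize_network.argtypes = [c_void_p, c_int, c_int]
lib.resize_network.restype = c_int

net = lib.load_network(b"network.cfg", b"network.weights", 0)  # clear = 0
# round the capture height up to the nearest multiple of 32: 1080 -> 1088
w, h = 1920, (1080 + 31) // 32 * 32
lib.resize_network(net, w, h)
# ...fill the input image and call network_predict -> the crash happens there with GPU=1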
Thanks,
Try to load cfg: darknet/models/YOLOv3-coco/network.cfg, weights: darknet/models/YOLOv3-coco/network.weights, clear = 0
layer filters size input output
[New Thread 0x7fff416c4700 (LWP 1775)]
[New Thread 0x7fff40ec3700 (LWP 1776)]
0 conv 32 3 x 3 / 1 640 x 640 x 3 -> 640 x 640 x 32 0.708 BF
1 conv 64 3 x 3 / 2 640 x 640 x 32 -> 320 x 320 x 64 3.775 BF
2 conv 32 1 x 1 / 1 320 x 320 x 64 -> 320 x 320 x 32 0.419 BF
3 conv 64 3 x 3 / 1 320 x 320 x 32 -> 320 x 320 x 64 3.775 BF
4 Shortcut Layer: 1
5 conv 128 3 x 3 / 2 320 x 320 x 64 -> 160 x 160 x 128 3.775 BF
6 conv 64 1 x 1 / 1 160 x 160 x 128 -> 160 x 160 x 64 0.419 BF
7 conv 128 3 x 3 / 1 160 x 160 x 64 -> 160 x 160 x 128 3.775 BF
8 Shortcut Layer: 5
9 conv 64 1 x 1 / 1 160 x 160 x 128 -> 160 x 160 x 64 0.419 BF
10 conv 128 3 x 3 / 1 160 x 160 x 64 -> 160 x 160 x 128 3.775 BF
11 Shortcut Layer: 8
12 conv 256 3 x 3 / 2 160 x 160 x 128 -> 80 x 80 x 256 3.775 BF
13 conv 128 1 x 1 / 1 80 x 80 x 256 -> 80 x 80 x 128 0.419 BF
14 conv 256 3 x 3 / 1 80 x 80 x 128 -> 80 x 80 x 256 3.775 BF
15 Shortcut Layer: 12
16 conv 128 1 x 1 / 1 80 x 80 x 256 -> 80 x 80 x 128 0.419 BF
17 conv 256 3 x 3 / 1 80 x 80 x 128 -> 80 x 80 x 256 3.775 BF
18 Shortcut Layer: 15
19 conv 128 1 x 1 / 1 80 x 80 x 256 -> 80 x 80 x 128 0.419 BF
20 conv 256 3 x 3 / 1 80 x 80 x 128 -> 80 x 80 x 256 3.775 BF
21 Shortcut Layer: 18
22 conv 128 1 x 1 / 1 80 x 80 x 256 -> 80 x 80 x 128 0.419 BF
23 conv 256 3 x 3 / 1 80 x 80 x 128 -> 80 x 80 x 256 3.775 BF
24 Shortcut Layer: 21
25 conv 128 1 x 1 / 1 80 x 80 x 256 -> 80 x 80 x 128 0.419 BF
26 conv 256 3 x 3 / 1 80 x 80 x 128 -> 80 x 80 x 256 3.775 BF
27 Shortcut Layer: 24
28 conv 128 1 x 1 / 1 80 x 80 x 256 -> 80 x 80 x 128 0.419 BF
29 conv 256 3 x 3 / 1 80 x 80 x 128 -> 80 x 80 x 256 3.775 BF
30 Shortcut Layer: 27
31 conv 128 1 x 1 / 1 80 x 80 x 256 -> 80 x 80 x 128 0.419 BF
32 conv 256 3 x 3 / 1 80 x 80 x 128 -> 80 x 80 x 256 3.775 BF
33 Shortcut Layer: 30
34 conv 128 1 x 1 / 1 80 x 80 x 256 -> 80 x 80 x 128 0.419 BF
35 conv 256 3 x 3 / 1 80 x 80 x 128 -> 80 x 80 x 256 3.775 BF
36 Shortcut Layer: 33
37 conv 512 3 x 3 / 2 80 x 80 x 256 -> 40 x 40 x 512 3.775 BF
38 conv 256 1 x 1 / 1 40 x 40 x 512 -> 40 x 40 x 256 0.419 BF
39 conv 512 3 x 3 / 1 40 x 40 x 256 -> 40 x 40 x 512 3.775 BF
40 Shortcut Layer: 37
41 conv 256 1 x 1 / 1 40 x 40 x 512 -> 40 x 40 x 256 0.419 BF
42 conv 512 3 x 3 / 1 40 x 40 x 256 -> 40 x 40 x 512 3.775 BF
43 Shortcut Layer: 40
44 conv 256 1 x 1 / 1 40 x 40 x 512 -> 40 x 40 x 256 0.419 BF
45 conv 512 3 x 3 / 1 40 x 40 x 256 -> 40 x 40 x 512 3.775 BF
46 Shortcut Layer: 43
47 conv 256 1 x 1 / 1 40 x 40 x 512 -> 40 x 40 x 256 0.419 BF
48 conv 512 3 x 3 / 1 40 x 40 x 256 -> 40 x 40 x 512 3.775 BF
49 Shortcut Layer: 46
50 conv 256 1 x 1 / 1 40 x 40 x 512 -> 40 x 40 x 256 0.419 BF
51 conv 512 3 x 3 / 1 40 x 40 x 256 -> 40 x 40 x 512 3.775 BF
52 Shortcut Layer: 49
53 conv 256 1 x 1 / 1 40 x 40 x 512 -> 40 x 40 x 256 0.419 BF
54 conv 512 3 x 3 / 1 40 x 40 x 256 -> 40 x 40 x 512 3.775 BF
55 Shortcut Layer: 52
56 conv 256 1 x 1 / 1 40 x 40 x 512 -> 40 x 40 x 256 0.419 BF
57 conv 512 3 x 3 / 1 40 x 40 x 256 -> 40 x 40 x 512 3.775 BF
58 Shortcut Layer: 55
59 conv 256 1 x 1 / 1 40 x 40 x 512 -> 40 x 40 x 256 0.419 BF
60 conv 512 3 x 3 / 1 40 x 40 x 256 -> 40 x 40 x 512 3.775 BF
61 Shortcut Layer: 58
62 conv 1024 3 x 3 / 2 40 x 40 x 512 -> 20 x 20 x1024 3.775 BF
63 conv 512 1 x 1 / 1 20 x 20 x1024 -> 20 x 20 x 512 0.419 BF
64 conv 1024 3 x 3 / 1 20 x 20 x 512 -> 20 x 20 x1024 3.775 BF
65 Shortcut Layer: 62
66 conv 512 1 x 1 / 1 20 x 20 x1024 -> 20 x 20 x 512 0.419 BF
67 conv 1024 3 x 3 / 1 20 x 20 x 512 -> 20 x 20 x1024 3.775 BF
68 Shortcut Layer: 65
69 conv 512 1 x 1 / 1 20 x 20 x1024 -> 20 x 20 x 512 0.419 BF
70 conv 1024 3 x 3 / 1 20 x 20 x 512 -> 20 x 20 x1024 3.775 BF
71 Shortcut Layer: 68
72 conv 512 1 x 1 / 1 20 x 20 x1024 -> 20 x 20 x 512 0.419 BF
73 conv 1024 3 x 3 / 1 20 x 20 x 512 -> 20 x 20 x1024 3.775 BF
74 Shortcut Layer: 71
75 conv 512 1 x 1 / 1 20 x 20 x1024 -> 20 x 20 x 512 0.419 BF
76 conv 1024 3 x 3 / 1 20 x 20 x 512 -> 20 x 20 x1024 3.775 BF
77 conv 512 1 x 1 / 1 20 x 20 x1024 -> 20 x 20 x 512 0.419 BF
78 conv 1024 3 x 3 / 1 20 x 20 x 512 -> 20 x 20 x1024 3.775 BF
79 conv 512 1 x 1 / 1 20 x 20 x1024 -> 20 x 20 x 512 0.419 BF
80 conv 1024 3 x 3 / 1 20 x 20 x 512 -> 20 x 20 x1024 3.775 BF
81 conv 255 1 x 1 / 1 20 x 20 x1024 -> 20 x 20 x 255 0.209 BF
82 yolo
83 route 79
84 conv 256 1 x 1 / 1 20 x 20 x 512 -> 20 x 20 x 256 0.105 BF
85 upsample 2x 20 x 20 x 256 -> 40 x 40 x 256
86 route 85 61
87 conv 256 1 x 1 / 1 40 x 40 x 768 -> 40 x 40 x 256 0.629 BF
88 conv 512 3 x 3 / 1 40 x 40 x 256 -> 40 x 40 x 512 3.775 BF
89 conv 256 1 x 1 / 1 40 x 40 x 512 -> 40 x 40 x 256 0.419 BF
90 conv 512 3 x 3 / 1 40 x 40 x 256 -> 40 x 40 x 512 3.775 BF
91 conv 256 1 x 1 / 1 40 x 40 x 512 -> 40 x 40 x 256 0.419 BF
92 conv 512 3 x 3 / 1 40 x 40 x 256 -> 40 x 40 x 512 3.775 BF
93 conv 255 1 x 1 / 1 40 x 40 x 512 -> 40 x 40 x 255 0.418 BF
94 yolo
95 route 91
96 conv 128 1 x 1 / 1 40 x 40 x 256 -> 40 x 40 x 128 0.105 BF
97 upsample 2x 40 x 40 x 128 -> 80 x 80 x 128
98 route 97 36
99 conv 128 1 x 1 / 1 80 x 80 x 384 -> 80 x 80 x 128 0.629 BF
100 conv 256 3 x 3 / 1 80 x 80 x 128 -> 80 x 80 x 256 3.775 BF
101 conv 128 1 x 1 / 1 80 x 80 x 256 -> 80 x 80 x 128 0.419 BF
102 conv 256 3 x 3 / 1 80 x 80 x 128 -> 80 x 80 x 256 3.775 BF
103 conv 128 1 x 1 / 1 80 x 80 x 256 -> 80 x 80 x 128 0.419 BF
104 conv 256 3 x 3 / 1 80 x 80 x 128 -> 80 x 80 x 256 3.775 BF
105 conv 255 1 x 1 / 1 80 x 80 x 256 -> 80 x 80 x 255 0.836 BF
106 yolo
Total BFLOPS 155.891
Loading weights from darknet/models/YOLOv3-coco/network.weights...
seen 64
Done!
[New Thread 0x7fff2cdfe700 (LWP 1777)]
[New Thread 0x7ffef4dfd700 (LWP 1778)]
[New Thread 0x7ffee8dfd700 (LWP 1779)]
[New Thread 0x7ffee0dfd700 (LWP 1780)]
[New Thread 0x7ffed4dfd700 (LWP 1781)]
[New Thread 0x7ffec5fff700 (LWP 1782)]
[New Thread 0x7ffec57fe700 (LWP 1783)]
Resized to 1920:1088
try to allocate workspace = 150405121 * sizeof(float), CUDA allocate done!
CUDA Error: invalid argument
python: ./src/cuda.c:36: check_error: Assertion `0' failed.
Thread 20 "python" received signal SIGABRT, Aborted.
@kossolax Hi,
Do you call int resize_network(network *net, int w, int h); from your Python code? https://github.com/pjreddie/darknet/blob/61c9d02ec461e30d55762ec7669d6a1d3c356fb2/include/darknet.h#L706
Does it work when resizing to a smaller resolution?
Does it work if you set width=1920 height=1088 random=1 in the cfg and just run training as usual?
./darknet detector train cfg/coco.data yolov3.cfg darknet53.conv.74
Did you add LIB_API int resize_network(network *net, int w, int h); to darknet.h? There is no resize_network API function in this repo:
https://github.com/AlexeyAB/darknet/blob/920d792a0c34c70c722caf1db23792e81e51af5b/include/darknet.h
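A quick way to check from Python whether the built libdarknet.so actually exports resize_network (a sketch; ctypes resolves symbols lazily and raises AttributeError when one is missing):

import ctypes

lib = ctypes.CDLL("libdarknet.so", ctypes.RTLD_GLOBAL)
try:
    lib.resize_network  # first attribute access triggers the symbol lookup
    print("resize_network is exported")
except AttributeError:
    print("resize_network is missing; add the LIB_API declaration and rebuild")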
Resizing from 1920x1088 to 640x480 seems to work (same as resizing from 640x640 to 640x480).
So resizing from 1920x1088 to 640x640 works, but resizing from 640x640 to 1920x1088 doesn't work?
I guess Python is linking against this: https://github.com/AlexeyAB/darknet/blob/master/src/network.h#L140 / https://github.com/AlexeyAB/darknet/blob/master/src/network.c#L393
self.lib.resize_network.argtypes = [c_void_p, c_int, c_int]
self.lib.resize_network.restype = c_int
network_width and network_height return the correct values after resizing the network.
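For completeness, the extra bindings used for that check look like this (a sketch, assuming the usual network_width / network_height exports):

self.lib.network_width.argtypes = [c_void_p]
self.lib.network_width.restype = c_int
self.lib.network_height.argtypes = [c_void_p]
self.lib.network_height.restype = c_int
assert self.lib.network_width(net) == 1920
assert self.lib.network_height(net) == 1088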
So resize from 1920x1088 to 640x640 works, but resize from 640x640 to 1920x1088 doesn't work?
Resize from 1920x1088 to 640x640 works, but resize from 640x640 to 1920x1088 doesn't work.
Hi @AlexeyAB,
Thanks for your commit https://github.com/AlexeyAB/darknet/commit/6e99e852ffce7d6cf9e9ec427ff3acd003cc8d5b, but sadly it doesn't fix my issue. I've pulled your latest commit and exposed input_pinned_cpu and input_pinned_cpu_flag from the network structure so I can dump their values in Python; input_pinned_cpu_flag = 1 both before and after the resize.
I've tried with and without CUDNN_HALF; the crash still occurs in network_predict.
Hi all,
The main problem is incorrect memory allocation for input_state_gpu.
In network.c, parse_network_cfg_custom allocates the buffer with the size taken from the cfg file:
net.input_state_gpu = cuda_make_array(0, size); // <-- size comes from the cfg resolution
resize_network never reallocates input_state_gpu, so network_predict_gpu keeps using it with the old, now too small, size.
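To make the mismatch concrete (assuming batch = 1 and a float input, matching the log above):

old_size = 640 * 640 * 3     # 1,228,800 floats allocated at parse time (cfg resolution)
new_size = 1920 * 1088 * 3   # 6,266,880 floats needed after the resize
print(new_size / old_size)   # ~5.1x, so the stale buffer is far too small -> "CUDA Error: invalid argument"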
A quick, but not tested and validated, solution: add some code to resize_network:
int resize_network(network *net, int w, int h)
{
#ifdef GPU
    cuda_set_device(net->gpu_index);
    if(gpu_index >= 0){
        cuda_free(net->workspace);
        if (net->input_gpu) {
            cuda_free(*net->input_gpu);
            *net->input_gpu = 0;
            cuda_free(*net->truth_gpu);
            *net->truth_gpu = 0;
        }
        // Insert start: free the input buffers that were sized for the old resolution
        if (net->input_state_gpu) cuda_free(net->input_state_gpu);
        if (net->input_pinned_cpu) {   // pinned (or plain) CPU input buffer
            if (net->input_pinned_cpu_flag) cudaFreeHost(net->input_pinned_cpu);
            else free(net->input_pinned_cpu);
        }
        // Insert end
    }
#endif
    ...
#ifdef GPU
    if(gpu_index >= 0){
        printf(" try to allocate workspace = %zu * sizeof(float), ", workspace_size / sizeof(float) + 1);
        net->workspace = cuda_make_array(0, workspace_size / sizeof(float) + 1);
        // Insert start: reallocate the input buffers for the new resolution
        int size = get_network_input_size(*net) * net->batch;
        net->input_state_gpu = cuda_make_array(0, size);
        if (cudaSuccess == cudaHostAlloc(&net->input_pinned_cpu, size * sizeof(float), cudaHostRegisterMapped))
            net->input_pinned_cpu_flag = 1;
        else {
            net->input_pinned_cpu = calloc(size, sizeof(float));
            net->input_pinned_cpu_flag = 0;   // plain buffer: must be freed with free(), not cudaFreeHost()
        }
        // Insert end
        printf(" CUDA allocate done! \n");
    } else {
        free(net->workspace);
        net->workspace = calloc(1, workspace_size);
    }
#else
    free(net->workspace);
    net->workspace = calloc(1, workspace_size);
#endif
    //fprintf(stderr, " Done!\n");
    return 0;
}
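Note the input_pinned_cpu_flag = 0 in the fallback branch: if cudaHostAlloc fails after a resize, the flag set by the original allocation would otherwise still read 1, and the next free would wrongly call cudaFreeHost on a buffer that came from calloc.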
@Napilnic Thanks! I fixed it.
I confirm it's fixed :tada: thanks.