Darknet: DropBlock: A regularization method for convolutional networks, +1.6 AP (and +1.6 Top-1)

Created on 11 Dec 2019 · 26 comments · Source: AlexeyAB/darknet

DropBlock: A regularization method for convolutional networks, +1.6 AP (and +1.6 Top-1):

It is implemented; use: https://github.com/AlexeyAB/darknet/commit/1df3ddc7d6a3efe9401948d3f527f432f3001476 and https://github.com/AlexeyAB/darknet/commit/642c065c0e7c681b90f10394edce9ce315aa60d8

[dropout]
dropblock=1
dropblock_size_abs=7  # block size 7x7
probability=0.1       # this is drop probability = (1 - keep_probability)

An alternative way is to use a relative block size (for use in a classifier or feature-extractor backbone):

[dropout]
dropblock=1
dropblock_size=0.6  # 60% of width and height
probability=0.1     # this is drop probability = (1 - keep_probability)
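
For intuition, here is a minimal NumPy sketch of the DropBlock idea from the paper (https://arxiv.org/abs/1810.12890); this is only an illustration, not darknet's actual implementation. block_size plays the role of dropblock_size_abs, drop_prob the role of probability, and with dropblock_size=0.6 the block size would instead be derived from the feature-map width/height; the gamma formula follows Eq. (1) of the paper.

import numpy as np

rng = np.random.default_rng(0)

def dropblock_mask(feat_h, feat_w, block_size=7, drop_prob=0.1):
    # drop_prob is the drop probability, i.e. 1 - keep_probability.
    # gamma converts it into the Bernoulli rate for block positions
    # (Eq. 1 of the DropBlock paper).
    gamma = (drop_prob / block_size ** 2) * (feat_h * feat_w) / \
            ((feat_h - block_size + 1) * (feat_w - block_size + 1))

    mask = np.ones((feat_h, feat_w), dtype=np.float32)
    # Sample top-left corners so every block fits inside the feature map,
    # then zero out a block_size x block_size square for each sampled corner.
    corners = rng.random((feat_h - block_size + 1, feat_w - block_size + 1)) < gamma
    for y, x in zip(*np.nonzero(corners)):
        mask[y:y + block_size, x:x + block_size] = 0.0

    # Rescale the kept activations so their expected sum stays the same.
    return mask * (mask.size / max(float(mask.sum()), 1.0))

# Example: a 28x28 feature map, 7x7 blocks, drop probability 0.1.
mask = dropblock_mask(28, 28, block_size=7, drop_prob=0.1)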

Label: enhancement

All 26 comments

It is implemented; use: https://github.com/AlexeyAB/darknet/commit/1df3ddc7d6a3efe9401948d3f527f432f3001476

[dropout]
dropblock=1
dropblock_size=0.6  # 60% of width and height
probability=0.1     # this is drop probability = (1 - keep_probability)

Please give a corresponding cfg file.

@WongKinYiu

I accelerated DropBlock on GPU.


Also,

  • for a small mini_batch size, batch normalization does the regularization, so DropBlock isn't required
  • for a big mini_batch size, batch normalization doesn't do the regularization, so DropBlock is required

So if Intra-Batch-Normalization (the IBN part of CBN) works well https://github.com/AlexeyAB/darknet/issues/4386#issuecomment-587981103,
then we can increase mini_batch by increasing batch= in the cfg and gain +~1-2% AP/Top1 from IBN,
and we can also use DropBlock for another +~1-2% AP/Top1 to increase accuracy further, since DropBlock is required for big mini_batch sizes (see the numerical sketch after the quote below).

https://medium.com/@ilango100/batch-normalization-speed-up-neural-network-training-245e39a62f85

Regularization by BatchNorm
In addition to speeding up the training of neural networks, BatchNorm also provides a weak form of regularization. How does it introduce regularization? Regularization may be caused by the introduction of noise into the data. Since the normalization is not performed on the whole dataset but only on the mini-batch, the mini-batch statistics act as noise.
However, BatchNorm provides only a weak regularization, so it must not be fully relied upon to avoid over-fitting. Yet, other regularization can be reduced accordingly. For example, if a dropout of 0.6 (drop rate) was to be used, with BatchNorm you can reduce the drop rate to 0.4. BatchNorm provides regularization only when the batch size is small.
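
To make the noise argument above concrete, here is a tiny NumPy sketch (purely illustrative, not darknet code): the mean and standard deviation that BN normalizes with fluctuate much more for a small mini-batch than for a large one, and that fluctuation is the implicit regularization, which fades as mini_batch grows.

import numpy as np

rng = np.random.default_rng(0)
# Pretend these are the activations of one channel over the whole dataset.
population = rng.normal(loc=0.0, scale=1.0, size=1_000_000)

for mini_batch in (4, 32, 256):
    # Draw many mini-batches and measure how much their BN statistics jitter.
    batches = rng.choice(population, size=(10_000, mini_batch))
    mean_jitter = batches.mean(axis=1).std()
    std_jitter = batches.std(axis=1).std()
    print(f"mini_batch={mini_batch:3d}  jitter of means={mean_jitter:.3f}  jitter of stds={std_jitter:.3f}")

# The jitter shrinks roughly as 1/sqrt(mini_batch): with a large mini_batch the
# BN statistics are almost deterministic, so BN adds little noise (little
# regularization), which is when an explicit regularizer like DropBlock helps.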

@AlexeyAB Thanks,

I have trained the model with DropBlock, but it does not improve the accuracy.
I followed the same strategy as what we did in EfficientNet: only add drop layers before the shortcut layers.
Could you help by providing a better cfg with DropBlock layers?
Or, after I check the performance of CBN, I can modify my previous cfg with CBN if it works well.

@WongKinYiu

I have trained the model with DropBlock, but it does not improve the accuracy.

Maybe it is because the mini_batch size was small.

DropBlock can increase accuracy only if it is used with batch-norm and a large mini_batch (i.e. with IBN / CBN).

Could you help by providing a better cfg with DropBlock layers?

Attach your cfg-file with DropBlock.

Or, after I check the performance of CBN, I can modify my previous cfg with CBN if it works well.

I think we should check DropBlock+CBN after checking CBN.

@AlexeyAB

Our building is being disinfected today; I will share the cfg after tomorrow.

@WongKinYiu
I fixed DropBlock.

Show cfg-files that you used for training

  • with DropBlock
  • with CBN

PS: What is the result of ASFF?

@AlexeyAB Hello,

For CBN, I just replaced all batch_normalize=1 with batch_normalize=2 in csresnext50-gamma.cfg.

I will share the DropBlock cfg after I finish my breakfast.

ASFF cannot converge; the loss becomes higher and higher after 100k epochs.
The same situation occurs with ASFF+RFB after 250k epochs.

@WongKinYiu

ASFF cannot converge; the loss becomes higher and higher after 100k epochs.
The same situation occurs with ASFF+RFB after 250k epochs.

What value of avg loss do you get?
Share the cfg-file.

This is strange, because @Kyuuki93 trained ASFF successfully: https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-561064425

If the number of epochs is the same as in https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-561064425, it does not have the NaN issue.

csresnext50-alpha.cfg.txt

@WongKinYiu

The same situation occurs with ASFF+RFB after 250k epochs

  • Do you mean that you got NaN at 250k iterations?
  • What mAP did you get at 100k and 200k iterations?
  • Attach cfg-file with ASFF+RFB

https://github.com/AlexeyAB/darknet/issues/4406#issuecomment-583789600

80k: 9.2
90k: 13.6
100k: 20.3
250k: NaN


Your DropBlock usage isn't the same as in the original paper: https://arxiv.org/abs/1810.12890v1

  • original paper: DropBlock is applied only in groups 3 & 4, on conv layers and on residual connections
  • your cfg-file: DropBlock is applied in 13 places before residual connections

@WongKinYiu

Could you help by providing a better cfg with DropBlock layers?

Try to run training this cfg = CBN + DropBlock: csresnext50-gamma_dropblock_cbn.cfg.txt

Start training it now; do not wait for the completion of the CBN model's training.


The following settings are used:

[net]
batch=512
subdivisions=16
max_batches=300000

....
[convolutional]
batch_normalize=2   # enables CBN
....
# for Group-3
[dropout]
dropblock=1
dropblock_size_abs=7
probability=0.025
....
# for Group-4
[dropout]
dropblock=1
dropblock_size_abs=7
probability=0.1

OK

@WongKinYiu
I fixed the gradient calculation for ASFF, so you can train with activation=normalize_channels_softmax_maxval using the new code.

@AlexeyAB

Thanks, my previously modified version has been training for about 110k epochs; if it still gets NaN, I'll use the new code to retrain.

@WongKinYiu Hi,

Have you restarted training the following models after this commit https://github.com/AlexeyAB/darknet/commit/f6baa62c9b6151b9f615a1e56434d237553fd4af from Feb 24, 2020: ASFF; BiFPN (csdarknet53-bifpn-optimal.cfg.txt and csresnext50-bifpn-optimal.cfg.txt); weighted-shortcut (csresnext50-ws-mi2.cfg.txt and csresnext50-ws.cfg.txt)?


Also it seems that iou_thresh=0.213 degrades accuracy: https://github.com/WongKinYiu/CrossStagePartialNetworks/blob/master/coco/results.md#mscoco

While scale_x_y=1.05/1.20 decreases AP50 & AP75, it keeps the same overall AP, and it seems to increase AP95. Can you check AP95 for the baseline model and the scale_x_y model?
Or better, show the whole accuracy output from the evaluation server for both models.

@AlexeyAB Hello,

Yes, I restarted training the BiFPN models, but I use leaky instead of the linear activation function.
As I remember, csdarknet53-bifpn-optimal.cfg.txt and csresnext50-bifpn-optimal.cfg.txt use the Feb 21, 2020 repo, and csresnext50-ws-mi2.cfg.txt and csresnext50-ws.cfg.txt use the Feb 18, 2020 repo.

Currently, I can only confirm that genetic, mosaic, and CIoU loss benefit AP on COCO.
I'll train a model with genetic, mosaic, and CIoU loss after I get a free GPU.

The results are obtained on the test-dev set on CodaLab, so I cannot get AP95.
I can check AP95 on the val set next Monday.

@WongKinYiu

As I remember, I use the Feb 21, 2020 repo.

If you use Feb 21, please update the code to Feb 24 and restart training; ASFF, BiFPN, and DropBlock were fixed there: https://github.com/AlexeyAB/darknet/issues/4662#issuecomment-590438709

Yes, I restarted training the BiFPN models,

And also the weighted-shortcut models.

I can check AP95 on the val set next Monday.

OK, please check AP50, AP75, AP95, and AP50...95 for both models.

@AlexeyAB

Do I need to restart from the first epoch, or can I continue training from the current epoch?

@WongKinYiu
You need to restart from the first epoch:

  1. ASFF
  2. BiFPN: csdarknet53-bifpn-optimal.cfg.txt and csresnext50-bifpn-optimal.cfg.txt
  3. weighted-shortcut: csresnext50-ws-mi2.cfg.txt and csresnext50-ws.cfg.txt

@AlexeyAB Thanks.

@AlexeyAB

  1. BiFPN: restart csdarknet53-bifpn-optimal.cfg.txt and csresnext50-bifpn-optimal.cfg.txt
  2. weighted-shortcut: restart csresnext50-ws-mi2.cfg.txt, csresnext50-ws.cfg.txt, and csdarknet53-ws.cfg.txt
  3. DropBlock: restart csresnext50-gamma_dropblock_cbn.cfg.txt
  4. ASFF: stop training csresnext50-asff-rfbn.cfg and wait for a free GPU.

@WongKinYiu Ok!

csresnext50-gamma_dropblock_cbn.cfg.txt: 47.3% top-1, 72.4% top-5.

@WongKinYiu CBN or both DropBlock and CBN seem to be working poorly.

@WongKinYiu @AlexeyAB
I'm curious whether you managed to properly check the performance of CBN or CBN + DropBlock.
Not having to worry about mini-batch size would be great!
