Darknet: DropBlock: A regularization method for convolutional networks, +1.6 AP (and +1.6 Top-1)

Created on 11 Dec 2019 · 26 comments · Source: AlexeyAB/darknet

DropBlock: A regularization method for convolutional networks, +1.6 AP (and +1.6 Top-1):

It is implemented; use: https://github.com/AlexeyAB/darknet/commit/1df3ddc7d6a3efe9401948d3f527f432f3001476 and https://github.com/AlexeyAB/darknet/commit/642c065c0e7c681b90f10394edce9ce315aa60d8

[dropout]
dropblock=1
dropblock_size_abs=7  # block size 7x7
probability=0.1       # this is drop probability = (1 - keep_probability)

An alternative way is to use a relative block size (for use in a classifier or feature-extractor backbone):

[dropout]
dropblock=1
dropblock_size=0.6  # 60% of width and height
probability=0.1     # this is drop probability = (1 - keep_probability)
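
For intuition, here is a minimal NumPy sketch of the DropBlock idea from the paper (https://arxiv.org/abs/1810.12890); this is only an illustration, not darknet's actual implementation. block_size plays the role of dropblock_size_abs, drop_prob the role of probability, and with dropblock_size=0.6 the block size would instead be derived from the feature-map width/height; the gamma formula follows Eq. (1) of the paper.

import numpy as np

rng = np.random.default_rng(0)

def dropblock_mask(feat_h, feat_w, block_size=7, drop_prob=0.1):
    # drop_prob is the drop probability, i.e. 1 - keep_probability.
    # gamma converts it into the Bernoulli rate for block positions
    # (Eq. 1 of the DropBlock paper).
    gamma = (drop_prob / block_size ** 2) * (feat_h * feat_w) / \
            ((feat_h - block_size + 1) * (feat_w - block_size + 1))

    mask = np.ones((feat_h, feat_w), dtype=np.float32)
    # Sample top-left corners so every block fits inside the feature map,
    # then zero out a block_size x block_size square for each sampled corner.
    corners = rng.random((feat_h - block_size + 1, feat_w - block_size + 1)) < gamma
    for y, x in zip(*np.nonzero(corners)):
        mask[y:y + block_size, x:x + block_size] = 0.0

    # Rescale the kept activations so their expected sum stays the same.
    return mask * (mask.size / max(float(mask.sum()), 1.0))

# Example: a 28x28 feature map, 7x7 blocks, drop probability 0.1.
mask = dropblock_mask(28, 28, block_size=7, drop_prob=0.1)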

Label: enhancement

All 26 comments

It is implemented; use: https://github.com/AlexeyAB/darknet/commit/1df3ddc7d6a3efe9401948d3f527f432f3001476

[dropout]
dropblock=1
dropblock_size=0.6  # 60% of width and height
probability=0.1     # this is drop probability = (1 - keep_probability)

Please give a corresponding cfg file.

@WongKinYiu

I accelerated DropBlock on GPU.


Also,

  • for a small mini_batch size, batch normalization does the regularization, so DropBlock isn't required
  • for a big mini_batch size, batch normalization doesn't do the regularization, so DropBlock is required

So if Intra-Batch-Normalization (the IBN part of CBN) works well https://github.com/AlexeyAB/darknet/issues/4386#issuecomment-587981103,
then we can increase mini_batch by increasing batch= in the cfg and gain +~1-2% AP/Top1 from IBN,
and we can also use DropBlock for another +~1-2% AP/Top1 to increase accuracy further, since DropBlock is required for big mini_batch sizes (see the numerical sketch after the quote below).

https://medium.com/@ilango100/batch-normalization-speed-up-neural-network-training-245e39a62f85

Regularization by BatchNorm
In addition to speeding up the training of neural networks, BatchNorm also provides a weak form of regularization. How does it introduce regularization? Regularization may be caused by the introduction of noise into the data. Since the normalization is not performed on the whole dataset but only on the mini-batch, the mini-batch statistics act as noise.
However, BatchNorm provides only a weak regularization, so it must not be fully relied upon to avoid over-fitting. Yet, other regularization can be reduced accordingly. For example, if a dropout of 0.6 (drop rate) was to be used, with BatchNorm you can reduce the drop rate to 0.4. BatchNorm provides regularization only when the batch size is small.
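
To make the noise argument above concrete, here is a tiny NumPy sketch (purely illustrative, not darknet code): the mean and standard deviation that BN normalizes with fluctuate much more for a small mini-batch than for a large one, and that fluctuation is the implicit regularization, which fades as mini_batch grows.

import numpy as np

rng = np.random.default_rng(0)
# Pretend these are the activations of one channel over the whole dataset.
population = rng.normal(loc=0.0, scale=1.0, size=1_000_000)

for mini_batch in (4, 32, 256):
    # Draw many mini-batches and measure how much their BN statistics jitter.
    batches = rng.choice(population, size=(10_000, mini_batch))
    mean_jitter = batches.mean(axis=1).std()
    std_jitter = batches.std(axis=1).std()
    print(f"mini_batch={mini_batch:3d}  jitter of means={mean_jitter:.3f}  jitter of stds={std_jitter:.3f}")

# The jitter shrinks roughly as 1/sqrt(mini_batch): with a large mini_batch the
# BN statistics are almost deterministic, so BN adds little noise (little
# regularization), which is when an explicit regularizer like DropBlock helps.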

@AlexeyAB Thanks,

I have trained the model with DropBlock, but it does not improve the accuracy.
I followed the same strategy as what we did in EfficientNet: only add drop layers before the shortcut layers.
Could you help by providing a better cfg with DropBlock layers?
Or, after I check the performance of CBN, I can modify my previous cfg with CBN if it works well.

@WongKinYiu

I have trained the model with DropBlock, but it does not improve the accuracy.

Maybe it is because the mini_batch size was small.

DropBlock can increase accuracy only if it is used with batch-norm and a large mini_batch (i.e. with IBN / CBN).

Could you help by providing a better cfg with DropBlock layers?

Attach your cfg-file with DropBlock.

Or, after I check the performance of CBN, I can modify my previous cfg with CBN if it works well.

I think we should check DropBlock+CBN after checking CBN.

@AlexeyAB

Our building is being disinfected today; I will share the cfg after tomorrow.

@WongKinYiu
I fixed DropBlock.

Show cfg-files that you used for training

  • with DropBlock
  • with CBN

PS: What is the result of ASFF?

@AlexeyAB Hello,

For CBN, I just replaced all batch_normalize=1 with batch_normalize=2 in csresnext50-gamma.cfg.

I will share the DropBlock cfg after I finish my breakfast.

ASFF cannot converge; the loss becomes higher and higher after 100k epochs.
The same situation occurs with ASFF+RFB after 250k epochs.

@WongKinYiu

ASFF cannot converge; the loss becomes higher and higher after 100k epochs.
The same situation occurs with ASFF+RFB after 250k epochs.

What value of avg loss do you get?
Share the cfg-file.

This is strange, because @Kyuuki93 trained ASFF successfully: https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-561064425

If the number of epochs is the same as in https://github.com/AlexeyAB/darknet/issues/3874#issuecomment-561064425, it does not have the NaN issue.

csresnext50-alpha.cfg.txt

@WongKinYiu

The same situation occurs with ASFF+RFB after 250k epochs

  • Do you mean that you got NaN at 250k iterations?
  • What mAP did you get at 100k and 200k iterations?
  • Attach cfg-file with ASFF+RFB

https://github.com/AlexeyAB/darknet/issues/4406#issuecomment-583789600

80k: 9.2
90k: 13.6
100k: 20.3
250k: NaN


Your DropBlock usage isn't the same as in the original paper: https://arxiv.org/abs/1810.12890v1

  • original paper: DropBlock is applied only in groups 3 & 4, on conv layers and on residual connections
  • your cfg-file: DropBlock is applied in 13 places before residual connections

@WongKinYiu

Could you help by providing a better cfg with DropBlock layers?

Try to run training this cfg = CBN + DropBlock: csresnext50-gamma_dropblock_cbn.cfg.txt

Start training it now; do not wait for the completion of the CBN model's training.


The following settings are used:

[net]
batch=512
subdivisions=16
max_batches=300000

....
[convolutional]
batch_normalize=2   # enables CBN
....
# for Group-3
[dropout]
dropblock=1
dropblock_size_abs=7
probability=0.025
....
# for Group-4
[dropout]
dropblock=1
dropblock_size_abs=7
probability=0.1

OK

@WongKinYiu
I fixed the gradient calculation for ASFF, so you can train with activation=normalize_channels_softmax_maxval using the new code.

@AlexeyAB

Thanks, my previously modified version has been training for about 110k epochs; if it still gets NaN, I'll use the new code to retrain.

@WongKinYiu Hi,

Have you restarted training the following models after this commit https://github.com/AlexeyAB/darknet/commit/f6baa62c9b6151b9f615a1e56434d237553fd4af from Feb 24, 2020: ASFF; BiFPN (csdarknet53-bifpn-optimal.cfg.txt and csresnext50-bifpn-optimal.cfg.txt); weighted-shortcut (csresnext50-ws-mi2.cfg.txt and csresnext50-ws.cfg.txt)?


Also it seems that iou_thresh=0.213 degrades accuracy: https://github.com/WongKinYiu/CrossStagePartialNetworks/blob/master/coco/results.md#mscoco

While scale_x_y=1.05/1.20 decreases AP50 & AP75, it keeps the same overall AP, and it seems to increase AP95. Can you check AP95 for the baseline model and the scale_x_y model?
Or better, show the whole accuracy output from the evaluation server for both models.

@AlexeyAB Hello,

Yes, I restarted training the BiFPN models, but I use leaky instead of the linear activation function.
As I remember, csdarknet53-bifpn-optimal.cfg.txt and csresnext50-bifpn-optimal.cfg.txt use the Feb 21, 2020 repo, and csresnext50-ws-mi2.cfg.txt and csresnext50-ws.cfg.txt use the Feb 18, 2020 repo.

Currently, I can only confirm that genetic, mosaic, and CIoU loss benefit AP on COCO.
I'll train a model with genetic, mosaic, and CIoU loss after I get a free GPU.

The results are obtained on the test-dev set on CodaLab, so I cannot get AP95.
I can check AP95 on the val set next Monday.

@WongKinYiu

As I remember, I use the Feb 21, 2020 repo.

If you use Feb 21, please update the code to Feb 24 and restart training; ASFF, BiFPN, and DropBlock were fixed there: https://github.com/AlexeyAB/darknet/issues/4662#issuecomment-590438709

Yes, I restarted training the BiFPN models,

And also the weighted-shortcut models.

I can check AP95 on the val set next Monday.

OK, please check AP50, AP75, AP95, and AP50...95 for both models.

@AlexeyAB

Do I need to restart from the first epoch, or can I continue training from the current epoch?

@WongKinYiu
You need to restart from the first epoch:

  1. ASFF
  2. BiFPN: csdarknet53-bifpn-optimal.cfg.txt and csresnext50-bifpn-optimal.cfg.txt
  3. weighted-shortcut: csresnext50-ws-mi2.cfg.txt and csresnext50-ws.cfg.txt

@AlexeyAB Thanks.

@AlexeyAB

  1. BiFPN: restart csdarknet53-bifpn-optimal.cfg.txt and csresnext50-bifpn-optimal.cfg.txt
  2. weighted-shortcut: restart csresnext50-ws-mi2.cfg.txt, csresnext50-ws.cfg.txt, and csdarknet53-ws.cfg.txt
  3. DropBlock: restart csresnext50-gamma_dropblock_cbn.cfg.txt
  4. ASFF: stop training csresnext50-asff-rfbn.cfg and wait for a free GPU.

@WongKinYiu Ok!

csresnext50-gamma_dropblock_cbn.cfg.txt: 47.3% top-1, 72.4% top-5.

@WongKinYiu CBN or both DropBlock and CBN seem to be working poorly.

@WongKinYiu @AlexeyAB
I'm curious whether you managed to properly check the performance of CBN or CBN + DropBlock.
Not having to worry about mini-batch size would be great!
