Darknet: Stopbackward is not freezing

Created on 16 Jul 2020 · 7 comments · Source: AlexeyAB/darknet

Hi, I need to freeze the first 19 layers of my network, so I put stopbackward=1 in layer 19.
I looked at the code, and in the backward function the loop breaks at layer 19, so that part looks correct.
But when I compare the weights between two different saved files, the weights differ even in the layers before layer 19.
Does anyone have an idea why the weights are not the same before layer 19?
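
To see why this can happen, here is a minimal, self-contained C sketch (not darknet code; the struct, the decay handling, and the numbers are simplified stand-ins) of a training step in which the backward loop honors stopbackward but the update loop does not. Even with a zero gradient, the weight-decay term alone moves the "frozen" weights:

```c
#include <stdio.h>

/* Simplified stand-in for a layer: only the fields needed for this sketch. */
typedef struct {
    int   stopbackward;   /* set from the cfg, e.g. stopbackward=1 */
    float weight;         /* stand-in for the layer's weights */
    float weight_update;  /* stand-in for the accumulated gradient */
} toy_layer;

int main(void) {
    toy_layer net[4] = {{0, 1.f, 0.f}, {1, 1.f, 0.f}, {0, 1.f, 0.f}, {0, 1.f, 0.f}};
    const int n = 4;
    const float lr = 0.001f, decay = 0.0005f;
    int i;

    /* Backward pass: runs from the last layer down and stops at the
       stopbackward layer, so layers 0..1 never accumulate a gradient. */
    for (i = n - 1; i >= 0; --i) {
        if (net[i].stopbackward) break;
        net[i].weight_update += 1.f;                      /* pretend gradient */
    }

    /* Update pass: runs over ALL layers, so even the "frozen" layers are
       touched -- the decay term shifts them although their gradient is zero. */
    for (i = 0; i < n; ++i) {
        net[i].weight_update -= decay * net[i].weight;    /* weight decay */
        net[i].weight        += lr * net[i].weight_update;/* SGD step     */
    }

    for (i = 0; i < n; ++i)
        printf("layer %d: weight = %f\n", i, net[i].weight);
    return 0;
}
```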

bug


All 7 comments

Hi there, I'm seeing exactly the same behavior. Any chance someone could shed some light on this?

It doesn't freeze the batch-norm rolling parameters, which are still updated during the forward pass while training.
But I think we should do this (freeze them as well).
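
For context, those rolling statistics are a pure forward-pass exponential moving average, so they change on every training batch regardless of whether backward or update runs for the layer. A minimal sketch of that kind of bookkeeping (generic batch-norm code, not darknet's exact implementation; the momentum value is illustrative):

```c
#include <stddef.h>

/* Generic batch-norm rolling-statistics update: an exponential moving average
   computed during the FORWARD pass on every training batch. No gradient is
   involved, so stopping backward does not stop these values from changing. */
void update_rolling_stats(float *rolling_mean, float *rolling_var,
                          const float *batch_mean, const float *batch_var,
                          size_t channels, float momentum /* e.g. 0.99 */)
{
    for (size_t c = 0; c < channels; ++c) {
        rolling_mean[c] = momentum * rolling_mean[c] + (1.f - momentum) * batch_mean[c];
        rolling_var[c]  = momentum * rolling_var[c]  + (1.f - momentum) * batch_var[c];
    }
}
```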

So I'm seeing that the layers are not really frozen. They seem frozen for the first few hundred iterations, but at some point it becomes easy to spot (I calculate the sum of absolute differences of the weights between snapshots; a sketch of such a comparison follows the config below) that even layers before the stopbackward are changing. I even tried removing all batch-norms just to check whether this still happens, and it does. I would REALLY love your help in freezing the layers and proving they are frozen. Here is my CFG:

[net]
# Testing
#batch=32
#subdivisions=8
# Training
batch=32
subdivisions=8
height=448
width=448
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 2000000
policy=steps
steps=200000,250000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
stopbackward=1

#######

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[route]
layers=-9

[convolutional]
batch_normalize=1
size=1
stride=1
pad=1
filters=64
activation=leaky

[reorg]
stride=2

[route]
layers=-1,-4


[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=35
activation=linear


[region]
anchors =  0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828
bias_match=1
classes=2
coords=4
num=5
softmax=1
jitter=.3
rescore=1

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=1

Many thanks in advance!
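
For anyone who wants to reproduce the check above, here is a small, self-contained C program that sums the absolute differences between two .weights snapshots of the same network. It simply skips a fixed-size header and compares the remaining raw floats; HEADER_BYTES = 20 assumes three 32-bit version fields plus a 64-bit iteration counter, so treat that value as an assumption and adjust it for your build:

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Assumed size of the .weights header (3 x int32 version fields + a 64-bit
   "seen" counter). Adjust if your build writes a different header. */
#define HEADER_BYTES 20

int main(int argc, char **argv) {
    if (argc != 3) {
        fprintf(stderr, "usage: %s a.weights b.weights\n", argv[0]);
        return 1;
    }
    FILE *fa = fopen(argv[1], "rb");
    FILE *fb = fopen(argv[2], "rb");
    if (!fa || !fb) { perror("fopen"); return 1; }

    /* Skip the headers: they contain the iteration counter, which always differs. */
    fseek(fa, HEADER_BYTES, SEEK_SET);
    fseek(fb, HEADER_BYTES, SEEK_SET);

    double sum_abs_diff = 0.0;
    long n = 0, n_diff = 0;
    float x, y;
    while (fread(&x, sizeof x, 1, fa) == 1 && fread(&y, sizeof y, 1, fb) == 1) {
        sum_abs_diff += fabs((double)x - (double)y);
        if (x != y) ++n_diff;
        ++n;
    }
    printf("compared %ld floats, %ld differ, sum of |diff| = %g\n",
           n, n_diff, sum_abs_diff);
    fclose(fa);
    fclose(fb);
    return 0;
}
```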

@AlexeyAB, any chance you can spot what I'm doing wrong (that is, why the freeze isn't taking effect)?

I was trying to debug this issue (having stopbackward=1 but weights still changing).

I've found that while backward() is skipped for the respective layers, update() is still called, and that is where the weights get changed.
Specifically, on the GPU this happens for conv layers in update_convolutional_layer_gpu().

@AlexeyAB Can you comment on whether this is a bug in the stopbackward implementation?
If so, would the fix be to simply skip update(), or is it better to skip the specific axpy_ongpu() calls inside update() (so that batch normalization can still be updated)?

Thanks for any input!


We can influence whether update() is executed with existing parameters like dont_update and train_only_bn (the latter also skips backward, so it's even closer to freezing the weights?).
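
To make the two options concrete, here is a hedged sketch of a generic per-layer SGD-with-decay update showing where each choice would cut in. This is not darknet code: the toy_layer fields and the frozen / train_only_bn flags are illustrative names, not the real update functions or parameters.

```c
#include <stddef.h>

/* Generic per-layer SGD-with-decay update, to show where the two proposed
   fixes would cut in. NOT darknet code: names are illustrative only. */
typedef struct {
    float *weights, *weight_updates;       /* conv weights and their gradients  */
    float *bn_scales, *bn_scale_updates;   /* batch-norm gammas and their grads */
    size_t n_weights, n_bn;
    int frozen;          /* e.g. set for layers at/below a stopbackward layer */
    int train_only_bn;   /* keep training batch-norm even when frozen         */
} toy_layer;

/* y += a * x (same role as the axpy-style calls in the real update code) */
static void axpy(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; ++i) y[i] += a * x[i];
}

void toy_update_layer(toy_layer *l, float lr, float decay) {
    /* Option (a): skip the whole update for a frozen layer. */
    if (l->frozen && !l->train_only_bn) return;

    if (!l->frozen) {
        /* Option (b): skip only the weight-related steps, so the batch-norm
           parameters below can still be trained when train_only_bn is set. */
        axpy(l->n_weights, -decay, l->weights, l->weight_updates); /* decay    */
        axpy(l->n_weights, lr, l->weight_updates, l->weights);     /* SGD step */
    }

    /* Batch-norm scale update still runs for frozen layers with train_only_bn. */
    axpy(l->n_bn, lr, l->bn_scale_updates, l->bn_scales);
}
```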

Sorry to bother you, @AlexeyAB, but I would really appreciate your answer to @Jmennius's latest comment. I would really like to get freezing working as expected (and correctly, of course).

Thanks again in advance!


@eyalf-st , @Jmennius
Hi, I'm not sure this would be a correct approach. Just in case, I'm sharing a code snippet of mine.
I've changed both update_network() (in network.c) and update_network_gpu() (in network_kernels.cu) as below.

```c
// for (i = 0; i < net.n; ++i) {     // original code
for (i = net.n - 1; i >= 0; --i) {   // new: walk from the last layer down
    layer l = net.layers[i];
    if (l.stopbackward) break;       // new: stop updating at the stopbackward layer and below
    if (l.onlyforward) continue;     // new: skip forward-only layers
    ...                              // rest of the original loop body unchanged
```
