Caffe: [Batch Normalization] Loss does not decrease

Created on 18 Nov 2015 · 10 comments · Source: BVLC/caffe

I'd like to use BatchNorm in our network, so I added a BatchNorm layer after our convolution layers:

layer {
  name: "conv1/7x7_s2"
  type: "Convolution"
  bottom: "data"
  top: "conv1/7x7_s2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 3
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}

"BatchNorm"

layer {
  bottom: "conv1/7x7_s2"
  top: "conv1/7x7_s2_bn"
  name: "conv1/7x7_s2_bn"
  type: "BatchNorm"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv1/relu_7x7"
  type: "ReLU"
  bottom: "conv1/7x7_s2_bn"
  top: "conv1/7x7_s2_bn"
}
layer {
  name: "pool1/3x3_s2"
  type: "Pooling"
  bottom: "conv1/7x7_s2_bn"
  top: "pool1/3x3_s2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}

However, the loss does not decrease.
How should BatchNorm be used in Caffe?


All 10 comments

In training, you have to set use_global_stats to ~~true~~ false in your batch norm layer so the mean/var will get updated. In testing, set use_global_stats to ~~false~~ true. Here is an example for your layer definition in training.

layer {
  bottom: "conv1/7x7_s2"
  top: "conv1/7x7_s2_bn"
  name: "conv1/7x7_s2_bn"
  type: "BatchNorm"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}

Dear happyharrycn,

1. Could you explain what the "use_global_stats" parameter means?
2. Should I modify it to false in deploy.txt and keep it as true in the training state?

Thanks

I actually made a mistake in my previous reply. You should set use_global_stats = False in training, and use_global_stats = True in testing (deploy.txt).

When use_global_stats is set to False, the batch normalization layer tracks the running stats (mean/var) of its inputs. This is the desired behavior during training. When use_global_stats is set to True, the layer uses the pre-computed stats (accumulated during training) to normalize the inputs.
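For illustration, here is a minimal sketch of how both phases can live in a single train_val prototxt by using phase-specific copies of the layer (the layer and blob names below are placeholders, not taken from this thread):

layer {
  name: "conv1_bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1_bn"
  include { phase: TRAIN }                      # training copy
  batch_norm_param { use_global_stats: false }  # normalize with mini-batch stats
  param { lr_mult: 0 }  # blob 0: running mean (computed, not learned)
  param { lr_mult: 0 }  # blob 1: running variance
  param { lr_mult: 0 }  # blob 2: moving-average factor
}
layer {
  name: "conv1_bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1_bn"
  include { phase: TEST }                       # test copy
  batch_norm_param { use_global_stats: true }   # normalize with accumulated stats
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  param { lr_mult: 0 }
}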

Dear happyharrycn,

I have an error, as below:

I1123 22:00:20.378729 5626 caffe.cpp:212] Starting Optimization
I1123 22:00:20.378775 5626 solver.cpp:287] Solving DrivingNet
I1123 22:00:20.378782 5626 solver.cpp:288] Learning Rate Policy: step
I1123 22:00:20.394110 5626 solver.cpp:340] Iteration 0, Testing net (#0)
I1123 22:00:20.570456 5626 solver.cpp:408] Test net output #0: bb-loss = 1.99914 (* 10 = 19.9914 loss)
I1123 22:00:20.570492 5626 solver.cpp:408] Test net output #1: pixel-loss = 0.689463 (* 1 = 0.689463 loss)
F1123 22:00:21.310832 5626 batch_norm_layer.cu:95] Check failed: !use_global_stats_
*** Check failure stack trace: ***
@ 0x7efd1cd05ea4 (unknown)
@ 0x7efd1cd05deb (unknown)
@ 0x7efd1cd057bf (unknown)
@ 0x7efd1cd08a35 (unknown)
@ 0x7efd1d4950dd caffe::BatchNormLayer<>::Backward_gpu()
@ 0x7efd1d37c3fb caffe::Net<>::BackwardFromTo()
@ 0x7efd1d37c45f caffe::Net<>::Backward()
@ 0x7efd1d303748 caffe::Solver<>::Step()
@ 0x7efd1d3043e5 caffe::Solver<>::Solve()
@ 0x409596 train()
@ 0x40571b main
@ 0x7efd1c205a40 (unknown)
@ 0x405eb9 _start
@ (nil) (unknown)

Below is my prototxt: train_val_obn.txt

I changed use_global_stats to false in the training stage: train_val_obn.txt
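For context (this note is mine, not part of the original thread): the log points at batch_norm_layer.cu:95, which by the failure message is a check of the form

  CHECK(!use_global_stats_);  // Backward is only implemented for mini-batch stats

i.e. the backward pass refuses to run when the layer normalizes with global stats. The net that computes gradients (the TRAIN-phase net) must therefore resolve use_global_stats to false, either explicitly or via the phase default described later in this thread.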

@happyharrycn Please, I want to know what these parameters mean:

param {
  lr_mult: 0
}
param {
  lr_mult: 0
}
param {
  lr_mult: 0
}

For your questions see batch_norm_layer.hpp:

  This layer computes Batch Normalization described in [1]. For
  each channel in the data (i.e. axis 1), it subtracts the mean and divides
  by the variance, where both statistics are computed across both spatial
  dimensions and across the different examples in the batch.

  By default, during training time, the network is computing global mean/
  variance statistics via a running average, which is then used at test
  time to allow deterministic outputs for each input. You can manually
  toggle whether the network is accumulating or using the statistics via the
  use_global_stats option. IMPORTANT: for this feature to work, you MUST
  set the learning rate to zero for all three parameter blobs, i.e.,
  param {lr_mult: 0} three times in the layer definition.
This means that by default (the following line is set in batch_norm_layer.cpp), you don't have to set use_global_stats at all in the prototxt:

  use_global_stats_ = this->phase_ == TEST;
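Under that default, a single layer definition works for both the training and the test net. A minimal sketch (layer names are placeholders):

layer {
  name: "conv1_bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1_bn"
  # no batch_norm_param: use_global_stats follows the phase
  # (false in TRAIN, true in TEST)
  param { lr_mult: 0 }  # running mean
  param { lr_mult: 0 }  # running variance
  param { lr_mult: 0 }  # moving-average factor
}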

I am closing this thread, as the issue tracker is for tracking issues with Caffe itself, which this is not. Please use the caffe-users list for such questions.

Is it still the case that param {lr_mult: 0} must be set three times in the BN layer definition?

@jeremy-rutman I believe it is not necessary to set lr_mult to 0 now, given the following lines in the code.

Is "set use_global_stats = False in training, and use_global_stats = True in testing (deploy.txt)" still required?

