Caffe: [Batch Normalization] Loss does not decrease

Created on 18 Nov 2015 · 10 comments · Source: BVLC/caffe

I'd like to use BatchNorm in our network, so I added a BatchNorm layer after our convolution layers:

layer {
  name: "conv1/7x7_s2"
  type: "Convolution"
  bottom: "data"
  top: "conv1/7x7_s2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 3
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}

"BatchNorm"

layer {
  bottom: "conv1/7x7_s2"
  top: "conv1/7x7_s2_bn"
  name: "conv1/7x7_s2_bn"
  type: "BatchNorm"
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}
layer {
  name: "conv1/relu_7x7"
  type: "ReLU"
  bottom: "conv1/7x7_s2_bn"
  top: "conv1/7x7_s2_bn"
}
layer {
  name: "pool1/3x3_s2"
  type: "Pooling"
  bottom: "conv1/7x7_s2_bn"
  top: "pool1/3x3_s2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}

However, the loss does not decrease.
How should BatchNorm be used in Caffe?


All 10 comments

In training, you have to set use_global_stats to ~~true~~ false in your batch norm layer so the mean/var will get updated. In testing, set use_global_stats to ~~false~~ true. Here is an example for your layer definition in training.

layer {
  bottom: "conv1/7x7_s2"
  top: "conv1/7x7_s2_bn"
  name: "conv1/7x7_s2_bn"
  type: "BatchNorm"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
}

Dear happyharrycn,

1. Could you explain what the "use_global_stats" parameter means?
2. Should I modify it to false in deploy.txt and keep it as true in the training state?

Thanks

I actually made a mistake in my previous reply. You should set use_global_stats = False in training, and use_global_stats = True in testing (deploy.txt).

When use_global_stats is set to False, the batch normalization layer tracks the running stats (mean/var) of its inputs. This is the desired behavior during training. When use_global_stats is set to True, the layer uses the pre-computed stats (accumulated during training) to normalize the inputs.
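For illustration, here is a minimal sketch of how both phases can live in a single train_val prototxt by using phase-specific copies of the layer (the layer and blob names below are placeholders, not taken from this thread):

layer {
  name: "conv1_bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1_bn"
  include { phase: TRAIN }                      # training copy
  batch_norm_param { use_global_stats: false }  # normalize with mini-batch stats
  param { lr_mult: 0 }  # blob 0: running mean (computed, not learned)
  param { lr_mult: 0 }  # blob 1: running variance
  param { lr_mult: 0 }  # blob 2: moving-average factor
}
layer {
  name: "conv1_bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1_bn"
  include { phase: TEST }                       # test copy
  batch_norm_param { use_global_stats: true }   # normalize with accumulated stats
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  param { lr_mult: 0 }
}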

Dear happyharrycn,

I have an error, as below:

I1123 22:00:20.378729 5626 caffe.cpp:212] Starting Optimization
I1123 22:00:20.378775 5626 solver.cpp:287] Solving DrivingNet
I1123 22:00:20.378782 5626 solver.cpp:288] Learning Rate Policy: step
I1123 22:00:20.394110 5626 solver.cpp:340] Iteration 0, Testing net (#0)
I1123 22:00:20.570456 5626 solver.cpp:408] Test net output #0: bb-loss = 1.99914 (* 10 = 19.9914 loss)
I1123 22:00:20.570492 5626 solver.cpp:408] Test net output #1: pixel-loss = 0.689463 (* 1 = 0.689463 loss)
F1123 22:00:21.310832 5626 batch_norm_layer.cu:95] Check failed: !use_global_stats_
*** Check failure stack trace: ***
@ 0x7efd1cd05ea4 (unknown)
@ 0x7efd1cd05deb (unknown)
@ 0x7efd1cd057bf (unknown)
@ 0x7efd1cd08a35 (unknown)
@ 0x7efd1d4950dd caffe::BatchNormLayer<>::Backward_gpu()
@ 0x7efd1d37c3fb caffe::Net<>::BackwardFromTo()
@ 0x7efd1d37c45f caffe::Net<>::Backward()
@ 0x7efd1d303748 caffe::Solver<>::Step()
@ 0x7efd1d3043e5 caffe::Solver<>::Solve()
@ 0x409596 train()
@ 0x40571b main
@ 0x7efd1c205a40 (unknown)
@ 0x405eb9 _start
@ (nil) (unknown)

Below is my prototxt: train_val_obn.txt

I changed use_global_stats to false in the training stage: train_val_obn.txt
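For context (this note is mine, not part of the original thread): the log points at batch_norm_layer.cu:95, which by the failure message is a check of the form

  CHECK(!use_global_stats_);  // Backward is only implemented for mini-batch stats

i.e. the backward pass refuses to run when the layer normalizes with global stats. The net that computes gradients (the TRAIN-phase net) must therefore resolve use_global_stats to false, either explicitly or via the phase default described later in this thread.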

@happyharrycn Please, I want to know what these parameters mean:

param {
  lr_mult: 0
}
param {
  lr_mult: 0
}
param {
  lr_mult: 0
}

For your questions see batch_norm_layer.hpp:

  This layer computes Batch Normalization described in [1]. For
  each channel in the data (i.e. axis 1), it subtracts the mean and divides
  by the variance, where both statistics are computed across both spatial
  dimensions and across the different examples in the batch.

  By default, during training time, the network is computing global mean/
  variance statistics via a running average, which is then used at test
  time to allow deterministic outputs for each input. You can manually
  toggle whether the network is accumulating or using the statistics via the
  use_global_stats option. IMPORTANT: for this feature to work, you MUST
  set the learning rate to zero for all three parameter blobs, i.e.,
  param {lr_mult: 0} three times in the layer definition.
This means that by default (the following line is set in batch_norm_layer.cpp), you don't have to set use_global_stats at all in the prototxt:

  use_global_stats_ = this->phase_ == TEST;
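Under that default, a single layer definition works for both the training and the test net. A minimal sketch (layer names are placeholders):

layer {
  name: "conv1_bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1_bn"
  # no batch_norm_param: use_global_stats follows the phase
  # (false in TRAIN, true in TEST)
  param { lr_mult: 0 }  # running mean
  param { lr_mult: 0 }  # running variance
  param { lr_mult: 0 }  # moving-average factor
}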

I am closing this thread, as the issue tracker is for tracking issues with Caffe itself, which this is not. Please use the caffe-users list for such questions.

Is it still the case that param {lr_mult: 0} must be set three times in the BN layer definition?

@jeremy-rutman I believe it is not necessary to set lr_mult to 0 now, given the following lines in the code.

Is "set use_global_stats = False in training, and use_global_stats = True in testing (deploy.txt)" still required?

