I have tried several different network structures on my dataset, but the accuracy on the evaluation data stays at 0.247 even after 208,000 iterations, and the loss doesn't decrease. Here is the current output:
I0710 09:28:36.474419 11458 solver.cpp:445] Iteration 208700, lr = 0.0001
I0710 09:28:38.749763 11458 solver.cpp:209] Iteration 208720, loss = 3.27594
I0710 09:28:38.749812 11458 solver.cpp:224] Train net output #0: loss = 3.27594 (* 1 = 3.27594 loss)
I0710 09:28:38.749825 11458 solver.cpp:445] Iteration 208720, lr = 0.0001
I0710 09:28:41.018671 11458 solver.cpp:209] Iteration 208740, loss = 3.25079
I0710 09:28:41.018721 11458 solver.cpp:224] Train net output #0: loss = 3.25079 (* 1 = 3.25079 loss)
I0710 09:28:41.018733 11458 solver.cpp:445] Iteration 208740, lr = 0.0001
I0710 09:28:43.289968 11458 solver.cpp:209] Iteration 208760, loss = 3.27549
I0710 09:28:43.290014 11458 solver.cpp:224] Train net output #0: loss = 3.27549 (* 1 = 3.27549 loss)
I0710 09:28:43.290026 11458 solver.cpp:445] Iteration 208760, lr = 0.0001
I0710 09:28:45.559432 11458 solver.cpp:209] Iteration 208780, loss = 3.26328
I0710 09:28:45.559484 11458 solver.cpp:224] Train net output #0: loss = 3.26328 (* 1 = 3.26328 loss)
I0710 09:28:45.559496 11458 solver.cpp:445] Iteration 208780, lr = 0.0001
I0710 09:28:47.833880 11458 solver.cpp:209] Iteration 208800, loss = 3.51561
I0710 09:28:47.833930 11458 solver.cpp:224] Train net output #0: loss = 3.51561 (* 1 = 3.51561 loss)
I0710 09:28:47.833942 11458 solver.cpp:445] Iteration 208800, lr = 0.0001
I0710 09:28:50.103999 11458 solver.cpp:209] Iteration 208820, loss = 3.45217
I0710 09:28:50.104049 11458 solver.cpp:224] Train net output #0: loss = 3.45217 (* 1 = 3.45217 loss)
I0710 09:28:50.104061 11458 solver.cpp:445] Iteration 208820, lr = 0.0001
I0710 09:28:52.368252 11458 solver.cpp:209] Iteration 208840, loss = 3.32748
I0710 09:28:52.368422 11458 solver.cpp:224] Train net output #0: loss = 3.32748 (* 1 = 3.32748 loss)
I0710 09:28:52.368439 11458 solver.cpp:445] Iteration 208840, lr = 0.0001
I0710 09:28:54.639966 11458 solver.cpp:209] Iteration 208860, loss = 3.30762
I0710 09:28:54.640017 11458 solver.cpp:224] Train net output #0: loss = 3.30762 (* 1 = 3.30762 loss)
I0710 09:28:54.640028 11458 solver.cpp:445] Iteration 208860, lr = 0.0001
I0710 09:28:56.905133 11458 solver.cpp:209] Iteration 208880, loss = 3.35246
I0710 09:28:56.905184 11458 solver.cpp:224] Train net output #0: loss = 3.35246 (* 1 = 3.35246 loss)
I0710 09:28:56.905197 11458 solver.cpp:445] Iteration 208880, lr = 0.0001
I0710 09:28:59.176069 11458 solver.cpp:209] Iteration 208900, loss = 3.21703
I0710 09:28:59.176117 11458 solver.cpp:224] Train net output #0: loss = 3.21703 (* 1 = 3.21703 loss)
I0710 09:28:59.176129 11458 solver.cpp:445] Iteration 208900, lr = 0.0001
I0710 09:29:01.450042 11458 solver.cpp:209] Iteration 208920, loss = 3.33727
I0710 09:29:01.450090 11458 solver.cpp:224] Train net output #0: loss = 3.33727 (* 1 = 3.33727 loss)
I0710 09:29:01.450103 11458 solver.cpp:445] Iteration 208920, lr = 0.0001
I0710 09:29:03.714251 11458 solver.cpp:209] Iteration 208940, loss = 3.41124
I0710 09:29:03.714299 11458 solver.cpp:224] Train net output #0: loss = 3.41124 (* 1 = 3.41124 loss)
I0710 09:29:03.714311 11458 solver.cpp:445] Iteration 208940, lr = 0.0001
I0710 09:29:05.989387 11458 solver.cpp:209] Iteration 208960, loss = 3.22074
I0710 09:29:05.989439 11458 solver.cpp:224] Train net output #0: loss = 3.22074 (* 1 = 3.22074 loss)
I0710 09:29:05.989450 11458 solver.cpp:445] Iteration 208960, lr = 0.0001
I0710 09:29:08.256598 11458 solver.cpp:209] Iteration 208980, loss = 3.33738
I0710 09:29:08.256645 11458 solver.cpp:224] Train net output #0: loss = 3.33738 (* 1 = 3.33738 loss)
I0710 09:29:08.256657 11458 solver.cpp:445] Iteration 208980, lr = 0.0001
I0710 09:29:10.412940 11458 solver.cpp:264] Iteration 209000, Testing net (#0)
I0710 09:30:28.869468 11458 solver.cpp:315] Test net output #0: accuracy = 0.24714
I0710 09:30:28.869618 11458 solver.cpp:315] Test net output #1: loss = 3.31024 (* 1 = 3.31024 loss)
I0710 09:30:28.911180 11458 solver.cpp:209] Iteration 209000, loss = 3.44267
I0710 09:30:28.911221 11458 solver.cpp:224] Train net output #0: loss = 3.44267 (* 1 = 3.44267 loss)
I0710 09:30:28.911237 11458 solver.cpp:445] Iteration 209000, lr = 0.0001
I0710 09:30:31.177317 11458 solver.cpp:209] Iteration 209020, loss = 3.33829
I0710 09:30:31.177368 11458 solver.cpp:224] Train net output #0: loss = 3.33829 (* 1 = 3.33829 loss)
I0710 09:30:31.177381 11458 solver.cpp:445] Iteration 209020, lr = 0.0001
I0710 09:30:33.455963 11458 solver.cpp:209] Iteration 209040, loss = 3.29632
I0710 09:30:33.456009 11458 solver.cpp:224] Train net output #0: loss = 3.29632 (* 1 = 3.29632 loss)
I do not know what I should adjust next. Can anyone give me some help on how to solve this problem?
My environment is Ubuntu 14.04 with a GTX 760 and CUDA 6.5.
This is my train_val.prototxt.
name: "FaceNet"
layers {
name: "data"
type: DATA
top: "data"
top: "label"
data_param {
source: "examples/imagenet/face_train_lmdb"
backend: LMDB
batch_size: 16
}
transform_param {
crop_size: 20
mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
mirror: true
}
include: { phase: TRAIN }
}
layers {
name: "data"
type: DATA
top: "data"
top: "label"
data_param {
source: "examples/imagenet/face_val_lmdb"
backend: LMDB
batch_size: 50
}
transform_param {
crop_size: 20
mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
mirror: false
}
include: { phase: TEST }
}
layers {
name: "conv1"
type: CONVOLUTION
bottom: "data"
top: "conv1"
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
name: "relu1"
type: RELU
bottom: "conv1"
top: "conv1"
}
layers {
name: "norm1"
type: LRN
bottom: "conv1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layers {
name: "pool1"
type: POOLING
bottom: "norm1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 1
}
}
layers {
name: "conv2"
type: CONVOLUTION
bottom: "pool1"
top: "conv2"
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 64
pad: 0
kernel_size: 1
group: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layers {
name: "relu2"
type: RELU
bottom: "conv2"
top: "conv2"
}
layers {
name: "norm2"
type: LRN
bottom: "conv2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layers {
name: "pool2"
type: POOLING
bottom: "norm2"
top: "pool2"
pooling_param {
pool: AVE
kernel_size: 2
stride: 1
}
}
layers {
name: "conv3"
type: CONVOLUTION
bottom: "pool2"
top: "conv3"
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
convolution_param {
num_output: 128
pad: 0
kernel_size: 1
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
name: "relu3"
type: RELU
bottom: "conv3"
top: "conv3"
}
layers {
name: "pool3"
type: POOLING
bottom: "conv3"
top: "pool3"
pooling_param {
pool: AVE
kernel_size: 2
stride: 1
}
}
layers {
name: "fc4"
type: INNER_PRODUCT
bottom: "pool3"
top: "fc4"
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
inner_product_param {
num_output: 3200
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layers {
name: "relu4"
type: RELU
bottom: "fc4"
top: "fc4"
}
layers {
name: "fc5"
type: INNER_PRODUCT
bottom: "fc4"
top: "fc5"
blobs_lr: 1
blobs_lr: 2
weight_decay: 1
weight_decay: 0
inner_product_param {
num_output: 7
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layers {
name: "accuracy"
type: ACCURACY
bottom: "fc5"
bottom: "label"
top: "accuracy"
include: { phase: TEST }
}
layers {
name: "loss"
type: HINGE_LOSS
bottom: "fc5"
bottom: "label"
top: "loss"
hinge_loss_param{
norm:L2
}
}
Is the learning rate too small?
If the loss doesn't decrease, assuming that it was decreasing at some point earlier, that usually means that the learning rate is too large and needs to be decreased. A common strategy is to decrease it by a factor of 10 every time the loss stagnates.
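For reference, this kind of schedule can be expressed in solver.prototxt with the "step" policy. The sketch below is only an illustration; the net path, base_lr, stepsize, and other values are placeholders, not settings tuned for this dataset:
# solver.prototxt sketch -- all values are placeholders, not a recommendation
net: "examples/imagenet/train_val.prototxt"
test_iter: 100
test_interval: 1000
base_lr: 0.001            # starting learning rate
lr_policy: "step"         # drop the rate on a fixed schedule
gamma: 0.1                # multiply the rate by 0.1 at each step
stepsize: 20000           # ...every 20,000 iterations
momentum: 0.9
weight_decay: 0.0005      # the global weight decay lives here (see the comments below)
max_iter: 100000
snapshot: 10000
snapshot_prefix: "examples/imagenet/facenet_train"
solver_mode: GPU
Alternatively, lr_policy: "multistep" with explicit stepvalue entries lets you place the drops wherever the loss plateaus.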
Please ask questions like this on the caffe-users group (https://groups.google.com/forum/#!forum/caffe-users). Github issues are for code issues, not usage and modeling.
@wqysq
weight_decay: 1
I don't know for sure, but on the face of it this could mean the weights completely decay to zero after each iteration. Try deleting this or making it a small number.
@jyegerlehner, @wqysq this is the old prototxt format. Here, weight_decay: 1 inside a layer simply means that the local multiplier is 1; the global weight decay is set in solver.prototxt.
In the newer format the name is a little better (decay_mult, https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto#L266), and it is more obvious that it is a multiplier.
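For illustration, the conv1 layer above translated into the newer layer syntax would look roughly like this (a sketch of the conversion, keeping the same values):
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # the old blobs_lr / weight_decay pairs become per-param multipliers
  param { lr_mult: 1 decay_mult: 1 }   # weights
  param { lr_mult: 2 decay_mult: 0 }   # bias: no weight decay
  convolution_param {
    num_output: 64
    kernel_size: 1
    stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}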
@seanbell OK, thank you for correcting me.