Caffe: L2 normalization of a vector

Created on 6 Oct 2014 · 25 comments · Source: BVLC/caffe

Before implementing one more new layer from scratch, I want to double-check.
I need to implement a vector normalization of the type z / l2_norm(z). Is there any way of doing this in the current caffe-dev (or a related branch)?

All 25 comments

Indeed, it seems the only missing ingredient is the element-wise division.
I have a CPU version that passes the unit test; I will add the GPU version before opening a pull request.

@rodrigob I'm about to implement L2 normalization as well -- before I duplicate the effort, were you successful?

@rodrigob @seanbell Looking forward to your L2 norm layer; it would be great if you could release it ASAP, time is money...

@kuprel I've seen your code; it seems you've implemented an angle loss for pair-wise learning. Is it going well? Could you share some experience with pair-wise learning?

Please ask usage questions on the caffe-users list. Thanks!

@seanbell Hi, is l2-normalization implemented?

You can already do L2 normalization using something like this (untested, but it should work with minor syntax changes; I've used it before):

from caffe import layers as L, params as P

def l2normed(vec, dim):
    """Returns L2-normalized instances of vec; i.e., for each instance x in vec,
    computes  x / ((x ** 2).sum() ** 0.5). Assumes vec has shape N x dim."""
    denom = L.Reduction(vec, axis=1, operation=P.Reduction.SUMSQ)      # N x dim -> N (sum of squares per row)
    denom = L.Power(denom, power=(-0.5))                                # N (reciprocal of each row's L2 norm)
    denom = L.Reshape(denom, num_axes=0, axis=-1, shape=dict(dim=[1]))  # N -> N x 1
    denom = L.Tile(denom, axis=1, tiles=dim)                            # N x 1 -> N x dim
    return L.Eltwise(vec, denom, operation=P.Eltwise.PROD)              # element-wise vec * (1 / norm)

For numerical stability you might want to change the Power layer to something like L.Power(denom, power=(-0.5), shift=1e-12).
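As a minimal usage sketch (not from the thread): assuming the helper above, and a placeholder 32 x 4096 DummyData blob standing in for a real fc7, it plugs into a NetSpec like this:

import caffe
from caffe import layers as L, params as P

# Hypothetical wiring of l2normed() (defined above) into a NetSpec.
# The DummyData source and the 32 x 4096 shape are placeholders for illustration.
n = caffe.NetSpec()
n.fc7 = L.DummyData(shape=dict(dim=[32, 4096]))
n.fc7norm = l2normed(n.fc7, 4096)  # each row of fc7norm has unit L2 norm
print(str(n.to_proto()))           # dump the generated prototxt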

@ducha-aiki Thanks. Have you used it before?
Would
...
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "nfeat"
  type: "NormalizeLayer"
  bottom: "fc7"
  top: "nfeat"
}

work?

@hyojinie, yes, it works. But you need to carefully initialize the layers after the normalized one; a Gaussian filler with std=0.01 does not work :)

@ducha-aiki Thanks a lot. It works! I am trying to use it right before the loss layer (contrastive loss). I am guessing initialization wouldn't matter much in that case..?

@ducha-aiki I have tried many Gaussian stds (0.1, 0.01, ...), but it doesn't work. A uniform filler does work, though. Do you know why this happens?
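Not from the thread, but as a rough NetSpec sketch of the kind of change being discussed, assuming an InnerProduct classifier sits on top of the normalized blob (the uniform range of +/-0.1 is an arbitrary assumption, not a value anyone reported):

from caffe import layers as L

# Hypothetical classifier on top of the normalized blob "nfeat".
# Per the discussion above, a uniform weight filler trained where
# gaussian std=0.01 did not; the +/-0.1 range here is only an assumption.
def classifier_after_norm(nfeat, num_classes):
    return L.InnerProduct(nfeat, num_output=num_classes,
                          weight_filler=dict(type='uniform', min=-0.1, max=0.1),
                          bias_filler=dict(type='constant', value=0))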

@hyojinie After adding the "nfeat" layer and feeding it into the "fc8" classifier, the softmax loss decreases very slowly during training compared with no Normalize layer. The validation accuracy first goes up and then down. Has anyone encountered this situation?

@jeffdonahue How does the division work in that code? The Reshape + Tile magic is a bit too magic for me. :sweat_smile:
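For what it's worth, a rough numpy sketch of what the layer stack above computes (including the optional shift from the numerical-stability note): the "division" is just a multiplication by the reciprocal norm, with shapes noted per step.

import numpy as np

# Rough numpy equivalent of the Reduction/Power/Reshape/Tile/Eltwise stack,
# for an (N, dim) array vec; the Caffe layer each line mirrors is noted.
def l2normed_reference(vec):
    sumsq = (vec ** 2).sum(axis=1)             # Reduction(SUMSQ, axis=1) -> (N,)
    denom = (sumsq + 1e-12) ** -0.5            # Power(power=-0.5, shift=1e-12) -> (N,)
    denom = denom.reshape(-1, 1)               # Reshape(axis=-1, num_axes=0, dim=[1]) -> (N, 1)
    denom = np.tile(denom, (1, vec.shape[1]))  # Tile(axis=1, tiles=dim) -> (N, dim)
    return vec * denom                         # Eltwise(PROD) -> (N, dim), each row unit-norm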

@ducha-aiki Can I ask what you mean by "layers after the normalized"? Do you mean all the layers after the normalization layer?

@xwang90 yes

@MenglaiWang Sorry to bother you. What do you mean by "I have tried many Gaussian stds ... but it doesn't work"? Does "doesn't work" mean low classification accuracy? Thank you. :)

@ducha-aiki Thanks a lot. You mentioned that "a Gaussian filler with std=0.01 does not work"; do you mean it would lead to low classification accuracy? I have just hit some low-classification-accuracy issues after incorporating this L2 normalization layer. Thanks again!


@hyojinie

I too am trying to use it before contrastive loss. Did you manage to get this normalization layer to work?

For the normalization layer, where is it in the .proto file?

I did. I have not looked at the original proto file. I added it myself to my proto.

Can you share a prototxt showing how you used the norm layer? Also, what are the input and output dimensions, as an example?

I have an (N x 128) feature vector I need to normalize. Does this layer ingest that and output the same size?

There's nothing fancy for the proto (see the "// added" line below). Something like this:
// DEPRECATED: use LayerParameter.
message V1LayerParameter {
  repeated string bottom = 2;
  repeated string top = 3;
  optional string name = 4;
  repeated NetStateRule include = 32;
  repeated NetStateRule exclude = 33;
  enum LayerType {
    NONE = 0;
    ABSVAL = 35;
    ACCURACY = 1;
    ARGMAX = 30;
    BNLL = 2;
    CONCAT = 3;
    CONTRASTIVE_LOSS = 37;
    CONVOLUTION = 4;
    DATA = 5;
    DECONVOLUTION = 39;
    DROPOUT = 6;
    DUMMY_DATA = 32;
    EUCLIDEAN_LOSS = 7;
    ELTWISE = 25;
    EXP = 38;
    FLATTEN = 8;
    HDF5_DATA = 9;
    HDF5_OUTPUT = 10;
    HINGE_LOSS = 28;
    IM2COL = 11;
    IMAGE_DATA = 12;
    INFOGAIN_LOSS = 13;
    INNER_PRODUCT = 14;
    LRN = 15;
    MEMORY_DATA = 29;
    MULTINOMIAL_LOGISTIC_LOSS = 16;
    MVN = 34;
    POOLING = 17;
    POWER = 26;
    RELU = 18;
    SIGMOID = 19;
    SIGMOID_CROSS_ENTROPY_LOSS = 27;
    SILENCE = 36;
    SOFTMAX = 20;
    SOFTMAX_LOSS = 21;
    SPLIT = 22;
    SLICE = 33;
    TANH = 23;
    WINDOW_DATA = 24;
    THRESHOLD = 31;

    // added
    NORMALIZE = 41;
  }

It should; the output has the same shape as the input.
