I tried, but as output I get a 300-dim vector, whereas I should be getting a single score as the distance between the two tensors:
from scipy.spatial.distance import cityblock

def Manhattan_distance(M):
    A = M[0]
    B = M[1]
    res = cityblock(A, B)
    return res

merged_vector = merge([encoded_a, encoded_b], mode=Manhattan_distance, output_shape=(1,))
Hello,
You have to define your operations using Keras so that they are symbolic; you can't directly reuse functions from scipy.
You should probably create a custom layer instead of using Merge, as it will be easier and less bug-prone to reuse.
Here is one correct way of using Merge to compute the Manhattan distance.
Runnable (but not tested) code:
import numpy as np
from keras.models import Model
from keras.layers import Input
import keras.backend as K
from keras.layers.core import Merge

def Manhattan_distance(A, B):
    # Per-row L1 distance; keepdims gives shape (batch_size, 1)
    return K.sum(K.abs(A - B), axis=1, keepdims=True)

inp1 = Input(shape=(100,))
inp2 = Input(shape=(100,))
merged_vector = Merge(mode=lambda x: Manhattan_distance(x[0], x[1]),
                      output_shape=lambda inp_shp: (inp_shp[0][0], 1))([inp1, inp2])
m = Model([inp1, inp2], [merged_vector])
print(m.predict([np.random.randn(30, 100), np.random.randn(30, 100)]))
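As a sanity check, here is the same batched distance in plain NumPy (an illustration only, not Keras code; the toy inputs are made up to make the expected values easy to verify by hand):

```python
import numpy as np

def manhattan_distance_np(A, B):
    # Batched Manhattan (L1) distance: one scalar per row pair.
    return np.sum(np.abs(A - B), axis=1, keepdims=True)

A = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
B = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
print(manhattan_distance_np(A, B))  # [[6.], [3.]]
```

Each row collapses to a single scalar, which is exactly the (batch_size, 1) output shape declared in the Merge call above.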
That's working fine. What about contrastive_loss? Is the following code correct? I tried it, but I do not achieve accuracy with it, nor does the loss decrease.
def contrastive_loss(y, d):
    """Contrastive loss from Hadsell-et-al.'06
    http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
    """
    margin = 1
    return K.mean(y * K.square(d) + (1 - y) * K.square(K.maximum(margin - d, 0)))
Your loss seems right. Maybe you are not using it correctly.
y should be 1 when both inputs should be the same.
y should be 0 when they should differ.
d must be the distance (and not the squared distance). The Caffe implementation uses the L2 distance.
Maybe you can set a different margin, or make sure that the distance is of the same order as 1 (you may need to divide by the number of dimensions). If d > margin for all points, then the y = 0 part of the loss has zero gradient, so the parameters won't move and the loss won't decrease.
Maybe you need a reasonable balance between same and different classes (~50% each).
I'm not familiar with contrastive loss, but I've played with triplet loss, and it is useful to pick your negative ("different") examples carefully: if it's too easy for the network to tell the difference, the gradients are 0 and it doesn't learn from the example.
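To make the zero-gradient point concrete, here is a quick NumPy sketch of just the y = 0 ("different" pair) term of the loss above:

```python
import numpy as np

def negative_pair_term(d, margin=1.0):
    # The (1 - y) * max(margin - d, 0)^2 term of the contrastive loss,
    # evaluated for a dissimilar pair (y = 0).
    return np.maximum(margin - d, 0.0) ** 2

# Inside the margin the loss is positive and pushes pairs apart;
# at or beyond the margin it is exactly 0, so the gradient vanishes.
print(negative_pair_term(0.5))   # 0.25
print(negative_pair_term(1.0))   # 0.0
print(negative_pair_term(50.0))  # 0.0
```

So if your distances all land far beyond the margin, dissimilar pairs contribute nothing to training.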
This is how I am using it:
input_a = Input(shape=(20, input_dim))
input_b = Input(shape=(20, input_dim))
shared_lstm = LSTM(50, dropout_W=0.0, dropout_U=0.0)
encoded_a = shared_lstm(input_a)
encoded_b = shared_lstm(input_b)
merged_vector = Merge(mode=lambda x: Manhattan_distance(x[0], x[1]),
                      output_shape=lambda inp_shp: (inp_shp[0][0], 1))([encoded_a, encoded_b])
model_lstm = Model([input_a, input_b], [merged_vector])
model_lstm.compile(loss=contrastive_loss, optimizer='adam', metrics=['accuracy'])
model_lstm.fit([X_train_sen1, X_train_sen2], y, nb_epoch=1, callbacks=callbacks_list,
               batch_size=60, shuffle=True, verbose=0)
Here y ranges from 0 to 1; this is basically sentence similarity.
Because the LSTM activation is a tanh, which takes values between -1.0 and 1.0, the average per-coordinate distance is ~1.0, and you have 50 cells, so the average total distance is ~50.0 and the gradient for the "different" part of the loss is 0.
You probably want to normalize by the number of dimensions (i.e. take the mean instead of the sum in the Manhattan distance).
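For example, a mean-based Manhattan distance stays on the same order as the margin regardless of the number of cells (a NumPy sketch of the idea; in the Keras version you would just replace K.sum with K.mean):

```python
import numpy as np

def manhattan_distance_mean(A, B):
    # Mean absolute difference: stays O(1) regardless of dimensionality.
    return np.mean(np.abs(A - B), axis=1, keepdims=True)

rng = np.random.RandomState(0)
# Two batches of 50-dim tanh encodings, mimicking the LSTM outputs above
A = np.tanh(rng.randn(4, 50))
B = np.tanh(rng.randn(4, 50))
d = manhattan_distance_mean(A, B)
print(d.shape)  # (4, 1)
# Each coordinate difference lies in [-2, 2], so d is bounded by 2 and
# stays comparable to margin = 1, instead of growing to ~50 with the sum.
```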
Additionally, you may want to regularize your learning to give your network an incentive to learn a meaningful representation; otherwise you will face a "temporal credit assignment problem", which will probably make your training very slow.
Currently your network is probably solving your problem by outputting an almost constant value: it will score well on positive examples (and we have seen previously that the negative examples can't be learned due to the 0 gradient, so they can't help it improve). So the learned representation is not meaningful at all. You have to prevent your network from taking this shortcut.
Yes, training is very slow. In fact, the Pearson correlation between the gold score and my prediction is negative; what does this mean? I have regularized as well. The loss starts from 0.5 and quickly reaches 0.2, and the correlation is still negative, which means the network is not learning anything.
The "accuracy" metric is probably not relevant here (it doesn't compute what you want it to compute: look at the metrics source code).
Try displaying some predictions and investigating. Try displaying the predicted encodings.
I don't know about your correlation between target and prediction, it's probably just noise at this stage.
I'm not using the accuracy metric of Keras; I am actually using the Pearson correlation. My main concern is that the loss reaches approximately 0 very quickly, yet I get no accuracy. Why is the loss decreasing if the network is not learning anything? I am using word vectors as input; should I normalize them?