I have roughly this network structure:
from keras.layers import Input, Reshape, merge
from keras.models import Model

input_a = Input(shape=(input_dim, 1))
input_b = Input(shape=(input_dim, 1))
vec_a = embedding_a(input_a)
vec_b = embedding_b(input_b)
cos_distance = merge([vec_a, vec_b], mode='cos', dot_axes=1)
cos_distance = Reshape((1,))(cos_distance)
model = Model(input=[input_a, input_b], output=[cos_distance])
model.compile(optimizer='sgd', loss='cosine_proximity')
When training this network, the loss becomes NaN. Am I using cosine_proximity correctly? Also, cosine proximity/distance ranges between -1 and 1. Do I have to use these values as targets, or 1 and 0 as usual?
By the way: the name does not seem intuitive, because we usually want to minimize what we define as loss. Shouldn't it be cosine_distance?
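For reference, the built-in objective in Keras 1.x (keras/objectives.py) looks roughly like this, so a perfect prediction yields a loss of -1:

from keras import backend as K

def cosine_proximity(y_true, y_pred):
    y_true = K.l2_normalize(y_true, axis=-1)
    y_pred = K.l2_normalize(y_pred, axis=-1)
    return -K.mean(y_true * y_pred, axis=-1)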
Any news on that?
I have exactly the same questions, and the same remark about the name of the objective. Regarding the latter, I think the return should be 1 - K.mean(y_true * y_pred, axis=-1) instead of -K.mean(y_true * y_pred, axis=-1), and the name should be _cosine distance_ instead of _proximity_, since the objective is to minimize the distance, not the proximity.
1 - K.mean(y_true * y_pred, axis=-1) instead of -K.mean(y_true * y_pred, axis=-1)
There will be no difference in behavior between the two: what you propose just offsets the same quantity by one, and a constant offset has no effect on the gradients the optimizer follows.
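To see this concretely, here is a minimal sketch (assuming the Theano or TensorFlow backend) showing that the gradients of -f and 1 - f are identical, since the constant contributes nothing:

import numpy as np
import keras.backend as K

w = K.variable(np.random.randn(3))
t = K.variable(np.random.randn(3))
f = K.mean(t * w)

g_neg = K.gradients(-f, [w])[0]       # gradient of -f with respect to w
g_shift = K.gradients(1 - f, [w])[0]  # gradient of 1 - f with respect to w
print(K.eval(g_neg))
print(K.eval(g_shift))                # identical to g_neg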
For those who have the same issue:
from keras import backend as K

def cos_distance(y_true, y_pred):
    def l2_normalize(x, axis):
        norm = K.sqrt(K.sum(K.square(x), axis=axis, keepdims=True))
        return K.maximum(x, K.epsilon()) / K.maximum(norm, K.epsilon())
    y_true = l2_normalize(y_true, axis=-1)
    y_pred = l2_normalize(y_pred, axis=-1)
    return -K.mean(y_true * y_pred, axis=-1)
Thanks @Js-Mim
@dr-costas
I would not advise using this function, as the results will differ significantly from what is expected. Try it on some randomly generated values to see the difference (x and y sampled from a Gaussian distribution, so there are some negative values):
import keras.backend as K
from keras.objectives import cosine_proximity
import numpy as np

def cos_distance(y_true, y_pred):
    def l2_normalize(x, axis):
        norm = K.sqrt(K.sum(K.square(x), axis=axis, keepdims=True))
        return K.maximum(x, K.epsilon()) / K.maximum(norm, K.epsilon())
    y_true = l2_normalize(y_true, axis=-1)
    y_pred = l2_normalize(y_pred, axis=-1)
    return K.mean(y_true * y_pred, axis=-1)

for i in range(5):
    x = np.random.randn(2)
    y = np.random.randn(2)
    a = cosine_proximity(x, y)
    b = -cos_distance(x, y)
    print(a.eval())
    print(b.eval())
    print("\n")
You will get:
-0.294819036415
-3.85899569993e-15
-0.241333817247
-1.01110316419e-07
0.432237146466
-7.66639345888e-07
0.499905301362
-5.95543567182e-08
0.42822717186
-0.035430591272
Although I agree that the two functions will behave the same way for some sampling distributions of x and y, try for example a uniform distribution (no negative values):
for i in range(5):
    x = np.random.rand(2)
    y = np.random.rand(2)
    a = cosine_proximity(x, y)
    b = -cos_distance(x, y)
    print(a.eval())
    print(b.eval())
    print("\n")
You will get:
-0.492369794357
-0.492369794357
-0.448659241432
-0.448659241432
-0.480616828156
-0.480616828156
-0.470552945779
-0.470552945779
-0.457538363699
-0.457538363699
So instead you should use:
def cos_distance(y_true, y_pred):
    def l2_normalize(x, axis):
        norm = K.sqrt(K.sum(K.square(x), axis=axis, keepdims=True))
        return K.sign(x) * K.maximum(K.abs(x), K.epsilon()) / K.maximum(norm, K.epsilon())
    y_true = l2_normalize(y_true, axis=-1)
    y_pred = l2_normalize(y_pred, axis=-1)
    return K.mean(y_true * y_pred, axis=-1)
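Note that, as written, this cos_distance returns the mean cosine similarity (it is only negated in the comparisons above). To use it as a loss that pulls predictions towards the targets, you would flip the sign, e.g. (hypothetical usage with the model from the first post):

model.compile(optimizer='sgd',
              loss=lambda y_true, y_pred: -cos_distance(y_true, y_pred))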
By the way, does someone know why we should modify the numerator this way? I thought that only the denominator would give rise to NaN-related errors.
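A minimal sketch of what each numerator actually does: K.maximum(x, K.epsilon()) clamps every negative component up to epsilon and so destroys sign information, which is why the Gaussian samples above diverge while the uniform ones agree; the K.sign/K.abs variant preserves the signs:

import numpy as np
import keras.backend as K

x = K.variable(np.array([-0.5, 0.3, -1.2]))
# original numerator: negative entries are clamped up to epsilon
print(K.eval(K.maximum(x, K.epsilon())))                     # ~[1e-07, 0.3, 1e-07]
# modified numerator: signs and magnitudes are preserved
print(K.eval(K.sign(x) * K.maximum(K.abs(x), K.epsilon())))  # [-0.5, 0.3, -1.2]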
@anMabbe Hi, I'm new to Keras, but if cosine_proximity and cosine_distance are the same, then we also need to add K.sum to make an average cosine distance across all data points:
def cos_distance(y_true, y_pred):
    y_true = K.l2_normalize(y_true, axis=-1)
    y_pred = K.l2_normalize(y_pred, axis=-1)
    return K.mean(1 - K.sum((y_true * y_pred), axis=-1))
I interpret it as an average cosine distance between actual output and prediction.
Check:
# cos_distance = [1, 1, 0] => avg = 2/3
x1 = np.array([[0, 1], [1, 0], [1, 1]])
x2 = np.array([[1, 0], [0, 1], [1, 1]])
print(K.eval(cos_distance(x1, x2)))
Output:
0.6666666666666666
@alifya400 I had to use
x1 = K.variable(np.array([[0, 1], [1, 0], [1, 1]]))
x2 = K.variable(np.array([[1, 0], [0, 1], [1, 1]]))
Keras 1.2.2