I have roughly this network structure:
from keras.layers import Input, Reshape, merge
from keras.models import Model

input_a = Input(shape=(input_dim, 1))
input_b = Input(shape=(input_dim, 1))
vec_a = embedding_a(input_a)
vec_b = embedding_b(input_b)
cos_distance = merge([vec_a, vec_b], mode='cos', dot_axes=1)
cos_distance = Reshape((1,))(cos_distance)
model = Model(input=[input_a, input_b], output=[cos_distance])
model.compile(optimizer='sgd', loss='cosine_proximity')
When training this network, the loss becomes NaN. Am I using cosine_proximity correctly? Also, cosine proximity/distance ranges between -1 and 1. Do I have to use these values as targets, or 1 and 0 as usual?
By the way: the name does not seem intuitive, because we usually want to minimize what we define as loss. Shouldn't it be cosine_distance?
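For reference, the built-in objective in Keras 1.x (keras/objectives.py) looks roughly like this, so a perfect prediction yields a loss of -1:

from keras import backend as K

def cosine_proximity(y_true, y_pred):
    y_true = K.l2_normalize(y_true, axis=-1)
    y_pred = K.l2_normalize(y_pred, axis=-1)
    return -K.mean(y_true * y_pred, axis=-1)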
Any news on that?
I have exactly the same questions, and the same remark about the name of the objective. Regarding the latter, I think the return should be 1 - K.mean(y_true * y_pred, axis=-1) instead of -K.mean(y_true * y_pred, axis=-1), and the name should be _cosine distance_ instead of _proximity_, since the objective is to minimize the distance, not the proximity.
1 - K.mean(y_true * y_pred, axis=-1) instead of -K.mean(y_true * y_pred, axis=-1)
There will be no difference in behavior between the two: what you propose just offsets the same quantity by one, and a constant offset has no effect on the gradients the optimizer follows.
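To see this concretely, here is a minimal sketch (assuming the Theano or TensorFlow backend) showing that the gradients of -f and 1 - f are identical, since the constant contributes nothing:

import numpy as np
import keras.backend as K

w = K.variable(np.random.randn(3))
t = K.variable(np.random.randn(3))
f = K.mean(t * w)

g_neg = K.gradients(-f, [w])[0]       # gradient of -f with respect to w
g_shift = K.gradients(1 - f, [w])[0]  # gradient of 1 - f with respect to w
print(K.eval(g_neg))
print(K.eval(g_shift))                # identical to g_neg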
For those who have the same issue:
from keras import backend as K

def cos_distance(y_true, y_pred):
    def l2_normalize(x, axis):
        norm = K.sqrt(K.sum(K.square(x), axis=axis, keepdims=True))
        return K.maximum(x, K.epsilon()) / K.maximum(norm, K.epsilon())
    y_true = l2_normalize(y_true, axis=-1)
    y_pred = l2_normalize(y_pred, axis=-1)
    return -K.mean(y_true * y_pred, axis=-1)
Thanks @Js-Mim
@dr-costas
I would not advise using this function, as the results will differ significantly from what is expected. Try it on some randomly generated values to see the difference (x and y sampled from a Gaussian distribution, so there are some negative values):
import keras.backend as K
from keras.objectives import cosine_proximity
import numpy as np

def cos_distance(y_true, y_pred):
    def l2_normalize(x, axis):
        norm = K.sqrt(K.sum(K.square(x), axis=axis, keepdims=True))
        return K.maximum(x, K.epsilon()) / K.maximum(norm, K.epsilon())
    y_true = l2_normalize(y_true, axis=-1)
    y_pred = l2_normalize(y_pred, axis=-1)
    return K.mean(y_true * y_pred, axis=-1)

for i in range(5):
    x = np.random.randn(2)
    y = np.random.randn(2)
    a = cosine_proximity(x, y)
    b = -cos_distance(x, y)
    print(a.eval())
    print(b.eval())
    print("\n")
You will get:
-0.294819036415
-3.85899569993e-15
-0.241333817247
-1.01110316419e-07
0.432237146466
-7.66639345888e-07
0.499905301362
-5.95543567182e-08
0.42822717186
-0.035430591272
Although I agree that the two functions will behave the same way for some sampling distributions of x and y, try for example a uniform distribution (no negative values):
for i in range(5):
    x = np.random.rand(2)
    y = np.random.rand(2)
    a = cosine_proximity(x, y)
    b = -cos_distance(x, y)
    print(a.eval())
    print(b.eval())
    print("\n")
You will get:
-0.492369794357
-0.492369794357
-0.448659241432
-0.448659241432
-0.480616828156
-0.480616828156
-0.470552945779
-0.470552945779
-0.457538363699
-0.457538363699
So instead you should use:
def cos_distance(y_true, y_pred):
    def l2_normalize(x, axis):
        norm = K.sqrt(K.sum(K.square(x), axis=axis, keepdims=True))
        return K.sign(x) * K.maximum(K.abs(x), K.epsilon()) / K.maximum(norm, K.epsilon())
    y_true = l2_normalize(y_true, axis=-1)
    y_pred = l2_normalize(y_pred, axis=-1)
    return K.mean(y_true * y_pred, axis=-1)
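Note that, as written, this cos_distance returns the mean cosine similarity (it is only negated in the comparisons above). To use it as a loss that pulls predictions towards the targets, you would flip the sign, e.g. (hypothetical usage with the model from the first post):

model.compile(optimizer='sgd',
              loss=lambda y_true, y_pred: -cos_distance(y_true, y_pred))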
By the way, does someone know why we should modify the numerator this way? I thought that only the denominator would give rise to NaN-related errors.
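A minimal sketch of what each numerator actually does: K.maximum(x, K.epsilon()) clamps every negative component up to epsilon and so destroys sign information, which is why the Gaussian samples above diverge while the uniform ones agree; the K.sign/K.abs variant preserves the signs:

import numpy as np
import keras.backend as K

x = K.variable(np.array([-0.5, 0.3, -1.2]))
# original numerator: negative entries are clamped up to epsilon
print(K.eval(K.maximum(x, K.epsilon())))                     # ~[1e-07, 0.3, 1e-07]
# modified numerator: signs and magnitudes are preserved
print(K.eval(K.sign(x) * K.maximum(K.abs(x), K.epsilon())))  # [-0.5, 0.3, -1.2]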
@anMabbe Hi, I'm new to Keras, but if cosine_proximity and cosine_distance are the same, then we also need to add K.sum to make an average cosine distance across all data points:
def cos_distance(y_true, y_pred):
    y_true = K.l2_normalize(y_true, axis=-1)
    y_pred = K.l2_normalize(y_pred, axis=-1)
    return K.mean(1 - K.sum((y_true * y_pred), axis=-1))
I interpret it as an average cosine distance between actual output and prediction.
Check:
# cos_distance = [1, 1, 0] => avg = 2/3
x1 = np.array([[0, 1], [1, 0], [1, 1]])
x2 = np.array([[1, 0], [0, 1], [1, 1]])
print(K.eval(cos_distance(x1, x2)))
Output:
0.6666666666666666
@alifya400 I had to use
x1 = K.variable(np.array([[0, 1], [1, 0], [1, 1]]))
x2 = K.variable(np.array([[1, 0], [0, 1], [1, 1]]))
Keras 1.2.2