Keras: Keras losses discussion

Created on 21 Jul 2017 · 8 comments · Source: keras-team/keras

I've been working with Keras for a little while now and one of my main frustrations is the way it handles losses. It appears restricted to functions of the form (y_true, y_pred), where both tensors must have the same shape. In regular classification problems such as VOC, ImageNet, CIFAR, etc. this works fine, but for more 'exotic' networks it poses issues.

For example, the MNIST siamese example network works around this issue by outputting the distance between two inputs as the model output and then computing the loss based on that distance.
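
For context, a condensed sketch of that pattern (not the exact example code; the layer sizes and input shape here are illustrative):

from keras import backend as K
from keras.layers import Input, Dense, Lambda
from keras.models import Model, Sequential

base_network = Sequential([Dense(128, activation='relu', input_shape=(784,)),
                           Dense(128)])

def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.maximum(K.sum(K.square(x - y), axis=1, keepdims=True), K.epsilon()))

def contrastive_loss(y_true, y_pred):
    # y_pred is the distance for the pair; y_true marks similar (1) vs dissimilar (0) pairs
    margin = 1
    return K.mean(y_true * K.square(y_pred) +
                  (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))

input_a = Input(shape=(784,))
input_b = Input(shape=(784,))
distance = Lambda(euclidean_distance)([base_network(input_a), base_network(input_b)])

model = Model([input_a, input_b], distance)
model.compile(optimizer='rmsprop', loss=contrastive_loss)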

Similarly, the triplet loss depends not on two but on three images, meaning two distance measures. This is circumvented in this project by computing the loss inside the model, outputting that loss, and using an "identity_loss" to compute the mean loss.
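
For reference, the identity-loss trick mentioned above usually looks roughly like this (a sketch; the exact code in that project may differ):

from keras import backend as K

def identity_loss(y_true, y_pred):
    # The model's output already is the loss value, so just minimize its mean
    # and ignore y_true (a dummy array passed to fit()).
    return K.mean(y_pred)

# loss_tensor is the triplet loss computed by layers inside the model:
# model = Model(inputs=[query, positive, negative], outputs=loss_tensor)
# model.compile(optimizer='adam', loss=identity_loss)
# model.fit([q, p, n], np.zeros((num_samples, 1)), ...)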

The variational autoencoder demo finds yet another way around this: a custom layer attaches the loss directly via add_loss, after which the model is compiled with loss=None. This works, but it is far from ideal and feels like a massive workaround.

For an RPN (Region Proposal Network) it is even more awkward. An RPN generates a variable number of proposals from an image and calculates the targets (similar to y_true) for these proposals on the fly. In other words, y_true cannot be computed beforehand, nor can its shape be known in advance. The only solutions here are to use an identity_loss as in the triplet case, or a custom layer that adds the loss as in the variational autoencoder.

There are multiple issues that discuss this problem.

This issue aims to be a discussion point on how to improve the current scenario.

My proposal would be to reduce the restrictions on the loss parameter of a Keras model and allow arbitrary loss tensors. Below is some code showing how I imagine it (triplet network example):

from keras.models import Model, Sequential
from keras.layers import Input, Conv2D, MaxPool2D, Flatten, Dense, Lambda

def create_base_network(input_shape):
    seq = Sequential()
    seq.add(Conv2D(20, (5, 5), input_shape=input_shape))
    seq.add(MaxPool2D(pool_size=(2, 2)))
    seq.add(Flatten())
    seq.add(Dense(128))
    return seq

base_network = create_base_network((224, 224, 3))

query = Input((224, 224, 3))
positive = Input((224, 224, 3))
negative = Input((224, 224, 3))

query_embedding = base_network(query)
positive_embedding = base_network(positive)
negative_embedding = base_network(negative)

loss = Lambda(triplet_loss)([query_embedding, positive_embedding, negative_embedding])

model = Model(inputs=[query, positive, negative], outputs=None, loss=[loss])

That network wouldn't require an output, but it does have a loss function. The deployed version would output an embedding, but this isn't necessary for the training model.
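
For completeness, the triplet_loss passed to the Lambda above could be a standard margin-based formulation along these lines (a sketch; the margin value is arbitrary):

from keras import backend as K

def triplet_loss(tensors, margin=0.2):
    # Push the query closer to the positive than to the negative by at least `margin`.
    query, positive, negative = tensors
    pos_dist = K.sum(K.square(query - positive), axis=-1)
    neg_dist = K.sum(K.square(query - negative), axis=-1)
    return K.maximum(pos_dist - neg_dist + margin, 0.0)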

EDIT: Basically, this proposal would drop the coupling between outputs and loss, and with it the restrictions that coupling imposes. In addition, it would change the meaning of loss to a list of tensors to be minimized during training; outputs would simply be the list of tensors the model returns during prediction.

@fchollet I would love to hear what you think of this proposal; is it something you would support? I can make an attempt to work on this but it is better to discuss such a significant change first. Will we have to worry about backwards compatibility, or should this be a change for the next major release of Keras?

All 8 comments

@fchollet any chance you could take a look at this issue and join the discussion?

I want to echo @hgaiser. It's extremely difficult to implement the aforementioned RPN losses in a flexible and intuitive manner. I disagree with @hgaiser on one point, however: losses on intermediate layers are not exotic (consider every model presented at CVPR or ICCV that uses RCNN-like anchor generation).

@fchollet this issue still stands. After working with add_loss for a while, I would really like to work towards a solution as proposed in my original post. What do you think?

This is a major issue for me as well. One point not yet mentioned is the difficulty of including training-only "y" inputs in complex ways. The interface I was considering was something like:

model = Model(inputs={'i1': i1, 'i2': i2}, outputs={'o1': o1, 'o2': o2}, ys={'y1': y1, 'y2': y2})
model.compile(optimizer, loss_func)

Where the loss function would take in dictionaries of outputs and y values, preferably as tf tensors, though layers would work as well:

def loss_func(outputs, ys):
    return K.sum(K.square(outputs['o1'] - ys['y1']))

This would be compatible with @hgaiser's use case by including the loss layer in "outputs" and applying a simple loss_func (to prevent returning unnecessary output):

def loss_func(outputs, ys):
    return outputs['loss_layer']

model = Model(inputs=[query, positive, negative], outputs={'loss_layer': loss}, loss=loss_func)

Hi @hgaiser, I am looking for a clean implementation of a triplet network in Keras and would like to hear your suggestions.

I have the same problem with custom loss functions that need external data for the loss calculation. Currently I follow the variational autoencoder demo as a workaround, but it is still not a perfect solution for this common problem.

I have a very ugly implementation of a wrapper around Keras models that allows arbitrary model definition like I described, since I needed that functionality. I ditch y inputs altogether and instead make loss layers for everything. A "wrapped model" is then two separate Keras models: one for training, with the "y" label values included among the "x" inputs, and another for prediction that only takes the true "x" inputs. The wrapper serves to abstract this implementation detail behind the interface I described above.

https://github.com/mharradon/Octopus

Like I said, this is very dirty; I implemented it while I was still learning the details of the Keras APIs and didn't understand the Keras codebase well enough to implement it in Keras directly. There's likely a better way to do it, e.g. via proper class inheritance. Hopefully this will serve to demonstrate that what we're interested in is possible without modifying Keras, and encourage a full implementation or at least a more proper wrapper approach.
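
Stripped of the wrapper, the underlying trick is roughly the following (a simplified sketch with made-up layer names, not the actual Octopus code):

from keras import backend as K
from keras.layers import Input, Dense, Layer
from keras.models import Model

shared_dense = Dense(10, activation='softmax')   # layers shared by both models

class CrossEntropyLossLayer(Layer):
    # Attaches a cross-entropy loss via add_loss and passes predictions through unchanged.
    def call(self, inputs):
        y_true, y_pred = inputs
        self.add_loss(K.mean(K.categorical_crossentropy(y_true, y_pred)), inputs=inputs)
        return y_pred

# Training model: the labels enter as an extra "x" input, so no y targets are needed.
x_in = Input(shape=(784,))
y_in = Input(shape=(10,))
probs = shared_dense(x_in)
probs_with_loss = CrossEntropyLossLayer()([y_in, probs])

train_model = Model([x_in, y_in], probs_with_loss)
train_model.compile(optimizer='adam', loss=None)
# train_model.fit([x_train, y_train], None, epochs=5)

# Prediction model: shares the same layers but only takes the true "x" inputs.
predict_model = Model(x_in, probs)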

import keras
from keras import backend as K

class CustomVariationalLayers(keras.layers.Layer):
    def vae_loss(self, inputs):
        x = inputs[0]
        z_decoded = inputs[1]
        x = K.flatten(x)
        z_decoded = K.flatten(z_decoded)
        xent_loss = keras.metrics.binary_crossentropy(x, z_decoded)
        # z_mean and z_log_var are captured from the enclosing scope (the encoder)
        kl_loss = -5e-4 * K.mean(
            1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
        return K.mean(xent_loss + kl_loss)

    def call(self, inputs):
        x = inputs[0]
        z_decoded = inputs[1]
        loss = self.vae_loss([x, z_decoded])
        self.add_loss(loss, inputs=inputs)
        return x

y = CustomVariationalLayers()([input_img, z_decoded])

When calling CustomVariationalLayers(), why are z_mean and z_log_var not needed as arguments?
