Keras: Multiple objectives with one custom loss

Created on 24 Apr 2016 · 11 comments · Source: keras-team/keras

Hi,

I have a model with multiple outputs, each with its own loss function. There are 3 outputs: 2 of them can use built-in objective functions, while the third one will use a custom objective function written by me. Something like this:

model.compile(optimizer='rmsprop',
              loss={'output1': 'binary_crossentropy',
                    'output2': 'categorical_crossentropy',
                    'output3': custom_objective})  # the custom function itself, not a string

I don't need separate loss weights for each output, and the custom function returns a log probability.
In this scenario, does the above formulation mean "minimizing the joint negative log likelihood"? That is what I want: to minimize the sum of the negative log likelihoods over all outputs, but I am not sure the above formulation achieves that.
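For concreteness, the custom objective looks roughly like this (log_prob_of_output3 is just a placeholder for my actual log-probability computation; I assume I should return the negative of the log probability, since Keras minimizes the loss):

def custom_objective(y_true, y_pred):
    # placeholder for my actual computation of log P(output3 = y_true | data)
    log_prob = log_prob_of_output3(y_true, y_pred)
    # negated, so that minimizing the loss maximizes the likelihood
    return -log_prob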

If not, can someone please share what needs to be done for my case?

Thanks



All 11 comments

minimizing joint negative log likelihood

This usually has a very specific meaning: a joint distribution is P(X, Y). Having the first two outputs with the two indicated losses will do the following:

min L(\theta, data) = min [ -log P(output1=1 | data) - log P(output2=c | data) ]. So it is minimizing a linear combination of the two losses in which they are equally weighted. The statistical weight it effectively gives to either of those two outputs will depend on the number of classes and the relative accuracy, because log space isn't linear. In other words, the closer your probabilities get to 1, the closer those two terms will be; the closer they get to 0, the larger the disparity.

For example:

In [34]: log(0.7)/log(0.8)
Out[34]: 1.598410269255312

In [35]: log(0.8)/log(0.9)
Out[35]: 2.117904889901081

quick edit:
Having a linear combination of losses is not necessarily minimizing the joint. Minimizing the joint would be min -log P(output1=1, output2=c | data). If they are independent, or you make the assumption that they are, then you can write this as -log(P(output1=1|data) * P(output2=c|data)), which further decomposes into -log P(output1=1|data) - log P(output2=c|data). So, if you assume independence between the two classifications, the linear combination of the two losses is indeed "minimizing the joint negative log likelihood".

In terms of having multiple losses, the custom loss works just as @carlthome linked. Multiple losses are fairly standard in the API; you can pass them either by list or by dictionary.
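For example (the model and output-layer names are placeholders for your own):

# by dictionary, keyed on the names of the output layers
model.compile(optimizer='rmsprop',
              loss={'output1': 'binary_crossentropy',
                    'output2': 'categorical_crossentropy',
                    'output3': custom_objective})

# or by list, in the order the outputs were passed to Model(...)
model.compile(optimizer='rmsprop',
              loss=['binary_crossentropy', 'categorical_crossentropy', custom_objective])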

@carlthome : Thanks for the pointer. It is very helpful.

@braingineer: Thanks a lot for an elaborate explanation. Although I had the same understanding, I was getting confused for some reason but your post clarifies my doubt fully. Thanks again.

I think I have a more complicated situation, and I want to understand thoroughly how Keras handles loss functions.

I have a recurrent layer which provides some output, call it o:

o = SimpleRNN(120, activation='relu')(input)

I want to use this output 'o' to get two separate losses:
1) a softmax-based loss and 2) a custom loss function.

First loss (softmax)
@braingineer: As per your description, P(output1=c|data) = exp(V_c^T o + b_c) / sum over all c' of exp(V_{c'}^T o + b_{c'}), and the loss is -log P(output1=c|data), which needs to be minimized. Here, c indexes the classes. To achieve this I have written the following code:

o = SimpleRNN(120, activation='relu')(input)
output1 = Dense(1000,activation='softmax')(o)
model.compile(optimizer='rmsprop', loss=['categorical_crossentropy'],metrics=['accuracy'])

Is this correct, and does it achieve the above description? I am mainly concerned about the softmax-based part. Does the activation of the Dense layer already achieve it, with categorical cross-entropy just taking the negative log, or do I need to define it in some other way?
The loss is given as a list because I apply another loss later.

Second loss (custom)

I define this in a local file and return the value of P(output2=d|data), not the log of it.
I have a follow-up question regarding its definition, but it is complicated and I would first like to resolve the simpler things. To achieve this I have written the following:

o = SimpleRNN(120, activation='relu')(input)
output2 = Dense(1)(o)
model.compile(optimizer='rmsprop', loss=[custom_loss],metrics=['accuracy'])

Combining the two together, I have:

o = SimpleRNN(120, activation='relu')(input)
output1 = Dense(1000,activation='softmax')(o)
output2 = Dense(1)(o)
model.compile(optimizer='rmsprop', loss=['categorical_crossentropy', custom_loss],metrics=['accuracy'])

Having done this, I expect to achieve the following:

total L = -log P(output1=c | data) - log P(output2=d | data) = -log(softmax(o)) - log(value returned by the custom loss function)

Do you think the code achieves this formulation? The above code runs, but my loss and accuracy are suspiciously low, so I think it is not doing what I intend.

@braingineer: Any idea about this one?

Thanks

Hi @code-ball

re: first loss.

Yes, that is correct. You are taking the last output of the RNN, running it through a dense layer, and taking the softmax, so you have a probability distribution. If the y_true you pass into the fit function is a binary 1-hot vector, then this becomes the negative log likelihood. (Well, it's actually the mean over the batch, because it's more numerically stable to use the mean of the losses.)
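As a quick numerical sanity check (my own illustration, in plain numpy rather than Keras): with a one-hot target, the per-sample categorical cross-entropy reduces to -log of the probability assigned to the true class.

import numpy as np

y_true = np.array([[0.0, 1.0, 0.0]])    # one-hot target, true class is index 1
y_pred = np.array([[0.2, 0.7, 0.1]])    # softmax output
per_sample = -np.sum(y_true * np.log(y_pred), axis=-1)
print(per_sample)                        # [0.35667494], i.e. -log(0.7)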

re: second loss

You are missing one step:

model = Model(input=[input], output=[output1, output2]) 

Keras will assume that the losses correspond 1:1 with the outputs. You can see this here

Hope that helps.

@braingineer Thank you for your detailed clarification. Here I have a question about the independence. Let's say we are doing an image segmentation task (only for cats, say). Obviously, in this case, output1 will predict whether the image contains a cat or not, and output2 will predict a mask in which each pixel indicates whether it belongs to a cat or not. In this scenario, we can train the model as you suggested, as below:

model = Model(input=[input], output=[output1, output2]) 

Another thought is that we could multiply output1 by output2 to get a new output:

model = Model(input=[input], output=merge([output1, output2], mode='mul')) 

What's the difference between the two? Are there any advantages or disadvantages?

As far as I can see, if we use two losses, i.e. output=[output1, output2], then we make better use of the training data, since we fit the model with train_images, cat_label and train_images_masks. However, if we use one loss, i.e. output=merge([output1, output2], mode='mul'), then we only need to fit the model with train_images and train_images_masks.

@braingineer In my case, it seems the explicit two-outputs method performs better.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

Hi,
Suppose I have 2 loss functions, and some of the weights are shared between both outputs. How will backpropagation update those shared weights?
Thanks in advance.

 prediction1 = Dense(33, activation = 'softmax', name = 'Softmax_prediction1')(x)
 prediction2 = Dense(4,activation = 'softmax', name = 'Softmax_prediction2')(x)

output = keras.layers.concatenate([prediction1, prediction2], axis=1, name = 'concatenate')

model = Model(inputs = input_tensor, outputs = output)

and my compile is:

model.compile(optimizer=SGD(lr=1e-3, momentum=0.9, decay=1e-4, nesterov=False),
              loss='categorical_crossentropy', metrics=['acc'])

and I want to give different loss weights to the different outputs. How do I code this?
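One possible approach (a sketch, not an answer given in the thread): keep prediction1 and prediction2 as two separate outputs instead of concatenating them, and use compile's loss_weights argument; the weight values below are only example numbers.

from keras.models import Model
from keras.optimizers import SGD

model = Model(inputs=input_tensor, outputs=[prediction1, prediction2])
model.compile(optimizer=SGD(lr=1e-3, momentum=0.9, decay=1e-4, nesterov=False),
              loss={'Softmax_prediction1': 'categorical_crossentropy',
                    'Softmax_prediction2': 'categorical_crossentropy'},
              # example weights: the first output counts twice as much as the second
              loss_weights={'Softmax_prediction1': 1.0, 'Softmax_prediction2': 0.5},
              metrics=['acc'])
# fit then takes one target per output:
# model.fit(x_train, {'Softmax_prediction1': y1, 'Softmax_prediction2': y2}, ...)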
