Hi there, how do I choose a loss function for a multi-label problem?
It's different from multi-class output: a multi-label output is a 0/1 vector that can contain multiple ones, whereas a multi-class output is a single one-hot vector.
Thanks
The categorical cross-entropy thing is for multi-class problems, I suppose?
Yes. What you want is binary_crossentropy
@NasenSpray binary_crossentropy is for multi-class, but not multi-label, right?
categorical_crossentropy: 1-of-N (one-hot)
binary_crossentropy: 1-or-more 0/1 labels
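In Keras terms, the multi-label case means a sigmoid output layer plus binary_crossentropy. A minimal sketch (the input size and layer widths here are made up for illustration):

```python
from keras.models import Sequential
from keras.layers import Dense

# Minimal multi-label sketch: each sample has 10 independent 0/1 labels,
# so the last layer is sigmoid (one probability per label, not a softmax
# over them) and the loss is binary_crossentropy.
model = Sequential([
    Dense(64, activation='relu', input_shape=(100,)),
    Dense(10, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
```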
MSE/MAE also work, though binary cross-entropy would generally be preferred.
@kingfengji I'm doing multi-label too. Could you share your code for how to do it, or give one simple example of how to implement multi-label classification? Thanks
@alyato this might help you: multi label image classification
If using binary_crossentropy as the loss function, does it mean we are minimizing the average of the cross-entropies over all classes?
I believe so.
@keunwoochoi Could you explain why binary cross-entropy is preferred for multi-label classification? I thought binary cross-entropy was only for binary classification, where the y label is only 0 or 1. Now that the y label is in the format [1,0,1,0,1,...], do you know how the loss is calculated with binary cross-entropy?
Thanks. My last layer is a softmax layer. When I use 'binary_crossentropy' as the loss I get 99% accuracy, while with other loss functions I get only 10%.
I want to know how the accuracy is calculated.
I get high accuracy, but when I look at the predicted labels, they are all zeros.
@1064950364 Yes, that's how the accuracy is defined, and that's why accuracy doesn't mean much in many multi-label problems. In your true labels there are so many zeros, right?
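A quick sketch of why, with made-up numbers, assuming accuracy is computed element-wise over the label matrix:

```python
import numpy as np

# Made-up sparse multi-label targets: 100 samples x 20 labels, ~5% ones.
rng = np.random.RandomState(0)
y_true = (rng.rand(100, 20) < 0.05).astype(float)

# A useless model that predicts 0 for every label...
y_pred = np.zeros_like(y_true)

# ...still gets very high element-wise ("binary") accuracy,
# simply because most true entries are 0.
accuracy = (y_true == (y_pred > 0.5).astype(float)).mean()
print(accuracy)  # around 0.95
```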
@lipeipei31 More precisely, cross-entropy is preferred over MAE/MSE. With a sigmoid output its gradient with respect to the logits is bounded, and its loss computation (which is in proportion to the gradient applied) is more plausible. In that case it computes the cross-entropy over each output and then averages them.
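Roughly, that averaging looks like this in numpy (a sketch, not Keras' exact implementation, which handles clipping and backend details differently):

```python
import numpy as np

def multilabel_bce(y_true, y_pred, eps=1e-7):
    # Cross-entropy for each 0/1 label independently...
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    per_label = -(y_true * np.log(y_pred)
                  + (1 - y_true) * np.log(1 - y_pred))
    # ...then the average over all outputs (and samples).
    return per_label.mean()

y_true = np.array([1, 0, 1, 0, 1], dtype=float)
y_pred = np.array([0.9, 0.2, 0.7, 0.1, 0.6])
print(multilabel_bce(y_true, y_pred))
```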
@1064950364 Did you compare the two different activations in the last layer, softmax vs. sigmoid?
I need to classify attributes of a face, like eye, hair, and skin colour, facial hair, lighting, and so on. Each has a few sub-categories. So should I apply a sigmoid directly over all the labels, or apply a separate softmax to each subcategory (hair colour, eye colour, etc.)?
Which one will be better in this case?
Or should I combine both, since some subclasses are binary?
@sarthakahuja11, it sounds like you have a multi-output problem where each output is a binary or multi-class classification. I think you should use different loss functions for different outputs.
@lipeipei31 You have identified the problem correctly. So I should choose binary cross-entropy for the binary classifications and categorical cross-entropy for the multi-class classifications, and then combine them in the same model?
@sarthakahuja11 Yes, that's right. And you can easily do that with the Keras functional API:
https://keras.io/getting-started/functional-api-guide/#multi-input-and-multi-output-models. The loss functions can be given as a list, or as a dictionary if you name the outputs.
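Something along these lines (a minimal sketch for the face-attribute case above; all layer sizes and output names are invented for illustration):

```python
from keras.layers import Input, Dense
from keras.models import Model

# Shared trunk feeding two heads.
inputs = Input(shape=(128,))
x = Dense(64, activation='relu')(inputs)

# Multi-class head (e.g. hair colour): softmax + categorical_crossentropy.
hair = Dense(5, activation='softmax', name='hair_colour')(x)
# Binary head (e.g. facial hair yes/no): sigmoid + binary_crossentropy.
beard = Dense(1, activation='sigmoid', name='facial_hair')(x)

model = Model(inputs=inputs, outputs=[hair, beard])
model.compile(
    optimizer='adam',
    loss={'hair_colour': 'categorical_crossentropy',
          'facial_hair': 'binary_crossentropy'},
)
```

Keras sums the per-output losses (optionally weighted via the loss_weights argument) into the single objective it trains on.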
Thanks! @lipeipei31
hi,
if binary cross-entropy works in Keras for multi-label problems, will categorical_crossentropy work for multiple one-hot encoded vectors as well?
My example output is:
[
[0,0,1,0],
[0,0,0,1],
[1,0,0,0]
]
So I have three one-hot encoded vectors. For a single one, the loss function to choose would be categorical cross-entropy. What will Keras do in a case like this?
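My understanding (just a numpy sketch of what I'd expect, not the actual Keras source) is that it computes the cross-entropy along the last axis, i.e. per one-hot row, and then averages:

```python
import numpy as np

# One sample whose target is three one-hot rows, shape (3, 4).
y_true = np.array([[0, 0, 1, 0],
                   [0, 0, 0, 1],
                   [1, 0, 0, 0]], dtype=float)
# Predictions from a softmax applied along the last axis.
y_pred = np.array([[0.10, 0.10, 0.70, 0.10],
                   [0.20, 0.10, 0.10, 0.60],
                   [0.80, 0.10, 0.05, 0.05]])

# Categorical cross-entropy per one-hot row, then the mean over rows.
per_row = -(y_true * np.log(y_pred)).sum(axis=-1)
print(per_row.mean())
```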