Hi there,
I am trying to implement a classification problem with three classes: 0, 1, and 2. I would like to fine-tune my cost function so that misclassification is weighted somehow. In particular, predicting 1 when the true class is 2 should cost twice as much as predicting 0. Written as a table, it should look something like this:
Costs:

                     Predicted
              0     |  1     |  2
         ---------------------------
Actual 0 |    0     |  0.25  |  0.25
       1 |    0.25  |  0     |  0.5
       2 |    0.25  |  0.5   |  0
I really like the Keras framework; it would be nice if it were possible to implement this without having to dig into TensorFlow or Theano code.
Thanks
You could use class_weight.
class_weight applies a weight to all data that belongs to a class; what I need should depend on the specific misclassification.
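For reference, this is how class_weight is passed (a minimal sketch with hypothetical model/data names); it only scales the loss per true class, so it cannot express the pairwise costs in the table above:

# Minimal sketch (hypothetical model/data names): class_weight scales the loss
# of every sample whose true class is the given key, so it cannot express a
# different cost per (true, predicted) pair.
class_weight = {0: 1.0, 1: 1.0, 2: 2.0}   # e.g. count class-2 samples twice
model.fit(X_train, Y_train, batch_size=128, nb_epoch=20, class_weight=class_weight)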
You are absolutely right, I'm sorry I misunderstood your question. I will try to come back with something tomorrow using partial to define the weights. What you want to achieve should be doable with the Keras abstract backend.
Ok so I had the time to quickly test it.
This is a fully reproducible example on MNIST where we put a higher cost on misclassifying a 1 as a 7 and a 7 as a 1.
So if you want to pass constants included in the cost function, just build a new function with partial.
'''Train a simple deep NN on the MNIST dataset.
Get to 98.40% test accuracy after 20 epochs
(there is *a lot* of margin for parameter tuning).
2 seconds per epoch on a K520 GPU.
'''
from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD, Adam, RMSprop
from keras.utils import np_utils
import keras.backend as K
from itertools import product

# Custom loss function with costs
def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

w_array = np.ones((10, 10))
w_array[1, 7] = 1.2
w_array[7, 1] = 1.2

# note: pass weights=w_array here to actually apply the 1/7 penalty defined above
ncce = partial(w_categorical_crossentropy, weights=np.ones((10, 10)))

batch_size = 128
nb_classes = 10
nb_epoch = 20

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))

rms = RMSprop()
model.compile(loss=ncce, optimizer=rms)

model.fit(X_train, Y_train,
          batch_size=batch_size, nb_epoch=nb_epoch,
          show_accuracy=True, verbose=1,
          validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test,
                       show_accuracy=True, verbose=1)
print('Test score:', score[0])
print('Test accuracy:', score[1])
Wow, that's nice. Thanks for the detailed answer!
Try to test it on a toy example to verify that it actually works. If it's what you are looking for, feel free to close the issue!
Keras 1.0 will provide a more flexible way to introduce new objectives and metrics.
Well, I am stuck, I can't make it run in my model, it says:
line 56, in w_categorical_crossentropy
    y_pred_max = K.reshape(y_pred_max, (y_pred.shape[0], 1))
AttributeError: 'Tensor' object has no attribute 'shape'
This is the model I am using:
def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (y_pred.shape[0], 1))
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

w_array = np.ones((3, 3))
w_array[2, 1] = 1.2
w_array[1, 2] = 1.2

ncce = partial(w_categorical_crossentropy, weights=w_array)

def build_model(X_data):
    data_dim = X_data.shape[2]
    timesteps = X_data.shape[1]
    model = Sequential()
    model.add(BatchNormalization(input_shape=(timesteps, data_dim)))
    model.add(GRU(output_dim=50, init='glorot_normal',
                  return_sequences=True, W_regularizer=l2(0.00), U_regularizer=l1(0.01), dropout_W=0.2))
    model.add(GRU(output_dim=50, init='glorot_normal',
                  return_sequences=True, W_regularizer=l2(0.00), U_regularizer=l1(0.01), dropout_W=0.2))
    model.add(GRU(50, init='glorot_normal', return_sequences=False, dropout_W=0.01,
                  W_regularizer=l2(0.00), U_regularizer=l1(0.01)))
    model.add(Dense(3, init='glorot_normal'))
    model.add(Activation('softmax'))
    model.compile(loss=ncce, optimizer='Adam')
    return model
Sure, sorry, I was using Theano functionalities. I replaced the following line in my previous example:
y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
It should do the trick!
Sounds like the way to go; I was using TensorFlow as the backend. I'll tell you if it works as soon as possible. Thanks!
I still get an error:
line 57, in w_categorical_crossentropy
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 271, in reshape
    return tf.reshape(x, shape)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 682, in reshape
    name=name)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 411, in apply_op
    as_ref=input_arg.is_ref)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 529, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/constant_op.py", line 178, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/constant_op.py", line 161, in constant
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape))
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 319, in make_tensor_proto
    _AssertCompatible(values, dtype)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 259, in _AssertCompatible
    (dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.
I've tried your first reply under the Theano backend and it works, though.
OK, I was not sure how K.shape would behave with TensorFlow. It seems you should use:
y_pred_max = K.reshape(y_pred_max, (K.int_shape(y_pred)[0], 1))
I get more or less the same:
line 59, in w_categorical_crossentropy
    y_pred_max = K.reshape(y_pred_max, (K.int_shape(y_pred)[0], 1))
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 271, in reshape
    return tf.reshape(x, shape)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 682, in reshape
    name=name)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 411, in apply_op
    as_ref=input_arg.is_ref)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 529, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/constant_op.py", line 178, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/constant_op.py", line 161, in constant
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape))
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 319, in make_tensor_proto
    _AssertCompatible(values, dtype)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 259, in _AssertCompatible
    (dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got None of type '_Message' instead.
It seems like it cannot get the shape of y_pred as an integer, right?
Mm, OK, I will take a look at it today and work directly with tensors to try to find a way to have it work properly for both backends.
Hi there, I tried something like that:
def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, K.shape(y_pred))
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (K.cast(weights[c_t, c_p], tf.float32) * K.cast(y_pred_max_mat[:, c_p], tf.float32) * K.cast(y_true[:, c_t], tf.float32))
    return K.categorical_crossentropy(y_pred, y_true) * final_mask
I think it will do it.
The latter only works for non-recurrent networks, but this code works for RNNs following the same idea. It only works for TensorFlow though; I couldn't find a way to reshape a tensor the way we want with the Keras backend:
import tensorflow as tf

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = tf.expand_dims(y_pred_max, 1)
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (K.cast(weights[c_t, c_p], K.floatx()) * K.cast(y_pred_max_mat[:, c_p], K.floatx()) * K.cast(y_true[:, c_t], K.floatx()))
    return K.categorical_crossentropy(y_pred, y_true) * final_mask
My bad, just replacing tf.expand_dims with K.expand_dims worked for me:
def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.expand_dims(y_pred_max, 1)
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (K.cast(weights[c_t, c_p], K.floatx()) * K.cast(y_pred_max_mat[:, c_p], K.floatx()) * K.cast(y_true[:, c_t], K.floatx()))
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

w_array = np.ones((3, 3))
w_array[2, 1] = 1.2
w_array[1, 2] = 1.2

ncce = partial(w_categorical_crossentropy, weights=w_array)
ncce.__name__ = 'w_categorical_crossentropy'
The last line is necessary for tensorboard callback to work, thanks!!
Is the Mar 31 solution for @ayalalazaro above still recommended as of v1.2? (Noticed @tboquet's comment: _Keras 1.0 will provide a more flexible way to introduce new objectives and metrics_.)
My problem is binary classification where true positive accuracy is more important, and some false negatives are acceptable. Would I need the approach above to achieve that objective? I tried class_weights = {0: 1, 1: 10}, but saw no change. (Examples are 25% positive, 75% negative.)
Just a small detail about the w_categorical_crossentropy implementation: there is no need to cast weights and y_true. The following code works in Theano and TensorFlow:
def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    return K.categorical_crossentropy(y_pred, y_true) * final_mask
Hello, I am trying to implement this in TensorFlow.
I am confused as to what partial is in the line:
ncce = partial(w_categorical_crossentropy, weights=np.ones((10,10)))
I do not see it defined anywhere in this thread, and get
NameError: name 'partial' is not defined
as output...
Thanks
@jerpint It's available from functools, i.e.
import functools
ncce = functools.partial(w_categorical_crossentropy, weights=np.ones((10,10)))
I am trying to incorporate @curiale's implementation of w_categorical_crossentropy for a binary classification where the output of my model has shape (?, 5120, 2), but I am running into a couple of issues:
1) Assuming my class weight distribution is e.g. class_weights = [0.85144055, 1.14855945], what should the w_array be like? Something like this?
w_array = np.ones((2,2))
w_array[1,0] = class_weights[0]
w_array[0,1] = class_weights[1]
ncce = functools.partial(w_categorical_crossentropy, weights=w_array)
2) When I run model.compile(optimizer=Adam(lr=1e-5), loss=ncce, metrics=[dice_coef]) I get the following error:
ValueError: Dimensions must be equal, but are 5120 and 2 for 'mul_339' (op: 'Mul') with input shapes: [?,5120], [?,2].
These are the variables' shapes inside w_categorical_crossentropy:
y_pred shape: (?, 5120, 2), y_true shape: (?, ?, ?), final_mask shape: (?, 2)
Frankly, I am lost in the w_categorical_crossentropy function (e.g. what is final_mask supposed to be? What is its shape?). Any help would be much appreciated.
Hnn, I'm sorry but I don't quite understand: what does this (?, 5120, 2) entail? If ? is the batch size and 2 is the number of classes, what is 5120?
@recluze Sorry for the confusion. Let me clarify: the model is an image segmentation network with output (?, 5120, 2), where ? is the batch_size, 5120 is the total_number_of_pixels_per_image, and 2 is the number of classes (foreground, background). So basically the network does classification per pixel.
Hnn, I think the last 2 should be removed: since you have two classes, a single output with binary crossentropy instead of a categorical one should work. I don't think a 3-dim output shape would work with w_categorical_crossentropy...
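Not sure about your exact architecture, but a minimal sketch of what that could look like (hypothetical final layers, assuming Keras 2 where Dense maps the last axis, so one sigmoid unit per pixel replaces the 2-class softmax):

# Hypothetical final layers: one sigmoid output per pixel instead of a 2-class softmax
model.add(Dense(1, activation='sigmoid'))   # output shape becomes (batch, 5120, 1)
model.compile(optimizer=Adam(lr=1e-5), loss='binary_crossentropy')
# y_true is then the foreground mask of shape (batch, 5120, 1) instead of one-hot pairs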
So I hacked Keras' backend binary_crossentropy function into the following, using weighted_cross_entropy_with_logits() to pass class weights:
def w_binary_crossentropy(output, target, weights):
    output = tf.clip_by_value(output, tf.cast(_EPSILON, dtype=_FLOATX),
                              tf.cast(1. - _EPSILON, dtype=_FLOATX))
    output = tf.log(output / (1 - output))
    # note: tf.nn.weighted_cross_entropy_with_logits expects (targets, logits, pos_weight)
    return tf.nn.weighted_cross_entropy_with_logits(output, target, weights)
and in my code I call it like this:
def wrapped_partial(func, *args, **kwargs):
    partial_func = functools.partial(func, *args, **kwargs)
    functools.update_wrapper(partial_func, func)
    return partial_func

# weight is the ratio of positives to negatives
ncce = wrapped_partial(w_binary_crossentropy, weights=0.01)
model.compile(optimizer=Adam(lr=1e-5), loss=ncce, metrics=[dice_coef])
But I am not sure if these weights are the class weights I am after. It is not clear from the definition of weighted_cross_entropy_with_logits whether this does class balancing. I just wanted to share it here with everyone. Any comments are much appreciated.
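For reference, the TensorFlow docs define weighted_cross_entropy_with_logits element-wise as targets * -log(sigmoid(logits)) * pos_weight + (1 - targets) * -log(1 - sigmoid(logits)), i.e. pos_weight only rescales the positive-target term. A small numpy restatement of that formula (not the real API, just a sketch for inspecting values):

import numpy as np

def weighted_bce_reference(targets, logits, pos_weight):
    # numpy restatement of the formula from the TF docs:
    # pos_weight multiplies only the loss on positive targets
    p = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    return -(pos_weight * targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))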
@mongoose54 I'm currently playing around with this and will post the results back; it shouldn't be hard to get a version with fixed weights.
@mongoose54 This is what I came up with for binary crossentropy, based on tensorpack's version.
TF only, but no need to change Keras.
class WeightedBinaryCrossEntropy(object):

    def __init__(self, pos_ratio):
        neg_ratio = 1. - pos_ratio
        self.pos_ratio = tf.constant(pos_ratio, tf.float32)
        self.weights = tf.constant(neg_ratio / pos_ratio, tf.float32)
        self.__name__ = "weighted_binary_crossentropy({0})".format(pos_ratio)

    def __call__(self, y_true, y_pred):
        return self.weighted_binary_crossentropy(y_true, y_pred)

    def weighted_binary_crossentropy(self, y_true, y_pred):
        # Transform to logits
        epsilon = tf.convert_to_tensor(K.common._EPSILON, y_pred.dtype.base_dtype)
        y_pred = tf.clip_by_value(y_pred, epsilon, 1 - epsilon)
        y_pred = tf.log(y_pred / (1 - y_pred))
        cost = tf.nn.weighted_cross_entropy_with_logits(y_true, y_pred, self.weights)
        return K.mean(cost * self.pos_ratio, axis=-1)
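Usage would then be something like this (hypothetical model; pos_ratio being the fraction of positive samples in the training set):

# Hypothetical usage: ~25% of the training samples are positive
model.compile(optimizer='adam',
              loss=WeightedBinaryCrossEntropy(pos_ratio=0.25),
              metrics=['accuracy'])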
Thank you @dralves, that helps me a lot.
Just a quick question: when I compare the outputs of your class with 0.5 positive weight and the binary_crossentropy loss function from Keras, the results differ by a factor of 2.
Do you know why, and which one is correct?
import tensorflow as tf
import keras.backend as K
import numpy as np
from keras.losses import binary_crossentropy

class WeightedBinaryCrossEntropy(object):

    def __init__(self, pos_ratio):
        neg_ratio = 1. - pos_ratio
        self.pos_ratio = tf.constant(pos_ratio, tf.float32)
        self.weights = tf.constant(neg_ratio / pos_ratio, tf.float32)
        self.__name__ = "weighted_binary_crossentropy({0})".format(pos_ratio)

    def __call__(self, y_true, y_pred):
        return self.weighted_binary_crossentropy(y_true, y_pred)

    def weighted_binary_crossentropy(self, y_true, y_pred):
        # Transform to logits
        epsilon = tf.convert_to_tensor(K.common._EPSILON, y_pred.dtype.base_dtype)
        y_pred = tf.clip_by_value(y_pred, epsilon, 1 - epsilon)
        y_pred = tf.log(y_pred / (1 - y_pred))
        cost = tf.nn.weighted_cross_entropy_with_logits(y_true, y_pred, self.weights)
        return K.mean(cost * self.pos_ratio, axis=-1)

y_true_arr = np.array([0, 1, 0, 1], dtype="float32")
y_pred_arr = np.array([0, 0, 1, 1], dtype="float32")
y_true = tf.constant(y_true_arr)
y_pred = tf.constant(y_pred_arr)

with tf.Session().as_default():
    print(WeightedBinaryCrossEntropy(0.5)(y_true, y_pred).eval())
    print(binary_crossentropy(y_true, y_pred).eval())
Outputs
4.00756
8.01512
@dardelet good point.
This comes directly from tensorpack's implementation, which returns the same results.
If you remove the final cost * self.pos_ratio you get the same results as with normal sigmoid cross entropy.
I do see that in the original implementation of balanced-classes cross entropy (from this paper) the authors multiply the loss from the positive labels by the positive ratio and the loss from the negative labels by the negative ratio.
I'll look into it a bit more.
Thank you @dralves, any new findings on the mentioned difference?
In the example, what should the variable final_mask look like? I tried to use the weights matrix as:
weights = np.matrix([[   0,    0,    0,    0,    0,    0, 1000],
                     [   0,    0,    0,    0,    0,    0, 1000],
                     [   0,    0,    0,    0,    0,    0, 1000],
                     [   0,    0,    0,    0,    0,    0, 1000],
                     [   0,    0,    0,    0,    0,    0, 1000],
                     [   0,    0,    0,    0,    0,    0, 1000],
                     [1000, 1000, 1000, 1000, 1000, 1000, 1000]])
It seems everything except the last class should be messed up, because those weights are always zero. However, the confusion matrix is:
[[ 144. 5. 0. 0. 9. 0. 20.]
[ 9. 150. 9. 0. 0. 0. 14.]
[ 7. 8. 109. 6. 2. 1. 17.]
[ 4. 0. 5. 93. 41. 4. 4.]
[ 11. 1. 0. 12. 123. 6. 21.]
[ 0. 0. 1. 5. 12. 126. 8.]
[ 39. 15. 16. 4. 39. 11. 326.]]
I am using keras with tensorflow as backend. Any ideas of why this happens?
As @recluze has mentioned above, w_categorical_crossentropy doesn't work with data that's rank 3+ (for example an LSTM with return_sequences=True, TimeDistributed(Dense), etc.).
I have changed the above example to support rank 3+ tensors and wrapped it in a class, just like the WeightedBinaryCrossEntropy above.
class WeightedCategoricalCrossEntropy(object):

    def __init__(self, weights):
        nb_cl = len(weights)
        self.weights = np.ones((nb_cl, nb_cl))
        for class_idx, class_weight in weights.items():
            self.weights[0][class_idx] = class_weight
            self.weights[class_idx][0] = class_weight
        self.__name__ = 'w_categorical_crossentropy'

    def __call__(self, y_true, y_pred):
        return self.w_categorical_crossentropy(y_true, y_pred)

    def w_categorical_crossentropy(self, y_true, y_pred):
        nb_cl = len(self.weights)
        final_mask = K.zeros_like(y_pred[..., 0])
        y_pred_max = K.max(y_pred, axis=-1)
        y_pred_max = K.expand_dims(y_pred_max, axis=-1)
        y_pred_max_mat = K.equal(y_pred, y_pred_max)
        for c_p, c_t in itertools.product(range(nb_cl), range(nb_cl)):
            w = K.cast(self.weights[c_t, c_p], K.floatx())
            y_p = K.cast(y_pred_max_mat[..., c_p], K.floatx())
            y_t = K.cast(y_pred_max_mat[..., c_t], K.floatx())
            final_mask += w * y_p * y_t
        return K.categorical_crossentropy(y_pred, y_true) * final_mask
The constructor expects a dictionary with the same structure as the class_weight param of model.fit:
{0: 1.0, 1: 29.6, 2: 17.69, 3: 27.08, 4: 11.04, 5: 45.45, 6: 136.344, 7: 57.304}
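For example (hypothetical model), compilation would then look like:

# Hypothetical usage with the same dictionary structure as class_weight
loss = WeightedCategoricalCrossEntropy({0: 1.0, 1: 29.6, 2: 17.69, 3: 27.08,
                                        4: 11.04, 5: 45.45, 6: 136.344, 7: 57.304})
model.compile(optimizer='adam', loss=loss, metrics=['accuracy'])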
@asiron Thanks for the code.
Just out of curiosity, what rules do you follow for assigning different weights to different classes?
Any specific formula? Should they sum to 1?
@alirzsedghi I think this was answered well in #5116
Hey @asiron, thank you for sharing this code. I was wondering if you also figured out a way to save the weights with which the loss was initialized when saving the model. This would be really helpful, since the weights would then be loaded along with the model.
In this version of the custom loss function this is not supported. I am not sure if this functionality is supported by Keras. Any ideas?
Here is a sample code that reproduces the problem.
import keras
import itertools
import numpy as np
from keras import backend as K
from keras.models import Model
from keras.layers import Input, Dense, Activation
from ipdb import set_trace as bp

class WeightedCategoricalCrossEntropy(object):

    def __init__(self, weights):
        nb_cl = len(weights)
        self.weights = np.ones((nb_cl, nb_cl))
        for class_idx, class_weight in weights.items():
            self.weights[0][class_idx] = class_weight
            self.weights[class_idx][0] = class_weight
        self.__name__ = 'w_categorical_crossentropy'

    def __call__(self, y_true, y_pred):
        return self.w_categorical_crossentropy(y_true, y_pred)

    def w_categorical_crossentropy(self, y_true, y_pred):
        nb_cl = len(self.weights)
        final_mask = K.zeros_like(y_pred[..., 0])
        y_pred_max = K.max(y_pred, axis=-1)
        y_pred_max = K.expand_dims(y_pred_max, axis=-1)
        y_pred_max_mat = K.equal(y_pred, y_pred_max)
        for c_p, c_t in itertools.product(range(nb_cl), range(nb_cl)):
            w = K.cast(self.weights[c_t, c_p], K.floatx())
            y_p = K.cast(y_pred_max_mat[..., c_p], K.floatx())
            y_t = K.cast(y_pred_max_mat[..., c_t], K.floatx())
            final_mask += w * y_p * y_t
        return K.categorical_crossentropy(y_pred, y_true) * final_mask

# create a toy model
i = Input(shape=(100,))
h = Dense(7)(i)
o = Activation('softmax')(h)
model = Model(inputs=i, outputs=o)

# compile the model with the custom loss
loss = WeightedCategoricalCrossEntropy({0: 1.0, 1: 29.6, 2: 17.69, 3: 27.08, 4: 11.04, 5: 45.45, 6: 136.344})
model.compile(loss=loss, optimizer='sgd')
print "Compilation OK!"

# fit the model
model.fit(np.random.random((64, 100)), np.random.random((64, 7)), epochs=10)

# save and load the model
model.save('model.h5')
model = keras.models.load_model('model.h5', custom_objects={'w_categorical_crossentropy': WeightedCategoricalCrossEntropy})
print "Load OK!"
Thanks to those who contributed code here. Helped me along a lot.
In the implementation of @asiron (which I tested because I needed to handle rank 3+ tensors), I believe a small error crept in relative to the upstream @tboquet implementation.
y_t = K.cast(y_pred_max_mat[..., c_t], K.floatx())
should be
y_t = K.cast(y_true[..., c_t], K.floatx())
otherwise the boolean logic is comparing y_pred with y_pred (instead of y_pred with y_true).
A slightly different point is that the way in which the class_weights dictionary is transformed into the weights matrix within WeightedCategoricalCrossEntropy does not seem consistent with what the original poster was trying to achieve, which is to specify pairwise weights for all combinations of true and predicted values. As it stands it populates only the 0th row and column, penalising misclassifications of the 0th class as another class, or another class as the 0th class. Maybe better to supply the complete matrix instead? Just a thought. Thanks again to contributors.
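One possible variant along those lines (just a sketch, not tested) would be a constructor that takes the complete pairwise matrix directly and reuses the same loss body:

import numpy as np

class WeightedCategoricalCrossEntropyMatrix(WeightedCategoricalCrossEntropy):
    """Hypothetical variant: accepts the full (nb_cl, nb_cl) pairwise weight matrix."""
    def __init__(self, weight_matrix):
        self.weights = np.asarray(weight_matrix, dtype='float64')
        self.__name__ = 'w_categorical_crossentropy'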
Question: do the classes need to be in one-hot representation for @asiron's code?
And @sry002, I think you are right.
Whilst the @asiron code with @sry002's alteration seems to 'work' for me,
it is not only considerably slower than not weighting the loss, it also forces my computer out of memory.
I think, though, this is just a case of me using too complex a model, with too many input examples, for my lowly desktop to handle :(
@nd26, looking at the code above your comment,
WeightedCategoricalCrossEntropy({0: 1.0, 1: 29.6, 2: 17.69, 3: 27.08, 4: 11.04, 5: 45.45, 6: 136.344})
I think this suggests no, you don't pass the classes into WeightedCategoricalCrossEntropy as one-hot representations.
(Unless you mean the output matrix passed into model.fit should be one-hot; I think that should still be one-hot.)
Would there be a way to pass in weights that are different for each sample, giving each individual sample its own weight depending on whether it is predicted accurately or not? I.e. different payoffs depending on the item?
Hi @dickreuter, I've managed to pass a weight for each sample just by adding a new layer into the classification (y_true). Then I modified the objective and metric functions to properly unravel the weights before computing the operations.
Do you have an example of how this looks? How do I split the tensor in the loss function to extract y_true and the weights?
@tboquet Have you tested your code?
It seems like you need a wrapper around partial to make things work, as described here: http://louistiao.me/posts/adding-__name__-and-__doc__-attributes-to-functoolspartial-objects/
In my case I have tried weighted binary crossentropy:
from functools import partial, update_wrapper

def wrapped_partial(func, *args, **kwargs):
    partial_func = partial(func, *args, **kwargs)
    update_wrapper(partial_func, func)
    return partial_func

def binary_crossentropy_weighted(y_true, y_pred, class_weights):
    y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
    loss = K.mean(class_weights * (-y_true * K.log(y_pred) - (1.0 - y_true) * K.log(1.0 - y_pred)), axis=-1)
    return loss

custom_loss = wrapped_partial(binary_crossentropy_weighted, class_weights=np.array([1.0, 2.0]))
model.compile(optimizer=Adadelta(), loss=[custom_loss])
Sorry for my late response @dickreuter. If you want to weight the batch with a single spatial weight, I recommend an option similar to the one proposed by @stergioc instead of just a wrapped function. However, if you want to weight each sample in the batch with a particular weight, you need to pass the weight inside y_true. I didn't find another way to do that, because it was impossible for me to identify the samples inside the batch. An example of what I did:
class WeightedLoss(object):

    def __init__(self, alpha):
        self.alpha = alpha
        if K.image_dim_ordering() == 'th':
            self.stack_axis = 1
        else:
            self.stack_axis = -1
        self.__name__ = 'w_loss'

    def __call__(self, y_true, y_pred):
        return self.w_loss(y_true, y_pred)

    def w_loss(self, y_true, y_pred):
        # y_true should have the weights concatenated in the last dimension
        slice_stack = [slice(None) for i in range(y_true.get_shape().ndims)]
        slice_stack[self.stack_axis] = slice(2, None)
        weights = y_true[slice_stack]
        slice_stack[self.stack_axis] = slice(0, 2)
        y_true = y_true[slice_stack]
        ........
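On the data side, that means concatenating the weight map onto the one-hot targets before calling fit; a minimal sketch, assuming a channels-last 2-class target and a hypothetical per-pixel weight_map array:

import numpy as np

# Hypothetical shapes: y_onehot is (batch, H, W, 2), weight_map is (batch, H, W, 1)
y_true_with_weights = np.concatenate([y_onehot, weight_map], axis=-1)  # (batch, H, W, 3)
model.compile(optimizer='adam', loss=WeightedLoss(alpha=1.0))
model.fit(X_train, y_true_with_weights, batch_size=8, epochs=10)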
Is there any way to use the weights for binary_crossentropy only for misclassification? The examples above use class weights but I only want to use the weight when a misclassification occurs
Hey all,
I'm using the weighted categorical cross entropy function described above by @ayalalazaro, but it doesn't seem to work as expected. My understanding is that if I pass a weight array of just 1's, it should replicate what normally happens with Keras' categorical cross entropy. But that's not what I'm seeing. Here's some example code:
def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    cross_ent = K.categorical_crossentropy(y_pred, y_true, from_logits=False)
    return cross_ent * final_mask

w_array = np.ones((2, 2))
custom_loss = partial(w_categorical_crossentropy, weights=w_array)
custom_loss.__name__ = 'w_categorical_crossentropy'

default_model = Sequential([
    Dense(128, input_shape=(20,), activation="relu"),
    BatchNormalization(axis=1),
    Dropout(0.6),
    Dense(2, activation="sigmoid")
])
default_model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=["accuracy"])
default_model.optimizer.lr = 0.001
default_model.fit(x=trainable_data.values, y=train_target.values, validation_split=0.1, epochs=20, shuffle=True, batch_size=64)

## Epoch 20/20
## 2018/2018 [==============================] - 0s 73us/step - loss: 0.6188 - acc: 0.6571
## - val_loss: 0.6402 - val_acc: 0.6222

# THEN USE CUSTOM LOSS, WHICH SHOULD BE THE SAME
custom_model = Sequential([
    Dense(128, input_shape=(20,), activation="relu"),
    BatchNormalization(axis=1),
    Dropout(0.6),
    Dense(2, activation="sigmoid")
])
custom_model.compile(optimizer='rmsprop', loss=custom_loss, metrics=["accuracy"])
custom_model.optimizer.lr = 0.001

## Epoch 20/20
## 2018/2018 [==============================] - 0s 90us/step - loss: 1.0241e-04 - acc:
## 0.6065 - val_loss: 3.9465e-06 - val_acc: 0.6089
Notice that the custom model pretty quickly gets to essentially zero loss. Which sounds cool, except it doesn't make any sense, and really it means my model stopped learning anything new after only a few epochs. It may be worth noting that I only actually have 2 classes here. I want to weight mis-classifications higher, and thought I could do so with the code above. But it doesn't seem to work. Anyone have any ideas for how I can weight mis-classifications higher on a binary problem?
@ayalalazaro OK, so I found the error. A silly, but big one. The function listed above returns K.categorical_crossentropy(y_pred, y_true). But I checked the source code here, and that flips the arguments. The real signature is K.categorical_crossentropy(y_true, y_pred, from_logits=False). Truth goes first, then predictions.
Once I made that switch, it started working!
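For reference, the corrected return statement (truth first, then predictions) is:

# y_true first, then y_pred, matching the Keras backend signature
return K.categorical_crossentropy(y_true, y_pred) * final_mask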
Hi,
I use Keras 2.0.8 and Python 2.7.12
I tried to run this and get the output
$ python testt.py
Using TensorFlow backend.
60000 train samples
10000 test samples
Traceback (most recent call last):
File "testt.py", line 69, in
model.compile(loss=ncce, optimizer=rms)
File "build/bdist.linux-x86_64/egg/keras/models.py", line 784, in compile
File "build/bdist.linux-x86_64/egg/keras/engine/training.py", line 850, in compile
File "build/bdist.linux-x86_64/egg/keras/engine/training.py", line 450, in weighted
File "testt.py", line 29, in w_categorical_crossentropy
final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
File "/usr/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 881, in r_binary_op_wrapper
return func(x, y, name=name)
File "/usr/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1088, in _mul_dispatch
return gen_math_ops._mul(x, y, name=name)
File "/usr/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1449, in _mul
result = _op_def_lib.apply_op("Mul", x=x, y=y, name=name)
File "/usr/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 589, in apply_op
param_name=input_name)
File "/usr/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 60, in _SatisfiesTypeConstraint
", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
TypeError: Value passed to parameter 'x' has DataType bool not in list of allowed values: float16, float32, float64, uint8, int8, uint16, int16, int32, int64, complex64, complex128
Is this still the best approach? @fchollet, just to recap the problem: in a classification problem with images of cats, dogs and snakes, I need to penalize the case in which a snake is classified as a dog twice as much as the other cases. Do we really need to go through partial to do this?
I want to build a binary classifier that does the following with one input neuron (giving x) and one output neuron:
If the output neuron is 0: the payoff is 0
If the output neuron is 1 and correct: the payoff is +1
If the output neuron is 1 and incorrect: the payoff is -x (x is different for each individual sample)
How can I maximize the payoff with a neural network?
How can I create a loss function that would do that? Can I use keras directly or do I need a custom loss function? Does the loss function have to be differentiable? Can I use binary cross entropy or even mse?
@dickreuter you can do this with keras, but you need a custom loss function. And loss functions always minimize a number, so if you want to "maximize" a payoff, you should just flip your payoffs and make them negative. Then the optimizer will make it the most negative it can, which is equivalent to maximizing.
Now, wanting a different payoff for each X sounds tricky. Probably possible by doing some sorcery where you set shuffle to False, and keep track of the batches or something, but I'm not sure exactly how. Could you use an average or median? If so, then you can use the code listed above in this issue to create the custom loss function, and then just minimize it. You might at least try the average/median approach, and see if it helps your problem. If it does, then you could investigate further optimization by trying to get a different loss for each X sample.
No, I can't take the average or median, as each sample has distinct features (I said it has just one input neuron, but in reality there are additional input neurons).
I know. I meant use the average/median for your loss function. I did not mean change your X's. Just pick some payoff for each X that is a reasonable default guess. I don't know your domain, so I can't comment further. But was just saying, if the custom loss function you're talking about will actually improve your model, then it would likely still improve it (over simply binary cross entropy) even if you use an average or a median. If you see improvement over binary cross entropy, then you can try to optimize further by figuring out how to have a custom payoff for each sample.
I don't think this would work in my case, as the model would need to punish large negative payoffs more than small positive ones, so that the payoff can be maximized.
Let me rephrase the problem again:
Is there a way in Keras or TensorFlow to give samples an extra weight only if they are incorrectly classified, i.e. a combination of class weight and sample weight, but only applying the sample weight for one of the outcomes in a binary class (averaging is not an option)? How can this be achieved?
@curiale I have an issue that seems to have no straightforward solution in Keras. My server runs Ubuntu 14.04 and Keras with the TensorFlow backend. It has 4 Nvidia GeForce GTX 1080 GPUs.
I am trying to test the best available implementation of weighted categorical cross entropy (https://github.com/keras-team/keras/issues/2115, curiale's comment on Jan 20, 2017).
The input array Xtrain has shape (800, 40), where 800 is the number of samples and 40 the input feature dimension. Similarly, Xtest has shape (400, 40). The problem is a multiclass scenario with three classes. The following code is used, but an error shows up indicating a mismatch between GPU and batch size, which is difficult to address; please provide some pointers.
import keras
from keras.models import Sequential, Model, load_model
from keras.layers.embeddings import Embedding
from keras.layers.core import Activation, Dense, Dropout, Reshape
from keras.optimizers import SGD, Adam, RMSprop
# from keras.layers import TimeDistributed, Merge, Conv1D, Conv2D, Flatten, MaxPooling2D, Conv2DTranspose, UpSampling2D, RepeatVector
# from keras.layers.recurrent import GRU, LSTM
# from keras.datasets.data_utils import get_file
# import tarfile
from functools import partial, update_wrapper
from keras.callbacks import TensorBoard
from time import time
from sklearn.model_selection import KFold
import numpy as np
from keras.callbacks import EarlyStopping
import tensorflow as tf
import scipy.io
from keras import backend as K
from keras.layers import Input, Lambda
import os
from keras import optimizers
from matplotlib import pyplot
from sklearn.preprocessing import MinMaxScaler
# os.export CUDA_VISIBLE_DEVICES="0,1"
import keras, sys
from matplotlib import pyplot
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# from keras.utils import np_utils
from itertools import product
from keras.layers import Input

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = weights.shape[1]  # len(weights[0,:])
    print weights.shape
    print nb_cl
    print y_pred
    print y_true
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)  # returns the maximum value along an axis in a tensor
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    # ypred_tensor = K.constant(y_pred, dtype=K.set_floatx('float32'))
    # ytrue_tensor = K.constant(y_true, dtype=K.set_floatx('float32'))
    return K.categorical_crossentropy(y_true, y_pred) * final_mask

def get_mat_data(add, in1, in2):
    # Assuming sample_matlab_file.mat has 2 matrices A and B
    matData = scipy.io.loadmat(add)
    matrixA = matData[in1]
    matrixA1 = matData[in2]
    matrixB = matData['Ytrain']
    matrixB1 = matData['Ytest']
    weights = matData['w']
    matrixC = matData['Ytrainclassify']
    matrixC1 = matData['Ytestclassify']
    nfold = matData['nfold']
    return matrixA, matrixA1, matrixB, matrixB1, weights, matrixC, matrixC1, nfold

def wrapped_partial(func, *args, **kwargs):
    partial_func = partial(func, *args, **kwargs)
    update_wrapper(partial_func, func)
    return partial_func

def gen_model():
    input = Input(shape=(40,))
    # m1 = Sequential()
    # m1.add(conv_model)
    # m1.add(Conv2D(15, (5,5), strides=(1, 1), activation='relu', input_shape=(1,30,125), kernel_initializer='glorot_uniform'))  # temporal filters, theano
    # m1.add(Dropout(0.2))
    # m1.add(Conv2D(15, (5,1), strides=(1, 1), activation='relu', kernel_initializer='glorot_uniform'))  # spatial filters
    # m1.add(Dropout(0.2))
    # m1.add(Flatten())
    # m1.add(Dropout(0.2))
    x1 = Dense(200, activation='relu', name='dense_1')(input)
    x2 = Dropout(0.2)(x1)
    x3 = Dense(100, activation='relu', name='dense_2')(x2)
    x4 = Dropout(0.2)(x3)
    x5 = Dense(3, activation='softmax', name='softmax_layer')(x4)
    model = Model(input=input, output=[x5])
    return model

in1 = 'Xtrain'
in2 = 'Xtest'
add = '/home/tharun/all_mat_files/test_keras.mat'
Xtrain, Xtest, Ytrain, Ytest, weights, Ytrainclassify, Ytestclassify, nfold = get_mat_data(add, in1, in2)
nb_classes = 3
print Xtrain.shape, Xtest.shape, Ytrain.shape, Ytest.shape, weights.shape, Ytrainclassify.shape, Ytestclassify.shape
wts = np.array([[1/weights[:,0], 1, 1], [1, 1/weights[:,1], 1], [1, 1, 1/weights[:,2]]])
print 'wts:'
print wts.shape

# convert class vectors to binary class matrices
Y_train = keras.utils.to_categorical(Ytrainclassify[:,None], nb_classes)
Y_test = keras.utils.to_categorical(Ytestclassify[:,None], nb_classes)
Xtrain = Xtrain.astype('float32')
Xtest = Xtest.astype('float32')
print Xtrain.shape
print Y_train.shape
print Xtest.shape
print Y_test.shape

ncce = wrapped_partial(w_categorical_crossentropy, wts)
batch_size = 10
nb_classes = 3
nb_epoch = 1

model = gen_model()
# model.compile(loss=ncce, optimizer="adam")
model.summary()
rms = SGD()
model.compile(loss=ncce, optimizer=rms)
model.fit(Xtrain, Y_train, batch_size=batch_size, nb_epoch=nb_epoch)
score = model.evaluate(Xtest, Y_test)
print('Test score:', score[0])
print('Test accuracy:', score[1])

# saving weights
model.save('model_classify_weights.h5')
Error:
python /home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py
/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
(800, 40) (400, 40) (800, 1) (400, 1) (1, 3) (800, 1) (400, 1)
wts:
(3, 3)
(800, 40)
(800, 3)
(400, 40)
(400, 3)
/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py:129: UserWarning: Update your `Model` call to the Keras 2 API: `Model(outputs=[<tf.Tenso..., inputs=Tensor("in...)`
model = Model(input=input, output=[x5])
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 40) 0
_________________________________________________________________
dense_1 (Dense) (None, 200) 8200
_________________________________________________________________
dropout_1 (Dropout) (None, 200) 0
_________________________________________________________________
dense_2 (Dense) (None, 100) 20100
_________________________________________________________________
dropout_2 (Dropout) (None, 100) 0
_________________________________________________________________
softmax_layer (Dense) (None, 3) 303
=================================================================
Total params: 28,603
Trainable params: 28,603
Non-trainable params: 0
_________________________________________________________________
(?, 3)
3
Tensor("softmax_layer_target:0", shape=(?, ?), dtype=float32)
[[array([1.41292294]) 1 1]
[1 array([7.328564]) 1]
[1 1 array([2.38611435])]]
/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py:176: UserWarning: The `nb_epoch` argument in `fit` has been renamed `epochs`.
model.fit(Xtrain, Y_train,batch_size=batch_size, nb_epoch=nb_epoch)
Epoch 1/1
2018-02-13 15:41:44.382214: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-02-13 15:41:44.758387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:05:00.0
totalMemory: 7.92GiB freeMemory: 7.42GiB
2018-02-13 15:41:44.992640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:06:00.0
totalMemory: 7.92GiB freeMemory: 7.80GiB
2018-02-13 15:41:45.225696: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 2 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:09:00.0
totalMemory: 7.92GiB freeMemory: 7.80GiB
2018-02-13 15:41:45.458070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 3 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:0a:00.0
totalMemory: 7.92GiB freeMemory: 7.80GiB
2018-02-13 15:41:45.461078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2018-02-13 15:41:45.461151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 2 3
2018-02-13 15:41:45.461160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0: Y Y Y Y
2018-02-13 15:41:45.461165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1: Y Y Y Y
2018-02-13 15:41:45.461170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 2: Y Y Y Y
2018-02-13 15:41:45.461175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 3: Y Y Y Y
2018-02-13 15:41:45.461191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:05:00.0, compute capability: 6.1)
2018-02-13 15:41:45.461198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:06:00.0, compute capability: 6.1)
2018-02-13 15:41:45.461204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:2) -> (device: 2, name: GeForce GTX 1080, pci bus id: 0000:09:00.0, compute capability: 6.1)
2018-02-13 15:41:45.461209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:3) -> (device: 3, name: GeForce GTX 1080, pci bus id: 0000:0a:00.0, compute capability: 6.1)
Traceback (most recent call last):
File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 239, in <module>
main()
File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 176, in main
model.fit(Xtrain, Y_train,batch_size=batch_size, nb_epoch=nb_epoch)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 1598, in fit
validation_steps=validation_steps)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 1183, in _fit_loop
outs = f(ins_batch)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2273, in __call__
**self.session_kwargs)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [3] vs. [10]
[[Node: training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/softmax_layer_loss/mul_20"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/Shape, training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/Shape_1)]]
[[Node: loss/mul/_19 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_806_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op u'training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/BroadcastGradientArgs', defined at:
File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 239, in <module>
main()
File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 176, in main
model.fit(Xtrain, Y_train,batch_size=batch_size, nb_epoch=nb_epoch)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 1575, in fit
self._make_train_function()
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 960, in _make_train_function
loss=self.total_loss)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
return func(*args, **kwargs)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/optimizers.py", line 156, in get_updates
grads = self.get_gradients(loss, params)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/optimizers.py", line 73, in get_gradients
grads = K.gradients(loss, params)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2310, in gradients
return tf.gradients(loss, variables, colocate_gradients_with_ops=True)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 581, in gradients
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 353, in _MaybeCompile
return grad_fn() # Exit early
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 581, in <lambda>
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/math_grad.py", line 742, in _MulGrad
rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 532, in _broadcast_gradient_args
"BroadcastGradientArgs", s0=s0, s1=s1, name=name)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
...which was originally created as op u'loss/softmax_layer_loss/mul_20', defined at:
File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 239, in <module>
main()
File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 174, in main
model.compile(loss=ncce, optimizer=rms)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 850, in compile
sample_weight, mask)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 466, in weighted
score_array *= weights
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 894, in binary_op_wrapper
return func(x, y, name=name)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1117, in _mul_dispatch
return gen_math_ops._mul(x, y, name=name)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2726, in _mul
"Mul", x=x, y=y, name=name)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Incompatible shapes: [3] vs. [10]
[[Node: training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/softmax_layer_loss/mul_20"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/Shape, training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/Shape_1)]]
[[Node: loss/mul/_19 = _Recv[client_terminated=false, recv_device="/job:loc
Hey, I have an imbalanced data set. I was hoping to use the weighted cost to help with classification, since otherwise it always ends up predicting only one outcome (in my case 0). I was hoping for some help in building the cost matrix. I have 3 classes (1: 1270, 0: 7145, -1: 1260), so from the above examples it would be a 3x3 matrix; picking the values to fill the matrix is the problem.
If I could also penalize wrong predictions of 1 as -1, or vice versa, that would be great.
In my case, lambda function worked fine.
def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.expand_dims(y_pred_max, 1)
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (K.cast(weights[c_t, c_p], K.floatx()) * K.cast(y_pred_max_mat[:, c_p], K.floatx()) * K.cast(y_true[:, c_t], K.floatx()))
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

w_array = np.ones((3, 3))
w_array[2, 1] = 1.2
w_array[1, 2] = 1.2

loss = lambda y_true, y_pred: w_categorical_crossentropy(y_true, y_pred, weights=w_array)
model.compile(loss=loss,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
This is my code; although it's a bit messy, it seems to work with RNNs as well :D
def getLoss(weights, rnn=True):
    def w_categorical_crossentropy(y_true, y_pred):
        nb_cl = len(weights)
        if not rnn:
            final_mask = K.zeros_like(y_pred[:, 0])
            y_pred_max = K.max(y_pred, axis=1)
            y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
            y_pred_max_mat = K.equal(y_pred, y_pred_max)
            for c_p, c_t in product(range(nb_cl), range(nb_cl)):
                final_mask += (weights[c_t, c_p] * K.cast(y_pred_max_mat, tf.float32)[:, c_p] * K.cast(y_true, tf.float32)[:, c_t])
            return K.categorical_crossentropy(y_pred, y_true) * final_mask
        else:
            final_mask = K.zeros_like(y_pred[:, :, 0])
            y_pred_max = K.max(y_pred, axis=2)
            y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], K.shape(y_pred)[1], 1))
            y_pred_max_mat = K.equal(y_pred, y_pred_max)
            for c_p, c_t in product(range(nb_cl), range(nb_cl)):
                final_mask += (weights[c_t, c_p] * K.cast(y_pred_max_mat, tf.float32)[:, :, c_p] * K.cast(y_true, tf.float32)[:, :, c_t])
            return K.categorical_crossentropy(y_pred, y_true) * final_mask
    return w_categorical_crossentropy
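Usage of the closure would then look something like this (hypothetical model and weight matrix):

# Hypothetical usage: 3 classes, extra cost for confusing classes 1 and 2
w_array = np.ones((3, 3))
w_array[1, 2] = 1.2
w_array[2, 1] = 1.2
model.compile(loss=getLoss(w_array, rnn=True), optimizer='adam', metrics=['accuracy'])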
Are there any plans to integrate this feature into Keras itself? Since we already have sample weighting in fitting, this seems to be a logical extension to the standard feature set.
Although there do not seem to be any parameterized loss functions currently.
@machisuke shouldn't it be
return K.categorical_crossentropy(y_true, y_pred) * final_mask
instead of
return K.categorical_crossentropy(y_pred, y_true) * final_mask
as @blakewest pointed out from the Keras source code?
Greetings!
I have a binary classification problem at hand where I intend to penalize the FN. I am okay with more FP but want a really low number of FN.
I have used the custom loss function along with a lambda, as mentioned in the comments above.
def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

w_array = np.ones((2, 2))
w_array[1, 0] = 2.5  # penalizing FN
w_array[0, 1] = 2.5  # penalizing FP

loss = lambda y_true, y_pred: w_categorical_crossentropy(y_true, y_pred, weights=w_array)
classifier.compile(optimizer=sgd, loss=loss, metrics=['accuracy'])
After doing this, the number of FN seems to be more or less the same as what I had with all the weights in w_array being 1. What am I getting wrong here? Any kind of pointer/help would be greatly appreciated. @ayalalazaro @tboquet @curiale @machisuke
FOLLOW-UP: If I simply comment out the second assignment to w_array, does it mean I am only penalizing the FN and not the FP?
w_array[1,0] = 2.5 # penalizing FN
#w_array[0,1] = 2.5 # penalizing FP
(Quoting @machisuke's comment above: "In my case, lambda function worked fine", followed by the lambda-based w_categorical_crossentropy snippet.)
Can anyone explain the weight matrix, i.e. how I could put a weight on the second and third classes?
@SwapnilBorse123 In the weights matrix the row index is the true class, and the col index is the predicted class.
So if you want to penalize 0's confused with 1's more heavily (meaning the true label is 0, but the model predicted 1), then you should put a high weight on the index [0, 1].
So you are correct
w_array[1,0] = 2.5 # penalizing FN
#w_array[0,1] = 2.5 # penalizing FP
penalizes only FN and not FP.
@zaher88abd Say that you have 3 available classes (0, 1 and 2).
Then you would start by defining a 3x3 matrix
w_array = np.ones((3, 3))
Then you can add the weights you'd like to have.
As I said in the comment above,
w_array[i, j]
defines the weight for an example of class i falsely classified as class j.
E.g. if you would like to penalize examples of class 1 falsely classified as class 2 more heavily, you could do
w_array[1, 2] = high_weight
If you would like your model to overall put more emphasis on a certain class, you could put high weights on all occurrences of that class.
For example, if you'd like to put an overall emphasis on class 2 you could do the following:
w_array[2, :] = high_weight
This will penalize every mistake made with an example of class 2.
But notice that this assignment also includes
w_array[2, 2] = high_weight
This means that this will also penalize an example of class 2 which was labeled correctly but with low confidence.
This behavior may or may not fit your needs.
If you would like to avoid that behavior, you could just do the following:
w_array[2, :] = high_weight
w_array[2, 2] = 1  # restore the original weight
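Putting that together, a small sketch for three classes (high_weight is a placeholder value you would tune for your problem):
import numpy as np

high_weight = 5.0          # placeholder value, chosen only for illustration
w_array = np.ones((3, 3))  # rows = true class, cols = predicted class

w_array[1, 2] = high_weight   # true 1 predicted as 2 costs more
w_array[2, :] = high_weight   # every mistake on a true class-2 example costs more
w_array[2, 2] = 1.0           # but don't penalize correct, low-confidence class-2 predictions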
For anyone who is still trying to figure out what is going on in the weighted crossentropy loss function, I made an analogous example in numpy. The doc string at the bottom explains what is going on at each step. Adding in some print statements to see what the arrays look like and what their shapes are is a lot easier here than in keras :)
import numpy as np
from itertools import product

# example values so the snippet runs stand-alone
n_classes = 2
weights = np.ones((n_classes, n_classes))
weights[0, 1] = 1.5

y_pred = np.array([[[0.6, 0.4], [0.3, 0.7], [0.1, 0.9], [0.23, 0.77]],
                   [[0.3, 0.7], [0.21, 0.79], [0.99, 0.01], [0.23, 0.77]],
                   [[0.1, 0.9], [0.88, 0.12], [0.33, 0.67], [0.11, 0.89]]])
y_true = np.array([[[1, 0], [1, 0], [0, 1], [0, 1]],
                   [[0, 1], [0, 1], [1, 0], [1, 0]],
                   [[0, 1], [1, 0], [1, 0], [0, 1]]])

final_mask = np.zeros_like(y_pred[:, 0])
y_pred_max = np.max(y_pred, axis=1)
y_pred_max = np.expand_dims(y_pred_max, 1)
y_pred_max_mat = np.equal(y_pred, y_pred_max)
for c_p, c_t in product(range(n_classes), range(n_classes)):
    final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
#return K.categorical_crossentropy(y_pred, y_true) * final_mask
"""
1: y_pred, dim = (3, 4 ,2), type = float
Your predicted labels with shape (n_images, n_pixels, n_classes)
Note: n_pixels is your image height * width, if your image is square you can
take the square root of n_pixels to get the dimensions
2: y_true, dim = (3, 4, 2), type = bool
Your predicted labels with shape (n_images, n_pixels, n_classes)
3: final_mask, dim = (4, 2), type = zeros
Dimensions are (n_pixels, n_classes)
Defines zero array that will be added to when constructing final mask
4: y_pred_max, dim = (3, 2), type = float
Dimensions are (n_images, n_classes)
This is the max probability predicted for each class for any pixel in the image
5: y_pred_max, dim = (3,1,2), type = float
Dimensions are (n_images, ., n_classes)
Reshapes output of previous step so it can be broadcast across y_pred_max
6: y_pred_max_mat, dim = (3, 4 ,2), type = bool
This is the evaluation of output(4) == output(5) where 5 is broadcasted across pixels
Marks highest class probability for each class in each images as True
7: Iterates over all possible outcomes of predicted = [0, ..., n_classes] and true = [0, ..., n_classes]
8: updates final_mask, dim = (4, 2), type = int (adding 0 or 1 to each cell in each iteration)
Multiplies output(6) by true labels and weight for the outcome of the iteration
Rows are images and columns are adjusted class weights
9: Weights are now applied to the the crossentropy loss of the original predictions and labels
Commented out because numpy equivalent isn't represented by a single function
"""
Also, it was mentioned above but a bit out of context: a good method for calculating weights to use is in #5116. I made a simpler numpy implementation for myself that estimates the weights from the training label images. This assumes that the training label images are stored as a single channel image with a number in range(n_classes) representing its class.
import cv2
import glob
import numpy as np

def calc_class_proportions(dir_train_labels, n_classes):
    img_paths = glob.glob(dir_train_labels + '*.png')
    class_counts = np.zeros(n_classes)
    for img_path in img_paths:
        label_img = cv2.imread(img_path)[:, :, 0]
        classes_present, counts = np.unique(label_img, return_counts=True)
        # index by class id so images that are missing some classes don't break the accumulation
        class_counts[classes_present] += counts
    class_proportions = class_counts / np.sum(class_counts)
    return class_proportions

def calc_class_weights(dir_train_labels, n_classes, scale=None):
    class_props = calc_class_proportions(dir_train_labels, n_classes)
    if scale == 'log':
        weights = np.log(1 / class_props)
    else:
        max_prop = np.max(class_props)
        weights = max_prop / class_props
    return weights
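A hypothetical usage sketch (the directory path and class count below are made up for illustration):
# assuming single-channel label PNGs live in ./train_labels/ and there are 4 classes
weights = calc_class_weights('./train_labels/', n_classes=4, scale='log')
print(weights)  # one weight per class, larger for rarer classes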
When I tried to convert it from categorical_crossentropy to binary_crossentropy, dimension errors popped up, which is weird since I did not change any other part of the model.
So I am trying to use categorical_crossentropy to implement the binary classification logic instead. In this test I changed the sigmoid activation to softmax, and it turned out that the evaluation metric (F1) no longer changed even when I tried different weights. Could anyone help? Did I miss something when implementing the binary classification logic?
Adding to the solution above: with the new Keras version you can now just override the respective loss class, as given below.
from tensorflow.python import keras
from itertools import product
import numpy as np
from tensorflow.python.keras.utils import losses_utils

class WeightedCategoricalCrossentropy(keras.losses.CategoricalCrossentropy):

    def __init__(
        self,
        weights,
        from_logits=False,
        label_smoothing=0,
        reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE,
        name='categorical_crossentropy',
    ):
        super().__init__(
            from_logits, label_smoothing, reduction, name=f"weighted_{name}"
        )
        self.weights = weights

    def call(self, y_true, y_pred):
        weights = self.weights
        nb_cl = len(weights)
        final_mask = keras.backend.zeros_like(y_pred[:, 0])
        y_pred_max = keras.backend.max(y_pred, axis=1)
        y_pred_max = keras.backend.reshape(
            y_pred_max, (keras.backend.shape(y_pred)[0], 1))
        y_pred_max_mat = keras.backend.cast(
            keras.backend.equal(y_pred, y_pred_max), keras.backend.floatx())
        for c_p, c_t in product(range(nb_cl), range(nb_cl)):
            final_mask += (
                weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
        return super().call(y_true, y_pred) * final_mask
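A usage sketch for this class; the 10x10 matrix and the 1/7 weights mirror the MNIST example earlier in the thread, and the model is assumed to already exist:
import numpy as np

w_array = np.ones((10, 10))
w_array[1, 7] = 1.2   # true 1 predicted as 7
w_array[7, 1] = 1.2   # true 7 predicted as 1

model.compile(
    optimizer='rmsprop',
    loss=WeightedCategoricalCrossentropy(w_array),
    metrics=['accuracy'],
)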
Hi @GalAvineri, @kozemzak,
I am a little bit confused about the purpose of the weighted crossentropy loss function. Is it for misclassification (e.g. the MNIST case, where class "1" is misclassified as "7") or for an imbalanced dataset (e.g. too many images of class "1" compared to "7", etc.)? Thanks.
There are other reasons why you might want to weight the individual samples. For example if they yield a custom payoff.
@sudonto the weighted crossentropy gives different weights to different misclassifications by definition.
So that is not its purpose; rather, it's just what it does by definition.
I guess there could be multiple purposes for using this loss, as @dickreuter said, and one of them is indeed when you have an imbalanced dataset.
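To illustrate the two use cases with concrete, made-up numbers: one matrix targets a specific confusion, the other upweights an underrepresented class across the board:
import numpy as np

# penalize one specific confusion: true 1 predicted as 7 (and vice versa)
w_confusion = np.ones((10, 10))
w_confusion[1, 7] = 2.0
w_confusion[7, 1] = 2.0

# compensate for an imbalanced dataset: make every mistake on the rare class 1 cost more
w_imbalance = np.ones((10, 10))
w_imbalance[1, :] = 5.0
w_imbalance[1, 1] = 1.0  # keep correct predictions at the default weight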
@tboquet thanks for this code.
In addition to the mistake found by @blakewest in https://github.com/keras-team/keras/issues/2115#issuecomment-354678974, I found something else in your code: the loop goes for c_p, c_t but then refers to weights[c_t, c_p] (different order).
It's easy to miss in your example, because your weights matrix is symmetric. But really, weights[1, 7] is used instead of weights[7, 1] and vice versa.
The fix is simple: just switch the order in either of them (but not both).
The convention I'm familiar with uses axis 0 as the "class truth", so I fix it via for c_t, c_p.
TypeError: Value passed to parameter 'x' has DataType bool not in list of allowed values: float16, float32, float64, uint8, int8, uint16, int16, int32, int64, complex64, complex128
@enikkari this error can be resolved by adding another line after:
y_pred_max_mat = K.equal(y_pred, y_pred_max)
as follows:
y_pred_max_mat = K.equal(y_pred, y_pred_max)
y_pred_max_mat = K.cast(y_pred_max_mat, 'float32')
Also, to prevent a row in y_pred like [.4, .4, .2] from being encoded into [1, 1, 0], this:
y_pred_max = K.max(y_pred, axis=1)
y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
y_pred_max_mat = K.equal(y_pred, y_pred_max)
can be replaced with more robust (and intuitive) code:
y_pred_arg_max = K.argmax(y_pred)
y_pred_max_mat = K.one_hot(y_pred_arg_max, num_classes=y_pred.shape[1])
Another added value of this is that it no longer requires the K.cast fix above.
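Putting the two suggestions together, a small sketch of the mask construction (build_prediction_mask is just an illustrative name; y_pred is assumed to have shape (batch, n_classes)):
import keras.backend as K

def build_prediction_mask(y_pred):
    # one-hot of the argmax: exactly one 1.0 per row, even when probabilities tie,
    # and already float, so no extra K.cast is needed
    y_pred_arg_max = K.argmax(y_pred)
    return K.one_hot(y_pred_arg_max, num_classes=y_pred.shape[1])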
Adding to the class solution by @SpikingNeuron in https://github.com/keras-team/keras/issues/2115#issuecomment-490079116, here's a more robust and vectorized implementation:
import tensorflow.keras.backend as K
from tensorflow.keras.losses import CategoricalCrossentropy

class WeightedCategoricalCrossentropy(CategoricalCrossentropy):

    def __init__(self, cost_mat, name='weighted_categorical_crossentropy', **kwargs):
        assert cost_mat.ndim == 2
        assert cost_mat.shape[0] == cost_mat.shape[1]
        super().__init__(name=name, **kwargs)
        self.cost_mat = K.cast_to_floatx(cost_mat)

    def __call__(self, y_true, y_pred, sample_weight=None):
        assert sample_weight is None, "should only be derived from the cost matrix"
        return super().__call__(
            y_true=y_true,
            y_pred=y_pred,
            sample_weight=get_sample_weights(y_true, y_pred, self.cost_mat),
        )

def get_sample_weights(y_true, y_pred, cost_m):
    num_classes = len(cost_m)
    y_pred.shape.assert_has_rank(2)
    y_pred.shape[1:].assert_is_compatible_with(num_classes)
    y_pred.shape.assert_is_compatible_with(y_true.shape)
    y_pred = K.one_hot(K.argmax(y_pred), num_classes)
    y_true_nk1 = K.expand_dims(y_true, 2)
    y_pred_n1k = K.expand_dims(y_pred, 1)
    cost_m_1kk = K.expand_dims(cost_m, 0)
    sample_weights_nkk = cost_m_1kk * y_true_nk1 * y_pred_n1k
    sample_weights_n = K.sum(sample_weights_nkk, axis=[1, 2])
    return sample_weights_n
Usage:
model.compile(loss=WeightedCategoricalCrossentropy(cost_matrix), ...)
Similarly, this can be applied for the CategoricalAccuracy
metric too:
from tensorflow.keras.metrics import CategoricalAccuracy

class WeightedCategoricalAccuracy(CategoricalAccuracy):

    def __init__(self, cost_mat, name='weighted_categorical_accuracy', **kwargs):
        assert cost_mat.ndim == 2
        assert cost_mat.shape[0] == cost_mat.shape[1]
        super().__init__(name=name, **kwargs)
        self.cost_mat = K.cast_to_floatx(cost_mat)

    def update_state(self, y_true, y_pred, sample_weight=None):
        assert sample_weight is None, "should only be derived from the cost matrix"
        return super().update_state(
            y_true=y_true,
            y_pred=y_pred,
            sample_weight=get_sample_weights(y_true, y_pred, self.cost_mat),
        )
Usage:
model.compile(metrics=[WeightedCategoricalAccuracy(cost_matrix), ...], ...)
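A combined usage sketch; the 3-class cost matrix below is made up for illustration, and the model is assumed to already exist:
import numpy as np

cost_matrix = np.ones((3, 3))
cost_matrix[2, 1] = 5.0   # true 2 predicted as 1 is the costly mistake

model.compile(
    optimizer='adam',
    loss=WeightedCategoricalCrossentropy(cost_matrix),
    metrics=[WeightedCategoricalAccuracy(cost_matrix)],
)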
In addition to the w_array given by @tboquet in the post above, how do I construct the cost_matrix?
For example, for binary classification:
w_array = np.ones((2,2))
w_array[1,2] = 5.0 (to penalize 1s being misclassified).
y_true and y_pred are the targets.
Can somebody help please?
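For reference, class indices in a 2x2 matrix run from 0 to 1, so a sketch of a valid binary cost matrix (rows = true class, columns = predicted class) would be:
import numpy as np

w_array = np.ones((2, 2))
w_array[1, 0] = 5.0   # true 1 predicted as 0 (a misclassified 1) costs 5x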
(quoting @GalAvineri's weight-matrix explanation from above)
@GalAvineri
I want to put an overall emphasis on class 2 (I have 3 classes: 0, 1, 2). In your opinion, I should give w[2][0] and w[2][1] a high weight, but should I assign the same high weight to w[0][2] and w[0][1]?
@eliadl I'm getting an unexpected keyword argument 'sample_weight'
tf python version r1.13
@dest-dir Please post a StackOverflow question with your code, and share the link here. I'll try to assist there.
@eliadl how do I insert the cost matrix into another custom loss, like focal loss?
class FocalLoss(tf.keras.losses.Loss):

    def __init__(self, gamma=2.0, alpha=1.0,
                 reduction=tf.keras.losses.Reduction.AUTO, name='focal_loss'):
        super(FocalLoss, self).__init__(reduction=reduction, name=name)
        self.gamma = float(gamma)
        self.alpha = float(alpha)

    def call(self, y_true, y_pred):
        epsilon = 1.e-9
        y_true = tf.convert_to_tensor(y_true, tf.float32)
        y_pred = tf.convert_to_tensor(y_pred, tf.float32)
        model_out = tf.add(y_pred, epsilon)
        ce = tf.multiply(y_true, -tf.math.log(model_out))
        weight = tf.multiply(y_true, tf.pow(
            tf.subtract(self.alpha, model_out), self.gamma))
        fl = tf.multiply(1., tf.multiply(weight, ce))
        reduced_fl = tf.reduce_max(ce, axis=1)
        return reduced_fl
@damhurmuller Please post a StackOverflow question with your code, and share the link here. I'll try to assist there.
For semantic segmentation, with:
Input (rgb) shape=(batch_size, width, height, 3)
Output (one-hot) shape=(batch_size, width, height, n_classes)
The weighted categorical crossentropy loss function is:
def weighted_categorical_crossentropy(weights):
    # weights = [0.9, 0.05, 0.04, 0.01]
    def wcce(y_true, y_pred):
        Kweights = K.constant(weights)
        if not K.is_tensor(y_pred):
            y_pred = K.constant(y_pred)
        y_true = K.cast(y_true, y_pred.dtype)
        return K.categorical_crossentropy(y_true, y_pred) * K.sum(y_true * Kweights, axis=-1)
    return wcce
Usage:
loss = weighted_categorical_crossentropy(weights)
optimizer = keras.optimizers.Adam(lr=0.01)
model.compile(optimizer=optimizer, loss=loss)
@mendi80 Please, is your function right?
@dest-dir, @eliadl
I encountered the same unexpected sample_weight problem. I also ran into some issues when trying to save the entire model (in order to restore from interrupted training, including the optimizer state).
The sample weight problem seems to be solved by changing the magic method __call__ to call. I also modified the return of call to multiply the output of super().call(y_t, y_p) by the return value of get_sample_weights.
@eliadl - I think your approach, from what I understood, was to overwrite/overload rather than access the categorical crossentropy call method and pass in sample_weight as an expected parameter of this call; however, I couldn't figure out why this worked for you and not for us. (And, frankly, my Python knowledge isn't really up to figuring this out!)
I utilised @SpikingNeuron's class code in order to get this working. I also changed the weights argument from a positional argument to a named argument as part of trying to get the model loading working.
The loss class therefore became:
import tensorflow
import tensorflow.keras.backend as K
from tensorflow.python.keras.utils import losses_utils

class weighted_categorical_crossentropy(tensorflow.keras.losses.CategoricalCrossentropy):

    def __init__(
        self,
        *,
        weights,
        from_logits=False,
        label_smoothing=0,
        reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE,
        name='categorical_crossentropy',
    ):
        super().__init__(
            from_logits, label_smoothing, reduction, name=f"weighted_{name}"
        )
        self.weights = weights

    def call(self, y_true, y_pred):
        return super().call(y_true, y_pred) * get_sample_weights(y_true, y_pred, self.weights)

    def get_config(self):
        return {'weights': self.weights}

    @classmethod
    def from_config(cls, config):
        return cls(**config)

def get_sample_weights(y_true, y_pred, cost_m):
    num_classes = len(cost_m)
    cost_m = K.cast(cost_m, 'float32')
    y_pred.shape.assert_has_rank(2)
    assert y_pred.shape[1] == num_classes
    y_pred.shape.assert_is_compatible_with(y_true.shape)
    y_pred = K.one_hot(K.argmax(y_pred), num_classes)
    y_true_nk1 = K.expand_dims(y_true, 2)
    y_pred_n1k = K.expand_dims(y_pred, 1)
    cost_m_1kk = K.expand_dims(cost_m, 0)
    sample_weights_nkk = cost_m_1kk * y_true_nk1 * y_pred_n1k
    sample_weights_n = K.sum(sample_weights_nkk, axis=[1, 2])
    return sample_weights_n
Note the inclusion of:
def get_config(self):
    return {'weights': self.weights}

@classmethod
def from_config(cls, config):
    return cls(**config)
This is necessary in order for the custom loss function to be registered with Keras for model saving.
I also included the following (after the class code) to make sure that this registration happens:
tf.keras.losses.weighted_categorical_crossentropy = weighted_categorical_crossentropy
Usage:
model.compile(
    optimizer='adam',
    loss={'output': weighted_categorical_crossentropy(weights=cost_matrix)},
)
Saving:
model.save(filepath, save_format='tf')
Loading:
model = tf.keras.models.load_model(
    filepath,
    compile=True,
    custom_objects={
        'weighted_categorical_crossentropy': weighted_categorical_crossentropy(weights=cost_matrix)
    }
)
Feedback welcome.
Hope this helps.
@PhilAlton
__call__ accepts sample_weight and handles it inherently, while call doesn't. You had to provide your own implementation there; I didn't.
__call__ does access the categorical crossentropy call method, as my class inherits from CategoricalCrossentropy, which uses the categorical_crossentropy function.
CategoricalCrossentropy.from_config is already implemented (or inherited), so there's no need to override it with the same code.
get_config doesn't account for the arguments of the base class. This does:
def get_config(self):
    config = super().get_config().copy()
    config.update({'weights': self.weights})
    return config
@eliadl - Thanks; SO Question
@eliadl I'm getting an unexpected keyword argument 'sample_weight'
tf python version r1.13
@dest-dir as @PhilAlton found, the problem was __call__
didn't match its original signature.
def __call__(self, y_true, y_pred):
should have been this:
def __call__(self, y_true, y_pred, sample_weight=None):
Hi, I just stumbled onto this class-weight matrix in a multi-class classification problem where one of the classes is background; background getting predicted as positive is highly undesirable, while the reverse is not that critical. Is a class-weight-matrix-based loss function available in TF2? Does it actually work as expected? Thanks for the above solutions anyway.
Hello, does anyone know how to do this for sparse categorical crossentropy?
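One possible direction (just a sketch, not something from this thread): one-hot encode the sparse integer labels inside the loss and reuse the get_sample_weights helper from the cost-matrix implementation above:
import tensorflow as tf
import tensorflow.keras.backend as K

class WeightedSparseCategoricalCrossentropy(tf.keras.losses.SparseCategoricalCrossentropy):
    # assumes get_sample_weights(y_true_onehot, y_pred, cost_mat) from the comment above is in scope
    def __init__(self, cost_mat, name='weighted_sparse_categorical_crossentropy', **kwargs):
        super().__init__(name=name, **kwargs)
        self.cost_mat = K.cast_to_floatx(cost_mat)

    def __call__(self, y_true, y_pred, sample_weight=None):
        num_classes = self.cost_mat.shape[0]
        # y_true holds integer class ids; one-hot it only to derive the per-sample weights
        y_true_onehot = K.one_hot(K.cast(K.flatten(y_true), 'int32'), num_classes)
        return super().__call__(
            y_true, y_pred,
            sample_weight=get_sample_weights(y_true_onehot, y_pred, self.cost_mat),
        )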