Keras: a weighted custom loss for pixelwise classification

Created on 15 Apr 2017 · 19 comments · Source: keras-team/keras

Hi everyone,

This problem has been gnawing at me for days. I'm having trouble implementing a custom loss function in Keras. I am trying to do semantic segmentation on grayscale images.

Brief Context

My fully-convolutional model is a U-Net. It outputs a tensor of predictions, which has a shape of (batch_size, height * width, num_classes). This is the code that produces this output:

conv10 = Reshape((self.num_class, self.img_rows * self.img_cols))(conv10)
conv10 = Permute((2, 1))(conv10)
softmax = Activation('softmax')(conv10)

I then compile the model with model.compile().

self.model.compile(optimizer=Adam(lr=self.learning_rate),
                   loss=self.pixelwise_crossentropy,
                   metrics=['accuracy'])

The Problem

The function 'self.pixelwise_crossentropy' is the custom loss function that I'm struggling with. This is the (non-working) code that I have so far.

def pixelwise_crossentropy(self, y_true, y_pred):
    """
    Pixel-wise cross-entropy loss for dense classification of an image.

    The loss of a misclassified `1` needs to be weighted
    `WEIGHT` times more than a misclassified `0` (only 2 classes).

    Inputs
    ----------------
    y_true: Correct labels of 3D shape (batch_size, img_rows*img_cols, num_classes).

    y_pred: Predicted softmax probabilities of each class for each img_rows*img_cols pixel.
            Same 3D shape as y_true.
    """

    # Copied and pasted from theano
    y_pred = T.clip(y_pred, self.epsilon, 1.0 - self.epsilon)
    y_pred /= y_pred.sum(axis=-1, keepdims=True)

    # Get cross-entropy losses for each pixel.
    pixel_losses = -tensor.sum(y_true * tensor.log(y_pred),
                               axis=y_pred.ndim - 1)

    # Make a weight array to scale cross-entropy losses for every pixel in mini-batch.
    weight_map = np.ones((self.img_rows * self.img_cols,), dtype=np.float32)

    # Cross-entropy loss of `1`s will be WEIGHT times greater than those of `0`s.
    weight_map[y_true[:, :, 1]==1] = self.WEIGHT

    # Return elementwise multiplication of losses with weight map.
    return pixel_losses * weight_map

I would really appreciate any help or brief pointers about going in the right direction. I'm sure I'm using Theano tensors incorrectly, and mixing numpy with Theano in a weird way.

Thanks!!!

wsxwd
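For reference, here is a minimal sketch of the loss the question describes. It assumes the TensorFlow backend instead of Theano. The key change is that the weight map is computed from y_true with tensor arithmetic rather than by NumPy boolean indexing, which does not work on symbolic tensors. The names make_weighted_pixelwise_crossentropy and EPSILON are hypothetical, and the shapes are assumed to be the question's (batch_size, img_rows*img_cols, 2).

```python
import tensorflow as tf

EPSILON = 1e-7  # assumed clip value, same as Keras' default K.epsilon()

def make_weighted_pixelwise_crossentropy(weight):
    """Two-class pixel-wise cross-entropy; pixels whose true class is 1
    count `weight` times as much as pixels whose true class is 0.

    Assumes y_true (one-hot) and y_pred (softmax) both have shape
    (batch_size, img_rows * img_cols, 2).
    """
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        # Renormalise and clip (the same two steps as the question's code) so log() never sees 0.
        y_pred = y_pred / tf.reduce_sum(y_pred, axis=-1, keepdims=True)
        y_pred = tf.clip_by_value(y_pred, EPSILON, 1.0 - EPSILON)
        # Cross-entropy per pixel: shape (batch_size, img_rows * img_cols).
        pixel_losses = -tf.reduce_sum(y_true * tf.math.log(y_pred), axis=-1)
        # Weight map from y_true using tensor ops: 1.0 where the true class
        # is 0, `weight` where it is 1 (no NumPy indexing needed).
        weight_map = 1.0 + (weight - 1.0) * y_true[..., 1]
        return pixel_losses * weight_map
    return loss
```

It would be passed as loss=make_weighted_pixelwise_crossentropy(self.WEIGHT) in model.compile(). The closure keeps the (y_true, y_pred) signature Keras expects for a loss while still carrying the weight.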

I'm still awaiting training results to verify that it works as desired, but I simply removed the "axis" argument from tf.reduce_sum in Keras's categorical_crossentropy function. As written, that function returns a matrix with the same shape as the image, where each element is the cross-entropy for that pixel. I'm not entirely sure how Keras interprets that mathematically, but it's not ideal. The code below instead returns the sum of those elements, essentially treating each pixel in the image as a unique sample.

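import tensorflow as tf

def pixelwise_crossentropy(target, output):
    # Clip so log() never sees 0 (10e-8 is 1e-7).
    output = tf.clip_by_value(output, 10e-8, 1. - 10e-8)
    # No axis argument: sums over every pixel and class in the batch.
    return - tf.reduce_sum(target * tf.log(output))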

This makes the loss start out at a huge value (mine is in the billions), but that shouldn't make a difference if you're using an optimizer that varies your learning rate.

You can see what's going on mathematically using this numpy code:

import numpy as np

weights = [1,2]

target = np.array([ [[0.0,1.0],[1.0,0.0]],
                    [[0.0,1.0],[1.0,0.0]]])

output = np.array([ [[0.5,0.5],[0.9,0.1]],
                    [[0.9,0.1],[0.4,0.6]]])

crossentropy_matrix = -np.sum(target * np.log(output), axis=-1)
crossentropy = -np.sum(target * np.log(output))

JeffKo427 on 12 May 2017

@JeffKo427 Were you able to confirm whether it works? I am trying to do per-pixel semantic segmentation with unbalanced training data, and I am looking for a way to compensate for the class imbalance by weighting the classes in the loss function.

Zerphed on 30 May 2017

@Zerphed I've been experimenting with architecture and hyperparameters for the past two weeks and am confident that pixelwise_crossentropy works as advertised. I have not used class weighting in my training process, but a slight modification of the example code I provided above suggests that it's a simple matter of changing the last line to
crossentropy = -np.sum(target * weights * np.log(output))

JeffKo427 on 30 May 2017
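Working JeffKo427's NumPy example through, with and without the [1, 2] class weights, makes the numbers concrete. This is a self-contained restatement of that example; the mean_crossentropy and weighted_crossentropy names are added here for illustration, and the approximate values in the comments were worked out by hand.

```python
import numpy as np

weights = np.array([1.0, 2.0])  # weight for class 0, weight for class 1

target = np.array([[[0.0, 1.0], [1.0, 0.0]],
                   [[0.0, 1.0], [1.0, 0.0]]])
output = np.array([[[0.5, 0.5], [0.9, 0.1]],
                   [[0.9, 0.1], [0.4, 0.6]]])

# One cross-entropy value per pixel: approx. [[0.6931, 0.1054], [2.3026, 0.9163]]
crossentropy_matrix = -np.sum(target * np.log(output), axis=-1)

# Sum over all pixels (what pixelwise_crossentropy returns): approx. 4.0174
crossentropy = -np.sum(target * np.log(output))

# Mean over pixels, for comparison: approx. 1.0043. The sum grows with the
# number of pixels in the batch, whereas the mean does not.
mean_crossentropy = crossentropy_matrix.mean()

# Class-weighted sum (the modified last line): approx. 7.0131, because the two
# pixels whose true class is 1 (0.6931 and 2.3026) now count twice.
weighted_crossentropy = -np.sum(target * weights * np.log(output))
```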

@JeffKo427 Thanks! This is in fact what I am using right now. The losses seem quite big, but I guess that was to be expected:

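import tensorflow as tf
from keras import backend as K

def weighted_pixelwise_crossentropy(class_weights):

    def loss(y_true, y_pred):
        # Clip predictions so that log() never sees exactly 0 or 1.
        epsilon = tf.convert_to_tensor(K.epsilon(), y_pred.dtype.base_dtype)
        y_pred = tf.clip_by_value(y_pred, epsilon, 1. - epsilon)
        # class_weights (e.g. [1, 8]) broadcasts along the last axis, so the class axis must come last.
        return - tf.reduce_sum(tf.multiply(y_true * tf.log(y_pred), class_weights))

    return loss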

Zerphed on 30 May 2017

I thought I would post my function for calculating the class weights over a large dataset here:

import pickle

import cv2
import numpy as np

def getClassWeights(self):
    if 'class_weights.pickle' not in self.cwd_contents:
        file_list = self.training_file_list + self.validation_file_list
        print("Calculating class weights for " + str(len(file_list)) + " images, this may take a while...")
        # Start every count at 1 so that no class count is zero (avoids division by zero below).
        classcounts = [1]*self.num_classes
        c = 0
        for f in file_list:
            lbl = cv2.imread(f[1], 0)
            show = lbl*int(255/self.num_classes)
            cv2.putText(show, str(c) + '/' + str(len(file_list)), (50,50), cv2.FONT_HERSHEY_SIMPLEX, 1, 255)
            cv2.imshow('Processing...', show)
            cv2.waitKey(1)
            for i in range(self.num_classes):
                classcounts[i] += len(np.where(lbl == i)[0])
            c += 1
        total = sum(classcounts)
        class_weights = [0]*self.num_classes
        for i in range(self.num_classes):
            # See below for an explanation of ignore_classes.
            class_weights[i] = ignore_classes[i] * total / classcounts[i]
        self.class_weights = class_weights
        cv2.destroyAllWindows()
        cv2.waitKey(1)
        with open('class_weights.pickle', 'wb') as f:
            pickle.dump(self.class_weights, f, protocol=pickle.HIGHEST_PROTOCOL)
    else:
        with open('class_weights.pickle', 'rb') as f:
            self.class_weights = pickle.load(f)
    print(self.class_weights)

ignore_classes is a global list containing a 0 at the indices of the classes I do not want considered during training (such as the 'void' class in CityScapes) and 1 at the other indices.

JeffKo427 on 9 Jun 2017
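The same inverse-frequency weights (total pixel count divided by each class's pixel count, with every count starting at 1) can be sketched more compactly with np.bincount. This assumes the label maps are already loaded as 2-D integer arrays (for example with cv2.imread(path, 0) as above) and leaves out the OpenCV progress window and the pickle cache. compute_class_weights is a hypothetical name; ignore_classes follows the 0/1 convention described above.

```python
import numpy as np

def compute_class_weights(label_maps, num_classes, ignore_classes=None):
    """Inverse-frequency class weights: total_pixels / pixels_of_class.

    label_maps: iterable of 2-D integer arrays with values in [0, num_classes).
    ignore_classes: optional list with 0 for classes to ignore and 1 otherwise.
    """
    # Start every count at 1, as in getClassWeights, to avoid division by zero.
    counts = np.ones(num_classes, dtype=np.float64)
    for lbl in label_maps:
        # bincount counts how many pixels carry each class index;
        # values >= num_classes are dropped, like the loop over range(num_classes).
        counts += np.bincount(np.asarray(lbl).ravel(), minlength=num_classes)[:num_classes]
    weights = counts.sum() / counts
    if ignore_classes is not None:
        # Zero out the weights of ignored classes (e.g. a 'void' class).
        weights = weights * np.asarray(ignore_classes, dtype=np.float64)
    return weights
```

For a single 2x3 label map with four 0-pixels and two 1-pixels, the counts become [5, 3] and the weights [8/5, 8/3].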

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

stale[bot] on 7 Sep 2017

I have tried this approach using Keras and am still ending up with a non-working model. I have an unbalanced dataset of active/non-active pixels.

The final layers of my U-Net are also:

conv10 = Reshape((self.num_class, self.img_rows * self.img_cols))(conv10)
conv10 = Permute((2, 1))(conv10)
softmax = Activation('softmax')(conv10)

This gives me a [rows*cols, 2] matrix in which each row holds one pixel's probabilities: the first column is the probability of that pixel being 0 and the second column is the probability of it being 1.

I one-hot encode my labels using Keras's to_categorical function, so each label is also in the form [rows*cols, 2].

I then pass weights such as [1, 8] to the weighted_pixelwise_crossentropy function above. However, every prediction comes back as 0.5 for both classes.

Any pointers would be GREATLY appreciated. I have spent quite a bit of time on this with no luck.

benjamin-robbins on 30 Oct 2017

Can you explain what your reshape and permute layers are doing? If you're doing per-pixel classification, you should be able to leave your data in the form [rows, cols, one-hot] rather than [rows*cols, one-hot]. If the value you're trying to predict is correlated with the values of the pixels around it, flattening your image causes the loss of information.

Additionally, if you are doing a simple binary classification (active/not active), you only need one category, not two. So your output could just be a 2D matrix where the value of each element is the predicted likelihood of a pixel being active.

JeffKo427 on 1 Nov 2017

Jeff, thanks for the reply! I've since realized, as you posted above, that I need to leave my data in the form [rows, cols, one-hot] rather than flattening it as I was doing. This has made the custom pixelwise loss function work as expected. I am doing just simple binary classification (active/not active). If I change my data to the 2D matrix, can I still apply a weighting factor? Would it just be in the form [8] rather than [1, 8]?

benjamin-robbins on 1 Nov 2017

Good point, I didn't think about that. You could still do it with only one category, but "if it works don't touch it".

JeffKo427 on 1 Nov 2017
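For the single-channel variant discussed here (one sigmoid output per pixel, 0/1 labels, shape (batch, rows, cols, 1)), a single scalar weight on the positive term plays the role of [1, 8]. The following is a minimal sketch with TensorFlow ops, not code from the thread; make_weighted_binary_crossentropy, pos_weight and EPSILON are hypothetical names.

```python
import tensorflow as tf

EPSILON = 1e-7  # assumed clip value, same as Keras' default K.epsilon()

def make_weighted_binary_crossentropy(pos_weight):
    """Binary cross-entropy where active pixels (y_true == 1) count pos_weight times.

    Assumes y_pred holds sigmoid probabilities and y_true holds 0/1 labels,
    both shaped (batch, rows, cols, 1).
    """
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        # Keep log() away from 0 and 1.
        y_pred = tf.clip_by_value(y_pred, EPSILON, 1.0 - EPSILON)
        # Positive term scaled by pos_weight, negative term left at weight 1.
        per_pixel = -(pos_weight * y_true * tf.math.log(y_pred)
                      + (1.0 - y_true) * tf.math.log(1.0 - y_pred))
        # Average over the single channel axis, leaving one value per pixel.
        return tf.reduce_mean(per_pixel, axis=-1)
    return loss
```

With loss=make_weighted_binary_crossentropy(8.0), an active pixel counts eight times as much as an inactive one, matching the two-channel [1, 8] weights. TensorFlow also provides tf.nn.weighted_cross_entropy_with_logits, which applies the same kind of pos_weight but expects pre-sigmoid logits rather than probabilities, so the last layer would need a linear activation to use it.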

Ha, yeah, the weighting has been a real thorn in my side for pixelwise data. It does seem to be functioning as desired now with the two classes, even though two classes seem like overkill for a binary problem. However, it lets me specify a weight for each possible output.

benjamin-robbins on 1 Nov 2017

Hi,

I'm also trying to train a U-Net-based model for binary classification.
I transformed my 'y_true' binary images into one-hot matrices (shape = (nb_batch, 2, 512, 512)).
The output layer is:
conv9 = Conv2D(2, (1, 1), activation='sigmoid', kernel_initializer='he_normal')(conv8)

I used the suggested weighted_pixelwise_crossentropy function as the loss when calling model.compile.
During compilation, I ran into the following ValueError:
ValueError: Dimensions must be equal, but are 512 and 2 for 'loss/conv2d_15_loss/Mul' (op: 'Mul') with input shapes: [?,2,512,512], [2].

This error comes from the return - tf.reduce_sum(tf.multiply(y_true * tf.log(y_pred), class_weights)) line.

It seems like the tf.multiply here cannot do the broadcasting.
What am I missing here?

Thanks,
Aviel

avielbl on 6 Nov 2017

Hi folks,

Perhaps this isn't exactly what you are looking for, but if you are working with unbalanced data, you can use Focal Loss (https://arxiv.org/abs/1708.02002).

A Keras implementation:
from keras import backend as K

def focal_loss(target, output, gamma=2):
    output /= K.sum(output, axis=-1, keepdims=True)
    eps = K.epsilon()
    output = K.clip(output, eps, 1. - eps)
    return -K.sum(K.pow(1. - output, gamma) * target * K.log(output),
                  axis=-1)

If you are just trying to ignore some of the pixels because their labels are unknown, I'm using this loss function:

from keras.losses import categorical_crossentropy

def ignore_unknown_xentropy(ytrue, ypred):
    # Zero the loss wherever the true class is 0 (assumes channels-last 4D tensors).
    return (1-ytrue[:, :, :, 0])*categorical_crossentropy(ytrue, ypred)

So all pixels with class 0 will have zero loss.

cassianokc on 19 Jan 2018
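A NumPy mirror of the focal_loss above shows what gamma does: each pixel's cross-entropy is scaled by (1 - p)^gamma, where p is the predicted probability of its true class, so confidently correct pixels are down-weighted, and gamma = 0 gives back plain cross-entropy. focal_loss_np and crossentropy_np are hypothetical helpers for checking values, not part of Keras.

```python
import numpy as np

def focal_loss_np(target, output, gamma=2.0, eps=1e-7):
    """NumPy mirror of focal_loss: normalise, clip, then the modulated sum."""
    output = output / np.sum(output, axis=-1, keepdims=True)
    output = np.clip(output, eps, 1.0 - eps)
    return -np.sum((1.0 - output) ** gamma * target * np.log(output), axis=-1)

def crossentropy_np(target, output, eps=1e-7):
    """Plain per-pixel categorical cross-entropy for comparison."""
    output = np.clip(output, eps, 1.0 - eps)
    return -np.sum(target * np.log(output), axis=-1)

# Two pixels whose true class is 1: an easy one (p = 0.9) and a hard one (p = 0.1).
target = np.array([[0.0, 1.0], [0.0, 1.0]])
output = np.array([[0.1, 0.9], [0.9, 0.1]])

ce = crossentropy_np(target, output)
fl = focal_loss_np(target, output)                    # gamma = 2
fl_gamma0 = focal_loss_np(target, output, gamma=0.0)  # modulating factor is 1

# fl / ce is (1 - 0.9)**2 = 0.01 for the easy pixel and (1 - 0.1)**2 = 0.81 for the hard one.
```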

@avielbl Try permuting y_true to the shape (batch_size, 512, 512, 2).

mad-Ye on 26 Aug 2018
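Why the permutation helps: broadcasting (in NumPy and TensorFlow alike) lines up trailing axes, so a weight vector of shape (2,) is matched against the last axis of the tensor. With channels-first data that axis is the image width (512), hence the error. The NumPy sketch below uses a small random stand-in array rather than real data. Both y_true and the model output need the same layout, for example by adding a Permute((2, 3, 1)) layer after a channels-first output; alternatively the weights can be reshaped instead.

```python
import numpy as np

class_weights = np.array([1.0, 8.0])  # one weight per class

# Small stand-in for the (batch, 2, 512, 512) channels-first tensor in the error.
y_cf = np.random.default_rng(0).random((2, 2, 8, 8))

# Trailing axes are aligned, so the 2 weights meet the width axis (8 here,
# 512 in the error) and the product fails.
try:
    y_cf * class_weights
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Fix 1: move the class axis last, giving (batch, rows, cols, 2).
y_cl = np.transpose(y_cf, (0, 2, 3, 1))
weighted_cl = y_cl * class_weights

# Fix 2: keep channels-first and reshape the weights to (1, 2, 1, 1).
weighted_cf = y_cf * class_weights.reshape(1, 2, 1, 1)
```

Both fixes weight the same entries; they differ only in the resulting axis order.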

If you are just trying to ignore some of the pixels as they are unknown, I'm using this loss function:
def ignore_unknown_xentropy(ytrue, ypred):
    return (1-ytrue[:, :, :, 0])*categorical_crossentropy(ytrue, ypred)
So all pixels with class 0 will have zero loss.

Shouldn't the network learn to set predictions to zero, then?

davideboschetto on 8 Feb 2019
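On this question: the mask multiplies the loss, so for pixels whose true class is 0 the loss is exactly zero whatever the network predicts there, and those pixels contribute no gradient. The network is therefore not pushed toward any particular prediction on them. (Class 0 never appears as a target on the labelled pixels, though, so a softmax output will still learn to give it low probability there.) The NumPy check below mirrors ignore_unknown_xentropy with a hypothetical helper and a made-up 1x2-pixel example.

```python
import numpy as np

def ignore_unknown_xentropy_np(ytrue, ypred, eps=1e-7):
    """NumPy mirror of ignore_unknown_xentropy: per-pixel CE, zeroed where class 0 is true."""
    ce = -np.sum(ytrue * np.log(np.clip(ypred, eps, 1.0 - eps)), axis=-1)
    return (1.0 - ytrue[..., 0]) * ce

# One image of 1x2 pixels, 3 classes: pixel 0 is "unknown" (class 0), pixel 1 is class 2.
ytrue = np.array([[[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]]])
pred_a = np.array([[[[0.8, 0.1, 0.1], [0.2, 0.2, 0.6]]]])
# Same prediction on the labelled pixel, a very different one on the unknown pixel.
pred_b = np.array([[[[0.0, 0.5, 0.5], [0.2, 0.2, 0.6]]]])

loss_a = ignore_unknown_xentropy_np(ytrue, pred_a)
loss_b = ignore_unknown_xentropy_np(ytrue, pred_b)
# Both are [[[0.0, -log(0.6)]]]: the unknown pixel contributes nothing either way.
```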

import tensorflow as tf

def pixelwise_softmax_crossentropy(y_true, y_pred):
    # epsilon = 10e-8
    # output = K.clip(y_pred, epsilon, 1. - epsilon)
    # return -K.sum(y_true * tf.log(output))

    # scale preds so that the class probas of each sample sum to 1
    _EPSILON = 1e-7
    y_pred /= tf.reduce_sum(y_pred, len(y_pred.get_shape()) - 1, True)
    # manual computation of crossentropy
    _epsilon = tf.convert_to_tensor(_EPSILON, y_pred.dtype.base_dtype)
    output = tf.clip_by_value(y_pred, _epsilon, 1. - _epsilon)
    # class weights [1, 10] are hard-coded; the sum runs over the last (class) axis
    return - tf.reduce_sum(tf.multiply(y_true * tf.log(output), [1, 10]), len(output.get_shape()) - 1)

redaihanyu on 18 Feb 2019

Given batched RGB images as input, shape=(batch_size, width, height, 3)
And a multiclass target represented as one-hot, shape=(batch_size, width, height, n_classes)
And a model (Unet, DeepLab) with softmax activation in last layer.

from keras import backend as K

def weighted_categorical_crossentropy(weights):
    # weights = [0.9,0.05,0.04,0.01]
    def wcce(y_true, y_pred):
        Kweights = K.constant(weights)
        if not K.is_tensor(y_pred): y_pred = K.constant(y_pred)
        y_true = K.cast(y_true, y_pred.dtype)
        # per-pixel cross-entropy scaled by the weight of each pixel's true class
        return K.categorical_crossentropy(y_true, y_pred) * K.sum(y_true * Kweights, axis=-1)
    return wcce

Usage:

import keras

weights = [0.9, 0.05, 0.04, 0.01]  # one weight per class
loss = weighted_categorical_crossentropy(weights)
optimizer = keras.optimizers.Adam(lr=0.01)
model.compile(optimizer=optimizer, loss=loss)

mendi80 on 30 Dec 2019
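For one-hot targets, multiplying each pixel's cross-entropy by the weight of its true class (as wcce does) is the same as putting the weights inside the class sum (as in the weighted_pixelwise_crossentropy variants earlier in the thread), since only the true-class term is non-zero. A NumPy check on random data illustrates this; it skips the normalisation and clipping Keras applies inside categorical_crossentropy, and the shapes and seed are arbitrary. The remaining difference is the reduction: wcce keeps one value per pixel, while some earlier versions sum over the whole batch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes = 4
weights = np.array([0.9, 0.05, 0.04, 0.01])

# Random one-hot targets and softmax predictions, shape (batch, rows, cols, n_classes).
labels = rng.integers(0, n_classes, size=(2, 3, 3))
y_true = np.eye(n_classes)[labels]
logits = rng.normal(size=(2, 3, 3, n_classes))
y_pred = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# wcce form: per-pixel cross-entropy times the weight of the pixel's true class.
ce = -np.sum(y_true * np.log(y_pred), axis=-1)
wcce = ce * np.sum(y_true * weights, axis=-1)

# Earlier form: class weights applied inside the sum over classes.
inside = -np.sum(y_true * weights * np.log(y_pred), axis=-1)
```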

I am wondering whether we can have dynamic weights that depend on each individual y_true, while keeping y_true a tensor instead of a NumPy array.

samra-irshad on 17 Feb 2020
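One possible approach, sketched under assumptions: the weights can be computed inside the loss from y_true itself, for example as inverse class frequencies within the current batch, so everything stays a tensor. This assumes channels-last one-hot y_true and softmax y_pred of shape (batch, rows, cols, n_classes). batch_balanced_categorical_crossentropy and EPSILON are hypothetical names, and batch-level weights can fluctuate from batch to batch, unlike weights computed over the whole dataset.

```python
import tensorflow as tf

EPSILON = 1e-7  # assumed clip value, same as Keras' default K.epsilon()

def batch_balanced_categorical_crossentropy(y_true, y_pred):
    """Per-pixel cross-entropy weighted by inverse class frequency in this batch."""
    y_true = tf.cast(y_true, y_pred.dtype)
    # Pixels per class in the batch, shape (n_classes,).
    counts = tf.reduce_sum(y_true, axis=[0, 1, 2])
    # Inverse-frequency weights; +1 avoids division by zero for absent classes.
    weights = tf.reduce_sum(counts) / (counts + 1.0)
    y_pred = tf.clip_by_value(y_pred, EPSILON, 1.0 - EPSILON)
    # Cross-entropy per pixel, shape (batch, rows, cols).
    pixel_ce = -tf.reduce_sum(y_true * tf.math.log(y_pred), axis=-1)
    # Weight of each pixel's true class, shape (batch, rows, cols).
    pixel_w = tf.reduce_sum(y_true * weights, axis=-1)
    return pixel_ce * pixel_w
```

Because the weights come from y_true, no closure is needed: it can be passed directly as loss=batch_balanced_categorical_crossentropy.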

@samra-irshad @mendi80 Is this weighted_categorical_crossentropy solution correct as is, or do we need to add or modify something for it to work well?