Hi there,
I am trying to implement a classification problem with three classes: 0, 1, and 2. I would like to fine-tune my cost function so that misclassification is weighted somehow. In particular, predicting 1 when the true class is 2 should cost twice as much as predicting 0. Written as a table, it should look something like this:
Costs:

                     Predicted
              0     |  1     |  2
         ---------------------------
Actual 0 |    0     |  0.25  |  0.25
       1 |    0.25  |  0     |  0.5
       2 |    0.25  |  0.5   |  0
I really like the Keras framework; it would be nice if it were possible to implement this without having to dig into TensorFlow or Theano code.
Thanks
You could use class_weight.
class_weight applies a weight to all data that belongs to a class; what I need should depend on the specific misclassification.
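For reference, this is how class_weight is passed (a minimal sketch with hypothetical model/data names); it only scales the loss per true class, so it cannot express the pairwise costs in the table above:

# Minimal sketch (hypothetical model/data names): class_weight scales the loss
# of every sample whose true class is the given key, so it cannot express a
# different cost per (true, predicted) pair.
class_weight = {0: 1.0, 1: 1.0, 2: 2.0}   # e.g. count class-2 samples twice
model.fit(X_train, Y_train, batch_size=128, nb_epoch=20, class_weight=class_weight)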
You are absolutely right, I'm sorry I misunderstood your question. I will try to come back with something tomorrow using partial to define the weights. What you want to achieve should be doable with the Keras abstract backend.
Ok so I had the time to quickly test it.
This is a fully reproducible example on MNIST where we put a higher cost on misclassifying a 1 as a 7 and a 7 as a 1.
So if you want to pass constants included in the cost function, just build a new function with partial.
'''Train a simple deep NN on the MNIST dataset.
Get to 98.40% test accuracy after 20 epochs
(there is *a lot* of margin for parameter tuning).
2 seconds per epoch on a K520 GPU.
'''
from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD, Adam, RMSprop
from keras.utils import np_utils
import keras.backend as K
from itertools import product

# Custom loss function with costs
def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

w_array = np.ones((10, 10))
w_array[1, 7] = 1.2
w_array[7, 1] = 1.2

# note: pass weights=w_array here to actually apply the 1/7 penalty defined above
ncce = partial(w_categorical_crossentropy, weights=np.ones((10, 10)))

batch_size = 128
nb_classes = 10
nb_epoch = 20

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))

rms = RMSprop()
model.compile(loss=ncce, optimizer=rms)

model.fit(X_train, Y_train,
          batch_size=batch_size, nb_epoch=nb_epoch,
          show_accuracy=True, verbose=1,
          validation_data=(X_test, Y_test))
score = model.evaluate(X_test, Y_test,
                       show_accuracy=True, verbose=1)
print('Test score:', score[0])
print('Test accuracy:', score[1])
Wow, that's nice. Thanks for the detailed answer!
Try to test it on a toy example to verify that it actually works. If it's what you are looking for, feel free to close the issue!
Keras 1.0 will provide a more flexible way to introduce new objectives and metrics.
Well, I am stuck, I can't make it run in my model, it says:
line 56, in w_categorical_crossentropy
    y_pred_max = K.reshape(y_pred_max, (y_pred.shape[0], 1))
AttributeError: 'Tensor' object has no attribute 'shape'
This is the model I am using:
def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (y_pred.shape[0], 1))
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

w_array = np.ones((3, 3))
w_array[2, 1] = 1.2
w_array[1, 2] = 1.2

ncce = partial(w_categorical_crossentropy, weights=w_array)

def build_model(X_data):
    data_dim = X_data.shape[2]
    timesteps = X_data.shape[1]
    model = Sequential()
    model.add(BatchNormalization(input_shape=(timesteps, data_dim)))
    model.add(GRU(output_dim=50, init='glorot_normal',
                  return_sequences=True, W_regularizer=l2(0.00), U_regularizer=l1(0.01), dropout_W=0.2))
    model.add(GRU(output_dim=50, init='glorot_normal',
                  return_sequences=True, W_regularizer=l2(0.00), U_regularizer=l1(0.01), dropout_W=0.2))
    model.add(GRU(50, init='glorot_normal', return_sequences=False, dropout_W=0.01,
                  W_regularizer=l2(0.00), U_regularizer=l1(0.01)))
    model.add(Dense(3, init='glorot_normal'))
    model.add(Activation('softmax'))
    model.compile(loss=ncce, optimizer='Adam')
    return model
Sure, sorry, I was using Theano functionalities. I replaced the following line in my previous example:
y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
It should do the trick!
Sounds like the way to go; I was using TensorFlow as the backend. I'll tell you if it works as soon as possible. Thanks!
I still get an error:
line 57, in w_categorical_crossentropy
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 271, in reshape
    return tf.reshape(x, shape)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 682, in reshape
    name=name)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 411, in apply_op
    as_ref=input_arg.is_ref)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 529, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/constant_op.py", line 178, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/constant_op.py", line 161, in constant
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape))
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 319, in make_tensor_proto
    _AssertCompatible(values, dtype)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 259, in _AssertCompatible
    (dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.
I've tried your first reply under the Theano backend and it works, though.
OK, I was not sure how K.shape would behave with TensorFlow. It seems you should use:
y_pred_max = K.reshape(y_pred_max, (K.int_shape(y_pred)[0], 1))
I get more or less the same:
line 59, in w_categorical_crossentropy
    y_pred_max = K.reshape(y_pred_max, (K.int_shape(y_pred)[0], 1))
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 271, in reshape
    return tf.reshape(x, shape)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 682, in reshape
    name=name)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 411, in apply_op
    as_ref=input_arg.is_ref)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 529, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/constant_op.py", line 178, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/ops/constant_op.py", line 161, in constant
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape))
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 319, in make_tensor_proto
    _AssertCompatible(values, dtype)
File "/home/hal/anaconda2/envs/tflow/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 259, in _AssertCompatible
    (dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got None of type '_Message' instead.
It seems like it cannot get the shape of y_pred as an integer, right?
Mm, OK, I will take a look at it today and work directly with tensors to try to find a way to have it work properly for both backends.
Hi there, I tried something like that:
def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, K.shape(y_pred))
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (K.cast(weights[c_t, c_p], tf.float32) * K.cast(y_pred_max_mat[:, c_p], tf.float32) * K.cast(y_true[:, c_t], tf.float32))
    return K.categorical_crossentropy(y_pred, y_true) * final_mask
I think it will do it.
The latter only works for non-recurrent networks, but this code works for RNNs following the same idea. It only works for TensorFlow though; I couldn't find a way to reshape a tensor the way we want with the Keras backend:
import tensorflow as tf

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = tf.expand_dims(y_pred_max, 1)
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (K.cast(weights[c_t, c_p], K.floatx()) * K.cast(y_pred_max_mat[:, c_p], K.floatx()) * K.cast(y_true[:, c_t], K.floatx()))
    return K.categorical_crossentropy(y_pred, y_true) * final_mask
My bad, just replacing tf.expand_dims with K.expand_dims worked for me:
def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.expand_dims(y_pred_max, 1)
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (K.cast(weights[c_t, c_p], K.floatx()) * K.cast(y_pred_max_mat[:, c_p], K.floatx()) * K.cast(y_true[:, c_t], K.floatx()))
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

w_array = np.ones((3, 3))
w_array[2, 1] = 1.2
w_array[1, 2] = 1.2

ncce = partial(w_categorical_crossentropy, weights=w_array)
ncce.__name__ = 'w_categorical_crossentropy'
The last line is necessary for tensorboard callback to work, thanks!!
Is the Mar 31 solution for @ayalalazaro above still recommended as of v1.2? (Noticed @tboquet's comment: _Keras 1.0 will provide a more flexible way to introduce new objectives and metrics_.)
My problem is binary classification where true positive accuracy is more important, and some false negatives are acceptable. Would I need the approach above to achieve that objective? I tried class_weights = {0: 1, 1: 10}, but saw no change. (Examples are 25% positive, 75% negative.)
Just a small detail about the w_categorical_crossentropy implementation: there is no need to cast weights and y_true. The following code works in Theano and TensorFlow:
def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    return K.categorical_crossentropy(y_pred, y_true) * final_mask
Hello, I am trying to implement this in TensorFlow.
I am confused as to what partial is in the line:
ncce = partial(w_categorical_crossentropy, weights=np.ones((10,10)))
I do not see it defined anywhere in this thread, and get
NameError: name 'partial' is not defined
as output...
Thanks
@jerpint It's available from functools, i.e.
import functools
ncce = functools.partial(w_categorical_crossentropy, weights=np.ones((10,10)))
I am trying to incorporate @curiale's implementation of w_categorical_crossentropy for a binary classification where the output of my model has shape (?, 5120, 2), but I am running into a couple of issues:
1) Assuming my class weight distribution is e.g. class_weights = [0.85144055, 1.14855945], what should the w_array be like? Something like this?
w_array = np.ones((2,2))
w_array[1,0] = class_weights[0]
w_array[0,1] = class_weights[1]
ncce = functools.partial(w_categorical_crossentropy, weights=w_array)
2) When I run model.compile(optimizer=Adam(lr=1e-5), loss=ncce, metrics=[dice_coef]) I get the following error:
ValueError: Dimensions must be equal, but are 5120 and 2 for 'mul_339' (op: 'Mul') with input shapes: [?,5120], [?,2].
These are the variables' shapes inside w_categorical_crossentropy:
y_pred shape: (?, 5120, 2), y_true shape: (?, ?, ?), final_mask shape: (?, 2)
Frankly, I am lost in the w_categorical_crossentropy function (e.g. what is final_mask supposed to be? What is its shape?). Any help would be much appreciated.
Hnn, I'm sorry but I don't quite understand: what does this (?, 5120, 2) entail? If ? is the batch size and 2 is the number of classes, what is 5120?
@recluze Sorry for the confusion. Let me clarify: the model is an image segmentation network with output (?, 5120, 2), where ? is the batch_size, 5120 is the total_number_of_pixels_per_image, and 2 is the number of classes (foreground, background). So basically the network does classification per pixel.
Hnn, I think the last 2 should be removed: since you have two classes, a single output with binary crossentropy instead of a categorical one should work. I don't think a 3-dim output shape would work with w_categorical_crossentropy...
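Not sure about your exact architecture, but a minimal sketch of what that could look like (hypothetical final layers, assuming Keras 2 where Dense maps the last axis, so one sigmoid unit per pixel replaces the 2-class softmax):

# Hypothetical final layers: one sigmoid output per pixel instead of a 2-class softmax
model.add(Dense(1, activation='sigmoid'))   # output shape becomes (batch, 5120, 1)
model.compile(optimizer=Adam(lr=1e-5), loss='binary_crossentropy')
# y_true is then the foreground mask of shape (batch, 5120, 1) instead of one-hot pairs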
So I hacked Keras' backend binary_crossentropy function into the following, using weighted_cross_entropy_with_logits() to pass class weights:
def w_binary_crossentropy(output, target, weights):
    output = tf.clip_by_value(output, tf.cast(_EPSILON, dtype=_FLOATX),
                              tf.cast(1. - _EPSILON, dtype=_FLOATX))
    output = tf.log(output / (1 - output))
    # note: tf.nn.weighted_cross_entropy_with_logits expects (targets, logits, pos_weight)
    return tf.nn.weighted_cross_entropy_with_logits(output, target, weights)
and in my code I call it like this:
def wrapped_partial(func, *args, **kwargs):
    partial_func = functools.partial(func, *args, **kwargs)
    functools.update_wrapper(partial_func, func)
    return partial_func

# weight is the ratio of positives to negatives
ncce = wrapped_partial(w_binary_crossentropy, weights=0.01)
model.compile(optimizer=Adam(lr=1e-5), loss=ncce, metrics=[dice_coef])
But I am not sure if these weights are the class weights I am after. It is not clear from the definition of weighted_cross_entropy_with_logits whether this does class balancing. I just wanted to share it here with everyone. Any comments are much appreciated.
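For reference, the TensorFlow docs define weighted_cross_entropy_with_logits element-wise as targets * -log(sigmoid(logits)) * pos_weight + (1 - targets) * -log(1 - sigmoid(logits)), i.e. pos_weight only rescales the positive-target term. A small numpy restatement of that formula (not the real API, just a sketch for inspecting values):

import numpy as np

def weighted_bce_reference(targets, logits, pos_weight):
    # numpy restatement of the formula from the TF docs:
    # pos_weight multiplies only the loss on positive targets
    p = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    return -(pos_weight * targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))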
@mongoose54 I'm currently playing around with this and will post the results back; it shouldn't be hard to get a version with fixed weights.
@mongoose54 This is what I came up with for binary crossentropy, based on tensorpack's version.
TF only, but no need to change Keras.
class WeightedBinaryCrossEntropy(object):

    def __init__(self, pos_ratio):
        neg_ratio = 1. - pos_ratio
        self.pos_ratio = tf.constant(pos_ratio, tf.float32)
        self.weights = tf.constant(neg_ratio / pos_ratio, tf.float32)
        self.__name__ = "weighted_binary_crossentropy({0})".format(pos_ratio)

    def __call__(self, y_true, y_pred):
        return self.weighted_binary_crossentropy(y_true, y_pred)

    def weighted_binary_crossentropy(self, y_true, y_pred):
        # Transform to logits
        epsilon = tf.convert_to_tensor(K.common._EPSILON, y_pred.dtype.base_dtype)
        y_pred = tf.clip_by_value(y_pred, epsilon, 1 - epsilon)
        y_pred = tf.log(y_pred / (1 - y_pred))
        cost = tf.nn.weighted_cross_entropy_with_logits(y_true, y_pred, self.weights)
        return K.mean(cost * self.pos_ratio, axis=-1)
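Usage would then be something like this (hypothetical model; pos_ratio being the fraction of positive samples in the training set):

# Hypothetical usage: ~25% of the training samples are positive
model.compile(optimizer='adam',
              loss=WeightedBinaryCrossEntropy(pos_ratio=0.25),
              metrics=['accuracy'])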
Thank you @dralves, that helps me a lot.
Just a quick question: when I compare the outputs of your class with 0.5 positive weight and the binary_crossentropy loss function from Keras, the results differ by a factor of 2.
Do you know why, and which one is correct?
import tensorflow as tf
import keras.backend as K
import numpy as np
from keras.losses import binary_crossentropy

class WeightedBinaryCrossEntropy(object):

    def __init__(self, pos_ratio):
        neg_ratio = 1. - pos_ratio
        self.pos_ratio = tf.constant(pos_ratio, tf.float32)
        self.weights = tf.constant(neg_ratio / pos_ratio, tf.float32)
        self.__name__ = "weighted_binary_crossentropy({0})".format(pos_ratio)

    def __call__(self, y_true, y_pred):
        return self.weighted_binary_crossentropy(y_true, y_pred)

    def weighted_binary_crossentropy(self, y_true, y_pred):
        # Transform to logits
        epsilon = tf.convert_to_tensor(K.common._EPSILON, y_pred.dtype.base_dtype)
        y_pred = tf.clip_by_value(y_pred, epsilon, 1 - epsilon)
        y_pred = tf.log(y_pred / (1 - y_pred))
        cost = tf.nn.weighted_cross_entropy_with_logits(y_true, y_pred, self.weights)
        return K.mean(cost * self.pos_ratio, axis=-1)

y_true_arr = np.array([0, 1, 0, 1], dtype="float32")
y_pred_arr = np.array([0, 0, 1, 1], dtype="float32")
y_true = tf.constant(y_true_arr)
y_pred = tf.constant(y_pred_arr)

with tf.Session().as_default():
    print(WeightedBinaryCrossEntropy(0.5)(y_true, y_pred).eval())
    print(binary_crossentropy(y_true, y_pred).eval())
Outputs
4.00756
8.01512
@dardelet good point.
This comes directly from tensorpack's implementation, which returns the same results.
If you remove the final cost * self.pos_ratio you get the same results as with normal sigmoid cross entropy.
I do see that in the original implementation of balanced-classes cross entropy (from this paper) the authors multiply the loss from the positive labels by the positive ratio and the loss from the negative labels by the negative ratio.
I'll look into it a bit more.
Thank you @dralves, any new findings on the mentioned difference?
In the example, what should the variable final_mask look like? I tried to use the weights matrix as:
weights = np.matrix([[   0,    0,    0,    0,    0,    0, 1000],
                     [   0,    0,    0,    0,    0,    0, 1000],
                     [   0,    0,    0,    0,    0,    0, 1000],
                     [   0,    0,    0,    0,    0,    0, 1000],
                     [   0,    0,    0,    0,    0,    0, 1000],
                     [   0,    0,    0,    0,    0,    0, 1000],
                     [1000, 1000, 1000, 1000, 1000, 1000, 1000]])
It seems everything except the last class should be messed up, because those weights are always zero. However, the confusion matrix is:
[[ 144. 5. 0. 0. 9. 0. 20.]
[ 9. 150. 9. 0. 0. 0. 14.]
[ 7. 8. 109. 6. 2. 1. 17.]
[ 4. 0. 5. 93. 41. 4. 4.]
[ 11. 1. 0. 12. 123. 6. 21.]
[ 0. 0. 1. 5. 12. 126. 8.]
[ 39. 15. 16. 4. 39. 11. 326.]]
I am using keras with tensorflow as backend. Any ideas of why this happens?
As @recluze has mentioned above, w_categorical_crossentropy doesn't work with data that's rank 3+ (for example an LSTM with return_sequences=True, TimeDistributed(Dense), etc.).
I have changed the above example to support rank 3+ tensors and wrapped it in a class, just like the WeightedBinaryCrossEntropy above.
class WeightedCategoricalCrossEntropy(object):

    def __init__(self, weights):
        nb_cl = len(weights)
        self.weights = np.ones((nb_cl, nb_cl))
        for class_idx, class_weight in weights.items():
            self.weights[0][class_idx] = class_weight
            self.weights[class_idx][0] = class_weight
        self.__name__ = 'w_categorical_crossentropy'

    def __call__(self, y_true, y_pred):
        return self.w_categorical_crossentropy(y_true, y_pred)

    def w_categorical_crossentropy(self, y_true, y_pred):
        nb_cl = len(self.weights)
        final_mask = K.zeros_like(y_pred[..., 0])
        y_pred_max = K.max(y_pred, axis=-1)
        y_pred_max = K.expand_dims(y_pred_max, axis=-1)
        y_pred_max_mat = K.equal(y_pred, y_pred_max)
        for c_p, c_t in itertools.product(range(nb_cl), range(nb_cl)):
            w = K.cast(self.weights[c_t, c_p], K.floatx())
            y_p = K.cast(y_pred_max_mat[..., c_p], K.floatx())
            y_t = K.cast(y_pred_max_mat[..., c_t], K.floatx())
            final_mask += w * y_p * y_t
        return K.categorical_crossentropy(y_pred, y_true) * final_mask
The constructor expects a dictionary with the same structure as the class_weight param of model.fit:
{0: 1.0, 1: 29.6, 2: 17.69, 3: 27.08, 4: 11.04, 5: 45.45, 6: 136.344, 7: 57.304}
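For example (hypothetical model), compilation would then look like:

# Hypothetical usage with the same dictionary structure as class_weight
loss = WeightedCategoricalCrossEntropy({0: 1.0, 1: 29.6, 2: 17.69, 3: 27.08,
                                        4: 11.04, 5: 45.45, 6: 136.344, 7: 57.304})
model.compile(optimizer='adam', loss=loss, metrics=['accuracy'])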
@asiron Thanks for the code.
Just out of curiosity, what rules do you follow for assigning different weights to different classes?
Any specific formula? Should they sum to 1?
@alirzsedghi I think this was answered well in #5116
Hey @asiron, thank you for sharing this code. I was wondering if you also figured out a way to save the weights with which the loss was initialized when saving the model. This would be really helpful, since the weights would then be loaded along with the model.
In this version of the custom loss function this is not supported. I am not sure if this functionality is supported by Keras. Any ideas?
Here is a sample code that reproduces the problem.
import keras
import itertools
import numpy as np
from keras import backend as K
from keras.models import Model
from keras.layers import Input, Dense, Activation
from ipdb import set_trace as bp

class WeightedCategoricalCrossEntropy(object):

    def __init__(self, weights):
        nb_cl = len(weights)
        self.weights = np.ones((nb_cl, nb_cl))
        for class_idx, class_weight in weights.items():
            self.weights[0][class_idx] = class_weight
            self.weights[class_idx][0] = class_weight
        self.__name__ = 'w_categorical_crossentropy'

    def __call__(self, y_true, y_pred):
        return self.w_categorical_crossentropy(y_true, y_pred)

    def w_categorical_crossentropy(self, y_true, y_pred):
        nb_cl = len(self.weights)
        final_mask = K.zeros_like(y_pred[..., 0])
        y_pred_max = K.max(y_pred, axis=-1)
        y_pred_max = K.expand_dims(y_pred_max, axis=-1)
        y_pred_max_mat = K.equal(y_pred, y_pred_max)
        for c_p, c_t in itertools.product(range(nb_cl), range(nb_cl)):
            w = K.cast(self.weights[c_t, c_p], K.floatx())
            y_p = K.cast(y_pred_max_mat[..., c_p], K.floatx())
            y_t = K.cast(y_pred_max_mat[..., c_t], K.floatx())
            final_mask += w * y_p * y_t
        return K.categorical_crossentropy(y_pred, y_true) * final_mask

# create a toy model
i = Input(shape=(100,))
h = Dense(7)(i)
o = Activation('softmax')(h)
model = Model(inputs=i, outputs=o)

# compile the model with the custom loss
loss = WeightedCategoricalCrossEntropy({0: 1.0, 1: 29.6, 2: 17.69, 3: 27.08, 4: 11.04, 5: 45.45, 6: 136.344})
model.compile(loss=loss, optimizer='sgd')
print "Compilation OK!"

# fit the model
model.fit(np.random.random((64, 100)), np.random.random((64, 7)), epochs=10)

# save and load the model
model.save('model.h5')
model = keras.models.load_model('model.h5', custom_objects={'w_categorical_crossentropy': WeightedCategoricalCrossEntropy})
print "Load OK!"
Thanks to those who contributed code here. Helped me along a lot.
In the implementation of @asiron (which I tested because I needed to handle rank 3+ tensors), I believe a small error crept in relative to the upstream @tboquet implementation.
y_t = K.cast(y_pred_max_mat[..., c_t], K.floatx())
should be
y_t = K.cast(y_true[..., c_t], K.floatx())
otherwise the boolean logic is comparing y_pred with y_pred (instead of y_pred with y_true).
A slightly different point is that the way in which the class_weights dictionary is transformed into the weights matrix within WeightedCategoricalCrossEntropy does not seem consistent with what the original poster was trying to achieve, which is to specify pairwise weights for all combinations of true and predicted values. As it stands it populates only the 0th row and column, penalising misclassifications of the 0th class as another class, or another class as the 0th class. Maybe better to supply the complete matrix instead? Just a thought. Thanks again to contributors.
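One possible variant along those lines (just a sketch, not tested) would be a constructor that takes the complete pairwise matrix directly and reuses the same loss body:

import numpy as np

class WeightedCategoricalCrossEntropyMatrix(WeightedCategoricalCrossEntropy):
    """Hypothetical variant: accepts the full (nb_cl, nb_cl) pairwise weight matrix."""
    def __init__(self, weight_matrix):
        self.weights = np.asarray(weight_matrix, dtype='float64')
        self.__name__ = 'w_categorical_crossentropy'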
Question: do the classes need to be in one-hot representation for @asiron's code?
And @sry002, I think you are right.
Whilst the @asiron code with @sry002's alteration seems to 'work' for me,
it is not only considerably slower than not weighting the loss, it also forces my computer out of memory.
I think, though, this is just a case of me using too complex a model, with too many input examples, for my lowly desktop to handle :(
@nd26, looking at the code above your comment,
WeightedCategoricalCrossEntropy({0: 1.0, 1: 29.6, 2: 17.69, 3: 27.08, 4: 11.04, 5: 45.45, 6: 136.344})
I think this suggests no, you don't pass the classes into WeightedCategoricalCrossEntropy as one-hot representations.
(Unless you mean the output matrix passed into model.fit should be one-hot; I think that should still be one-hot.)
Would there be a way to pass in weights that are different for each sample, giving each individual sample its own weight depending on whether it is predicted accurately or not? I.e. different payoffs depending on the item?
Hi @dickreuter, I've managed to pass a weight for each sample just by adding a new layer into the classification (y_true). Then I modified the objective and metric functions to properly unravel the weights before computing the operations.
Do you have an example of how this looks? How do I split the tensor in the loss function to extract y_true and the weights?
@tboquet Have you tested your code?
It seems like you need a wrapper around partial to make things work, as described here: http://louistiao.me/posts/adding-__name__-and-__doc__-attributes-to-functoolspartial-objects/
In my case I have tried weighted binary crossentropy:
from functools import partial, update_wrapper

def wrapped_partial(func, *args, **kwargs):
    partial_func = partial(func, *args, **kwargs)
    update_wrapper(partial_func, func)
    return partial_func

def binary_crossentropy_weighted(y_true, y_pred, class_weights):
    y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
    loss = K.mean(class_weights * (-y_true * K.log(y_pred) - (1.0 - y_true) * K.log(1.0 - y_pred)), axis=-1)
    return loss

custom_loss = wrapped_partial(binary_crossentropy_weighted, class_weights=np.array([1.0, 2.0]))
model.compile(optimizer=Adadelta(), loss=[custom_loss])
Sorry for my late response @dickreuter. If you want to weight the batch with a single spatial weight, I recommend an option similar to the one proposed by @stergioc instead of just a wrapped function. However, if you want to weight each sample in the batch with a particular weight, you need to pass the weight inside y_true. I didn't find another way to do that, because it was impossible for me to identify the samples inside the batch. An example of what I did:
class WeightedLoss(object):

    def __init__(self, alpha):
        self.alpha = alpha
        if K.image_dim_ordering() == 'th':
            self.stack_axis = 1
        else:
            self.stack_axis = -1
        self.__name__ = 'w_loss'

    def __call__(self, y_true, y_pred):
        return self.w_loss(y_true, y_pred)

    def w_loss(self, y_true, y_pred):
        # y_true should have the weights concatenated in the last dimension
        slice_stack = [slice(None) for i in range(y_true.get_shape().ndims)]
        slice_stack[self.stack_axis] = slice(2, None)
        weights = y_true[slice_stack]
        slice_stack[self.stack_axis] = slice(0, 2)
        y_true = y_true[slice_stack]
        ........
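On the data side, that means concatenating the weight map onto the one-hot targets before calling fit; a minimal sketch, assuming a channels-last 2-class target and a hypothetical per-pixel weight_map array:

import numpy as np

# Hypothetical shapes: y_onehot is (batch, H, W, 2), weight_map is (batch, H, W, 1)
y_true_with_weights = np.concatenate([y_onehot, weight_map], axis=-1)  # (batch, H, W, 3)
model.compile(optimizer='adam', loss=WeightedLoss(alpha=1.0))
model.fit(X_train, y_true_with_weights, batch_size=8, epochs=10)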
Is there any way to use the weights for binary_crossentropy only for misclassification? The examples above use class weights but I only want to use the weight when a misclassification occurs
Hey all,
I'm using the weighted categorical cross entropy function described above by @ayalalazaro, but it doesn't seem to work as expected. My understanding is that if I pass a weight array of just 1's, it should replicate what normally happens with Keras' categorical cross entropy. But that's not what I'm seeing. Here's some example code:
def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    cross_ent = K.categorical_crossentropy(y_pred, y_true, from_logits=False)
    return cross_ent * final_mask

w_array = np.ones((2, 2))
custom_loss = partial(w_categorical_crossentropy, weights=w_array)
custom_loss.__name__ = 'w_categorical_crossentropy'

default_model = Sequential([
    Dense(128, input_shape=(20,), activation="relu"),
    BatchNormalization(axis=1),
    Dropout(0.6),
    Dense(2, activation="sigmoid")
])
default_model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=["accuracy"])
default_model.optimizer.lr = 0.001
default_model.fit(x=trainable_data.values, y=train_target.values, validation_split=0.1, epochs=20, shuffle=True, batch_size=64)

## Epoch 20/20
## 2018/2018 [==============================] - 0s 73us/step - loss: 0.6188 - acc: 0.6571
## - val_loss: 0.6402 - val_acc: 0.6222

# THEN USE CUSTOM LOSS, WHICH SHOULD BE THE SAME
custom_model = Sequential([
    Dense(128, input_shape=(20,), activation="relu"),
    BatchNormalization(axis=1),
    Dropout(0.6),
    Dense(2, activation="sigmoid")
])
custom_model.compile(optimizer='rmsprop', loss=custom_loss, metrics=["accuracy"])
custom_model.optimizer.lr = 0.001

## Epoch 20/20
## 2018/2018 [==============================] - 0s 90us/step - loss: 1.0241e-04 - acc:
## 0.6065 - val_loss: 3.9465e-06 - val_acc: 0.6089
Notice that the custom model pretty quickly gets to essentially zero loss. Which sounds cool, except it doesn't make any sense, and really it means my model stopped learning anything new after only a few epochs. It may be worth noting that I only actually have 2 classes here. I want to weight mis-classifications higher, and thought I could do so with the code above. But it doesn't seem to work. Anyone have any ideas for how I can weight mis-classifications higher on a binary problem?
@ayalalazaro OK, so I found the error. A silly, but big one. The function listed above returns K.categorical_crossentropy(y_pred, y_true). But I checked the source code here, and that flips the arguments. The real signature is K.categorical_crossentropy(y_true, y_pred, from_logits=False). Truth goes first, then predictions.
Once I made that switch, it started working!
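For reference, the corrected return statement (truth first, then predictions) is:

# y_true first, then y_pred, matching the Keras backend signature
return K.categorical_crossentropy(y_true, y_pred) * final_mask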
Hi,
I use Keras 2.0.8 and Python 2.7.12
I tried to run this and get the output
$ python testt.py
Using TensorFlow backend.
60000 train samples
10000 test samples
Traceback (most recent call last):
File "testt.py", line 69, in
model.compile(loss=ncce, optimizer=rms)
File "build/bdist.linux-x86_64/egg/keras/models.py", line 784, in compile
File "build/bdist.linux-x86_64/egg/keras/engine/training.py", line 850, in compile
File "build/bdist.linux-x86_64/egg/keras/engine/training.py", line 450, in weighted
File "testt.py", line 29, in w_categorical_crossentropy
final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
File "/usr/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 881, in r_binary_op_wrapper
return func(x, y, name=name)
File "/usr/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1088, in _mul_dispatch
return gen_math_ops._mul(x, y, name=name)
File "/usr/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1449, in _mul
result = _op_def_lib.apply_op("Mul", x=x, y=y, name=name)
File "/usr/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 589, in apply_op
param_name=input_name)
File "/usr/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 60, in _SatisfiesTypeConstraint
", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
TypeError: Value passed to parameter 'x' has DataType bool not in list of allowed values: float16, float32, float64, uint8, int8, uint16, int16, int32, int64, complex64, complex128
Is this still the best approach? @fchollet, just to recap the problem: in a classification problem with images of cats, dogs and snakes, I need to penalize the case in which a snake is classified as a dog twice as much as the other cases. Do we really need to go through partial to do this?
I want to build a binary classifier that does the following with one input neuron (giving x) and one output neuron:
If the output neuron is 0: the payoff is 0
If the output neuron is 1 and correct: the payoff is +1
If the output neuron is 1 and incorrect: the payoff is -x (x is different for each individual sample)
How can I maximize the payoff with a neural network?
How can I create a loss function that would do that? Can I use keras directly or do I need a custom loss function? Does the loss function have to be differentiable? Can I use binary cross entropy or even mse?
@dickreuter you can do this with keras, but you need a custom loss function. And loss functions always minimize a number, so if you want to "maximize" a payoff, you should just flip your payoffs and make them negative. Then the optimizer will make it the most negative it can, which is equivalent to maximizing.
Now, wanting a different payoff for each X sounds tricky. Probably possible by doing some sorcery where you set shuffle to False, and keep track of the batches or something, but I'm not sure exactly how. Could you use an average or median? If so, then you can use the code listed above in this issue to create the custom loss function, and then just minimize it. You might at least try the average/median approach, and see if it helps your problem. If it does, then you could investigate further optimization by trying to get a different loss for each X sample.
No, I can't take the average or median, as each sample has distinct features (I said it has just one input neuron, but in reality there are additional input neurons).
I know. I meant use the average/median for your loss function. I did not mean change your X's. Just pick some payoff for each X that is a reasonable default guess. I don't know your domain, so I can't comment further. But was just saying, if the custom loss function you're talking about will actually improve your model, then it would likely still improve it (over simply binary cross entropy) even if you use an average or a median. If you see improvement over binary cross entropy, then you can try to optimize further by figuring out how to have a custom payoff for each sample.
I don't think this would work in my case, as the model would need to punish large negative payoffs more than small positive ones, so that the payoff can be maximized.
Let me rephrase the problem again:
Is there a way in Keras or TensorFlow to give samples an extra weight only if they are incorrectly classified, i.e. a combination of class weight and sample weight, but only applying the sample weight for one of the outcomes in a binary class (averaging is not an option)? How can this be achieved?
@curiale I have an issue that seems to have no straightforward solution in Keras. My server runs Ubuntu 14.04 and Keras with the TensorFlow backend. It has 4 Nvidia GeForce GTX 1080 GPUs.
I am trying to test the best available implementation of weighted categorical cross entropy (https://github.com/keras-team/keras/issues/2115, curiale's comment on Jan 20, 2017).
The input array Xtrain has shape (800, 40), where 800 is the number of samples and 40 the input feature dimension. Similarly, Xtest has shape (400, 40). The problem is a multiclass scenario with three classes. The following code is used, but an error shows up indicating a mismatch between GPU and batch size, which is difficult to address; please provide some pointers.
import keras
from keras.models import Sequential, Model, load_model
from keras.layers.embeddings import Embedding
from keras.layers.core import Activation, Dense, Dropout, Reshape
from keras.optimizers import SGD, Adam, RMSprop
# from keras.layers import TimeDistributed, Merge, Conv1D, Conv2D, Flatten, MaxPooling2D, Conv2DTranspose, UpSampling2D, RepeatVector
# from keras.layers.recurrent import GRU, LSTM
# from keras.datasets.data_utils import get_file
# import tarfile
from functools import partial, update_wrapper
from keras.callbacks import TensorBoard
from time import time
from sklearn.model_selection import KFold
import numpy as np
from keras.callbacks import EarlyStopping
import tensorflow as tf
import scipy.io
from keras import backend as K
from keras.layers import Input, Lambda
import os
from keras import optimizers
from matplotlib import pyplot
from sklearn.preprocessing import MinMaxScaler
# os.export CUDA_VISIBLE_DEVICES="0,1"
import keras, sys
from matplotlib import pyplot
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# from keras.utils import np_utils
from itertools import product
from keras.layers import Input

def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = weights.shape[1]  # len(weights[0,:])
    print weights.shape
    print nb_cl
    print y_pred
    print y_true
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)  # returns the maximum value along an axis in a tensor
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    # ypred_tensor = K.constant(y_pred, dtype=K.set_floatx('float32'))
    # ytrue_tensor = K.constant(y_true, dtype=K.set_floatx('float32'))
    return K.categorical_crossentropy(y_true, y_pred) * final_mask

def get_mat_data(add, in1, in2):
    # Assuming sample_matlab_file.mat has 2 matrices A and B
    matData = scipy.io.loadmat(add)
    matrixA = matData[in1]
    matrixA1 = matData[in2]
    matrixB = matData['Ytrain']
    matrixB1 = matData['Ytest']
    weights = matData['w']
    matrixC = matData['Ytrainclassify']
    matrixC1 = matData['Ytestclassify']
    nfold = matData['nfold']
    return matrixA, matrixA1, matrixB, matrixB1, weights, matrixC, matrixC1, nfold

def wrapped_partial(func, *args, **kwargs):
    partial_func = partial(func, *args, **kwargs)
    update_wrapper(partial_func, func)
    return partial_func

def gen_model():
    input = Input(shape=(40,))
    # m1 = Sequential()
    # m1.add(conv_model)
    # m1.add(Conv2D(15, (5,5), strides=(1, 1), activation='relu', input_shape=(1,30,125), kernel_initializer='glorot_uniform'))  # temporal filters, theano
    # m1.add(Dropout(0.2))
    # m1.add(Conv2D(15, (5,1), strides=(1, 1), activation='relu', kernel_initializer='glorot_uniform'))  # spatial filters
    # m1.add(Dropout(0.2))
    # m1.add(Flatten())
    # m1.add(Dropout(0.2))
    x1 = Dense(200, activation='relu', name='dense_1')(input)
    x2 = Dropout(0.2)(x1)
    x3 = Dense(100, activation='relu', name='dense_2')(x2)
    x4 = Dropout(0.2)(x3)
    x5 = Dense(3, activation='softmax', name='softmax_layer')(x4)
    model = Model(input=input, output=[x5])
    return model

in1 = 'Xtrain'
in2 = 'Xtest'
add = '/home/tharun/all_mat_files/test_keras.mat'
Xtrain, Xtest, Ytrain, Ytest, weights, Ytrainclassify, Ytestclassify, nfold = get_mat_data(add, in1, in2)
nb_classes = 3
print Xtrain.shape, Xtest.shape, Ytrain.shape, Ytest.shape, weights.shape, Ytrainclassify.shape, Ytestclassify.shape
wts = np.array([[1/weights[:,0], 1, 1], [1, 1/weights[:,1], 1], [1, 1, 1/weights[:,2]]])
print 'wts:'
print wts.shape

# convert class vectors to binary class matrices
Y_train = keras.utils.to_categorical(Ytrainclassify[:,None], nb_classes)
Y_test = keras.utils.to_categorical(Ytestclassify[:,None], nb_classes)
Xtrain = Xtrain.astype('float32')
Xtest = Xtest.astype('float32')
print Xtrain.shape
print Y_train.shape
print Xtest.shape
print Y_test.shape

ncce = wrapped_partial(w_categorical_crossentropy, wts)
batch_size = 10
nb_classes = 3
nb_epoch = 1

model = gen_model()
# model.compile(loss=ncce, optimizer="adam")
model.summary()
rms = SGD()
model.compile(loss=ncce, optimizer=rms)
model.fit(Xtrain, Y_train, batch_size=batch_size, nb_epoch=nb_epoch)
score = model.evaluate(Xtest, Y_test)
print('Test score:', score[0])
print('Test accuracy:', score[1])

# saving weights
model.save('model_classify_weights.h5')
Error:
python /home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py
/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
(800, 40) (400, 40) (800, 1) (400, 1) (1, 3) (800, 1) (400, 1)
wts:
(3, 3)
(800, 40)
(800, 3)
(400, 40)
(400, 3)
/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py:129: UserWarning: Update your `Model` call to the Keras 2 API: `Model(outputs=[<tf.Tenso..., inputs=Tensor("in...)`
model = Model(input=input, output=[x5])
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 40) 0
_________________________________________________________________
dense_1 (Dense) (None, 200) 8200
_________________________________________________________________
dropout_1 (Dropout) (None, 200) 0
_________________________________________________________________
dense_2 (Dense) (None, 100) 20100
_________________________________________________________________
dropout_2 (Dropout) (None, 100) 0
_________________________________________________________________
softmax_layer (Dense) (None, 3) 303
=================================================================
Total params: 28,603
Trainable params: 28,603
Non-trainable params: 0
_________________________________________________________________
(?, 3)
3
Tensor("softmax_layer_target:0", shape=(?, ?), dtype=float32)
[[array([1.41292294]) 1 1]
[1 array([7.328564]) 1]
[1 1 array([2.38611435])]]
/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py:176: UserWarning: The `nb_epoch` argument in `fit` has been renamed `epochs`.
model.fit(Xtrain, Y_train,batch_size=batch_size, nb_epoch=nb_epoch)
Epoch 1/1
2018-02-13 15:41:44.382214: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-02-13 15:41:44.758387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:05:00.0
totalMemory: 7.92GiB freeMemory: 7.42GiB
2018-02-13 15:41:44.992640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:06:00.0
totalMemory: 7.92GiB freeMemory: 7.80GiB
2018-02-13 15:41:45.225696: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 2 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:09:00.0
totalMemory: 7.92GiB freeMemory: 7.80GiB
2018-02-13 15:41:45.458070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 3 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:0a:00.0
totalMemory: 7.92GiB freeMemory: 7.80GiB
2018-02-13 15:41:45.461078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2018-02-13 15:41:45.461151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 2 3
2018-02-13 15:41:45.461160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0: Y Y Y Y
2018-02-13 15:41:45.461165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1: Y Y Y Y
2018-02-13 15:41:45.461170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 2: Y Y Y Y
2018-02-13 15:41:45.461175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 3: Y Y Y Y
2018-02-13 15:41:45.461191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:05:00.0, compute capability: 6.1)
2018-02-13 15:41:45.461198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:06:00.0, compute capability: 6.1)
2018-02-13 15:41:45.461204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:2) -> (device: 2, name: GeForce GTX 1080, pci bus id: 0000:09:00.0, compute capability: 6.1)
2018-02-13 15:41:45.461209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:3) -> (device: 3, name: GeForce GTX 1080, pci bus id: 0000:0a:00.0, compute capability: 6.1)
Traceback (most recent call last):
File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 239, in <module>
main()
File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 176, in main
model.fit(Xtrain, Y_train,batch_size=batch_size, nb_epoch=nb_epoch)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 1598, in fit
validation_steps=validation_steps)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 1183, in _fit_loop
outs = f(ins_batch)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2273, in __call__
**self.session_kwargs)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [3] vs. [10]
[[Node: training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/softmax_layer_loss/mul_20"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/Shape, training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/Shape_1)]]
[[Node: loss/mul/_19 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_806_loss/mul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op u'training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/BroadcastGradientArgs', defined at:
File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 239, in <module>
main()
File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 176, in main
model.fit(Xtrain, Y_train,batch_size=batch_size, nb_epoch=nb_epoch)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 1575, in fit
self._make_train_function()
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 960, in _make_train_function
loss=self.total_loss)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
return func(*args, **kwargs)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/optimizers.py", line 156, in get_updates
grads = self.get_gradients(loss, params)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/optimizers.py", line 73, in get_gradients
grads = K.gradients(loss, params)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2310, in gradients
return tf.gradients(loss, variables, colocate_gradients_with_ops=True)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 581, in gradients
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 353, in _MaybeCompile
return grad_fn() # Exit early
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 581, in <lambda>
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/math_grad.py", line 742, in _MulGrad
rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 532, in _broadcast_gradient_args
"BroadcastGradientArgs", s0=s0, s1=s1, name=name)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
...which was originally created as op u'loss/softmax_layer_loss/mul_20', defined at:
File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 239, in <module>
main()
File "/home/tharun/keras_workshop/EEG_RxtimeDNN_regress_classify.py", line 174, in main
model.compile(loss=ncce, optimizer=rms)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 850, in compile
sample_weight, mask)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/keras/engine/training.py", line 466, in weighted
score_array *= weights
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 894, in binary_op_wrapper
return func(x, y, name=name)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1117, in _mul_dispatch
return gen_math_ops._mul(x, y, name=name)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2726, in _mul
"Mul", x=x, y=y, name=name)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/home/tharun/anaconda2/envs/kerasdl/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Incompatible shapes: [3] vs. [10]
[[Node: training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss/softmax_layer_loss/mul_20"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/Shape, training/SGD/gradients/loss/softmax_layer_loss/mul_20_grad/Shape_1)]]
[[Node: loss/mul/_19 = _Recv[client_terminated=false, recv_device="/job:loc
Hey, I have an imbalanced data set. I was hoping to use the weighted cost to help with classification, since otherwise it always ends up predicting only one outcome (in my case 0). I was hoping for some help in building the cost matrix. I have 3 classes (1: 1270, 0: 7145, -1: 1260), so from the above examples it would be a 3x3 matrix; picking the values to fill the matrix is the problem.
If I could also penalize wrong predictions of 1 as -1, or vice versa, that would be great.
In my case, lambda function worked fine.
def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.expand_dims(y_pred_max, 1)
    y_pred_max_mat = K.equal(y_pred, y_pred_max)
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (K.cast(weights[c_t, c_p], K.floatx()) * K.cast(y_pred_max_mat[:, c_p], K.floatx()) * K.cast(y_true[:, c_t], K.floatx()))
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

w_array = np.ones((3, 3))
w_array[2, 1] = 1.2
w_array[1, 2] = 1.2

loss = lambda y_true, y_pred: w_categorical_crossentropy(y_true, y_pred, weights=w_array)
model.compile(loss=loss,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
This is my code; although it's a bit messy, it seems to work with RNNs as well :D
def getLoss(weights, rnn=True):
    def w_categorical_crossentropy(y_true, y_pred):
        nb_cl = len(weights)
        if not rnn:
            final_mask = K.zeros_like(y_pred[:, 0])
            y_pred_max = K.max(y_pred, axis=1)
            y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
            y_pred_max_mat = K.equal(y_pred, y_pred_max)
            for c_p, c_t in product(range(nb_cl), range(nb_cl)):
                final_mask += (weights[c_t, c_p] * K.cast(y_pred_max_mat, tf.float32)[:, c_p] * K.cast(y_true, tf.float32)[:, c_t])
            return K.categorical_crossentropy(y_pred, y_true) * final_mask
        else:
            final_mask = K.zeros_like(y_pred[:, :, 0])
            y_pred_max = K.max(y_pred, axis=2)
            y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], K.shape(y_pred)[1], 1))
            y_pred_max_mat = K.equal(y_pred, y_pred_max)
            for c_p, c_t in product(range(nb_cl), range(nb_cl)):
                final_mask += (weights[c_t, c_p] * K.cast(y_pred_max_mat, tf.float32)[:, :, c_p] * K.cast(y_true, tf.float32)[:, :, c_t])
            return K.categorical_crossentropy(y_pred, y_true) * final_mask
    return w_categorical_crossentropy
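Usage of the closure would then look something like this (hypothetical model and weight matrix):

# Hypothetical usage: 3 classes, extra cost for confusing classes 1 and 2
w_array = np.ones((3, 3))
w_array[1, 2] = 1.2
w_array[2, 1] = 1.2
model.compile(loss=getLoss(w_array, rnn=True), optimizer='adam', metrics=['accuracy'])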
Are there any plans to integrate this feature into Keras itself? Since we already have sample weighting in fitting, this seems to be a logical extension to the standard feature set.
Although there do not seem to be any parameterized loss functions currently.
@machisuke shouldn't it be
return K.categorical_crossentropy(y_true, y_pred) * final_mask
instead of
return K.categorical_crossentropy(y_pred, y_true) * final_mask
as @blakewest pointed out from the Keras source code?
Greetings!
I have a binary classification problem at hand where I intend to penalize the FN. I am okay with more FP but want a really low number of FN.
I have used the custom loss function along with a lambda, as mentioned in the comments above.
def w_categorical_crossentropy(y_true, y_pred, weights):
    nb_cl = len(weights)
    final_mask = K.zeros_like(y_pred[:, 0])
    y_pred_max = K.max(y_pred, axis=1)
    y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
    y_pred_max_mat = K.cast(K.equal(y_pred, y_pred_max), K.floatx())
    for c_p, c_t in product(range(nb_cl), range(nb_cl)):
        final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
    return K.categorical_crossentropy(y_pred, y_true) * final_mask

w_array = np.ones((2, 2))
w_array[1, 0] = 2.5  # penalizing FN
w_array[0, 1] = 2.5  # penalizing FP

loss = lambda y_true, y_pred: w_categorical_crossentropy(y_true, y_pred, weights=w_array)
classifier.compile(optimizer=sgd, loss=loss, metrics=['accuracy'])
After doing this, the number of FN seems to be more or less the same as what I had with all the weights in w_array being 1. What am I getting wrong here? Any kind of pointer/help would be greatly appreciated. @ayalalazaro @tboquet @curiale @machisuke
FOLLOW-UP: If I simply comment out the second assignment to w_array, does it mean I am only penalizing the FN and not the FP?
w_array[1,0] = 2.5 # penalizing FN
#w_array[0,1] = 2.5 # penalizing FP
(Quoting @machisuke's comment above: "In my case, lambda function worked fine", followed by the lambda-based w_categorical_crossentropy snippet.)
Can anyone explain the weight matrix, i.e. how I could put a weight on the second and third classes?
@SwapnilBorse123 In the weights matrix the row index is the true class, and the col index is the predicted class.
So if you want to penalize 0's confused with 1's more heavily (meaning the true label is 0, but the model predicted 1), then you should put a high weight on the index [0, 1].
So you are correct
w_array[1,0] = 2.5 # penalizing FN
#w_array[0,1] = 2.5 # penalizing FP
penalizes only FN and not FP.
@zaher88abd Say that you have 3 available classes (0, 1 and 2).
Then you would start by defining a 3x3 matrix
w_array = np.ones((3, 3))
Then you can add the weights you'd like to have.
As I said in the comment above,
w_array[i, j]
defines the weight for an example of class i falsely classified as class j.
E.g. if you would like to penalize examples of class 1 falsely classified as class 2 more heavily, you could do
w_array[1, 2] = high_weight
If you would like your model to overall put more emphasis on a certain class, you could put high weights on all occurrences of that class.
For example, if you'd like to put an overall emphasis on class 2 you could do the following:
w_array[2, :] = high_weight
This will penalize every mistake made with an example of class 2.
But notice that this assignment also includes
w_array[2, 2] = high_weight
This means that this will also penalize an example of class 2 which was labeled correctly but with low confidence.
This behavior may or may not fit your needs.
If you would like to avoid that behavior, you could just do the following:
w_array[2, :] = high_weight
w_array[2, 2] = 1  # restore the original weight
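Putting that together, a small sketch for three classes (high_weight is a placeholder value you would tune for your problem):
import numpy as np

high_weight = 5.0          # placeholder value, chosen only for illustration
w_array = np.ones((3, 3))  # rows = true class, cols = predicted class

w_array[1, 2] = high_weight   # true 1 predicted as 2 costs more
w_array[2, :] = high_weight   # every mistake on a true class-2 example costs more
w_array[2, 2] = 1.0           # but don't penalize correct, low-confidence class-2 predictions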
For anyone who is still trying to figure out what is going on in the weighted crossentropy loss function, I made an analogous example in numpy. The doc string at the bottom explains what is going on at each step. Adding in some print statements to see what the arrays look like and what their shapes are is a lot easier here than in keras :)
import numpy as np
from itertools import product

# example values so the snippet runs stand-alone
n_classes = 2
weights = np.ones((n_classes, n_classes))
weights[0, 1] = 1.5

y_pred = np.array([[[0.6, 0.4], [0.3, 0.7], [0.1, 0.9], [0.23, 0.77]],
                   [[0.3, 0.7], [0.21, 0.79], [0.99, 0.01], [0.23, 0.77]],
                   [[0.1, 0.9], [0.88, 0.12], [0.33, 0.67], [0.11, 0.89]]])
y_true = np.array([[[1, 0], [1, 0], [0, 1], [0, 1]],
                   [[0, 1], [0, 1], [1, 0], [1, 0]],
                   [[0, 1], [1, 0], [1, 0], [0, 1]]])

final_mask = np.zeros_like(y_pred[:, 0])
y_pred_max = np.max(y_pred, axis=1)
y_pred_max = np.expand_dims(y_pred_max, 1)
y_pred_max_mat = np.equal(y_pred, y_pred_max)
for c_p, c_t in product(range(n_classes), range(n_classes)):
    final_mask += (weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
#return K.categorical_crossentropy(y_pred, y_true) * final_mask
"""
1: y_pred, dim = (3, 4 ,2), type = float
Your predicted labels with shape (n_images, n_pixels, n_classes)
Note: n_pixels is your image height * width, if your image is square you can
take the square root of n_pixels to get the dimensions
2: y_true, dim = (3, 4, 2), type = bool
Your predicted labels with shape (n_images, n_pixels, n_classes)
3: final_mask, dim = (4, 2), type = zeros
Dimensions are (n_pixels, n_classes)
Defines zero array that will be added to when constructing final mask
4: y_pred_max, dim = (3, 2), type = float
Dimensions are (n_images, n_classes)
This is the max probability predicted for each class for any pixel in the image
5: y_pred_max, dim = (3,1,2), type = float
Dimensions are (n_images, ., n_classes)
Reshapes output of previous step so it can be broadcast across y_pred_max
6: y_pred_max_mat, dim = (3, 4 ,2), type = bool
This is the evaluation of output(4) == output(5) where 5 is broadcasted across pixels
Marks highest class probability for each class in each images as True
7: Iterates over all possible outcomes of predicted = [0, ..., n_classes] and true = [0, ..., n_classes]
8: updates final_mask, dim = (4, 2), type = int (adding 0 or 1 to each cell in each iteration)
Multiplies output(6) by true labels and weight for the outcome of the iteration
Rows are images and columns are adjusted class weights
9: Weights are now applied to the the crossentropy loss of the original predictions and labels
Commented out because numpy equivalent isn't represented by a single function
"""
Also, it was mentioned above but a bit out of context: a good method for calculating weights to use is in #5116. I made a simpler numpy implementation for myself that estimates the weights from the training label images. This assumes that the training label images are stored as a single channel image with a number in range(n_classes) representing its class.
import cv2
import glob
import numpy as np

def calc_class_proportions(dir_train_labels, n_classes):
    img_paths = glob.glob(dir_train_labels + '*.png')
    class_counts = np.zeros(n_classes)
    for img_path in img_paths:
        label_img = cv2.imread(img_path)[:, :, 0]
        classes_present, counts = np.unique(label_img, return_counts=True)
        # index by class id so images that are missing some classes don't break the accumulation
        class_counts[classes_present] += counts
    class_proportions = class_counts / np.sum(class_counts)
    return class_proportions

def calc_class_weights(dir_train_labels, n_classes, scale=None):
    class_props = calc_class_proportions(dir_train_labels, n_classes)
    if scale == 'log':
        weights = np.log(1 / class_props)
    else:
        max_prop = np.max(class_props)
        weights = max_prop / class_props
    return weights
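A hypothetical usage sketch (the directory path and class count below are made up for illustration):
# assuming single-channel label PNGs live in ./train_labels/ and there are 4 classes
weights = calc_class_weights('./train_labels/', n_classes=4, scale='log')
print(weights)  # one weight per class, larger for rarer classes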
When I tried to convert it from categorical_crossentropy to binary_crossentropy, dimension errors popped up, which is weird since I did not change any other part of the model.
So I am trying to use categorical_crossentropy to implement the binary classification logic instead. In this test I changed the sigmoid activation to softmax, and it turned out that the evaluation metric (F1) no longer changed even when I tried different weights. Could anyone help? Did I miss something when implementing the binary classification logic?
Adding to the solution above: with the new Keras version you can now just override the respective loss class, as given below.
from tensorflow.python import keras
from itertools import product
import numpy as np
from tensorflow.python.keras.utils import losses_utils

class WeightedCategoricalCrossentropy(keras.losses.CategoricalCrossentropy):

    def __init__(
        self,
        weights,
        from_logits=False,
        label_smoothing=0,
        reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE,
        name='categorical_crossentropy',
    ):
        super().__init__(
            from_logits, label_smoothing, reduction, name=f"weighted_{name}"
        )
        self.weights = weights

    def call(self, y_true, y_pred):
        weights = self.weights
        nb_cl = len(weights)
        final_mask = keras.backend.zeros_like(y_pred[:, 0])
        y_pred_max = keras.backend.max(y_pred, axis=1)
        y_pred_max = keras.backend.reshape(
            y_pred_max, (keras.backend.shape(y_pred)[0], 1))
        y_pred_max_mat = keras.backend.cast(
            keras.backend.equal(y_pred, y_pred_max), keras.backend.floatx())
        for c_p, c_t in product(range(nb_cl), range(nb_cl)):
            final_mask += (
                weights[c_t, c_p] * y_pred_max_mat[:, c_p] * y_true[:, c_t])
        return super().call(y_true, y_pred) * final_mask
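A usage sketch for this class; the 10x10 matrix and the 1/7 weights mirror the MNIST example earlier in the thread, and the model is assumed to already exist:
import numpy as np

w_array = np.ones((10, 10))
w_array[1, 7] = 1.2   # true 1 predicted as 7
w_array[7, 1] = 1.2   # true 7 predicted as 1

model.compile(
    optimizer='rmsprop',
    loss=WeightedCategoricalCrossentropy(w_array),
    metrics=['accuracy'],
)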
Hi @GalAvineri, @kozemzak,
I am a little bit confused about the purpose of the weighted crossentropy loss function. Is it for misclassification (e.g. the MNIST case, where class "1" is misclassified as "7") or for an imbalanced dataset (e.g. too many images of class "1" compared to "7", etc.)? Thanks.
There are other reasons why you might want to weight the individual samples. For example if they yield a custom payoff.
@sudonto the weighted crossentropy gives different weights to different misclassifications by definition.
So that is not its purpose; rather, it's just what it does by definition.
I guess there could be multiple purposes for using this loss, as @dickreuter said, and one of them is indeed when you have an imbalanced dataset.
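To illustrate the two use cases with concrete, made-up numbers: one matrix targets a specific confusion, the other upweights an underrepresented class across the board:
import numpy as np

# penalize one specific confusion: true 1 predicted as 7 (and vice versa)
w_confusion = np.ones((10, 10))
w_confusion[1, 7] = 2.0
w_confusion[7, 1] = 2.0

# compensate for an imbalanced dataset: make every mistake on the rare class 1 cost more
w_imbalance = np.ones((10, 10))
w_imbalance[1, :] = 5.0
w_imbalance[1, 1] = 1.0  # keep correct predictions at the default weight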
@tboquet thanks for this code.
In addition to the mistake found by @blakewest in https://github.com/keras-team/keras/issues/2115#issuecomment-354678974, I found something else in your code: the loop goes for c_p, c_t but then refers to weights[c_t, c_p] (different order).
It's easy to miss in your example, because your weights matrix is symmetric. But really, weights[1, 7] is used instead of weights[7, 1] and vice versa.
The fix is simple: just switch the order in either of them (but not both).
The convention I'm familiar with uses axis 0 as the "class truth", so I fix it via for c_t, c_p.
TypeError: Value passed to parameter 'x' has DataType bool not in list of allowed values: float16, float32, float64, uint8, int8, uint16, int16, int32, int64, complex64, complex128
@enikkari this error can be resolved by adding another line after:
y_pred_max_mat = K.equal(y_pred, y_pred_max)
as follows:
y_pred_max_mat = K.equal(y_pred, y_pred_max)
y_pred_max_mat = K.cast(y_pred_max_mat, 'float32')
Also, to prevent a row in y_pred like [.4, .4, .2] from being encoded into [1, 1, 0], this:
y_pred_max = K.max(y_pred, axis=1)
y_pred_max = K.reshape(y_pred_max, (K.shape(y_pred)[0], 1))
y_pred_max_mat = K.equal(y_pred, y_pred_max)
can be replaced with more robust (and intuitive) code:
y_pred_arg_max = K.argmax(y_pred)
y_pred_max_mat = K.one_hot(y_pred_arg_max, num_classes=y_pred.shape[1])
Another added value of this is that it no longer requires the K.cast fix above.
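Putting the two suggestions together, a small sketch of the mask construction (build_prediction_mask is just an illustrative name; y_pred is assumed to have shape (batch, n_classes)):
import keras.backend as K

def build_prediction_mask(y_pred):
    # one-hot of the argmax: exactly one 1.0 per row, even when probabilities tie,
    # and already float, so no extra K.cast is needed
    y_pred_arg_max = K.argmax(y_pred)
    return K.one_hot(y_pred_arg_max, num_classes=y_pred.shape[1])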
Adding to the class solution by @SpikingNeuron in https://github.com/keras-team/keras/issues/2115#issuecomment-490079116, here's a more robust and vectorized implementation:
import tensorflow.keras.backend as K
from tensorflow.keras.losses import CategoricalCrossentropy

class WeightedCategoricalCrossentropy(CategoricalCrossentropy):

    def __init__(self, cost_mat, name='weighted_categorical_crossentropy', **kwargs):
        assert cost_mat.ndim == 2
        assert cost_mat.shape[0] == cost_mat.shape[1]
        super().__init__(name=name, **kwargs)
        self.cost_mat = K.cast_to_floatx(cost_mat)

    def __call__(self, y_true, y_pred, sample_weight=None):
        assert sample_weight is None, "should only be derived from the cost matrix"
        return super().__call__(
            y_true=y_true,
            y_pred=y_pred,
            sample_weight=get_sample_weights(y_true, y_pred, self.cost_mat),
        )

def get_sample_weights(y_true, y_pred, cost_m):
    num_classes = len(cost_m)
    y_pred.shape.assert_has_rank(2)
    y_pred.shape[1:].assert_is_compatible_with(num_classes)
    y_pred.shape.assert_is_compatible_with(y_true.shape)
    y_pred = K.one_hot(K.argmax(y_pred), num_classes)
    y_true_nk1 = K.expand_dims(y_true, 2)
    y_pred_n1k = K.expand_dims(y_pred, 1)
    cost_m_1kk = K.expand_dims(cost_m, 0)
    sample_weights_nkk = cost_m_1kk * y_true_nk1 * y_pred_n1k
    sample_weights_n = K.sum(sample_weights_nkk, axis=[1, 2])
    return sample_weights_n
Usage:
model.compile(loss=WeightedCategoricalCrossentropy(cost_matrix), ...)
Similarly, this can be applied for the CategoricalAccuracy
metric too:
from tensorflow.keras.metrics import CategoricalAccuracy

class WeightedCategoricalAccuracy(CategoricalAccuracy):

    def __init__(self, cost_mat, name='weighted_categorical_accuracy', **kwargs):
        assert cost_mat.ndim == 2
        assert cost_mat.shape[0] == cost_mat.shape[1]
        super().__init__(name=name, **kwargs)
        self.cost_mat = K.cast_to_floatx(cost_mat)

    def update_state(self, y_true, y_pred, sample_weight=None):
        assert sample_weight is None, "should only be derived from the cost matrix"
        return super().update_state(
            y_true=y_true,
            y_pred=y_pred,
            sample_weight=get_sample_weights(y_true, y_pred, self.cost_mat),
        )
Usage:
model.compile(metrics=[WeightedCategoricalAccuracy(cost_matrix), ...], ...)
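A combined usage sketch; the 3-class cost matrix below is made up for illustration, and the model is assumed to already exist:
import numpy as np

cost_matrix = np.ones((3, 3))
cost_matrix[2, 1] = 5.0   # true 2 predicted as 1 is the costly mistake

model.compile(
    optimizer='adam',
    loss=WeightedCategoricalCrossentropy(cost_matrix),
    metrics=[WeightedCategoricalAccuracy(cost_matrix)],
)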
In addition to the w_array given by @tboquet in the post above, how do I construct the cost_matrix?
For example, for binary classification:
w_array = np.ones((2,2))
w_array[1,2] = 5.0 (to penalize 1s being misclassified).
y_true and y_pred are the targets.
Can somebody help please?
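For reference, class indices in a 2x2 matrix run from 0 to 1, so a sketch of a valid binary cost matrix (rows = true class, columns = predicted class) would be:
import numpy as np

w_array = np.ones((2, 2))
w_array[1, 0] = 5.0   # true 1 predicted as 0 (a misclassified 1) costs 5x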
(quoting @GalAvineri's weight-matrix explanation from above)
@GalAvineri
I want to put an overall emphasis on class 2 (I have 3 classes: 0, 1, 2). In your opinion, I should give w[2][0] and w[2][1] a high weight, but should I assign the same high weight to w[0][2] and w[0][1]?
@eliadl I'm getting an unexpected keyword argument 'sample_weight'
tf python version r1.13
@dest-dir Please post a StackOverflow question with your code, and share the link here. I'll try to assist there.
@eliadl how do I insert the cost matrix into another custom loss, like focal loss?
class FocalLoss(tf.keras.losses.Loss):

    def __init__(self, gamma=2.0, alpha=1.0,
                 reduction=tf.keras.losses.Reduction.AUTO, name='focal_loss'):
        super(FocalLoss, self).__init__(reduction=reduction, name=name)
        self.gamma = float(gamma)
        self.alpha = float(alpha)

    def call(self, y_true, y_pred):
        epsilon = 1.e-9
        y_true = tf.convert_to_tensor(y_true, tf.float32)
        y_pred = tf.convert_to_tensor(y_pred, tf.float32)
        model_out = tf.add(y_pred, epsilon)
        ce = tf.multiply(y_true, -tf.math.log(model_out))
        weight = tf.multiply(y_true, tf.pow(
            tf.subtract(self.alpha, model_out), self.gamma))
        fl = tf.multiply(1., tf.multiply(weight, ce))
        reduced_fl = tf.reduce_max(ce, axis=1)
        return reduced_fl
@damhurmuller Please post a StackOverflow question with your code, and share the link here. I'll try to assist there.
For semantic segmentation, with:
Input (rgb) shape=(batch_size, width, height, 3)
Output (one-hot) shape=(batch_size, width, height, n_classes)
The weighted categorical crossentropy loss function is:
def weighted_categorical_crossentropy(weights):
    # weights = [0.9, 0.05, 0.04, 0.01]
    def wcce(y_true, y_pred):
        Kweights = K.constant(weights)
        if not K.is_tensor(y_pred):
            y_pred = K.constant(y_pred)
        y_true = K.cast(y_true, y_pred.dtype)
        return K.categorical_crossentropy(y_true, y_pred) * K.sum(y_true * Kweights, axis=-1)
    return wcce
Usage:
loss = weighted_categorical_crossentropy(weights)
optimizer = keras.optimizers.Adam(lr=0.01)
model.compile(optimizer=optimizer, loss=loss)
@mendi80 Please, is your function right?
@dest-dir, @eliadl
I encountered the same unexpected sample_weight problem. I also ran into some issues when trying to save the entire model (in order to restore from interrupted training, including the optimizer state).
The sample weight problem seems to be solved by changing the magic method __call__ to call. I also modified the return of call to multiply the output of super().call(y_t, y_p) by the return value of get_sample_weights.
@eliadl - I think your approach, from what I understood, was to overwrite/overload rather than access the categorical crossentropy call method and pass in sample_weight as an expected parameter of this call; however, I couldn't figure out why this worked for you and not for us. (And, frankly, my Python knowledge isn't really up to figuring this out!)
I utilised @SpikingNeuron's class code in order to get this working. I also changed the weights argument from a positional argument to a named argument as part of trying to get the model loading working.
The loss class therefore became:
import tensorflow
import tensorflow.keras.backend as K
from tensorflow.python.keras.utils import losses_utils

class weighted_categorical_crossentropy(tensorflow.keras.losses.CategoricalCrossentropy):

    def __init__(
        self,
        *,
        weights,
        from_logits=False,
        label_smoothing=0,
        reduction=losses_utils.ReductionV2.SUM_OVER_BATCH_SIZE,
        name='categorical_crossentropy',
    ):
        super().__init__(
            from_logits, label_smoothing, reduction, name=f"weighted_{name}"
        )
        self.weights = weights

    def call(self, y_true, y_pred):
        return super().call(y_true, y_pred) * get_sample_weights(y_true, y_pred, self.weights)

    def get_config(self):
        return {'weights': self.weights}

    @classmethod
    def from_config(cls, config):
        return cls(**config)

def get_sample_weights(y_true, y_pred, cost_m):
    num_classes = len(cost_m)
    cost_m = K.cast(cost_m, 'float32')
    y_pred.shape.assert_has_rank(2)
    assert y_pred.shape[1] == num_classes
    y_pred.shape.assert_is_compatible_with(y_true.shape)
    y_pred = K.one_hot(K.argmax(y_pred), num_classes)
    y_true_nk1 = K.expand_dims(y_true, 2)
    y_pred_n1k = K.expand_dims(y_pred, 1)
    cost_m_1kk = K.expand_dims(cost_m, 0)
    sample_weights_nkk = cost_m_1kk * y_true_nk1 * y_pred_n1k
    sample_weights_n = K.sum(sample_weights_nkk, axis=[1, 2])
    return sample_weights_n
Note the inclusion of:
def get_config(self):
    return {'weights': self.weights}

@classmethod
def from_config(cls, config):
    return cls(**config)
This is necessary in order for the custom loss function to be registered with Keras for model saving.
I also included the following (after the class code) to make sure that this registration happens:
tf.keras.losses.weighted_categorical_crossentropy = weighted_categorical_crossentropy
Usage:
model.compile(
    optimizer='adam',
    loss={'output': weighted_categorical_crossentropy(weights=cost_matrix)},
)
Saving:
model.save(filepath, save_format='tf')
Loading:
model = tf.keras.models.load_model(
    filepath,
    compile=True,
    custom_objects={
        'weighted_categorical_crossentropy': weighted_categorical_crossentropy(weights=cost_matrix)
    }
)
Feedback welcome.
Hope this helps.
@PhilAlton
__call__ accepts sample_weight and handles it inherently, while call doesn't. You had to provide your own implementation there; I didn't.
__call__ does access the categorical crossentropy call method, as my class inherits from CategoricalCrossentropy, which uses the categorical_crossentropy function.
CategoricalCrossentropy.from_config is already implemented (or inherited), so there's no need to override it with the same code.
get_config doesn't account for the arguments of the base class. This does:
def get_config(self):
    config = super().get_config().copy()
    config.update({'weights': self.weights})
    return config
@eliadl - Thanks; SO Question
@eliadl I'm getting an unexpected keyword argument 'sample_weight'
tf python version r1.13
@dest-dir as @PhilAlton found, the problem was __call__
didn't match its original signature.
def __call__(self, y_true, y_pred):
should have been this:
def __call__(self, y_true, y_pred, sample_weight=None):
Hi, I just stumbled onto this class-weight matrix in a multi-class classification problem where one of the classes is background; background getting predicted as positive is highly undesirable, while the reverse is not that critical. Is a class-weight-matrix-based loss function available in TF2? Does it actually work as expected? Thanks for the above solutions anyway.
Hello, does anyone know how to do this for sparse categorical crossentropy?
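One possible direction (just a sketch, not something from this thread): one-hot encode the sparse integer labels inside the loss and reuse the get_sample_weights helper from the cost-matrix implementation above:
import tensorflow as tf
import tensorflow.keras.backend as K

class WeightedSparseCategoricalCrossentropy(tf.keras.losses.SparseCategoricalCrossentropy):
    # assumes get_sample_weights(y_true_onehot, y_pred, cost_mat) from the comment above is in scope
    def __init__(self, cost_mat, name='weighted_sparse_categorical_crossentropy', **kwargs):
        super().__init__(name=name, **kwargs)
        self.cost_mat = K.cast_to_floatx(cost_mat)

    def __call__(self, y_true, y_pred, sample_weight=None):
        num_classes = self.cost_mat.shape[0]
        # y_true holds integer class ids; one-hot it only to derive the per-sample weights
        y_true_onehot = K.one_hot(K.cast(K.flatten(y_true), 'int32'), num_classes)
        return super().__call__(
            y_true, y_pred,
            sample_weight=get_sample_weights(y_true_onehot, y_pred, self.cost_mat),
        )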