Addons: Problems with CohenKappa metrics

Created on 17 Jan 2020 · 12 Comments · Source: tensorflow/addons

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
TensorFlow version and how it was installed (source or binary): 2.1.0
TensorFlow-Addons version and how it was installed (source or binary): 0.7.0
Python version: 3.6.9
Is GPU used? (yes/no): yes

Describe the bug

Hello,
I would like to use tfa.metrics.CohenKappa from tensorflow_addons.
I ran into a problem when I tried to use it. I have a function that creates a basic convolutional network, and I would like to use this metric with it.

However, when I do that, it raises an exception:

ValueError: Number of samples in y_true and y_pred are different

So I checked the code, and it seems that the shapes of the two tensors are not the same:

Tensor("Cast:0", shape=(None, None), dtype=int64) Tensor("Cast_1:0", shape=(None, 5), dtype=int64)

I would like to know how I can specify the shape of y_pred so that it matches the shape of y_true.

Code to reproduce the issue

import tensorflow as tf
import tensorflow_addons as tfa

def convolution(categories=5, shape_x=224, shape_y=224, channels=3):
    model = tf.keras.models.Sequential([
      tf.keras.layers.Conv2D(10, kernel_size=(5, 5), strides=(1, 1), activation=tf.nn.relu, use_bias=True, input_shape=(shape_x, shape_y, channels)),
      tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding='valid'),
      tf.keras.layers.Conv2D(10, kernel_size=(5, 5), strides=(1, 1), activation=tf.nn.relu, use_bias=True),
      tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding='valid'),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(128, activation=tf.nn.relu),
      tf.keras.layers.Dense(categories, activation=tf.nn.softmax)
    ])
    model.compile(optimizer='adam', loss='mse', metrics=[tfa.metrics.CohenKappa(num_classes=5)])
    return model

Other info / logs

File "/home/rere/Project/Aptos/aptos2019-blindness-detection/model.py", line 38, in convolution
model.compile(optimizer='adam', loss='mse', metrics=[tfa.metrics.CohenKappa(num_classes=5)])
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
result = method(self, *args, **kwargs)
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 439, in compile
masks=self._prepare_output_masks())
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 2004, in _handle_metrics
target, output, output_mask))
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 1955, in _handle_per_output_metrics
metric_fn, y_true, y_pred, weights=weights, mask=mask)
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_utils.py", line 1155, in call_metric_function
return metric_fn(y_true, y_pred, sample_weight=weights)
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/metrics.py", line 196, in __call__
replica_local_fn, *args, **kwargs)
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/distribute/distributed_training_utils.py", line 1135, in call_replica_local_fn
return fn(*args, **kwargs)
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/metrics.py", line 179, in replica_local_fn
update_op = self.update_state(*args, **kwargs) # pylint: disable=not-callable
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/utils/metrics_utils.py", line 76, in decorated
update_op = update_state_fn(*args, **kwargs)
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
result = self._call(*args, **kwds)
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 615, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 497, in _initialize
*args, **kwds))
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2389, in _get_concrete_function_internal_garbage_collected
graph_function, _, _ = self._maybe_define_function(args, kwargs)
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2703, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2593, in _create_graph_function
capture_by_value=self._capture_by_value),
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 978, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 439, in wrapped_fn
return weak_wrapped_fn().__wrapped__(*args, **kwds)
File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 968, in wrapper
raise e.ag_error_metadata.to_exception(e)
ValueError: in converted code:
/home/rere/.local/lib/python3.6/site-packages/tensorflow_addons/metrics/cohens_kappa.py:122 update_state  *
    raise ValueError(

ValueError: Number of samples in `y_true` and `y_pred` are different

Thank you in advance for any help :)

metrics

rere-rere

All 12 comments

Thanks for the report! I suppose you could remove the check directly. This is similar to #876 and #298. I will go through our metrics/losses over the weekend to check whether they are compatible with .compile().

WindQAQ on 18 Jan 2020

👍1

Though that check is redundant (because it would be checked in tf.math.confusion_matrix), I think your example is not going to work with CohenKappa. In your last layer, the output would be shaped [num_samples, num_classes], where num_classes=5, while CohenKappa takes y_pred with shape [num_samples,] as argument.
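To make the shape mismatch concrete, here is a minimal sketch. It uses NumPy as a stand-in for TensorFlow, and the probability values are made up for illustration. It only shows how an argmax over the class axis turns a [num_samples, num_classes] output into the [num_samples,] vector of labels described above; it is not the code from tensorflow_addons.

```python
import numpy as np

# Hypothetical softmax output for 4 samples and 5 classes: shape (4, 5).
probs = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.10, 0.60, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.10, 0.70],
    [0.20, 0.20, 0.40, 0.10, 0.10],
])

# Hypothetical integer labels, one per sample: shape (4,).
y_true = np.array([0, 1, 4, 3])

# Collapse the class axis so that predictions also have shape (4,).
y_pred = np.argmax(probs, axis=-1)

print(probs.shape, y_pred.shape)  # (4, 5) (4,)
assert y_pred.shape == y_true.shape
```

Without the argmax step, the two inputs disagree on rank, which is the situation the ValueError in the report is guarding against.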

cc @AakashKumarNain

WindQAQ on 18 Jan 2020

@WindQAQ I favor these checks because they prevent a lot of potential bugs where broadcasting can happen unknowingly, but I think there is no other way except removing them. At least, I am not aware of any other way to do proper checks.

AakashKumarNain on 18 Jan 2020

@WindQAQ @facaiy I can think of the following scenarios for Kappa calculation:

The last layer of the model consists of a single unit, and regression is used to find the labels instead of classification. In this case, the shape of the last layer would be (None, 1), while the shape of the true labels would be (num_samples,).
The predictions come from a classification model whose last layer has a sigmoid/softmax activation, depending on whether it is binary or multi-class classification. In this case the last layer can have the shape (None, 1) or (None, num_classes). The true labels, on the other hand, can have different shapes.

Here are all the scenarios I could think of:

##Regression##
y_true:(batch_size,)
y_pred:(batch_size, 1)
Round the predictions to get the predicted label. We can even include a parameter for the user to provide a custom threshold list.

##Classification## 
Case1: 
y_true: (None, num_classes) -> OHE
y_pred: (None, num_classes)
Use argmax on both tensors to find the labels and calculate kappa afterwards.

Case2:
y_true: (batch_size,) -> Using sparse labels instead of OHE
y_pred: (batch_size, num_classes)
Use argmax for prediction tensor and afterwards calculate kappa

All we need to do is include these checks and we are good to go, IMO. Let me know what you think.
If it looks okay, I can push a fix.
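The scenarios above can be sketched roughly as follows. This is a NumPy illustration under stated assumptions, not the fix that was eventually pushed: the helper names `to_class_indices` and `cohen_kappa` and the `regression` flag are hypothetical, the kappa is the plain unweighted variant, and the binary sigmoid case with shape (None, 1) is left out.

```python
import numpy as np

def to_class_indices(values, num_classes, regression=False):
    """Reduce labels or predictions to integer class indices of shape (batch_size,)."""
    values = np.asarray(values)
    if regression:
        # Regression: round the single output per sample to the nearest valid label.
        rounded = np.rint(values.reshape(-1)).astype(np.int64)
        return np.clip(rounded, 0, num_classes - 1)
    if values.ndim == 2 and values.shape[-1] == num_classes:
        # One-hot labels or per-class scores: argmax over the class axis.
        return np.argmax(values, axis=-1).astype(np.int64)
    # Otherwise assume sparse integer labels.
    return values.reshape(-1).astype(np.int64)

def cohen_kappa(y_true, y_pred, num_classes):
    """Unweighted Cohen's kappa from two vectors of class indices."""
    cm = np.zeros((num_classes, num_classes), dtype=np.float64)
    np.add.at(cm, (y_true, y_pred), 1.0)  # rows: true label, columns: prediction
    n = cm.sum()
    observed = np.trace(cm) / n
    expected = (cm.sum(axis=1) @ cm.sum(axis=0)) / n**2
    # Undefined (division by zero) when expected agreement is exactly 1.
    return (observed - expected) / (1.0 - expected)

num_classes = 3
scores = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7],
                   [0.1, 0.3, 0.6]])
sparse_true = np.array([0, 1, 2, 1])
one_hot_true = np.eye(num_classes)[sparse_true]
regression_pred = np.array([[0.2], [1.4], [1.8], [2.1]])

# Classification, Case2: sparse labels against per-class scores.
kappa = cohen_kappa(to_class_indices(sparse_true, num_classes),
                    to_class_indices(scores, num_classes), num_classes)
# Classification, Case1: one-hot labels against per-class scores.
kappa_ohe = cohen_kappa(to_class_indices(one_hot_true, num_classes),
                        to_class_indices(scores, num_classes), num_classes)
# Regression: sparse labels against rounded single-unit outputs.
kappa_reg = cohen_kappa(to_class_indices(sparse_true, num_classes),
                        to_class_indices(regression_pred, num_classes, regression=True),
                        num_classes)
print(kappa, kappa_ohe, kappa_reg)
```

In this toy data all three input formats reduce to the same label vectors, so the three kappa values agree. One caveat of dispatching on shape alone is that an input of shape (batch_size, num_classes) is always treated as scores or one-hot labels.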

AakashKumarNain on 18 Jan 2020

@WindQAQ Tzu-Wei, do you think is_compatible_with would work in this case?

facaiy on 19 Jan 2020

@AakashKumarNain Hi, Aakash, I'm not sure I understand your question thoroughly. For tf.keras, I believe it uses different metrics to handle different label formats (e.g. AUC, BinaryAccuracy, CategoricalAccuracy, etc.)

facaiy on 19 Jan 2020

👍1

I will ping you with the details in the gitter

AakashKumarNain on 19 Jan 2020

> @WindQAQ Tzu-Wei, do you think is_compatible_with would work in this case?

Thank you, Facai, this should work!

> @AakashKumarNain Hi, Aakash, I'm not sure I understand your question thoroughly. For tf.keras, I believe it uses different metrics to handle different label formats (e.g. AUC, BinaryAccuracy, CategoricalAccuracy, etc.)

This seems to be a good approach. If necessary, I vote for this solution.

WindQAQ on 20 Jan 2020

> > @WindQAQ Tzu-Wei, do you think is_compatible_with would work in this case?
>
> Thank you, Facai, this should work!
>
> > @AakashKumarNain Hi, Aakash, I'm not sure I understand your question thoroughly. For tf.keras, I believe it uses different metrics to handle different label formats (e.g. AUC, BinaryAccuracy, CategoricalAccuracy, etc.)
>
> This seems to be a good approach. If necessary, I vote for this solution.

Let's discuss this in detail. I think we aren't on the same page yet... lol

AakashKumarNain on 20 Jan 2020

@WindQAQ @facaiy I have made a separate file for detailed discussion on this. You can find it here. Let me know what you think

AakashKumarNain on 22 Jan 2020

👍1

@AakashKumarNain Sorry for the delay, Aakash. Thanks for your detailed RFC, which looks really great. As said before, would it be possible to create a separate metric for each case you mentioned, for example CohenKappa, BinaryCohenKappa, CategoricalCohenKappa, etc. (compare Accuracy, BinaryAccuracy, CategoricalAccuracy, ...)? What do you think? cc @WindQAQ @seanpmorgan
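As a rough picture of what one metric per label format could look like, here is a plain NumPy sketch. The class names are taken from the suggestion above, but the classes themselves are hypothetical stand-ins and not the tensorflow_addons or tf.keras API; the 0.5 threshold in the binary variant is also an assumption. Each subclass only converts its inputs to class indices and then defers to a shared confusion-matrix accumulator.

```python
import numpy as np

class CohenKappa:
    """Hypothetical base metric: integer labels of shape (batch_size,)."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.conf_mtx = np.zeros((num_classes, num_classes), dtype=np.int64)

    def update_state(self, y_true, y_pred):
        y_true = np.asarray(y_true, dtype=np.int64).reshape(-1)
        y_pred = np.asarray(y_pred, dtype=np.int64).reshape(-1)
        if y_true.shape != y_pred.shape:
            raise ValueError("Number of samples in `y_true` and `y_pred` are different")
        # Accumulate counts across batches; rows are true labels.
        np.add.at(self.conf_mtx, (y_true, y_pred), 1)

    def result(self):
        cm = self.conf_mtx.astype(np.float64)
        n = cm.sum()
        observed = np.trace(cm) / n
        expected = (cm.sum(axis=1) @ cm.sum(axis=0)) / n**2
        # Undefined (division by zero) when expected agreement is exactly 1.
        return (observed - expected) / (1.0 - expected)

class CategoricalCohenKappa(CohenKappa):
    """Hypothetical variant: one-hot labels and scores, both (batch_size, num_classes)."""

    def update_state(self, y_true, y_pred):
        super().update_state(np.argmax(y_true, axis=-1), np.argmax(y_pred, axis=-1))

class BinaryCohenKappa(CohenKappa):
    """Hypothetical variant: 0/1 labels and probabilities, thresholded at 0.5 by default."""

    def __init__(self, threshold=0.5):
        super().__init__(num_classes=2)
        self.threshold = threshold

    def update_state(self, y_true, y_pred):
        super().update_state(y_true, np.asarray(y_pred).reshape(-1) > self.threshold)

# Two batches of one-hot labels and per-class scores (made-up values).
categorical = CategoricalCohenKappa(num_classes=3)
categorical.update_state(np.eye(3)[[0, 1]], [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])
categorical.update_state(np.eye(3)[[2, 1]], [[0.1, 0.2, 0.7], [0.1, 0.3, 0.6]])

binary = BinaryCohenKappa()
binary.update_state([0, 1, 1, 0], [[0.1], [0.9], [0.4], [0.2]])

print(categorical.result(), binary.result())
```

The trade-off compared with a single metric that inspects shapes is that the user picks the class matching their label format up front, so no shape guessing is needed inside update_state.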

facaiy on 1 Feb 2020

👍1

Thanks @facaiy for the review. I discussed this with Francois as well. I will try to make it simpler in the coming weeks.

AakashKumarNain on 1 Feb 2020

👍1
