Addons: Problems with CohenKappa metrics

Created on 17 Jan 2020 · 12 Comments · Source: tensorflow/addons

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • TensorFlow version and how it was installed (source or binary): 2.1.0
  • TensorFlow-Addons version and how it was installed (source or binary): 0.7.0
  • Python version: 3.6.9
  • Is GPU used? (yes/no): yes

Describe the bug

Hello,
I would like to use tfa.metrics.CohenKappa from tensorflow_addons.
I ran into a problem when trying to use it: I have a function that creates a basic convolutional network, and I would like to use this metric with it.

However, when I do that, it raises an exception:

ValueError: Number of samples in y_true and y_pred are different

So I checked the code, and it seems that the shapes of the two tensors are not the same:

Tensor("Cast:0", shape=(None, None), dtype=int64) Tensor("Cast_1:0", shape=(None, 5), dtype=int64)

How can I specify the shape of y_pred so that it matches the shape of y_true?

Code to reproduce the issue

import tensorflow as tf
import tensorflow_addons as tfa

def convolution(categories=5, shape_x=224, shape_y=224, channels=3):
    model = tf.keras.models.Sequential([
      tf.keras.layers.Conv2D(10, kernel_size=(5, 5), strides=(1, 1), activation=tf.nn.relu, use_bias=True, input_shape=(shape_x, shape_y, channels)),
      tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding='valid'),
      tf.keras.layers.Conv2D(10, kernel_size=(5, 5), strides=(1, 1), activation=tf.nn.relu, use_bias=True),
      tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding='valid'),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(128, activation=tf.nn.relu),
      tf.keras.layers.Dense(categories, activation=tf.nn.softmax)
    ])
    model.compile(optimizer='adam', loss='mse', metrics=[tfa.metrics.CohenKappa(num_classes=5)])
    return model

Other info / logs

  File "/home/rere/Project/Aptos/aptos2019-blindness-detection/model.py", line 38, in convolution
    model.compile(optimizer='adam', loss='mse', metrics=[tfa.metrics.CohenKappa(num_classes=5)])
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 439, in compile
    masks=self._prepare_output_masks())
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 2004, in _handle_metrics
    target, output, output_mask))
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 1955, in _handle_per_output_metrics
    metric_fn, y_true, y_pred, weights=weights, mask=mask)
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_utils.py", line 1155, in call_metric_function
    return metric_fn(y_true, y_pred, sample_weight=weights)
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/metrics.py", line 196, in __call__
    replica_local_fn, *args, **kwargs)
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/distribute/distributed_training_utils.py", line 1135, in call_replica_local_fn
    return fn(*args, **kwargs)
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/metrics.py", line 179, in replica_local_fn
    update_op = self.update_state(*args, **kwargs)  # pylint: disable=not-callable
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/keras/utils/metrics_utils.py", line 76, in decorated
    update_op = update_state_fn(*args, **kwargs)
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 615, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 497, in _initialize
    *args, **kwds))
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2389, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2703, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 2593, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 978, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 439, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/rere/.local/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 968, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in converted code:

/home/rere/.local/lib/python3.6/site-packages/tensorflow_addons/metrics/cohens_kappa.py:122 update_state  *
    raise ValueError(

ValueError: Number of samples in `y_true` and `y_pred` are different

Thank you in advance for any help :)


All 12 comments

Thanks for the report! I suppose we could remove the check directly. This is similar to #876 and #298. I will go through our metrics/losses over the weekend to check whether they are compatible with .compile().

Although that check is redundant (the same check is performed in tf.math.confusion_matrix), I don't think your example will work with CohenKappa. Your last layer outputs a tensor of shape [num_samples, num_classes], with num_classes=5, whereas CohenKappa expects y_pred with shape [num_samples,].
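To make the shape requirement concrete, here is a minimal NumPy sketch (not the tfa implementation; the function name cohen_kappa is hypothetical) of unweighted Cohen's kappa computed from integer labels of shape [num_samples]. A softmax output of shape [num_samples, num_classes] has to be reduced to labels with argmax first; passing the raw probabilities is rejected by the shape check.

```python
import numpy as np


def cohen_kappa(y_true, y_pred, num_classes):
    """Unweighted Cohen's kappa for integer labels of shape [num_samples] (sketch)."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    if y_true.shape != y_pred.shape or y_true.ndim != 1:
        raise ValueError("expected two label vectors of shape [num_samples]")

    # Confusion matrix: rows are true labels, columns are predicted labels.
    conf = np.zeros((num_classes, num_classes), dtype=np.float64)
    np.add.at(conf, (y_true, y_pred), 1.0)

    total = conf.sum()
    p_observed = np.trace(conf) / total
    # Agreement expected by chance, from the row and column marginals.
    p_expected = (conf.sum(axis=1) @ conf.sum(axis=0)) / total**2
    return (p_observed - p_expected) / (1.0 - p_expected)


# Softmax output of a 5-class model: shape [num_samples, num_classes].
probs = np.array([
    [0.7, 0.1, 0.1, 0.05, 0.05],
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.6, 0.1],
    [0.2, 0.2, 0.5, 0.05, 0.05],
])
labels = np.array([0, 1, 2, 2])

pred_labels = probs.argmax(axis=-1)  # shape [num_samples]
print(pred_labels)  # [0 1 3 2]
print(cohen_kappa(labels, pred_labels, num_classes=5))
```

In a Keras model, the equivalent reduction would be an argmax over the last axis of the model output before the labels reach the metric.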

cc @AakashKumarNain

@WindQAQ I favor these checks because they prevent many potential bugs where broadcasting happens silently, but I think there is no alternative to removing them; at least, I am not aware of any other way to do proper checks.

@WindQAQ @facaiy I can think of the following scenarios for Kappa calculation:

  1. The last layer of the model consists of a single unit, and regression is used to find the labels instead of classification. In this case, the shape of the last layer would be (None, 1), while the shape of the true labels would be (num_samples,).
  2. The predictions come from a classification model, and the last layer has a sigmoid or softmax activation depending on whether it is binary or multi-class classification. In this case, the last layer can have the shape (None, 1) or (None, num_classes). The true labels, on the other hand, can have different shapes.

Here are all the scenarios I could think of:

Regression
y_true: (batch_size,)
y_pred: (batch_size, 1)
Round the predictions to get the predicted labels. We could even include a parameter that lets the user provide a custom list of thresholds.
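A minimal NumPy sketch of the regression case, assuming a single-unit output of shape (batch_size, 1): flatten it, then either round and clip to the valid label range or bucket it with a user-supplied threshold list. The helper name regression_to_labels and its thresholds argument are hypothetical and not part of tfa.

```python
import numpy as np


def regression_to_labels(y_pred, num_classes, thresholds=None):
    """Map continuous predictions of shape (batch_size, 1) to integer labels.

    Hypothetical helper. Without thresholds, round to the nearest integer
    (np.rint rounds halves to the nearest even integer, e.g. 2.5 -> 2).
    With thresholds (ascending, length num_classes - 1), label k means
    thresholds[k - 1] <= y < thresholds[k].
    """
    y = np.asarray(y_pred, dtype=np.float64).reshape(-1)  # (batch_size,)
    if thresholds is None:
        labels = np.rint(y)
    else:
        if len(thresholds) != num_classes - 1:
            raise ValueError("need num_classes - 1 thresholds")
        labels = np.digitize(y, thresholds)
    # Clip out-of-range predictions into [0, num_classes - 1].
    return np.clip(labels, 0, num_classes - 1).astype(np.int64)


preds = np.array([[0.2], [1.6], [2.5], [4.9], [-0.7]])
print(regression_to_labels(preds, num_classes=5))
print(regression_to_labels(preds, num_classes=5, thresholds=[0.5, 1.5, 2.5, 3.5]))
```

Explicit thresholds avoid the half-to-even behavior of plain rounding and let the user place class boundaries where they make sense for the task.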

Classification
Case 1:
y_true: (None, num_classes) -> one-hot encoded (OHE)
y_pred: (None, num_classes)
Use argmax on both tensors to find the labels, then calculate kappa.

Case 2:
y_true: (batch_size,) -> sparse labels instead of OHE
y_pred: (batch_size, num_classes)
Use argmax on the prediction tensor, then calculate kappa.
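Both classification cases can be handled by one shape-based helper. The NumPy sketch below stands in for the tensor ops, and the name classification_to_labels is hypothetical: a one-hot y_true (Case 1) is reduced with argmax, a sparse y_true (Case 2) is only flattened, and y_pred is always reduced with argmax. Binary single-unit sigmoid outputs are out of scope for this sketch.

```python
import numpy as np


def classification_to_labels(y_true, y_pred):
    """Return (true_labels, pred_labels), both of shape (batch_size,).

    Case 1: y_true is one-hot, shape (batch_size, num_classes).
    Case 2: y_true is sparse, shape (batch_size,) or (batch_size, 1).
    In both cases y_pred holds class scores, shape (batch_size, num_classes).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_pred.ndim != 2:
        raise ValueError("y_pred must have shape (batch_size, num_classes)")

    num_classes = y_pred.shape[-1]
    if y_true.ndim == 2 and y_true.shape[-1] == num_classes and num_classes > 1:
        true_labels = y_true.argmax(axis=-1)  # Case 1: one-hot
    else:
        true_labels = y_true.reshape(-1)  # Case 2: sparse
    pred_labels = y_pred.argmax(axis=-1)

    if true_labels.shape[0] != pred_labels.shape[0]:
        raise ValueError("y_true and y_pred have different batch sizes")
    return true_labels.astype(np.int64), pred_labels.astype(np.int64)


y_pred = np.array([[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]])
onehot = np.array([[0, 1, 0], [0, 0, 1]])
sparse = np.array([1, 2])
print(classification_to_labels(onehot, y_pred))
print(classification_to_labels(sparse, y_pred))
```

Both calls yield the same pair of label vectors, which can then go straight into a kappa computation.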

All we need to do is include these checks, and we are good to go IMO. Let me know what you think.
If it looks okay, I can push a fix.

@WindQAQ Tzu-Wei, do you think is_compatible_with would work in this case?
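TensorShape.is_compatible_with treats None as an unknown dimension. The pure-Python function below is only a sketch of that rule (it does not call TensorFlow, and the name shapes_compatible is hypothetical): a shape of unknown rank is compatible with anything; otherwise the ranks must match and each pair of dimensions must be equal or include a None. Applied to the shapes from the report, (None, None) and (None, 5) are compatible, while a sparse (None,) label shape and a (None, 5) prediction are not.

```python
def shapes_compatible(a, b):
    """Sketch of the TensorShape.is_compatible_with rule.

    A shape is a tuple of ints/None, or None for a shape of unknown rank.
    """
    if a is None or b is None:
        return True  # unknown rank matches any shape
    if len(a) != len(b):
        return False  # known ranks must agree
    return all(x is None or y is None or x == y for x, y in zip(a, b))


print(shapes_compatible((None, None), (None, 5)))  # True: shapes from the report
print(shapes_compatible((None,), (None, 5)))  # False: rank 1 vs rank 2
print(shapes_compatible((None, 4), (None, 5)))  # False: 4 != 5
```

A compatibility check like this would let the reporter's model compile, but it cannot tell a one-hot y_true from a sparse one whose shape is still unknown, so the per-case handling discussed above is still needed.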

@AakashKumarNain Hi, Aakash, I'm not sure I understand your question thoroughly. For tf.keras, I believe it uses different metrics to handle different label formats (e.g. AUC, BinaryAccuracy, CategoricalAccuracy, etc.).

I will ping you with the details on Gitter.

> @WindQAQ Tzu-Wei, do you think is_compatible_with would work in this case?

Thank you, Facai, this should work!

> @AakashKumarNain Hi, Aakash, I'm not sure I understand your question thoroughly. For tf.keras, I believe it uses different metrics to handle different label formats (e.g. AUC, BinaryAccuracy, CategoricalAccuracy, etc.).

This seems to be a good approach. If necessary, I vote for this solution.


Let's discuss this in detail. I think we aren't on the same page yet... lol

@WindQAQ @facaiy I have created a separate file for a detailed discussion of this. You can find it here. Let me know what you think.

@AakashKumarNain Sorry for the delay, Aakash. Thanks for your detailed RFC, which looks really great. As I said before, would it be possible to create a metric for each case you mentioned, for example CohenKappa, BinaryCohenKappa, CategoricalCohenKappa, etc. (similar to Accuracy, BinaryAccuracy, CategoricalAccuracy, ...)? What do you think? cc @WindQAQ @seanpmorgan
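The one-metric-per-label-format idea can be sketched in NumPy as a stateful class that accumulates a confusion matrix across batches, loosely mirroring the update_state()/result() pattern of Keras metrics, with subclasses that differ only in how they turn inputs into integer labels. The class names follow the proposal in this thread but are hypothetical; this is not the tfa API, and the 0.5 default threshold in the binary variant is an assumption.

```python
import numpy as np


class CohenKappa:
    """Hypothetical streaming, unweighted Cohen's kappa over integer labels."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.reset_state()

    def reset_state(self):
        self.conf = np.zeros((self.num_classes, self.num_classes))

    def _to_labels(self, y_true, y_pred):
        # Base class: both inputs are already integer labels.
        return np.asarray(y_true).reshape(-1), np.asarray(y_pred).reshape(-1)

    def update_state(self, y_true, y_pred):
        t, p = self._to_labels(y_true, y_pred)
        if t.shape != p.shape:
            raise ValueError("y_true and y_pred have different batch sizes")
        # Accumulate counts so the result covers every batch seen so far.
        np.add.at(self.conf, (t.astype(np.int64), p.astype(np.int64)), 1.0)

    def result(self):
        total = self.conf.sum()
        p_o = np.trace(self.conf) / total
        p_e = (self.conf.sum(axis=1) @ self.conf.sum(axis=0)) / total**2
        return (p_o - p_e) / (1.0 - p_e)


class BinaryCohenKappa(CohenKappa):
    """Hypothetical: y_pred holds sigmoid outputs of shape (batch_size, 1)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        super().__init__(num_classes=2)

    def _to_labels(self, y_true, y_pred):
        p = (np.asarray(y_pred).reshape(-1) >= self.threshold).astype(np.int64)
        return np.asarray(y_true).reshape(-1), p


class CategoricalCohenKappa(CohenKappa):
    """Hypothetical: one-hot y_true and class-score y_pred, (batch_size, num_classes)."""

    def _to_labels(self, y_true, y_pred):
        return np.argmax(y_true, axis=-1), np.argmax(y_pred, axis=-1)


m = CategoricalCohenKappa(num_classes=3)
m.update_state([[1, 0, 0], [0, 1, 0]], [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]])
m.update_state([[0, 0, 1]], [[0.3, 0.3, 0.4]])
print(m.result())  # perfect agreement across both batches
```

Keeping the kappa computation in the base class and only overriding label conversion is the same split that Accuracy, BinaryAccuracy and CategoricalAccuracy suggest.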

Thanks @facaiy for the review. I discussed this with Francois as well. I will try to simplify it in the coming weeks.


Related issues

SoufianeDataFan · 4 Comments

maziyarpanahi · 3 Comments

iskorini · 4 Comments

seanpmorgan · 4 Comments

seanpmorgan · 3 Comments