Addons: Add new metrics for tf.keras

Created on 2 Apr 2019 · 35 Comments · Source: tensorflow/addons

Can we please add metrics as well to the tensorflow_addons directory? There are many metrics that are used on a daily basis by many Data Scientists/ML engineers but they are still not available in tf.keras. Some of them are:

  • Cohen's Kappa
  • IOU for bounding boxes
  • f1/f-beta score
help wanted metrics
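For reference, IoU for two axis-aligned bounding boxes can be sketched in plain Python as below. The (x_min, y_min, x_max, y_max) box layout and the name box_iou are assumptions made for illustration; this is not an existing tf.keras or tensorflow_addons API.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max) tuples. This layout is an
    assumption of the sketch; other conventions exist.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Two zero-area boxes have no meaningful overlap; return 0 by convention.
    return intersection / union if union > 0 else 0.0


# Overlap area is 1 and union area is 7 for these two boxes.
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```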

All 35 comments

Some of the other metrics/loss functions that could be added are for example pinball loss and SMAPE loss, both from the field of time-series.
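A minimal plain-Python sketch of the two losses mentioned above, to pin down the arithmetic. The function names are hypothetical, and SMAPE has several variants in the literature; the form used here, 2|y - y_hat| / (|y| + |y_hat|) averaged and expressed in percent, is only one of them.

```python
def pinball_loss(y_true, y_pred, tau=0.5):
    """Mean pinball (quantile) loss for a quantile level tau in (0, 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        diff = y - y_hat
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)


def smape(y_true, y_pred):
    """Symmetric MAPE in percent (one common variant, see the note above)."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        denom = abs(y) + abs(y_hat)
        # Convention of this sketch: a term with y == y_hat == 0 adds 0.
        if denom > 0:
            total += 2.0 * abs(y - y_hat) / denom
    return 100.0 * total / len(y_true)


print(pinball_loss([10.0, 12.0], [8.0, 13.0], tau=0.9))
print(smape([100.0, 200.0], [110.0, 180.0]))
```

With tau = 0.5 the pinball loss reduces to half the mean absolute error, which is a quick sanity check for any implementation.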

Per our discussion in the monthly meeting this seems like it would be a good subpackage. To start we could begin by defining the API structure for the subpackage and an example metric. f1 score should be pretty easy to implement, seeing as precision and recall are already built in to tf.keras.metrics
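To illustrate why F1 follows directly from precision and recall, here is a small plain-Python sketch of the F-beta arithmetic on raw counts. The function name and signature are hypothetical; a tf.keras version would presumably keep the counts as metric state rather than take them as arguments.

```python
def fbeta_score(true_positives, false_positives, false_negatives, beta=1.0):
    """F-beta from raw counts; beta=1 gives the usual F1 score."""
    predicted_positives = true_positives + false_positives
    actual_positives = true_positives + false_negatives

    # Treat an undefined precision or recall (0 / 0) as 0.
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    recall = true_positives / actual_positives if actual_positives else 0.0

    beta_sq = beta * beta
    denom = beta_sq * precision + recall
    if denom == 0:
        return 0.0
    return (1.0 + beta_sq) * precision * recall / denom


print(fbeta_score(8, 2, 2))            # precision = recall = 0.8
print(fbeta_score(6, 2, 4, beta=2.0))  # beta > 1 weights recall more heavily
```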

@AakashKumarNain Hi, Aakash, would you like to make a contribution? By the way, thank you for mentioning Cohen's Kappa, which was implemented by me, haha :-)

@facaiy Sure. Actually, before I was aware of addons, I submitted a PR to TF repo. Check this https://github.com/tensorflow/tensorflow/pull/25051#event-2258495550

@facaiy We need to import class Metric(Layer) in keras_utils.py so that we can directly import it in the metrics subpackage without defining the class explicitly. It will be similar to how LossFunctionWrapper class has been imported. What do you say?

I would say that it sounds reasonable. cc @seanpmorgan Sean, what do you think?

Yes, but it should also be included in our API issue sent to core. I'll be sure to include it and send the issue this week (was just waiting until we had a complete list... but that may never happen)

@seanpmorgan @facaiy any updates on this?

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/metrics.py#L65 I believe it's public now? Sorry, I've built the list of exports we need, but I have to keep updating it as new APIs are released, and I want to make sure there are no alternatives.

Haha I haven't checked it yet @seanpmorgan but I will do it by tomorrow. Btw 1.14 cut was scheduled on 15th. So whatever is in that release, it should be public, right?

Hi, Aakash. tf_addons depends on the tf 2.0 nightly version rather than a 1.x version, so I think it is safe to assume that the API is public :-)

@seanpmorgan I forgot about tf-2.0-alpha. Do we need to add a proxy that catches the import error and falls back to the private API?
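A proxy of that kind could look roughly like the sketch below: try a list of import locations in order and return the first one that resolves. The helper name import_first_available is hypothetical, and the demo uses standard-library modules so that it runs without TensorFlow installed.

```python
import importlib


def import_first_available(candidates):
    """Return the first attribute that resolves from (module, attribute) pairs."""
    errors = []
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed and try the next one.
            errors.append(f"{module_name}.{attr_name}: {exc}")
    raise ImportError(
        "none of the candidates could be imported:\n" + "\n".join(errors)
    )


# Demo with the standard library: the first location does not exist,
# so the helper falls back to the second one.
OrderedDict = import_first_available([
    ("no_such_module_for_demo", "OrderedDict"),
    ("collections", "OrderedDict"),
])
print(OrderedDict)

# For the case discussed in this thread, the candidates would presumably be
# ("tensorflow.keras.metrics", "Metric") first and
# ("tensorflow.python.keras.metrics", "Metric") second. Both paths are taken
# from this thread and have not been checked against tf-2.0-alpha.
```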

Hey @facaiy, thanks for the info. I have been quite busy since last week working on a competition whose deadline is approaching fast. I will look into it on the weekend. Sorry for the delay.

@facaiy Yes, a fallback proxy would be the fix, but since we're probably not too far from some sort of tf2 release candidate, I can just add it to the list of API rollbacks we'll be doing on the release branches while we wait.

@seanpmorgan @facaiy I still don't see the Metric import in the utils file that I was talking about. Anyway, I made some changes to my metric implementation and I am getting a bunch of errors. Specifically, variable assignments aren't allowed, and I have to find a way to add a variable in my constructor using the add_weight method. The problem is that the variable is used in the update_state method of the class, but it shouldn't be updated on each iteration/epoch. It is a constant and should (ideally) be initialized only once, in the constructor.

Could you create a PR? Then we can take a look and check whether something is wrong there.

Sure @facaiy. I will do that. Thank you

@facaiy before I create a PR, can you please take a look at this notebook? https://colab.research.google.com/drive/1Dd97zdfJhGIrMTpzSKdR6rxnbDGuSBV0

Some of the ops aren't executed in eager mode, which is strange. If I don't inherit from Metric, though, the functions work well.

@facaiy did you get a chance to look at the problem?

@AakashKumarNain Sorry for the delay, Aakash, I've been busy and forgot to reply. I'll take a look this weekend.

By the way, would you mind creating an empty metrics subpackage directory first? I just noticed that two pending issues depend on this work :-)

@shashvatshahi1998 Welcome, Shashvat. Perhaps you'd like to discuss the F1 score implementation with @AakashKumarNain: https://github.com/tensorflow/addons/issues/232

Sorry for the delay @facaiy, I was busy with a competition. I have created a PR for the metrics module now.

@ychervonyi @shashvatshahi1998 Hi, the metrics module has been created in the master branch in #247 (thanks @AakashKumarNain), so I think we can start the related work from here. Please contact us if you need any help :-)

@AakashKumarNain Aakash, after reading the tf-2.0 code I think you're right: the update_state of a custom metric is automatically decorated with tf.function, which is why that code runs in graph mode. See https://github.com/tensorflow/tensorflow/commit/39a561786aad35c6c203ba698af501652d67da77 :

https://github.com/tensorflow/tensorflow/blob/2c2d508aa2947ede05cfa195139b176d6cdc9056/tensorflow/python/keras/metrics.py#L149-L152

Please correct me if I'm wrong, @pavithrasv Pavithra :-)

But we shouldn't worry about it too much, because in theory tf.function takes care of this in both graph and eager mode. I tried to fix your Python example, and it runs (although the result seems wrong). Note that we use nb_ratings = tf.shape(conf_mtx)[0] to retrieve its dynamic shape.

# -*- coding: utf-8 -*-
"""KappaforTF.ipynb

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/drive/1Dd97zdfJhGIrMTpzSKdR6rxnbDGuSBV0
"""

import tensorflow as tf
import tensorflow.keras.backend as K

print(tf.__version__)

# These are the imports which we need in the utils file
from tensorflow.math import confusion_matrix
from tensorflow.keras.metrics import Metric

class CohensKappa(Metric):
    """Computes Kappa score between two raters.
    The score lies in the range [-1,1], where a score of -1 represents
    complete disagreement between two raters whereas a score of 1 represents
    complete agreement between the two raters. A value of 0 means agreement by chance.

    Args:
    y1 : array, shape = [n_samples]
        Labels assigned by the first annotator.
    y2 : array, shape = [n_samples]
        Labels assigned by the second annotator. The kappa statistic is
        symmetric, so swapping ``y1`` and ``y2`` doesn't change the value.
    labels : array, shape = [n_classes], optional
        List of labels to index the matrix. This may be used to select a
        subset of labels. If None, all labels that appear at least once in
        ``y1`` or ``y2`` are used.

    Returns
    kappa : float
        The kappa statistic, which is a number between -1 and 1. The maximum
        value means complete agreement; zero or lower means chance agreement.

    """
    def __init__(self, name='cohens_kappa', dtype=tf.float32,):
        super(CohensKappa, self).__init__(name=name, dtype=dtype)
        self.kappa_score = self.add_weight('kappa_score', initializer=None)

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(y_true, dtype=tf.int32)
        y_pred = tf.cast(y_pred, dtype=tf.int32)

        # check the tensors
        print("Actuals: ", y_true)
        print("Predictions: ", y_pred)


        # Get the confusion matrix
        # This is where this function throws an error
        conf_mtx = confusion_matrix(labels=y_true, predictions=y_pred)

        # If you print the confusion matrix, you will see that
        # this isn't executed in eager mode. 
        print("Confusion Matrix:")
        print(conf_mtx)

        conf_mtx = K.cast(conf_mtx, dtype=tf.int32)
        nb_ratings = tf.shape(conf_mtx)[0]

        # 2. Create a weight matrix
        if sample_weight is None:
            weight_mtx = tf.ones([nb_ratings, nb_ratings], dtype=tf.int32)
            diagonal = tf.zeros([nb_ratings], dtype=tf.int32)
            weight_mtx = tf.linalg.set_diag(weight_mtx, diagonal=diagonal)
            weight_mtx = tf.cast(weight_mtx, dtype=tf.float32)

        elif sample_weight=="linear": 
            weight_mtx = tf.ones([nb_ratings, nb_ratings], dtype=tf.int32)
            weight_mtx += tf.range(nb_ratings, dtype=tf.int32)
            weight_mtx = tf.cast(weight_mtx, dtype=tf.float32)
            weight_mtx = tf.abs(weight_mtx - K.transpose(weight_mtx))

        elif sample_weight=="quadratic":
            weight_mtx = tf.ones([nb_ratings, nb_ratings], dtype=tf.int32)
            weight_mtx += tf.range(nb_ratings, dtype=tf.int32)
            weight_mtx = tf.cast(weight_mtx, dtype=tf.float32)
            weight_mtx = K.pow((weight_mtx - K.transpose(weight_mtx)), 2)

        else:
            raise ValueError("Unknown kappa weighting type.")

        actual_ratings_hist = K.sum(conf_mtx, axis=1)
        predicted_ratings_hist = K.sum(conf_mtx, axis=0)

        out_prod = predicted_ratings_hist[..., None] * actual_ratings_hist[None, ...]

        conf_mtx = conf_mtx / K.sum(conf_mtx)
        out_prod = out_prod / K.sum(out_prod)

        numerator = K.sum(tf.cast(conf_mtx, tf.float32) * weight_mtx)
        denominator = K.sum(tf.cast(out_prod, tf.float32) * weight_mtx)          
        kp = 1-(numerator/denominator)
        return self.kappa_score.assign(kp)

    def result(self):
        return self.kappa_score

actuals = tf.convert_to_tensor([4, 4, 3, 4, 2, 4, 1, 1, 2, 0], dtype=tf.int32)
preds = tf.convert_to_tensor([4, 4, 3, 4, 4, 2, 1, 1, 2, 0], dtype=tf.int32)

print('confusion_matrix')
print(confusion_matrix(actuals, preds))

kp = CohensKappa()
print('CohensKappa')
print(kp.variables)
print('init')
print(kp.result())
kp.update_state(actuals, preds, sample_weight=None)
print('result')
print(kp.result())
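
Since the result of the script above "seems wrong", a plain-Python reference can help to cross-check values without TensorFlow or sklearn. The sketch below is not the notebook's code; the function name is hypothetical, and it assumes labels are non-negative integers and uses the same three weighting types as the class above.

```python
def cohen_kappa(y1, y2, weights=None):
    """Cohen's kappa between two sequences of non-negative integer labels.

    weights may be None (unweighted), "linear" or "quadratic".
    """
    if len(y1) != len(y2) or len(y1) == 0:
        raise ValueError("y1 and y2 must be non-empty and of equal length")
    n = len(y1)
    n_classes = max(max(y1), max(y2)) + 1

    # Observed confusion counts: rows follow y1, columns follow y2.
    conf = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(y1, y2):
        conf[a][b] += 1
    row_hist = [sum(row) for row in conf]
    col_hist = [sum(col) for col in zip(*conf)]

    def weight(i, j):
        # Disagreement weight between class i and class j.
        if weights is None:
            return 0.0 if i == j else 1.0
        if weights == "linear":
            return float(abs(i - j))
        if weights == "quadratic":
            return float((i - j) ** 2)
        raise ValueError("Unknown kappa weighting type.")

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            w = weight(i, j)
            observed += w * conf[i][j] / n
            expected += w * row_hist[i] * col_hist[j] / (n * n)

    # Kappa is undefined when only one class ever appears.
    if expected == 0:
        return float("nan")
    return 1.0 - observed / expected


actuals = [4, 4, 3, 4, 2, 4, 1, 1, 2, 0]
preds = [4, 4, 3, 4, 4, 2, 1, 1, 2, 0]
print(cohen_kappa(actuals, preds))  # about 0.7297 (unweighted)
```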

@facaiy Thanks for this information. I am on vacation till next Thursday. Once I am back, I will look into it and will correct it.

No worries, you're welcome :-)

@facaiy I fixed it. I have also included a simple function to test the implementation against the sklearn implementation. Please take another look and see if it is fine.

https://colab.research.google.com/drive/1Dd97zdfJhGIrMTpzSKdR6rxnbDGuSBV0

@AakashKumarNain Good news! Could you create a PR, Aakash?

@facaiy I am almost done. I am stuck at the test case though. I am getting some errors. Can you please look into it again?
https://colab.research.google.com/drive/1Dd97zdfJhGIrMTpzSKdR6rxnbDGuSBV0

Towards the end you will find the Test Case section. Thank you

Hi, @AakashKumarNain I think you are looking for

self.assertAllClose(score1, 0.68932) # could feed tensor/np-array/float and tolerance defaults to 1e-5

or

self.assertAlmostEqual(self.evaluate(score1), 0.68932, 5) # value only
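
As a side note on the third argument of assertAlmostEqual: it is the number of decimal places, and the check passes when the difference rounded to that many places is zero. The sketch below shows this with the standard-library unittest only; the score value is a hypothetical stand-in for the evaluated metric.

```python
import unittest


class ToleranceDemo(unittest.TestCase):
    def test_five_places(self):
        score = 0.6893204  # hypothetical computed value
        # Passes because round(abs(score - 0.68932), 5) == 0.
        self.assertAlmostEqual(score, 0.68932, 5)

    def test_seven_places_is_too_strict(self):
        # The difference is about 4e-7, which does not round to 0 at 7 places.
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(0.6893204, 0.68932, 7)


def run_demo():
    """Run the two demo tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToleranceDemo)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_demo()
print(result.wasSuccessful())
```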

Thanks for the contribution btw 😄

Thanks @WindQAQ for the info. Actually, I tried both of them, and both throw errors as well.

1) self.assertAllClose(score1, 0.68932)

E0601 04:35:49.201909 140117080422272 test_util.py:1522] 2 root error(s) found.
  (0) Failed precondition: Error while reading resource variable kappa_score_3 from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/kappa_score_3)
     [[node Identity_3/ReadVariableOp (defined at <ipython-input-27-433e0c86f80d>:17) ]]
     [[Identity_3/_1]]
  (1) Failed precondition: Error while reading resource variable kappa_score_3 from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/kappa_score_3)
     [[node Identity_3/ReadVariableOp (defined at <ipython-input-27-433e0c86f80d>:17) ]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node Identity_3/ReadVariableOp:
 kappa_score_3 (defined at <ipython-input-10-2938b575c3c5>:28)

Input Source operations connected to node Identity_3/ReadVariableOp:
 kappa_score_3 (defined at <ipython-input-10-2938b575c3c5>:28)

2) self.assertAlmostEqual(self.evaluate(score1), 0.68932, 5)

E0601 04:37:52.891778 140117080422272 test_util.py:1522] 2 root error(s) found.
  (0) Failed precondition: Error while reading resource variable kappa_score_4 from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/kappa_score_4)
     [[node Identity_4/ReadVariableOp (defined at <ipython-input-30-c2339c53b877>:17) ]]
  (1) Failed precondition: Error while reading resource variable kappa_score_4 from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/kappa_score_4)
     [[node Identity_4/ReadVariableOp (defined at <ipython-input-30-c2339c53b877>:17) ]]
     [[Identity_4/_1]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node Identity_4/ReadVariableOp:
 kappa_score_4 (defined at <ipython-input-10-2938b575c3c5>:28)

Input Source operations connected to node Identity_4/ReadVariableOp:
 kappa_score_4 (defined at <ipython-input-10-2938b575c3c5>:28)

Hi, after a rough test and digging into keras implementation, I found update_state is forced to execute in graph mode.

https://github.com/tensorflow/tensorflow/blob/a6e5f879e5e36cb33efaed25597cc254ed71bae5/tensorflow/python/keras/metrics.py#L212

And thus, it would be correct to return the update_op in update_state, that is, self.kappa_score.assign(kp).

class CohensKappa(...):
    def update_state(self, ...):
        ...
        return self.kappa_score.assign(kp)

And do tests like the following:

@test_util.run_all_in_graph_and_eager_modes
class CohensKappaTest(tf.test.TestCase):
    def test_config(self):
      kp_obj = CohensKappa(name='cohens_kappa')
      self.assertEqual(kp_obj.name, 'cohens_kappa')

    def test_kappa(self):
      actuals = np.array([4, 4, 3, 4, 2, 4, 1, 1], dtype=np.int32)
      preds = np.array([4, 4, 3, 4, 4, 2, 1, 1], dtype=np.int32)
      sample_weights = 'quadratic'
      kp_obj = CohensKappa()

      actuals = tf.convert_to_tensor(actuals, dtype=tf.int32)
      preds = tf.convert_to_tensor(preds, dtype=tf.int32)

      # self.evaluate(tf.compat.v1.initializers.variables(kp_obj.variables))
      update_op = kp_obj.update_state(actuals, preds, sample_weight=sample_weights)
      self.evaluate(update_op)
      #score2 = cohen_kappa_score(actuals, preds, weights=sample_weights)

      #print(f"This implementation : {score2:>20}")
      self.assertAlmostEqual(self.evaluate(kp_obj.result()), 0.68932, 5)

I also found this function update_confusion_matrix_variables. Not sure if it could help you aggregate the stats or something.

Edit: revised notebook here: https://colab.research.google.com/drive/1NUnHu9IPZqgJAeXxXqcm3K1QXJQlmHQ_
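
On aggregating the stats: as posted, update_state overwrites kappa_score on every call, so result() reflects only the most recent batch. One possible alternative is to accumulate the confusion matrix as state and compute kappa on demand. The plain-Python sketch below illustrates that pattern without TensorFlow; the class name, the reset method and the fixed num_classes argument are assumptions for illustration, not the tensorflow_addons implementation.

```python
class StreamingKappa:
    """Accumulates confusion counts across batches; kappa is computed on demand."""

    def __init__(self, num_classes, weights=None):
        if weights not in (None, "linear", "quadratic"):
            raise ValueError("Unknown kappa weighting type.")
        # num_classes is a constant: it is set once here and never updated.
        self.num_classes = num_classes
        self.weights = weights
        self.reset()

    def reset(self):
        k = self.num_classes
        self.conf = [[0] * k for _ in range(k)]

    def update_state(self, y_true, y_pred):
        # Only the running counts change; no score is stored per batch.
        for a, p in zip(y_true, y_pred):
            self.conf[a][p] += 1

    def _weight(self, i, j):
        if self.weights is None:
            return float(i != j)
        if self.weights == "linear":
            return float(abs(i - j))
        return float((i - j) ** 2)

    def result(self):
        n = sum(sum(row) for row in self.conf)
        rows = [sum(row) for row in self.conf]
        cols = [sum(col) for col in zip(*self.conf)]
        observed = 0.0  # weighted disagreement counts
        expected = 0.0  # weighted products of the marginal counts
        for i in range(self.num_classes):
            for j in range(self.num_classes):
                w = self._weight(i, j)
                observed += w * self.conf[i][j]
                expected += w * rows[i] * cols[j]
        if n == 0 or expected == 0:
            return float("nan")  # undefined without data or with one class
        # Same as 1 - (observed / n) / (expected / n**2).
        return 1.0 - observed * n / expected


metric = StreamingKappa(num_classes=5, weights="quadratic")
metric.update_state([4, 4, 3, 4], [4, 4, 3, 4])  # first batch
metric.update_state([2, 4, 1, 1], [4, 2, 1, 1])  # second batch
print(metric.result())  # about 0.68932, the value used in the test above
```

Feeding the eight samples in two batches gives the same value as a single pass, which is the property a per-batch overwrite does not have.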

Perfect. Thanks @WindQAQ

Good to close this? It seems we have a few separate metrics issues with some overlap.

Yup, we can track metrics under #265.

Yeap. Good to close.
