Addons: Add hamming loss for both multiclass and multilabel

Created on 17 Jun 2019 · 33 Comments · Source: tensorflow/addons

System information

TensorFlow version (you are using): 1.13
TensorFlow Addons version:
Is it in the tf.contrib (if so, where):
Are you willing to contribute it (yes/no): Yes
Are you willing to maintain it going forward? (yes/no): Yes

Describe the feature and the current behavior/state.
Hamming score is of great interest in multilabel classification.

Will this change the current API? How?
Yes, it will add a new feature.

Who will benefit from this feature?
Anyone working with multilabel classification.

Any Other info.
Initial colab notebook: https://colab.research.google.com/drive/1Msuv5xUu7lu5wDH1ei-VOPB-UnBolDfB

Feature Request metrics

SSaishruthi

👍1

Most helpful comment

from tensorflow_addons.metrics.utils import MeanMetricWrapper should work. If you're working in a Colab notebook, you may have to run !pip install tfa-nightly if it was added after the 0.4 release.

seanpmorgan on 3 Jul 2019

👍2

All 33 comments

Clarification:
Do we need to hold state information for this?

SSaishruthi on 17 Jun 2019

@SSaishruthi Yes, we'll need a running sum of the Hamming loss and a count that is incremented every time update_state is called. The result can then return the average Hamming loss.

Squadrick on 18 Jun 2019

@Squadrick
Perfect, thanks for the clarification. Will submit a PR soon.

SSaishruthi on 18 Jun 2019

@SSaishruthi Use this: MeanMetricWrapper. Keras already has something that wraps a stateless function and does the aggregate.

Squadrick on 18 Jun 2019

Define a function hamming_loss like this. Then make a HammingLoss metric subclassing MeanMetricWrapper like here.

Squadrick on 18 Jun 2019
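
The wrapping pattern described above can be sketched without TensorFlow. This is only an illustration of the idea, not the Keras implementation: MeanWrapperSketch and per_sample_error are hypothetical names, and plain Python lists stand in for tensors.

```python
class MeanWrapperSketch:
    """Hypothetical stand-in for the idea behind MeanMetricWrapper."""

    def __init__(self, fn, **fn_kwargs):
        self._fn = fn                # stateless function: one value per sample
        self._fn_kwargs = fn_kwargs  # extra arguments forwarded on every call
        self.total = 0.0             # running sum of per-sample values
        self.count = 0               # number of samples seen so far

    def update_state(self, y_true, y_pred):
        values = self._fn(y_true, y_pred, **self._fn_kwargs)
        self.total += sum(values)
        self.count += len(values)

    def result(self):
        # Average over every sample seen so far (0.0 before any update).
        return self.total / self.count if self.count else 0.0


def per_sample_error(y_true, y_pred, scale=1.0):
    # Toy stateless function: scaled 0/1 error for each sample.
    return [scale * float(t != p) for t, p in zip(y_true, y_pred)]


metric = MeanWrapperSketch(per_sample_error, scale=1.0)
metric.update_state([0, 1, 1], [0, 0, 1])  # one error in three samples
metric.update_state([1], [0])              # one error in one sample
print(metric.result())                     # 2 errors / 4 samples = 0.5
```

The stateless function only has to return per-sample values; the wrapper owns the running total and count.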

@Squadrick
Thanks again for the links. Will keep you posted about the updates.

SSaishruthi on 18 Jun 2019

@Squadrick
I tried wrapping hamming metrics. Below are the observations.

Maintaining epoch count and dividing it by current results did not provide the desired result.

Reference: https://colab.research.google.com/drive/1Msuv5xUu7lu5wDH1ei-VOPB-UnBolDfB#scrollTo=DvlbepsGFZjj&line=4&uniqifier=1

Instead, I maintained a count variable holding the number of data points seen in a particular epoch. That worked fine.
Reference: https://colab.research.google.com/drive/1Msuv5xUu7lu5wDH1ei-VOPB-UnBolDfB#scrollTo=UKTf8PxceWDH

I was not able to import MeanMetricWrapper, so I used Mean instead.

If this is fine, I will create a PR with all supporting scripts.

SSaishruthi on 19 Jun 2019
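
One possible reading of the observation above (an assumption, since the notebook contents are not reproduced here) is that dividing by the number of updates and dividing by the number of data points give different answers whenever batch sizes differ. The numbers below are made up purely for illustration.

```python
# Per-sample loss values for two batches of different size (illustrative numbers).
batches = [
    [1.0, 1.0, 0.0, 0.0],  # batch of 4, mean 0.5
    [1.0],                 # batch of 1, mean 1.0
]

# Option A: average the per-batch means, i.e. divide by the number of updates.
batch_means = [sum(b) / len(b) for b in batches]
mean_of_means = sum(batch_means) / len(batch_means)

# Option B: keep a running total and a running count of data points.
total = sum(sum(b) for b in batches)
count = sum(len(b) for b in batches)
sample_weighted = total / count

print(mean_of_means)    # 0.75
print(sample_weighted)  # 0.6
```

Option B matches the value computed over all five data points at once, which is why counting data points rather than updates is the safer state to hold.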

@seanpmorgan @facaiy @WindQAQ

We can't import MeanMetricWrapper as tf.keras.metrics.MeanMetricWrapper, but it can be imported as tf.python.keras.metrics.MeanMetricWrapper. Is the latter fine, or should I open a PR against TF master to tf_export MeanMetricWrapper (here)?

Squadrick on 19 Jun 2019

Exposing MeanMetricWrapper will make the implementation much cleaner.

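import tensorflow as tf


def hamming_loss(y_true, y_pred, mode='multiclass'):
    # Returns one loss value per sample; aggregation is left to the metric wrapper.
    if mode not in ['multiclass', 'multilabel']:
        raise ValueError("mode must be one of: ['multiclass', 'multilabel']")

    if mode == 'multiclass':
        # Assumes one-hot rows: the product is non-zero only on a correct prediction.
        nonzero = tf.cast(tf.math.count_nonzero(y_true * y_pred, axis=-1), tf.float32)
        return 1.0 - nonzero

    else:
        # Fraction of label positions where y_true and y_pred differ.
        nonzero = tf.cast(tf.math.count_nonzero(y_true - y_pred, axis=-1),
            tf.float32)
        return nonzero / y_true.get_shape()[-1]


# tf.python is a private namespace; it is used here only because
# MeanMetricWrapper is not exported under tf.keras.metrics.
class HammingLoss(tf.python.keras.metrics.MeanMetricWrapper):
    def __init__(self, name='hamming_loss', dtype=None, mode='multiclass'):
        super(HammingLoss, self).__init__(
                hamming_loss, name, dtype=dtype, mode=mode)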

Squadrick on 19 Jun 2019

@Squadrick so tf.python is not a public API and we should avoid it. You can bring this up in this issue:
https://github.com/tensorflow/tensorflow/issues/28601 to see what tf-core devs recommend. It may be exposing the API as public or just copying it statically into Addons.

seanpmorgan on 19 Jun 2019

@seanpmorgan @Squadrick
Should I proceed with Mean till we get a response on this?

SSaishruthi on 19 Jun 2019

@seanpmorgan @Squadrick

Are we going to have a version of MeanMetricWrapper in Addons?

SSaishruthi on 25 Jun 2019

I think so, see https://github.com/tensorflow/tensorflow/issues/28601#issuecomment-505098700

facaiy on 25 Jun 2019

I've copied the implementation from core TF to TFA: #316. Once that's merged, @SSaishruthi can proceed with the implementation.

Squadrick on 25 Jun 2019

👍1

Looks like the PR got merged. I will start working on that.

SSaishruthi on 26 Jun 2019

@Squadrick @facaiy
Getting this error when trying to import tensorflow addons in colab.

Any comment on how to get rid of this?
NotFoundError: libtensorflow_framework.so.2: cannot open shared object file: No such file or directory

SSaishruthi on 3 Jul 2019

@SSaishruthi Could you link the colab notebook? Be sure to run !pip install tensorflow==2.0.0-beta1 first. This error likely means that you're running tf2-alpha or tf1.x

seanpmorgan on 3 Jul 2019

@seanpmorgan

Colab link: https://colab.research.google.com/drive/1Msuv5xUu7lu5wDH1ei-VOPB-UnBolDfB

Using tf2-beta1

SSaishruthi on 3 Jul 2019

Could you try resetting the runtime and running the cells in order again? I just created a copy and it's working:
https://colab.research.google.com/drive/1wKDdQCirA4LEHdx4bgkQHP1YZZSAT-5I

seanpmorgan on 3 Jul 2019

@seanpmorgan Thanks

I was only resetting the current runtime. I just tried again after resetting all the runtimes and it worked.

SSaishruthi on 3 Jul 2019

👍1

I am trying to import MeanMetricWrapper but am not able to. Only CohenKappa is available.

Please see the same notebook for reference. I am not sure whether I need to build from source.

Is there anything I should do on my side?

@seanpmorgan

SSaishruthi on 3 Jul 2019

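from tensorflow_addons.metrics.utils import MeanMetricWrapper should work. If you're working in a Colab notebook, you may have to run !pip install tfa-nightly if it was added after the 0.4 release.

seanpmorgan on 3 Jul 2019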

👍2

@Squadrick
I am trying to wrap the Hamming loss using MeanMetricWrapper as suggested, and I have a few questions.

Taking the mean over the total value did not yield the correct result.

Using the mean method: https://colab.research.google.com/drive/1Msuv5xUu7lu5wDH1ei-VOPB-UnBolDfB#scrollTo=UKTf8PxceWDH

As you can see in the notebook, the result does not match.

Whereas, if I hold the number of records in every epoch as state, I get the expected results.

Holding state: https://colab.research.google.com/drive/1Msuv5xUu7lu5wDH1ei-VOPB-UnBolDfB#scrollTo=bGBO5unx33xS

I am not sure if I am missing anything here. Can I use the regular method of using Metric?
Please suggest.

SSaishruthi on 11 Jul 2019

Also, for a Hamming distance metric, I think it is fine to have a plain function like the one below, just as for Euclidean distance.

If this is fine, I can create a separate PR for it. It can be used as an alternative distance metric.

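def hamming_distance(actuals, predictions):
  # Fraction of positions at which the two tensors differ.
  result = tf.not_equal(actuals, predictions)
  not_eq = tf.reduce_sum(tf.cast(result, tf.float32))
  # tf.size returns the element count as a tensor, so no Python len() is needed.
  total = tf.cast(tf.size(result), tf.float32)
  ham_distance = tf.math.divide_no_nan(not_eq, total)
  return ham_distance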

SSaishruthi on 11 Jul 2019

import tensorflow as tf


def hamming_loss(y_true, y_pred, mode='multiclass'):
    if mode not in ['multiclass', 'multilabel']:
        raise ValueError("mode must be one of: ['multiclass', 'multilabel']")

    if mode == 'multiclass':
        # One value per sample: 0.0 if the one-hot prediction matches, else 1.0.
        nonzero = tf.cast(tf.math.count_nonzero(y_true * y_pred, axis=-1), tf.float32)
        return 1.0 - nonzero

    else:
        # One value per sample: fraction of label positions that differ.
        nonzero = tf.cast(tf.math.count_nonzero(y_true - y_pred, axis=-1),
            tf.float32)
        return nonzero / y_true.get_shape()[-1]


# tf.python is a private namespace, used here only to reach MeanMetricWrapper.
class HammingLoss(tf.python.keras.metrics.MeanMetricWrapper):
    def __init__(self, name='hamming_loss', dtype=None, mode='multiclass'):
        super(HammingLoss, self).__init__(
                hamming_loss, name, dtype=dtype, mode=mode)

This works for me. The idea is to have hamming_loss calculate the loss for each sample in the batch separately, and let MeanMetricWrapper do the aggregation.

So:

actuals = tf.constant([[0, 1, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0]],
        dtype=tf.int32)
predictions = tf.constant([[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0]],
        dtype=tf.int32)
print(hamming_loss(actuals, predictions, mode='multiclass').numpy())  # prints [1. 1. 1.]
hamm = HammingLoss(mode='multiclass')
hamm.update_state(actuals, predictions)
print(hamm.result().numpy())  # prints 1.0

Squadrick on 11 Jul 2019
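
As a cross-check on the numbers above, here is a framework-free sketch of the same per-sample computation. hamming_loss_reference is a hypothetical helper written for this illustration, not part of TensorFlow Addons; it assumes one-hot rows in multiclass mode and 0/1 indicator rows in multilabel mode.

```python
def hamming_loss_reference(y_true, y_pred, mode="multiclass"):
    """Per-sample Hamming loss on nested lists (hypothetical reference helper)."""
    if mode not in ("multiclass", "multilabel"):
        raise ValueError("mode must be 'multiclass' or 'multilabel'")

    losses = []
    for true_row, pred_row in zip(y_true, y_pred):
        if mode == "multiclass":
            # One-hot rows: the product is non-zero only where both mark the same class.
            hits = sum(1 for t, p in zip(true_row, pred_row) if t * p != 0)
            losses.append(1.0 - hits)
        else:
            # Multilabel rows: fraction of label positions that disagree.
            misses = sum(1 for t, p in zip(true_row, pred_row) if t != p)
            losses.append(misses / len(true_row))
    return losses


actuals = [[0, 1, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
predictions = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0]]

multiclass = hamming_loss_reference(actuals, predictions, mode="multiclass")
multilabel = hamming_loss_reference(actuals, predictions, mode="multilabel")
print(multiclass)                         # [1.0, 1.0, 1.0]
print(sum(multiclass) / len(multiclass))  # 1.0
print(multilabel)                         # [0.5, 0.5, 0.5]
```

In multilabel mode each of these rows disagrees in two of four positions, hence 0.5 per sample.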

@Squadrick Thanks for the clarification. Got the idea now. Will create a PR

SSaishruthi on 11 Jul 2019

@Squadrick Also, can we have hamming distance separately as a distance metric?

SSaishruthi on 11 Jul 2019

@SSaishruthi You can call the file hamming.py or hamming_metrics.py and add: hamming_distance, hamming_loss and HammingLoss (as a tf.keras.metrics.Metric).

Squadrick on 11 Jul 2019

👍1

@Squadrick How would this look as a loss function instead of a metric?

rjurney on 9 Aug 2019

👍1

@rjurney The only problem I see is that tf.math.count_nonzero is non-differentiable. That could be worked around by replacing it with a close approximation, resulting in:

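import tensorflow as tf
from tensorflow.keras import backend as K


def hamming_loss(y_true, y_pred):
  diff = tf.cast(y_true, tf.float32) - tf.cast(y_pred, tf.float32)

  # Count non-zeros per sample with a smooth approximation: |d| / (|d| + epsilon)
  epsilon = K.epsilon()
  nonzero = tf.reduce_sum(tf.math.abs(diff / (tf.math.abs(diff) + epsilon)), axis=-1)

  # Normalize by the number of labels, then average over the batch
  return tf.reduce_mean(nonzero / K.int_shape(y_pred)[-1])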

hichameyessou on 26 Sep 2019
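
To see what the approximation above computes, the ratio |d| / (|d| + epsilon) can be evaluated by hand. This is a plain-Python sketch: soft_nonzero is a hypothetical name, and the epsilon value of 1e-7 is an assumption chosen to match the usual Keras backend default.

```python
EPSILON = 1e-7  # assumed value; the Keras backend epsilon defaults to 1e-7


def soft_nonzero(d, epsilon=EPSILON):
    # Exactly 0 when d == 0, and very close to 1 when |d| is much larger than epsilon.
    return abs(d) / (abs(d) + epsilon)


diff = [1, -1, 0, 0]  # y_true - y_pred for one sample with four labels
soft_count = sum(soft_nonzero(d) for d in diff)
hard_count = sum(1 for d in diff if d != 0)

print(hard_count)              # 2
print(soft_count / len(diff))  # just under 0.5
```

For 0/1 labels the soft count tracks the exact count closely, so the resulting loss value is nearly identical to the metric.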

@seanpmorgan why closed?

rjurney on 2 Dec 2019

👍1

Hamming loss was merged in https://github.com/tensorflow/addons/pull/342.

seanpmorgan on 2 Dec 2019

Cool!

rjurney on 2 Dec 2019
