Why does the accuracy metric raise an error when the target is all zero?
(https://github.com/PyTorchLightning/pytorch-lightning/blob/0.8.1/pytorch_lightning/metrics/functional/classification.py#L222)
I think it's still reasonable to compute accuracy even when the target in the minibatch happens to be all zero.
Hi! Thanks for your contribution, great first issue!
I think it is not meant to compute this accuracy over a minibatch, but rather over the whole data, e.g. in validation_epoch_end, and then it should not happen that all targets are 0. If you still want it, you can pass in e.g. num_classes=1 and the error will not be raised. That's my understanding.
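For concreteness, a minimal sketch of that workaround (assuming the functional accuracy from the classification.py module linked above; the import path is inferred from that URL):
import torch
from pytorch_lightning.metrics.functional.classification import accuracy

# With num_classes given explicitly, the all-zero-target check is skipped
# and the score is computed over the class that is actually present:
accuracy(torch.tensor([0, 1]), torch.tensor([0, 0]), num_classes=1)
# expected: tensor(0.5000)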
@justusschock correct me if I'm wrong :)
almost @awaelchli :)
@tridao when your targets contain only one class, your sups will be zero for the other class and you will divide by zero, since accuracy is generally defined as sum(tps)/sum(sups).
Since we don't want to divide by zero we explicitly excluded that case.
If you set num_classes to 1, you bypass that check, but in some cases you may still end up with a zero-division error / NaN.
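To make that concrete, here is a rough sketch of the per-class computation (illustrative only, not the library's actual code; tps = true positives per class, sups = support, i.e. the number of true samples per class):
import torch

pred = torch.tensor([0, 1])
target = torch.tensor([0, 0])  # only class 0 is present

num_classes = 2
tps = torch.zeros(num_classes)   # true positives per class
sups = torch.zeros(num_classes)  # support per class
for c in range(num_classes):
    tps[c] = ((pred == c) & (target == c)).sum()
    sups[c] = (target == c).sum()

print(tps / sups)  # tensor([0.5000, nan]) -- 0/0 for the absent class 1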
@awaelchli Ideally, accuracy should work with a minibatch too, so that we can just import it from pl and directly use it to track this metric during training or validation for each batch. Also, what's sups here @justusschock ?
Yes, this is the issue I'm running into. I'm doing binary classification, and calling accuracy on minibatches. After a while, I'll get a minibatch where the target is all zero, and it errors.
I think this is a common enough situation.
Then the solution is to pass in the number of classes, right?
I don't think there's division by zero.
After removing the raise RuntimeError in the accuracy implementation, this code works just fine:
import torch
from pytorch_lightning.metrics.functional.classification import accuracy

accuracy(torch.tensor([0, 1]), torch.tensor([0, 0]))
# tensor(0.5000)
So I'm not seeing the reason for raising the error.
Sorry, there's division by zero, but it's handled automatically after taking the mean.
If reduction='none' (per-class accuracy):
accuracy(torch.tensor([0, 1]), torch.tensor([0, 0]), reduction='none')
# tensor([0.5000, nan])
Maybe the error should only be raised if reduction='none'?
@tridao to follow up on this, can you maybe create a PR that changes this to a warning and ping me on it?
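For reference, a hypothetical sketch of what such a change could look like (the helper name below is made up; the real check lives in the classification.py file linked in the issue):
import warnings
import torch

def class_reduce_with_warning(tps, sups):
    # Hypothetical replacement for the hard error: warn on zero support
    # and exclude those classes from the mean instead of raising.
    if (sups == 0).any():
        warnings.warn("Some classes have zero support and are excluded from the mean.")
    valid = sups > 0
    return (tps[valid].float() / sups[valid].float()).mean()

This would keep the default reduction usable on single-class minibatches while still surfacing the zero-support case.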
I think this is actually a more serious issue than you were originally considering. During the initial sanity check, it has happened to me that the small sampled batch contains only a single class, and an error is thrown. The accuracy function is thus not only being called on a full dataset. I had to switch to the slower sklearn metrics module to work around this. It's an easy fix and high priority for your users, given this erratic behavior.
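For anyone else hitting this, the sklearn fallback mentioned above looks roughly like this (pred and target stand for minibatch tensors; the CPU/NumPy round-trip is where the slowdown comes from):
from sklearn.metrics import accuracy_score

# sklearn expects NumPy arrays, so tensors have to leave the GPU first:
acc = accuracy_score(target.cpu().numpy(), pred.cpu().numpy())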
@sauhaardac As of now, I agree with you. Would you make this a warning or drop it completely?
I would drop it completely, as other frameworks don't have this kind of warning and users don't seem to be complaining about it.
Same problem here, you need to specify num_classes.