Why does the accuracy metric raise an error when the target is all zero?
(https://github.com/PyTorchLightning/pytorch-lightning/blob/0.8.1/pytorch_lightning/metrics/functional/classification.py#L222)
I think it's still reasonable to compute accuracy even when the target in the minibatch happens to be all zero.
Hi! Thanks for your contribution, great first issue!
I think it is not meant to compute this accuracy over a minibatch, but rather over the whole data, e.g. in validation_epoch_end, and then it should not happen that all targets are 0. If you still want it, you can pass in e.g. num_classes=1 and the error will not be raised. That's my understanding.
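For concreteness, a minimal sketch of that workaround (assuming the functional accuracy from the classification.py module linked above; the import path is inferred from that URL):
import torch
from pytorch_lightning.metrics.functional.classification import accuracy

# With num_classes given explicitly, the all-zero-target check is skipped
# and the score is computed over the class that is actually present:
accuracy(torch.tensor([0, 1]), torch.tensor([0, 0]), num_classes=1)
# expected: tensor(0.5000)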
@justusschock correct me if I'm wrong :)
almost @awaelchli :)
@tridao when your targets contain only one class, your sups will be zero for the other class and you will divide by zero, since accuracy is generally defined as sum(tps)/sum(sups).
Since we don't want to divide by zero we explicitly excluded that case.
If you set num_classes to 1, you bypass that check, but in some cases you may still end up with a zero-division error / NaN.
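To make that concrete, here is a rough sketch of the per-class computation (illustrative only, not the library's actual code; tps = true positives per class, sups = support, i.e. the number of true samples per class):
import torch

pred = torch.tensor([0, 1])
target = torch.tensor([0, 0])  # only class 0 is present

num_classes = 2
tps = torch.zeros(num_classes)   # true positives per class
sups = torch.zeros(num_classes)  # support per class
for c in range(num_classes):
    tps[c] = ((pred == c) & (target == c)).sum()
    sups[c] = (target == c).sum()

print(tps / sups)  # tensor([0.5000, nan]) -- 0/0 for the absent class 1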
@awaelchli Ideally, accuracy should work with a minibatch too, so that we can just import it from pl and directly use it to track this metric during training or validation for each batch. Also, what's sups here @justusschock ?
Yes, this is the issue I'm running into. I'm doing binary classification, and calling accuracy on minibatches. After a while, I'll get a minibatch where the target is all zero, and it errors.
I think this is a common enough situation.
Then the solution is to pass in the number of classes, right?
I don't think there's division by zero.
After removing the raise RuntimeError in the accuracy implementation, this code works just fine:
import torch
from pytorch_lightning.metrics.functional.classification import accuracy

accuracy(torch.tensor([0, 1]), torch.tensor([0, 0]))
# tensor(0.5000)
So I'm not seeing the reason for raising the error.
Sorry, there's division by zero, but it's handled automatically after taking the mean.
If reduction='none' (per-class accuracy):
accuracy(torch.tensor([0, 1]), torch.tensor([0, 0]), reduction='none')
# tensor([0.5000, nan])
Maybe the error should only be raised if reduction='none'?
@tridao to follow up on this, can you maybe create a PR that changes this to a warning and ping me on it?
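For reference, a hypothetical sketch of what such a change could look like (the helper name below is made up; the real check lives in the classification.py file linked in the issue):
import warnings
import torch

def class_reduce_with_warning(tps, sups):
    # Hypothetical replacement for the hard error: warn on zero support
    # and exclude those classes from the mean instead of raising.
    if (sups == 0).any():
        warnings.warn("Some classes have zero support and are excluded from the mean.")
    valid = sups > 0
    return (tps[valid].float() / sups[valid].float()).mean()

This would keep the default reduction usable on single-class minibatches while still surfacing the zero-support case.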
I think this is actually a more serious issue than you were originally considering. During the initial sanity check, it has happened to me that the small sampled batch contains only a single class, and an error is thrown. The accuracy function is thus not only being called on a full dataset. I had to switch to the slower sklearn metrics module to work around this. It's an easy fix and high priority for your users, given this erratic behavior.
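For anyone else hitting this, the sklearn fallback mentioned above looks roughly like this (pred and target stand for minibatch tensors; the CPU/NumPy round-trip is where the slowdown comes from):
from sklearn.metrics import accuracy_score

# sklearn expects NumPy arrays, so tensors have to leave the GPU first:
acc = accuracy_score(target.cpu().numpy(), pred.cpu().numpy())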
@sauhaardac As of now, I agree with you. Would you make this a warning or drop it completely?
I would drop it completely, as other frameworks don't have this kind of warning and users don't seem to be complaining about it.
Same problem here, you need to specify num_classes.