Flair: Micro average accuracy for multiclass classification

Created on 16 May 2019 · 3 Comments · Source: flairNLP/flair

Hi,
I am aware that accuracy is computed without taking true negatives (tn) into account, as per issue #483. However, the following output was produced for a 3-class sentiment classification task (N|P|NEU; every example is predicted and assigned exactly one class):

Testing using best model ...
loading file flair-exps/eu/models100/best-model.pt
MICRO_AVG: acc 0.5221 - f1-score 0.6861
MACRO_AVG: acc 0.5197 - f1-score 0.6837
N          tp: 203 - fp: 108 - fn: 101 - tn: 757 - precision: 0.6527 - recall: 0.6678 - accuracy: 0.4927 - f1-score: 0.6602
NEU        tp: 300 - fp: 129 - fn: 147 - tn: 593 - precision: 0.6993 - recall: 0.6711 - accuracy: 0.5208 - f1-score: 0.6849
P          tp: 299 - fp: 130 - fn: 119 - tn: 621 - precision: 0.6970 - recall: 0.7153 - accuracy: 0.5456 - f1-score: 0.7060

It seems to me that the micro-averaged accuracy is not computed correctly in this case. I would expect the MICRO_AVG acc to equal the MICRO_AVG f1-score. In fact, if we compute accuracy (correct predictions / total predictions) from the numbers above, we get (203+300+299)/1169 = 0.6861, where 1169 is the number of test examples (tp+fp+fn+tn for any single class).
Shouldn't it be MICRO_AVG: acc 0.6861 - f1-score 0.6861 instead of MICRO_AVG: acc 0.5221 - f1-score 0.6861?
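The arithmetic can be checked with a short Python sketch that uses only the per-class counts from the test log. This is not flair code; it simply recomputes the reported numbers, assuming the denominator tp + fp + fn for the reported micro accuracy (which is what reproduces 0.5221).

```python
# Per-class counts copied from the test log (tn is not needed here).
counts = {
    "N":   {"tp": 203, "fp": 108, "fn": 101},
    "NEU": {"tp": 300, "fp": 129, "fn": 147},
    "P":   {"tp": 299, "fp": 130, "fn": 119},
}

tp = sum(c["tp"] for c in counts.values())  # correct predictions
fp = sum(c["fp"] for c in counts.values())
fn = sum(c["fn"] for c in counts.values())
# Every test example has exactly one gold label, so tp + fn counts all examples.
n_samples = tp + fn

# Denominator that reproduces the reported MICRO_AVG acc: tp + fp + fn
reported_micro_acc = tp / (tp + fp + fn)

# Micro-averaged precision, recall and F1 from the pooled counts
precision = tp / (tp + fp)
recall = tp / (tp + fn)
micro_f1 = 2 * precision * recall / (precision + recall)

# Plain accuracy: correct predictions / number of test examples
accuracy = tp / n_samples

print(f"reported micro acc: {reported_micro_acc:.4f}")  # 0.5221
print(f"micro F1:           {micro_f1:.4f}")            # 0.6861
print(f"accuracy:           {accuracy:.4f}")            # 0.6861
```

Because fp and fn both sum to 367 here, micro precision equals micro recall, so micro F1 collapses to tp / n_samples, i.e. plain accuracy.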

I think the problem is in training_utils.py#L120: when calling self.accuracy(None), the total number of predictions is computed as the sum of tps, fps and fns, which is not the actual number of samples in the test set.
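To illustrate why tp + fp + fn exceeds the number of samples in a single-label setting, here is a minimal sketch on hypothetical toy data. The helper `confusion_counts` is illustrative only and is not flair's implementation: each misclassified example adds one false positive to the predicted class and one false negative to the gold class, so every error is counted twice in the denominator.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Per-class tp/fp/fn for single-label multiclass predictions (hypothetical helper)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # the wrongly predicted class gets a false positive
            fn[gold] += 1  # the missed gold class gets a false negative
    return tp, fp, fn


# Hypothetical toy data: 5 examples, 2 of them misclassified.
y_true = ["N", "P", "NEU", "P", "N"]
y_pred = ["N", "NEU", "NEU", "P", "P"]

tp, fp, fn = confusion_counts(y_true, y_pred)
total_tp = sum(tp.values())
total_fp = sum(fp.values())
total_fn = sum(fn.values())

n = len(y_true)
errors = n - total_tp
# Each error appears once as fp and once as fn.
assert total_fp == total_fn == errors

print(total_tp + total_fp + total_fn, "vs", n)  # 7 vs 5
print(f"tp / (tp + fp + fn): {total_tp / (total_tp + total_fp + total_fn):.4f}")  # 0.4286
print(f"correct / n:         {total_tp / n:.4f}")  # 0.6000
```

Applied to the counts in the log, the same reasoning gives a denominator of 802 + 367 + 367 = 1536 instead of 1169, and 802/1536 ≈ 0.5221, which matches the reported MICRO_AVG acc.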

Again, I'm in a multiclass, single-label text classification scenario; I haven't tested other tasks.

In any case, thanks for the great work!

bug wontfix

All 3 comments

Hello @isanvicente thanks for reporting this - we'll take a closer look!

I found a similar problem: accuracy in metrics.py should be calculated as (tp+tn)/(tp+tn+fp+fn).
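For reference, here is the formula (tp+tn)/(tp+tn+fp+fn) applied per class to the counts from the test log in the issue. This is a sketch; the function name `binary_accuracy` is hypothetical and not part of flair's metrics.py.

```python
# Per-class counts from the test log in the issue.
counts = {
    "N":   {"tp": 203, "fp": 108, "fn": 101, "tn": 757},
    "NEU": {"tp": 300, "fp": 129, "fn": 147, "tn": 593},
    "P":   {"tp": 299, "fp": 130, "fn": 119, "tn": 621},
}


def binary_accuracy(tp, fp, fn, tn):
    """One-vs-rest accuracy for a single class: (tp + tn) / (tp + tn + fp + fn)."""
    return (tp + tn) / (tp + tn + fp + fn)


for label, c in counts.items():
    print(f"{label:<4} {binary_accuracy(**c):.4f}")
# N    0.8212
# NEU  0.7639
# P    0.7870
```

Each value is a one-vs-rest accuracy over all 1169 examples, so the large tn counts put them well above the overall single-label accuracy of 0.6861.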

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
