Flair: TextClassifier label predictions always have score 1.0

Created on 9 Mar 2019 · 11 Comments · Source: flairNLP/flair

Hi guys,

I have trained my own TextClassifier following this tutorial: https://github.com/zalandoresearch/flair/blob/master/resources/docs/TUTORIAL_7_TRAINING_A_MODEL.md

I am using the option multi_label=False, as each sentence should be assigned only one label. In terms of embeddings, I use FlairEmbeddings mix-forward and mix-backward.

The issue is that every time I predict the label of a new unseen sentence, I get a label score of 1.0. The score never takes any other value between 0.0 and 1.0; it is always exactly 1.0.

Is this the expected behavior? What am I doing wrong?

Thanks!

bug

All 11 comments

Hello @algomax that should not happen. Which dataset are you using? Does the model train OK, i.e. does it reach good F-scores?

Hi @alanakbik,

I have a relatively small dataset, around 3000 examples in total. The examples fall into around 30 different classes, so about 100 examples per class. The examples are short sentences - things that you would typically say in a chat conversation (e.g. "how are you today" or "cancel this task please"). Something specific about the dataset is that many of the examples within one category are quite similar - usually they differ by only a few words.

When I train the classification model, I get pretty good micro and macro F1 scores - above 90%. However, I face two main problems when I apply the model to new unseen examples:

  1. Everything is classified with score 1.0. Always.
  2. When I try to predict an example which is clearly not in any of the 30 classes, the model predicts one of the 30 classes with confidence 1.0.

It seems that the model does not learn a good representation of the data. Could it be due to overfitting? How could I train the model so that examples which do not fall in any class are not assigned a label?

Thanks!

Hi @algomax, a score of 1.0 every time definitely sounds like a bug, so this is something we need to check out.

Is your training data always labeled, i.e. are there any examples that do not belong to any of the 30 classes? If you do not have such examples, the model will believe that each item must belong to one of the 30 classes - simply because in the training data it has never seen otherwise. So this explains at least why a class is always predicted, but not why confidence is always 1.0.

I'll add a bug label to this issue - we'll take a closer look.

Hi @alanakbik,

Do we have an update on this? I am facing a similar issue.

Thanks!

Hi @ashahzada,

As far as I know, there is no update so far on this issue. It seems to be a bug or we are somehow training the model incorrectly. Until this issue is resolved, I decided to use a different text classification tool. I hope this helps. Regards!

Thank you @algomax !

Hello @ashahzada @algomax - we haven't looked into the error yet, but I'll be back in office next week (currently travelling) and will take a look.

I found certain clues in the code that might be causing the issue:

  1. The first clue is this line in the predict method of the TextClassifier class:
    scores = self.forward(batch)

In my case it generates a tensor like this:

tensor([[ 2.2129,  0.2944,  5.3489, -0.7817,  1.1176, -0.6716, -0.5714, -2.0608,
          3.6055,  3.1275,  1.4636,  4.2829,  1.8713, -2.6787, -0.4784,  0.6990,
          0.5859, -1.5066, -3.6451, -1.7429, -4.3391,  2.3591, -1.4598, -0.2570,
         -1.7566,  0.0564, -5.3699, -5.1569,  2.4890, -2.1881,  4.4994,  0.4503,
          3.9420, -4.4185,  1.2770,  3.5883,  5.4932, -0.7858, -4.2145,  2.3449,
         -2.8780]])

Please notice that the scores in the tensor range roughly from -6.0 to +6.0, i.e. they are not confined to [0, 1].

  2. The second clue is that the score setter of the Label class checks whether the score is within the range [0, 1] and, if not, sets it to 1.0. This is exactly why we always get predictions with score 1.0. For example, when a high raw score such as 5.3489 is selected, the setter replaces it with 1.0.
    @score.setter
    def score(self, score):
        if 0.0 <= score <= 1.0:
            self._score = score
        else:
            self._score = 1.0

The next step probably is to figure out why certain scores are estimated outside the [0, 1] range. Any thoughts?
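To make the second clue easy to reproduce, here is a minimal sketch of the clamping behavior. The Label class below is a stripped-down stand-in, not the real Flair class: its constructor signature and the property getter are assumptions for illustration, and only the setter body is copied from the snippet above.

```python
class Label:
    """Hypothetical stand-in for the Label class; only the score logic is kept."""

    def __init__(self, value, score=1.0):
        self.value = value
        self.score = score  # assignment goes through the setter below

    @property
    def score(self):
        return self._score

    @score.setter
    def score(self, score):
        # Same check as the setter quoted above: anything outside [0, 1]
        # is silently replaced with 1.0.
        if 0.0 <= score <= 1.0:
            self._score = score
        else:
            self._score = 1.0


clamped = Label("some_class", 5.3489)   # raw score outside [0, 1]
in_range = Label("some_class", 0.73)    # a proper probability is kept as-is
print(clamped.score, in_range.score)
```

Note that a negative raw score would also end up as 1.0 with this setter, so the stored score says nothing about how confident the model actually was.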

Thanks! Looks like there's a softmax missing - I'll put in a PR!
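For readers wondering what the missing softmax does, here is a small sketch in plain Python. It is not the code from the actual fix (which is not shown in this thread); it only illustrates how raw scores like the ones in the tensor above are turned into values in [0, 1] that sum to 1. In PyTorch the same operation is available as torch.nn.functional.softmax.

```python
import math


def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    # Subtracting the maximum improves numerical stability and does not
    # change the result.
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# First eight raw scores from the tensor quoted above.
raw_scores = [2.2129, 0.2944, 5.3489, -0.7817, 1.1176, -0.6716, -0.5714, -2.0608]
probabilities = softmax(raw_scores)

# The ranking of the classes is unchanged, but the winning score is now a
# probability strictly below 1.0 instead of a raw value that gets clamped.
best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
print(best_index, probabilities[best_index])
```

Because softmax preserves the ordering of the scores, the predicted label stays the same; only the reported confidence changes.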

Added the fix to master - should work now but let me know if it doesn't!

I did some tests and it seems to work very well. Thanks once again! :)
