Fasttext: precision and recall are identical

Created on 29 Sep 2016 · 15 Comments · Source: facebookresearch/fastText

Hi, I am using the python interface to fastText (https://github.com/salestock/fastText.py), but I have noticed the same problem here.

I am testing a binary classification (i.e., labels: yes, no) with "supervised" and I am obtaining, in the result output of the "test" method, identical values for precision and recall.
Training and test sets are a split of a dataset in which, for each line, the first token is __label__yes/__label__no, and the rest of the tokens are Italian terms separated by a space.

Is this normal? Am I doing something wrong?

Thanks in advance
Giacomo

giacbrd

Most helpful comment

Hi all,
The commit https://github.com/facebookresearch/fastText/commit/be1e597cb67c069ba9940ff241d9aad38ccd37da adds the ability to display the score for each label.
Please note that the new command is "test-label" and not "print-label-scores" as mentioned in the description.

Regards.

Celebio on 24 Oct 2018

👍4 ❤1

All 15 comments

After looking at the code, this problem seems due to the "interpretation" of precision and recall in fastText.
In a binary classification problem I would consider one label as "presence" of the class, and the other as absence, i.e., positive and negative. FastText apparently interprets any classification problem as multi-class.

Now I build the confusion matrix for evaluation on my "concepts" of positive and negative. The precision and recall I obtain are different from the one value returned by the fastText test method, and, under the current conditions, I consider mine the valid ones.

giacbrd on 30 Sep 2016

👍3

Agreed, how is fasttext computing its precision and recall? I've got a binary classification example where the outcome is rarely "true". fasttext reports P = R = .99 but the F1 score I've calculated myself is .125 due to very low recall.

jnmiller on 25 Oct 2016

fastText is computing precision and recall for all the labels, treating each problem as multi-class (as noted by @giacbrd).

@jnmiller: In the case of binary classification, it thus computes precision and recall over both labels, while you want precision and recall only for the positive one. We might add per-label metrics in the future, although we do not have a timeline yet.

EdouardGrave on 17 Nov 2016
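
A minimal sketch of why the pooled numbers coincide, assuming every test example has exactly one true label and the classifier returns exactly one predicted label. The toy data and the helper name `micro_precision_recall` are made up for illustration and are not part of fastText.

```python
def micro_precision_recall(gold, pred):
    """Precision and recall pooled over all labels (one prediction per example)."""
    correct = sum(g == p for g, p in zip(gold, pred))
    precision = correct / len(pred)  # denominator: number of predictions made
    recall = correct / len(gold)     # denominator: number of true labels
    return precision, recall


# Toy data: 100 examples, only 4 of them "yes".
# The classifier finds 1 of the 4 "yes" examples and never raises a false alarm.
gold = ["yes"] * 4 + ["no"] * 96
pred = ["yes"] + ["no"] * 3 + ["no"] * 96

micro = micro_precision_recall(gold, pred)

# Recall restricted to the rare "yes" label, counted by hand.
true_pos = sum(g == "yes" and p == "yes" for g, p in zip(gold, pred))
yes_recall = true_pos / gold.count("yes")

print(micro)       # (0.97, 0.97)
print(yes_recall)  # 0.25
```

Both denominators equal the number of examples here, so the pooled precision and recall are both just accuracy, and a rare positive label with poor recall barely moves them.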

@EdouardGrave having per-label metrics could help a lot!

loretoparisi on 3 Jul 2017

👍4

@EdouardGrave @giacbrd in the meantime, what could be an alternative solution to compute these values per label?

loretoparisi on 3 Jul 2017

@loretoparisi I simply computed the confusion matrix by myself, according to my requirements.

giacbrd on 3 Jul 2017
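
One way such a hand-rolled evaluation might look, as a sketch only. It assumes you already have two parallel lists, the true label and the single predicted label for each test line, however you obtained the predictions. The function name `per_label_metrics` and the toy labels are hypothetical.

```python
from collections import Counter


def per_label_metrics(gold, pred):
    """Return {label: {"P": ..., "R": ..., "F1": ...}} from parallel label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # true label g was missed
    metrics = {}
    for label in sorted(set(gold) | set(pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[label] = {"P": precision, "R": recall, "F1": f1}
    return metrics


gold = ["__label__yes", "__label__no", "__label__no", "__label__yes", "__label__no"]
pred = ["__label__yes", "__label__no", "__label__yes", "__label__no", "__label__no"]
metrics = per_label_metrics(gold, pred)
for label, scores in metrics.items():
    print(label, scores)
```

For a binary problem, the entry for the label you treat as "positive" gives the usual precision and recall.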

@giacbrd Right, thanks!

loretoparisi on 3 Jul 2017

@giacbrd btw there is a pull request related to this, never merged! https://github.com/facebookresearch/fastText/pull/35

loretoparisi on 3 Jul 2017

@EdouardGrave it seems that the way fastText computes precision and recall in multi-class problems is different from "micro-precision" and "micro-recall"?

superrrrrr1995 on 3 Apr 2018

***generating model***
-input train.txt -output model_component -lr 0.1 -epoch 1000 -wordNgrams 5 -minCount 2 -ws 10 -loss softmax -verbose 2 -dim 100 -thread 16

Read 4M words
Number of words:  65930
Number of labels: 10
Progress: 100.0% words/sec/thread:  971478 lr:  0.000000 loss:  0.095427 ETA:   0h 0m

***Validating***
N       2301
Number of examples: 2301

P@1     0.72
R@1     0.72

P@2     0.428
R@2     0.856

P@3     0.305
R@3     0.916

P@10    0.1
R@10    1

How do I interpret this result?
Depending on P@k and R@k what is the best way to evaluate my model?
Can someone point to any formula/algorithm?

a11apurva on 25 Apr 2018

I am still clueless. Can someone help?

@giacbrd: can you please elaborate with a toy example?
If fastText interprets any classification problem as multi-class, why does it matter?

Confusion matrix is a valid concept for multiclass setting as well.

anujgupta82 on 4 Jun 2018

@anujgupta82 I am just making a guess with my knowledge of recall and precision.

First of all, the metrics are not plain recall and precision, but recall@1 and precision@1,
which means we retrieve only 1 label per example in the validation set and then average the result.
With one predicted label and one true label per example, precision and recall share the same denominator (the number of examples), which is why P@1 equals R@1.
If you query for P@k and R@k with k > 1, then P@k != R@k.

Please let me know if I am wrong.

a11apurva on 4 Jun 2018
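
A sketch of P@k and R@k under the reading above: count how many of the top-k predicted labels are correct, then divide by the number of predictions for precision and by the number of true labels for recall. This is an assumption about the definition, not code taken from fastText, but it is consistent with the output posted earlier, where P@k is roughly R@k / k (0.856 / 2 = 0.428, 0.916 / 3 ≈ 0.305, 1 / 10 = 0.1). The function name and toy data are hypothetical.

```python
def precision_recall_at_k(gold_sets, ranked_preds, k):
    """gold_sets: one set of true labels per example.
    ranked_preds: one list of predicted labels per example, best first."""
    hits = n_pred = n_gold = 0
    for gold, ranked in zip(gold_sets, ranked_preds):
        top_k = ranked[:k]
        hits += sum(label in gold for label in top_k)
        n_pred += len(top_k)  # precision denominator
        n_gold += len(gold)   # recall denominator
    return hits / n_pred, hits / n_gold


# Four single-label examples over the labels a, b, c.
gold_sets = [{"a"}, {"b"}, {"c"}, {"a"}]
ranked_preds = [
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["a", "b", "c"],
    ["b", "c", "a"],
]

at_1 = precision_recall_at_k(gold_sets, ranked_preds, 1)
at_2 = precision_recall_at_k(gold_sets, ranked_preds, 2)
at_3 = precision_recall_at_k(gold_sets, ranked_preds, 3)
print(at_1)  # (0.25, 0.25)
print(at_2)  # (0.25, 0.5)
print(at_3)
```

With one true label per example, R@k can only grow as k increases, while P@k is capped at 1/k, so the two drift apart for k > 1.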

@anujgupta82 I have not used fastText for a long time. I suggest you compute the evaluation yourself, according to your requirements. You need just a few lines of Python, so you will be sure of the results.

giacbrd on 4 Jun 2018

@Celebio @EdouardGrave that's awesome!!! I have just updated fasttext.js to support this feature!

[
  {
    "ham": {
      "F1": "0.172414",
      "P": "0.094340",
      "R": "1.000000"
    }
  },
  {
    "spam": {
      "F1": "0.950495",
      "P": "0.905660",
      "R": "1.000000"
    }
  }
]

loretoparisi on 24 Oct 2018
