fastText: precision and recall are identical

Created on 29 Sep 2016 · 15 Comments · Source: facebookresearch/fastText

Hi, I am using the Python interface to fastText (https://github.com/salestock/fastText.py), but I have noticed the same problem here as well.

I am testing binary classification (labels: yes, no) with "supervised", and the output of the "test" method gives me identical values for precision and recall.
Training and test sets are a split of a dataset in which, for each line, the first token is __label__yes/__label__no, and the rest of the tokens are Italian terms separated by a space.

Is this normal? Am I doing something wrong?

Thanks in advance
Giacomo

All 15 comments

After looking at the code, this seems to be due to how fastText "interprets" precision and recall.
In a binary classification problem I would consider one label as the "presence" of the class and the other as its absence, i.e., positive and negative. fastText apparently treats every classification problem as multi-class.

So I built the confusion matrix myself, based on my own notions of positive and negative. The precision and recall I obtain differ from the single value returned by the fastText test method, and for my use case I consider mine the valid ones.
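A minimal sketch of such a per-label evaluation, assuming the gold and predicted labels are available as two parallel Python lists. The function `binary_metrics` and the toy labels are hypothetical, not part of fastText or fastText.py:

```python
def binary_metrics(gold, pred, positive):
    """Precision, recall and F1 for one label treated as the positive class.

    gold, pred: equal-length sequences of labels (e.g. "__label__yes").
    positive: the label considered the "presence" of the class.
    """
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    # Guard against empty denominators (no positive predictions / no positive examples).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: 3 positives, the classifier finds 1 and raises 1 false alarm.
gold = ["__label__yes", "__label__no", "__label__yes", "__label__yes", "__label__no"]
pred = ["__label__yes", "__label__yes", "__label__no", "__label__no", "__label__no"]
print(binary_metrics(gold, pred, positive="__label__yes"))
```

Here precision (1/2) and recall (1/3) differ, unlike the single value fastText reports.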

Agreed, how is fastText computing its precision and recall? I have a binary classification example where the outcome is rarely "true". fastText reports P = R = 0.99, but the F1 score I calculated myself is 0.125 due to very low recall.

fastText computes precision and recall over all the labels, treating each problem as multi-class (as noted by @giacbrd).

@jnmiller: In the case of binary classification, it thus computes precision and recall over both labels, while you want precision and recall only for the positive one. We might add per-label metrics in the future, although we do not have a timeline yet.
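To make this concrete, here is a minimal sketch with hypothetical toy data (100 examples, only 4 of them positive), assuming one gold label and one prediction per example. Pooling hits over both labels, as described in the comment above, yields identical P@1 and R@1 that stay high, while recall restricted to the rare positive label is low. Both functions are illustrative reimplementations, not fastText's code:

```python
def pooled_p_r_at_1(gold, pred):
    """P@1 and R@1 pooled over all labels (one gold label, one prediction per example)."""
    hits = sum(1 for g, p in zip(gold, pred) if g == p)
    precision_at_1 = hits / len(pred)  # denominator: k = 1 prediction per example
    recall_at_1 = hits / len(gold)     # denominator: one gold label per example
    return precision_at_1, recall_at_1


def recall_for_label(gold, pred, label):
    """Recall for a single label: correctly found / all true occurrences of that label."""
    relevant = [p for g, p in zip(gold, pred) if g == label]
    return sum(1 for p in relevant if p == label) / len(relevant) if relevant else 0.0


# Hypothetical data: 4 positives out of 100 examples.
gold = ["__label__true"] * 4 + ["__label__false"] * 96
# A classifier that finds only 1 of the 4 positives and never raises a false alarm.
pred = ["__label__true"] + ["__label__false"] * 99

print(pooled_p_r_at_1(gold, pred))                   # (0.97, 0.97)
print(recall_for_label(gold, pred, "__label__true"))  # 0.25
```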

@EdouardGrave having per-label metrics could help a lot!

@EdouardGrave @giacbrd in the meantime, what alternative is there for computing these values per label?

@loretoparisi I simply computed the confusion matrix myself, according to my requirements.

@giacbrd Right, thanks!

@giacbrd btw, there is a pull request related to this that was never merged! https://github.com/facebookresearch/fastText/pull/35

@EdouardGrave is the way fastText computes precision and recall in multi-class problems different from "micro-precision" and "micro-recall"?

***generating model***
-input train.txt -output model_component -lr 0.1 -epoch 1000 -wordNgrams 5 -minCount 2 -ws 10 -loss softmax -verbose 2 -dim 100 -thread 16

Read 4M words
Number of words:  65930
Number of labels: 10
Progress: 100.0% words/sec/thread:  971478 lr:  0.000000 loss:  0.095427 ETA:   0h 0m

***Validating***
N       2301
Number of examples: 2301

P@1     0.72
R@1     0.72

P@2     0.428
R@2     0.856

P@3     0.305
R@3     0.916

P@10    0.1
R@10    1

How do I interpret this result?
Based on P@k and R@k, what is the best way to evaluate my model?
Can someone point me to a formula or algorithm?
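A sketch of how P@k and R@k can be computed from ranked predictions, pooling the counts over the whole test set. This is a reimplementation based on our reading of fastText's `test` output (it is consistent with the numbers in the log above, but it is not fastText's code); the function name and toy labels are hypothetical:

```python
def precision_recall_at_k(gold_labels, ranked_predictions, k):
    """P@k and R@k pooled over a test set.

    gold_labels: list of sets, the true labels of each example.
    ranked_predictions: list of lists, labels sorted by model score (best first).
    """
    hits = 0
    n_gold = 0
    for gold, ranked in zip(gold_labels, ranked_predictions):
        hits += sum(1 for label in ranked[:k] if label in gold)
        n_gold += len(gold)
    precision = hits / (k * len(gold_labels))  # k predictions per example
    recall = hits / n_gold                     # every true label counts once
    return precision, recall


# Hypothetical single-label test set with 3 possible labels.
gold = [{"a"}, {"b"}, {"c"}, {"a"}]
ranked = [["a", "b", "c"], ["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
for k in (1, 2, 3):
    print(k, precision_recall_at_k(gold, ranked, k))
```

With single-label examples, each example contributes at most one hit but k predictions, so R@k grows with k while P@k shrinks. Once k equals the number of labels, R@k reaches 1, as with R@10 above.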

I am still clueless. Can someone help?

@giacbrd, can you please elaborate with a toy example?
If fastText interprets every classification problem as multi-class, why does it matter?

A confusion matrix is a valid concept in the multi-class setting as well.
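A multi-class confusion matrix also answers the earlier micro-averaging question. The sketch below (hypothetical function and toy labels, assuming exactly one gold label and one prediction per example) pools TP/FP/FN over all labels for the micro averages and averages per-label scores for the macro averages. Under that assumption every wrong prediction is simultaneously a false positive for one label and a false negative for another, so micro-precision, micro-recall and accuracy coincide, which would match fastText's identical P@1 and R@1:

```python
from collections import Counter


def micro_macro(gold, pred):
    """Micro- and macro-averaged precision/recall for single-label multi-class data.

    Assumes a non-empty test set with one gold label and one prediction per example.
    """
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[g] += 1  # true label g was missed
    # Micro: pool the counts over all labels, then divide.
    micro_p = sum(tp.values()) / (sum(tp.values()) + sum(fp.values()))
    micro_r = sum(tp.values()) / (sum(tp.values()) + sum(fn.values()))
    # Macro: per-label scores (0.0 when a denominator is empty), then unweighted mean.
    per_p = [tp[l] / (tp[l] + fp[l]) if tp[l] + fp[l] else 0.0 for l in labels]
    per_r = [tp[l] / (tp[l] + fn[l]) if tp[l] + fn[l] else 0.0 for l in labels]
    return micro_p, micro_r, sum(per_p) / len(labels), sum(per_r) / len(labels)


gold = ["a", "a", "a", "a", "b", "c"]
pred = ["a", "a", "a", "b", "b", "a"]
print(micro_macro(gold, pred))
```

Here both micro averages equal the accuracy (4/6), while the macro averages differ from it and from each other because label "c" is never predicted.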

@anujgupta82 I am just making an educated guess based on my knowledge of precision and recall.

First of all, the metrics are not plain precision and recall, but precision@1 and recall@1.
This means we retrieve only one label per example in the validation set and then average the result.
So the denominator is 1 in both cases: precision divides by k = 1 prediction, and recall divides by the number of true labels, which is also 1 when each example has a single label. That's why P@1 equals R@1.
If you ask for P@k and R@k with k > 1, then P@k != R@k.
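The same argument in formulas, assuming the counts are pooled over the whole test set (the symbols are introduced here for illustration): let $N$ be the number of test examples, $L$ the total number of gold labels, and $H_k$ the number of top-$k$ predictions that match a gold label. Then

```latex
P@k = \frac{H_k}{k\,N}, \qquad R@k = \frac{H_k}{L}
\quad\Longrightarrow\quad
R@k = k \cdot P@k \;\text{ when } L = N,
\qquad
P@1 = R@1 = \frac{H_1}{N}.
```

For single-label data this matches the validation log above: 2 × 0.428 = 0.856, 3 × 0.305 = 0.915 ≈ 0.916 (P@3 is rounded), and 10 × 0.1 = 1.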

Please let me know if I am wrong.

@anujgupta82 I have not used fastText for a long time. I suggest computing the evaluation yourself, according to your requirements. You need just a few lines of Python, and you will be sure of the results.
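A minimal sketch along those lines, assuming the gold labels come from the fastText-format test file (first `__label__` token on each line) and the predictions from the saved output of something like `./fasttext predict model.bin test.txt`, one predicted label per line in the same order. The file names are hypothetical, file contents are inlined with `io.StringIO`, and the helper names are illustrative:

```python
import io
from collections import Counter

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def read_gold(f):
    """First gold label of each line (assumes every line has at least one label)."""
    golds = []
    for line in f:
        labels = [tok for tok in line.split() if tok.startswith(LABEL_PREFIX)]
        golds.append(labels[0])
    return golds


def read_predictions(f):
    """Top predicted label per line (assumes no blank lines)."""
    return [line.split()[0] for line in f]


def confusion_matrix(gold, pred):
    """Counts of (true label, predicted label) pairs."""
    return Counter(zip(gold, pred))


test_file = io.StringIO("__label__yes buona giornata\n__label__no brutto tempo\n__label__yes ciao\n")
pred_file = io.StringIO("__label__yes\n__label__yes\n__label__no\n")

cm = confusion_matrix(read_gold(test_file), read_predictions(pred_file))
for (g, p), n in sorted(cm.items()):
    print(g, p, n)
```

From these counts you can derive whichever per-label, micro or macro metrics your requirements call for.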

Hi all,
The commit https://github.com/facebookresearch/fastText/commit/be1e597cb67c069ba9940ff241d9aad38ccd37da adds the ability to display the score for each label.
Please note that the new command is "test-label" and not "print-label-scores" as mentioned in the description.

Regards.

@Celebio @EdouardGrave that's awesome!!! I have just updated fasttext.js to support this feature!

[
  {
    "ham": {
      "F1": "0.172414",
      "P": "0.094340",
      "R": "1.000000"
    }
  },
  {
    "spam": {
      "F1": "0.950495",
      "P": "0.905660",
      "R": "1.000000"
    }
  }
]
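As a sanity check, each per-label F1 in this output is the harmonic mean of the reported P and R. A short sketch recomputing them (the dictionary is hand-copied from the JSON above, with the string values converted to floats; the tolerance allows for the 6-decimal rounding):

```python
# Per-label scores copied from the fasttext.js output above.
scores = {
    "ham": {"F1": 0.172414, "P": 0.094340, "R": 1.000000},
    "spam": {"F1": 0.950495, "P": 0.905660, "R": 1.000000},
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


for label, s in scores.items():
    print(label, f1(s["P"], s["R"]), s["F1"])
```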