Flair: Calculating F1 score

Created on 3 Jun 2019 · 6 comments · Source: flairNLP/flair

Would you please add some code to the documentation that shows how to calculate the F1 score for BIO-formatted train and test files?
I want to compare the datasets prepared by flair with a model I trained myself, so I need to run test data through the model and get the F1 score, like here.

question


reza1615

Most helpful comment

You can call the tagger's evaluate method on a list of sentences from the corpus to get these results.

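from flair.datasets import ColumnCorpus
from flair.models import SequenceTagger

# Folder containing the BIO-formatted train/dev/test files (placeholder path)
path_to_BIO_corpus = 'path/to/bio_corpus'

# Column 0 holds the token, column 1 its BIO NER tag
corpus: ColumnCorpus = ColumnCorpus(path_to_BIO_corpus, column_format={0: 'text', 1: 'ner'})

# Load the pre-trained 'ner' model (or pass the path to your own trained model file)
tagger: SequenceTagger = SequenceTagger.load('ner')

# In the flair version used in this thread, evaluate returns (result, loss); the loss is ignored
result, _ = tagger.evaluate(corpus.test)
print(result.detailed_results)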

Output will look like:

MICRO_AVG: acc 0.6259 - f1-score 0.7699
MACRO_AVG: acc 0.5408 - f1-score 0.6944
LOC        tp: 3 - fp: 2 - fn: 2 - tn: 3 - precision: 0.6000 - recall: 0.6000 - accuracy: 0.4286 - f1-score: 0.6000
MISC       tp: 6 - fp: 4 - fn: 4 - tn: 6 - precision: 0.6000 - recall: 0.6000 - accuracy: 0.4286 - f1-score: 0.6000
ORG        tp: 30 - fp: 7 - fn: 13 - tn: 30 - precision: 0.8108 - recall: 0.6977 - accuracy: 0.6000 - f1-score: 0.7500
PER        tp: 48 - fp: 13 - fn: 7 - tn: 48 - precision: 0.7869 - recall: 0.8727 - accuracy: 0.7059 - f1-score: 0.8276
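The numbers in this output can be reproduced from the tp/fp/fn counts alone. A minimal sketch, assuming the standard definitions precision = tp/(tp+fp), recall = tp/(tp+fn) and F1 = 2PR/(P+R); the "accuracy" column in this sample output matches tp/(tp+fp+fn) (true negatives are not used), MICRO_AVG pools the counts over all labels, and MACRO_AVG is the unweighted mean of the per-label scores. The names COUNTS and scores are illustrative and not part of flair.

```python
from typing import Dict, Tuple

# Per-label (tp, fp, fn) counts copied from the sample output above
COUNTS: Dict[str, Tuple[int, int, int]] = {
    "LOC": (3, 2, 2),
    "MISC": (6, 4, 4),
    "ORG": (30, 7, 13),
    "PER": (48, 13, 7),
}


def scores(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall, F1 and the tp/(tp+fp+fn) 'accuracy' for one label."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


per_label = {label: scores(*counts) for label, counts in COUNTS.items()}

# Micro average: sum tp, fp and fn over all labels, then score once
micro = scores(*(sum(counts[i] for counts in COUNTS.values()) for i in range(3)))

# Macro average: score each label separately, then take the unweighted mean
macro = {
    metric: sum(s[metric] for s in per_label.values()) / len(per_label)
    for metric in ("f1", "accuracy")
}

for label, s in per_label.items():
    print(f"{label:<10} precision: {s['precision']:.4f} - recall: {s['recall']:.4f} "
          f"- accuracy: {s['accuracy']:.4f} - f1-score: {s['f1']:.4f}")
print(f"MICRO_AVG: acc {micro['accuracy']:.4f} - f1-score {micro['f1']:.4f}")
print(f"MACRO_AVG: acc {macro['accuracy']:.4f} - f1-score {macro['f1']:.4f}")
```

Because micro averaging pools counts, the frequent labels (ORG, PER) dominate it, while macro averaging gives the rare LOC and MISC labels equal weight, which is why the macro F1 is lower here.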

This works for any corpus or tagger. Simply change the path or name of the model/corpus.
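For a cross-check that does not depend on the installed flair version, span-level (exact-match) F1 can also be computed directly from gold and predicted BIO tags. A minimal sketch, assuming one token per line with whitespace-separated columns (tag in the last column) and blank lines between sentences; the helpers read_bio_column, bio_spans and span_f1 are hypothetical, and treating an I- tag that does not continue the current entity as the start of a new one is an assumption (similar to the CoNLL conlleval script).

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, int, str]  # (sentence index, start, end exclusive, label)


def read_bio_column(text: str, tag_column: int = -1) -> List[List[str]]:
    """Parse column-formatted text into one list of tags per sentence."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split()[tag_column])
    if current:
        sentences.append(current)
    return sentences


def bio_spans(sentences: List[List[str]]) -> Set[Span]:
    """Collect entity spans; an I- tag that does not continue a span opens a new one."""
    spans: Set[Span] = set()
    for s_idx, tags in enumerate(sentences):
        start, label = None, None
        for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
            prefix, _, tag_label = tag.partition("-")
            continues = prefix == "I" and tag_label == label
            if start is not None and not continues:
                spans.add((s_idx, start, i, label))
                start, label = None, None
            if prefix in ("B", "I") and not continues:
                start, label = i, tag_label
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro precision, recall and F1 over exactly matching spans."""
    gold_spans, pred_spans = bio_spans(gold), bio_spans(pred)
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy gold and predicted files (hypothetical data): token in column 0, tag in column 1
gold = read_bio_column("EU B-ORG\nrejects O\nGerman B-MISC\ncall O\n\nPeter B-PER\nBlackburn I-PER\n")
pred = read_bio_column("EU B-ORG\nrejects O\nGerman O\ncall O\n\nPeter B-PER\nBlackburn B-PER\n")
print(span_f1(gold, pred))
```

Here only the ORG span matches exactly (the split PER prediction counts as two wrong spans), so precision, recall and F1 are all 1/3. Real files could be read with pathlib's Path(...).read_text() and passed through the same functions.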

CamielK on 4 Jun 2019


All 6 comments


@CamielK: Thank you for the reply.
Could you please tell me what a: b means (for example, tagger: SequenceTagger)? Is it the same as a, b? I haven't seen this syntax in Python.

reza1615 on 4 Jun 2019

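variable: type = value
This annotates the variable with a type hint (variable annotations were added in Python 3.6 by PEP 526). The hint is optional and is not enforced at runtime, but it documents the expected type and lets editors and static checkers such as mypy catch mistakes.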
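A short illustration, assuming Python 3.6 or later: annotated names use the variable: type = value form, the hints are recorded (read back here with typing.get_type_hints), and nothing checks them when the code runs. The class and function names are made up for the example.

```python
from typing import Tuple, get_type_hints


class TaggerConfig:
    # Annotated class attributes in the "name: type = value" form
    model_name: str = "ner"
    labels: Tuple[str, ...] = ("LOC", "MISC", "ORG", "PER")


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# The hints are stored, so tools (and code) can inspect them
hints = get_type_hints(TaggerConfig)
print(hints["model_name"])  # <class 'str'>

# They are not enforced: passing ints instead of floats still runs
print(f1_score(1, 1))  # 1.0
```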

CamielK on 4 Jun 2019


@CamielK thanks for answering this and preparing the example!

alanakbik on 13 Jun 2019

A small correction:
Instead of
result, _ = tagger.evaluate(corpus.test)
we need
result, _ = tagger.evaluate([corpus.test])

Ayush-iitkgp on 14 Oct 2019


I have a problem.

Problem: RuntimeError: The expanded size of the tensor (300) must match the existing size (0) at non-singleton dimension 1. Target sizes: [9, 300]. Tensor sizes: [9, 0]