Transformers: Is a CRF needed for NER?

Created on 3 Apr 2019 · 3 Comments · Source: huggingface/transformers

Is a CRF needed for NER? In BertForTokenClassification, only a Linear layer is used to predict the tags. If a CRF is not needed, why not?

Discussion wontfix

All 3 comments

A CRF improves NER F1 scores in some cases, but not in all. The BERT paper does not use a CRF, so this repository does not include one either. I presume the BERT authors tested both with and without a CRF and found that the CRF layer gave no improvement, since using a CRF is more or less the default setup nowadays.
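
To make the difference concrete, here is a minimal pure-Python sketch (not code from this repository) that contrasts the two decoding strategies. A plain linear head picks the best tag for each token independently, while a CRF-style decoder also scores tag-to-tag transitions and picks the best whole sequence with Viterbi. The tag set, the scores, and the names `argmax_decode`, `viterbi_decode`, `EMISSIONS`, and `TRANSITIONS` are all hypothetical; a trained CRF would learn its transition scores rather than have them set by hand.

```python
# Illustrative sketch only: tags, scores, and function names are hypothetical.
TAGS = ["O", "B-PER", "I-PER"]

# Per-token tag scores, as a linear layer on top of BERT might produce them.
EMISSIONS = [
    [2.0, 1.0, 0.0],  # token 0: "O" scores highest
    [0.5, 1.0, 1.5],  # token 1: "I-PER" scores highest in isolation
    [2.0, 0.0, 0.5],  # token 2: "O" scores highest
]

# TRANSITIONS[prev][cur]: hand-set here; a CRF would learn these values.
# The only penalty is on the invalid move "O" -> "I-PER".
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from "O"
    [0.0, 0.0, 0.0],    # from "B-PER"
    [0.0, 0.0, 0.0],    # from "I-PER"
]


def argmax_decode(emissions):
    """Pick the best tag per token independently (linear head, no CRF)."""
    return [max(range(len(row)), key=row.__getitem__) for row in emissions]


def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence under emission + transition scores."""
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for row in emissions[1:]:
        new_score, pointers = [], []
        for cur in range(n_tags):
            # Best previous tag to arrive at `cur` from.
            best_prev = max(range(n_tags), key=lambda p: score[p] + transitions[p][cur])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][cur] + row[cur])
        score = new_score
        backpointers.append(pointers)
    # Walk back from the best final tag to recover the full path.
    best = max(range(n_tags), key=score.__getitem__)
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


independent = [TAGS[i] for i in argmax_decode(EMISSIONS)]
structured = [TAGS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)]
print(independent)  # ['O', 'I-PER', 'O']: contains the invalid step O -> I-PER
print(structured)   # ['O', 'B-PER', 'O']: the transition penalty rules that step out
```

Whether this kind of sequence-level decoding actually improves F1 on top of BERT depends on the dataset and setup, which is the point made above: the contextual encoder may already produce consistent tags on its own.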

Issue #64 is a good reference for discussion on NER.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
