Transformers: Is a CRF needed when doing NER?

Created on 3 Apr 2019 · 3 comments · Source: huggingface/transformers

Is a CRF needed when doing NER? In BertForTokenClassification, only a linear layer is used to predict the tags. If a CRF is not needed, why not?
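For reference, here is a minimal sketch of what such a linear head looks like: each token's hidden state passes through dropout and a single linear layer to produce per-tag logits, and tokens are classified independently. The class name and wiring below are illustrative, not the library's actual code.

```python
import torch.nn as nn
from transformers import BertModel

class LinearTokenTagger(nn.Module):
    """Illustrative stand-in for a BertForTokenClassification-style head."""

    def __init__(self, num_labels, model_name="bert-base-cased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        # Last hidden states, shape (batch, seq_len, hidden_size)
        hidden = self.bert(input_ids, attention_mask=attention_mask)[0]
        logits = self.classifier(self.dropout(hidden))  # (batch, seq_len, num_labels)
        # Each token is predicted independently, e.g. via argmax over the tag dimension
        # or cross-entropy against the gold tags.
        return logits
```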

Labels: Discussion, wontfix

All 3 comments

A CRF gives better NER F1 scores in some cases, but not necessarily in all cases. In the BERT paper no CRF is used, and hence there is no CRF in this repository either. I'd presume the BERT authors tested both with and without a CRF and found that the CRF layer gave no improvement, since using a CRF is more or less the default setting nowadays.
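For anyone who wants to experiment with a CRF on top of the token-classification logits anyway, a minimal sketch is below. It assumes the third-party pytorch-crf package (`torchcrf`); the class name and wiring are hypothetical and not part of this repository.

```python
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF  # pip install pytorch-crf

class BertCrfTagger(nn.Module):
    """Hypothetical BERT encoder + linear emissions + CRF decoding layer."""

    def __init__(self, num_labels, model_name="bert-base-cased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)
        self.crf = CRF(num_labels, batch_first=True)

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.bert(input_ids, attention_mask=attention_mask)[0]
        emissions = self.classifier(self.dropout(hidden))  # per-token tag scores
        mask = attention_mask.bool()
        if labels is not None:
            # Labels must be valid tag indices at every position (use e.g. the 'O'
            # tag for padding, not -100). The CRF returns the log-likelihood, so
            # negate it to get a loss to minimize.
            return -self.crf(emissions, labels, mask=mask, reduction="mean")
        # Viterbi decoding returns the most likely tag sequence per example,
        # jointly scoring tag transitions instead of tagging tokens independently.
        return self.crf.decode(emissions, mask=mask)
```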

Issue #64 is a good reference for discussion on NER.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
