Transformers: Is a CRF needed when doing NER?

Created on 3 Apr 2019 · 3 Comments · Source: huggingface/transformers

Is a CRF needed when doing NER? In BertForTokenClassification, only a linear layer is used to predict the tags. If a CRF is not needed, why not?

Discussion wontfix

alphanlp

Most helpful comment

A CRF gives better NER F1 scores in some cases, but not necessarily in all cases. The BERT paper does not use a CRF, and hence there is no CRF in this repository either. I'd presume the BERT authors tested both with and without a CRF and found that a CRF layer gives no improvement, since using a CRF is kind of the default setting nowadays.

bheinzerling on 3 Apr 2019

👍3 ❤1
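To make the difference discussed above concrete, here is a minimal sketch in plain Python that contrasts per-token argmax decoding (what a linear tagging head amounts to at prediction time) with CRF-style Viterbi decoding over emission plus transition scores. It is not code from the repository or from the BERT paper. The tag set, the emission scores, and the transition penalty are all made-up assumptions for illustration; a real CRF layer would learn its transition scores during training.

```python
# Illustrative sketch only: tags, scores, and the transition penalty are
# hypothetical values, not taken from any trained model.

TAGS = ["O", "B-PER", "I-PER"]


def argmax_decode(emissions):
    """Pick the best tag for each token independently, as a linear head does."""
    return [max(range(len(row)), key=row.__getitem__) for row in emissions]


def viterbi_decode(emissions, transitions):
    """Find the highest-scoring tag sequence under emission + transition scores."""
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for row in emissions[1:]:
        new_score, pointers = [], []
        for cur in range(n_tags):
            # Best previous tag for a path that ends in `cur` at this token.
            prev = max(range(n_tags), key=lambda p: score[p] + transitions[p][cur])
            pointers.append(prev)
            new_score.append(score[prev] + transitions[prev][cur] + row[cur])
        score = new_score
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag to recover the path.
    best = max(range(n_tags), key=score.__getitem__)
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]


# Hypothetical emission scores: one row per token, one column per tag in TAGS.
emissions = [
    [2.0, 1.8, 0.0],  # token 0: "O" narrowly beats "B-PER"
    [0.0, 0.0, 2.0],  # token 1: "I-PER" clearly preferred
    [2.0, 0.0, 0.5],  # token 2: "O" preferred
]

# transitions[prev][cur]; "O" -> "I-PER" is penalized because it is invalid in BIO.
transitions = [
    [0.0, 0.0, -10.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

linear_tags = [TAGS[i] for i in argmax_decode(emissions)]
crf_tags = [TAGS[i] for i in viterbi_decode(emissions, transitions)]

print("linear head:", linear_tags)  # ['O', 'I-PER', 'O']
print("CRF decode: ", crf_tags)     # ['B-PER', 'I-PER', 'O']
```

In this toy case the independent argmax produces an "I-PER" directly after "O", which is not a valid BIO sequence, while the joint decode trades a slightly lower score on the first token for a consistent sequence. Whether that kind of correction improves F1 in practice depends on how good the per-token scores already are, which fits the observation that a CRF helps in some cases but not all.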

All 3 comments

Issue #64 is a good reference for discussion on NER.

thomwolf on 3 Apr 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.