Flair: NER task using Flair BertEmbeddings vs. HuggingFace scripts

Created on 3 Apr 2020 · 3 comments · Source: flairNLP/flair

Hi everyone!

I am new to NLP and NER, so I'm still trying to understand how exactly the different architectures work.

My question is the following: is the architecture used for NER with Flair BertEmbeddings within the Flair SequenceTagger the same as the one implemented by the HuggingFace team in the PyTorch/TF example scripts here?

In particular, my doubt stems from the fact that the Flair SequenceTagger is based on a BiLSTM(-CRF), whose layers I can still see when running it, while the HuggingFace scripts are based purely on the Transformer architecture.
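For reference, here is a minimal sketch of the setup the question describes, assuming the Flair training API of that era; the corpus choice, hidden size, and output path are illustrative, not taken from the thread:

```python
from flair.datasets import CONLL_03
from flair.embeddings import BertEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Load an NER corpus (CONLL_03 is just an example; it expects the data locally).
corpus = CONLL_03()
tag_dictionary = corpus.make_tag_dictionary(tag_type="ner")

# BERT used as a (frozen) embedding layer, feeding a BiLSTM-CRF tagger.
embeddings = BertEmbeddings("bert-base-cased")
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type="ner",
    use_crf=True,
)

# By default this trains with SGD and learning-rate annealing over many epochs.
trainer = ModelTrainer(tagger, corpus)
trainer.train("resources/taggers/ner-bert", max_epochs=150)
```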

I am running this tutorial in Google Colab.

I'd really appreciate any clarification. Thank you all in advance.

Best regards.

question

All 3 comments

Maybe I have to ask the master :-) @alanakbik

For the Huggingface scripts, @stefan-it is the person to ask :)

The two implementations are very different: in Flair, our default sequence labeling architecture is a BiLSTM-CRF with a feature-based approach (i.e. no fine-tuning of the transformer), trained with many epochs of SGD and learning-rate annealing. Huggingface, I believe, fine-tunes the transformer itself, as in the BERT paper (few epochs, a very small learning rate, the Adam optimizer), which is a very different training regime.
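To illustrate the contrast, here is a minimal sketch of the fine-tuning recipe the HuggingFace scripts follow; the label set and sentence are toy placeholders, and real code would align labels to wordpieces and mask special tokens with -100:

```python
import torch
from transformers import BertForTokenClassification, BertTokenizerFast

label_list = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]  # illustrative tag set
tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
model = BertForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(label_list)
)

# Toy example: one sentence, every token labeled "O" just to make this runnable.
enc = tokenizer("George Washington went to Washington .", return_tensors="pt")
labels = torch.zeros_like(enc["input_ids"])

# Few epochs, very small learning rate, Adam-family optimizer: the BERT recipe.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for _ in range(3):
    loss = model(**enc, labels=labels)[0]  # first output is the loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Note that there is no BiLSTM or CRF here: a linear classification head sits directly on top of the Transformer, and the whole stack is updated.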

We are just now adding this transformer fine-tuning approach to Flair as well; it's part of the master branch and undergoing testing (see #1494), so it will be in the next release. It should allow the community to directly compare both approaches.
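As a forward-looking sketch (hedged: at the time of this thread the feature was still on master, and the interface shown here is the one that later shipped in Flair 0.5), the fine-tuning variant swaps the frozen embeddings for trainable ones:

```python
from flair.embeddings import TransformerWordEmbeddings

# With fine_tune=True the transformer weights themselves are updated during
# training, instead of serving as static features for the BiLSTM-CRF.
embeddings = TransformerWordEmbeddings("bert-base-cased", fine_tune=True)
```

These embeddings plug into the same SequenceTagger as above; fine-tuning is typically paired with a small learning rate and few epochs rather than the SGD-plus-annealing schedule.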

Thank you @alanakbik :)
