Can you please explain how I can make the BERT model used in BertEmbeddings trainable in a sequence tagging model?
Any news on this? As soon as I start using BertEmbeddings I get CUDA OOM errors and I'm unable to find much on how to manage GPU memory in Flair.
Hi @PradyumnaGupta ,
fine-tuning a BERT model is currently not possible in Flair. But you can use the fine-tuning example from the Hugging Face Transformers library: https://github.com/huggingface/transformers/tree/master/examples/ner.
After you've fine-tuned your model, you can load it with Flair :)
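For reference, loading the fine-tuned checkpoint into Flair might look roughly like this (a minimal sketch; it assumes Flair's `TransformerWordEmbeddings` class is available, and `"path/to/finetuned-bert"` is a placeholder for the output directory produced by the Hugging Face script):

```python
# Minimal sketch: load a fine-tuned BERT checkpoint into Flair as embeddings.
# "path/to/finetuned-bert" stands in for the Hugging Face output directory;
# the try/except keeps this a sketch when Flair is not installed.
try:
    from flair.data import Sentence
    from flair.embeddings import TransformerWordEmbeddings

    embeddings = TransformerWordEmbeddings("path/to/finetuned-bert")
    sentence = Sentence("George Washington went to Washington.")
    embeddings.embed(sentence)  # attach a BERT vector to each token
except Exception:
    embeddings = None  # Flair not installed (or path missing); sketch only
```

The resulting embeddings can then be passed to a downstream Flair tagger like any other embedding class.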
@stefan-it is there a difference in the term "fine-tuning"? Fine-tuning in the context of downstream tasks (NER, ...) is possible with Flair, judging by the tutorial. Or am I misunderstanding the term "fine-tuning"?
@pascalhuszar see my answer in #1527 - in master branch, you can now fine-tune BERT and other transformer embeddings in the task.
Once we're done testing this, we'll do a release of Flair that adds fine-tuning transformers. This seems to work especially well for text classification - for sequence labeling we still get best results using a feature based approach (i.e. no fine-tuning).
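On master, switching between the two modes should look roughly like the following (a sketch under the assumption that `TransformerWordEmbeddings` exposes a `fine_tune` flag; note that fine-tuning also increases GPU memory use, so a smaller batch size may be needed):

```python
# Minimal sketch of the two modes (API assumption: TransformerWordEmbeddings
# takes a fine_tune flag on the current master branch).
try:
    from flair.embeddings import TransformerWordEmbeddings

    # fine_tune=True: the BERT weights are updated during tagger training.
    # fine_tune=False: the weights stay frozen (the feature-based approach).
    embeddings = TransformerWordEmbeddings("bert-base-cased", fine_tune=True)
except Exception:
    embeddings = None  # Flair/transformers not available; sketch only
```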
Thanks for the fast reply! @alanakbik I'm a bit confused by the terms "training" and "fine-tuning": do they mean the same thing in the context of tutorial 7?
Or is "training" the term for the downstream task (e.g. NER), and if so, what does "fine-tuning" refer to?
Fine-tuning is a special case of training where we start with an existing (i.e. already trained) model and just "fine-tune" (make slight modifications to) the weights for a new task. If not fine-tuning (i.e. normal "training"), we start with a randomly initialized model, so it is trained from scratch instead.
Generally, for NER there are two broad ways of creating taggers using language models (LMs): a feature-based approach, where the LM weights stay frozen and its embeddings are fed as features into a downstream tagger, and a fine-tuning approach, where the LM weights are updated together with the task-specific layers.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.