Hi,
I've discovered the flair framework recently and the experience so far is great!
Following what has been done by Howard and Ruder with ULMFiT, and others, I would be interested in fine-tuning the language models on custom datasets and then plugging in a custom layer for downstream tasks.
I think I can work out the language model fine-tuning by downloading one of your pre-trained models and using it as the initialization for language model training.
However, for the downstream tasks, I would like to first train only the new layer (e.g. a classification layer) and then gradually fine-tune the language model layers.
Thank you very much for your help!
Hi Peter,
that's a great idea and we'd be very interested to see how that would affect downstream NLP tasks!
I think the good news is that fine-tuning the language model should be very easy: you can load a pre-trained LM and then pass it to the LanguageModelTrainer to fine-tune on your target domain corpus:
# load existing language model
language_model = LanguageModel.load_language_model('/path/to/language/model.pt')
# load target domain corpus
corpus: TextCorpus = TextCorpus('path/to/your/domain/corpus',
language_model.dictionary,
language_model.is_forward_lm,
character_level=True)
# pass the trained language model to the trainer, along with the new corpus
trainer = LanguageModelTrainer(language_model, corpus)
# continue training the model on the new corpus
trainer.train('./results', sequence_length=250, mini_batch_size=100, learning_rate=20)
The pre-trained language models we distribute are downloaded into ~/.flair/embeddings when you first call them. So the big news forward model can be found at ~/.flair/embeddings/lm-news-english-forward-v0.2rc.pt. You could try fine-tuning one of these on the target corpus.
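To check whether a pre-trained model has already been downloaded, you can build the cache path with pathlib; the directory layout is the one described above, and the exact filename may differ between flair versions:

```python
from pathlib import Path

# Cache directory that flair uses for downloaded language models
# (as described above); the filename is version-specific.
cache_dir = Path.home() / '.flair' / 'embeddings'
model_path = cache_dir / 'lm-news-english-forward-v0.2rc.pt'
print(model_path, model_path.exists())
```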
With regards to the additional layers, I have to first study the ULMFit paper in greater detail (probably sometime next week). If you have any progress to share on this, we'd appreciate it!
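For reference, the first stage you describe (training only the new task layer while the LM stays fixed) boils down to disabling gradients on the LM parameters. A minimal PyTorch sketch, where the freeze/unfreeze helpers are hypothetical and not part of flair:

```python
import torch.nn as nn

def freeze(module: nn.Module) -> None:
    # Disable gradients so the optimizer leaves these weights untouched.
    for p in module.parameters():
        p.requires_grad = False

def unfreeze(module: nn.Module) -> None:
    # Re-enable gradients, e.g. once gradual unfreezing reaches this layer.
    for p in module.parameters():
        p.requires_grad = True

# Toy stand-in for "frozen LM + trainable classification layer":
lm = nn.LSTM(input_size=8, hidden_size=8)
head = nn.Linear(8, 2)
freeze(lm)
trainable = [p for p in list(lm.parameters()) + list(head.parameters())
             if p.requires_grad]
# Only the head's parameters (weight and bias) remain trainable.
print(len(trainable))
```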
Thanks for your answer Alan,
There are several interesting things in the ULMFiT paper; I think the gradual unfreezing of layers could be added to flair first. I will look at it, probably next week. There's a freeze() method in the fast.ai code that we could include here.
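As a starting point, the unfreezing schedule itself is simple to state: train only the top layer first, then unfreeze one more layer per epoch until the whole stack trains jointly. A plain-Python sketch of that schedule (the function name and exact policy are my own, not fast.ai's):

```python
def trainable_layers(num_layers: int, epoch: int) -> list:
    """Indices of layers (0 = bottom) that are unfrozen at `epoch`.

    Epoch 0 trains only the top layer; each subsequent epoch unfreezes
    one more layer below it, until the whole stack is trainable.
    """
    first = max(0, num_layers - (epoch + 1))
    return list(range(first, num_layers))

# With 3 layers: epoch 0 -> [2], epoch 1 -> [1, 2], epoch 2+ -> [0, 1, 2]
print(trainable_layers(3, 0), trainable_layers(3, 1), trainable_layers(3, 5))
```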
Hello Peter,
that's great! Please let us know if that works - we'd be happy to include it in Flair!
I am trying to fine-tune a language model on a target corpus but am getting the following error:
TypeError: unsupported operand type(s) for /: 'str' and 'str'
My script is as follows:
from pathlib import Path
from flair.data import Dictionary
from flair.models import LanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus
language_model = LanguageModel.load_language_model('./best-lm.pt')
corpus: TextCorpus = TextCorpus('./corpus',
language_model.dictionary,
language_model.is_forward_lm,
character_level=True)
trainer = LanguageModelTrainer(language_model, corpus)
trainer.train('./results', sequence_length=250, mini_batch_size=100, learning_rate=20, max_epochs=1)
I would be happy to get assistance in resolving it
Hello @smutuvi, you need to pass a Path (instead of a string) to the corpus to indicate the path to the data folder, like this:
corpus: TextCorpus = TextCorpus(Path('./corpus'),
language_model.dictionary,
language_model.is_forward_lm,
character_level=True)
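The error itself comes from pathlib: the corpus joins the base path to its files with the `/` operator, which is defined for Path objects but not between two plain strings. A quick illustration (`'train.txt'` is just a hypothetical file name):

```python
from pathlib import Path

joined = Path('./corpus') / 'train.txt'   # Path overloads '/', so this works
print(joined)

try:
    './corpus' / 'train.txt'              # plain strings do not support '/'
except TypeError as e:
    print(e)  # unsupported operand type(s) for /: 'str' and 'str'
```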
Hope this helps!
Thank you @alanakbik. It works!
Am also working on a Swahili LM. Will share it with you soon
Cool - a Swahili LM would be great to have in Flair! Look forward to hearing about your results!