Hi,
I've discovered the flair framework recently and the experience so far is great!
Following what has been done by Howard and Ruder with ULMFiT, and others, I would be interested in fine-tuning the language models on custom datasets and then plugging in a custom layer for downstream tasks.
I think I can work out the language model fine-tuning by downloading one of your pre-trained models and using it as the initialization for language model training.
However, for the downstream tasks, I would like to first train only the new layer (e.g. a classification layer) and then gradually fine-tune the language model layers.
Thank you very much for your help!
Hi Peter,
that's a great idea and we'd be very interested to see how that would affect downstream NLP tasks!
I think the good news is that fine-tuning the language model should be very easy: you can load a pre-trained LM and then pass it to the LanguageModelTrainer to fine-tune on your target domain corpus:
# load existing language model
language_model = LanguageModel.load_language_model('/path/to/language/model.pt')
# load target domain corpus
corpus: TextCorpus = TextCorpus('path/to/your/domain/corpus',
language_model.dictionary,
language_model.is_forward_lm,
character_level=True)
# pass the trained language model to the trainer, along with the new corpus
trainer = LanguageModelTrainer(language_model, corpus)
# continue training the model on the new corpus
trainer.train('./results', sequence_length=250, mini_batch_size=100, learning_rate=20)
The pre-trained language models we distribute are downloaded into ~/.flair/embeddings when you first call them. So the big news forward model can be found at ~/.flair/embeddings/lm-news-english-forward-v0.2rc.pt. You could try fine-tuning one of these on the target corpus.
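To check whether a pre-trained model has already been downloaded, you can build the cache path with pathlib; the directory layout is the one described above, and the exact filename may differ between flair versions:

```python
from pathlib import Path

# Cache directory that flair uses for downloaded language models
# (as described above); the filename is version-specific.
cache_dir = Path.home() / '.flair' / 'embeddings'
model_path = cache_dir / 'lm-news-english-forward-v0.2rc.pt'
print(model_path, model_path.exists())
```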
With regards to the additional layers, I have to first study the ULMFit paper in greater detail (probably sometime next week). If you have any progress to share on this, we'd appreciate it!
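For reference, the first stage you describe (training only the new task layer while the LM stays fixed) boils down to disabling gradients on the LM parameters. A minimal PyTorch sketch, where the freeze/unfreeze helpers are hypothetical and not part of flair:

```python
import torch.nn as nn

def freeze(module: nn.Module) -> None:
    # Disable gradients so the optimizer leaves these weights untouched.
    for p in module.parameters():
        p.requires_grad = False

def unfreeze(module: nn.Module) -> None:
    # Re-enable gradients, e.g. once gradual unfreezing reaches this layer.
    for p in module.parameters():
        p.requires_grad = True

# Toy stand-in for "frozen LM + trainable classification layer":
lm = nn.LSTM(input_size=8, hidden_size=8)
head = nn.Linear(8, 2)
freeze(lm)
trainable = [p for p in list(lm.parameters()) + list(head.parameters())
             if p.requires_grad]
# Only the head's parameters (weight and bias) remain trainable.
print(len(trainable))
```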
Thanks for your answer Alan,
There are several interesting things in the ULMFiT paper; I think the gradual unfreezing of layers could be added to flair first. I will look at it, probably next week. There's a freeze() method in the fast.ai code that we could include here.
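As a starting point, the unfreezing schedule itself is simple to state: train only the top layer first, then unfreeze one more layer per epoch until the whole stack trains jointly. A plain-Python sketch of that schedule (the function name and exact policy are my own, not fast.ai's):

```python
def trainable_layers(num_layers: int, epoch: int) -> list:
    """Indices of layers (0 = bottom) that are unfrozen at `epoch`.

    Epoch 0 trains only the top layer; each subsequent epoch unfreezes
    one more layer below it, until the whole stack is trainable.
    """
    first = max(0, num_layers - (epoch + 1))
    return list(range(first, num_layers))

# With 3 layers: epoch 0 -> [2], epoch 1 -> [1, 2], epoch 2+ -> [0, 1, 2]
print(trainable_layers(3, 0), trainable_layers(3, 1), trainable_layers(3, 5))
```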
Hello Peter,
that's great! Please let us know if that works - we'd be happy to include it in Flair!
I am trying to fine-tune a language model on a target corpus but am getting the following error:
TypeError: unsupported operand type(s) for /: 'str' and 'str'
My script is as follows:
from pathlib import Path
from flair.data import Dictionary
from flair.models import LanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus
language_model = LanguageModel.load_language_model('./best-lm.pt')
corpus: TextCorpus = TextCorpus('./corpus',
language_model.dictionary,
language_model.is_forward_lm,
character_level=True)
trainer = LanguageModelTrainer(language_model, corpus)
trainer.train('./results', sequence_length=250, mini_batch_size=100, learning_rate=20, max_epochs=1)
I would be happy to get assistance in resolving it
Hello @smutuvi, you need to pass a Path (instead of a string) to the corpus to indicate the path to the data folder, like this:
corpus: TextCorpus = TextCorpus(Path('./corpus'),
language_model.dictionary,
language_model.is_forward_lm,
character_level=True)
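The error itself comes from pathlib: the corpus joins the base path to its files with the `/` operator, which is defined for Path objects but not between two plain strings. A quick illustration (`'train.txt'` is just a hypothetical file name):

```python
from pathlib import Path

joined = Path('./corpus') / 'train.txt'   # Path overloads '/', so this works
print(joined)

try:
    './corpus' / 'train.txt'              # plain strings do not support '/'
except TypeError as e:
    print(e)  # unsupported operand type(s) for /: 'str' and 'str'
```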
Hope this helps!
Thank you @alanakbik. It works!
Am also working on a Swahili LM. Will share it with you soon
Cool - a Swahili LM would be great to have in Flair! Look forward to hearing about your results!