I am working on a sentiment analysis model trained on the SENTEVAL_SST_GRANULAR dataset. My understanding is that a transformer model can outperform the Stanford NLP sentiment classifier. However, I couldn't find a pre-trained model in Flair, so I was hoping to train one myself on Google Colab. The initial performance of the model with the suggested parameters
learning_rate=3e-5,
mini_batch_size=16,
embeddings_storage_mode='gpu',
max_epochs=50
was not great, so I thought hyperparameter tuning was in order. But when I used flair.hyperparameter.param_selection, I got the following error:
RuntimeError: size mismatch, m1: [9 x 0], m2: [768 x 768] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:283
What should I do?
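For reference, here is roughly how the initial training run was set up. This is a minimal sketch assuming a Flair version contemporary with this thread (newer releases require a label_type argument to make_label_dictionary); the output folder is an arbitrary placeholder:

from flair.datasets import SENTEVAL_SST_GRANULAR
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# load the fine-grained (5-class) SST corpus and build the label dictionary
corpus = SENTEVAL_SST_GRANULAR()
label_dict = corpus.make_label_dictionary()

# document-level transformer embeddings, fine-tuned end to end
embeddings = TransformerDocumentEmbeddings('distilbert-base-uncased', fine_tune=True)
classifier = TextClassifier(embeddings, label_dictionary=label_dict)

# train with the suggested parameters
trainer = ModelTrainer(classifier, corpus)
trainer.train('resources/taggers/sst-granular',  # placeholder output folder
              learning_rate=3e-5,
              mini_batch_size=16,
              embeddings_storage_mode='gpu',
              max_epochs=50)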
Code for the hyperparameter search:
from hyperopt import hp
from flair.data import Corpus
from flair.datasets import SENTEVAL_SST_GRANULAR
from flair.embeddings import TransformerDocumentEmbeddings
from flair.hyperparameter.param_selection import (
    SearchSpace, Parameter, TextClassifierParamSelector, OptimizationValue
)

corpus: Corpus = SENTEVAL_SST_GRANULAR()

# define the search space
search_space = SearchSpace()
search_space.add(Parameter.EMBEDDINGS, hp.choice, options=[
    [TransformerDocumentEmbeddings('distilbert-base-uncased', fine_tune=True, layers="-1")]
])
search_space.add(Parameter.LEARNING_RATE, hp.choice, options=[0.0005, 0.001, 0.0015, 0.002])
search_space.add(Parameter.MINI_BATCH_SIZE, hp.choice, options=[8, 16, 32])

# positional args: multi_label=False, base_path='folder_location',
# document_embedding_type='mean'
param_selector = TextClassifierParamSelector(
    corpus,
    False,
    'folder_location',
    'mean',
    max_epochs=50,
    training_runs=3,
    optimization_value=OptimizationValue.DEV_SCORE,
)

param_selector.optimize(search_space, max_evals=10)
Hi @M-I-Dx the hyperparameter selection part is currently not maintained and will likely be replaced by a different implementation at some point.
Some things to try: max_epochs should be much lower; we have it at 5 in our example, which is also what is typically used. You could also try increasing or decreasing the mini_batch_size depending on the size of the dataset.
But the main thing to try is fine-tuning different transformer models; for instance, you could try a RoBERTa model. We're also now adding support for sentence transformers in Flair, so that is something to try. A rough sketch of what this could look like follows.
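Something along these lines, for example. This is only a sketch: the model name and output path are placeholders, and the exact API may differ slightly across Flair versions:

from flair.datasets import SENTEVAL_SST_GRANULAR
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

corpus = SENTEVAL_SST_GRANULAR()
label_dict = corpus.make_label_dictionary()

# try a different transformer, e.g. RoBERTa, with the usual
# fine-tuning recipe: few epochs, small transformer-style learning rate
embeddings = TransformerDocumentEmbeddings('roberta-base', fine_tune=True)
classifier = TextClassifier(embeddings, label_dictionary=label_dict)

trainer = ModelTrainer(classifier, corpus)
trainer.train('resources/taggers/sst-roberta',  # placeholder output folder
              learning_rate=3e-5,
              mini_batch_size=16,
              max_epochs=5)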
Do you have any resources that might help me with hyperparameter tuning of transformer models?
Hi @alanakbik, so to fine-tune a model using TransformerDocumentEmbeddings in Flair, we can try tuning the embeddings (i.e., the choice of transformer model), the learning rate, the mini-batch size, and the number of epochs?
Please let me know if I missed any other hyperparameters that can be tuned for the same 😛.
What I observed is that hyperparameter tuning is currently supported only for word embeddings, by making use of either DocumentRNNEmbeddings or DocumentPoolEmbeddings. It would be really exciting if we were able to tune over different transformer models using Flair's hyperparameter tuning. 😄 🔥 In the meantime, I suppose one could sweep over transformer models manually, outside the selector; a rough sketch of what I have in mind follows.
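This is only a hypothetical workaround sketch: the model names, learning rates, and output paths are just examples, and it assumes trainer.train returns a results dict as in Flair versions of that era:

from flair.datasets import SENTEVAL_SST_GRANULAR
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

corpus = SENTEVAL_SST_GRANULAR()
label_dict = corpus.make_label_dictionary()

# manual grid over transformer models and learning rates,
# since the built-in selector only wraps word embeddings
results = {}
for model_name in ['distilbert-base-uncased', 'roberta-base']:
    for lr in [2e-5, 3e-5, 5e-5]:
        embeddings = TransformerDocumentEmbeddings(model_name, fine_tune=True)
        classifier = TextClassifier(embeddings, label_dictionary=label_dict)
        trainer = ModelTrainer(classifier, corpus)
        # trainer.train returns a dict that includes the final test score
        out = trainer.train(f'resources/sweep/{model_name}-{lr}',
                            learning_rate=lr,
                            mini_batch_size=16,
                            max_epochs=5)
        results[(model_name, lr)] = out['test_score']

best = max(results, key=results.get)
print('best config:', best, 'score:', results[best])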
Thank you and have a great day!
Hello @nightlessbaron, yes, those are the parameters I would start with. Embeddings really are the most important ones. And yes: the refactoring of the hyperparameter selection routines will also include transformers :)