Flair: How to use flair.hyperparameter.param_selection to fine-tune a Transformer model?

Created on 16 Jun 2020  ·  5 comments  ·  Source: flairNLP/flair

I am working on a sentiment analysis model trained on the SENTEVAL_SST_GRANULAR dataset. I am under the impression that a transformer model can outperform the Stanford NLP sentiment classifier. However, I didn't find a pre-trained model in Flair, so I was hoping to train one myself on Google Colab. The initial performance of the model with the suggested parameters

learning_rate=3e-5, mini_batch_size=16, embeddings_storage_mode='gpu', max_epochs=50

was not the best, so I thought fine-tuning was in order. But when I used flair.hyperparameter.param_selection, I got the following error:

RuntimeError: size mismatch, m1: [9 x 0], m2: [768 x 768] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:283

What should I do?
Code for fine-tuning

from hyperopt import hp
# Imports added for completeness
from flair.data import Corpus
from flair.datasets import SENTEVAL_SST_GRANULAR
from flair.embeddings import TransformerDocumentEmbeddings
from flair.hyperparameter.param_selection import (
    SearchSpace, Parameter, TextClassifierParamSelector, OptimizationValue,
)

corpus: Corpus = SENTEVAL_SST_GRANULAR()

# Search space: a single transformer embedding, plus choices for
# learning rate and mini-batch size
search_space = SearchSpace()
search_space.add(Parameter.EMBEDDINGS, hp.choice, options=[
    [TransformerDocumentEmbeddings('distilbert-base-uncased', fine_tune=True, layers='-1')]
])
search_space.add(Parameter.LEARNING_RATE, hp.choice, options=[0.0005, 0.001, 0.0015, 0.002])
search_space.add(Parameter.MINI_BATCH_SIZE, hp.choice, options=[8, 16, 32])

param_selector = TextClassifierParamSelector(
    corpus,
    False,               # multi_label
    'folder_location',   # base path for results
    'mean',              # document embedding type
    max_epochs=50,
    training_runs=3,
    optimization_value=OptimizationValue.DEV_SCORE,
)
param_selector.optimize(search_space, max_evals=10)
Labels: question, wontfix

All 5 comments

Hi @M-I-Dx the hyperparameter selection part is currently not maintained and will likely be replaced by a different implementation at some point.

Some things to try: max_epochs should be much lower; we have it at 5 in our example, which is also what is typically used. You could also try increasing or decreasing the mini_batch_size depending on the size of the dataset.

But the main thing is to try fine-tuning different transformer models; for instance, you could try a RoBERTa model. We're also now adding support for sentence transformers in Flair, so that is something to try as well.
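A flair-independent sketch of this advice as a small grid search: the `evaluate` function and its scores are hypothetical stand-ins for an actual training run (in practice it would train a classifier with those settings and return the dev score), and the model names are just examples.

```python
import itertools

def evaluate(model, batch_size, epochs):
    """Hypothetical stand-in for one fine-tuning run returning a dev score.

    In a real setup this would train a text classifier with the given
    transformer model, mini-batch size, and epoch count, then return the
    score on the dev split. The fixed numbers here are illustrative only.
    """
    scores = {
        ('roberta-base', 16, 5): 0.91,
        ('roberta-base', 32, 5): 0.89,
        ('distilbert-base-uncased', 16, 5): 0.87,
        ('distilbert-base-uncased', 32, 5): 0.86,
    }
    return scores.get((model, batch_size, epochs), 0.0)

models = ['distilbert-base-uncased', 'roberta-base']  # try more than one transformer
batch_sizes = [16, 32]                                # adjust to the dataset size
epochs = [5]                                          # much lower than 50 for fine-tuning

# Pick the configuration with the highest dev score
best = max(itertools.product(models, batch_sizes, epochs),
           key=lambda cfg: evaluate(*cfg))
print(best)  # → ('roberta-base', 16, 5)
```

With only a handful of candidate values per knob, exhaustive grid search like this is often cheaper and easier to reason about than a full hyperopt run.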

Do you have any resources that may help me in the task of hyperparameter tuning of transformer models?

Hi @alanakbik, so to fine-tune a model using TransformerDocumentEmbeddings in Flair, we can try tuning:

  • Learning rate
  • Mini batch size
  • Making use of different transformer models

Please let me know if I missed any other hyperparameter that can be tuned for the same 😛.

What I observed is that hyperparameter tuning is currently supported only for word embeddings, via either DocumentRNNEmbeddings or DocumentPoolEmbeddings. It would be really exciting if we could tune over different transformer models using Flair's hyperparameter tuning. 😄 🔥
Thank you and have a great day!

Hello @nightlessbaron, yes, those are the parameters I would start with. Embeddings really are the most important ones. And yes: the refactoring of the hyperparameter selection routines will also include transformers :)
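The three knobs above can be sketched as a simple random-search sampler. This is a library-independent illustration, not flair API: the dictionary keys, candidate values, and `sample_config` helper are all hypothetical names chosen for this example.

```python
import random

random.seed(0)  # make the sampling reproducible for this sketch

# Hypothetical search space mirroring the three tunable knobs discussed
# above; the candidate values are illustrative, not recommendations.
search_space = {
    'learning_rate': [5e-6, 1e-5, 3e-5, 5e-5],
    'mini_batch_size': [8, 16, 32],
    'transformer': ['distilbert-base-uncased', 'bert-base-uncased', 'roberta-base'],
}

def sample_config(space):
    """Draw one random configuration: a single value per hyperparameter."""
    return {name: random.choice(values) for name, values in space.items()}

# Each trial would train one model with the sampled configuration
for trial in range(3):
    print(trial, sample_config(search_space))
```

Each sampled configuration corresponds to one training run; keeping the per-run epoch count low (as suggested above) keeps the overall search affordable.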

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
