I am working on a sentiment analysis model trained on the SENTEVAL_SST_GRANULAR dataset. My understanding is that a transformer model can outperform the Stanford NLP sentiment classifier. However, I couldn't find a pre-trained model in Flair, so I was hoping to train one myself on Google Colab. The initial performance of the model with the suggested parameters
learning_rate=3e-5,
mini_batch_size=16,
embeddings_storage_mode='gpu',
max_epochs=50
was not great, so I thought hyperparameter tuning was in order. But when I used flair.hyperparameter.param_selection, I got the following error:
RuntimeError: size mismatch, m1: [9 x 0], m2: [768 x 768] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:283
What should I do?
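For reference, here is roughly how the initial training run was set up. This is a minimal sketch assuming a Flair version contemporary with this thread (newer releases require a label_type argument to make_label_dictionary); the output folder is an arbitrary placeholder:

from flair.datasets import SENTEVAL_SST_GRANULAR
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# load the fine-grained (5-class) SST corpus and build the label dictionary
corpus = SENTEVAL_SST_GRANULAR()
label_dict = corpus.make_label_dictionary()

# document-level transformer embeddings, fine-tuned end to end
embeddings = TransformerDocumentEmbeddings('distilbert-base-uncased', fine_tune=True)
classifier = TextClassifier(embeddings, label_dictionary=label_dict)

# train with the suggested parameters
trainer = ModelTrainer(classifier, corpus)
trainer.train('resources/taggers/sst-granular',  # placeholder output folder
              learning_rate=3e-5,
              mini_batch_size=16,
              embeddings_storage_mode='gpu',
              max_epochs=50)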
Code for the hyperparameter search:
from hyperopt import hp
from flair.data import Corpus
from flair.datasets import SENTEVAL_SST_GRANULAR
from flair.embeddings import TransformerDocumentEmbeddings
from flair.hyperparameter.param_selection import (
    SearchSpace, Parameter, TextClassifierParamSelector, OptimizationValue
)

corpus: Corpus = SENTEVAL_SST_GRANULAR()

# define the search space
search_space = SearchSpace()
search_space.add(Parameter.EMBEDDINGS, hp.choice, options=[
    [TransformerDocumentEmbeddings('distilbert-base-uncased', fine_tune=True, layers="-1")]
])
search_space.add(Parameter.LEARNING_RATE, hp.choice, options=[0.0005, 0.001, 0.0015, 0.002])
search_space.add(Parameter.MINI_BATCH_SIZE, hp.choice, options=[8, 16, 32])

# positional args: multi_label=False, base_path='folder_location',
# document_embedding_type='mean'
param_selector = TextClassifierParamSelector(
    corpus,
    False,
    'folder_location',
    'mean',
    max_epochs=50,
    training_runs=3,
    optimization_value=OptimizationValue.DEV_SCORE,
)

param_selector.optimize(search_space, max_evals=10)
Hi @M-I-Dx the hyperparameter selection part is currently not maintained and will likely be replaced by a different implementation at some point.
Some things to try: max_epochs should be much lower; we have it at 5 in our example, which is also what is typically used. You could also try increasing or decreasing the mini_batch_size depending on the size of the dataset.
But the main thing to try is fine-tuning different transformer models; for instance, you could try a RoBERTa model. We're also now adding support for sentence transformers in Flair, so that is something to try. A rough sketch of what this could look like follows.
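Something along these lines, for example. This is only a sketch: the model name and output path are placeholders, and the exact API may differ slightly across Flair versions:

from flair.datasets import SENTEVAL_SST_GRANULAR
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

corpus = SENTEVAL_SST_GRANULAR()
label_dict = corpus.make_label_dictionary()

# try a different transformer, e.g. RoBERTa, with the usual
# fine-tuning recipe: few epochs, small transformer-style learning rate
embeddings = TransformerDocumentEmbeddings('roberta-base', fine_tune=True)
classifier = TextClassifier(embeddings, label_dictionary=label_dict)

trainer = ModelTrainer(classifier, corpus)
trainer.train('resources/taggers/sst-roberta',  # placeholder output folder
              learning_rate=3e-5,
              mini_batch_size=16,
              max_epochs=5)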
Do you have any resources that might help me with hyperparameter tuning of transformer models?
Hi @alanakbik, so to fine-tune a model using TransformerDocumentEmbeddings in Flair, we can try tuning the embeddings (i.e., the choice of transformer model), the learning rate, the mini-batch size, and the number of epochs?
Please let me know if I missed any other hyperparameters that can be tuned for the same 😛.
What I observed is that hyperparameter tuning is currently supported only for word embeddings, by making use of either DocumentRNNEmbeddings or DocumentPoolEmbeddings. It would be really exciting if we were able to tune over different transformer models using Flair's hyperparameter tuning. 😄 🔥 In the meantime, I suppose one could sweep over transformer models manually, outside the selector; a rough sketch of what I have in mind follows.
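This is only a hypothetical workaround sketch: the model names, learning rates, and output paths are just examples, and it assumes trainer.train returns a results dict as in Flair versions of that era:

from flair.datasets import SENTEVAL_SST_GRANULAR
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

corpus = SENTEVAL_SST_GRANULAR()
label_dict = corpus.make_label_dictionary()

# manual grid over transformer models and learning rates,
# since the built-in selector only wraps word embeddings
results = {}
for model_name in ['distilbert-base-uncased', 'roberta-base']:
    for lr in [2e-5, 3e-5, 5e-5]:
        embeddings = TransformerDocumentEmbeddings(model_name, fine_tune=True)
        classifier = TextClassifier(embeddings, label_dictionary=label_dict)
        trainer = ModelTrainer(classifier, corpus)
        # trainer.train returns a dict that includes the final test score
        out = trainer.train(f'resources/sweep/{model_name}-{lr}',
                            learning_rate=lr,
                            mini_batch_size=16,
                            max_epochs=5)
        results[(model_name, lr)] = out['test_score']

best = max(results, key=results.get)
print('best config:', best, 'score:', results[best])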
Thank you and have a great day!
Hello @nightlessbaron, yes, those are the parameters I would start with. Embeddings really are the most important ones. And yes: the refactoring of the hyperparameter selection routines will also include transformers :)