I am using FLAIR for NER task on my dataset and I have built quite a good model. However, I would like to know how to use this model and fine-tune it with more new samples. Now I am doing everything from scratch, i.e. I am concatenating the old big dataset with the new small dataset and start the training from scratch...
Alternatively, you could try further training the existing model for a few epochs with a small learning rate, e.g.:
import flair
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# load your previous tagger
tagger = SequenceTagger.load('ner')

# train further on your new corpus
trainer: ModelTrainer = ModelTrainer(tagger, corpus)

# start training with a small learning rate
trainer.train('resources/taggers/continued_model',
              learning_rate=0.01,
              mini_batch_size=32,
              max_epochs=10)
However, I am not sure how well this works and what parameters are best. The safest option is probably to retrain over everything. But if you test this, I'd be interested to hear your results.
Nice work by the Flair team! I am also using Flair for an NER task on my dataset, but my dataset has different labels than the pre-trained 'ner' model. How can I handle this?
Hi, if your labels are different you can load the weights of the base model and randomly initialize the remaining weights. You also need to replace the tag_dictionary in the state dict of the previous model:
import torch
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# load the pre-trained tagger and extract its state
tagger = SequenceTagger.load('ner')
state = tagger._get_state_dict()

# build the tag dictionary from your own corpus and put it into the state
tag_type = 'ner'
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)
state['tag_dictionary'] = tag_dictionary

# re-initialize the CRF transition matrix for the new tag set,
# forbidding transitions into <START> and out of <STOP>
START_TAG: str = "<START>"
STOP_TAG: str = "<STOP>"
state['state_dict']['transitions'] = torch.nn.Parameter(torch.randn(len(tag_dictionary), len(tag_dictionary)))
state['state_dict']['transitions'].detach()[tag_dictionary.get_idx_for_item(START_TAG), :] = -10000
state['state_dict']['transitions'].detach()[:, tag_dictionary.get_idx_for_item(STOP_TAG)] = -10000

# re-initialize the final linear layer to match the new number of tags
num_directions = 2 if tagger.bidirectional else 1
linear_layer = torch.nn.Linear(tagger.hidden_size * num_directions, len(tag_dictionary))
state['state_dict']['linear.weight'] = linear_layer.weight
state['state_dict']['linear.bias'] = linear_layer.bias

# rebuild the model from the modified state and fine-tune it
model = SequenceTagger._init_model_with_state_dict(state)
trainer: ModelTrainer = ModelTrainer(model, corpus)
trainer.train('finetuned_model',
              learning_rate=0.001,
              mini_batch_size=64,
              max_epochs=10)
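To make the transition re-initialization above easier to follow, here is a dependency-free sketch of the same constraint using plain Python lists in place of torch tensors (the helper and tag list are made-up examples, mirroring the indexing used in the snippet above):

```python
import random

def init_transitions(tags, start="<START>", stop="<STOP>"):
    """Build a random transition matrix, then forbid impossible moves.

    Mirroring the Flair snippet: the <START> row (no transition ends
    at START) and the <STOP> column (no transition leaves STOP) are
    set to a large negative score so the CRF never chooses them.
    """
    n = len(tags)
    transitions = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    start_idx, stop_idx = tags.index(start), tags.index(stop)
    for j in range(n):
        transitions[start_idx][j] = -10000  # row: nothing moves into <START>
    for i in range(n):
        transitions[i][stop_idx] = -10000   # column: nothing moves out of <STOP>
    return transitions
```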
Thank you very much, I will try it later.
Hi msobroza,
The idea you described is exactly how I want to fine-tune the pre-trained French NER model available with Flair. Unfortunately, the code you gave produces a model.pt with a much worse score... Do you know why?