I am using FLAIR for NER task on my dataset and I have built quite a good model. However, I would like to know how to use this model and fine-tune it with more new samples. Now I am doing everything from scratch, i.e. I am concatenating the old big dataset with the new small dataset and start the training from scratch...
Alternatively, you could try further training the existing model for a few epochs with a small learning rate, e.g.:
import flair
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# load your previous tagger
tagger = SequenceTagger.load('ner')

# train further on your new corpus
trainer: ModelTrainer = ModelTrainer(tagger, corpus)

# start training with a small learning rate
trainer.train('resources/taggers/continued_model',
              learning_rate=0.01,
              mini_batch_size=32,
              max_epochs=10)
However, I am not sure how well this works and what parameters are best. The safest option is probably to retrain over everything. But if you test this, I'd be interested to hear your results.
Nice work by the Flair team! I am also using Flair for an NER task on my dataset, but my dataset has different labels than the pre-trained 'ner' model. How can I handle this?
Hi, if your labels are different you can load the weights of the base model and randomly initialize the remaining weights. You also need to replace the tag_dictionary in the state dict of the previous model:
import torch
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# load the pre-trained tagger and extract its state
tagger = SequenceTagger.load('ner')
state = tagger._get_state_dict()

# build the tag dictionary from your own corpus and put it into the state
tag_type = 'ner'
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)
state['tag_dictionary'] = tag_dictionary

# re-initialize the CRF transition matrix for the new tag set,
# forbidding transitions into <START> and out of <STOP>
START_TAG: str = "<START>"
STOP_TAG: str = "<STOP>"
state['state_dict']['transitions'] = torch.nn.Parameter(torch.randn(len(tag_dictionary), len(tag_dictionary)))
state['state_dict']['transitions'].detach()[tag_dictionary.get_idx_for_item(START_TAG), :] = -10000
state['state_dict']['transitions'].detach()[:, tag_dictionary.get_idx_for_item(STOP_TAG)] = -10000

# re-initialize the final linear layer to match the new number of tags
num_directions = 2 if tagger.bidirectional else 1
linear_layer = torch.nn.Linear(tagger.hidden_size * num_directions, len(tag_dictionary))
state['state_dict']['linear.weight'] = linear_layer.weight
state['state_dict']['linear.bias'] = linear_layer.bias

# rebuild the model from the modified state and fine-tune it
model = SequenceTagger._init_model_with_state_dict(state)
trainer: ModelTrainer = ModelTrainer(model, corpus)
trainer.train('finetuned_model',
              learning_rate=0.001,
              mini_batch_size=64,
              max_epochs=10)
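To make the transition re-initialization above easier to follow, here is a dependency-free sketch of the same constraint using plain Python lists in place of torch tensors (the helper and tag list are made-up examples, mirroring the indexing used in the snippet above):

```python
import random

def init_transitions(tags, start="<START>", stop="<STOP>"):
    """Build a random transition matrix, then forbid impossible moves.

    Mirroring the Flair snippet: the <START> row (no transition ends
    at START) and the <STOP> column (no transition leaves STOP) are
    set to a large negative score so the CRF never chooses them.
    """
    n = len(tags)
    transitions = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
    start_idx, stop_idx = tags.index(start), tags.index(stop)
    for j in range(n):
        transitions[start_idx][j] = -10000  # row: nothing moves into <START>
    for i in range(n):
        transitions[i][stop_idx] = -10000   # column: nothing moves out of <STOP>
    return transitions
```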
Thank you very much, I will try it later.
Hi msobroza,
The idea you described is exactly how I want to fine-tune the pre-trained French NER model available with Flair. Unfortunately, the code you gave produces a model.pt with a much worse score... Do you know why?