Transformers: Retrain/reuse fine-tuned models on a different set of labels

Created on 20 Jul 2020  ·  16 Comments  ·  Source: huggingface/transformers

❓ Questions & Help

Details

Hello,
I am wondering whether it is possible to reuse or retrain a fine-tuned model with a new set of labels (where the new set contains new labels, or is a subset of the labels used to fine-tune the model)?
What I am trying to do is fine-tune a pre-trained model for a task (e.g. NER) on a general-domain dataset, then reuse/retrain this fine-tuned model for a similar task in a more specific domain (e.g. NER for healthcare), where the set of labels may not be the same.

I already tried to fine-tune a BERT model to do NER on WNUT17 data based on the token classification example in the Transformers GitHub. After that, I tried to retrain the fine-tuned model by adding a new label and providing training data that contains this label, but training failed with this error:

RuntimeError: Error(s) in loading state_dict for BertForTokenClassification:
size mismatch for classifier.weight: copying a param with shape torch.Size([13, 1024]) from checkpoint, the shape in current model is torch.Size([15, 1024]).
size mismatch for classifier.bias: copying a param with shape torch.Size([13]) from checkpoint, the shape in current model is torch.Size([15]).

Is it possible to do this with Transformers, and if so, how? Maybe there is a method that can do something like this (spaCy has such a method). Thank you in advance!

I already posted this in the forum:
Retrain/reuse fine-tuned models on a different set of labels

wontfix

All 16 comments

@kevin-yauris I had a similar problem with retraining a fine-tuned model. Here is what I have done.

Do not pass a config parameter when creating your model with from_pretrained(). Just initialize it with something like this:

model = AutoModelForTokenClassification.from_pretrained(
            model_name,
            from_tf=bool(".ckpt" in model_name),
            cache_dir=cache_dir,
        )

Then, you will need to change the last layer of the model. I was using PyTorch to fine-tune a blank model initially, so these steps will work for PyTorch models.

The last layer in the TokenClassification model is called classifier. It is simply a linear layer, so you can create a new one with the correct shape and randomized weights, and assign it to the initialized model's classifier layer. Say my layer was initially (768, 5) with the original 5 classes, and now I want 9, so I make a final layer with shape (768, 9).

# reinitialize the final classification layer to match the new number of labels
model.classifier = torch.nn.Linear(in_features=model.classifier.in_features, out_features=config.num_labels, bias=True)
model.config = config
model.num_labels = config.num_labels

Since you will be loading the config file from the fine-tuned model to initialize the model, you also want to replace the model's config with your current one containing the new classes, so the correct config gets exported after your model is trained. You will also want to update the model's num_labels, since it was initialized with the old number of classes from the old config.
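Putting the pieces together, here is a minimal end-to-end sketch of this approach (the checkpoint path ./finetuned-ner and the label list new_labels are placeholders, not from this thread):

import torch
from transformers import AutoConfig, AutoModelForTokenClassification

model_name = "./finetuned-ner"  # placeholder: checkpoint fine-tuned on the old label set
new_labels = ["O", "B-PER", "I-PER", "B-DRUG", "I-DRUG"]  # placeholder new label set

# Config reflecting the new label set
config = AutoConfig.from_pretrained(
    model_name,
    num_labels=len(new_labels),
    id2label={i: l for i, l in enumerate(new_labels)},
    label2id={l: i for i, l in enumerate(new_labels)},
)

# Load without passing the new config so the old checkpoint loads cleanly
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Swap in a freshly initialized classification head of the right size
model.classifier = torch.nn.Linear(model.classifier.in_features, config.num_labels, bias=True)
model.config = config
model.num_labels = config.num_labels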

Hi @TarasPriadka, thank you for answering.
I also did the same thing that you did, but with TensorFlow: https://discuss.huggingface.co/t/retrain-reuse-fine-tuned-models-on-different-set-of-labels/346/5?u=kevinyauris.
I forgot about model.num_labels though, thank you for the catch.
I wonder if there is another way to do it, since if we replace the last layer with randomized weights we can't reuse the learned weights for the labels/classes that are the same as the previous ones.
Let's say there are 3 classes in the initial model and now I want to add 1 more class while the other classes stay the same. With this method, all weights of the last layer are randomized, so we need to fine-tune the model on all the data again instead of just providing training data for the new class.

@kevin-yauris I've seen your forum post since I've been looking for a solution. My idea is that you already have id2label and label2id in the model, so you could check whether the incoming labels were already trained in the fine-tuned model. You find those which are not and add randomized rows for them. However, I am not sure how you can take a layer and just add randomized rows to it.
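One possible way to do that (a rough sketch, not from this thread; it assumes a BERT-style token-classification model whose head is a single Linear layer named classifier): build the new head with random weights, then copy over the rows of the old head for labels that also exist in the old label2id, so only the genuinely new labels start from random weights.

import torch

def expand_classifier(model, new_labels):
    # Sketch: reuse learned rows for labels that already exist, random-init the rest
    old_label2id = model.config.label2id
    old_weight = model.classifier.weight.data
    old_bias = model.classifier.bias.data

    new_head = torch.nn.Linear(model.classifier.in_features, len(new_labels), bias=True)
    with torch.no_grad():
        for new_id, label in enumerate(new_labels):
            if label in old_label2id:
                old_id = old_label2id[label]
                new_head.weight[new_id] = old_weight[old_id]
                new_head.bias[new_id] = old_bias[old_id]

    model.classifier = new_head
    model.config.num_labels = len(new_labels)
    model.config.id2label = {i: l for i, l in enumerate(new_labels)}
    model.config.label2id = {l: i for i, l in enumerate(new_labels)}
    model.num_labels = len(new_labels)
    return model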

Hi @TarasPriadka ,

Thanks for sharing the solution. I followed the same steps, which solved this error:

RuntimeError: Error(s) in loading state_dict for BertForTokenClassification:
size mismatch for classifier.weight: copying a param with shape torch.Size([17, 1024]) from checkpoint, the shape in current model is torch.Size([13, 1024]).
size mismatch for classifier.bias: copying a param with shape torch.Size([17]) from checkpoint, the shape in current model is torch.Size([13]).

but now, it throws another error -

model_path=model_args.model_name_or_path if os.path.isdir(model_args.model_name_or_path) else None
File "/usr/local/lib/python3.7/site-packages/transformers/trainer.py", line 514, in train
optimizer.step()
File "/usr/local/lib/python3.7/site-packages/torch/optim/lr_scheduler.py", line 67, in wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/transformers/optimization.py", line 244, in step
exp_avg.mul_(beta1).add_(grad, alpha=1.0 - beta1)
RuntimeError: The size of tensor a (17) must match the size of tensor b (13) at non-singleton dimension 0

I first trained the model on a dataset with 17 classes, and now I want to transfer this model to a second dataset which has 13 labels.

Do we have to change num_labels for any other layer?

Thanks,

@vikas95 I am not sure, but just changing the model's num_labels seemed to work for me. However, I was scaling up the number of labels, not reducing it; I would assume the same solution applies. Maybe you can share your model's layers before and after applying my fix with print(model), and we can look into a possible solution.

Hi @TarasPriadka ,

Thanks for the suggestion, I printed the model after loading the checkpoint and after updating the classification layer.

The classification layer's output dimension does change with your suggested fix, i.e.,

initially, after loading the checkpoint with

model = AutoModelForTokenClassification.from_pretrained(
    model_args.model_name_or_path,
    from_tf=bool(".ckpt" in model_args.model_name_or_path),
    cache_dir=model_args.cache_dir,
)

the classification layer size is - (classifier): Linear(in_features=1024, out_features=17, bias=True)

and after updating the classification layer, the size is - (classifier): Linear(in_features=1024, out_features=13, bias=True)

The rest of the layers look the same, but I am still not sure why it's throwing the previously mentioned error.

-Vikas

@vikas95, so the shape of the model is fine. The issue is in exp_avg.mul_(beta1).add_(grad, alpha=1.0 - beta1). The exception highlights that exp_avg is of size 17 and it's trying to add grad, which is of size 13. So the problem is in exp_avg, since it wasn't updated along with everything else. Can you share the whole chunk of code where you initialize the model, trainer, etc.?
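(As a small aside, this mismatch is easy to reproduce: Adam-style optimizers keep one exp_avg tensor per parameter, shaped like that parameter, so optimizer state saved for a (17, 1024) head cannot be applied to a (13, 1024) head. A toy illustration, using torch.optim.AdamW rather than the transformers optimizer in the traceback:)

import torch

head = torch.nn.Linear(1024, 17)
opt = torch.optim.AdamW(head.parameters(), lr=1e-5)
head(torch.randn(2, 1024)).sum().backward()
opt.step()
print(opt.state[head.weight]["exp_avg"].shape)  # torch.Size([17, 1024])
# Restoring this state after replacing the head with Linear(1024, 13) produces the
# "size of tensor a (17) must match the size of tensor b (13)" error above.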

Hi @TarasPriadka ,

Here is the part where I initialize the model (which is from run_ner.py (https://github.com/huggingface/transformers/blob/6e8a38568eb874f31eb49c42285c3a634fca12e7/examples/token-classification/run_ner.py#L158)) -

labels = get_labels(data_args.labels)
label_map = {i: label for i, label in enumerate(labels)}
num_labels = len(labels)

config = AutoConfig.from_pretrained(
    model_args.config_name if model_args.config_name else model_args.model_name_or_path,
    num_labels=num_labels,
    id2label=label_map,
    label2id={label: i for i, label in enumerate(labels)},
    cache_dir=model_args.cache_dir,
)

tokenizer = AutoTokenizer.from_pretrained(
    model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
    cache_dir=model_args.cache_dir,
    use_fast=model_args.use_fast,
)
model = AutoModelForTokenClassification.from_pretrained(
    model_args.model_name_or_path,
    from_tf=bool(".ckpt" in model_args.model_name_or_path),
    cache_dir=model_args.cache_dir,
)

model.classifier = torch.nn.Linear(in_features=model.classifier.in_features, out_features=config.num_labels, bias=True)
model.config = config    
model.num_labels = config.num_labels

@vikas95 Can you also share the trainer code?

@TarasPriadka - It's the same as in run_ner.py.
I haven't changed any other part of the code.

@vikas95 can you check whether your model folder contains the files optimizer.pt and scheduler.pt

@TarasPriadka ,

Thanks for the help, I was giving a specific checkpoint directory as the model path, i.e. "datasetA_model/checkpoint-6000/", which had both optimizer.pt and scheduler.pt,

but then I changed the model path to just "datasetA_model/" and it works fine with no errors.
I am guessing that if I just give "datasetA_model/" as the model path, it would select the latest checkpoint?

Anyway, thanks a lot for looking at the problem and for all the quick responses and help 😬

@vikas95 This was a great deal of fun. When you are running

trainer.train(
            model_path=model_name if os.path.isdir(model_name) else None
        )

the trainer loads in those files and initializes the Adam optimizer with them. The optimizer breaks because you changed the shape of the output layer, but its state was initialized with the old shape. What you can do is either delete those files, or just run trainer.train() without parameters.
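In other words (with trainer set up as in run_ner.py and purely illustrative paths), the fix amounts to not letting the Trainer restore the stale optimizer/scheduler state:

# Resuming from a checkpoint directory also restores optimizer.pt / scheduler.pt,
# whose state still matches the old classifier shape:
# trainer.train(model_path="datasetA_model/checkpoint-6000/")  # fails after resizing the head

# Start with a fresh optimizer and scheduler instead:
trainer.train()

# ...or delete the stale state files first if you need to pass the directory:
# os.remove("datasetA_model/checkpoint-6000/optimizer.pt")
# os.remove("datasetA_model/checkpoint-6000/scheduler.pt")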

Cool, this makes sense.
Thanks again for the explanation. I had actually been trying with just trainer.train() for the last 30 minutes and it works fine.

Thanks again for all the help and explanations.

Does anyone know what the alternative method is in PyTorch?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

