I'm currently trying to fine-tune a pretrained BertModel, but I want to remove some of the layers from the model before fine-tuning.
How do I do this?
If this is important to anyone, I have found a solution:
    import copy
    import torch.nn as nn

    def deleteEncodingLayers(model, num_layers_to_keep):  # must pass in the full BERT model
        oldModuleList = model.bert.encoder.layer
        newModuleList = nn.ModuleList()

        # Iterate over the layers, keeping only the first num_layers_to_keep.
        for i in range(num_layers_to_keep):
            newModuleList.append(oldModuleList[i])

        # Create a copy of the model, swap in the truncated layer list, and return it.
        copyOfModel = copy.deepcopy(model)
        copyOfModel.bert.encoder.layer = newModuleList

        return copyOfModel
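A minimal usage sketch (my own example, assuming a BertForSequenceClassification checkpoint, since the function expects a model with a .bert attribute):

    from transformers import BertForSequenceClassification

    # Hypothetical example: keep only the first 6 of bert-base-uncased's 12 encoder layers.
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
    smaller_model = deleteEncodingLayers(model, 6)
    print(len(smaller_model.bert.encoder.layer))  # 6

Note that smaller_model.config.num_hidden_layers will still report the original layer count unless you update it yourself.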
Hi,
Thank you for your question and solution. I also want to try something like this.
I have a question: if I remove some layers, do I need to pre-train from scratch again?
How does the performance look if you only fine-tune on GLUE or SQuAD tasks? Does the accuracy go down dramatically?
Thanks,
ZLK
@ZLKong no, the remaining layers keep their pretrained weights. Not quite sure what you mean by only fine-tuning, though.
Thank you for your reply!
I want to decrease the FLOPs by simply removing some layers from the model. I want to see how much removing layers will affect the accuracy on the SQuAD task.
(If the accuracy goes down a lot, that means I might have to do the pretraining again?)
Do you have any experiments on this?
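For reference, a quick way to estimate the size reduction is to compare parameter counts before and after truncation. A sketch, assuming the deleteEncodingLayers function above and a bert-base-uncased checkpoint:

    from transformers import BertForSequenceClassification

    full = BertForSequenceClassification.from_pretrained("bert-base-uncased")
    half = deleteEncodingLayers(full, 6)

    def count_params(m):
        return sum(p.numel() for p in m.parameters())

    # Roughly 110M vs. 67M parameters for bert-base with 12 vs. 6 encoder layers.
    print(count_params(full), count_params(half))

The embedding table is untouched, so the parameter count does not drop linearly with the number of layers removed; FLOPs, which are dominated by the encoder layers, drop closer to linearly.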
Best,
ZLK
I haven't, but I'm sure the original paper performed a test like that. If not, I guarantee there will be a paper out there that does, given how much research has been thrown at BERT :)
OK, I will look for papers about it. I will run some tests, too.
Thank you very much!
If you're dealing with loading a pretrained model, there is an easier way to remove the top layer:
    from transformers import XLNetConfig, XLNetModel

    config = XLNetConfig.from_pretrained(checkpoint)
    config.n_layer = 29  # was 30 layers, in my case
    model = XLNetModel.from_pretrained(checkpoint, config=config)
This will produce a warning that there are unused weights in the checkpoint and you'll get a model with the top layer removed.
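The same trick works for BERT, where the layer count lives in num_hidden_layers. A sketch, assuming a bert-base-uncased checkpoint:

    from transformers import BertConfig, BertModel

    config = BertConfig.from_pretrained("bert-base-uncased")
    config.num_hidden_layers = 6  # was 12
    model = BertModel.from_pretrained("bert-base-uncased", config=config)

Because weights are matched by name, from_pretrained loads the first num_hidden_layers encoder layers and skips the rest of the checkpoint, warning about the unused weights.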