I'm currently trying to fine-tune a pretrained BertModel, but I want to remove some of the layers from the model before fine-tuning.
How do I do this?
If this is important to anyone, I have found a solution:
    import copy
    import torch.nn as nn

    def deleteEncodingLayers(model, num_layers_to_keep):  # must pass in the full BERT model
        oldModuleList = model.bert.encoder.layer
        newModuleList = nn.ModuleList()

        # Iterate over the layers, keeping only the first num_layers_to_keep.
        for i in range(num_layers_to_keep):
            newModuleList.append(oldModuleList[i])

        # Create a copy of the model, swap in the truncated layer list, and return it.
        copyOfModel = copy.deepcopy(model)
        copyOfModel.bert.encoder.layer = newModuleList

        return copyOfModel
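A minimal usage sketch (my own example, assuming a BertForSequenceClassification checkpoint, since the function expects a model with a .bert attribute):

    from transformers import BertForSequenceClassification

    # Hypothetical example: keep only the first 6 of bert-base-uncased's 12 encoder layers.
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
    smaller_model = deleteEncodingLayers(model, 6)
    print(len(smaller_model.bert.encoder.layer))  # 6

Note that smaller_model.config.num_hidden_layers will still report the original layer count unless you update it yourself.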
Hi,
Thank you for your question and solution. I also want to try something like this.
I have a question: if I remove some layers, do I need to pre-train from scratch again?
How does the performance look if you only fine-tune on GLUE or SQuAD tasks? Does the accuracy go down dramatically?
Thanks,
ZLK
@ZLKong no, the remaining layers keep their pretrained weights. Not quite sure what you mean by only fine-tuning, though.
Thank you for your reply!
I want to decrease the FLOPs by simply removing some layers from the model. I want to see how much removing layers will affect the accuracy on the SQuAD task.
(If the accuracy goes down a lot, that means I might have to do the pretraining again?)
Do you have any experiments on this?
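For reference, a quick way to estimate the size reduction is to compare parameter counts before and after truncation. A sketch, assuming the deleteEncodingLayers function above and a bert-base-uncased checkpoint:

    from transformers import BertForSequenceClassification

    full = BertForSequenceClassification.from_pretrained("bert-base-uncased")
    half = deleteEncodingLayers(full, 6)

    def count_params(m):
        return sum(p.numel() for p in m.parameters())

    # Roughly 110M vs. 67M parameters for bert-base with 12 vs. 6 encoder layers.
    print(count_params(full), count_params(half))

The embedding table is untouched, so the parameter count does not drop linearly with the number of layers removed; FLOPs, which are dominated by the encoder layers, drop closer to linearly.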
Best,
ZLK
I haven't, but I'm sure the original paper performed a test like that. If not, I guarantee there will be a paper out there that does, given how much research has been thrown at BERT :)
OK, I will look for papers about it. I will run some tests, too.
Thank you very much!
If you're dealing with loading a pretrained model, there is an easier way to remove the top layer:
    from transformers import XLNetConfig, XLNetModel

    config = XLNetConfig.from_pretrained(checkpoint)
    config.n_layer = 29  # was 30 layers, in my case
    model = XLNetModel.from_pretrained(checkpoint, config=config)
This will produce a warning that there are unused weights in the checkpoint and you'll get a model with the top layer removed.
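The same trick works for BERT, where the layer count lives in num_hidden_layers. A sketch, assuming a bert-base-uncased checkpoint:

    from transformers import BertConfig, BertModel

    config = BertConfig.from_pretrained("bert-base-uncased")
    config.num_hidden_layers = 6  # was 12
    model = BertModel.from_pretrained("bert-base-uncased", config=config)

Because weights are matched by name, from_pretrained loads the first num_hidden_layers encoder layers and skips the rest of the checkpoint, warning about the unused weights.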