Transformers: How to add parameters to GPT-2 (in the attention layer), initializing the original GPT-2 parameters from the pre-trained model and the newly introduced parameters randomly?

Created on 2 Aug 2019 · 6 comments · Source: huggingface/transformers

Hi,

I want to add some weight matrices inside the attention layers of the GPT-2 model. However, I want to initialize all the original parameters from pre-trained GPT-2 and the newly added ones randomly.
Can someone guide me on how that's possible or point me in the right direction?

Thanks

wontfix

All 6 comments

You should make a class deriving from GPT2Model in which:

  • the __init__ method

    • calls its super class __init__ method (to add the original GPT2 modules),

    • you then add the new modules (with names different from the original GPT2 attributes so you don't overwrite them),

    • you call self.init_weights() at the end to initialize your weights (check the init_weights method in GPT2PreTrainedModel to be sure it initializes things the way you want).

  • the forward method has to be written as you want the forward pass to be.

You can then load the pretrained weights and initialize your newly added weights just by doing the usual model = MyGPT2Model.from_pretrained('gpt2').
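For illustration, a minimal sketch of that pattern might look like the following, assuming a recent transformers release; the extra_proj module and the projection done in forward are just placeholders for whatever you actually want to add (in older pytorch-transformers versions the last line of __init__ would be self.apply(self.init_weights) instead):

import torch.nn as nn
from transformers import GPT2Model

class MyGPT2Model(GPT2Model):
    def __init__(self, config):
        super().__init__(config)  # adds all the original GPT-2 modules
        # new module; its name must not clash with existing GPT-2 attributes
        self.extra_proj = nn.Linear(config.n_embd, config.n_embd)
        # randomly initializes everything; from_pretrained() later overwrites
        # the original GPT-2 weights but leaves extra_proj random
        self.init_weights()

    def forward(self, input_ids, **kwargs):
        outputs = super().forward(input_ids, **kwargs)
        hidden_states = outputs[0]
        # toy custom computation using the new module
        return (self.extra_proj(hidden_states),) + outputs[1:]

# original weights come from the checkpoint, extra_proj stays random
model = MyGPT2Model.from_pretrained('gpt2')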

Thanks @thomwolf. Just to clarify, does that mean that if I need to change the attention layer a little bit, then I have to make three classes derived from GPT2Model, Block, and Attention? And in that case, can I use the original Attention modules inside the forward pass of myAttention?

Should it be something like the following?

class myAttention(Attention):
    def __init__(self, nx, n_ctx, config, scale=False):
        super(myAttention, self).__init__(nx, n_ctx, config, scale)

    def forward(self, ...):  ### my customized forward pass


class myBlock(Block):
    def __init__(self, n_ctx, config, scale=False):
        super(myBlock, self).__init__(n_ctx, config, scale)

    def forward(self, ...):  ### my customized forward pass


class myGPT2Model(GPT2Model):
    def __init__(self, config):
        super(myGPT2Model, self).__init__(config)
        ....
        self.apply(self.init_weights)

    def forward(self, ...):  ### my customized forward pass

Maybe, but it depends on what you put in the .... parts.

@thomwolf Is it right that I have to have three separate classes, each derived from GPT2Model, Block, and Attention?
In general, I want to have one additional input to the myGPT2Model forward method, and I want to incorporate it into the Attention computation.
What I did is: I added that aux input to the forward of myGPT2Model and called the block inside the myGPT2Model forward with the original and aux inputs.
Then, in the myBlock forward method, I called Attention with the two inputs.

Probably right.

Maybe the easiest in your case would be to copy the whole modeling_gpt2 file and modify what you need in the copy.
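As a rough, hypothetical sketch of the plumbing being discussed: the classes below are stand-ins for modified copies of the classes in modeling_gpt2 (not the real implementations), and only the passing of aux_input from the model down to the attention module is shown:

import torch.nn as nn

class myAttention(nn.Module):
    def __init__(self, n_embd):
        super().__init__()
        self.aux_proj = nn.Linear(n_embd, n_embd)  # new, randomly initialized

    def forward(self, hidden_states, aux_input):
        # combine the auxiliary signal with the usual attention computation
        return hidden_states + self.aux_proj(aux_input)

class myBlock(nn.Module):
    def __init__(self, n_embd):
        super().__init__()
        self.attn = myAttention(n_embd)

    def forward(self, hidden_states, aux_input):
        # forward the extra argument down to the attention module
        return self.attn(hidden_states, aux_input)

class myGPT2Model(nn.Module):
    def __init__(self, n_embd, n_layer):
        super().__init__()
        self.h = nn.ModuleList(myBlock(n_embd) for _ in range(n_layer))

    def forward(self, hidden_states, aux_input):
        # pass the auxiliary input to every block
        for block in self.h:
            hidden_states = block(hidden_states, aux_input)
        return hidden_states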

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
