Hi,
I want to add some weight matrices inside the attention layers of the GPT-2 model. However, I want to initialize all the original parameters from pre-trained GPT-2 and the newly added ones randomly.
Can someone guide me on how that's possible, or point me in the right direction?
Thanks
You should make a class deriving from GPT2Model in which:
- an __init__ method (to add the original GPT2 modules and your new modules), with self.init_weights() at the end to initialize your weights (check the init_weights method in GPT2PreTrainedModel to be sure it initializes them the way you want),
- a forward method, written as you want the forward pass to be.
You can then load the pretrained weights and initialize your newly added weights just by doing the usual model = MyGPT2Model.from_pretrained('gpt2').
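For example, a minimal sketch of such a subclass, assuming the transformers GPT2Model API (the extra module extra_proj is just a placeholder for whatever new weights you add):

import torch.nn as nn
from transformers import GPT2Model

class MyGPT2Model(GPT2Model):
    def __init__(self, config):
        super(MyGPT2Model, self).__init__(config)  # builds the original GPT-2 modules
        # Placeholder for the newly added weights; nn.Linear is covered by
        # init_weights, so it gets a fresh random initialization.
        self.extra_proj = nn.Linear(config.n_embd, config.n_embd)
        self.init_weights()

    def forward(self, input_ids):
        hidden_states = super(MyGPT2Model, self).forward(input_ids)[0]
        return self.extra_proj(hidden_states)  # example use of the new weights

# Pretrained weights for the original modules, random init for extra_proj:
model = MyGPT2Model.from_pretrained('gpt2')

from_pretrained only loads weights that exist in the checkpoint, so extra_proj keeps the random initialization it got from init_weights.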
Thanks @thomwolf. Just to clarify: does that mean that if I need to change the attention layer a little, I have to make three classes, derived from GPT2Model, Block, and Attention? And if so, can I use the original Attention modules inside the forward pass of myAttention?
Should it be something like the following?
class myAttention(Attention):
    def __init__(self, nx, n_ctx, config, scale=False):
        super(myAttention, self).__init__(nx, n_ctx, config, scale)
    def forward(...):  ### my customized forward pass

class myBlock(Block):
    def __init__(self, n_ctx, config, scale=False):
        super(myBlock, self).__init__(n_ctx, config, scale)
    def forward(...):  ### my customized forward pass

class myGPT2Model(GPT2Model):
    def __init__(self, config):
        super(myGPT2Model, self).__init__(config)
        ....
        self.init_weights()
    def forward(...):  ### my customized forward pass
Maybe, but it depends on what you put in the .... parts.
@thomwolf Is it right that I have to have three separate classes, each derived from GPT2Model, Block, and Attention?
In general, I want to pass one additional input to the myGPT2Model forward method and incorporate it into the attention computation.
What I did: I added that aux input to the forward method of myGPT2Model, called the block inside myGPT2Model's forward with the original and aux inputs,
and then, in the myBlock forward method, called Attention with the two inputs. A sketch of that wiring follows below.
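Roughly like this, assuming the Attention/Block internals from modeling_gpt2 at the time (c_attn, split_heads, _attn, merge_heads, c_proj), with the layer_past/present plumbing omitted for brevity; aux_input is the extra tensor, and adding it to the values is only an example of how to use it:

import torch

class myAttention(Attention):
    def forward(self, x, aux_input):
        x = self.c_attn(x)
        query, key, value = x.split(self.split_size, dim=2)
        query = self.split_heads(query)
        key = self.split_heads(key, k=True)
        # example incorporation: mix the aux input into the values
        value = self.split_heads(value + aux_input)
        a = self._attn(query, key, value)
        return self.c_proj(self.merge_heads(a))

class myBlock(Block):
    def forward(self, x, aux_input):
        a = self.attn(self.ln_1(x), aux_input)  # forward the aux input
        x = x + a
        return x + self.mlp(self.ln_2(x))

class myGPT2Model(GPT2Model):
    def forward(self, input_ids, aux_input):
        position_ids = torch.arange(input_ids.size(-1), device=input_ids.device)
        hidden_states = self.wte(input_ids) + self.wpe(position_ids)
        for block in self.h:
            hidden_states = block(hidden_states, aux_input)
        return self.ln_f(hidden_states)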
Probably right.
Maybe the easiest thing in your case would be to copy the whole modeling_gpt2 file and modify what you need in the copy.
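To illustrate the kind of edit that means inside the copied file (aux_proj is a hypothetical name, only the two lines marked NEW are added, and the elided parts stay exactly as in the original):

import torch.nn as nn

class Attention(nn.Module):  # the copied class from modeling_gpt2
    def __init__(self, nx, n_ctx, config, scale=False):
        super(Attention, self).__init__()
        # ... original __init__ body unchanged ...
        # NEW: extra weight matrix; using nn.Linear means init_weights will
        # randomly initialize it, and since the pretrained checkpoint has no
        # entry for it, it stays randomly initialized after loading.
        self.aux_proj = nn.Linear(nx, nx, bias=False)

    def forward(self, x):
        # NEW: example use of the added matrix before the original attention
        x = self.aux_proj(x)
        # ... original forward body unchanged ...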