Fairseq: Initialize only encoder-decoder weights from a pre-trained model

Created on 2 Jan 2019 · 3 comments · Source: pytorch/fairseq

I am trying to fine-tune a seq2seq translation model for domain adaptation. I understand I can fine-tune the whole model by restoring it from the checkpoint file (keeping the vocabularies the same). But I only want to initialize some of the parameters (for example, only the encoder-decoder weights) from the pre-trained model. How can this be done in fairseq?

Also, is it possible to initialize weights from a numpy array using torch.from_numpy()? And if not, what part of the code would I need to change to accomplish this?

Thanks a lot for your help.

All 3 comments

Using pretrained weights is possible. Most models in fairseq support using pretrained embeddings, e.g.: https://github.com/pytorch/fairseq/blob/0cb87130e7d843f68a25a44cb7443187a19b7320/fairseq/models/lstm.py#L91-L98
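On the second part of the question (initializing weights from a numpy array): plain PyTorch handles this directly with torch.from_numpy. A minimal sketch, not fairseq's own embedding helper; the shapes and the freeze flag below are illustrative assumptions:

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical pretrained embedding matrix (vocab_size x embed_dim).
vocab_size, embed_dim = 10000, 512
pretrained = np.random.randn(vocab_size, embed_dim).astype(np.float32)

# torch.from_numpy shares memory with the numpy array; call .clone()
# if the tensor should own its own storage.
weight = torch.from_numpy(pretrained)

# Wrap it in an embedding layer; freeze=False keeps it trainable.
embedding = nn.Embedding.from_pretrained(weight, freeze=False)
```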

Another good example is the story generation work, which loads a pretrained encoder and decoder that are kept fixed during training: https://github.com/pytorch/fairseq/blob/0cb87130e7d843f68a25a44cb7443187a19b7320/fairseq/models/fconv_self_att.py#L84-L99
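A hedged sketch of that general pattern in plain PyTorch (not the exact fairseq code): load a full checkpoint into an already-built model and freeze the pretrained encoder. The 'model' key and the toy module are assumptions; fairseq checkpoints typically store the weights under 'model', but inspect your checkpoint to confirm.

```python
import torch
import torch.nn as nn

# Stand-in encoder-decoder model purely for illustration; in practice this
# would be the fairseq model you built for fine-tuning.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 8)
        self.decoder = nn.Linear(8, 8)

model = ToyModel()

# Assumed checkpoint layout: state dict stored under the 'model' key.
state = torch.load('pretrained_checkpoint.pt', map_location='cpu')
model.load_state_dict(state['model'])

# Keep the pretrained encoder fixed during training.
for param in model.encoder.parameters():
    param.requires_grad = False
```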

In your case, you can load the checkpoint and overwrite the corresponding encoder-decoder weights in your freshly initialized model.
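A minimal sketch of that idea, assuming the checkpoint keeps its weights under a 'model' key and that encoder/decoder parameters are prefixed with 'encoder.' and 'decoder.' in the state dict (typical fairseq naming, but worth verifying against your model):

```python
import torch

checkpoint = torch.load('pretrained_checkpoint.pt', map_location='cpu')
pretrained_state = checkpoint['model']

# Keep only the encoder and decoder weights from the pretrained model.
selected = {
    k: v for k, v in pretrained_state.items()
    if k.startswith('encoder.') or k.startswith('decoder.')
}

# `model` is assumed to be your freshly built model for fine-tuning.
# strict=False leaves every parameter not in `selected` at its fresh init.
model.load_state_dict(selected, strict=False)
```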

Please re-open if you still have issues; otherwise, closing based on the examples Myle provided.

I would like to reopen this issue.
It seems the question at hand was different: how to load the model weights, not the embeddings. I am currently running into the same problem and would like to know whether there is an existing solution.
To summarize, I would let the word embeddings initialize in the usual (say, random) way, but I would like to reuse the other learned parameters of the model as a starting point for my training.
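Concretely, a rough sketch of that variant, assuming fairseq-style checkpoints (weights under a 'model' key) and embedding parameters whose names contain 'embed_tokens' or 'embed_positions'; the filter would need adjusting to the actual parameter names of the model in question:

```python
import torch

checkpoint = torch.load('pretrained_checkpoint.pt', map_location='cpu')
pretrained_state = checkpoint['model']

# Drop the embedding weights so they keep their fresh (random) initialization,
# and reuse every other learned parameter as the starting point.
filtered = {
    k: v for k, v in pretrained_state.items()
    if 'embed_tokens' not in k and 'embed_positions' not in k
}

# `model` is assumed to be the newly built model; strict=False leaves the
# embeddings (and any other missing keys) untouched.
model.load_state_dict(filtered, strict=False)
```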
