Hi,
I know that T5ForConditionalGeneration is a T5Model with a language-modeling head for decoding. I have a T5ForConditionalGeneration that I've fine-tuned on a seq2seq task, and now I want to use its encoder to initialize the parameters of a T5Model (to train it further on some other task). I read the code but couldn't figure out what I should do. Can you please help me?
Hi @Palipoor, this might do the trick:
from transformers import T5ForConditionalGeneration, T5Model

first_model = T5ForConditionalGeneration.from_pretrained('your_finetuned_model')
second_model = T5Model.from_pretrained('t5-small')  # assuming 'your_finetuned_model' was fine-tuned from t5-small
# get the first model's encoder weights
first_model_encoder_state_dict = first_model.encoder.state_dict()
# load the first model's encoder weights into the second model's encoder
second_model.encoder.load_state_dict(first_model_encoder_state_dict)
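As a quick sanity check, you can compare the two encoders parameter by parameter afterwards (a minimal sketch; it assumes both checkpoints are the same size, so the parameter lists line up):

import torch

for (name_a, param_a), (name_b, param_b) in zip(
        first_model.encoder.named_parameters(),
        second_model.encoder.named_parameters()):
    assert name_a == name_b
    assert torch.equal(param_a, param_b), f"mismatch in {name_a}"

Note that the encoder's state dict also contains embed_tokens.weight, which in the standard T5 implementation is the shared input embedding, so that gets copied over as well.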
@patrickvonplaten can you take a look?
@patil-suraj has the correct idea, I think. You can make it even easier by just doing:
t5_model_no_lm_head = T5Model.from_pretrained("<path_to_t5_for_cond_generation>")  # this loads all weights that are present in both models, so it will just skip the LM-head weights
You can verify that this works by doing the following:
from transformers import T5ForConditionalGeneration, T5Model

# save a checkpoint that includes the LM head...
t5_model_with_lm_head = T5ForConditionalGeneration.from_pretrained('t5-small')
t5_model_with_lm_head.save_pretrained("./")
# ...then reload it as a T5Model; the LM-head weights are simply skipped
t5_model_no_lm_head = T5Model.from_pretrained("./")
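To make the check explicit, you can compare the two encoders' state dicts (a small sketch; it assumes the usual .encoder attribute, which both classes expose):

import torch

enc_full = t5_model_with_lm_head.encoder.state_dict()
enc_no_head = t5_model_no_lm_head.encoder.state_dict()
assert enc_full.keys() == enc_no_head.keys()
assert all(torch.equal(enc_full[k], enc_no_head[k]) for k in enc_full)

from_pretrained also logs which checkpoint weights were not used, so you can confirm there which weights were skipped when cross-loading.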
@patrickvonplaten @patil-suraj Thank you both for your responses!