Transformers: Load a T5ForConditionalGeneration's encoder into a T5Model

Created on 21 Jun 2020 · 3 comments · Source: huggingface/transformers

Hi,
I know that T5ForConditionalGeneration is a T5Model with a language-modeling head for decoding. I've fine-tuned a T5ForConditionalGeneration on a seq2seq task, and now I want to use its T5 encoder to initialize the parameters of a T5Model (to further train it on some other task). I read the code but couldn't figure out what I should do. Can you please help me?


All 3 comments

Hi @Palipoor, This might do the trick,

first_model = T5ForConditionalGeneration.from_pretrained('your_finetuned_model')
second_model = T5Model.from_pretrained('t5-small') # assuming 'your_finetuned_model' is t5-small

# get first model's encoder weights
first_model_encoder_state_dict = first_model.encoder.state_dict()

# load first model's encoder weights into second_model's encoder
second_model.encoder.load_state_dict(first_model_encoder_state_dict)

@patrickvonplaten can you take a look ?
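The transfer above can be sanity-checked end to end. Here is a minimal sketch of the same idea, using a tiny randomly initialised T5Config instead of a real checkpoint so it runs without any download; the config sizes (d_model=32, etc.) are arbitrary illustration values, not anything from the thread.

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration, T5Model

# tiny config so both models build instantly; sizes are illustrative only
config = T5Config(
    vocab_size=100, d_model=32, d_kv=16, d_ff=64, num_layers=2, num_heads=2
)
first_model = T5ForConditionalGeneration(config)  # stands in for the fine-tuned model
second_model = T5Model(config)                    # freshly initialised target

# copy the encoder weights across, exactly as in the comment above
second_model.encoder.load_state_dict(first_model.encoder.state_dict())

# every encoder parameter should now match exactly
match = all(
    torch.equal(p, second_model.encoder.state_dict()[name])
    for name, p in first_model.encoder.state_dict().items()
)
print(match)
```

With real checkpoints you would replace the two constructor calls with the `from_pretrained(...)` calls shown above; the `load_state_dict` line is unchanged.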

@patil-suraj has the correct idea I think. You can even make it easier by just doing

t5_model_no_lm_head = T5Model.from_pretrained("<path_to_t5_for_cond_generation>")  # this will load all weights that are present in both models, so it will just skip the lm head weights

You can verify that this works by doing the following:

t5_model_with_lm_head = T5ForConditionalGeneration.from_pretrained('t5-small')
t5_model_with_lm_head.save_pretrained("./")
t5_model_no_lm_head = T5Model.from_pretrained("./")
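The round-trip check above can also be made self-contained. The sketch below uses a tiny random config and a temporary directory instead of downloading 't5-small'; the config values and the temp path are assumptions for illustration, and the final comparison confirms that the encoder weights survive the save/load step unchanged.

```python
import tempfile

import torch
from transformers import T5Config, T5ForConditionalGeneration, T5Model

# tiny illustrative config in place of the real t5-small checkpoint
config = T5Config(
    vocab_size=100, d_model=32, d_kv=16, d_ff=64, num_layers=2, num_heads=2
)
t5_model_with_lm_head = T5ForConditionalGeneration(config)

with tempfile.TemporaryDirectory() as tmp_dir:
    t5_model_with_lm_head.save_pretrained(tmp_dir)
    # loading the full checkpoint into T5Model skips any lm_head weights
    t5_model_no_lm_head = T5Model.from_pretrained(tmp_dir)

# the encoder weights should be byte-for-byte identical after the round trip
encoders_match = all(
    torch.equal(p, t5_model_no_lm_head.encoder.state_dict()[name])
    for name, p in t5_model_with_lm_head.encoder.state_dict().items()
)
print(encoders_match)
```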

@patrickvonplaten @patil-suraj Thank you both for your responses!

