Hi,
I know that T5ForConditionalGeneration is a T5Model with a language-modeling head for decoding. I have a T5ForConditionalGeneration that I've fine-tuned on a seq2seq task, and now I want to use its encoder to initialize the parameters of a T5Model (to train it further on some other task). I read the code but couldn't figure out what I should do. Can you please help me?
Hi @Palipoor, this might do the trick:
from transformers import T5ForConditionalGeneration, T5Model

first_model = T5ForConditionalGeneration.from_pretrained('your_finetuned_model')
second_model = T5Model.from_pretrained('t5-small')  # assuming 'your_finetuned_model' was fine-tuned from t5-small
# get the first model's encoder weights
first_model_encoder_state_dict = first_model.encoder.state_dict()
# load the first model's encoder weights into the second model's encoder
second_model.encoder.load_state_dict(first_model_encoder_state_dict)
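As a quick sanity check, you can compare the two encoders parameter by parameter afterwards (a minimal sketch; it assumes both checkpoints are the same size, so the parameter lists line up):

import torch

for (name_a, param_a), (name_b, param_b) in zip(
        first_model.encoder.named_parameters(),
        second_model.encoder.named_parameters()):
    assert name_a == name_b
    assert torch.equal(param_a, param_b), f"mismatch in {name_a}"

Note that the encoder's state dict also contains embed_tokens.weight, which in the standard T5 implementation is the shared input embedding, so that gets copied over as well.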
@patrickvonplaten can you take a look?
@patil-suraj has the correct idea, I think. You can make it even easier by just doing:
t5_model_no_lm_head = T5Model.from_pretrained("<path_to_t5_for_cond_generation>")  # this loads all weights that are present in both models, so it will just skip the LM-head weights
You can verify that this works by doing the following:
from transformers import T5ForConditionalGeneration, T5Model

# save a checkpoint that includes the LM head...
t5_model_with_lm_head = T5ForConditionalGeneration.from_pretrained('t5-small')
t5_model_with_lm_head.save_pretrained("./")
# ...then reload it as a T5Model; the LM-head weights are simply skipped
t5_model_no_lm_head = T5Model.from_pretrained("./")
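To make the check explicit, you can compare the two encoders' state dicts (a small sketch; it assumes the usual .encoder attribute, which both classes expose):

import torch

enc_full = t5_model_with_lm_head.encoder.state_dict()
enc_no_head = t5_model_no_lm_head.encoder.state_dict()
assert enc_full.keys() == enc_no_head.keys()
assert all(torch.equal(enc_full[k], enc_no_head[k]) for k in enc_full)

from_pretrained also logs which checkpoint weights were not used, so you can confirm there which weights were skipped when cross-loading.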
@patrickvonplaten @patil-suraj Thank you both for your responses!