I am using EncoderDecoderModel and I have tested the sample code of it which is written in its page.
from transformers import EncoderDecoderModel, BertTokenizer
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    'bert-base-uncased', 'bert-base-uncased')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)
output = model(input_ids=input_ids, decoder_input_ids=input_ids)[0]
but every time I run this code I get different values for the output! I have also tried model.eval(), but it didn't help.
Could you please provide the whole code you use? Your structure works fine for me; this code outputs the same values:
from transformers import EncoderDecoderModel, BertTokenizer
import torch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    'bert-base-uncased', 'bert-base-uncased')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)
outputs = []
for _ in range(5):
    result = model(input_ids=input_ids, decoder_input_ids=input_ids)[0]
    outputs.append(result)
outputs
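To verify that the five results are identical rather than inspecting the tensors by eye, a quick check (just a sketch) is:

# All five forward passes use the same weights, so the tensors should match exactly
print(all(torch.equal(outputs[0], o) for o in outputs[1:]))  # expected: True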
I see, on each step you initialize your EncoderDecoder model. AFAIU the difference is caused by randomly initialized layers in the decoder of this architecture. You can check it with this code:
params1, params2, models = [], [], []
for _ in range(2):
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = EncoderDecoderModel.from_encoder_decoder_pretrained(
        'bert-base-uncased', 'bert-base-uncased')
    models.append(model)

# Collect the decoder parameters of the first model
pars = models[0].decoder.bert.encoder.parameters()
for _ in range(1000):
    try:
        params1.append(next(pars))
    except StopIteration:
        break

# Collect the decoder parameters of the second model
pars = models[1].decoder.bert.encoder.parameters()
for _ in range(1000):
    try:
        params2.append(next(pars))
    except StopIteration:
        break

# Element-wise comparison: the False entries correspond to the layers that were
# randomly initialized rather than loaded from the pretrained checkpoint
[torch.all(params1[i] == params2[i]).item() for i in range(len(params1))]
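For a more targeted check you could compare only the cross-attention weights directly. This is a sketch that assumes the usual BertLayer layout (`layer[i].crossattention`) that the decoder gets when it is configured with cross-attention:

# Compare one cross-attention weight between the two instantiations.
# The attribute path is an assumption based on the standard BERT decoder layout.
w0 = models[0].decoder.bert.encoder.layer[0].crossattention.self.query.weight
w1 = models[1].decoder.bert.encoder.layer[0].crossattention.self.query.weight
print(torch.equal(w0, w1))  # expected False: these weights are freshly initialized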
Thanks for answering @Aktsvigun ! Yes, in the encoder-decoder framework, when you instantiate an encoder-decoder model from two pretrained BERT models, the cross-attention layer weights are added and randomly initialized. This is expected behavior. When you set your log level to INFO, you will receive a notification about this as well :-)
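A minimal sketch of how to see that message (assuming the usual Python logging setup; newer transformers releases also expose a dedicated verbosity helper):

import logging
logging.basicConfig(level=logging.INFO)  # older transformers versions log via the std logging module

# On recent versions you can instead use transformers' own helper:
# import transformers
# transformers.logging.set_verbosity_info()

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    'bert-base-uncased', 'bert-base-uncased')
# You should now see a message that some decoder weights
# (the cross-attention layers) were newly initialized.

If you need identical outputs across runs, calling torch.manual_seed(...) before instantiating the model should make the random initialization reproducible, or you can save the combined model once with save_pretrained and reload it afterwards.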