Transformers: Get prediction_scores from BART forward method

Created on 4 Jul 2020 · 3 comments · Source: huggingface/transformers

โ“ Questions & Help

Details


I'm trying to implement a model by fine-tuning BART for a dialogue task. This is my sample code:

from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
inputs = tokenizer.encode(" Hello, my dog is cute", return_tensors="pt")
decoder_input = tokenizer.encode(" Oh! I don't know that you have dog! How is it?", return_tensors="pt")
output = model(input_ids=inputs, decoder_input_ids=decoder_input)[0]

I want to get prediction scores for the tokens, so I expected output to have shape [1, 17, 50265] (17 being the number of tokens in decoder_input, 50265 the vocabulary size), but it has shape [1, 1, 50265].

How can I get the prediction_scores?

Most helpful comment

Great catch. I think you are spot on that the API changed a bit in 3.0. We should have documented it better.

If you pass `use_cache=False` to `model()`, this problem goes away (`use_cache` is set to True by default to speed up seq2seq tasks). You can also pass `use_cache=False` to `from_pretrained`, as shown below:

from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large', use_cache=False)
inputs = tokenizer.encode(" Hello, my dog is cute", return_tensors="pt")
decoder_input = tokenizer.encode(" Oh! I don't know that you have dog! How is it?", return_tensors="pt")
output = model(input_ids=inputs, decoder_input_ids=decoder_input)[0]
assert output.shape[1] == 17  # passes

All 3 comments

I just noticed the same.

It seems that you have to include the `labels` parameter as well to get the predictions for your `decoder_input_ids`. You didn't have to do this in versions prior to 3.0, so it appears that something changed to make it required now.
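A minimal sketch of that workaround, assuming transformers 3.0 and simply reusing the decoder input ids as `labels` for illustration:

from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
inputs = tokenizer.encode(" Hello, my dog is cute", return_tensors="pt")
decoder_input = tokenizer.encode(" Oh! I don't know that you have dog! How is it?", return_tensors="pt")
# With labels supplied, the output tuple starts with (loss, prediction_scores, ...)
outputs = model(input_ids=inputs, decoder_input_ids=decoder_input, labels=decoder_input)
loss, prediction_scores = outputs[0], outputs[1]
print(prediction_scores.shape)  # expected: torch.Size([1, 17, 50265])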

Maybe @sshleifer knows since I think he's been doing quite a bit of work on the summarization bits as of late.

Great catch. I think you are spot on that the API changed a bit in 3.0. We should have documented it better.

If you pass `use_cache=False` to `model()`, this problem goes away (`use_cache` is set to True by default to speed up seq2seq tasks). You can also pass `use_cache=False` to `from_pretrained`, as shown below:

from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large', use_cache=False)
inputs = tokenizer.encode(" Hello, my dog is cute", return_tensors="pt")
decoder_input = tokenizer.encode(" Oh! I don't know that you have dog! How is it?", return_tensors="pt")
output = model(input_ids=inputs, decoder_input_ids=decoder_input)[0]
assert output.shape[1] == 17  # passes
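As a quick follow-up (not from the thread), one way to turn these prediction scores into per-position token predictions, reusing `output` and `tokenizer` from the snippet above:

import torch

probs = torch.softmax(output, dim=-1)   # [1, 17, 50265] per-position token probabilities
predicted_ids = probs.argmax(dim=-1)    # most likely token id at each decoder position
print(tokenizer.decode(predicted_ids[0]))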


Thanks a lot! I think it would be good to publish a post that explains the differences between version 2 and 3.0.
