Is there a way to generate with pre-trained BART, like the one in
https://huggingface.co/blog/how-to-generate ?
I am currently using BART for a generation task, but I am fine-tuning it.
I was wondering if it's possible to see generation results from pre-trained BART.
Bart is an encoder-decoder model, so it should rather be used for translating one sequence into another. This means that the generation method expects input_ids and creates decoder_input_ids.
Maybe you can take a look at this: https://sshleifer.github.io/blog_v2/jupyter/2020/03/12/bart.html
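For illustration, a minimal sketch of that sequence-to-sequence workflow, using the bart-large-cnn summarization checkpoint that comes up below (the input text is just a placeholder):

from transformers import AutoTokenizer, BartForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained('bart-large-cnn')
model = BartForConditionalGeneration.from_pretrained('bart-large-cnn')

# Encode the source sequence; generate() creates the decoder_input_ids internally.
input_ids = tokenizer.batch_encode_plus(["Some article text to summarize."], return_tensors='pt')['input_ids']
output = model.generate(input_ids=input_ids, max_length=50, num_beams=4)
print(tokenizer.decode(output[0]))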
I think I might have found a potential issue with BartForConditionalGeneration. In a zero-shot setup, the vanilla bart-large model produces gibberish, while bart-large-cnn can generate fluent language. I think the problem is the default value of the output_past attribute of BartConfig.
Example:
from transformers import AutoTokenizer, BartForConditionalGeneration

model_name_or_path = 'bart-large'
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = BartForConditionalGeneration.from_pretrained(model_name_or_path)

text = "Trump falsely denied that he claimed governors from certain states"
# Encode the source text; generate() builds the decoder inputs itself.
input_ids = tokenizer.batch_encode_plus([text], return_tensors='pt')['input_ids']
output = model.generate(input_ids=input_ids, max_length=50, num_beams=1)
print(tokenizer.decode(output[0]))
If model_name_or_path="bart-large", the result will be <s>Mr\'<s>Mr\'Mr"Mr""<s>Mr"Mr"\'Mr"<s>Mr"<s>Mr"<s>Mr"<s>Mr"Mr"<s>Mr"<s>Mr\'Mr"\'Mr"Mr"\'Mr"Mr.
If it is set to bart-large-cnn, the result will be </s><s><s><s>Trump falsely denied that he claimed governors from certain states. Trump falsely denied he claimed that he had been in contact with governors from some states. He also falsely denied saying he had met with governors of certain states in the past. Trump
But once I override the output_past flag in the config, the result of bart-large will be normal:
from transformers import BartConfig

config = BartConfig.from_pretrained('bart-large')
config.output_past = True  # cache decoder states during autoregressive generation
model = BartForConditionalGeneration.from_pretrained(model_name_or_path, config=config)
...
Result would be: <s>MrThreatening to deport immigrants from certain states</s>
This seems to be related to autoregressive decoding, where the decoder states need to be cached. I'm not sure if this is intended so that bart-large is always used as a masked language model; correct me if I'm wrong.
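To make the caching point concrete, here is a rough sketch of a manual greedy decoding loop that reuses cached decoder states. It assumes a later transformers release, where the output_past flag was replaced by use_cache, the checkpoint lives under facebook/bart-large, and the model returns past_key_values; it illustrates the mechanism rather than the exact internals of generate():

import torch
from transformers import AutoTokenizer, BartForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')

input_ids = tokenizer("Some source text", return_tensors='pt').input_ids
# Run the encoder once; only the decoder steps autoregressively.
encoder_outputs = model.get_encoder()(input_ids, return_dict=True)

decoder_ids = torch.tensor([[model.config.decoder_start_token_id]])
past = None
for _ in range(20):
    # With caching enabled, each step feeds only the newest decoder token.
    step_input = decoder_ids if past is None else decoder_ids[:, -1:]
    out = model(encoder_outputs=encoder_outputs, decoder_input_ids=step_input,
                past_key_values=past, use_cache=True, return_dict=True)
    past = out.past_key_values
    next_token = out.logits[:, -1].argmax(dim=-1, keepdim=True)
    decoder_ids = torch.cat([decoder_ids, next_token], dim=-1)
print(tokenizer.decode(decoder_ids[0]))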
Thanks Xinyu. I owe you a drink :)
@sshleifer - maybe you can answer this better than I can
@patrickvonplaten
>>> model = BartForConditionalGeneration(model_name_or_path, config=c)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __init__() got multiple values for argument 'config'
Getting this error. Also, is there a way to force a generation to contain prefix tokens? I know fairseq has this feature.
@tuhinjubcse
You need to use from_pretrained. You can pass in configuration options as keyword arguments: BartForConditionalGeneration.from_pretrained(model_name, **c.__dict__)
You can pass the decoder_start_input_ids kwarg to generate.
@XinyuHua you are correct!
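On the prefix question, a hedged sketch: in later transformers releases, generate() for encoder-decoder models also accepts decoder_input_ids, which lets you seed the output with a multi-token prefix that the model then continues; the checkpoint and texts below are just illustrations:

from transformers import AutoTokenizer, BartForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-cnn')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')

input_ids = tokenizer("Some source text to condition on", return_tensors='pt').input_ids
# Tokenize the desired prefix without special tokens so generation continues from it.
prefix = "The output should start with"
decoder_input_ids = tokenizer(prefix, return_tensors='pt', add_special_tokens=False).input_ids
output = model.generate(input_ids=input_ids, decoder_input_ids=decoder_input_ids, max_length=50)
print(tokenizer.decode(output[0]))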
Idk the results look pretty bad to me @sshleifer
from transformers import AutoTokenizer, BartForConditionalGeneration, BartConfig

c = BartConfig.from_pretrained('bart-large')
c.output_past = True

model_name_or_path = 'bart-large'
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = BartForConditionalGeneration.from_pretrained(model_name_or_path, config=c)

text = "Milton scrunched his eyes and moodily turned back to his computer like a"
input_ids = tokenizer.batch_encode_plus([text], return_tensors='pt')['input_ids']
# Sample a continuation with top-k sampling.
output = model.generate(input_ids=input_ids, do_sample=True, max_length=50, top_k=5, temperature=0.7)
print(tokenizer.decode(output[0]))
The output I got is MrMilton
I'm not super surprised, since 'bart-large' is not finetuned on a generative task.
@sshleifer do you suggest using a different checkpoint or model?
The reason I am asking is that I am fine-tuning on a novel dataset created for a task,
but I need a baseline showing how pretrained BART does, because based on GPT-2 it seems to do decently on generative tasks.
I think it depends on the task, but I haven't tried using Bart for the "text continuation" type of workflow. CTRL, GPT-2, or T5 could work better.
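For comparison, a minimal text-continuation sketch with GPT-2 in the style of the how-to-generate post linked above, reusing the same sampling settings as the BART attempt (the prompt is just a placeholder):

from transformers import AutoTokenizer, GPT2LMHeadModel

tokenizer = AutoTokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

prompt = "Milton scrunched his eyes and moodily turned back to his computer like a"
input_ids = tokenizer.encode(prompt, return_tensors='pt')
# Decoder-only models continue the prompt directly, so the prefix is preserved.
output = model.generate(input_ids, do_sample=True, max_length=50, top_k=5, temperature=0.7)
print(tokenizer.decode(output[0]))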
@sshleifer Let me be a bit clearer.
I wanted to do something like
text_input = “Milton scrunched his eyes and moodily turned back to his computer helpless”
text_output = “Milton scrunched his eyes and moodily turned back to his computer like a”
I want my output to contain text_output as a prefix.
Normally, when I was fine-tuning BART, I had paired data:
Milton scrunched his eyes and moodily turned back to his computer helpless----->Milton scrunched his eyes and moodily turned back to his computer like a despondent child
The generation result was:
Milton scrunched his eyes and moodily turned back to his computer like a child caught in the headlights
I want to be able to get some results without fine-tuning and just using pretrained BART to compare. How do I do that?
The short answer is that I don't know; we don't have that use case supported with Bart.
For now I am going to close this, but feel free to open a discussion issue about your task.