Model I am using (Bert, XLNet ...): BART
Language I am using the model on (English, Chinese ...): English
The problem arises when using: my own script (mask infilling via generate(), shown below).
The task I am working on is: mask infilling with the 'bart-large-cnn' checkpoint.
Steps to reproduce the behavior:
from transformers import AutoModelWithLMHead, AutoTokenizer
from transformers.configuration_bart import BartConfig
config = BartConfig(vocab_size=50264, output_past=True)
model = AutoModelWithLMHead.from_pretrained('bart-large-cnn', config=config)
tokenizer = AutoTokenizer.from_pretrained('bart-large-cnn')
ARTICLE_TO_SUMMARIZE = "My friends are <mask> but they eat too many carbs."
inputs = tokenizer.batch_encode_plus([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors='pt')
generated_ids = model.generate(inputs['input_ids'], attention_mask=inputs['attention_mask'], num_return_sequences=4)
print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in generated_ids])
I'd expect some sort of infilling to occur, but instead I see the error:
RuntimeError Traceback (most recent call last)
<ipython-input-13-bad65359ada6> in <module>
10 inputs = tokenizer.batch_encode_plus([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors='pt')
11
---> 12 generated_ids = model.generate(inputs['input_ids'], attention_mask=inputs['attention_mask'], num_return_sequences=4)
13 print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in generated_ids])
~/.local/lib/python3.6/site-packages/torch/autograd/grad_mode.py in decorate_no_grad(*args, **kwargs)
47 def decorate_no_grad(*args, **kwargs):
48 with self:
---> 49 return func(*args, **kwargs)
50 return decorate_no_grad
51
~/.local/lib/python3.6/site-packages/transformers/modeling_bart.py in generate(self, input_ids, attention_mask, max_length, num_beams, repetition_penalty, length_penalty, num_return_sequences, min_len, no_repeat_ngram_size)
1106 input_ids, decoder_cache, decoder_input_ids, attention_mask,
1107 )
-> 1108 outputs = self(**model_inputs)
1109 lprobs = F.log_softmax(outputs[0][:, -1, :], dim=-1)
1110
~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
539 result = self._slow_forward(*input, **kwargs)
540 else:
--> 541 result = self.forward(*input, **kwargs)
542 for hook in self._forward_hooks.values():
543 hook_result = hook(self, input, result)
~/.local/lib/python3.6/site-packages/transformers/modeling_bart.py in forward(self, input_ids, attention_mask, encoder_outputs, decoder_input_ids, decoder_attention_mask, decoder_cached_states, lm_labels, **unused)
932 encoder_outputs=encoder_outputs,
933 decoder_attention_mask=decoder_attention_mask,
--> 934 decoder_cached_states=decoder_cached_states,
935 )
936 lm_logits = self.lm_head.forward(outputs[0])
~/.local/lib/python3.6/site-packages/transformers/modeling_bart.py in forward(self, input_ids, attention_mask, decoder_input_ids, encoder_outputs, decoder_attention_mask, decoder_cached_states)
837 assert decoder_input_ids is not None
838 if encoder_outputs is None:
--> 839 encoder_outputs = self.encoder.forward(input_ids=input_ids, attention_mask=attention_mask)
840 assert isinstance(encoder_outputs, tuple)
841 # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
~/.local/lib/python3.6/site-packages/transformers/modeling_bart.py in forward(self, input_ids, attention_mask)
272 During training might not be of length n_layers because of layer dropout.
273 """
--> 274 inputs_embeds = self.embed_tokens(input_ids)
275 embed_pos = self.embed_positions(input_ids)
276 x = inputs_embeds + embed_pos
~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
539 result = self._slow_forward(*input, **kwargs)
540 else:
--> 541 result = self.forward(*input, **kwargs)
542 for hook in self._forward_hooks.values():
543 hook_result = hook(self, input, result)
~/.local/lib/python3.6/site-packages/torch/nn/modules/sparse.py in forward(self, input)
112 return F.embedding(
113 input, self.weight, self.padding_idx, self.max_norm,
--> 114 self.norm_type, self.scale_grad_by_freq, self.sparse)
115
116 def extra_repr(self):
~/.local/lib/python3.6/site-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1482 # remove once script supports set_grad_enabled
1483 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1484 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1485
1486
RuntimeError: index out of range: Tried to access index 50264 out of table with 50263 rows. at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418
Looks to me like the <mask> token ID (50264) is out of bounds?
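A quick way to see the mismatch the traceback points at, reusing the same checkpoint (the printed vocab_size is what I'd expect for 'bart-large-cnn'; treat the exact number as an assumption):

from transformers import AutoModelWithLMHead, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('bart-large-cnn')
model = AutoModelWithLMHead.from_pretrained('bart-large-cnn')
print(tokenizer.mask_token_id)   # 50264
print(model.config.vocab_size)   # 50264 here, so valid input ids stop at 50263
# i.e. the tokenizer hands out an id that has no row in the embedding table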
transformers version: a088d75e510d5641808ccd72f5dca4df36d95b8e
If you install from master, it seems to work on 'bart-large'. Seems like it's only an issue with 'bart-large-cnn':
from transformers import BartForMaskedLM, BartTokenizer
tokenizer = BartTokenizer.from_pretrained('bart-large')
model = BartForMaskedLM.from_pretrained('bart-large', output_past=True)
ARTICLE_TO_SUMMARIZE = "My friends are <mask> but they eat too many carbs."
inputs = tokenizer.batch_encode_plus([ARTICLE_TO_SUMMARIZE], max_length=1024, return_tensors='pt')
generated_ids = model.generate(inputs['input_ids'], attention_mask=inputs['attention_mask'], num_return_sequences=4)
print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in generated_ids])
output:
['My kids are good, but they eat too many carbs. My friends are good.', 'My kids are good, but they eat too many carbs. My friends are good.', 'My kids are good, but they eat too many carbs. My friends are good.', 'My kids are good, but they eat too many carbs. My friends are good.']
Bart-large-cnn doesn't have a mask_token_id, which is admittedly confusing.
This is how I would do mask filling:
from transformers import AutoTokenizer, BartForMaskedLM
model = BartForMaskedLM.from_pretrained('bart-large')
tokenizer = AutoTokenizer.from_pretrained('bart-large')
ARTICLE_TO_SUMMARIZE = "My friends are <mask> but they eat too many carbs."
inputs = tokenizer.batch_encode_plus([ARTICLE_TO_SUMMARIZE], return_tensors='pt')
input_ids = inputs['input_ids']
# alternatively, pass the attention mask explicitly:
# logits = model(input_ids, attention_mask=inputs['attention_mask'])[0]
logits = model(input_ids)[0]
masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
probs = logits[0, masked_index].softmax(dim=0)
values, predictions = probs.topk(10)
tokenizer.decode(predictions).split()
# ['good', 'great', 'all', 'really', 'very', 'healthy', 'also', 'not', 'the', 'doing']
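Continuing from the variables above, one way to splice the top prediction back into the sentence (just a convenience step, reusing the same masked_index):

input_ids[0, masked_index] = predictions[0]
print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
# something like: "My friends are good but they eat too many carbs."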
One-liner courtesy of @julien-c:
from transformers import pipeline
nlp = pipeline('fill-mask', 'bart-large')
nlp("My friends are <mask> but they eat too many carbs.")
Thanks @sshleifer, that will do the trick!
The following does work:
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained('bart-large-cnn')
>>> tokenizer.mask_token_id
50264
...which is a bit counterintuitive as it implies that <mask> _is_ available. It's also not clear from the docs that bart-large can be used successfully with BartForMaskedLM.
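Given that, a small defensive check before calling generate() can save some head-scratching. This is a hypothetical helper, not part of transformers; it just compares the ids discussed above:

def assert_mask_in_vocab(tokenizer, model):
    # Hypothetical guard: fail early if the tokenizer's <mask> id has no row
    # in the model's embedding table (as happens with 'bart-large-cnn' above).
    if tokenizer.mask_token_id is not None and tokenizer.mask_token_id >= model.config.vocab_size:
        raise ValueError(
            f"mask_token_id {tokenizer.mask_token_id} is out of range for "
            f"vocab_size {model.config.vocab_size}"
        )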