Transformers: BartTokenizer prepare_seq2seq_batch() does not return decoder_input_ids and decoder_attention_mask as documented when tgt_texts is passed

Created on 16 Oct 2020 · 5 Comments · Source: huggingface/transformers

I am trying to train a seq2seq model using BartModel. As per the BartTokenizer documentation, if I pass tgt_texts it should also return decoder_attention_mask and decoder_input_ids; please check the attachment for clarity.
(screenshot: the BartTokenizer.prepare_seq2seq_batch documentation listing decoder_input_ids and decoder_attention_mask among the returned keys)
But I am only getting input_ids, attention_mask and labels.
(screenshot: the returned batch containing only input_ids, attention_mask, and labels)
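For reference, a minimal sketch that reproduces what I am seeing (the facebook/bart-base checkpoint and the example sentences are just illustrative choices):

```python
from transformers import BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")

batch = tokenizer.prepare_seq2seq_batch(
    src_texts=["An example source sentence."],
    tgt_texts=["An example target sentence."],
    return_tensors="pt",
)

# Observed: only input_ids, attention_mask and labels are present,
# i.e. no decoder_input_ids / decoder_attention_mask as the docs suggest.
print(batch.keys())
```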


All 5 comments

I am facing the same issue, and I noticed that the method indeed returns the ["input_ids"] of tgt_texts as labels. I think I could easily fix this to return both the input_ids and attention_mask of tgt_texts (as decoder_...), but I noticed the same pattern in other seq2seq models, like T5. I am not sure what the proper solution is, but if it is similar to what I suggest, then I'd be happy to make a pull request.

@LysandreJik I'd be happy to hear an opinion and start working on this.

I think https://github.com/huggingface/transformers/pull/6654/ and https://github.com/huggingface/transformers/issues/6624 are related - the PR changed decoder_input_ids to labels. The documentation should probably be updated, but I have to get more familiar with the respective issue and PR to be sure.

Thanks for the feedback @freespirit. Hopefully they will update the documentation, as it is a little bit confusing. But what I found is that modeling_bart.py already handles the problem: _prepare_bart_decoder_inputs() and shift_tokens_right() take care of it, if I am not wrong. But I think I have to dig deeper to understand it, which I am trying to do.
(screenshots: the _prepare_bart_decoder_inputs() and shift_tokens_right() definitions in modeling_bart.py)
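For anyone reading along, this is roughly what shift_tokens_right in modeling_bart.py does (a sketch from the source, not necessarily a verbatim copy):

```python
import torch

def shift_tokens_right(input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    """Shift input ids one token to the right, wrapping the last non-pad token
    (usually EOS) around to position 0. This is how decoder_input_ids are
    derived from the target input_ids / labels."""
    prev_output_tokens = input_ids.clone()
    # Index of the last non-pad token in each row (the EOS token).
    index_of_eos = (input_ids.ne(pad_token_id).sum(dim=1) - 1).unsqueeze(-1)
    # Put EOS at position 0, then shift everything else one step to the right.
    prev_output_tokens[:, 0] = input_ids.gather(1, index_of_eos).squeeze()
    prev_output_tokens[:, 1:] = input_ids[:, :-1]
    return prev_output_tokens
```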

Pinging @sshleifer for advice

@MojammelHossain is correct; the docs are wrong.
The correct usage is to let _prepare_bart_decoder_inputs build decoder_input_ids and decoder_attention_mask for you. For training, you only need to pass the three keys returned by prepare_seq2seq_batch.
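In other words, a training step can look roughly like this (a sketch assuming a transformers 3.x install; the checkpoint name and the optimizer settings are just placeholders):

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

batch = tokenizer.prepare_seq2seq_batch(
    src_texts=["An example source sentence."],
    tgt_texts=["An example target sentence."],
    return_tensors="pt",
)

# Pass only the three keys from prepare_seq2seq_batch; BART builds the
# decoder inputs internally (via _prepare_bart_decoder_inputs /
# shift_tokens_right) when only labels are given.
outputs = model(
    input_ids=batch["input_ids"],
    attention_mask=batch["attention_mask"],
    labels=batch["labels"],
    return_dict=True,
)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```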
