I am having trouble understanding how to set up BERT for a classification task like STS, for example, inputting two sentences and getting a classification of some sort. I am using BertForSequenceClassification for this purpose. However, what puzzles me is how to set up attention_mask and token_type_ids when using padding.
Let's assume two sentences: I made a discovery. and I discovered something.
Currently, I prepare the input as follows (assume padding):

[CLS] I made a discovery. [SEP] I discovered something. [SEP] [PAD] [PAD] [PAD]

- token_type_ids: 0 for everything up to and including the first [SEP], after which everything is marked as 1 (padding included).
- attention_mask: 1 for everything except the padding.

And, of course, the labels are trivial, as they are not affected by padding. Is anything wrong with my setup? Am I missing anything?
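For reference, here is a minimal sketch of how I'd verify this with the library's tokenizer (assuming a reasonably recent transformers version; max_length=16 is an arbitrary choice, just enough to force padding):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Encode the sentence pair with padding to a fixed length.
encoded = tokenizer(
    "I made a discovery.",
    "I discovered something.",
    padding="max_length",
    max_length=16,
)

print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
print(encoded["token_type_ids"])   # 0 up to and including the first [SEP], then 1
print(encoded["attention_mask"])   # 1 for real tokens, 0 for [PAD]
```

One small detail: if I'm not mistaken, the tokenizer itself pads token_type_ids with 0 rather than 1, but since those positions are zeroed out by attention_mask anyway, marking padding as 1 should work just as well.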
Hi! Yes, I think your understanding is correct. Your setup seems fine to me!
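For completeness, a minimal sketch of feeding such a pair through BertForSequenceClassification (assuming transformers v4+, where the model returns an output object with .loss and .logits; num_labels=2 and the label value here are placeholders for whatever your task defines, and STS-B proper is usually framed as regression with num_labels=1):

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# The tokenizer builds input_ids, token_type_ids, and attention_mask
# exactly as described in the question above.
encoded = tokenizer(
    "I made a discovery.",
    "I discovered something.",
    padding="max_length",
    max_length=16,
    return_tensors="pt",
)

labels = torch.tensor([1])  # placeholder label; the scheme depends on your task
outputs = model(**encoded, labels=labels)
print(outputs.loss, outputs.logits)
```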