Transformers: BERT with sequence pairs & padding

Created on 9 Aug 2019 · 1 comment · Source: huggingface/transformers

โ“ Questions & Help

I am having trouble understanding how to set up BERT for a sentence-pair classification task such as STS, i.e. feeding in two sentences and getting some kind of classification out. I am using BertForSequenceClassification for this purpose. However, what I am unsure about is how to set up attention_mask and token_type_ids when padding is used.

Let's assume two sentences: "I made a discovery." and "I discovered something."

Currently, I prepare the input as follows (assuming padding):

  1. Input IDs (encoded): [CLS] I made a discovery. [SEP] I discovered something. [SEP] [PAD] [PAD] [PAD]
  2. token_type_ids: 0 for everything up to and including the first [SEP], after which everything is marked as 1 (padding included).
  3. attention_mask: 1 for everything but the padding.

And, of course, labels are trivial as they are not affected by padding. Anything wrong with my setup? Am I missing anything?
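
For reference, here is a minimal sketch of what I mean, assuming a recent transformers release where the tokenizer callable handles sentence pairs, padding, token_type_ids, and attention_mask directly; the model name, max_length, num_labels, and the label value are just placeholders:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Encode the sentence pair; padding to a fixed length so batches line up.
enc = tokenizer(
    "I made a discovery.",
    "I discovered something.",
    padding="max_length",
    max_length=16,
    return_tensors="pt",
)

# token_type_ids: 0 up to and including the first [SEP], 1 for the second
# sentence; attention_mask: 1 for real tokens, 0 for [PAD] (padded positions
# are masked out, so their segment id has no effect on the other tokens).
print(enc["input_ids"])
print(enc["token_type_ids"])
print(enc["attention_mask"])

labels = torch.tensor([1])  # e.g. 1 = "similar / paraphrase"
outputs = model(**enc, labels=labels)
print(outputs.loss, outputs.logits)
```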

All comments

Hi! Yes, I think your understanding is correct. Your setup seems fine to me!
