I am having trouble understanding how to set up BERT for a classification task like STS, for example, inputting two sentences and getting a classification of some sort. I am using BertForSequenceClassification for this purpose. However, what puzzles me is how to set up attention_mask and token_type_ids when using padding.
Let's assume two sentences: I made a discovery. and I discovered something.
Currently, I prepare the input as follows (assume padding):

[CLS] I made a discovery. [SEP] I discovered something. [SEP] [PAD] [PAD] [PAD]

- token_type_ids: 0 for everything up to and including the first [SEP], after which everything is marked as 1 (padding included).
- attention_mask: 1 for everything except the padding.

And, of course, the labels are trivial, as they are not affected by padding. Is anything wrong with my setup? Am I missing anything?
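For reference, here is a minimal sketch of how I'd verify this with the library's tokenizer (assuming a reasonably recent transformers version; max_length=16 is an arbitrary choice, just enough to force padding):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Encode the sentence pair with padding to a fixed length.
encoded = tokenizer(
    "I made a discovery.",
    "I discovered something.",
    padding="max_length",
    max_length=16,
)

print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
print(encoded["token_type_ids"])   # 0 up to and including the first [SEP], then 1
print(encoded["attention_mask"])   # 1 for real tokens, 0 for [PAD]
```

One small detail: if I'm not mistaken, the tokenizer itself pads token_type_ids with 0 rather than 1, but since those positions are zeroed out by attention_mask anyway, marking padding as 1 should work just as well.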
Hi! Yes, I think your understanding is correct. Your setup seems fine to me!
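For completeness, a minimal sketch of feeding such a pair through BertForSequenceClassification (assuming transformers v4+, where the model returns an output object with .loss and .logits; num_labels=2 and the label value here are placeholders for whatever your task defines, and STS-B proper is usually framed as regression with num_labels=1):

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# The tokenizer builds input_ids, token_type_ids, and attention_mask
# exactly as described in the question above.
encoded = tokenizer(
    "I made a discovery.",
    "I discovered something.",
    padding="max_length",
    max_length=16,
    return_tensors="pt",
)

labels = torch.tensor([1])  # placeholder label; the scheme depends on your task
outputs = model(**encoded, labels=labels)
print(outputs.loss, outputs.logits)
```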