Hi! We're scratching our heads over the way RoBERTa handles its inputs.
The following matrix is of size 514x768:
from fairseq.models.roberta import RobertaModel
model = RobertaModel.from_pretrained("../roberta.base")
print(model.model.decoder.sentence_encoder.embed_positions.weight.size())
# torch.Size([514, 768])
Why is it different from the maximum sequence length, which is 512? Furthermore, we observe that the second row of this matrix (index 1) is full of zeros:
print(model.model.decoder.sentence_encoder.embed_positions.weight[1, :])
# tensor([0., 0., 0., 0., 0., 0., 0., 0. ...
Why is that? Thank you.
Yes, padding_idx is 1 here (as it usually is in fairseq), so the row at that index is always a vector of all zeros. The positional embeddings for actual positions then start at padding_idx + 1, i.e. indices 2 through 513, which is why the table has 512 + 2 = 514 rows. Hope this clears it up!
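(For anyone reading along: the same behaviour can be reproduced with a plain torch.nn.Embedding, which is essentially what fairseq's learned positional embedding builds on; the sizes below just mirror roberta.base.)
import torch.nn as nn
# num_embeddings = max_positions + padding_idx + 1 = 512 + 1 + 1 = 514
pos_emb = nn.Embedding(514, 768, padding_idx=1)
print(pos_emb.weight.size())
# torch.Size([514, 768])
print(pos_emb.weight[1].abs().sum().item())
# 0.0  <- the row at padding_idx is initialized to zeros and its gradient is always zeroed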
Ah okay, that makes sense. In that case, what is the first row (index 0)? Is it full of randomly initialized values? If so, why not use padding_idx = 0? Thank you for your answer.
Yeah, the first vector contains randomly initialized values that never get used. There's no particular reason why padding_idx is 1 instead of 0, other than that being the index the padding token gets in the dictionary. We need to use the same padding_idx value for both embed_tokens and embed_positions.
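In case it helps, here is a rough sketch of how the position numbers end up starting at padding_idx + 1 (roughly what fairseq does internally; the token ids are just an illustrative, already-BPE-encoded batch, not taken from a real run):
import torch
padding_idx = 1
# a right-padded batch of token ids; 1 is the padding symbol
tokens = torch.tensor([[0, 31414, 232, 2, 1, 1],
                       [0, 31414, 232, 328, 2, 1]])
# non-pad tokens get positions padding_idx + 1, padding_idx + 2, ...;
# pad tokens keep position padding_idx, whose embedding row is all zeros
mask = tokens.ne(padding_idx).long()
positions = torch.cumsum(mask, dim=1) * mask + padding_idx
print(positions)
# tensor([[2, 3, 4, 5, 1, 1],
#         [2, 3, 4, 5, 6, 1]])
Index 0 never appears among the computed positions, which is why the first row of embed_positions.weight is never looked up.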
Okay, I understand. Thank you very much for your help!