Hello,
I'm trying to fine-tune RoBERTa for a sentence-pair classification task. With BERT, I used the token_type_ids to identify sentence A and B. But it seems that the RoBERTa token type embedding is configured with a vocabulary of size 1, from what I understand of the model summary: (token_type_embeddings): Embedding(1, 768).
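For reference, this is roughly how I'm looking at it (a minimal sketch, assuming the roberta-base checkpoint from the transformers library):

```python
from transformers import RobertaModel

# Load the pretrained model and inspect its token type embedding table.
model = RobertaModel.from_pretrained("roberta-base")
print(model.embeddings.token_type_embeddings)  # -> Embedding(1, 768)
```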
So, does RoBERTa need token_type_ids? If not, why is there an embedding layer for token_type_ids?
The documentation of the RobertaModel class does not mention the token_type_ids argument that appears among its parameters: modeling_roberta.py.
Thank you in advance.
RoBERTa does not use token_type_ids. We made the choice to still have an embedding layer (which is all zeros, so it doesn't contribute anything additively) so that we can use the exact same implementation as BERT.
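In practice, you can just encode the pair directly. Here is a minimal sketch of a sentence-pair forward pass (assuming a recent transformers version and the roberta-base checkpoint; the sentences are illustrative): the tokenizer joins the pair with separator tokens itself, so token_type_ids can simply be omitted.

```python
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# Passing both sentences lets the tokenizer build <s> A </s></s> B </s>;
# no token_type_ids are returned or needed (RoBERTa treats everything as segment 0).
inputs = tokenizer("Sentence A goes here.", "Sentence B goes here.", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # (1, num_labels)
```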
Understood, thanks for the quick answer! :)