Transformers: ❓ [BART] Different embedding sizes between pre-trained / fine-tuned checkpoint

Created on 25 May 2020  ·  3 Comments  ·  Source: huggingface/transformers

❓ Questions & Help

Running this code:

from transformers import BartModel

x = BartModel.from_pretrained('bart-large')
x2 = BartModel.from_pretrained('bart-large-cnn')
print(x.shared)
print(x2.shared)

Gives:

Embedding(50265, 1024, padding_idx=1)
Embedding(50264, 1024, padding_idx=1)


Why is the vocabulary size different? Isn't it supposed to be the same? Does it come from the original authors' checkpoint?

@sshleifer


All 3 comments

Good catch. There is no mask token in the second checkpoint. I believe that is the same as the authors' implementation.

Completely off topic: if you still have the xsum data you used I would love a copy. I'm sam [at] huggingface.co.
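As a quick sanity check (a minimal sketch, assuming the BartTokenizer that ships with these checkpoints), you can confirm that the missing row corresponds to the <mask> token: both checkpoints use the same tokenizer, and <mask> is the last token id.

from transformers import BartTokenizer

# Both checkpoints share one tokenizer; <mask> is the final id, so the
# fine-tuned model's embedding matrix is simply one row shorter.
tok = BartTokenizer.from_pretrained('bart-large')
print(len(tok))           # 50265 -> matches bart-large's embedding rows
print(tok.mask_token_id)  # 50264 -> the row missing from bart-large-cnn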

Thanks for your fast answer!

Do you know why there is no mask token in the second checkpoint? And does it have any impact on the score?

I have a hunch that there is no <mask> token because of fairseq's --find-unused-parameters command-line argument, but I'm not certain.

I would guess no impact on the score, since <mask> does not show up in the fine-tuning data.
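If the <mask> row is ever needed again (for example, to continue denoising pre-training from the fine-tuned weights), a minimal sketch, assuming the standard resize_token_embeddings API, is to grow the embedding matrix back to the tokenizer's full vocabulary; the restored row is randomly initialized.

from transformers import BartModel, BartTokenizer

# Sketch: resize the fine-tuned model's embeddings to the tokenizer's full
# vocabulary size; the re-added <mask> row starts with random weights.
tok = BartTokenizer.from_pretrained('bart-large-cnn')
model = BartModel.from_pretrained('bart-large-cnn')
model.resize_token_embeddings(len(tok))
print(model.shared.num_embeddings)  # 50265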
