Fairseq: [mBART/BART] finetuned model is much bigger than pretrained model.

Created on 7 Jul 2020 · 2 comments · Source: pytorch/fairseq

🐛 Bug

Hi,

I'm using mBART from fairseq, and the file size of the finetuned checkpoint is much larger than I expected. Does anyone know why the finetuned mBART is so much bigger than the pretrained mBART?

Model | Description | # params | Download
---|---|---|---
mbart.CC25 | mBART model with 12 encoder and decoder layers trained on 25 languages' monolingual corpus | 610M | mbart.CC25.tar.gz

pre-trained mBART size: 5.7G
finetuned mBART size: 8.8G
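
To see what actually accounts for the extra size, you can inspect the top-level entries of the saved checkpoint. This is a minimal sketch, assuming the checkpoint path is `checkpoints/checkpoint_best.pt` and that it follows the usual fairseq layout (keys such as `model` and `last_optimizer_state`); adjust the path and keys to your setup.

```python
# Sketch: estimate how much each top-level entry of a fairseq checkpoint
# contributes to the file size. Path and key names are assumptions.
import torch

ckpt = torch.load("checkpoints/checkpoint_best.pt", map_location="cpu")

def num_elements(obj):
    """Recursively count tensor elements in nested dicts/lists/tuples."""
    if torch.is_tensor(obj):
        return obj.numel()
    if isinstance(obj, dict):
        return sum(num_elements(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(num_elements(v) for v in obj)
    return 0

for key, value in ckpt.items():
    # Rough size estimate, assuming 4 bytes per element (fp32).
    print(f"{key}: ~{num_elements(value) * 4 / 1e9:.2f} GB")
```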

question

All 2 comments

I'm observing a similar issue with BART-large finetuned on the CNN/DM dataset.

My own fine-tuned checkpoint is bigger than the checkpoint provided by the authors.
Inference also seems to be slower with my own fine-tuned checkpoint than with the one provided by the authors.

Any idea where this comes from, and how to fix it?

@ngoyal2707 @yinhanliu

Checkpoint files usually include the optimizer state, which for Adam is 2x the number of model parameters, thus the larger file sizes. We usually strip these from the released models since the optimizer state is only needed if you're going to continue pretraining. You can set the flag --no-save-optimizer-state during finetuning, but if you do this then you won't be able to resume finetuning from these checkpoints later.
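
If you have already saved checkpoints with the optimizer state, you can strip it after the fact instead of re-running finetuning. This is a minimal sketch, assuming the optimizer state sits under the `last_optimizer_state` key (check your checkpoint's keys before deleting anything) and using a hypothetical path:

```python
# Sketch: strip the saved Adam state from an existing fairseq checkpoint
# so its size is roughly that of the model weights alone.
# The key name 'last_optimizer_state' and the paths are assumptions.
import torch

ckpt = torch.load("checkpoints/checkpoint_best.pt", map_location="cpu")
ckpt.pop("last_optimizer_state", None)  # drop the optimizer moments if present
torch.save(ckpt, "checkpoints/checkpoint_best.stripped.pt")
```

A stripped checkpoint can still be loaded for inference, but as noted above you won't be able to resume finetuning from it, since the optimizer moments are gone.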
