BERT: Model becomes 3 times larger after fine-tuning?

Created on 6 Nov 2018 · 4 comments · Source: google-research/bert

A pretrained BERT-Large model's checkpoint (ckpt) file is about 1.3 GB, but after fine-tuning on a downstream task, the saved checkpoint grows to 3.8 GB. Why does this happen?


wangwei7175878

Most helpful comment

The distributed checkpoints only include the actual model weights, but the checkpoints written during training also include the Adam momentum and variance variables for each weight variable. These are not actually part of the model, but they are needed to pause and resume training in the middle. Since Adam keeps two extra values (momentum and variance) for every weight, the training checkpoints are about 3x the size of the distributed checkpoint.

jacobdevlin-google on 6 Nov 2018

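To make the 3x figure concrete, here is a rough size estimate in Python. It is a sketch under stated assumptions: every weight and every Adam slot is stored as a 4-byte float32, and the parameter counts are the approximate ones from the BERT paper (about 110M for BERT-Base, 340M for BERT-Large). The helper name `checkpoint_gib` is hypothetical, and small extras such as `global_step` are ignored.

```python
# Rough checkpoint-size estimate (a sketch; assumes float32 storage throughout).
BYTES_PER_FLOAT32 = 4


def checkpoint_gib(num_params: int, adam_slots: bool) -> float:
    """Approximate checkpoint size in GiB (2**30 bytes).

    With adam_slots=True, each weight is stored three times: the weight itself
    plus Adam's first-moment (momentum, m) and second-moment (variance, v)
    slots. Small extras such as global_step and file metadata are ignored.
    """
    copies = 3 if adam_slots else 1
    return num_params * copies * BYTES_PER_FLOAT32 / 2**30


if __name__ == "__main__":
    # Approximate parameter counts from the BERT paper.
    for name, n in [("BERT-Base", 110_000_000), ("BERT-Large", 340_000_000)]:
        print(f"{name}: weights only {checkpoint_gib(n, False):.2f} GiB, "
              f"with Adam slots {checkpoint_gib(n, True):.2f} GiB")
```

This gives about 0.41 GiB vs 1.23 GiB for BERT-Base and 1.27 GiB vs 3.80 GiB for BERT-Large, roughly in line with the sizes reported in this thread.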

All 4 comments

I have the same problem with BERT-Base, whose checkpoint grows to ~1.3 GB.

artemisart on 6 Nov 2018

jacobdevlin-google on 6 Nov 2018


Thank you for your advice. Could you tell me how to save only the model weights (without the momentum and variance), like the pretrained model you provide?

zhezhaoa on 7 Dec 2018


@zhezhaoa I have a solution here: https://github.com/google-research/bert/issues/99
I guess there are better and tidier solutions, but at least this one works for me: the weight file shrinks from 1.3 GB to 400 MB.
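One possible way to get a weights-only checkpoint, sketched in Python. It is not necessarily the approach from issue #99, and it rests on assumptions: the Adam slot variables are named `<param>/adam_m` and `<param>/adam_v` (the naming used by BERT's `AdamWeightDecayOptimizer`, but check the `tf.train.list_variables` output for your own checkpoint first), and a TensorFlow version providing the `tf.compat.v1` API is installed. The helper names `weight_names` and `save_weights_only` are hypothetical.

```python
"""Sketch: copy a fine-tuned BERT checkpoint without its Adam slot variables."""

# ASSUMPTION: Adam slots are saved as "<param>/adam_m" and "<param>/adam_v",
# as in google-research/bert's optimization.py. Verify on your checkpoint.
OPTIMIZER_SUFFIXES = ("/adam_m", "/adam_v")


def weight_names(names):
    """Return the variable names that are model weights, not Adam slots."""
    return [n for n in names if not n.endswith(OPTIMIZER_SUFFIXES)]


def save_weights_only(src_ckpt, dst_ckpt):
    """Re-save only the kept variables from src_ckpt to dst_ckpt.

    Requires TensorFlow; imported lazily so weight_names works without it.
    """
    import tensorflow.compat.v1 as tf

    reader = tf.train.load_checkpoint(src_ckpt)
    names = [name for name, _shape in tf.train.list_variables(src_ckpt)]
    keep = weight_names(names)
    with tf.Graph().as_default():
        # Recreate each kept variable under its original name so the new
        # checkpoint keys match the ones the model code expects.
        new_vars = [tf.Variable(reader.get_tensor(n), name=n) for n in keep]
        saver = tf.train.Saver(new_vars)
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            saver.save(sess, dst_ckpt)
    return keep
```

A call would look like `save_weights_only("model.ckpt-1000", "model_weights_only.ckpt")`, where both paths are placeholders and the output directory must already exist. This simple version loads every kept tensor into memory and embeds it in the graph as an initial value, so treat it as a sketch rather than a memory-efficient converter. Dropping both slot copies should bring a BERT-Base checkpoint back to roughly a third of its training size, consistent with the 1.3 GB to 400 MB drop mentioned above.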