Bert: Recommended GPU size when training BERT-base

Created on 14 May 2019 · 5 Comments · Source: google-research/bert

What is the minimum GPU spec for training the base model?

Obviously I realise it depends on the hyperparameters, but I have a 4GB GPU that I'm trying to train BERT-base on with the run_classifier example, and I'm hitting out-of-memory errors. Even if I reduce to seq_len = 200 and batch_size = 4 I still run out of memory, and there's not much point going below that as the training will most likely collapse.

Evidently 4GB will not suffice and I'll need to upgrade. What are people using successfully and with what seq_len and batch_size?
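
For a rough sense of why 4GB is tight, here is a back-of-envelope sketch. It assumes BERT-base has roughly 110M parameters and that training runs in fp32 with an Adam-style optimizer that keeps two extra values per weight. The function name and defaults are hypothetical, for illustration only, and the estimate leaves out activations, which grow with seq_len and batch_size.

```python
# Back-of-envelope estimate of the fixed GPU memory needed for training state.
# Assumptions (not measured): ~110M parameters for BERT-base, fp32 storage,
# and an Adam-style optimizer holding two moment estimates per weight.

BYTES_PER_FP32 = 4
GIB = 1024 ** 3


def training_state_gib(num_params, bytes_per_value=BYTES_PER_FP32):
    """Return GiB used by weights, gradients, and two Adam moments.

    Activations are NOT included; they come on top of this figure and
    depend on seq_len and batch_size.
    """
    copies = 4  # weights + gradients + first moment + second moment
    return num_params * bytes_per_value * copies / GIB


# Hypothetical round figure for BERT-base.
BERT_BASE_PARAMS = 110_000_000

print(f"fixed training state: {training_state_gib(BERT_BASE_PARAMS):.2f} GiB")
```

Under these assumptions the fixed state alone comes to about 1.64 GiB, so a 4GB card has well under 2.5 GiB left for activations, framework overhead, and workspace buffers.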

All 5 comments

Hey,
maybe this will help. With fp16 support I got past the OOM error, even with batch_size=32 (GTX 1080, 8GB).
https://github.com/thorjohnsen/bert/tree/gpu_optimizations

Thanks @AndreasFdev, I concluded there was no way I'd be able to do training with a 4GB GPU, so I managed to lay my hands on a second-hand Titan X with 12GB - working fine now.

@BigBadBurrow What batch size and float precision did you end up using on the Titan X (12GB)?

@AndreasFdev How did you implement the fp16 support? Did you use Apex?

I have a 15GB GPU, my batch size is 2, and it always collapses.
