What is the minimum GPU spec for training the base model?
Obviously I realise it depends on the hyperparameters, but I have a 4GB GPU that I'm trying to train BERT-base on with the run_classifier example, and I'm hitting out-of-memory problems. Even if I reduce to seq_len = 200 and batch_size = 4 I still run out of memory, and there's not much point going below that as the training will most likely collapse.
Evidently 4GB will not suffice and I'll need to upgrade. What are people using successfully and with what seq_len and batch_size?
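For a rough sense of why 4GB is tight, here is a back-of-the-envelope sketch of the memory that is taken before any activations are stored. It assumes roughly 110M parameters for BERT-base and an Adam-style optimizer that keeps two moment buffers per weight, all in fp32. The function names are made up for illustration, and the real footprint depends on the framework and its allocator.

```python
# Rough fixed-memory estimate for training with an Adam-style optimizer.
# Activations are NOT included; they come on top of this figure.
BERT_BASE_PARAMS = 110_000_000  # assumption: approximate parameter count of BERT-base


def fixed_training_bytes(num_params, bytes_per_value=4):
    """Bytes for weights + gradients + two optimizer moment buffers."""
    copies = 4  # weights, gradients, first moment, second moment
    return num_params * copies * bytes_per_value


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


fp32_gib = to_gib(fixed_training_bytes(BERT_BASE_PARAMS))
print(f"fixed fp32 cost: {fp32_gib:.2f} GiB")
```

Under these assumptions about 1.6 GiB is gone before the first batch. Whatever is left has to hold the activations, which grow with batch_size × seq_len (and the attention scores with seq_len squared), plus framework overhead.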
Hey,
maybe this will help. With fp16 support I got past the OOM error, even with batch_size=32 (GTX 1080, 8GB).
https://github.com/thorjohnsen/bert/tree/gpu_optimizations
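fp16 saves memory because each stored value takes 2 bytes instead of 4, but the format has a much narrower range, so very small gradients can round to zero. The usual remedy is loss scaling. The sketch below is not taken from the linked branch; it only uses Python's `struct` module to show the effect on a single number, and `LOSS_SCALE` and `tiny_grad` are hypothetical values chosen for illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


LOSS_SCALE = 65536.0  # hypothetical constant; a power of two, so scaling is exact

tiny_grad = 1e-8                          # hypothetical gradient value
unscaled = to_fp16(tiny_grad)             # below the fp16 range: rounds to zero
scaled = to_fp16(tiny_grad * LOSS_SCALE)  # large enough to survive fp16 storage
recovered = scaled / LOSS_SCALE           # unscale in higher precision before the update

print("stored directly in fp16:", unscaled)
print("stored scaled, then unscaled:", recovered)
```

In a real training loop the loss is multiplied by the scale before backpropagation and the gradients are divided by it before the weight update, typically while keeping an fp32 master copy of the weights.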
Thanks @AndreasFdev, I concluded there was no way I'd be able to do training with a 4GB GPU, so I managed to lay my hands on a second-hand Titan X with 12GB - working fine now.
@BigBadBurrow What batch size & float precision did you end up with on the Titan X (12GB)?
@AndreasFdev How do you implement the fp16 support? Use Apex?
I have a 15GB GPU; my batch size is 2 and it always collapses.