May I know how long BART pre-training took, and on what GPU configuration? The paper says 500K steps with a batch size of 8k, but I would like to know the wall-clock time. Many thanks.
The time depends on the type and number of GPUs. We trained for around 11-12 days on 256 GPUs.
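For anyone who wants to turn those figures into a compute budget, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted in this thread (500K steps, 256 GPUs, 11-12 days) and assumes the GPUs were busy for the whole run; the function names are made up for illustration, and the GPU type is not stated here, so this says nothing about other hardware.

```python
# Rough estimate from the figures quoted in this thread.
# Assumption: all 256 GPUs were in use for the full 11-12 days.
STEPS = 500_000        # pre-training steps, as stated in the question
NUM_GPUS = 256         # GPU count given in the answer
DAYS_RANGE = (11, 12)  # "around 11-12 days"


def gpu_hours(days, num_gpus=NUM_GPUS):
    """Total GPU-hours for a run lasting `days` on `num_gpus` GPUs."""
    return days * 24 * num_gpus


def seconds_per_step(days, steps=STEPS):
    """Average wall-clock seconds per optimizer step."""
    return days * 24 * 3600 / steps


for days in DAYS_RANGE:
    print(f"{days} days: {gpu_hours(days):,} GPU-hours, "
          f"{seconds_per_step(days):.2f} s/step")
```

That works out to roughly 68K-74K GPU-hours in total and about 2 seconds per step on average.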