BERT: How long does training take on an RTX 2080 8GB?

Created on 15 Jan 2019 · 5 Comments · Source: google-research/bert

I ran into out-of-memory errors when running experiments on SQuAD v1.1 (GPU: RTX 2080 8GB). Therefore, I changed batch_size to 6 and max_seq_length to 256. Do you know the estimated training time with num_train_epochs=2.0?

All 5 comments

On an RTX 2070, with batch_size = 8, max_seq_length = 320, and num_train_epochs = 2, training took around 2 hours 30 minutes to complete.
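
To turn that data point into an estimate for other settings, here is a rough sketch of the arithmetic. It assumes the step count is computed the way run_squad.py appears to do it, int(num_examples / batch_size * num_epochs), and that SQuAD v1.1 has about 87,599 training examples. Both are assumptions worth checking against your own copy of the script and data. The helper names are made up for illustration.

```python
# Assumption: approximate number of training examples in SQuAD v1.1.
SQUAD_V1_TRAIN_EXAMPLES = 87_599


def num_train_steps(num_examples, batch_size, num_epochs):
    """Optimizer steps for a run, truncated to an integer."""
    return int(num_examples / batch_size * num_epochs)


def estimate_hours(num_examples, batch_size, num_epochs, seconds_per_step):
    """Wall-clock estimate from a measured per-step time."""
    steps = num_train_steps(num_examples, batch_size, num_epochs)
    return steps * seconds_per_step / 3600.0


# Reference run from this thread: RTX 2070, batch_size=8, 2 epochs, ~2.5 hours.
ref_steps = num_train_steps(SQUAD_V1_TRAIN_EXAMPLES, 8, 2.0)
ref_seconds_per_step = 2.5 * 3600 / ref_steps

# The asker's settings: batch_size=6, 2 epochs.
my_steps = num_train_steps(SQUAD_V1_TRAIN_EXAMPLES, 6, 2.0)

print("reference steps:", ref_steps)
print("reference seconds/step: %.3f" % ref_seconds_per_step)
print("steps at batch_size=6:", my_steps)
```

The per-step time depends on the GPU, batch_size, and max_seq_length, so the reference value does not transfer directly. Measure seconds per step over the first few hundred steps of your own run and pass that to estimate_hours.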

Thank you for the prompt reply.

@bcbcbcbcbcl
Is batch_size = 8, max_seq_length = 320 the best possible for 8GB, or can we go higher with the sequence length?

With these parameters, GPU memory usage is already close to 8GB during training. You can try decreasing batch_size if you want a higher sequence length, or vice versa.
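
As a rough way to compare candidate settings before launching a run, the sketch below scores a configuration against the batch_size = 8, max_seq_length = 320 run that nearly fills 8GB. It is a heuristic under stated assumptions, not a memory model. It assumes most activation memory grows with batch_size * max_seq_length and that the attention matrices grow with batch_size * max_seq_length^2. Weights and optimizer state are ignored, since they do not change with these two settings. The function name is made up for illustration.

```python
def relative_cost(batch_size, max_seq_length,
                  ref_batch_size=8, ref_max_seq_length=320):
    """Return (token_ratio, attention_ratio) versus the reference run.

    token_ratio: batch_size * max_seq_length relative to the reference.
    attention_ratio: batch_size * max_seq_length**2 relative to the reference.
    A ratio above 1.0 suggests that part of the activations needs more
    memory than it did in the reference run.
    """
    tokens = batch_size * max_seq_length
    ref_tokens = ref_batch_size * ref_max_seq_length
    attention = batch_size * max_seq_length ** 2
    ref_attention = ref_batch_size * ref_max_seq_length ** 2
    return tokens / ref_tokens, attention / ref_attention


for batch_size, max_seq_length in [(8, 320), (6, 256), (6, 384), (4, 512)]:
    token_ratio, attention_ratio = relative_cost(batch_size, max_seq_length)
    print("batch_size=%d max_seq_length=%d tokens=%.2fx attention=%.2fx"
          % (batch_size, max_seq_length, token_ratio, attention_ratio))
```

A configuration with both ratios at or below 1.0 is a reasonable first guess for fitting in the same memory. One with a ratio above 1.0 may still fit, or may not. Watching nvidia-smi during the first steps of training is the reliable check.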

Has anyone been able to get mirrored strategy to work?
