BERT: How long does training take on an RTX 2080 8GB?

Created on 15 Jan 2019 · 5 Comments · Source: google-research/bert

I ran into out-of-memory errors when running experiments on SQuAD v1.1 (GPU: RTX 2080 8GB). Therefore, I changed batch_size to 6 and max_seq_length to 256. Do you know the estimated training time with num_train_epochs set to 2.0?

ntson2002

Most helpful comment

On an RTX 2070, with batch_size = 8, max_seq_length = 320, and num_train_epochs = 2, training took around 2 hours 30 minutes to complete.

bcbcbcbcbcl on 15 Jan 2019

👍5
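
For other GPUs or settings, one rough way to estimate the total time is to count training steps and multiply by a per-step time measured on your own machine. The sketch below is only an illustration: it assumes the step count is computed as examples / batch size × epochs (which appears to be how run_squad.py derives it) and that SQuAD v1.1 has about 87,599 training examples. The function names and the `seconds_per_step` value are hypothetical, not part of the BERT code.

```python
# Rough training-time estimate from a step count and a measured step time.
# Assumption: ~87,599 training examples in SQuAD v1.1 (verify for your data).
SQUAD_V11_TRAIN_EXAMPLES = 87599


def num_train_steps(num_examples, batch_size, num_epochs):
    """Number of optimizer steps for the given batch size and epoch count."""
    return int(num_examples / batch_size * num_epochs)


def estimate_hours(steps, seconds_per_step):
    """Total wall-clock hours, given a per-step time you measured yourself."""
    return steps * seconds_per_step / 3600.0


# Settings reported in this thread for the RTX 2070 run.
steps_2070 = num_train_steps(SQUAD_V11_TRAIN_EXAMPLES, batch_size=8, num_epochs=2.0)

# Settings from the original question (RTX 2080, smaller batch).
steps_2080 = num_train_steps(SQUAD_V11_TRAIN_EXAMPLES, batch_size=6, num_epochs=2.0)

# Per-step time implied by the reported 2 h 30 min run. It depends on the GPU,
# batch size, and sequence length, so it does not transfer to other setups.
implied_seconds_per_step = 2.5 * 3600 / steps_2070

if __name__ == "__main__":
    print("steps (batch 8):", steps_2070)
    print("steps (batch 6):", steps_2080)
    print("implied s/step on the 2070 run: %.3f" % implied_seconds_per_step)
```

To use it, read the per-step time from the first few hundred steps of your own training log and pass it to `estimate_hours` together with the step count for your settings.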

All 5 comments

Thank you for the prompt reply.

ntson2002 on 15 Jan 2019

@bcbcbcbcbcl
Is batch_size = 8 with max_seq_length = 320 the best possible for 8GB, or can we go higher with the sequence length?

Anton-Velikodnyy on 9 Mar 2019

With these parameters, GPU memory usage is already close to 8GB during training. You can try decreasing batch_size if you want a longer sequence length, or vice versa.

bcbcbcbcbcl on 12 Mar 2019
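
As a starting point for that trade-off, here is a rough heuristic sketch, not a measurement. It assumes activation memory grows somewhere between linearly and quadratically with sequence length (the attention scores scale with the square of the length) and uses the batch_size = 8, max_seq_length = 320 setting from this thread as the reference that just fits in 8GB. The function name is hypothetical, and the result is only a first guess to try before checking actual memory usage.

```python
def candidate_batch_size(seq_len, ref_batch=8, ref_seq_len=320):
    """Suggest a batch size to try first for a new max_seq_length.

    Heuristic only: takes the smaller of a linear and a quadratic scaling
    of the reference setting, so the guess errs on the cautious side.
    """
    # Keep batch * seq_len at or below the reference (linear terms).
    linear = ref_batch * ref_seq_len // seq_len
    # Keep batch * seq_len^2 at or below the reference (attention scores).
    quadratic = ref_batch * ref_seq_len ** 2 // seq_len ** 2
    return max(1, min(linear, quadratic))


if __name__ == "__main__":
    for seq_len in (256, 320, 384, 512):
        print(seq_len, candidate_batch_size(seq_len))
```

If the suggested batch size still runs out of memory, lower it by one and retry; if memory usage stays well under the limit, raise it.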

Has anyone been able to get mirrored strategy to work?
