I ran into out-of-memory errors when running experiments on SQuAD v1.1 (GPU: RTX 2080, 8 GB). Therefore, I changed batch_size to 6 and max_seq_length to 256. Do you know the estimated training time with num_train_epochs=2.0?
On an RTX 2070, with batch_size = 8, max_seq_length = 320, num_train_epochs = 2, it took around 2 hours 30 minutes to complete the training.
Thank you for the prompt reply.
@bcbcbcbcbcl
Is batch_size = 8, max_seq_length = 320 the best possible for 8 GB, or can we go higher with the sequence length?
With these parameters, GPU memory usage is already close to 8 GB during training. You can try decreasing batch_size if you want a longer sequence length, or vice versa.
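To make that trade-off concrete, here is a rough sketch that treats batch_size * max_seq_length as a "token budget" taken from the configuration reported to fit on 8 GB (8 x 320). The helper names are hypothetical and not part of any training script. The linear budget is only an assumption for illustration: attention memory grows faster than linearly with sequence length, so treat the suggested batch sizes as upper bounds to try, not guarantees.

```python
def tokens_per_batch(batch_size: int, max_seq_length: int) -> int:
    """Crude proxy for per-step activation memory: tokens processed per batch."""
    return batch_size * max_seq_length


def max_batch_size_for(seq_length: int, token_budget: int) -> int:
    """Largest batch size whose token count stays within the budget.

    Assumption: memory scales roughly linearly with tokens per batch.
    Attention actually scales with seq_length squared, so longer
    sequences may need a smaller batch than this suggests.
    """
    return token_budget // seq_length


# Budget taken from the configuration reported above to nearly fill 8 GB.
budget = tokens_per_batch(8, 320)

for seq_len in (256, 320, 384, 512):
    print(f"max_seq_length={seq_len}: try batch_size <= {max_batch_size_for(seq_len, budget)}")
```

If a suggested setting still runs out of memory, lower batch_size by one and retry; the estimate does not account for model weights, optimizer state, or framework overhead.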
Has anyone been able to get MirroredStrategy to work?