Fairseq: XLM-RoBERTa example for XNLI

Created on 11 Nov 2019 · 6 comments · Source: pytorch/fairseq

Could you please provide an example of XNLI tasks for XLM-RoBERTa?
The current example (https://github.com/pytorch/fairseq/tree/master/examples/xlmr) is quite simple and covers only single-sentence input.
Thanks a lot!

All 6 comments

Hey, we will release XNLI fine-tuning instructions soon.

thanks! Looking forward to it.

> Hey, we will release XNLI fine-tuning instructions soon.

I am using the same input format as BERT. My results are 0.828 for En and 0.732 for Zh, using XLM-R base, 4 epochs, learning rate 2e-5, batch size 16. Could you please share the hyperparameters to reproduce the results published in the paper?

@kartikayk Can you please share above details?

@tomking1988 I'm guessing you're talking about the zero-shot setting here. The following is the setup we used for the numbers published in the paper:

  • Batch size per GPU = 16 on 8 GPUs (effective batch size = 128)
  • Adam with a learning rate of 5e-6 (0.000005)
  • We run validation after each epoch, where an epoch consists of 5K batches randomly sampled from the training set, and select the checkpoint with the best validation-set result. This is quite important.
  • We train for up to 30 epochs with early stopping (stop if validation accuracy has not improved for 5 epochs), with an epoch defined as above.
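The schedule above can be sketched in a few lines. This is a minimal, framework-agnostic illustration, not fairseq's actual training loop: `run_epoch` and `evaluate` are hypothetical callbacks standing in for one 5K-batch epoch and one validation pass.

```python
def train_with_early_stopping(run_epoch, evaluate, max_epochs=30, patience=5):
    """Run up to `max_epochs` epochs (each = 5K randomly sampled batches),
    validate after every epoch, keep the epoch with the best validation
    accuracy, and stop once `patience` epochs pass without improvement."""
    best_acc, best_epoch, since_improved = float("-inf"), None, 0
    for epoch in range(1, max_epochs + 1):
        run_epoch(epoch)           # 5K batches sampled from the training set
        acc = evaluate(epoch)      # validation after each epoch
        if acc > best_acc:
            best_acc, best_epoch, since_improved = acc, epoch, 0
        else:
            since_improved += 1
            if since_improved >= patience:
                break              # early stopping
    return best_epoch, best_acc   # checkpoint selected by validation accuracy

# Effective batch size from the setup above:
per_gpu_bs, num_gpus = 16, 8
effective_bs = per_gpu_bs * num_gpus  # 128
```

The key point is that checkpoint selection is driven by the validation result of each 5K-batch epoch, not by the final model state.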

Closing after @kartikayk's answer.
