Thanks for the RoBERTa.
I want to fine-tune the model on the SQuAD dataset, but I couldn't find a detailed script or code for that. Could you share something about that?
Yes, we will provide scripts and a README for SQuAD finetuning soon.
any update?
Sorry, we still need to refactor and clean up the code. We will release it, but in the meantime you can try the SQuAD finetuning for RoBERTa in pytorch-transformers.
I followed the hyperparameters in the RoBERTa paper, using fairseq to train RoBERTa on the SQuAD task.
Input representation: <s> Passage here. </s> Q: Question here? </s>
Effective Batch Size: 48
Batch Size per GPU: 3
No. of GPU: 8
Update Freq.: 2
Max Epochs: 2
Total No. of Updates: 5430
Warmup ratio: 0.06
Warmup Updates: 326
Learning Rate: 1.5e-5
Weight Decay: 0.01
Learning Rate Decay: Linear
Adam eps: 1e-6
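As a quick sanity check on the settings above (assuming the usual relations between these quantities, not an official fairseq recipe), the numbers are self-consistent:

```python
# Sanity-check the hyperparameter arithmetic above.
# Assumed relations: effective batch = per-GPU batch * #GPUs * update freq,
# and warmup updates = warmup ratio * total updates.

per_gpu_batch = 3
num_gpus = 8
update_freq = 2          # gradient accumulation steps

effective_batch = per_gpu_batch * num_gpus * update_freq
print(effective_batch)   # 48, matching "Effective Batch Size: 48"

total_updates = 5430
warmup_ratio = 0.06

warmup_updates = round(warmup_ratio * total_updates)
print(warmup_updates)    # 326 (325.8 rounded), matching "Warmup Updates: 326"
```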
I got the following evaluation:
{
"best_exact": 85.21014065526826,
"best_exact_thresh": -1.6142578125,
"best_f1": 88.29940280085873,
"best_f1_thresh": -1.572265625
}
Some distance from the 89.4 reported in the paper :(
Did I miss something? Thank you for the great work!
Here is a link to the repo in case anyone is interested before the official code is available. Let's learn PyTorch and fairseq! (My code is a bit hacky, though, as I am new here.)
https://github.com/ecchochan/roberta-squad
Hi! Thanks for all the RoBERTa usage examples. I wonder if we can expect a SQuAD finetuning example anytime soon?
I've been experimenting with adding different intermediate finetuning tasks to RoBERTa, and I'm seeing some interesting results on GLUE. It would be great if I could test my approach on SQuAD as well.
By the way, the GLUE finetuning example is awesome!
@yinhanliu
Any update?
This is one of the most important benchmarks for question answering. Reproducing the SOTA number is very useful in helping the community further improve accuracy. It would be appreciated if you could make the fine-tuning script available soon. Thanks!