Fairseq: Base-size pre-trained models

Created on 27 Jan 2020 · 5 comments · Source: pytorch/fairseq

โ“ Questions and Help

What is your question?

1) Does BART offer base-size (6-layer encoder, 6-layer decoder, hidden size 768) pre-trained models? Since the baseline BERTSUMABS in the summarization task is trained on bert-base (12-layer encoder, 6-layer decoder, both with hidden size 768), have you ever compared a base-size BART with it?

2) Could you please provide a README file for XSum (similar to the CNN one)?

3) How long does XSum fine-tuning take on smaller GPUs (e.g. 4 11GB GPUs)?

@myleott @yinhanliu @ngoyal2707

question

All 5 comments

  1. Our base model is trained on Wikipedia + BookCorpus only.
  2. Will do.
  3. We use 16 32GB GPUs for 1 hour (30K steps), so in your case it should take around 8 hours (see the back-of-the-envelope sketch below).
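
A rough way to adapt the reference setup to a smaller machine is to scale fairseq's --update-freq (gradient accumulation) so the effective number of tokens per optimizer step stays roughly constant. The sketch below is only a back-of-the-envelope calculation; the per-GPU token budgets are illustrative assumptions, not official values.

```python
# Hypothetical helper: estimate fairseq's --update-freq (gradient accumulation)
# so the effective batch size (tokens per optimizer step) roughly matches the
# reference setup. The per-GPU token budgets are illustrative assumptions.
import math


def update_freq(ref_gpus, ref_tokens_per_gpu, my_gpus, my_tokens_per_gpu):
    """Gradient-accumulation factor that matches the reference effective batch."""
    ref_effective = ref_gpus * ref_tokens_per_gpu
    my_per_step = my_gpus * my_tokens_per_gpu
    return math.ceil(ref_effective / my_per_step)


# Reference: 16 32GB GPUs, assuming ~2048 max tokens per GPU.
# Target: 4 11GB GPUs, assuming only ~1024 max tokens per GPU fit in memory.
print(update_freq(16, 2048, 4, 1024))  # -> 8, i.e. pass --update-freq 8
```

With 4x fewer GPUs and roughly half the per-GPU batch, each optimizer step needs about 8x as many forward/backward passes, which lines up with the 8-hour estimate above.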

@XinnuoXu Hi, have you evaluated the bart.large.cnn model? Did you get the same R-2 score on the CNN/DM dataset as published? I used the pre-trained model to fine-tune on CNN/DM, but the ROUGE-2 is 19.19 (R-2 in the published paper is 21.28).
Thank you very much!

@YizhuLiu you need to use the right max-len, min-len, len-penalty, and beam size values.
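
For concreteness, here is a minimal decoding sketch using fairseq's torch.hub interface with the hyperparameters from the CNN/DailyMail evaluation example; `source_docs` is a hypothetical placeholder for the actual test articles, and a GPU is assumed.

```python
# Minimal sketch: load the released bart.large.cnn checkpoint via torch.hub and
# decode with the CNN/DailyMail evaluation hyperparameters. `source_docs` is a
# hypothetical placeholder for the actual test articles.
import torch

bart = torch.hub.load('pytorch/fairseq', 'bart.large.cnn')
bart.cuda()
bart.eval()
bart.half()

source_docs = [
    "Replace this with a full CNN/DailyMail test article ...",
]

with torch.no_grad():
    hypotheses = bart.sample(
        source_docs,
        beam=4,                  # beam size
        lenpen=2.0,              # length penalty
        max_len_b=140,           # maximum output length
        min_len=55,              # minimum output length
        no_repeat_ngram_size=3,  # also used in the published CNN/DM example
    )

for summary in hypotheses:
    print(summary)
```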

@yinhanliu Thank you for your reply. We set these values as shown in "Evaluating the bart.large.cnn model": beam=4, lenpen=2.0, max_len_b=140, min_len=55. With this setting, the R-2 score is 20.03. Are they right? If not, how can I get the same R-2 score on CNN/DM as published?

Will the BART base-size (6-layer encoder, 6-layer decoder, hidden size 768) pre-trained models be released? I would like to play with them, and it is hard for me to fine-tune the large model.
