Fairseq: How to reproduce fine tuning EN-RO with mbart?

Created on 24 Apr 2020  ·  4 Comments  ·  Source: pytorch/fairseq

❓ Questions and Help

I followed the fine-tuning example described here: https://github.com/pytorch/fairseq/blob/master/examples/mbart/README.md
However, I didn't manage to reproduce the results described in the paper for EN-RO translation.

How to reproduce fine tuning with mbart?

  1. Can you clarify in more detail where you got the data and how you preprocessed it for training? Did you use anything to prepare the data beyond these two steps?
     • https://github.com/pytorch/fairseq/blob/master/examples/mbart/README.md#bpe-data-1
     • https://github.com/pytorch/fairseq/blob/master/examples/mbart/README.md#preprocess-data
  2. What setup did you use for training? Did you use 256 Nvidia V100 GPUs (32GB)? How would it be possible to reproduce this with 8 Nvidia V100 GPUs (16GB)? How would the learning rate and batch size need to be changed? (See the batch-size sketch after this list.)
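
One way to reason about that last question (my own back-of-the-envelope assumption, not an official answer): fairseq's effective batch size is roughly the number of GPUs times the per-GPU batch times the gradient-accumulation factor --update-freq, so fewer GPUs can be compensated for by raising --update-freq while keeping the learning rate unchanged:

```bash
# Rough batch-size accounting (assumption: effective batch size equals
# n_gpus * per_gpu_batch * update_freq, and the learning rate can stay
# fixed as long as the effective batch size is kept constant).
n_gpus_paper=256     # hypothetical large-scale setup asked about above
n_gpus_mine=8        # the 8x V100 setup available here
update_freq_paper=1  # hypothetical; not stated in the README
# Gradient-accumulation factor needed to match the larger effective batch:
echo $(( n_gpus_paper * update_freq_paper / n_gpus_mine ))   # prints 32
```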

Code

Code from here: https://github.com/pytorch/fairseq/blob/master/examples/mbart/README.md
Modified the training script only with these changes:
--memory-efficient-fp16
--max-sentences 8
--required-batch-size-multiple 8
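
For concreteness, here is how those three flags combine with the README's fine-tuning command. This is a sketch, not a verified copy: the hyperparameters are reconstructed from the linked README and should be checked against it, and path_2_data and the mbart.CC25 checkpoint path follow the README's placeholders:

```bash
# Sketch of the README's EN-RO fine-tuning command plus the three
# modifications above; all other values should be taken from the README.
PRETRAIN=mbart.cc25/model.pt   # path to the downloaded mbart.CC25 checkpoint
langs=ar_AR,cs_CZ,de_DE,en_XX,es_XX,et_EE,fi_FI,fr_XX,gu_IN,hi_IN,it_IT,ja_XX,kk_KZ,ko_KR,lt_LT,lv_LV,my_MM,ne_NP,nl_XX,ro_RO,ru_RU,si_LK,tr_TR,vi_VN,zh_CN

fairseq-train path_2_data \
  --arch mbart_large --task translation_from_pretrained_bart \
  --source-lang en_XX --target-lang ro_RO --langs $langs \
  --encoder-normalize-before --decoder-normalize-before --layernorm-embedding \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.2 \
  --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' \
  --lr-scheduler polynomial_decay --lr 3e-05 --warmup-updates 2500 --total-num-update 40000 \
  --dropout 0.3 --attention-dropout 0.1 --weight-decay 0.0 \
  --max-tokens 1024 --update-freq 2 \
  --restore-file $PRETRAIN \
  --reset-optimizer --reset-meters --reset-dataloader --reset-lr-scheduler \
  --memory-efficient-fp16 \
  --max-sentences 8 \
  --required-batch-size-multiple 8
```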

What have you tried?

Trained with the described setup and got a BLEU score of only 2.3.

What's your environment?

  • fairseq Version (master - from commit 57526c63433c0b1c997fc91c0881867532567266):
  • PyTorch Version (1.4.0)
  • OS (Ubuntu 18.04):
  • How you installed fairseq (pip install --editable ., https://github.com/pytorch/fairseq/commit/57526c63433c0b1c997fc91c0881867532567266):
  • Python version: 3.6.6
  • CUDA/cuDNN version: 10.0/7
  • GPU models and configuration: aws 8xV100
Label: documentation

All 4 comments

CC @ngoyal2707

You should find #1758 useful (it’d be nice if they updated the documentation here).

Thanks @mjpost.
I found that issue really useful, and it's how I validated the fine-tuned model (see the evaluation sketch below).
However, the question remains: how to reproduce the fine-tuning results for EN-RO translation? (Meaning fine-tuning the base model, mbart.CC25, on EN-RO data.)
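
For later readers, this is roughly what that validation looks like, as I understand the discussion in #1758 (an assumption on my part, with illustrative flag values; the key point there is stripping the [ro_RO] language token before scoring):

```bash
# Hedged sketch of EN-RO evaluation following the approach discussed in
# #1758: generate with the pretrained-BART translation task, strip the
# target-language token from hypotheses and references, then score with
# sacrebleu. Paths and $langs match the training sketch above.
fairseq-generate path_2_data \
  --path checkpoints/checkpoint_best.pt \
  --task translation_from_pretrained_bart \
  --gen-subset test -s en_XX -t ro_RO \
  --langs $langs --remove-bpe sentencepiece > en_ro.out

# fairseq-generate prefixes hypotheses with H and references with T:
grep ^H en_ro.out | cut -f3- | sed 's/\[ro_RO\]//g' > en_ro.hyp
grep ^T en_ro.out | cut -f2- | sed 's/\[ro_RO\]//g' > en_ro.ref
sacrebleu en_ro.ref < en_ro.hyp
```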

cc @ngoyal2707, could you please synthesize the discussion in #1758 and update the README?
