Fairseq: How to reproduce fine tuning EN-RO with mbart?

Created on 24 Apr 2020  ·  4 Comments  ·  Source: pytorch/fairseq

❓ Questions and Help

I followed the fine-tuning example described here: https://github.com/pytorch/fairseq/blob/master/examples/mbart/README.md
However, I didn't manage to reproduce the results described in the paper for EN-RO translation.

How to reproduce fine tuning with mbart?

  1. Can you clarify in more detail where you got the data and how you preprocessed it for training? Did you use anything to prepare the data beyond these two steps?
     • https://github.com/pytorch/fairseq/blob/master/examples/mbart/README.md#bpe-data-1
     • https://github.com/pytorch/fairseq/blob/master/examples/mbart/README.md#preprocess-data
  2. What setup did you use for training? Did you use 256 Nvidia V100 GPUs (32GB)? How would it be possible to reproduce this with 8 Nvidia V100 GPUs (16GB)? How would the learning rate and batch size need to be changed? (See the batch-size sketch after this list.)
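
One way to reason about that last question (my own back-of-the-envelope assumption, not an official answer): fairseq's effective batch size is roughly the number of GPUs times the per-GPU batch times the gradient-accumulation factor --update-freq, so fewer GPUs can be compensated for by raising --update-freq while keeping the learning rate unchanged:

```bash
# Rough batch-size accounting (assumption: effective batch size equals
# n_gpus * per_gpu_batch * update_freq, and the learning rate can stay
# fixed as long as the effective batch size is kept constant).
n_gpus_paper=256     # hypothetical large-scale setup asked about above
n_gpus_mine=8        # the 8x V100 setup available here
update_freq_paper=1  # hypothetical; not stated in the README
# Gradient-accumulation factor needed to match the larger effective batch:
echo $(( n_gpus_paper * update_freq_paper / n_gpus_mine ))   # prints 32
```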

Code

Code from here: https://github.com/pytorch/fairseq/blob/master/examples/mbart/README.md
Modified the training script only with these changes:
--memory-efficient-fp16
--max-sentences 8
--required-batch-size-multiple 8
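
For concreteness, here is how those three flags combine with the README's fine-tuning command. This is a sketch, not a verified copy: the hyperparameters are reconstructed from the linked README and should be checked against it, and path_2_data and the mbart.CC25 checkpoint path follow the README's placeholders:

```bash
# Sketch of the README's EN-RO fine-tuning command plus the three
# modifications above; all other values should be taken from the README.
PRETRAIN=mbart.cc25/model.pt   # path to the downloaded mbart.CC25 checkpoint
langs=ar_AR,cs_CZ,de_DE,en_XX,es_XX,et_EE,fi_FI,fr_XX,gu_IN,hi_IN,it_IT,ja_XX,kk_KZ,ko_KR,lt_LT,lv_LV,my_MM,ne_NP,nl_XX,ro_RO,ru_RU,si_LK,tr_TR,vi_VN,zh_CN

fairseq-train path_2_data \
  --arch mbart_large --task translation_from_pretrained_bart \
  --source-lang en_XX --target-lang ro_RO --langs $langs \
  --encoder-normalize-before --decoder-normalize-before --layernorm-embedding \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.2 \
  --optimizer adam --adam-eps 1e-06 --adam-betas '(0.9, 0.98)' \
  --lr-scheduler polynomial_decay --lr 3e-05 --warmup-updates 2500 --total-num-update 40000 \
  --dropout 0.3 --attention-dropout 0.1 --weight-decay 0.0 \
  --max-tokens 1024 --update-freq 2 \
  --restore-file $PRETRAIN \
  --reset-optimizer --reset-meters --reset-dataloader --reset-lr-scheduler \
  --memory-efficient-fp16 \
  --max-sentences 8 \
  --required-batch-size-multiple 8
```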

What have you tried?

Trained with the described setup and got a BLEU score of only 2.3.

What's your environment?

  • fairseq Version (master - from commit 57526c63433c0b1c997fc91c0881867532567266):
  • PyTorch Version (1.4.0)
  • OS (Ubuntu 18.04):
  • How you installed fairseq (pip install --editable ., https://github.com/pytorch/fairseq/commit/57526c63433c0b1c997fc91c0881867532567266):
  • Python version: 3.6.6
  • CUDA/cuDNN version: 10.0/7
  • GPU models and configuration: aws 8xV100
Label: documentation

All 4 comments

CC @ngoyal2707

You should find #1758 useful (it’d be nice if they updated the documentation here).

Thanks @mjpost.
I found that issue really useful, and it's how I validated the fine-tuned model (see the evaluation sketch below).
However, the question remains: how to reproduce the fine-tuning results for EN-RO translation? (Meaning fine-tuning the base model, mbart.CC25, on EN-RO data.)
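
For later readers, this is roughly what that validation looks like, as I understand the discussion in #1758 (an assumption on my part, with illustrative flag values; the key point there is stripping the [ro_RO] language token before scoring):

```bash
# Hedged sketch of EN-RO evaluation following the approach discussed in
# #1758: generate with the pretrained-BART translation task, strip the
# target-language token from hypotheses and references, then score with
# sacrebleu. Paths and $langs match the training sketch above.
fairseq-generate path_2_data \
  --path checkpoints/checkpoint_best.pt \
  --task translation_from_pretrained_bart \
  --gen-subset test -s en_XX -t ro_RO \
  --langs $langs --remove-bpe sentencepiece > en_ro.out

# fairseq-generate prefixes hypotheses with H and references with T:
grep ^H en_ro.out | cut -f3- | sed 's/\[ro_RO\]//g' > en_ro.hyp
grep ^T en_ro.out | cut -f2- | sed 's/\[ro_RO\]//g' > en_ro.ref
sacrebleu en_ro.ref < en_ro.hyp
```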

cc @ngoyal2707, could you please synthesize the discussion in #1758 and update the README?
