Fairseq: Is bidirectional translation possible in single training?

Created on 30 Apr 2020  ·  10 Comments  ·  Source: pytorch/fairseq

❓ Bidirectional translation in single training

Before asking:

  1. search the issues. >> No such related issue as far as I have seen.
  2. search the docs. >> Could not find any relevant parameters.

What is your question?

Is it possible to train a bidirectional translation system in one training run? Please note that I am not talking about multilingual systems. I am talking about a bidirectional system for a single language pair, for example en-fr and fr-en.

Code

Not relevant.

What have you tried?

One option is of course to train two separate models on the same corpora, one for src-tgt and one for tgt-src. However, I would like to train only once if possible. I have trained this kind of bidirectional system in OpenNMT-py before, using source and target language tokens. I tried the same approach with fairseq and it gave me BLEU scores as well. However, since fairseq does not save the final generated/predicted text to a file, I cannot remove the language tokens from the output, and this hurts the final BLEU score.
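A workaround I have considered (only a sketch, not verified; paths and the <2en> token below are just placeholders for whatever token gets prepended): fairseq-generate writes its hypotheses to stdout on lines prefixed with H-, so the output could be redirected to a file and the language tokens stripped before scoring.

fairseq-generate data-bin --path checkpoints/checkpoint_best.pt --remove-bpe > gen.out
grep ^H gen.out | cut -f3- | sed 's/^<2en> *//' > hyp.txt   # hypotheses with the language token removed
grep ^T gen.out | cut -f2- > ref.txt                        # references, for scoring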

What's your environment?

Not relevant.


All 10 comments

That’s supported using the multilingual translation code. You can train both directions simultaneously and share the encoder and decoder parameters.

Here’s an example: https://github.com/pytorch/fairseq/tree/master/examples/translation#multilingual-translation

@myleott ... Thanks for the quick reply. I see, so does it mean that after preprocessing (binarizing) the data, I can use --lang-pairs en-fr,fr-en to make it bidirectional?

Yep, exactly. Please let us know how it goes, and open a new issue if you run into any problems!

The relevant options for sharing the encoder and decoder params are here: https://github.com/pytorch/fairseq/blob/master/fairseq/models/multilingual_transformer.py#L33
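As a minimal sketch (flag names as in that file; fill in your data path and the usual optimizer and batching settings):

fairseq-train data-bin \
    --task multilingual_translation --lang-pairs en-fr,fr-en \
    --arch multilingual_transformer \
    --share-encoders --share-decoders --share-decoder-input-output-embed \
    ...   # plus the usual optimizer, dropout, and max-tokens flags

Sharing the encoders and decoders (and optionally the embeddings via --share-encoder-embeddings / --share-decoder-embeddings) is what turns the two directions into a single bidirectional model instead of two independent ones.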

Thanks a lot. I will try it out and let you know here if it works.

@myleott Hello again. I just wanted to confirm if the steps I am following are correct because this is the first time I am using multilingual training in fairseq. Here are the commands I am currently using.

Preprocessing:
CUDA_VISIBLE_DEVICES=3 python3 ~/fairseq/preprocess.py -s de -t en --joined-dictionary \
    --trainpref /final_data/train --validpref /final_data/valid --testpref /final_data/test \
    --destdir /final_data --dataset-impl raw --workers 50 \
    2>&1 | tee /logs/fairseq_preprocess.log

Training:
CUDA_VISIBLE_DEVICES=3 python3 ~/fairseq/train.py /final_data --task multilingual_translation \
    --lang-pairs de-en,en-de --arch multilingual_transformer \
    --share-decoders --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --dataset-impl raw --num-workers 50 --max-epoch 50 \
    --save-dir /models 2>&1 | tee /logs/fairseq_train.log

Questions:

  1. The --eval-bleu option does not work with multilingual training, so I cannot check validation BLEU scores during training. I know I can still calculate BLEU during generation on the test set, but I am not sure whether this is expected, since --eval-bleu works fine when I am not using multilingual translation.
  2. As you said, I am using the --share-decoders and --share-decoder-input-output-embed parameters. I am not using --share-encoders. Is this enough for the bidirectional system?
  3. In the example here, there are 2 preprocessing steps, for de-en and fr-en. I have done preprocessing for the de-en direction only and used --lang-pairs de-en,en-de in training. Is this enough for the bidirectional system, or do I need to perform preprocessing twice as in the example?
  4. Also, do we need to use sacrebleu during the evaluation? Or can I just use generate with --remove-bpe? I am not sure how to do the bidirectional evaluation with the generate command.

1) Correct, --eval-bleu isn't implemented for the multilingual translation task, so you'll need to calculate BLEU yourself using fairseq-generate (generate.py).
2) It's up to you. If you only share the decoders, then you'll have two encoder models (one for each language) and a single shared decoder model.
3) Yes, that's fine, since you're reusing the same dataset.
4) You don't have to use sacrebleu, that's just what the example uses. The important options are those passed to fairseq-interactive (or fairseq-generate if you prefer). Namely, you'll need to add these flags to your generate command: --task multilingual_translation --lang-pairs de-en,en-de --source-lang en --target-lang de.
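For example, something along these lines for the de→en direction (a sketch; the paths and the raw dataset flag mirror your commands above):

fairseq-generate /final_data --dataset-impl raw \
    --task multilingual_translation --lang-pairs de-en,en-de \
    --source-lang de --target-lang en \
    --path /models/checkpoint_best.pt \
    --beam 5 --remove-bpe

Run the same command with --source-lang en --target-lang de for the other direction; fairseq-generate prints a BLEU score at the end of its output.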

@myleott Thanks a lot for all your help. It worked out well and I got the BLEU scores (without sacrebleu) for both directions with much less overall code.

Just a note: using --share-encoders and --share-decoders together gave me better BLEU scores (at least +1).

Hello @myleott @pipibjc @SouravDutta91,
I want to extend the bidirectional translation to more than two languages, i.e. de-en, fr-en, en-de and en-fr. Now, if I follow the preprocessing based on this multilingual example, I have to fix the target-side dict.

But this case has more than one target language. So is it possible to binarize the data in a way that enables encoder and decoder sharing?
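What I have in mind is something like the following (only a sketch, assuming the BPE/dictionary is learned jointly over all languages first, with placeholder data paths):

# first pair: build the joint dictionary
fairseq-preprocess -s de -t en --joined-dictionary \
    --trainpref ... --validpref ... --testpref ... \
    --destdir data-bin/multi

# remaining pairs: reuse that dictionary
fairseq-preprocess -s fr -t en \
    --srcdict data-bin/multi/dict.en.txt --tgtdict data-bin/multi/dict.en.txt \
    --trainpref ... --validpref ... --testpref ... \
    --destdir data-bin/multi

Would binarizing like this be enough to enable encoder/decoder sharing across all four directions?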


Hello @SouravDutta91, did you use --share-decoder-input-output-embed when using --share-encoders?
