Hi!
If we share decoder parameters in a multilingual transformer, we need to tell the shared decoder which language to decode into.
This might be done by (embedding and) passing the target language id directly to the decoder.
Alternatively, one might append a language tag to the actual sentence (so that it becomes, e.g., the first word of the sentence).
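The second option can be sketched like this. This is a hypothetical illustration of the "language tag as first token" idea, not fairseq's actual implementation; the `__fr__` tag spelling and the helper name are assumptions:

```python
# Illustrative sketch only: prepend a target-language tag so it becomes
# the first token of the sentence. The "__lang__" spelling is assumed,
# not fairseq's exact token format.
def prepend_langtok(tokens, tgt_lang):
    """Prepend a target-language tag as the first token."""
    return [f"__{tgt_lang}__"] + tokens

print(prepend_langtok(["Hello", "world"], "fr"))
# -> ['__fr__', 'Hello', 'world']
```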
How is it done in fairseq's multilingual transformer?
Thank you,
Maksym
How do you use it for multiple target languages?
The example only covers multiple sources and one target language.
We added `--decoder-langtok` support in #620. You can specify `--decoder-langtok` in both training and inference. It feeds the target language token as the first token to the decoder.
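Conceptually, the effect is that the decoder's first input token identifies the target language rather than being the usual start symbol. This is a rough sketch under assumptions (the token ids and helper are hypothetical, not fairseq's code):

```python
# Rough illustration of a decoder-side language token (hypothetical ids,
# not fairseq's actual implementation): the first token fed to the
# decoder is the target-language token instead of the usual start symbol.
def decoder_first_token(start_token_id, tgt_langtok_id, use_decoder_langtok):
    return tgt_langtok_id if use_decoder_langtok else start_token_id

# Assumed ids for illustration: start symbol = 2, '__fr__' token = 250004.
print(decoder_first_token(2, 250004, use_decoder_langtok=True))   # -> 250004
print(decoder_first_token(2, 250004, use_decoder_langtok=False))  # -> 2
```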
@pipibjc can you please add an example of the many-to-many multilingual translation case? Right now the example only covers the many-to-one scenario.
@madaanpulkit sure, I have a draft example of how to train a many-to-many multilingual translation model, but I need to clean it up a bit. I will update the example page shortly.
A draft would work for the time being (pulkit.[email protected]). Thanks for the quick replies.
Here is an example that uses the binarized data from the multilingual example. It just demonstrates how to specify the command line correctly, without tuning the hyper-parameters:
Training:

```
CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/iwslt17.de_fr.en.bpe16k/ \
    --max-epoch 50 --ddp-backend=no_c10d \
    --task multilingual_translation --arch multilingual_transformer_iwslt_de_en \
    --share-decoders --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' \
    --lr 0.0005 --lr-scheduler inverse_sqrt --min-lr '1e-09' \
    --warmup-updates 4000 --warmup-init-lr '1e-07' \
    --label-smoothing 0.1 --criterion label_smoothed_cross_entropy \
    --dropout 0.3 --weight-decay 0.0001 \
    --save-dir checkpoints/multilingual_transformer \
    --max-tokens 4000 --update-freq 8 --max-update 20 --log-format json \
    --lang-pairs de-en,fr-en,en-fr,en-de --encoder-langtok tgt
```
Inference:

```
CUDA_VISIBLE_DEVICES=0 python generate.py data-bin/iwslt17.de_fr.en.bpe16k/ \
    --task multilingual_translation \
    --path checkpoints/multilingual_transformer/checkpoint_best.pt \
    --source-lang en --target-lang fr --gen-subset valid \
    --lang-pairs de-en,fr-en,en-fr,en-de --encoder-langtok tgt
```
@pipibjc thanks for the help.
Any particular reason behind using `--encoder-langtok` rather than `--decoder-langtok`?
I have experimented with both `--encoder-langtok tgt` and `--decoder-langtok` in the many-to-many case, but I didn't find any difference. I use `--encoder-langtok tgt` in the example just because the original paper suggested doing so.
I tried both, and `--encoder-langtok tgt` worked better for me.
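For reference, `--encoder-langtok` controls which language's tag is prepended to the source sentence on the encoder side: `src` prepends the source-language tag, `tgt` the target-language tag. A minimal sketch of this choice, with an assumed tag spelling (not fairseq's exact token format):

```python
# Sketch of the --encoder-langtok src/tgt choice (tag spelling assumed):
# the flag selects WHICH language's tag is prepended to the source tokens.
def encoder_input(src_tokens, src_lang, tgt_lang, mode):
    lang = tgt_lang if mode == "tgt" else src_lang
    return [f"__{lang}__"] + src_tokens

print(encoder_input(["Guten", "Tag"], "de", "en", mode="tgt"))
# -> ['__en__', 'Guten', 'Tag']
print(encoder_input(["Guten", "Tag"], "de", "en", mode="src"))
# -> ['__de__', 'Guten', 'Tag']
```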