Fairseq: Error with multilingual model and fairseq-generate

Created on 19 Nov 2019 · 5 Comments · Source: pytorch/fairseq

Hi,
I got an error when generating with a trained multilingual model. I hope you can help me understand what went wrong and how to fix it. Some context: I'm basically trying to use the multilingual architecture as a multitask model to combine different datasets for a monolingual task (each task is a "language" pair).

The command used for training a many-to-one model (i.e. shared decoder) is:

CUDA_VISIBLE_DEVICES=1,2,3 fairseq-train "${data_dir}/bin" \
  --ddp-backend=no_c10d \
  --task multilingual_translation --lang-pairs orig-simp,complex-simp,long-simp \
  --arch multilingual_transformer \
  --share-decoders --share-decoder-input-output-embed \
  --encoder-embed-path "${glove}" --encoder-embed-dim 300 --encoder-ffn-embed-dim 300 \
  --decoder-embed-path "${glove}" --decoder-embed-dim 300 --decoder-ffn-embed-dim 300 \
  --encoder-attention-heads 5 --decoder-attention-heads 5 \
  --encoder-layers 4 --decoder-layers 4 \
  --optimizer adam --adam-betas '(0.9, 0.98)' \
  --lr 0.0005 --lr-scheduler inverse_sqrt --min-lr '1e-09' \
  --label-smoothing 0.1 --dropout 0.3 --weight-decay 0.0001 \
  --criterion label_smoothed_cross_entropy --max-update 10000 \
  --warmup-updates 4000 --warmup-init-lr '1e-07' \
  --max-tokens 4000 --update-freq 4 \
  --save-dir "${model_dir}" --tensorboard-logdir "${log_dir}"

Training proceeds without problems. Now, I want to generate the output for the 'test' subset of one of the "language" pairs (orig-simp) that the model was trained on.

fairseq-generate "${data_dir}/bin" \
  --path "${model_dir}/${checkpoint_name}.pt" \
  --lang-pairs orig-simp,complex-simp,long-simp \
  --task multilingual_translation --source-lang orig --target-lang simp \
  --batch-size 128 --beam 5 --remove-bpe=sentencepiece \
  --gen-subset test > "${experiment_dir}/outputs/${output_name}.out"

After running the command I get the following error:

/experiments/falva/tools/fairseq/fairseq/models/fairseq_model.py:280: UserWarning: FairseqModel is deprecated, please use FairseqEncoderDecoderModel or BaseFairseqModel instead
  for key in self.keys
Traceback (most recent call last):
  File "/home/falva/anaconda3/envs/mtl4ts/bin/fairseq-generate", line 11, in <module>
    load_entry_point('fairseq', 'console_scripts', 'fairseq-generate')()
  File "/experiments/falva/tools/fairseq/fairseq_cli/generate.py", line 190, in cli_main
    main(args)
  File "/experiments/falva/tools/fairseq/fairseq_cli/generate.py", line 47, in main
    task=task,
  File "/experiments/falva/tools/fairseq/fairseq/checkpoint_utils.py", line 167, in load_model_ensemble
    ensemble, args, _task = load_model_ensemble_and_task(filenames, arg_overrides, task)
  File "/experiments/falva/tools/fairseq/fairseq/checkpoint_utils.py", line 186, in load_model_ensemble_and_task
    model.load_state_dict(state['model'], strict=True, args=args)
TypeError: load_state_dict() got an unexpected keyword argument 'args'

Could you help me understand what's going on?

Some additional and perhaps useful information and questions:

For all 'language' pairs, all dataset splits (train/valid/test) were binarized before training using fairseq-preprocess. That's why I decided to use fairseq-generate instead of fairseq-interactive. I don't think this could be the source of the problem, right? Or is there a particular reason why, for the multilingual model, it's recommended to use interactive rather than generate as in the example you provide in your repo?
Since in this case I'm using a many-to-one model (just as in the example you provide), there is no need to use the --encoder-langtok or --decoder-langtok arguments. To my understanding, --encoder-langtok comes into play if I wanted to train a one-to-many model (--encoder-langtok tgt). But, when would --decoder-langtok be necessary in your experience?

Thank you in advance for all the help.

bug


feralvam

All 5 comments

+1

MrHuhoo on 22 Nov 2019

+1.

I ran the commands shown in https://github.com/pytorch/fairseq/tree/master/examples/translation#multilingual-translation exactly as they are written, but generation fails with the "missing --lang-pairs" error. After adding --lang-pairs de-en,fr-en, it fails with: TypeError: load_state_dict() got an unexpected keyword argument 'args'

Extra info: I also tried adding an "args" argument to the load_state_dict() method in fairseq/models/multilingual_transformer.py, but then it gives this error:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/bin/fairseq-interactive", line 11, in <module>
    load_entry_point('fairseq', 'console_scripts', 'fairseq-interactive')()
  File "/home/ubuntu/NMT_project/fairseq_cli/interactive.py", line 190, in cli_main
    main(args)
  File "/home/ubuntu/NMT_project/fairseq_cli/interactive.py", line 149, in main
    translations = task.inference_step(generator, models, sample)
  File "/home/ubuntu/NMT_project/fairseq/tasks/multilingual_translation.py", line 309, in inference_step
    if self.args.decoder_langtok else self.target_dictionary.eos(),
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/ubuntu/NMT_project/fairseq/sequence_generator.py", line 113, in generate
    return self._generate(model, sample, **kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/ubuntu/NMT_project/fairseq/sequence_generator.py", line 295, in _generate
    tokens[:, :step + 1], encoder_outs, temperature=self.temperature,
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/ubuntu/NMT_project/fairseq/sequence_generator.py", line 553, in forward_decoder
    temperature=temperature,
  File "/home/ubuntu/NMT_project/fairseq/sequence_generator.py", line 583, in _decode_one
    decoder_out = list(model.forward_decoder(
  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 591, in __getattr__
    type(self).__name__, name))
AttributeError: 'MultilingualTransformerModel' object has no attribute 'forward_decoder'

ICEtinger on 22 Nov 2019 · 👍2

@pipibjc ?

huihuifan on 26 Nov 2019

The problem seems to have been introduced by commit dabbef467692ef4ffb7de8a01235876bd7320a93. Adding ", args=None" to the signature of load_state_dict in multilingual_transformer.py in your local checkout should fix it. I'll submit a fix soon. Specifically, it should be:

    def load_state_dict(self, state_dict, strict=True, args=None):
        state_dict_subset = state_dict.copy()
        for k, _ in state_dict.items():
            assert k.startswith('models.')
            lang_pair = k.split('.')[1]
            if lang_pair not in self.models:
                del state_dict_subset[k]
        super().load_state_dict(state_dict_subset, strict=strict, args=args)

lematt1991 on 6 Dec 2019
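
For readers trying to see why the one-argument change matters, here is a minimal sketch in plain Python, with no fairseq or PyTorch involved. The class names BaseModel, OldMultiModel and FixedMultiModel are hypothetical stand-ins, not fairseq classes; the only assumption is that the loader calls load_state_dict with an args keyword, as the traceback in the original report shows. An override that was written without that parameter rejects the call, while one that accepts and forwards it (and filters keys by language pair, as in the snippet above) loads normally.

```python
class BaseModel:
    """Hypothetical stand-in for a base class whose loader accepts `args`."""

    def __init__(self, keys):
        self.expected_keys = set(keys)
        self.loaded = None

    def load_state_dict(self, state_dict, strict=True, args=None):
        if strict and set(state_dict) != self.expected_keys:
            raise KeyError("state_dict keys do not match the model")
        self.loaded = dict(state_dict)


class OldMultiModel(BaseModel):
    # Override written before callers started passing `args`.
    def load_state_dict(self, state_dict, strict=True):
        super().load_state_dict(state_dict, strict=strict)


class FixedMultiModel(BaseModel):
    def __init__(self, lang_pairs, keys):
        super().__init__(keys)
        self.models = set(lang_pairs)

    # Accept `args` and forward it; keep only keys for known language pairs.
    def load_state_dict(self, state_dict, strict=True, args=None):
        subset = {
            k: v for k, v in state_dict.items()
            if k.split(".")[1] in self.models
        }
        super().load_state_dict(subset, strict=strict, args=args)


# Toy checkpoint with one entry per language pair (illustrative keys only).
state = {"models.orig-simp.w": 1, "models.complex-simp.w": 2}

old = OldMultiModel(state.keys())
try:
    old.load_state_dict(state, strict=True, args=None)
    error = None
except TypeError as exc:
    error = str(exc)

fixed = FixedMultiModel(["orig-simp"], ["models.orig-simp.w"])
fixed.load_state_dict(state, strict=True, args=None)
```

In this sketch, error holds the same "unexpected keyword argument 'args'" message as the report, and fixed.loaded contains only the orig-simp entry.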
                            
                            
                                                                                            
                        
                    
                

                                                
                    
Should be fixed with 4c5934ac61354d9b6d164f7317905e4ac2ae1064

myleott on 19 Dec 2019