It doesn't seem to be possible to enable attention when running lstm_lm.
Here is an example command running the lstm_lm architecture:
$(which fairseq-train) $DATA_DIR \
--task language_modeling \
--optimizer adam --adam-betas '(0.9,0.98)' --adam-eps 1e-6 --clip-norm 0.0 \
--lr-scheduler polynomial_decay --lr $PEAK_LR --warmup-updates $WARMUP_UPDATES --total-num-update $TOTAL_UPDATES \
--max-sentences $MAX_SENTENCES --update-freq $UPDATE_FREQ \
--max-update $TOTAL_UPDATES --log-format simple --log-interval 100 \
--ddp-backend=no_c10d \
--arch lstm_lm \
--decoder-attention \
--tensorboard-logdir $TB_DIR \
--bpe gpt2 --memory-efficient-fp16 \
--num-workers $NPROC_PER_NODE \
--save-interval-updates 3600 \
--save-dir $SAVE_DIR
And here is the reported error:
Traceback (most recent call last):
File "/mnt/home/jmorton/venvs/roberta/bin/fairseq-train", line 11, in <module>
load_entry_point('fairseq', 'console_scripts', 'fairseq-train')()
File "/mnt/home/jmorton/software/fairseq/fairseq_cli/train.py", line 307, in cli_main
distributed_main(args.device_id, args)
File "/mnt/home/jmorton/software/fairseq/fairseq_cli/train.py", line 286, in distributed_main
main(args, init_distributed=True)
File "/mnt/home/jmorton/software/fairseq/fairseq_cli/train.py", line 96, in main
train(args, trainer, task, epoch_itr)
File "/cm/shared/sw/pkg/devel/python3/3.7.3/lib64/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/mnt/home/jmorton/software/fairseq/fairseq_cli/train.py", line 176, in train
log_output = trainer.train_step(samples)
File "/cm/shared/sw/pkg/devel/python3/3.7.3/lib64/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/mnt/home/jmorton/software/fairseq/fairseq/trainer.py", line 319, in train_step
ignore_grad=is_dummy_batch,
File "/mnt/home/jmorton/software/fairseq/fairseq/tasks/fairseq_task.py", line 337, in train_step
loss, sample_size, logging_output = criterion(model, sample)
File "/mnt/home/jmorton/venvs/roberta/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/home/jmorton/software/fairseq/fairseq/criterions/cross_entropy.py", line 29, in forward
net_output = model(**sample['net_input'])
File "/mnt/home/jmorton/venvs/roberta/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/home/jmorton/venvs/roberta/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 447, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/mnt/home/jmorton/venvs/roberta/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/home/jmorton/software/fairseq/fairseq/models/fairseq_model.py", line 436, in forward
return self.decoder(src_tokens, **kwargs)
File "/mnt/home/jmorton/venvs/roberta/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/home/jmorton/software/fairseq/fairseq/models/lstm.py", line 374, in forward
prev_output_tokens, encoder_out, incremental_state
File "/mnt/home/jmorton/software/fairseq/fairseq/models/lstm.py", line 431, in extract_features
"attention is not supported if there are no encoder outputs"
AssertionError: attention is not supported if there are no encoder outputs
I bet that this line is the culprit: https://github.com/pytorch/fairseq/blob/master/fairseq/models/lstm_lm.py#L100
@joshim5
The LSTM Language Model in fairseq is a decoder-only model, so it does not support attention over encoder inputs. The line you mention is correct, but we should disable the --decoder-attention option for this architecture.
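In the meantime, the workaround is simply to drop that flag. For example, the invocation from the report should get past the assertion once --decoder-attention is removed (untested sketch, same shell variables as above):

$(which fairseq-train) $DATA_DIR \
--task language_modeling \
--arch lstm_lm \
--optimizer adam --adam-betas '(0.9,0.98)' --adam-eps 1e-6 --clip-norm 0.0 \
--lr-scheduler polynomial_decay --lr $PEAK_LR --warmup-updates $WARMUP_UPDATES --total-num-update $TOTAL_UPDATES \
--max-sentences $MAX_SENTENCES --update-freq $UPDATE_FREQ \
--max-update $TOTAL_UPDATES --log-format simple --log-interval 100 \
--ddp-backend=no_c10d \
--tensorboard-logdir $TB_DIR \
--bpe gpt2 --memory-efficient-fp16 \
--num-workers $NPROC_PER_NODE \
--save-interval-updates 3600 \
--save-dir $SAVE_DIR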
Seems this is resolved. Please open a new issue if you are still having problems.
@lematt1991: I will open a PR to remove the --decoder-attention argument. If this option is not used, then the model should train correctly.
Fixed by b2ee110c853c5effdd8d21f50a8437485bafb285