I trained a transformer model on a simple seq2seq task. Since my source and target sequences are much longer than 1024 tokens, I used --max-source-positions 3000 and --max-target-positions 3000 while training. The model trains well. But when I use fairseq-generate to make predictions on the test set, it throws an exception saying: Size of sample #608 is invalid (=(1012, 1092)) since max_positions=(1024, 1024), skip this example with --skip-invalid-size-inputs-valid-test.
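For reference, the relevant part of my training command looked roughly like this (a sketch; other flags omitted, --arch shown only for illustration):
fairseq-train data-bin/char_level/ \
--arch transformer \
--max-source-positions 3000 \
--max-target-positions 3000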
I am using the following command for fairseq-generate:
fairseq-generate \
--path experiments/$EXP_NAME/checkpoints/checkpoint_best.pt \
--quiet \
--results-path experiments/$EXP_NAME/results \
--beam 5 \
--max-len-a 3000 \
--max-len-b 3000 \
data-bin/char_level/
Is this a Bug or am I missing something?
Thanks
You need to set --max-source-positions and --max-target-positions for fairseq-generate as well; otherwise they default to DEFAULT_MAX_SOURCE_POSITIONS (1024) in the given model.py file. --max-len-a and --max-len-b control the maximum length of what you generate, not of your input: the maximum output length is --max-len-a * x + --max-len-b, where x is the length of the source input.
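With your command above, that would look something like this (a sketch; I've dropped --max-len-a and --max-len-b, since they cap the output length rather than the input):
fairseq-generate \
--path experiments/$EXP_NAME/checkpoints/checkpoint_best.pt \
--quiet \
--results-path experiments/$EXP_NAME/results \
--beam 5 \
--max-source-positions 3000 \
--max-target-positions 3000 \
data-bin/char_level/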
Thank you @Alex-Fabbri. I think I made a mistake while trying that out.
Does the source seq_len include padding or EOS? I set --max-len-a to 2 and --max-len-b to -2. When I feed the sequence ".", fairseq generates "00". But 2 * 1 - 2 = 0, so the output should be empty. I am confused.
By the way, where is the code that parses the --max-len-a and --max-len-b args? Thank you @Alex-Fabbri
@JaireYu Check out the code here, up to about line 237. It looks like the src_len used in the max_len calculation does include padding/EOS.
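To make the arithmetic concrete for your example (assuming "." is tokenized as a single token plus EOS):
src tokens = [".", EOS]  ->  src_len = 2
max output length = 2 * 2 + (-2) = 2
which matches the two-token "00" you saw, rather than an empty output.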
Thank you for your reply, @Alex-Fabbri! I think I understand it now: the input sequence length x includes EOS, while the output length y does not, and max(y) = a * x + b.