I trained a transformer model on a simple seq2seq task. Since my source and target sequences are much longer than 1024 tokens, I used --max-source-positions 3000 and --max-target-positions 3000 while training. The model trains well. But when I use fairseq-generate to make predictions on the test set, it throws an exception saying: Size of sample #608 is invalid (=(1012, 1092)) since max_positions=(1024, 1024), skip this example with --skip-invalid-size-inputs-valid-test.
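For reference, the relevant part of my training command looked roughly like this (a sketch; other flags omitted, --arch shown only for illustration):
fairseq-train data-bin/char_level/ \
--arch transformer \
--max-source-positions 3000 \
--max-target-positions 3000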
I am using the following command for fairseq-generate:
fairseq-generate \
--path experiments/$EXP_NAME/checkpoints/checkpoint_best.pt \
--quiet \
--results-path experiments/$EXP_NAME/results \
--beam 5 \
--max-len-a 3000 \
--max-len-b 3000 \
data-bin/char_level/
Is this a Bug or am I missing something?
Thanks
You need to set --max-source-positions and --max-target-positions for fairseq-generate as well; otherwise they default to DEFAULT_MAX_SOURCE_POSITIONS (1024) in the given model.py file. --max-len-a and --max-len-b control the maximum length of what you generate, not of your input: the maximum output length is --max-len-a * x + --max-len-b, where x is the length of the source input.
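With your command above, that would look something like this (a sketch; I've dropped --max-len-a and --max-len-b, since they cap the output length rather than the input):
fairseq-generate \
--path experiments/$EXP_NAME/checkpoints/checkpoint_best.pt \
--quiet \
--results-path experiments/$EXP_NAME/results \
--beam 5 \
--max-source-positions 3000 \
--max-target-positions 3000 \
data-bin/char_level/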
Thank you @Alex-Fabbri. I think I made a mistake while trying that out.
Does the source seq_len include padding or EOS? I set --max-len-a to 2 and --max-len-b to -2. When I feed the sequence ".", fairseq generates "00". But 2 * 1 - 2 = 0, so the output should be empty. I am confused.
By the way, where is the code that parses the --max-len-a and --max-len-b args? Thank you @Alex-Fabbri
@JaireYu Check out the code here, up to about line 237. It looks like the src_len used in the max_len calculation does include padding/EOS.
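To make the arithmetic concrete for your example (assuming "." is tokenized as a single token plus EOS):
src tokens = [".", EOS]  ->  src_len = 2
max output length = 2 * 2 + (-2) = 2
which matches the two-token "00" you saw, rather than an empty output.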
Thank you for your reply, @Alex-Fabbri! I think I understand it now: the input sequence length x includes EOS, while the output length y does not, and max(y) = a * x + b.