The results generated by fairseq-generate are not in the original order of the test set. I am not sure whether this is caused by fairseq-preprocess or fairseq-generate, but sometimes we want the results in the original order. I suspect it is done to form batches of efficient shapes.
Although it is not hard to order them based on the 'H-XXX' prefix of each line, I wonder if there is any feature/mechanism to avoid the shuffling of test data.
The results from fairseq-interactive are in original order.
Yes, you should sort the results after the fact based on the H-XXX prefix. The order is not preserved when using generate, because we sort sequences by length to minimize padding and increase efficiency. This is not the case for interactive, since you don't have all the sequences up front to sort.
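A minimal sketch of that post-hoc sort, assuming each hypothesis line in the fairseq-generate output has the shape `H-<id>\t<score>\t<text>` (tab-separated). The function name `reorder_hypotheses` is hypothetical, not part of fairseq. The id is parsed as an integer so that `H-10` sorts after `H-2`, which a plain string sort would get wrong.

```python
import re
from typing import Iterable, List

# Assumed hypothesis line shape: "H-<id>\t<score>\t<text>"
_HYPO_RE = re.compile(r"^H-(\d+)\t([^\t]*)\t(.*)$")


def reorder_hypotheses(lines: Iterable[str]) -> List[str]:
    """Return hypothesis texts sorted by their original sentence id.

    Non-hypothesis lines (S-, T-, D-, P-, etc.) are ignored.
    """
    hypos = []
    for line in lines:
        m = _HYPO_RE.match(line.rstrip("\n"))
        if m:
            # Numeric id, so H-10 comes after H-2
            hypos.append((int(m.group(1)), m.group(3)))
    # Python's sort is stable: several hypotheses with the same id
    # (e.g. with an n-best list) keep the order they were printed in.
    hypos.sort(key=lambda pair: pair[0])
    return [text for _, text in hypos]
```

For example, `reorder_hypotheses(open("generate.out"))` (the file name is illustrative) returns the hypotheses in test-set order, ready to be written one per line.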