fairseq-generate shuffles test data?

Created on 21 Apr 2020 · 1 Comment · Source: pytorch/fairseq

The results generated by fairseq-generate come out in a different order from the input test data. I am not sure whether this is caused by fairseq-preprocess or fairseq-generate, but sometimes we want the results in the original order. I suspect the reordering is a way to build batches with optimal shapes.
Although it is not hard to reorder them using the 'H-XXX' prefix of each line, I wonder whether there is a feature or mechanism to avoid shuffling the test data.

The results from fairseq-interactive are in original order.

needs triage question

Most helpful comment

Yes, you should sort the results after the fact based on the H-XXX prefix. The order is not preserved when using generate, because we sort sequences by length to minimize padding and increase efficiency. This is not the case for interactive, since you don't have all the sequences up front to sort.
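For reference, here is a minimal Python sketch of the after-the-fact sort described above. It assumes each hypothesis line has the form `H-<id>`, a tab, a score, a tab, then the hypothesis text; check this against your own output before relying on it. The function name `hypotheses_in_order` is made up for illustration and is not part of fairseq.

```python
import re

# Assumed line layout (verify against your own output):
#   "H-<id>\t<score>\t<hypothesis text>"
_H_LINE = re.compile(r"^H-(\d+)\t([^\t]*)\t(.*)$")


def hypotheses_in_order(lines):
    """Return hypothesis strings sorted by their numeric sample id.

    `lines` is any iterable of strings, e.g. an open file handle.
    Lines that do not start with "H-<id>" are ignored.
    """
    found = []
    for line in lines:
        match = _H_LINE.match(line.rstrip("\n"))
        if match:
            found.append((int(match.group(1)), match.group(3)))
    # Sort numerically so that id 10 comes after id 9, not after id 1.
    # list.sort is stable, so repeated ids keep their original relative order.
    found.sort(key=lambda pair: pair[0])
    return [text for _, text in found]
```

If the output was saved to a file (say `gen.out`, a placeholder name), passing the open file handle to `hypotheses_in_order` should give the hypotheses in the original test-set order.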
