Fairseq: Difference between BLEU and SacreBLEU

Created on 16 Apr 2019 · 1 Comment · Source: pytorch/fairseq

Hello!

Could you please elaborate on the difference between the BLEU and SacreBLEU scores reported in the Fairseq paper? How can I calculate SacreBLEU, for example, for the output of the DynamicConv model? I can reproduce 29.7 BLEU with fairseq-score, but when I run fairseq-score with the --sacrebleu flag, I get a ridiculously high score of 33.8.

Thanks

Most helpful comment

In general, SacreBLEU is the score computed by this script: https://github.com/mjpost/sacreBLEU
BLEU is the score computed by this one: https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/multi-bleu.perl

You can read about the difference between them here: https://arxiv.org/abs/1804.08771

I'm not sure why you are seeing such a ridiculously high score, but maybe you are missing --remove-bpe?
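
To illustrate what removing BPE means before scoring, here is a minimal sketch. It assumes the hypotheses use subword-nmt style segmentation with "@@ " as the continuation marker; the helper name remove_bpe and the example sentence are hypothetical and are not taken from fairseq. If the markers are left in, the scorer compares subword pieces instead of whole words, so the number is not comparable to a word-level score.

```python
def remove_bpe(line: str, bpe_symbol: str = "@@ ") -> str:
    """Join subword pieces back into whole words by deleting the marker.

    Assumption: subword-nmt style segmentation, where every non-final
    piece of a word ends with "@@" and is followed by a space.
    """
    # Append a space so that a marker at the very end of the line also matches.
    return (line + " ").replace(bpe_symbol, "").rstrip()


# Hypothetical model output that still contains BPE markers.
hypothesis_bpe = "the dyn@@ amic con@@ volution model trans@@ lates well"
hypothesis = remove_bpe(hypothesis_bpe)

print(hypothesis)
print(len(hypothesis_bpe.split()), "tokens with BPE markers")
print(len(hypothesis.split()), "tokens after removal")
```

Both the hypotheses and the references should go through the same post-processing before they are scored, otherwise the two sides are segmented differently.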

