Fairseq: Difference between BLEU and SacreBLEU

Created on 16 Apr 2019 · 1 Comment · Source: pytorch/fairseq

Hello!

Could you please elaborate on the difference between the BLEU and SacreBLEU scores reported in the Fairseq paper? How can I calculate SacreBLEU, for example, for the output of the DynamicConv model? I can reproduce 29.7 BLEU with fairseq-score, but when I run fairseq-score with the --sacrebleu flag, I get a ridiculously high score of 33.8.

Thanks

AlexGrinch

Most helpful comment

In general, SacreBLEU is the score produced by this script: https://github.com/mjpost/sacreBLEU
BLEU is the score produced by this one: https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/multi-bleu.perl

You can read about the difference between them here: https://arxiv.org/abs/1804.08771

I'm not sure why you are seeing such a high BLEU score, but maybe you are missing --remove-bpe?
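To illustrate why leftover BPE markers could inflate a score, here is a minimal sketch, not fairseq's or SacreBLEU's actual implementation. It assumes subword-nmt-style "@@ " continuation markers; `strip_bpe` and `ngram_precision` are hypothetical helper names, the sentences are made up, and the clipped bigram precision is only one ingredient of BLEU, not the full metric.

```python
from collections import Counter
from typing import List


def strip_bpe(line: str, symbol: str = "@@ ") -> str:
    """Join subword pieces back into words by deleting the continuation marker.

    Hypothetical helper; the "@@ " marker is an assumption about the BPE format.
    """
    return (line + " ").replace(symbol, "").rstrip()


def ngram_precision(hyp: List[str], ref: List[str], n: int) -> float:
    """Clipped n-gram precision of a hypothesis against a single reference."""
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(hyp_ngrams.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_ngrams[gram]) for gram, count in hyp_ngrams.items())
    return matched / total


# Made-up sentences that differ in a single word ending.
hyp_bpe = "the trans@@ former out@@ perform@@ s the base@@ line"
ref_bpe = "the trans@@ former out@@ perform@@ ed the base@@ line"

# Scored on subword pieces: the shared pieces of the wrong word still match.
bpe_precision = ngram_precision(hyp_bpe.split(), ref_bpe.split(), 2)

# Scored on whole words after removing the markers.
word_precision = ngram_precision(
    strip_bpe(hyp_bpe).split(), strip_bpe(ref_bpe).split(), 2
)

print(f"bigram precision on BPE pieces: {bpe_precision:.2f}")
print(f"bigram precision on words:      {word_precision:.2f}")
```

In this toy case the subword-level precision is higher than the word-level one, because a partially wrong word still contributes matching pieces. Whether that effect accounts for the 33.8 reported above is not something this sketch can confirm.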

edunov on 22 Apr 2019

👍4
