Tensor2tensor: How to get real bleu score? [approx_bleu_score]

Created on 14 Feb 2018 · 3 Comments · Source: tensorflow/tensor2tensor

I see that only the approx_bleu_score is sent to TensorBoard. How can I evaluate the real BLEU?
What's the difference between approx_bleu_score and real BLEU?

question

ndvbd

Most helpful comment

How can I evaluate the real BLEU?

Use t2t-bleu.

What's the difference between approx_bleu_score and real BLEU?

The main difference is that approx_bleu is computed on the internal subwords instead of words, so it is not replicable (it cannot be compared with scores from other models) and is not suitable for reporting in publications.
Another problem is that the evaluation is autoregressive but feeds in the gold previous tokens rather than the model's own predictions, which is a kind of cheating.
See #407, #522 and #436 for more details.

martinpopel on 14 Feb 2018

👍4
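
To illustrate the point about subwords, here is a minimal sketch in plain Python. It is not tensor2tensor's code: the `bleu` function is a simplified, unsmoothed, single-reference BLEU written for this example, and `TOY_SUBWORDS` is a made-up segmentation standing in for the model's real subword vocabulary. It only shows that the same hypothesis and reference give a different score once the tokens are subwords instead of words, which is why a subword-level score cannot be compared with word-level BLEU reported elsewhere.

```python
import math
from collections import Counter


def ngram_counts(tokens, max_order=4):
    """Count all n-grams of order 1..max_order in a token list."""
    counts = Counter()
    for order in range(1, max_order + 1):
        for i in range(len(tokens) - order + 1):
            counts[tuple(tokens[i:i + order])] += 1
    return counts


def bleu(reference, hypothesis, max_order=4):
    """Simplified, unsmoothed BLEU for one sentence pair (illustration only)."""
    ref_counts = ngram_counts(reference, max_order)
    hyp_counts = ngram_counts(hypothesis, max_order)
    matches = [0] * max_order
    totals = [0] * max_order
    for ngram, count in hyp_counts.items():
        totals[len(ngram) - 1] += count
        # Clip each n-gram's count by its count in the reference.
        matches[len(ngram) - 1] += min(count, ref_counts[ngram])
    if min(matches) == 0:
        return 0.0  # no smoothing: any order without a match gives 0
    log_precision = sum(
        math.log(m / t) for m, t in zip(matches, totals)) / max_order
    brevity_penalty = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return brevity_penalty * math.exp(log_precision)


# Hypothetical segmentation; a real model uses its own learned vocabulary.
TOY_SUBWORDS = {"sitting": ["sit", "ting"], "mat": ["m", "at"]}


def to_subwords(words):
    """Split each word into its toy subword pieces (unknown words stay whole)."""
    return [piece for word in words for piece in TOY_SUBWORDS.get(word, [word])]


reference = "the cat is sitting on the mat".split()
hypothesis = "the cat is sitting on a mat".split()

word_bleu = bleu(reference, hypothesis)
subword_bleu = bleu(to_subwords(reference), to_subwords(hypothesis))

print("word-level BLEU:    %.4f" % word_bleu)     # 0.6435
print("subword-level BLEU: %.4f" % subword_bleu)  # 0.6606
```

In this toy case the subword-level score comes out higher because the split words contribute extra matching n-grams. The size and direction of the gap depend on the segmentation, so the two numbers are not interchangeable.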

All 3 comments

@nadavb Are there any questions left? Otherwise I think we could close this issue :)

stefan-it on 25 Apr 2018

👍1

@martinpopel Thanks.

But how can we use t2t-bleu on an already existing model data directory (the one that has all the train files and one dev file), so that it uses the dev file for evaluation?