Tensor2tensor: How to get real bleu score? [approx_bleu_score]

Created on 14 Feb 2018 · 3 Comments · Source: tensorflow/tensor2tensor

I see that only approx_bleu_score is sent to TensorBoard. How can I evaluate the real BLEU?
What is the difference between approx_bleu_score and real BLEU?

question

Most helpful comment

How can I evaluate the real BLEU?

Use t2t-bleu.
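
As a rough sketch of what "use t2t-bleu" can look like in practice, the snippet below builds and optionally runs the command from Python. The `--translation` and `--reference` flags are the ones I recall from the tensor2tensor README, so check `t2t-bleu --help` for your installed version. The file names are hypothetical placeholders, and the translation file is assumed to have been produced beforehand (for example by decoding the dev source file to a file with t2t-decoder).

```python
import shlex
import shutil
import subprocess


def t2t_bleu_command(translation_path, reference_path):
    """Build the argv list for scoring one translation file against one reference.

    Assumption: t2t-bleu accepts --translation and --reference, as described
    in the tensor2tensor README. Both files are plain text, one sentence per line.
    """
    return [
        "t2t-bleu",
        f"--translation={translation_path}",
        f"--reference={reference_path}",
    ]


def run_t2t_bleu(translation_path, reference_path):
    """Run t2t-bleu if it is installed; raise a clear error otherwise."""
    if shutil.which("t2t-bleu") is None:
        raise RuntimeError("t2t-bleu not found on PATH; is tensor2tensor installed?")
    cmd = t2t_bleu_command(translation_path, reference_path)
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; replace them with your decoded output and dev reference.
    print(shlex.join(t2t_bleu_command("translation.de", "dev_reference.de")))
```

Calling `run_t2t_bleu("translation.de", "dev_reference.de")` would then print the word-level BLEU reported by the tool, provided both files exist.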

What is the difference between approx_bleu_score and real BLEU?

The main difference is that approx_bleu is computed on the model's internal subwords rather than on words, so it is not replicable (not comparable with other models) and not suitable for reporting in publications.
Another problem is that the evaluation, although autoregressive, feeds the model the gold previous tokens instead of its own predictions, which is a kind of cheating.
See #407, #522 and #436 for more details.
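
To illustrate the subword point, here is a minimal sketch of sentence-level BLEU computed once on words and once on subwords. It is a toy, unsmoothed, single-reference BLEU, not the implementation behind approx_bleu or t2t-bleu, and the `SUBWORDS` table is a made-up segmentation, not the output of T2T's real subword encoder.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hyp, ref, max_n=4):
    """Toy BLEU: one hypothesis, one reference, no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        # Clipped overlap: each hypothesis n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum((hyp_counts & ngrams(ref, n)).values())
        total = sum(hyp_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty for hypotheses shorter than the reference.
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(sum(log_precisions) / max_n)


# Hypothetical subword segmentation, for illustration only.
SUBWORDS = {
    "translation": ["trans", "lation_"],
    "surprisingly": ["surpris", "ingly_"],
}


def to_subwords(words):
    """Split each word using the toy table; unknown words become one piece."""
    pieces = []
    for word in words:
        pieces.extend(SUBWORDS.get(word, [word + "_"]))
    return pieces


reference = "the translation was surprisingly good".split()
hypothesis = "the translation was surprisingly bad".split()

word_bleu = bleu(hypothesis, reference)
subword_bleu = bleu(to_subwords(hypothesis), to_subwords(reference))
print(f"word-level BLEU:    {word_bleu:.3f}")
print(f"subword-level BLEU: {subword_bleu:.3f}")
```

With this toy data the subword score comes out higher than the word score for the same pair of sentences, because the longer subword sequences share more n-grams. Since the score depends on the segmentation, two models with different subword vocabularies cannot be compared on it.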

All 3 comments

@nadavb Are there still questions left? Otherwise I think we could close that issue :)

@martinpopel thanks,

  1. But how can we use t2t-bleu on an existing model/data directory (the one that has all the train files and one dev file), so that it uses the dev file for evaluation?
  2. What script can we run on the command line to get approx_bleu?