Models: Cannot Replicate Transformer Base BLEU Scores

Created on 12 Dec 2018 · 12 Comments · Source: tensorflow/models

System information

What is the top-level directory of the model you are using: /models/official/transformer
Have I written custom code: No
OS Platform and Distribution: Ubuntu 16.04.5 LTS
TensorFlow installed from (source or binary): Binary
TensorFlow version: v1.12.0-0-ga6d8ffae09 1.12.0
CUDA/cuDNN version: release 9.0, V9.0.176
GPU model and memory: Tesla K40m/12GB
Exact command to reproduce: Official Instructions
Bazel version: N/A

Describe the problem

I have been trying to replicate the Base model from models/official/transformer. I followed the official instructions but ran into two problems:
1- The generated vocabulary was larger than the size defined in /models/official/transformer/model/model_params.py, and the program would not run. I fixed this by changing the value in model_params.py to the actual vocabulary size, 33945.
2- The second problem, which I could not solve, is that the BLEU scores after 10 epochs are not consistent with what is reported in models/official/transformer. Running compute_bleu.py as instructed gives a case-insensitive BLEU score of 26.04, far below the "promised" 27.7 BLEU for the Base Transformer.
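
The workaround in item 1 depends on knowing the real size of the generated vocabulary. A minimal sketch of one way to check it, assuming the vocabulary file stores one entry per line (the file name and the helper names here are hypothetical, not part of the repository):

```python
import os
import tempfile


def count_vocab_entries(vocab_path):
    """Count non-empty lines, assuming one vocabulary entry per line."""
    with open(vocab_path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def check_vocab_size(vocab_path, configured_size):
    """Return (actual_size, matches) for a configured vocabulary size."""
    actual = count_vocab_entries(vocab_path)
    return actual, actual == configured_size


if __name__ == "__main__":
    # Tiny stand-in vocabulary; a real check would point at the generated file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vocab.txt")  # hypothetical file name
        with open(path, "w", encoding="utf-8") as f:
            f.write("'<pad>'\n'<EOS>'\n'the_'\n'cat_'\n")
        print(check_vocab_size(path, configured_size=3))
```

If the count differs from the configured value, that would be consistent with the mismatch described in item 1.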

I have trained 3 models, and although the BLEU scores fluctuate, the differences are minimal: the largest is 0.1 BLEU.

Source code / logs

python compute_bleu.py --translation=translation.en --reference=test_data/newstest2014.de
I1212 11:03:35.484694 140343540246272 tf_logging.py:115] Case-insensitive results: 26.038009
I1212 11:03:40.145630 140343540246272 tf_logging.py:115] Case-sensitive results: 25.506699
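
For readers unfamiliar with the two figures in the log: the case-insensitive score presumably comes from lowercasing both files before n-gram matching. The sketch below is a simplified corpus-level BLEU, not the implementation in compute_bleu.py. It assumes whitespace tokenization, one reference per sentence, and no smoothing, and the example sentence is invented for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4, lowercase=False):
    """Simplified corpus BLEU (0-100): one reference each, whitespace tokens."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        if lowercase:
            hyp, ref = hyp.lower(), ref.lower()
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clipped matches: an n-gram counts at most as often as in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyp = ["Der Hund läuft schnell durch den Park"]
    ref = ["der Hund läuft schnell durch den Park"]
    print("case-sensitive:  ", corpus_bleu(hyp, ref))
    print("case-insensitive:", corpus_bleu(hyp, ref, lowercase=True))
```

In this toy case a single capitalization mismatch lowers the case-sensitive score while the case-insensitive score is unaffected, which is the same direction as the gap between the two logged results.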

bug

rodasoares

Most helpful comment

I'm having the same problem too.
I've tried TF version 1.10 and 1.12, with tensorflow models repo branch 1.10 and master.
Here's my environment.

What is the top-level directory of the model you are using: /models/official/transformer
Have I written custom code: No
OS Platform and Distribution: Ubuntu 16.04.5 LTS
TensorFlow installed from (source or binary): Source
TensorFlow version: 'v1.12.0-0-ga6d8ffa' 1.12.0
CUDA/cuDNN version: release 9.2, 7.2.1
GPU model and memory: 1080 Ti, 11GB
Exact command to reproduce: Official Instructions
Bazel version: 0.19.2

I've also noticed that the downloaded data size doesn't match the official instructions.
The raw files are 7.8GB (the official instructions say 8.4GB), and the TFRecord files are 689MB (the instructions say 722MB). The vocabulary size needed to be changed too (to 33945), as the OP mentioned.
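
A size comparison like this can be repeated with a short script. This is only a sketch: the helper names are hypothetical, and it reports both decimal (GB) and binary (GiB) units because it is not stated which convention the official instructions use.

```python
import os
import tempfile


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def to_gb(num_bytes):
    """Convert bytes to decimal gigabytes (10**9 bytes)."""
    return num_bytes / 10**9


def to_gib(num_bytes):
    """Convert bytes to binary gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # Stand-in directory; a real check would point at the data directory.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "sample.bin"), "wb") as f:
            f.write(b"\0" * 1024)
        size = directory_size_bytes(tmp)
        print(size, to_gb(size), to_gib(size))
```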

It also seems like the base model quickly overfits, around the 6th epoch.
At epoch 5, I'm getting 24.66 case-insensitive BLEU, but at epoch 10 I get 22.85.
(#5573)
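
When BLEU peaks and then declines like this, one common response is to keep the checkpoint from the best-scoring epoch instead of the last one. A minimal sketch using only the two scores quoted above (the function names are hypothetical, and scores for other epochs are not known):

```python
def best_epoch(bleu_by_epoch):
    """Return (epoch, score) with the highest BLEU; ties go to the earlier epoch."""
    return max(sorted(bleu_by_epoch.items()), key=lambda item: item[1])


def drop_from_peak(bleu_by_epoch, epoch):
    """Return how far the given epoch's BLEU is below the best observed BLEU."""
    _, peak = best_epoch(bleu_by_epoch)
    return peak - bleu_by_epoch[epoch]


if __name__ == "__main__":
    # Case-insensitive BLEU values reported in the comment above.
    scores = {5: 24.66, 10: 22.85}
    print(best_epoch(scores))
    print(round(drop_from_peak(scores, 10), 2))
```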

Any advice on where to look, or a working combination of TF version and models repo branch would be really appreciated.

chanhee0222 on 14 Jan 2019

👍2

All 12 comments

Thank you for your post. We noticed you have not filled out the following field in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
Bazel version
GPU model and memory

tensorflowbutler on 13 Dec 2018

Is there any update on this issue? I am having the same problem:
Case-insensitive results: 25.97
Case-sensitive results: 25.42

deephanson94 on 11 Jan 2019

I'm still running into this issue with the latest release. I'm seeing similar BLEU scores and plots to those others mentioned above. Is there any update on this?

vishalsubbiah on 18 Jan 2019

Any updates on this issue? I am running into the same problem with branch r1.12.0.

timxzz on 4 May 2019

Having the same issue with "latest stable release" v1.11
Case-insensitive results: 23.499498
Case-sensitive results: 23.019947

Any update from the previous commenters?

jwallbridge on 24 Jul 2019

I gave up on it. As far as I can tell, the scores they report are not reproducible!

rodasoares on 24 Jul 2019

I gave up as well and switched to tensor2tensor.

timxzz on 24 Jul 2019

I used the code from mlperf. That matched pretty well.

vishalsubbiah on 25 Jul 2019

I used the code from mlperf. That matched pretty well.

Do you mean the code from mlperf can't replicate the BLEU score and ran into the overfitting problem too?

dreamingo on 23 Oct 2019

I came across the same problem, although I am not using the dataset from the example. I use my own dataset with both the tensor2tensor and the tensorflow/models code.

Tensor2tensor produces much better results (BLEU or loss), and tensorflow/models did run into the overfitting problem mentioned above.

dreamingo on 23 Oct 2019

I used the code from mlperf. That matched pretty well.

Do you mean the code from mlperf can't replicate the BLEU score and ran into the overfitting problem too?

The mlperf code did replicate the BLEU score and did not have the overfitting problem.

vishalsubbiah on 23 Oct 2019
