Bert: Expected masked_lm_accuracy

Created on 9 Nov 2018 · 4 comments · Source: google-research/bert

The paper states that the model was pretrained for 1M steps. May I know roughly how high masked_lm_accuracy is expected to be at the end of training? Was a development set used in the paper?

okgrammer

Most helpful comment

I don't know the exact accuracy, but the held-out natural-log likelihood for BERT-Base was around -1.4 (so e^1.4 ≈ 4.0 was the perplexity I show in a table). It depends on the language and corpus, but you should definitely expect something better than -2.0 (better than == closer to zero).
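The conversion in the comment is perplexity = exp(-mean log-likelihood). A minimal sketch, assuming the -1.4 figure is a mean per-masked-token log-likelihood in nats; the function name is hypothetical:

```python
import math


def perplexity_from_log_likelihood(mean_log_likelihood: float) -> float:
    """Convert a mean per-token natural-log likelihood (<= 0) to perplexity."""
    # Perplexity is exp of the mean negative log-likelihood (cross-entropy in nats).
    return math.exp(-mean_log_likelihood)


# Assumed inputs: the BERT-Base figure and the "at least better than" threshold.
for ll in (-1.4, -2.0):
    print(f"log-likelihood {ll:+.1f} -> perplexity {perplexity_from_log_likelihood(ll):.2f}")
```

So "better than -2.0" is the same as a masked-LM perplexity below roughly 7.4, and -1.4 gives about 4.06.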

But a much better way of measuring progress is to take intermediate checkpoints and use them to fine-tune a downstream task.
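One way to follow this advice is to fine-tune from each saved pretraining checkpoint in turn and compare the dev-set scores. A minimal sketch, assuming run_pretraining.py wrote checkpoints named model.ckpt-<step> into a single directory, and that the repository's run_classifier.py is used on MRPC with the flags from its README example. The helper names, hyperparameters, and all paths are hypothetical placeholders:

```python
import re
from pathlib import Path

# Matches TensorFlow checkpoint index files such as "model.ckpt-20000.index".
_CKPT_RE = re.compile(r"^(model\.ckpt-(\d+))\.index$")


def list_checkpoints(pretrain_dir):
    """Return (step, checkpoint_prefix) pairs sorted by global step."""
    root = Path(pretrain_dir)
    if not root.is_dir():
        return []
    found = []
    for path in root.iterdir():
        match = _CKPT_RE.match(path.name)
        if match:
            # --init_checkpoint takes the prefix, not the .index file itself.
            found.append((int(match.group(2)), str(root / match.group(1))))
    return sorted(found)


def fine_tune_command(ckpt_prefix, step, bert_dir, data_dir, out_root):
    """Build a run_classifier.py argument list that fine-tunes MRPC from one checkpoint."""
    return [
        "python", "run_classifier.py",
        "--task_name=MRPC",
        "--do_train=true",
        "--do_eval=true",
        f"--data_dir={data_dir}",
        f"--vocab_file={bert_dir}/vocab.txt",
        f"--bert_config_file={bert_dir}/bert_config.json",
        f"--init_checkpoint={ckpt_prefix}",
        "--max_seq_length=128",
        "--train_batch_size=32",
        "--learning_rate=2e-5",
        "--num_train_epochs=3.0",
        f"--output_dir={out_root}/step_{step}",
    ]


if __name__ == "__main__":
    # Dry run: print one command per checkpoint (prints nothing if the directory is missing).
    for step, prefix in list_checkpoints("/tmp/pretraining_output"):
        print(" ".join(fine_tune_command(prefix, step, "/path/to/bert_base",
                                         "/path/to/glue/MRPC", "/tmp/finetune")))
```

Printing the commands instead of launching them keeps the sketch side-effect free. Each command can be passed to subprocess.run once the paths point at real data, and the eval accuracy in each step_<N> output directory can then be plotted against N.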

jacobdevlin-google on 9 Nov 2018

All 4 comments

Same question, thank you.

guotong1988 on 12 Nov 2018

In the README 👍

Eval results
global_step = 20
loss = 0.0979674
masked_lm_accuracy = 0.985479
masked_lm_loss = 0.0979328
next_sentence_accuracy = 1.0
next_sentence_loss = 3.45724e-05
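These numbers come from a run with global_step = 20, so they most likely reflect a tiny run overfitting its data rather than what to expect after 1M steps. It helps to know what masked_lm_accuracy counts. run_pretraining.py appears to compute it with tf.metrics.accuracy, passing masked_lm_weights as the weights, so only real masked positions count and padding slots are ignored. Below is a framework-free sketch of that weighting with made-up token ids, followed by a check on the figures above:

```python
import math


def weighted_accuracy(labels, predictions, weights):
    """Share of correct predictions, each position counted by its weight.

    Padding slots carry weight 0, so they never affect the result.
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    correct = sum(w for y, p, w in zip(labels, predictions, weights) if y == p)
    return correct / total


# Hypothetical batch: 3 real masked positions plus 2 padding slots (id 0, weight 0).
labels = [2023, 2003, 1037, 0, 0]
predictions = [2023, 2003, 2742, 0, 0]
weights = [1.0, 1.0, 1.0, 0.0, 0.0]
print(weighted_accuracy(labels, predictions, weights))  # 2 of 3 real positions correct

# Figures from the eval output above: the total loss equals the sum of the two task losses.
masked_lm_loss, next_sentence_loss = 0.0979328, 3.45724e-05
print(f"{masked_lm_loss + next_sentence_loss:.7f}")  # same value as the reported loss
print(f"{math.exp(masked_lm_loss):.3f}")  # masked-LM perplexity of this short run
```

The padding slots match (0 == 0) but add nothing, because their weight is zero. For this run, exp(masked_lm_loss) is about 1.10, far below the perplexity of about 4.0 quoted for BERT-Base. That gap is consistent with a 20-step run memorizing its data.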