Bert: Expected masked_lm_accuracy

Created on 9 Nov 2018 · 4 Comments · Source: google-research/bert

The paper states that the model was pretrained for 1M steps. Roughly how high is masked_lm_accuracy expected to be at the end of training? Was a development set used in the paper?

Most helpful comment

I don't know the exact accuracy, but the held-out natural log likelihood for BERT-Base was around -1.4 (so e^1.4 ≈ 4.0 was the perplexity I show in a table). It depends on the language and corpus, but you should definitely expect something better than -2.0 (where "better" means closer to zero).
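To make the arithmetic above concrete, here is a minimal sketch of the conversion between a mean per-token natural log likelihood and perplexity. The function name is hypothetical, and it assumes the log likelihood is averaged over the masked tokens being predicted.

```python
import math


def perplexity_from_log_likelihood(mean_log_likelihood: float) -> float:
    """Perplexity is exp of the negative mean natural log likelihood."""
    return math.exp(-mean_log_likelihood)


# The BERT-Base figure quoted above, and the -2.0 bound to beat.
bert_base_ppl = perplexity_from_log_likelihood(-1.4)  # roughly 4.06
bound_ppl = perplexity_from_log_likelihood(-2.0)      # roughly 7.39

print(f"log likelihood -1.4 -> perplexity {bert_base_ppl:.2f}")
print(f"log likelihood -2.0 -> perplexity {bound_ppl:.2f}")
```

Under that assumption, "better than -2.0" corresponds to a held-out perplexity below about 7.4.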

But a much better way of measuring progress is to take intermediate checkpoints and use them to fine-tune a downstream task.
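One way this might be set up is sketched below: find the intermediate checkpoints by step number and build one fine-tuning command per checkpoint. It assumes checkpoints follow the usual TensorFlow Estimator naming (`model.ckpt-<step>.index`) and that fine-tuning uses the repository's `run_classifier.py`. The helper names, the MRPC task choice, and all directory paths are placeholders, not something stated in the thread.

```python
import re

# Assumed checkpoint naming: model.ckpt-<step>.index
_CKPT_INDEX = re.compile(r"^model\.ckpt-(\d+)\.index$")


def checkpoint_steps(filenames):
    """Return the sorted global steps that have a checkpoint index file."""
    steps = []
    for name in filenames:
        match = _CKPT_INDEX.match(name)
        if match:
            steps.append(int(match.group(1)))
    return sorted(steps)


def fine_tune_command(ckpt_prefix, step,
                      task_name="MRPC",            # placeholder task
                      data_dir="glue_data/MRPC",   # placeholder path
                      model_dir="bert_model",      # placeholder path
                      out_root="finetune_out"):    # placeholder path
    """Build a run_classifier.py command that starts from one checkpoint."""
    return [
        "python", "run_classifier.py",
        f"--task_name={task_name}",
        "--do_train=true",
        "--do_eval=true",
        f"--data_dir={data_dir}",
        f"--vocab_file={model_dir}/vocab.txt",
        f"--bert_config_file={model_dir}/bert_config.json",
        f"--init_checkpoint={ckpt_prefix}",
        f"--output_dir={out_root}/step_{step}",
    ]


# Example with a made-up directory listing.
listing = ["checkpoint", "model.ckpt-20.index", "model.ckpt-100.index",
           "model.ckpt-20.meta"]
for step in checkpoint_steps(listing):
    print(" ".join(fine_tune_command(f"pretrain_out/model.ckpt-{step}", step)))
```

In practice the listing would come from something like `os.listdir(pretrain_dir)`, and each command could be run with `subprocess.run`. Comparing the downstream eval accuracy across steps then shows whether further pretraining is still helping.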

All 4 comments

Same question, thank you.

In the README 👍

Eval results
global_step = 20
loss = 0.0979674
masked_lm_accuracy = 0.985479
masked_lm_loss = 0.0979328
next_sentence_accuracy = 1.0
next_sentence_loss = 3.45724e-05
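For comparing output like this against the log likelihood figures discussed earlier, a small parser can help. The sketch below assumes the `key = value` layout shown above and that `masked_lm_loss` is a mean negative natural log likelihood per masked token, so its exponential is a perplexity. The function name is hypothetical. These numbers appear to be the README's sample run at step 20 on a very small text file, so they most likely reflect overfitting rather than a target for real pretraining.

```python
import math


def parse_eval_results(text):
    """Parse 'key = value' lines into a dict of floats."""
    results = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '=' such as headings
            results[key.strip()] = float(value)
    return results


sample = """Eval results
global_step = 20
loss = 0.0979674
masked_lm_accuracy = 0.985479
masked_lm_loss = 0.0979328
next_sentence_accuracy = 1.0
next_sentence_loss = 3.45724e-05
"""

metrics = parse_eval_results(sample)
mlm_perplexity = math.exp(metrics["masked_lm_loss"])
print(f"masked LM perplexity at step {int(metrics['global_step'])}: "
      f"{mlm_perplexity:.2f}")
```

A perplexity this close to 1 after only 20 steps is far below the roughly 4.0 reported for BERT-Base on held-out data, which is consistent with a model that has memorized a tiny training sample.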
