Bert: masked_lm_accuracy is low at 0.51, but next_sentence_accuracy is high at 0.93

Created on 6 Apr 2019 · 4 Comments · Source: google-research/bert

How can this be explained?

My training set is about 1M lines, and I ran 50,000 steps with a batch size of 32.

loveJasmine

All 4 comments

Hi, did you solve this issue?

I am pretraining BERT on a domain-specific dataset (1 million sentences, all as one document), and masked LM accuracy doesn't go beyond 70%.

I also tried running for 100,000 steps, but that didn't help much.

What can I do about this?

I am pretraining to get sentence-level embeddings and compare the similarities between them.

KavyaGujjala on 11 Apr 2019

Actually, after 300,000 steps, masked_lm_accuracy is up to 0.88.
So there is no problem, just not enough training steps.

loveJasmine on 12 Apr 2019

Hello, my data isn't that large: only 720 million words, so 300,000 steps with sequence length 128 would mean 50 epochs for me. Would that lead to overfitting? Also, what learning rate and warmup steps did you use? @loveJasmine

maggieezzat on 14 May 2019
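
A rough way to sanity-check an epoch estimate like the one in the comment above is to count the tokens the model sees and divide by the corpus size. The sketch below is only an approximation: the function name `estimated_epochs` is made up for illustration, it assumes every sequence is filled to the full sequence length, it treats words and WordPiece tokens as the same unit, and it borrows the batch size of 32 from the original post because the commenter's own batch size is not stated.

```python
def estimated_epochs(train_steps, batch_size, seq_length, corpus_tokens):
    """Approximate number of passes over the corpus.

    Assumes no padding (every sequence holds seq_length tokens) and that
    corpus_tokens is counted in the same units as seq_length.
    """
    tokens_seen = train_steps * batch_size * seq_length
    return tokens_seen / corpus_tokens


# Numbers from this thread: 300,000 steps, sequence length 128, 720M words.
# batch_size=32 is an assumption taken from the original post.
epochs = estimated_epochs(
    train_steps=300_000,
    batch_size=32,
    seq_length=128,
    corpus_tokens=720_000_000,
)
print(f"{epochs:.2f} epochs")
```

Under these assumptions the result is roughly 1.7 passes over the data, so a figure of 50 epochs would presumably come from a much larger batch size or a different way of counting (for example, counting duplicated training instances).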

Hi, I followed the pre-training guide, which uses sample_text.txt, but got low MLM accuracy and high NSP accuracy. Do you know why?

linWujl on 24 Oct 2019
