How can that be explained?
My training set is about 1M lines, and I am running 50,000 steps with a batch size of 32.
Hi, did you solve this issue?
I am pretraining BERT on a domain-specific dataset (1 million sentences, all as one document), and the masked LM accuracy doesn't go beyond 70%.
I also tried running for 100,000 steps, but it didn't help much.
What to do about this?
I am pretraining to get sentence level embeddings and compare the similarities between them.
Actually, after 300,000 steps, masked_lm_accuracy is at 0.88.
So there is no problem, just not enough training steps.
Hello, my data isn't that large, only 720 million words, so 300,000 steps with sequence length 128 would mean 50 epochs for me. Would that lead to overfitting? Also, what learning rate and warmup steps did you use? @loveJasmine
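For anyone checking an epoch estimate like this, here is a rough sketch of the arithmetic. It assumes one epoch is roughly corpus size divided by (batch size × sequence length), treats words as tokens, and ignores padding and any data duplication during preprocessing. The function names are made up for illustration, and the batch size of 32 is borrowed from the earlier post, since it is not stated here.

```python
def approx_epochs(steps, batch_size, seq_len, corpus_tokens):
    """Rough number of passes over the corpus.

    Assumes every sequence is fully packed (no padding) and that
    words and tokens are roughly the same count.
    """
    tokens_seen = steps * batch_size * seq_len
    return tokens_seen / corpus_tokens


def batch_size_for_epochs(epochs, steps, seq_len, corpus_tokens):
    """Batch size at which `steps` steps would cover `epochs` passes."""
    return epochs * corpus_tokens / (steps * seq_len)


# Hypothetical check: 300,000 steps, sequence length 128, 720M words,
# with batch size 32 assumed from the earlier post.
print(approx_epochs(300_000, 32, 128, 720_000_000))          # about 1.7
print(batch_size_for_epochs(50, 300_000, 128, 720_000_000))  # 937.5
```

Under these assumptions the epoch count depends heavily on the batch size, so it is worth stating it alongside the step count when comparing runs.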
Hi, I followed the pre-training guide, which uses sample_text.txt, but got low MLM accuracy and high NSP accuracy. Do you know why?