Hi, I am running the same task with the same hyperparameters as Google's official TensorFlow implementation of BERT, but I am getting about 1.5% lower accuracy. Can you give any hints about the possible cause?
Thanks!
Hi!
Could it be different seeds?
See e.g. https://github.com/huggingface/pytorch-pretrained-BERT/issues/53#issuecomment-441565229
Hi @ejld, yes, BERT has high variance across runs on many fine-tuning tasks (see also the discussion in #64).
You should try several different seeds (10, for example) and compare the mean and standard deviation of the results.
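A minimal sketch of that multi-seed comparison, assuming Python with PyTorch (and optionally NumPy) installed. `set_seed`, `run_seeds`, `summarize` and `train_and_eval` are hypothetical names for illustration, not part of pytorch-pretrained-BERT; `train_and_eval` stands in for whatever function fine-tunes the model and returns dev accuracy.

```python
import random
import statistics


def set_seed(seed):
    """Seed Python's RNG, plus NumPy and PyTorch if they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


def run_seeds(train_and_eval, seeds):
    """Reseed before each run and collect the accuracy returned by train_and_eval."""
    accuracies = []
    for seed in seeds:
        set_seed(seed)
        accuracies.append(train_and_eval(seed))
    return accuracies


def summarize(accuracies):
    """Return (mean, sample standard deviation) of the per-seed accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


if __name__ == "__main__":
    # Hypothetical stand-in for a real fine-tuning run: returns a noisy accuracy.
    def fake_train_and_eval(seed):
        return 0.84 + random.uniform(-0.02, 0.02)

    accs = run_seeds(fake_train_and_eval, seeds=range(10))
    mean, std = summarize(accs)
    print(f"accuracy over {len(accs)} seeds: {mean:.4f} +/- {std:.4f}")
```

A 1.5% gap from the TensorFlow numbers is then only meaningful if it is large relative to the standard deviation measured this way.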