Hello,
Has anyone solved a problem like this, or does anyone know of a solution?
I want to pre-train BERT on a custom dataset, but this dataset is much smaller than the one Google used.
So is it possible to train a "micro" BERT with far fewer layers, etc.?
Thanks in advance
I am using a much smaller dataset in my project, but that doesn't mean I need a BERT with fewer layers. Otherwise, I would have no way to use the pre-trained model.
What is the problem you have with the smaller dataset?
My dataset is very esoteric, in the sense that BERT's pre-trained weights will be almost like noise for it.
YOU NEED ALBERT
Einstein?
They are referring to the new ALBERT paper. No weights are available yet, however, so give it a few months.
Definitely try fine-tuning a pre-trained BERT first. You can also just edit the BertConfig class to get a smaller network, but you probably can't train it from scratch on a small amount of data.
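In case it helps, here is a minimal sketch of what both options could look like, assuming the Hugging Face transformers library; the sizes and model name below are illustrative placeholders, not recommendations:

```python
from transformers import BertConfig, BertForMaskedLM

# Option 1: start from pre-trained weights and continue
# masked-LM pre-training / fine-tuning on the custom corpus.
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Option 2: a "micro" BERT trained from scratch -- shrink the
# architecture by editing BertConfig before building the model.
small_config = BertConfig(
    vocab_size=30522,        # or the size of a custom WordPiece vocab
    hidden_size=256,         # default is 768
    num_hidden_layers=2,     # default is 12
    num_attention_heads=4,   # must divide hidden_size evenly
    intermediate_size=1024,  # default is 3072
)
small_model = BertForMaskedLM(small_config)  # randomly initialized
```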
Interesting. Can't I train a very small BERT as you said (maybe 2 layers) on around 4 million tokens?
I'm not sure what the minimum number of tokens and layers is; I don't think anyone has published that. Best to try it out.