Hello,
Has anyone solved a problem like this, or does anyone know of a solution?
I want to pre-train BERT on a custom dataset, but this dataset is much smaller than the one Google used.
So is it possible to train a "micro" BERT with far fewer layers, etc.?
Thanks in advance
I am using a much smaller dataset in my project, but that doesn't mean I need a BERT with fewer layers. Otherwise, I would have no way to use the pre-trained model.
What is the problem you have with the smaller dataset?
My dataset is very esoteric, in the sense that BERT's pre-trained weights will be almost like noise for it.
YOU NEED ALBERT
Einstein?
They are referring to the new ALBERT paper. No weights are available yet, however, so give it a few months.
Definitely try fine-tuning a pre-trained BERT first. You can also just edit the BertConfig class to get a smaller network, but you probably can't train it from scratch on a small amount of data.
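In case it helps, here is a minimal sketch of what both options could look like, assuming the Hugging Face transformers library; the sizes and model name below are illustrative placeholders, not recommendations:

```python
from transformers import BertConfig, BertForMaskedLM

# Option 1: start from pre-trained weights and continue
# masked-LM pre-training / fine-tuning on the custom corpus.
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Option 2: a "micro" BERT trained from scratch -- shrink the
# architecture by editing BertConfig before building the model.
small_config = BertConfig(
    vocab_size=30522,        # or the size of a custom WordPiece vocab
    hidden_size=256,         # default is 768
    num_hidden_layers=2,     # default is 12
    num_attention_heads=4,   # must divide hidden_size evenly
    intermediate_size=1024,  # default is 3072
)
small_model = BertForMaskedLM(small_config)  # randomly initialized
```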
Interesting. Can't I train a very small BERT as you said (maybe 2 layers) on around 4 million tokens?
I'm not sure what the minimum number of tokens and layers is; I don't think anyone has published that. Best to try it out.