Transformers: A Micro BERT

Created on 24 Sep 2019 · 8 comments · Source: huggingface/transformers

โ“ Questions & Help

Hello,
Has anyone solved a problem like this, or does anyone know of a solution?
I want to pre-train BERT on a custom dataset, but this dataset is much smaller than the one Google used.
So is it possible to train a "micro" BERT instead, with far fewer layers, etc.?
Thanks in advance

wontfix

All 8 comments

I am using a much smaller dataset in my project too, but that doesn't mean I need a BERT with fewer layers. Otherwise, I would have no way to use the pre-trained model.

What is the problem you have with the smaller dataset?

My dataset is very esoteric, in the sense that BERT's pre-trained weights would be almost like noise for it.

YOU NEED ALBERT

Einstein?

They are referring to the new ALBERT paper. No weights are available yet, however, so give it a few months.

Definitely try fine-tuning a pre-trained BERT first. You can also just edit the BertConfig class to get a smaller network, but you probably can't train it from scratch on a small amount of data. A sketch of a shrunken config follows below.
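A minimal sketch of what "editing BertConfig" could look like; the layer and size values here are illustrative assumptions, not recommendations:

```python
# Shrink BERT by instantiating it from a smaller BertConfig.
# All sizes below are illustrative; tune them for your data.
from transformers import BertConfig, BertForMaskedLM

config = BertConfig(
    vocab_size=30522,        # default bert-base-uncased WordPiece vocab
    hidden_size=256,         # down from 768
    num_hidden_layers=2,     # down from 12
    num_attention_heads=4,   # down from 12; must divide hidden_size
    intermediate_size=1024,  # down from 3072
)

model = BertForMaskedLM(config)  # randomly initialized "micro" BERT
print(model.num_parameters())    # sanity-check the parameter count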

Interesting. Can't I train a very small BERT, as you said (maybe 2 layers), on something like 4 million tokens?

I'm not sure what the minimum token count and layer count are; I'm not sure anyone has published that. Best to try it out. A rough from-scratch training sketch follows below.
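A rough sketch of a single masked-LM training step from scratch, assuming a recent transformers version and reusing the bert-base-uncased vocabulary; the input text is a placeholder and the 15% masking is deliberately simplified (a real script would avoid masking special and padding tokens and would loop over batches):

```python
import torch
from transformers import BertConfig, BertForMaskedLM, BertTokenizer

# Reuse an existing vocabulary so we don't have to train a tokenizer.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
config = BertConfig(hidden_size=256, num_hidden_layers=2,
                    num_attention_heads=4, intermediate_size=1024)
model = BertForMaskedLM(config)  # random init, no pre-trained weights
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Placeholder batch; substitute your own domain-specific corpus.
batch = tokenizer(["your domain-specific text goes here"],
                  return_tensors="pt", padding=True, truncation=True)
labels = batch["input_ids"].clone()

# Simplified masking: replace ~15% of tokens with [MASK].
mask = torch.rand(labels.shape) < 0.15
batch["input_ids"][mask] = tokenizer.mask_token_id
labels[~mask] = -100  # compute the loss only on masked positions

loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```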

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

