Hello,
I am trying to extract features from German text using bert-base-multilingual-cased. However, my text is longer than 512 words.
Is there any way to use the pretrained BERT for text longer than 512 words?
Hello,
I do not think that it is possible out of the box. The article states the following:
We use learned positional embeddings with supported sequence lengths up to 512 tokens.
The positional embeddings are therefore limited to 512 tokens. You may be able to add positional embeddings for positions beyond 512 and learn them on your specific dataset, but I don't know how well that would work.
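For what it's worth, a rough sketch of what extending the position embeddings could look like with the Hugging Face transformers `BertModel` is shown below. The new length of 1024 is an arbitrary choice, the code assumes the `embeddings.position_embeddings` attribute of the model, and the added rows start out randomly initialized, so they would need to be trained on your own data:

```python
# Sketch only: extend the learned position embeddings beyond 512 positions.
# Assumes Hugging Face transformers' BertModel internals; the extra rows are
# untrained and must be fine-tuned on the target dataset before they are useful.
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-multilingual-cased")

new_max_len = 1024  # arbitrary target length for this example
old_emb = model.embeddings.position_embeddings  # nn.Embedding(512, hidden_size)
new_emb = torch.nn.Embedding(new_max_len, old_emb.embedding_dim)

# Copy the pretrained weights for the first 512 positions; positions 512..1023
# keep their random initialization and have to be learned.
with torch.no_grad():
    new_emb.weight[: old_emb.num_embeddings] = old_emb.weight

model.embeddings.position_embeddings = new_emb
model.config.max_position_embeddings = new_max_len

# Depending on the transformers version, the embeddings module also keeps a
# registered position_ids buffer of length 512 that needs to be extended too.
model.embeddings.register_buffer(
    "position_ids", torch.arange(new_max_len).expand((1, -1))
)
```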
Hi @agemagician, you cannot really use pretrained BERT for text longer than 512 tokens per se, but you can use the sliding window approach.
Check this issue in the original BERT repo for more details: https://github.com/google-research/bert/issues/66
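A minimal sketch of the sliding-window idea for feature extraction, assuming the Hugging Face transformers library; the window size of 510 (plus [CLS] and [SEP]) and the stride of 256 are arbitrary choices, and how you combine the per-window features (averaging, pooling, etc.) is up to you:

```python
# Sketch only: slide an overlapping window over a long document and extract
# BERT features for each window separately.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

text = "..."  # your long German document

# Tokenize the whole document, then cut it into overlapping windows of at
# most 512 tokens (510 wordpieces + [CLS] and [SEP]).
tokens = tokenizer.tokenize(text)
window_size = 510
stride = 256  # overlap between consecutive windows

features = []
with torch.no_grad():
    for start in range(0, max(len(tokens), 1), stride):
        window = tokens[start:start + window_size]
        input_ids = tokenizer.convert_tokens_to_ids(["[CLS]"] + window + ["[SEP]"])
        input_ids = torch.tensor([input_ids])
        outputs = model(input_ids)
        # outputs[0] is the last hidden state for this window
        features.append(outputs[0])
        if start + window_size >= len(tokens):
            break
```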