Hello,
I am trying to extract features from German text using bert-base-multilingual-cased. However, my text is longer than 512 words.
Is there any way to use the pretrained BERT for text longer than 512 words?
Hello,
I do not think that it is possible out of the box. The article states the following:
We use learned positional embeddings with supported sequence lengths up to 512 tokens.
The positional embeddings are therefore limited to 512 tokens. You may be able to add positional embeddings for positions beyond 512 and learn them on your specific dataset, but I don't know how well that would work.
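For what it's worth, a rough sketch of what extending the position embeddings could look like with the Hugging Face transformers `BertModel` is shown below. The new length of 1024 is an arbitrary choice, the code assumes the `embeddings.position_embeddings` attribute of the model, and the added rows start out randomly initialized, so they would need to be trained on your own data:

```python
# Sketch only: extend the learned position embeddings beyond 512 positions.
# Assumes Hugging Face transformers' BertModel internals; the extra rows are
# untrained and must be fine-tuned on the target dataset before they are useful.
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-multilingual-cased")

new_max_len = 1024  # arbitrary target length for this example
old_emb = model.embeddings.position_embeddings  # nn.Embedding(512, hidden_size)
new_emb = torch.nn.Embedding(new_max_len, old_emb.embedding_dim)

# Copy the pretrained weights for the first 512 positions; positions 512..1023
# keep their random initialization and have to be learned.
with torch.no_grad():
    new_emb.weight[: old_emb.num_embeddings] = old_emb.weight

model.embeddings.position_embeddings = new_emb
model.config.max_position_embeddings = new_max_len

# Depending on the transformers version, the embeddings module also keeps a
# registered position_ids buffer of length 512 that needs to be extended too.
model.embeddings.register_buffer(
    "position_ids", torch.arange(new_max_len).expand((1, -1))
)
```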
Hi @agemagician, you cannot really use pretrained BERT for text longer than 512 tokens per se, but you can use the sliding window approach.
Check this issue in the original BERT repo for more details: https://github.com/google-research/bert/issues/66
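A minimal sketch of the sliding-window idea for feature extraction, assuming the Hugging Face transformers library; the window size of 510 (plus [CLS] and [SEP]) and the stride of 256 are arbitrary choices, and how you combine the per-window features (averaging, pooling, etc.) is up to you:

```python
# Sketch only: slide an overlapping window over a long document and extract
# BERT features for each window separately.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

text = "..."  # your long German document

# Tokenize the whole document, then cut it into overlapping windows of at
# most 512 tokens (510 wordpieces + [CLS] and [SEP]).
tokens = tokenizer.tokenize(text)
window_size = 510
stride = 256  # overlap between consecutive windows

features = []
with torch.no_grad():
    for start in range(0, max(len(tokens), 1), stride):
        window = tokens[start:start + window_size]
        input_ids = tokenizer.convert_tokens_to_ids(["[CLS]"] + window + ["[SEP]"])
        input_ids = torch.tensor([input_ids])
        outputs = model(input_ids)
        # outputs[0] is the last hidden state for this window
        features.append(outputs[0])
        if start + window_size >= len(tokens):
            break
```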