We currently support word embeddings from Huggingface's various transformer models (BERT, XLM, etc.), but two important features are missing: (1) we don't yet support sentence embeddings extracted directly from the transformer model using the [CLS] token, and (2) the transformers are currently not fine-tunable via Flair. This is a shame, since transformers really shine when sentence embeddings are extracted directly from a fine-tuned transformer.
So with this issue, we want to add both features.
Supporting longer texts (more than 512 subtokens) would be helpful (at least for prediction). My research shows that processing paragraphs rather than sentences decreases error by 10%.
Yes, good point - what is the 'standard' way of working around the 512-subtoken limitation of transformers? I guess the easiest would be to truncate the text to a max length of 512, but maybe there is a better way?
I have sequence tagging in mind, so truncating at prediction time is unacceptable. The text should be divided into chunks with some overlapping context and then reconstructed.
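To make the idea concrete, here is a minimal pure-Python sketch of that split-and-reconstruct approach (the function names and the closest-window-center merging rule are my own illustration, not anything in Flair): split the token sequence into overlapping windows, tag each window separately, then merge the per-window predictions, preferring the window in which a token sits closest to the center (i.e. with the most context on both sides).

```python
from typing import List, Tuple

def split_with_context(tokens: List[str], max_len: int = 512,
                       stride: int = 128) -> List[Tuple[int, List[str]]]:
    """Split a long token sequence into overlapping windows.

    Returns (start_offset, window) pairs; consecutive windows overlap
    by `stride` tokens so every token is seen with some context.
    """
    if len(tokens) <= max_len:
        return [(0, tokens)]
    windows = []
    step = max_len - stride
    for start in range(0, len(tokens), step):
        windows.append((start, tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # last window already reaches the end of the text
    return windows

def reconstruct(windows: List[Tuple[int, List[str]]],
                total_len: int) -> List[str]:
    """Merge per-window predictions back into one label sequence.

    For positions covered by several windows, keep the prediction from
    the window whose center is closest to that position.
    """
    merged = [None] * total_len
    best_dist = [float("inf")] * total_len
    for start, preds in windows:
        center = start + len(preds) / 2
        for i, pred in enumerate(preds):
            pos = start + i
            dist = abs(pos - center)
            if dist < best_dist[pos]:
                best_dist[pos] = dist
                merged[pos] = pred
    return merged
```

In practice the `preds` lists would hold the tagger's per-token labels for each window rather than the tokens themselves; the reconstruction logic is the same.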
For text classification there are some truncation strategies. In simple-transformers, however, the text is divided into parts, each part is predicted separately, and the mode of the per-part predictions is taken as the final result.
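That mode-of-chunks strategy can be sketched in a few lines (this is an illustration of the general idea, not the actual simple-transformers code; `predict_chunk` stands in for whatever classifier is used):

```python
from collections import Counter
from typing import Callable, List

def classify_long_text(tokens: List[str],
                       predict_chunk: Callable[[List[str]], str],
                       max_len: int = 512) -> str:
    """Classify a text longer than the model's input limit.

    Splits `tokens` into non-overlapping chunks of at most `max_len`,
    classifies each chunk separately, and returns the most frequent
    label (ties broken by first appearance).
    """
    chunks = [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]
    labels = [predict_chunk(chunk) for chunk in chunks]
    return Counter(labels).most_common(1)[0][0]
```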
Thanks - yes, for TransformerWordEmbeddings an overlapping-segment strategy should be doable and sounds like the best approach. For TransformerDocumentEmbeddings we need a strategy that outputs a single embedding for a text of arbitrary length, so maybe truncation is the way to go there.
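For the truncation route, one common variant is to keep both the head and the tail of the subtoken sequence rather than only the head, since the opening and closing of a document often carry the most signal. A minimal sketch (the function and the `head` split point are illustrative assumptions; the two reserved slots are for BERT-style [CLS]/[SEP] special tokens):

```python
from typing import List

def truncate_head_tail(subtokens: List[str], max_len: int = 512,
                       head: int = 128) -> List[str]:
    """Truncate to the model limit, keeping the first `head` subtokens
    and filling the rest of the budget from the end of the sequence.

    Two positions are reserved for the model's special tokens
    ([CLS] and [SEP] in BERT-style models).
    """
    budget = max_len - 2
    if len(subtokens) <= budget:
        return subtokens
    return subtokens[:head] + subtokens[-(budget - head):]
```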
Just for reference, some truncation strategies are evaluated in this paper.
Fine-tuning is now part of Flair 0.5.