Flair: Add Fine-Tunable Transformers to Flair

Created on 25 Mar 2020 · 6 comments · Source: flairNLP/flair

We currently support word embeddings from Hugging Face's various transformer models (BERT, XLM, etc.), but two important features are missing: (1) we don't yet support sentence embeddings extracted directly from the transformer model using the [CLS] token, and (2) the transformers are not yet fine-tunable via Flair. This is a shame, since transformers really shine when sentence embeddings are extracted directly from a fine-tuned model.

So with this issue, we want to add

  • [x] The ability to get sentence embeddings directly from transformers, by adding new DocumentEmbeddings classes
  • [x] The ability to fine-tune all transformer word and document embeddings classes
Labels: feature
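The two pooling options implied above can be sketched in plain Python: a transformer produces one vector per subtoken, and a sentence embedding is either the vector at the [CLS] position (index 0) or the elementwise mean over all subtoken vectors. This is a toy illustration of the idea, not Flair's actual API; the function names are made up for this sketch.

```python
def cls_pool(token_vectors):
    """Sentence embedding = the vector of the [CLS] subtoken (position 0)."""
    return token_vectors[0]

def mean_pool(token_vectors):
    """Alternative: average all subtoken vectors elementwise."""
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]

# Toy 3-subtoken sequence with 2-dimensional vectors; vectors[0] is [CLS].
vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(cls_pool(vectors))   # [1.0, 2.0]
print(mean_pool(vectors))  # [3.0, 4.0]
```

Fine-tuning then means backpropagating through the transformer itself, so the [CLS] vector is shaped by the downstream task rather than frozen.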

All 6 comments

Supporting longer texts (more than 512 subtokens) would be helpful, at least for prediction. My research shows that processing paragraphs rather than sentences reduces error by 10%.

Yes, good point. What is the 'standard' way of working around the 512-subtoken limitation of transformers? I guess the easiest would be to truncate the text to a maximum length of 512 subtokens, but maybe there is a better way?

I have sequence tagging in mind, so truncating at prediction time is unacceptable. The text should be divided into splits with some overlapping context, and the predictions then reconstructed.

For text classification there are some truncation strategies. In simple-transformers, however, the text is divided and each part is predicted separately; the mode of the per-part predictions is the final result.
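The chunk-and-vote scheme described for classification can be sketched as follows (the function name is illustrative, not simple-transformers' actual API):

```python
from collections import Counter

def predict_long_text(chunk_labels):
    """Final label = mode (most common value) of the per-chunk predictions."""
    return Counter(chunk_labels).most_common(1)[0][0]

# Suppose a long review was split into 3 chunks, each classified separately:
print(predict_long_text(["positive", "negative", "positive"]))  # positive
```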

Thanks, yes: for TransformerWordEmbeddings an overlapping-segment strategy should be doable and sounds like the best approach. For TransformerDocumentEmbeddings we need a strategy that outputs a single embedding for a text of arbitrary length, so maybe truncation is the way to go there.
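The overlapping-segment strategy for tagging can be sketched in plain Python: split the subtoken sequence into windows of at most `max_len`, each shifted by `stride` so consecutive windows share context, tag each window, then merge the per-window predictions back into one tag sequence. This is a minimal sketch with made-up helper names, not Flair's implementation; a fancier merge rule would prefer predictions far from a window edge.

```python
def split_with_overlap(subtokens, max_len=512, stride=256):
    """Yield (start, window) pairs covering the whole sequence; consecutive
    windows share max_len - stride subtokens of overlapping context."""
    windows = []
    start = 0
    while True:
        windows.append((start, subtokens[start:start + max_len]))
        if start + max_len >= len(subtokens):
            break
        start += stride
    return windows

def reconstruct(tagged_windows, total_len):
    """Merge per-window tag lists back into one sequence, keeping the first
    prediction seen for each position (later windows only fill gaps)."""
    tags = [None] * total_len
    for start, window_tags in tagged_windows:
        for i, tag in enumerate(window_tags):
            if tags[start + i] is None:
                tags[start + i] = tag
    return tags

# Tiny demo with max_len=4, stride=2 instead of 512/256:
tokens = list(range(10))
windows = split_with_overlap(tokens, max_len=4, stride=2)
print([start for start, _ in windows])  # [0, 2, 4, 6]
```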

Just for reference, some truncation strategies are evaluated in this paper.

Fine-tuning is now part of Flair 0.5.

