Flair: transformer models for language model training and tag prediction instead of LSTMs

Created on 15 Aug 2018 · 26 Comments · Source: flairNLP/flair

I recently read OpenAI's generative pre-training paper.
According to the benchmarks, fine-tuning the OpenAI model on a custom dataset takes much less time than an LSTM-based approach.
The model has also been shown to improve the SOTA on a lot of tasks.
So I was wondering if it is possible to replace the pipeline with a transformer-based model as implemented by OpenAI.

Labels: feature, help wanted, wontfix

Most helpful comment

Hi guys, I've made some updates and a new release for this stuff: https://github.com/huggingface/pytorch-pretrained-BERT/releases/tag/v0.5.1

Keep up the good work on flair.

All 26 comments

Great idea - we've been discussing this internally and really want to try it out, and compare the two approaches! Any help / pointers are appreciated :)

https://github.com/huggingface/pytorch-openai-transformer-lm has an implementation of the transformer model in PyTorch, plus scripts to load the OpenAI transformer weights.
I'll have a look at it this weekend and check the feasibility of the implementation.

Great, thanks! Perhaps this code can be the basis of new transformer-based LanguageModel and LanguageModelTrainer classes!
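To make the idea a bit more concrete, here is a very rough, hypothetical sketch of what the core of such a transformer-based language model could look like (the class name, signatures and the use of PyTorch's built-in nn.TransformerEncoder are all assumptions, not flair's actual API):

```python
import torch
import torch.nn as nn

class TransformerLanguageModel(nn.Module):
    """Hypothetical sketch of a transformer LM that a LanguageModelTrainer
    could train with a next-token prediction objective."""

    def __init__(self, vocab_size: int, d_model: int = 512,
                 n_heads: int = 8, n_layers: int = 12, max_len: int = 512):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, d_model)
        self.position_embedding = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.decoder = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids: torch.Tensor):
        # input_ids: (seq_len, batch), as nn.TransformerEncoder expects by default
        seq_len = input_ids.size(0)
        positions = torch.arange(seq_len, device=input_ids.device).unsqueeze(1)
        hidden = self.token_embedding(input_ids) + self.position_embedding(positions)

        # causal mask so each position only attends to earlier positions
        mask = torch.triu(torch.full((seq_len, seq_len), float('-inf'),
                                     device=input_ids.device), diagonal=1)
        hidden = self.encoder(hidden, mask=mask)

        logits = self.decoder(hidden)   # next-token scores, (seq_len, batch, vocab)
        return logits, hidden           # hidden states could double as embeddings
```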

A deep Transformer model now also achieves state-of-the-art results in language modeling, see this paper. So I think integrating such an architecture into flair would be awesome :heart:

But don't look at the evaluation section in the paper mentioned above ;) it took more than 7 days on a single Cloud TPU :scream:

64 layers, wow...
I don't think implementing such a huge network would be feasible, since it would slow down the training of further models in the pipeline quite considerably. However, their 12-layer network also yielded some decent results.
The concept of auxiliary losses is a good one; I will have to test it and see how it works out.

Small update: We are going to add BERT embeddings (see https://github.com/zalandoresearch/flair/issues/251) to flair in the next release. They are based on transformers.

We are still thinking of adding our own transformer model at some point, but not in the near future.
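For reference, once released, usage would presumably mirror the other embedding classes in flair; a minimal sketch, assuming the class ends up as BertEmbeddings in flair.embeddings and accepts a BERT model identifier:

```python
from flair.data import Sentence
from flair.embeddings import BertEmbeddings  # assumed name, based on issue #251

# load transformer-based BERT embeddings (the model identifier is an assumption)
embedding = BertEmbeddings('bert-base-uncased')

sentence = Sentence('The grass is green .')
embedding.embed(sentence)

# every token now carries a contextual embedding vector
for token in sentence:
    print(token.text, token.embedding.shape)
```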

alright :+1:

@alanakbik and @tabergma: Here's another great paper about a Transformer-based LM:

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

-> Yesterday they provided both a TensorFlow and a PyTorch implementation of the model. I'm going to play with the implementation now; maybe I'll find a way to get embeddings for a sentence (like it is done with FlairEmbeddings).

Wow this looks really interesting!

Two PRs from the pytorch-pretrained-BERT repository are very interesting:

Once they're merged I would like to add them to flair :)

Training a Transformer-XL model is possible, but on one GPU I had to use a smaller Transformer model (I'm currently doing some experiments with it...)

Yeah that would be great! :) Also, we'd be very interested to hear about your experiments with Transformer-XL!

Version 0.5.0 is out now: https://github.com/huggingface/pytorch-pretrained-BERT/releases/tag/v0.5.0

I'll check the integration of OpenAI GPT and the Transformer-XL now :)

Wow awesome!

Wow this is awesome. Really look forward to transformer-based models and fine-tuning-based models.

Two current caveats:

  • OpenAI GPT needs two additional libraries (not covered by pytorch-pretrained-BERT's dependency management): ftfy and spacy. For spacy you also need to manually install the English model with python -m spacy download en. After that it works fine; I was able to get embeddings for a sentence (see the sketch after this list).
  • Transformer-XL: I wasn't able to get proper embeddings; a "nan" tensor was returned. But I opened an issue, see here :)
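A minimal sketch of the first point, getting GPT representations for a sentence via pytorch-pretrained-BERT (the example sentence is illustrative, and the exact return values may differ between versions):

```python
# prerequisites: pip install ftfy spacy && python -m spacy download en
import torch
from pytorch_pretrained_bert import OpenAIGPTTokenizer, OpenAIGPTModel

tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
model = OpenAIGPTModel.from_pretrained('openai-gpt')
model.eval()

tokens = tokenizer.tokenize('Berlin is the capital of Germany .')
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    hidden_states = model(input_ids)  # last layer, (batch, seq_len, hidden_size)

print(hidden_states.shape)
```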

Ah, thanks for the update - do you know why OpenAI GPT requires spacy, and why the English model? Only for tokenization?

Hi guys, I've made some updates and a new release for this stuff: https://github.com/huggingface/pytorch-pretrained-BERT/releases/tag/v0.5.1

Keep up the good work on flair.

I've implemented an early draft of TransformerXLEmbeddings and I'm currently training on the CoNLL 2003 dataset. I'll report the results here soon :)
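Roughly, pulling token representations out of the pretrained Transformer-XL with pytorch-pretrained-BERT looks like this (a sketch; return values may vary between versions):

```python
import torch
from pytorch_pretrained_bert import TransfoXLTokenizer, TransfoXLModel

tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
model = TransfoXLModel.from_pretrained('transfo-xl-wt103')
model.eval()

tokens = tokenizer.tokenize('Berlin and Munich are cities in Germany')
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    # the model returns the last hidden states plus the memory ("mems") cells
    last_hidden, mems = model(input_ids)  # last_hidden: (batch, seq_len, d_model)

# one vector per token -- a TransformerXLEmbeddings class would attach these
# to flair Token objects, e.g. via token.set_embedding(...)
print(last_hidden.shape)
```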

Btw: the second version of GPT is out: https://github.com/openai/gpt-2/blob/master/README.md

@stefan-it In my understanding, TransformerXLEmbeddings supports variable sentence lengths, so it won't run into the out-of-index issue of BertEmbeddings, because BERT has a fixed maximum length of 512 tokens. Is that correct?

@stefan-it @thomwolf wow that's great - really looking forward to seeing this in action! And very interested to hear how well it does on CoNLL 03 and other tasks.

Here's another Transformer-based architecture that uses a new approach to pretraining (a cloze-style token reconstruction task is embedded during training):

https://arxiv.org/abs/1903.07785

It also achieves a new SOTA on CoNLL-2003 NER: 93.5% (compared to flair's 93.18%)

Very impressive results - look forward to taking a closer look at this!

One major drawback is the ridiculous amount of training data :rofl: Unfortunately, there's currently no implementation/model available.

I just asked @michaelauli if they plan to release the code and model :) [I could imagine that it will be integrated into fairseq, but this is just speculation]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
