Allennlp: Feature Request: CNN / Transformer version of ELMo

Created on 6 Aug 2018 · 4 Comments · Source: allenai/allennlp

Is your feature request related to a problem? Please describe.
The ELMo framework is great, but most (all?) implementations are based on deep LSTM models, which are slow both to train and at inference time. A faster architecture with similar accuracy would help a lot, since users would no longer need GPUs for inference.

Describe the solution you'd like
There is ongoing discussion about replacing RNNs with other architectures. A simpler architecture such as the temporal convolutional network (TCN) can reportedly match the performance of RNNs once a few training tricks are added. An ELMo version would simply be a bidirectional TCN.
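To make the idea concrete, here is a rough, dependency-free sketch of what "bidirectional TCN" could mean: a stack of causal dilated convolutions run left-to-right, plus a second stack run over the reversed sequence. Everything in it is an assumption for illustration (the function names, scalar per-position features, hand-picked filter weights, plain ReLU with no residual connections); it is not AllenNLP code and not the architecture from any specific paper.

```python
from typing import List, Sequence, Tuple

# A layer is (filter weights, dilation). Hypothetical representation for this sketch.
Layer = Tuple[Sequence[float], int]


def causal_dilated_conv(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Positions before the start of the sequence count as zero, so y[t]
    depends only on x[0..t] (this is what makes the convolution causal).
    """
    out = []
    for t in range(len(x)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                total += w * x[idx]
        out.append(total)
    return out


def tcn_stack(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Apply causal dilated layers in order, with a ReLU after each one."""
    h = list(x)
    for weights, dilation in layers:
        h = [max(0.0, v) for v in causal_dilated_conv(h, weights, dilation)]
    return h


def bidirectional_tcn(
    x: Sequence[float], fwd_layers: Sequence[Layer], bwd_layers: Sequence[Layer]
) -> List[Tuple[float, float]]:
    """Return one (forward, backward) feature pair per position.

    The forward value at t sees only x[0..t]; the backward value at t sees
    only x[t..end], obtained by running a causal stack on the reversed input.
    """
    fwd = tcn_stack(x, fwd_layers)
    bwd = tcn_stack(list(x)[::-1], bwd_layers)[::-1]
    return list(zip(fwd, bwd))


if __name__ == "__main__":
    layers = [([0.5, 0.5], 1), ([0.5, 0.5], 2)]  # dilations 1 and 2
    for pair in bidirectional_tcn([1.0, 2.0, 3.0, 4.0, 5.0], layers, layers):
        print(pair)
```

In a real model each position would carry a vector rather than a scalar, the filters would be learned, and the two directions would be trained as forward and backward language models. The point of the sketch is only that each direction never looks past the current position, and that all positions can be computed independently rather than sequentially as in an LSTM.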

Describe alternatives you've considered
AllenNLP has a CNN encoder and a transformer encoder but, as far as I can tell, they're not built for language modeling. There are other packages that implement these specific models, but AllenNLP is such a nice framework that I think this would be a great addition.

Additional context
Of course this sounds like a paper-sized project 😅. But I wanted to put it out there.

All 4 comments

FWIW, we have CNN and Transformer implementations of biLMs and an upcoming EMNLP paper that compares them to the LSTM-based model used in the ELMo paper. We plan to add this functionality to allennlp over the next few weeks.

Great! I think this is the paper.

https://arxiv.org/abs/1808.08949

Assigning @joelgrus since he's working on integrating the ELMo training code with AllenNLP (we can close this when that's done).

This work is complete. Sorry for the delay! Please see the documentation at https://github.com/allenai/allennlp/blob/master/tutorials/how_to/training_transformer_elmo.md and let me know if you have any questions!
