Is your feature request related to a problem? Please describe.
ELMo framework is great, but most (all?) implementations are based on deep LSTM models. Deep LSTM models are slow to train and slow during inference. A faster algorithm with similar accuracy would greatly help so users don't have to use GPUs during inference.
Describe the solution you'd like
There is currently some talk on how RNNs can be replaced by other methods. A simpler architecture like the temporal convolutional network can match the performance of RNNs after adding some tricks during training. An ELMo version would simply be a bidirectional TCN.
Describe alternatives you've considered
AllenNLP has a CNN encoder and a transformer encoder but as far as I can tell, it's not built for language modeling. There are other packages that implement these specific models but AllenNLP is such a nice framework, I think it would be a great addition.
Additional context
Of course this sounds like a paper-sized project 馃槄. But I wanted to put it out there.
FWIW, we have CNN and Transformer implementations of biLMs and an upcoming EMNLP paper that compares them to the LSTM based model used in the ELMo paper. We plan to add this functionality to allennlp over the next few weeks.
Great! I think this is the paper.
Assigning @joelgrus since he's working on integrating the ELMo training code with AllenNLP (we can close this when that's done).
This work is complete. Sorry for the delay! Please see the documentation at https://github.com/allenai/allennlp/blob/master/tutorials/how_to/training_transformer_elmo.md and let me know if you have any questions!
Most helpful comment
Great! I think this is the paper.
https://arxiv.org/abs/1808.08949