AllenNLP: ELMo SentEval Metrics

Created on 6 Mar 2018 · 4 Comments · Source: allenai/allennlp

Hi AllenNLP team!

Fantastic work with ELMo so far. I'm curious whether you have evaluated any of your released pretrained vectors on SentEval. That would make it easier for others to compare ELMo with projects like InferSent.

If not, I may be able to give it a try in the coming weeks!

Nick

ncammarata

Most helpful comment

If you guys are interested, we did an extensive evaluation. We did not integrate ELMo into a model; we used a simple BoW of ELMo embeddings and compared it with InferSent, Google's Universal Sentence Encoder, etc. The results are really impressive: the simple ELMo BoW was better in many downstream classification tasks. The paper is out today here: https://arxiv.org/abs/1806.06259

PS: we also ran the evaluation on all the linguistic probing tasks.

perone on 19 Jun 2018

🎉3 👍1

All 4 comments

We haven't tested our vectors on sentence classification, although it would be easy to make a direct comparison with the InferSent approach by replacing the pre-trained GloVe vectors that InferSent uses with the ELMo vectors. As ELMo-based models have improved over the corresponding GloVe versions in all tasks we've tried (including single-model and ensemble SOTA for SNLI in our paper), I expect they would provide similar gains for sentence representations. If you do decide to try this, I would be very curious to hear the results!

matt-peters on 6 Mar 2018

👍1

Super interesting, thanks for sharing!!

matt-peters on 19 Jun 2018

👍1

@perone I read your paper and tried to reproduce the results, but I am confused about how you calculate the BoW for ELMo. For a batch of sentences, I used the ElmoEmbedder, and the corresponding output has shape (b_size, token_len, 1024). What is the next step? Could you elaborate on how to get to one embedding per sentence? I am also planning to test SOWE and MOWE; any insights on those vs. the BoW model?
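
For reference, one common way to go from a (b_size, token_len, 1024) array to one vector per sentence is to pool over the token axis while ignoring padding. The sketch below is only an illustration of that idea, not the method confirmed by the paper's authors: the function name `pool_tokens`, the `lengths` argument, and the random stand-in data are all assumptions, and it reads SOWE and MOWE as "sum" and "mean" of word embeddings. How the ELMo layers are combined before pooling is a separate choice that is not covered here.

```python
import numpy as np


def pool_tokens(token_embeddings, lengths, mode="mean"):
    """Collapse (batch, max_tokens, dim) token vectors into (batch, dim).

    `lengths` holds the true number of tokens in each sentence, so that
    padded positions do not contribute to the result.
    """
    token_embeddings = np.asarray(token_embeddings, dtype=np.float64)
    lengths = np.asarray(lengths)
    max_tokens = token_embeddings.shape[1]

    # mask[i, j] is 1.0 for real tokens and 0.0 for padding positions.
    mask = (np.arange(max_tokens)[None, :] < lengths[:, None]).astype(np.float64)

    # Zero out padding, then add up the remaining token vectors.
    summed = (token_embeddings * mask[:, :, None]).sum(axis=1)

    if mode == "sum":   # sum of word embeddings
        return summed
    if mode == "mean":  # mean of word embeddings (a plain average)
        return summed / np.maximum(lengths, 1)[:, None]
    raise ValueError(f"unknown mode: {mode!r}")


# Random numbers stand in for real ELMo output (hypothetical data).
rng = np.random.default_rng(0)
batch = rng.normal(size=(2, 5, 1024))  # 2 sentences, padded to 5 tokens
lengths = [5, 3]                       # the second sentence has 3 real tokens

sentence_vectors = pool_tokens(batch, lengths, mode="mean")
print(sentence_vectors.shape)
```

With `mode="sum"` the same function returns the unnormalized sum, which differs from the mean only by a per-sentence scale factor equal to the sentence length.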