Allennlp: Confusion about ELMo vectors

Created on 19 Sep 2018 · 4 Comments · Source: allenai/allennlp

First, thanks for the great work. ELMo is a very inspiring model.
But now that I have started using it, I am a little confused about the ELMo vectors.

According to the tutorial (https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md),
we can easily get the ELMo vectors like this:

from allennlp.modules.elmo import Elmo, batch_to_ids

options_file = "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options.json"
weight_file = "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5"

elmo = Elmo(options_file, weight_file, 2, dropout=0)

# use batch_to_ids to convert sentences to character ids
sentences = [['First', 'sentence', '.'], ['Another', '.']]
character_ids = batch_to_ids(sentences)

embeddings = elmo(character_ids)

# embeddings['elmo_representations'] is a length-two list of tensors.
# Each element contains one layer of ELMo representations with shape
# (2, 3, 1024).
#   2    - the batch size
#   3    - the sequence length of the batch
#   1024 - the length of each ELMo vector

My confusion comes from the embeddings.
Clearly, embeddings[0] contains the first-layer ELMo vectors for the input.
If I understand correctly, embeddings[0][0][0] is a tensor of size 1024 representing the vector for the word "First" on the first layer.
However, what I really need is the weighted ELMo vector, i.e. equation (1) in the paper.
I don't want the vectors for each layer separately. I just need $ELMO_k^{task}$ so that I can pass it to my downstream task.
I cannot find any examples of how to construct $ELMO_k^{task}$.
Maybe my understanding is wrong?
Any help would be appreciated.

flyaway1217

Most helpful comment

We should modify the tutorial to compute only one representation by default, as the current version is confusing. The two layers are for the case in some of the models where we included two separate learned weighted layers (e.g. the SNLI model, the sentiment model, etc.). In most follow-up work, we just use a single weighted layer, as the second provides only a small improvement and can lead to overfitting.

matt-peters on 20 Sep 2018

All 4 comments

embeddings[0] contains the weighted ELMo vector ($ELMO_k^{task}$, equation (1) from the paper) for both sentences. embeddings[1] contains a different weighted ELMo vector (using different learnable scalar parameters).

matt-peters on 19 Sep 2018
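
For reference, equation (1) of the ELMo paper collapses the biLM layer vectors $h_{k,j}^{LM}$ for token $k$ into one vector: $ELMO_k^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task} h_{k,j}^{LM}$, where the $s_j^{task}$ are softmax-normalized scalar weights and $\gamma^{task}$ is a scalar scale. The following is only a dependency-free sketch of that arithmetic for a single token, not AllenNLP's implementation; the function name `scalar_mix` and the toy 4-dimensional vectors are assumptions made for illustration (real ELMo vectors here have 1024 dimensions, and the weights are learned rather than set by hand).

```python
import math


def scalar_mix(layers, raw_weights, gamma=1.0):
    """Combine the per-layer vectors of one token into a single vector.

    layers:      list of L+1 vectors (lists of floats), one per biLM layer
    raw_weights: list of L+1 unnormalized scalars (learned in a real model)
    gamma:       overall scale factor (also learned in a real model)
    """
    if len(layers) != len(raw_weights):
        raise ValueError("need exactly one weight per layer")

    # Softmax-normalize the raw weights so they sum to 1 (the s_j in equation (1)).
    shift = max(raw_weights)  # subtract the max for numerical stability
    exps = [math.exp(w - shift) for w in raw_weights]
    total = sum(exps)
    s = [e / total for e in exps]

    # Weighted sum over layers for each dimension, then scale by gamma.
    dim = len(layers[0])
    return [
        gamma * sum(s[j] * layers[j][d] for j in range(len(layers)))
        for d in range(dim)
    ]


# Toy example: 3 layers, 4-dimensional vectors (hypothetical values).
layers = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 2.0],
]

# Equal raw weights give an unweighted average of the layers.
mixed = scalar_mix(layers, raw_weights=[0.0, 0.0, 0.0])

# A dominant raw weight makes the mix follow that layer almost exactly.
first_layer_only = scalar_mix(layers, raw_weights=[100.0, 0.0, 0.0], gamma=2.0)
```

Each requested output representation carries its own set of these scalars, which is why embeddings[0] and embeddings[1] are both weighted vectors yet differ from each other.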

@matt-peters
Why are there two weighted ELMo vectors?
The tutorial says: