First, thanks for the great work. ELMo is a very inspiring model.
But now that I have started using it, I am a little confused about the ELMo vectors.
According to the tutorial (https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md),
we can easily get the ELMo vectors like this:
from allennlp.modules.elmo import Elmo, batch_to_ids
options_file = "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options.json"
weight_file = "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5"
elmo = Elmo(options_file, weight_file, 2, dropout=0)
# use batch_to_ids to convert sentences to character ids
sentences = [['First', 'sentence', '.'], ['Another', '.']]
character_ids = batch_to_ids(sentences)
embeddings = elmo(character_ids)
# embeddings['elmo_representations'] is length two list of tensors.
# Each element contains one layer of ELMo representations with shape
# (2, 3, 1024).
# 2 - the batch size
# 3 - the sequence length of the batch
# 1024 - the length of each ELMo vector
My confusion is about the returned embeddings.
From the comments above, embeddings['elmo_representations'][0] appears to contain the first-layer ELMo vectors for the input.
If I understand correctly, embeddings['elmo_representations'][0][0][0] is a tensor of size 1024 representing the vector for the word "First" on the first layer.
However, what I really need is the weighted ELMo vector, i.e. equation (1) in the paper.
I don't want the vectors for each layer separately. I just need $ELMO_k^{task}$ so that I can pass it to my downstream task.
I cannot find any examples that construct $ELMO_k^{task}$.
Maybe my understanding is wrong?
Any help would be appreciated.
embeddings['elmo_representations'][0] contains the weighted ELMo vector ($ELMO_k^{task}$, equation (1) from the paper) for both sentences. embeddings['elmo_representations'][1] contains a different weighted ELMo vector, computed from the same layers with a separate set of learnable scalar parameters.
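For reference, equation (1) of the ELMo paper defines the task-specific vector for token $k$ as a softmax-weighted sum over the biLM layers, scaled by a task-specific scalar:

$$
\mathrm{ELMo}_k^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task} \, \mathbf{h}_{k,j}^{LM}
$$

Here $\mathbf{h}_{k,j}^{LM}$ is the activation of layer $j$ for token $k$, $s^{task}$ are the softmax-normalized layer weights, and $\gamma^{task}$ is the scalar. Each entry of the elmo_representations list is one such mix with its own $s^{task}$ and $\gamma^{task}$.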
@matt-peters
Why are there two weighted ELMo vectors?
The tutorial says:
embeddings['elmo_representations'] is length two list of tensors. Each element contains one layer of ELMo representations with shape (2, 3, 1024).
That reads as if embeddings['elmo_representations'][i] were the embedding for the i-th layer.
We should modify the tutorial to compute only one by default, as this is confusing. The two representations are for the case in some of the models where we included two separately learned weighted layers (e.g. the SNLI model, the sentiment model, etc). In most follow-up work we use just a single weighted layer, as the second provides only a small improvement and can lead to overfitting.
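For anyone landing here later, this is a minimal sketch of what "two separately learned weighted layers" means. It is plain Python on toy numbers, not the AllenNLP implementation: `softmax`, `scalar_mix`, the layer values and the weights are all made up for illustration. The point is that both output representations are mixes of the same biLM layers and differ only in their scalar parameters.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(weights: Sequence[float]) -> List[float]:
    """Normalize raw layer weights so they are positive and sum to 1."""
    largest = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - largest) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_mix(layers: Sequence[Vector], weights: Sequence[float],
               gamma: float = 1.0) -> Vector:
    """Hypothetical helper: gamma * sum_j softmax(weights)[j] * layers[j]."""
    if len(layers) != len(weights):
        raise ValueError("need exactly one weight per layer")
    s = softmax(weights)
    dim = len(layers[0])
    return [gamma * sum(s_j * layer[d] for s_j, layer in zip(s, layers))
            for d in range(dim)]


# Toy biLM activations for ONE token: 3 layers, 4 dimensions each
# (the real model in this thread produces 1024-dimensional vectors).
layers = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 2.0],
]

# Two independent mixes over the SAME layers, each with its own parameters.
mix_a = scalar_mix(layers, weights=[0.0, 0.0, 0.0])             # uniform average
mix_b = scalar_mix(layers, weights=[2.0, 0.0, -1.0], gamma=0.5)  # favors layer 0

# Analogous in spirit to a length-two list of output representations.
elmo_representations = [mix_a, mix_b]
```

In the tutorial code at the top of this thread, the third argument to `Elmo` is the number of output representations, so passing 1 instead of 2 should give a list with a single weighted vector per token, which is what most downstream models need.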
@matt-peters
Thanks for the clarification. This really helps!
The tutorial really confuses me.