I tried my best to read through the paper and understand details but a few things still confuse me a little:
When I apply ELMo to my task, I should add two sets of parameters to my own model: s_task and eta_task. eta_task is a scalar, so it is one parameter. s_task is a vector of size L (in the tutorial case L=3), and the same s_task_j is applied at every position along the length dimension N. Is this correct? So I'm basically only training 4 parameters if I want task-specific representations.
After reading through "Where to include ELMo?", I'm confused by the terminology of "input layer" and "output layer". I assume all the task architectures have their own LSTM that takes in ELMo as word embeddings, so "input layer" makes sense, but how is ELMo used in the output layer?
Yes, in this case adding ELMo only adds 4 additional parameters. The Elmo class here: https://github.com/allenai/allennlp/blob/master/allennlp/modules/elmo.py#L27 will introduce these parameters into your model and apply the weighting scheme (Eqn (1) in the paper).
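To make the parameter count concrete, here is a minimal sketch of the weighting scheme in Eqn (1), written in plain Python rather than with the actual AllenNLP code. It assumes the layer weights are softmax-normalized, as the paper describes, and uses gamma for the scalar that the question above calls eta_task. The function names and toy values are illustrative only.

```python
import math

def softmax(weights):
    # Subtract the max for numerical stability before exponentiating.
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def scalar_mix(layers, raw_weights, gamma):
    """Collapse L biLM layers into one vector per token.

    layers: list of L layers, each a list of N token vectors (lists of floats).
    raw_weights: L unnormalized scalars (trainable in a real model).
    gamma: one scalar that rescales the whole mix (trainable in a real model).
    """
    s = softmax(raw_weights)
    num_tokens = len(layers[0])
    dim = len(layers[0][0])
    mixed = []
    for k in range(num_tokens):
        # The same weight s[j] is reused at every token position k.
        vec = [
            gamma * sum(s[j] * layers[j][k][d] for j in range(len(layers)))
            for d in range(dim)
        ]
        mixed.append(vec)
    return mixed

# Toy input: L=3 layers, N=2 tokens, 2-dimensional vectors.
layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
    [[3.0, 0.0], [0.0, 3.0]],
]
raw_weights = [0.0, 0.0, 0.0]  # L parameters
gamma = 1.0                    # 1 parameter
mixed = scalar_mix(layers, raw_weights, gamma)
num_trainable = len(raw_weights) + 1  # L + 1 = 4 when L = 3
```

With equal raw weights each layer contributes one third, so the mix is just the average of the three layers. The weights do not depend on the token position, which is why the count stays at L + 1 regardless of sentence length.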
All of the models in our paper include a task-specific LSTM layer. When ELMo is included at the output, we introduce another weighted ELMo layer and concatenate it to the output of the task-specific LSTM. This introduces an additional 4 parameters. The Elmo class can return multiple ELMo layers (the num_output_representations parameter), so to use this option set num_output_representations=2.
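A rough sketch of how that could look with the Elmo class is below. The helper names (build_elmo, split_representations, extra_parameters) are made up for illustration, the options and weight file paths are left to the caller, and the import is done lazily so the snippet can be read without AllenNLP installed. It assumes the forward pass returns a dict with an "elmo_representations" list, one entry per requested representation.

```python
def build_elmo(options_file, weight_file):
    """Construct an Elmo module that returns two independently weighted mixes."""
    # Lazy import: AllenNLP is only needed when this function is called.
    from allennlp.modules.elmo import Elmo
    return Elmo(options_file, weight_file,
                num_output_representations=2, dropout=0.0)

def split_representations(elmo_output):
    """Unpack the two mixes from the dict returned by Elmo's forward pass."""
    # First mix: concatenate with the word embeddings fed to the task LSTM.
    # Second mix: concatenate with the task LSTM's output.
    input_mix, output_mix = elmo_output["elmo_representations"]
    return input_mix, output_mix

def extra_parameters(num_bilm_layers=3, num_output_representations=2):
    # Each mix owns one weight per biLM layer plus one scalar.
    return num_output_representations * (num_bilm_layers + 1)
```

In a model, the call would look roughly like elmo = build_elmo(options_file, weight_file), then character_ids = batch_to_ids(sentences) with batch_to_ids imported from allennlp.modules.elmo, then input_mix, output_mix = split_representations(elmo(character_ids)). Using ELMo at both the input and the output therefore adds 8 parameters in total: 4 for each mix.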
Hi Matt,
Thank you for such a quick answer! I'm following https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md, and from that tutorial I'm using the following code:
from allennlp.commands.elmo import ElmoEmbedder
ee = ElmoEmbedder()
Is there a tutorial on how to directly use Elmo class instead?
Also, if I construct a model using the Elmo class, I should make sure that the Elmo parameters are not updated, right? It seems like it's easier if I mix the ELMo representations myself and just add 4 parameters to my code :)
We don't have a tutorial on using the Elmo class directly. The ElmoEmbedder is really designed only for computing static embeddings to write to a file. If you are training models, then use Elmo. This keeps the 4 weighting parameters trainable in the model, but fixes all of the pre-trained parameters in the biLM component during training. You could also use the pre-trained biLM directly and introduce your own parameters, but that requires handling some additional lower-level details yourself; look inside the Elmo class for details.
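If you want to confirm which parameters will actually be updated, one option is to list everything with requires_grad set. This is a generic PyTorch-style check, not something from the AllenNLP docs, and the helper name is made up for illustration. It works on any object that exposes named_parameters(), such as a torch.nn.Module.

```python
def trainable_parameter_names(module):
    """Return the names of parameters that an optimizer would update."""
    return [
        name
        for name, param in module.named_parameters()
        if param.requires_grad
    ]
```

Calling this on an Elmo instance should list only the mixing weights if the biLM is frozen as described above. The exact parameter names depend on the AllenNLP version.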
Cool, thanks :)
Hi @matt-peters ,
What is the input to this task-specific LSTM layer, and how do you compute s_task from it? Also, how do you compute lambda_task? It's not clear in the paper.