AllenNLP: Question: Selected part of ELMo embedding

Created on 31 Oct 2018 · 10 Comments · Source: allenai/allennlp

Is it possible to use only the character embedding (excluding the word embedding and the context embedding) for an NER task?

All 10 comments

I'm a little bit confused by what you want. Do you just want to use the output of the pretrained ELMo character convnet as your representation?

If so, you can add the key "scalar_mix_parameters": [1.0, 0.0, 0.0] to the elmo specification in the config, and the representations returned by elmo will just be the character-convnet-generated vectors.

Closing this for now, feel free to reopen if that doesn't answer your question.

Actually, I want to train an NER model and use ELMo embeddings for it. I am also wondering whether there is a way to do an embedding feature ablation, such as: 1) use the character embedding only, 2) use the word embedding only, 3) use the context embedding only.

Yes, you can accomplish this through the method that Nelson mentioned. However, note that we apply a softmax to the scalar_mix_parameters, so you need to specify the parameters as e.g. "scalar_mix_parameters": [50.0, 0.0, 0.0] to select only the first layer.
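
To make the softmax point concrete, here is a minimal sketch in plain Python. It does not use AllenNLP; the `softmax` helper is written here only for illustration. It shows that [1.0, 0.0, 0.0] still leaves noticeable weight on the other two layers, while [50.0, 0.0, 0.0] puts essentially all of the weight on the first one.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Small values: the other two layers keep roughly 21% of the weight each.
soft_weights = softmax([1.0, 0.0, 0.0])

# Large value: nearly all of the weight lands on the first layer.
sharp_weights = softmax([50.0, 0.0, 0.0])

print([round(w, 3) for w in soft_weights])   # [0.576, 0.212, 0.212]
print([round(w, 3) for w in sharp_weights])  # [1.0, 0.0, 0.0]
```

The value 50.0 is not special; any value large enough that the other entries vanish after the softmax should behave the same way.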

May I ask how to select the FIRST and the SECOND layer? Do I have to define two Elmo classes with scalar_mix_parameters=[50.0, 0.0, 0.0] and [0.0, 50.0, 0.0] respectively?

What do you want to do with those two layers? Average them? Or use them separately for different things? If you want to average them, then [50.0, 50.0, 0.0] would work. If you want to use them separately, you could use two Elmo classes, or you could set num_output_representations=2, though it looks like we can't currently configure different scalar mixes for multiple output representations... It looks like you could hack the weights for the scalar mix after calling the constructor - using num_output_representations=2 is really the right way to go here, because it will only load the ELMo weights once, though it unfortunately requires some hacking to do what you want. See here to get some idea of the hacking that would be required.

Alternatively, depending on your goals, something like @nelson-liu's contextual analysis repo might be a good fit.
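
As a rough illustration of the averaging case, the sketch below reimplements the idea of a scalar mix in plain Python: a softmax over the mix parameters, followed by a weighted sum of the layers, scaled by a factor `gamma`. This is a simplified stand-in written for this example, not AllenNLP's own code, and the toy two-dimensional layer vectors are made up.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_mix(layers, mix_parameters, gamma=1.0):
    """Weighted sum of per-layer vectors; weights are softmax(mix_parameters)."""
    weights = softmax(mix_parameters)
    dim = len(layers[0])
    return [
        gamma * sum(w * layer[i] for w, layer in zip(weights, layers))
        for i in range(dim)
    ]


# Made-up vectors for a single token, one per ELMo layer.
layers = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]

# Equal large values on the first two layers average them; the third is ignored.
averaged = scalar_mix(layers, [50.0, 50.0, 0.0])
print(averaged)  # approximately [2.0, 3.0]
```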

Opening this again, as I also faced this issue and there might be a simple solution.
In https://github.com/allenai/allennlp/blob/master/allennlp/modules/token_embedders/elmo_token_embedder.py#L62 there's a hard-coded parameter that sets the number of resulting layers; it is currently set to 1. Can't we simply make it a configurable parameter? It could default to 1 so as not to break anything, but users would then control how many layers they get back from ELMo.

I forget the specifics here, so someone please correct me if I'm wrong, but I thought that the TokenEmbedder API was meant to go from tokens -> vector for each token. If you want multiple vectors for each token, why not just use the Elmo class directly?

FWIW, I could see it being a bit less convenient, since you might want to easily swap out elmo / bert / whatever pretrained model you're using---is that the issue here?

Yep.
I'd like to get easy access to the different layers of ELMo.

As @nelson-liu said, the TokenEmbedder API assumes there is a single vector output for each token, so adding the ability to return multiple layers will break that API. You will need to use the Elmo class directly to access all of the individual layers.
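
For reference, here is a minimal sketch of using the Elmo class directly to pull out one layer at a time. It assumes an AllenNLP version that ships `allennlp.modules.elmo` with the `scalar_mix_parameters` constructor argument discussed above. The helper names `one_hot_mix` and `embed_with_single_layer` are hypothetical, and the options and weight file paths are left for the caller to supply.

```python
def one_hot_mix(layer_index, num_layers=3, scale=50.0):
    """Pre-softmax mix parameters that put nearly all weight on one layer."""
    return [scale if i == layer_index else 0.0 for i in range(num_layers)]


def embed_with_single_layer(sentences, options_file, weight_file, layer_index):
    """Return ELMo vectors taken (almost) entirely from one layer.

    sentences: a list of tokenized sentences, e.g. [["Hello", "world"]].
    layer_index: 0 for the character convnet, 1 or 2 for the LSTM layers.
    """
    # Imported lazily so this file can be loaded without AllenNLP installed.
    from allennlp.modules.elmo import Elmo, batch_to_ids

    elmo = Elmo(
        options_file,
        weight_file,
        num_output_representations=1,
        dropout=0.0,
        scalar_mix_parameters=one_hot_mix(layer_index),
    )
    character_ids = batch_to_ids(sentences)
    output = elmo(character_ids)
    # One tensor per output representation: (batch, num_tokens, dim).
    return output["elmo_representations"][0]
```

Note that this builds a new Elmo module on every call, so the weights are loaded each time. That matches the "two Elmo classes" suggestion earlier in the thread rather than the num_output_representations=2 approach, which loads the weights only once.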

What about concatenating everything and letting the user decide what to do with it? Or at least offering the option to return everything?
I think this could be controlled by a simple parameter.
