AllenNLP: Training ELMo Transformer and using it as an embedder

Created on 12 Aug 2020 · 14 Comments · Source: allenai/allennlp

Hello! I am trying to train the transformer ELMo on my own dataset. I am following this tutorial, but it is proving difficult because it was written for an older version of AllenNLP. In particular, I have a question about the example config file. It contains the line `"allow_unmatched_keys": true`, an option that is now deprecated. I was wondering how to correct for this, because when I try to train the model I now get this error:

allennlp.common.checks.ConfigurationError: Mismatched token keys: dict_keys(['token_characters']) and dict_keys(['token_characters', 'tokens'])

Thank you!

lianna1016

All 14 comments

The model still exists, but it was moved to the allennlp-models repository. There is an updated config here: https://github.com/allenai/allennlp-models/blob/master/training_config/lm/bidirectional_language_model.jsonnet

Can you try that?

I don't know for sure that this config will train properly. It takes a long time to train this model, so we can't run tests for it all the time. But if you find that it no longer works in that state, we'll be happy to help you fix it.

dirkgr on 14 Aug 2020

@dirkgr
Thank you!

I see that these lines fix the problem:

"tokens": {
    "type": "empty"
},
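The mismatch behind the original error, and why the "empty" entry resolves it, can be sketched in plain Python. This is only an illustration, not AllenNLP's implementation: `key_mismatch` is a hypothetical helper, the `character_encoding` type name is a placeholder, and the key names are copied from the error message above.

```python
def key_mismatch(embedder_keys, input_keys):
    """Return (inputs lacking an embedder, embedders lacking an input), each sorted."""
    embedder_keys, input_keys = set(embedder_keys), set(input_keys)
    return sorted(input_keys - embedder_keys), sorted(embedder_keys - input_keys)


# Keys produced by the token indexers (taken from the error message in the thread).
input_keys = ["token_characters", "tokens"]

# Embedder section before the fix: nothing is registered for "tokens".
# The "character_encoding" type name is a placeholder for illustration.
embedders_before = {"token_characters": {"type": "character_encoding"}}

# Embedder section after the fix: "tokens" is given an entry of type "empty".
embedders_after = {**embedders_before, "tokens": {"type": "empty"}}

missing_before, _ = key_mismatch(embedders_before, input_keys)
missing_after, _ = key_mismatch(embedders_after, input_keys)
```

Before the fix, `missing_before` contains "tokens", which corresponds to the reported mismatch; after the fix, both key sets agree.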

I was also able to do a quick training run using the config file you pointed me to. However, when I now try to use these ELMo embeddings in a different model, I run into the following error.

allennlp.common.checks.ConfigurationError: LM from tmp/elmo_test1 trained with multiple embedders!

While trying to solve this problem, I looked at language_model_token_embedder.py, which states:

 # We don't currently support embedding with language models trained with multiple
 # embedded indices.

Does this have anything to do with the "tokens" lines I referenced above? If so, do you have any recommendations? Thanks!

lianna1016 on 14 Aug 2020

@nelson-liu, do you know how this is supposed to be used? It says quite clearly in LanguageModelTokenEmbedder that you should use it with only one indexer, but isn't ELMo always trained with two indexers, one for tokens, and one for characters?

dirkgr on 18 Aug 2020

Hmm, I know that I'm listed on the blame, but I think brendan might know more (he wrote this code originally https://github.com/allenai/allennlp/pull/2138 ). Sorry to not be of more help! @brendan-ai2

nelson-liu on 18 Aug 2020

Appreciate the prompt attention to this.

There appears to be a fundamental incompatibility: we cannot use AllenNLP to train an ELMo language model on a custom corpus and then use that model with, e.g., an NER tagger.

tpanza on 18 Aug 2020

Hmm, can you try running everything on the older version? That is, run `pip install allennlp==0.9.0` and then follow the instructions from the tutorial (since it was written for v0.9.0).

nelson-liu on 18 Aug 2020

I've looked at the code a little bit. It seems to me that it shouldn't be that difficult to make LanguageModelTokenEmbedder accept multiple indexers. I'd love to review a PR that fixes this!

dirkgr on 19 Aug 2020

> I've looked at the code a little bit. It seems to me that it shouldn't be that difficult to make LanguageModelTokenEmbedder accept multiple indexers. I'd love to review a PR that fixes this!

Hi @dirkgr would you mind sharing a broad outline of what you are envisioning for extending the LanguageModelTokenEmbedder to accept multiple indexers?

tpanza on 29 Aug 2020

Actually, this doesn't make sense. You don't really have multiple embedders anyways, because the embedder type for "tokens" is "empty". So we just have to make sure the LanguageModelTokenEmbedder doesn't get confused by that.

On line 78, it reads out of dict_config the available embedders. It should be possible at that point to ignore all those that are "empty". In a pinch, I'd accept a PR that just ignores the one called "tokens", because the comment below states categorically that "tokens" is always ignored.

Then you don't have to embark on a complicated re-engineering of LanguageModelTokenEmbedder.
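The idea of skipping "empty" embedders can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the eventual PR: the helper names `non_empty_embedders` and `single_embedder_name` are hypothetical, and the config is modeled as a plain dict rather than AllenNLP's own config objects.

```python
def non_empty_embedders(token_embedders):
    """Drop every embedder whose configured type is "empty"."""
    return {
        name: cfg
        for name, cfg in token_embedders.items()
        if cfg.get("type") != "empty"
    }


def single_embedder_name(token_embedders):
    """Return the name of the one real embedder, ignoring "empty" ones.

    Raises ValueError if zero or several real embedders remain, which mirrors
    the "trained with multiple embedders" restriction discussed in the thread.
    """
    remaining = non_empty_embedders(token_embedders)
    if len(remaining) != 1:
        raise ValueError(
            f"expected exactly one non-empty embedder, found {sorted(remaining)}"
        )
    return next(iter(remaining))


# Hypothetical embedder section of a trained language model's config.
lm_embedders = {
    "tokens": {"type": "empty"},
    "token_characters": {"type": "character_encoding"},  # placeholder type name
}
chosen = single_embedder_name(lm_embedders)
```

With this filtering, a model trained with a "tokens" entry of type "empty" is treated as having a single embedder, so no larger redesign is needed.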

dirkgr on 2 Sep 2020

Hi @dirkgr please see allenai/allennlp-models/pull/129

However... after applying that patch to my environment consisting of

allennlp==1.0.0
allennlp-models==1.0.0
torch==1.5.1

I am getting this bizarre error when trying to train an NER model that uses the `bidirectional_lm_token_embedder`:

"elmo": {
    "type": "bidirectional_lm_token_embedder",
    "archive_file": 'tmp/elmo_lm_tran1/model.tar.gz',
    "dropout": 0.2,
    "bos_eos_tokens": ["<S>", "</S>"],
    "remove_bos_eos": true,
    "requires_grad": false
}

2020-09-02 20:58:01,244 - INFO - allennlp.training.trainer - Beginning training.
2020-09-02 20:58:01,244 - INFO - allennlp.training.trainer - Epoch 0/21
2020-09-02 20:58:01,244 - INFO - allennlp.training.trainer - Worker 0 memory usage MB: 2877.268
2020-09-02 20:58:01,380 - INFO - allennlp.training.trainer - GPU 0 memory usage MB: 1250
2020-09-02 20:58:01,380 - INFO - allennlp.training.trainer - GPU 1 memory usage MB: 11
2020-09-02 20:58:01,380 - INFO - allennlp.training.trainer - GPU 2 memory usage MB: 11
2020-09-02 20:58:01,380 - INFO - allennlp.training.trainer - GPU 3 memory usage MB: 11
2020-09-02 20:58:01,380 - INFO - allennlp.training.trainer - GPU 4 memory usage MB: 11
2020-09-02 20:58:01,381 - INFO - allennlp.training.trainer - GPU 5 memory usage MB: 11
2020-09-02 20:58:01,381 - INFO - allennlp.training.trainer - GPU 6 memory usage MB: 11
2020-09-02 20:58:01,381 - INFO - allennlp.training.trainer - GPU 7 memory usage MB: 11
2020-09-02 20:58:01,382 - INFO - allennlp.training.trainer - Training
  0%|          | 0/169 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/app/local/anaconda3/envs/ner_pytorch3/bin/allennlp", line 8, in <module>
    sys.exit(run())
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/__main__.py", line 19, in run
    main(prog="allennlp")
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 92, in main
    args.func(args)
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/commands/train.py", line 112, in train_model_from_args
    dry_run=args.dry_run,
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/commands/train.py", line 171, in train_model_from_file
    dry_run=dry_run,
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/commands/train.py", line 230, in train_model
    dry_run=dry_run,
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/commands/train.py", line 428, in _train_worker
    metrics = train_loop.run()
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/commands/train.py", line 490, in run
    return self.trainer.train()
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/training/trainer.py", line 802, in train
    train_metrics = self._train_epoch(epoch)
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/training/trainer.py", line 564, in _train_epoch
    batch_outputs = self.batch_outputs(batch, for_training=True)
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/training/trainer.py", line 462, in batch_outputs
    output_dict = self._pytorch_model(**batch)
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/localdata/tony/data-science/ner-refactor-pytorch/src/models/lstm.py", line 46, in forward
    embedded = self._embedder(tokens)
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 84, in forward
    token_vectors = embedder(list(tensors.values())[0], **forward_params_values)
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp_models/lm/modules/token_embedders/language_model.py", line 191, in forward
    tokens, mask, self._bos_indices, self._eos_indices
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/nn/util.py", line 1471, in add_sentence_boundary_token_ids
    tensor_with_boundary_tokens[i, j + 1, :] = sentence_end_token
RuntimeError: expected device cuda:0 but got device cpu

The trainer section of my jsonnet config is shown below. I have no idea why it would be trying to use the CPU.

  trainer: {
      num_epochs: 22,
      patience: 10,
      cuda_device: 0,
      grad_clipping: 5.0,
      validation_metric: '-loss',
      optimizer: {
          type: 'adam',
          lr: 0.01
      }
  }

Looks like a separate issue. I'll try stepping up to PyTorch 1.6 + allennlp 1.1.0rc4 and applying the patch to that...
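The thread does not identify the cause of this device error. One common way it arises in PyTorch, offered here only as an assumption, is that a tensor kept outside a module's parameters and buffers stays on the CPU when the model is moved to the GPU. The sketch below shows a defensive form of boundary-token insertion that moves the boundary tensors to the input's device before assigning them. The function `add_boundaries` is a hypothetical stand-in, not AllenNLP's `add_sentence_boundary_token_ids`.

```python
import torch


def add_boundaries(char_ids, mask, bos, eos):
    """Wrap each sequence of character ids in begin/end boundary rows.

    char_ids: (batch, timesteps, width) integer tensor
    mask:     (batch, timesteps) boolean tensor, True for real tokens
    bos, eos: (width,) integer tensors, possibly on a different device
    """
    # Move the boundary rows to the input's device before any assignment.
    bos = bos.to(char_ids.device)
    eos = eos.to(char_ids.device)

    batch, steps, width = char_ids.shape
    # new_zeros keeps the dtype and device of char_ids.
    out = char_ids.new_zeros((batch, steps + 2, width))
    out[:, 1:-1, :] = char_ids

    lengths = mask.sum(dim=1).tolist()
    for i, length in enumerate(lengths):
        out[i, 0, :] = bos
        out[i, length + 1, :] = eos  # first position after the last real token
    return out


# Tiny CPU example: two sequences, the second padded to length 2.
char_ids = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [0, 0]]])
mask = torch.tensor([[True, True], [True, False]])
bos = torch.tensor([9, 9])
eos = torch.tensor([8, 8])
out = add_boundaries(char_ids, mask, bos, eos)
```

Registering such tensors as module buffers is another option, since buffers follow the module when it is moved between devices.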

tpanza on 3 Sep 2020

Let me know if this still happens with the latest RC! If it does, it could very well be a bug on our end that I'll want to fix quickly.

dirkgr on 3 Sep 2020

> Let me know if this still happens with the latest RC! If it does, it could very well be a bug on our end that I'll want to fix quickly.

Hi @dirkgr, I am having a _different_ problem with the latest RC... With 1.1.0rc4, I cannot even train the language model, never mind a downstream NER model that uses the LM. I opened allenai/allennlp/issues/4623 to track that issue.

As a workaround to _that_, I tried training the LM with PyTorch 1.5.1 + AllenNLP 1.0, and then switched back to PyTorch 1.6 + AllenNLP 1.1.0rc4. I then applied the patch from allenai/allennlp-models/pull/129.

I got yet another error when trying to train the NER model that uses the ELMo Transformer:

2020-09-03 14:26:59,859 - INFO - allennlp.training.trainer - Beginning training.
2020-09-03 14:26:59,859 - INFO - allennlp.training.trainer - Epoch 0/21
2020-09-03 14:26:59,859 - INFO - allennlp.training.trainer - Worker 0 memory usage MB: 3117.996
2020-09-03 14:27:00,001 - INFO - allennlp.training.trainer - GPU 0 memory usage MB: 1314
2020-09-03 14:27:00,001 - INFO - allennlp.training.trainer - GPU 1 memory usage MB: 11
2020-09-03 14:27:00,001 - INFO - allennlp.training.trainer - GPU 2 memory usage MB: 11
2020-09-03 14:27:00,001 - INFO - allennlp.training.trainer - GPU 3 memory usage MB: 11
2020-09-03 14:27:00,001 - INFO - allennlp.training.trainer - GPU 4 memory usage MB: 11
2020-09-03 14:27:00,001 - INFO - allennlp.training.trainer - GPU 5 memory usage MB: 11
2020-09-03 14:27:00,001 - INFO - allennlp.training.trainer - GPU 6 memory usage MB: 11
2020-09-03 14:27:00,001 - INFO - allennlp.training.trainer - GPU 7 memory usage MB: 11
2020-09-03 14:27:00,003 - INFO - allennlp.training.trainer - Training
  0%|          | 0/169 [00:00<?, ?it/s]
2020-09-03 14:27:00,050 - CRITICAL - root - Uncaught exception
Traceback (most recent call last):
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/bin/allennlp", line 8, in <module>
    sys.exit(run())
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/__main__.py", line 34, in run
    main(prog="allennlp")
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 92, in main
    args.func(args)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 118, in train_model_from_args
    file_friendly_logging=args.file_friendly_logging,
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 177, in train_model_from_file
    file_friendly_logging=file_friendly_logging,
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 238, in train_model
    file_friendly_logging=file_friendly_logging,
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 439, in _train_worker
    metrics = train_loop.run()
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 501, in run
    return self.trainer.train()
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/training/trainer.py", line 867, in train
    train_metrics = self._train_epoch(epoch)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/training/trainer.py", line 589, in _train_epoch
    batch_outputs = self.batch_outputs(batch, for_training=True)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/training/trainer.py", line 479, in batch_outputs
    output_dict = self._pytorch_model(**batch)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/localdata/tony/data-science/ner-refactor-pytorch/src/models/lstm.py", line 46, in forward
    embedded = self._embedder(tokens)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 84, in forward
    token_vectors = embedder(list(tensors.values())[0], **forward_params_values)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp_models/lm/modules/token_embedders/language_model.py", line 179, in forward
    tokens, mask, self._bos_indices, self._eos_indices
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/nn/util.py", line 1471, in add_sentence_boundary_token_ids
    tensor_with_boundary_tokens[i, j + 1, :] = sentence_end_token
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
2020-09-03 14:27:00,054 - INFO - allennlp.models.archival - removing temporary unarchived model dir at /tmp/tmpykdm068k

Hopefully the fix from @epwalsh for /issues/4623 will also fix that...

tpanza on 3 Sep 2020

Hi @dirkgr , with this fix, looks like we are able to get a little further, but the error referenced above is still happening...

My environment is:

allennlp==1.1.0
allennlp-models==1.1.0
torch==1.6.0
torchvision==0.7.0

plus patches for allenai/allennlp-models#129 and #4632 applied.

After training the ELMo transformer language model, I still cannot use it as an embedder within an NER tagger model.

2020-09-16 17:06:02,589 - INFO - allennlp.training.trainer - Beginning training.
2020-09-16 17:06:02,589 - INFO - allennlp.training.trainer - Epoch 0/21
2020-09-16 17:06:02,589 - INFO - allennlp.training.trainer - Worker 0 memory usage MB: 3118.312
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 0 memory usage MB: 1314
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 1 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 2 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 3 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 4 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 5 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 6 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 7 memory usage MB: 11
2020-09-16 17:06:02,732 - INFO - allennlp.training.trainer - Training
  0%|          | 0/169 [00:00<?, ?it/s]
2020-09-16 17:06:02,784 - CRITICAL - root - Uncaught exception
Traceback (most recent call last):
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/bin/allennlp", line 8, in <module>
    sys.exit(run())
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/__main__.py", line 34, in run
    main(prog="allennlp")
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 92, in main
    args.func(args)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 118, in train_model_from_args
    file_friendly_logging=args.file_friendly_logging,
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 177, in train_model_from_file
    file_friendly_logging=file_friendly_logging,
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 238, in train_model
    file_friendly_logging=file_friendly_logging,
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 443, in _train_worker
    metrics = train_loop.run()
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 505, in run
    return self.trainer.train()
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/training/trainer.py", line 867, in train
    train_metrics = self._train_epoch(epoch)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/training/trainer.py", line 589, in _train_epoch
    batch_outputs = self.batch_outputs(batch, for_training=True)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/training/trainer.py", line 479, in batch_outputs
    output_dict = self._pytorch_model(**batch)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/localdata/tony/data-science/ner-refactor-pytorch/src/models/lstm.py", line 46, in forward
    embedded = self._embedder(tokens)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 84, in forward
    token_vectors = embedder(list(tensors.values())[0], **forward_params_values)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp_models/lm/modules/token_embedders/language_model.py", line 187, in forward
    tokens, mask, self._bos_indices, self._eos_indices
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/nn/util.py", line 1565, in add_sentence_boundary_token_ids
    tensor_with_boundary_tokens[i, j + 1, :] = sentence_end_token
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
2020-09-16 17:06:02,789 - INFO - allennlp.models.archival - removing temporary unarchived model dir at /tmp/tmpobwn4xyg

Do you want to reopen this or create a new issue?

tpanza on 17 Sep 2020

Just to give closure here on this issue, @tpanza helpfully made another PR in #4761, and that's where we're continuing this saga.

dirkgr on 4 Nov 2020
