AllenNLP: Training ELMo Transformer, and using it as an embedder

Created on 12 Aug 2020 · 14 Comments · Source: allenai/allennlp

Hello! I am trying to train the transformer ELMo on my own dataset. I am following this tutorial, but it is proving a little difficult because the tutorial was written for an older version. In particular, I have a question about the example config file. It contains the line "allow_unmatched_keys": true, a feature that is now deprecated. I was wondering how to correct for this, because when I try to train the model I now get the error allennlp.common.checks.ConfigurationError: Mismatched token keys: dict_keys(['token_characters']) and dict_keys(['token_characters', 'tokens']). Thank you!

Contributions welcome

All 14 comments

The model still exists, but it was moved to the allennlp-models repository. There is an updated config here: https://github.com/allenai/allennlp-models/blob/master/training_config/lm/bidirectional_language_model.jsonnet

Can you try that?

I don't know for sure that this config will train properly. It takes a long time to train this model, so we can't run tests for it all the time. But if you find that it no longer works in that state, we'll be happy to help you fix it.

@dirkgr
Thank you!

I see that adding these lines to the token embedders fixes this problem:

"tokens": {
    "type": "empty"
},
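For readers hitting the same error, here is a minimal sketch of why the "empty" entry helps. It assumes the error comes from a check that the embedder keys and the indexer keys are identical; `check_token_keys` is a hypothetical stand-in written for illustration, not AllenNLP's own code, and the indexer and embedder types in the example dictionaries are illustrative assumptions.

```python
def check_token_keys(embedder_config, indexer_config):
    """Raise if the embedders and the indexers do not use the same keys.

    Hypothetical stand-in for the library's check, for illustration only.
    """
    if embedder_config.keys() != indexer_config.keys():
        raise ValueError(
            "Mismatched token keys: %s and %s"
            % (sorted(embedder_config), sorted(indexer_config))
        )


# Two indexers (word ids and characters), but only the characters are embedded.
indexers = {
    "tokens": {"type": "single_id"},
    "token_characters": {"type": "elmo_characters"},
}
embedders = {"token_characters": {"type": "character_encoding"}}

try:
    check_token_keys(embedders, indexers)
    mismatch = None
except ValueError as err:
    mismatch = str(err)

# Registering a do-nothing embedder under "tokens" makes the key sets agree.
embedders["tokens"] = {"type": "empty"}
check_token_keys(embedders, indexers)  # no exception this time
```

The "empty" entry adds no vectors of its own; under the assumption above, it only makes the two sets of keys line up.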

And I was able to do a quick training run using the config file you pointed me to. However, when I now try to use these ELMo embeddings in a different model, I run into the following error:

allennlp.common.checks.ConfigurationError: LM from tmp/elmo_test1 trained with multiple embedders!

While trying to solve this problem, I looked at language_model_token_embedder.py, which states:

 # We don't currently support embedding with language models trained with multiple
 # embedded indices.

Does this have anything to do with the tokens lines of code I referenced above, and if so, do you have any recommendations? Thanks!

@nelson-liu, do you know how this is supposed to be used? It says quite clearly in LanguageModelTokenEmbedder that you should use it with only one indexer, but isn't ELMo always trained with two indexers, one for tokens, and one for characters?

Hmm, I know that I'm listed on the blame, but I think brendan might know more (he wrote this code originally https://github.com/allenai/allennlp/pull/2138 ). Sorry to not be of more help! @brendan-ai2

Appreciate the prompt attention to this.

There appears to be a fundamental incompatibility: we cannot use AllenNLP to train an ELMo language model on a custom corpus and then use that model with, e.g., an NER tagger.

Hmm, can you try running everything on the older version? i.e., pip install allennlp==0.9.0, and then follow the instructions from the tutorial (since it was written for v0.9.0)

I've looked at the code a little bit. It seems to me that it shouldn't be that difficult to make LanguageModelTokenEmbedder accept multiple indexers. I'd love to review a PR that fixes this!

Hi @dirkgr would you mind sharing a broad outline of what you are envisioning for extending the LanguageModelTokenEmbedder to accept multiple indexers?

Actually, this doesn't make sense. You don't really have multiple embedders anyways, because the embedder type for "tokens" is "empty". So we just have to make sure the LanguageModelTokenEmbedder doesn't get confused by that.

On line 78, it reads out of dict_config the available embedders. It should be possible at that point to ignore all those that are "empty". In a pinch, I'd accept a PR that just ignores the one called "tokens", because the comment below states categorically that "tokens" is always ignored.

Then you don't have to embark on a complicated re-engineering of LanguageModelTokenEmbedder.
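A minimal sketch of the suggestion above, assuming the token embedder configuration is available as a plain dictionary mapping names to config dictionaries. The names `non_empty_embedders`, `single_embedder_name`, and `archive_file` are hypothetical, and this is neither the code in LanguageModelTokenEmbedder nor the patch from any PR.

```python
def non_empty_embedders(token_embedders):
    """Return only the embedder configs whose type is not "empty"."""
    return {
        name: config
        for name, config in token_embedders.items()
        if config.get("type") != "empty"
    }


def single_embedder_name(token_embedders, archive_file):
    """Return the name of the one real embedder, ignoring "empty" ones."""
    remaining = non_empty_embedders(token_embedders)
    if len(remaining) != 1:
        raise ValueError(
            f"LM from {archive_file} trained with multiple embedders!"
        )
    return next(iter(remaining))


# Illustrative config: "tokens" is a placeholder, the characters are embedded.
config = {
    "tokens": {"type": "empty"},
    "token_characters": {"type": "character_encoding"},
}
name = single_embedder_name(config, "model.tar.gz")
```

With the placeholder filtered out, only one embedder remains, so the "multiple embedders" check passes without re-engineering anything else.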

Hi @dirkgr please see allenai/allennlp-models/pull/129

However... after applying that patch to my environment consisting of

allennlp==1.0.0
allennlp-models==1.0.0
torch==1.5.1

I am getting this bizarre error when trying to train an NER model that uses the `bidirectional_lm_token_embedder`:

              "elmo": {
                  "type": "bidirectional_lm_token_embedder",
                  "archive_file": 'tmp/elmo_lm_tran1/model.tar.gz',
                  "dropout": 0.2,
                  "bos_eos_tokens": ["<S>", "</S>"],
                  "remove_bos_eos": true,
                  "requires_grad": false
                }
2020-09-02 20:58:01,244 - INFO - allennlp.training.trainer - Beginning training.
2020-09-02 20:58:01,244 - INFO - allennlp.training.trainer - Epoch 0/21
2020-09-02 20:58:01,244 - INFO - allennlp.training.trainer - Worker 0 memory usage MB: 2877.268
2020-09-02 20:58:01,380 - INFO - allennlp.training.trainer - GPU 0 memory usage MB: 1250
2020-09-02 20:58:01,380 - INFO - allennlp.training.trainer - GPU 1 memory usage MB: 11
2020-09-02 20:58:01,380 - INFO - allennlp.training.trainer - GPU 2 memory usage MB: 11
2020-09-02 20:58:01,380 - INFO - allennlp.training.trainer - GPU 3 memory usage MB: 11
2020-09-02 20:58:01,380 - INFO - allennlp.training.trainer - GPU 4 memory usage MB: 11
2020-09-02 20:58:01,381 - INFO - allennlp.training.trainer - GPU 5 memory usage MB: 11
2020-09-02 20:58:01,381 - INFO - allennlp.training.trainer - GPU 6 memory usage MB: 11
2020-09-02 20:58:01,381 - INFO - allennlp.training.trainer - GPU 7 memory usage MB: 11
2020-09-02 20:58:01,382 - INFO - allennlp.training.trainer - Training
  0%|          | 0/169 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/app/local/anaconda3/envs/ner_pytorch3/bin/allennlp", line 8, in <module>
    sys.exit(run())
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/__main__.py", line 19, in run
    main(prog="allennlp")
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 92, in main
    args.func(args)
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/commands/train.py", line 112, in train_model_from_args
    dry_run=args.dry_run,
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/commands/train.py", line 171, in train_model_from_file
    dry_run=dry_run,
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/commands/train.py", line 230, in train_model
    dry_run=dry_run,
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/commands/train.py", line 428, in _train_worker
    metrics = train_loop.run()
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/commands/train.py", line 490, in run
    return self.trainer.train()
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/training/trainer.py", line 802, in train
    train_metrics = self._train_epoch(epoch)
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/training/trainer.py", line 564, in _train_epoch
    batch_outputs = self.batch_outputs(batch, for_training=True)
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/training/trainer.py", line 462, in batch_outputs
    output_dict = self._pytorch_model(**batch)
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/localdata/tony/data-science/ner-refactor-pytorch/src/models/lstm.py", line 46, in forward
    embedded = self._embedder(tokens)
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 84, in forward
    token_vectors = embedder(list(tensors.values())[0], **forward_params_values)
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp_models/lm/modules/token_embedders/language_model.py", line 191, in forward
    tokens, mask, self._bos_indices, self._eos_indices
  File "/app/local/anaconda3/envs/ner_pytorch3/lib/python3.7/site-packages/allennlp/nn/util.py", line 1471, in add_sentence_boundary_token_ids
    tensor_with_boundary_tokens[i, j + 1, :] = sentence_end_token
RuntimeError: expected device cuda:0 but got device cpu

The trainer section of my jsonnet looks like this. No idea why in the world it would be trying to use the CPU.

  trainer: {
      num_epochs: 22,
      patience: 10,
      cuda_device: 0,
      grad_clipping: 5.0,
      validation_metric: '-loss',
      optimizer: {
          type: 'adam',
          lr: 0.01
      }
  }

Looks like a separate issue. I'll try stepping up to PyTorch 1.6 + allennlp 1.1.0rc4 and applying the patch to that...

Let me know if this still happens with the latest RC! If it does, it could very well be a bug on our end that I'll want to fix quickly.

Hi @dirkgr, I am having a _different_ problem with the latest RC... With 1.1.0rc4, I cannot even train the language model, never mind a downstream NER model that uses the LM. I opened allenai/allennlp/issues/4623 to track that issue.

As a workaround to _that_, I tried training the LM with PyTorch 1.5.1 + AllenNLP 1.0, and then switched back to PyTorch 1.6 + AllenNLP 1.1.0rc4. Then I applied the patch from allenai/allennlp-models/pull/129.

I got yet another error when trying to train the NER model that uses the ELMo Transformer:

2020-09-03 14:26:59,859 - INFO - allennlp.training.trainer - Beginning training.
2020-09-03 14:26:59,859 - INFO - allennlp.training.trainer - Epoch 0/21
2020-09-03 14:26:59,859 - INFO - allennlp.training.trainer - Worker 0 memory usage MB: 3117.996
2020-09-03 14:27:00,001 - INFO - allennlp.training.trainer - GPU 0 memory usage MB: 1314
2020-09-03 14:27:00,001 - INFO - allennlp.training.trainer - GPU 1 memory usage MB: 11
2020-09-03 14:27:00,001 - INFO - allennlp.training.trainer - GPU 2 memory usage MB: 11
2020-09-03 14:27:00,001 - INFO - allennlp.training.trainer - GPU 3 memory usage MB: 11
2020-09-03 14:27:00,001 - INFO - allennlp.training.trainer - GPU 4 memory usage MB: 11
2020-09-03 14:27:00,001 - INFO - allennlp.training.trainer - GPU 5 memory usage MB: 11
2020-09-03 14:27:00,001 - INFO - allennlp.training.trainer - GPU 6 memory usage MB: 11
2020-09-03 14:27:00,001 - INFO - allennlp.training.trainer - GPU 7 memory usage MB: 11
2020-09-03 14:27:00,003 - INFO - allennlp.training.trainer - Training
  0%|          | 0/169 [00:00<?, ?it/s]
2020-09-03 14:27:00,050 - CRITICAL - root - Uncaught exception
Traceback (most recent call last):
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/bin/allennlp", line 8, in <module>
    sys.exit(run())
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/__main__.py", line 34, in run
    main(prog="allennlp")
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 92, in main
    args.func(args)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 118, in train_model_from_args
    file_friendly_logging=args.file_friendly_logging,
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 177, in train_model_from_file
    file_friendly_logging=file_friendly_logging,
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 238, in train_model
    file_friendly_logging=file_friendly_logging,
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 439, in _train_worker
    metrics = train_loop.run()
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 501, in run
    return self.trainer.train()
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/training/trainer.py", line 867, in train
    train_metrics = self._train_epoch(epoch)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/training/trainer.py", line 589, in _train_epoch
    batch_outputs = self.batch_outputs(batch, for_training=True)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/training/trainer.py", line 479, in batch_outputs
    output_dict = self._pytorch_model(**batch)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/localdata/tony/data-science/ner-refactor-pytorch/src/models/lstm.py", line 46, in forward
    embedded = self._embedder(tokens)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 84, in forward
    token_vectors = embedder(list(tensors.values())[0], **forward_params_values)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp_models/lm/modules/token_embedders/language_model.py", line 179, in forward
    tokens, mask, self._bos_indices, self._eos_indices
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/nn/util.py", line 1471, in add_sentence_boundary_token_ids
    tensor_with_boundary_tokens[i, j + 1, :] = sentence_end_token
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
2020-09-03 14:27:00,054 - INFO - allennlp.models.archival - removing temporary unarchived model dir at /tmp/tmpykdm068k

Hopefully the fix from @epwalsh for /issues/4623 will also fix that...

Hi @dirkgr , with this fix, looks like we are able to get a little further, but the error referenced above is still happening...

My environment is:

allennlp==1.1.0
allennlp-models==1.1.0
torch==1.6.0
torchvision==0.7.0

plus patches for allenai/allennlp-models#129 and #4632 applied.

After training the ELMo transformer language model, I still cannot use it as an embedder within an NER tagger model.

2020-09-16 17:06:02,589 - INFO - allennlp.training.trainer - Beginning training.
2020-09-16 17:06:02,589 - INFO - allennlp.training.trainer - Epoch 0/21
2020-09-16 17:06:02,589 - INFO - allennlp.training.trainer - Worker 0 memory usage MB: 3118.312
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 0 memory usage MB: 1314
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 1 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 2 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 3 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 4 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 5 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 6 memory usage MB: 11
2020-09-16 17:06:02,730 - INFO - allennlp.training.trainer - GPU 7 memory usage MB: 11
2020-09-16 17:06:02,732 - INFO - allennlp.training.trainer - Training
  0%|          | 0/169 [00:00<?, ?it/s]
2020-09-16 17:06:02,784 - CRITICAL - root - Uncaught exception
Traceback (most recent call last):
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/bin/allennlp", line 8, in <module>
    sys.exit(run())
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/__main__.py", line 34, in run
    main(prog="allennlp")
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 92, in main
    args.func(args)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 118, in train_model_from_args
    file_friendly_logging=args.file_friendly_logging,
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 177, in train_model_from_file
    file_friendly_logging=file_friendly_logging,
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 238, in train_model
    file_friendly_logging=file_friendly_logging,
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 443, in _train_worker
    metrics = train_loop.run()
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/commands/train.py", line 505, in run
    return self.trainer.train()
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/training/trainer.py", line 867, in train
    train_metrics = self._train_epoch(epoch)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/training/trainer.py", line 589, in _train_epoch
    batch_outputs = self.batch_outputs(batch, for_training=True)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/training/trainer.py", line 479, in batch_outputs
    output_dict = self._pytorch_model(**batch)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/localdata/tony/data-science/ner-refactor-pytorch/src/models/lstm.py", line 46, in forward
    embedded = self._embedder(tokens)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 84, in forward
    token_vectors = embedder(list(tensors.values())[0], **forward_params_values)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp_models/lm/modules/token_embedders/language_model.py", line 187, in forward
    tokens, mask, self._bos_indices, self._eos_indices
  File "/app/local/anaconda3/envs/torch1.6_allennlp1.1/lib/python3.7/site-packages/allennlp/nn/util.py", line 1565, in add_sentence_boundary_token_ids
    tensor_with_boundary_tokens[i, j + 1, :] = sentence_end_token
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
2020-09-16 17:06:02,789 - INFO - allennlp.models.archival - removing temporary unarchived model dir at /tmp/tmpobwn4xyg

Do you want to reopen this or create a new issue?

Just to give closure here on this issue, @tpanza helpfully made another PR in #4761, and that's where we're continuing this saga.
