I'm trying to run prediction with a trained model on the CPU. I set flair.device to CPU as described here:
I then also tried to force the model's embeddings onto the CPU, similar to what is suggested here. Here is the error I got:
Traceback (most recent call last):
File "/home/pedro/repositorios/flair/script.py", line 29, in
sentences = predict_all_sentences()
File "/home/pedro/repositorios/flair/script.py", line 25, in predict_all_sentences
return tagger.predict(corpus.get_all_sentences())
File "/home/pedro/repositorios/flair/flair/models/sequence_tagger_model.py", line 300, in predict
tags, _ = self.forward_labels_and_loss(batch, sort=False)
File "/home/pedro/repositorios/flair/flair/models/sequence_tagger_model.py", line 268, in forward_labels_and_loss
feature, lengths, tags = self.forward(sentences, sort=sort)
File "/home/pedro/repositorios/flair/flair/models/sequence_tagger_model.py", line 315, in forward
self.embeddings.embed(sentences)
File "/home/pedro/repositorios/flair/flair/embeddings.py", line 130, in embed
embedding.embed(sentences)
File "/home/pedro/repositorios/flair/flair/embeddings.py", line 63, in embed
self._add_embeddings_internal(sentences)
File "/home/pedro/repositorios/flair/flair/embeddings.py", line 399, in _add_embeddings_internal
embeddings = self.ee.embed_batch(sentence_words)
File "/home/pedro/anaconda3/envs/flair/lib/python3.6/site-packages/allennlp/commands/elmo.py", line 244, in embed_batch
embeddings, mask = self.batch_to_embeddings(batch)
File "/home/pedro/anaconda3/envs/flair/lib/python3.6/site-packages/allennlp/commands/elmo.py", line 186, in batch_to_embeddings
bilm_output = self.elmo_bilm(character_ids)
File "/home/pedro/anaconda3/envs/flair/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/pedro/anaconda3/envs/flair/lib/python3.6/site-packages/allennlp/modules/elmo.py", line 605, in forward
token_embedding = self._token_embedder(inputs)
File "/home/pedro/anaconda3/envs/flair/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/pedro/anaconda3/envs/flair/lib/python3.6/site-packages/allennlp/modules/elmo.py", line 357, in forward
self._char_embedding_weights
File "/home/pedro/anaconda3/envs/flair/lib/python3.6/site-packages/torch/nn/functional.py", line 1454, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #3 'index'
Here is the code:
flair.device = torch.device('cpu')

corpus: TaggedCorpus = NLPTaskDataFetcher.load_column_corpus(...)
tagger = SequenceTagger.load_from_file('/models/flair/ner-pt-flair-elmo/best-model.pt')
tagger.embeddings.cpu()

def predict_all_sentences():
    return tagger.predict(corpus.get_all_sentences())

if __name__ == '__main__':
    sentences = predict_all_sentences()
    for sentence in sentences:
        print(sentence)
The same code works for models trained using only FlairEmbeddings, even without needing to specify tagger.embeddings.cpu(). My guess is that something else needs to be done to force ELMo onto the CPU, but I'm not sure what it could be.
Thanks!
Hello @pvcastro, yes, you are right: the flair.device parameter only affects the Flair embeddings and modules, not external libraries such as ELMo or BERT. This is probably something we should sort out in the next version; we would need to take a closer look at the dependent libraries.
Have you found out how to make ELMo use CPU instead? @stefan-it do you happen to know?
The trainer from allennlp does this:
def _move_to_gpu(self, model: Model) -> Model:
    if self._cuda_devices[0] != -1:
        return model.cuda(self._cuda_devices[0])
    else:
        return model
If I understand correctly, this applies `_apply` to the whole tree of modules in the model, which is the same thing calling tagger.embeddings.cpu() in Flair was supposed to do :thinking:
I'm not sure how I could override the pre-trained ELMo embeddings' configuration to use the CPU as well.
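A minimal pure-Python sketch (no torch, flair, or allennlp required) of why calling `.cpu()` on the embeddings does not reach ELMo: `nn.Module`'s `_apply` only recurses into registered submodules, while ElmoEmbedder is held as a plain Python attribute carrying its own `cuda_device` int. The class names below are illustrative stand-ins, not the real Flair/AllenNLP classes.

```python
class MockModule:
    """Mimics torch.nn.Module: .cpu() recurses into registered children only."""

    def __init__(self):
        self._children = {}
        self.device = "cuda"

    def register_child(self, name, child):
        self._children[name] = child

    def cpu(self):
        # Analogous to Module._apply: visits self and registered children.
        self.device = "cpu"
        for child in self._children.values():
            child.cpu()
        return self


class MockElmoEmbedder:
    """Stand-in for allennlp's ElmoEmbedder: NOT a module, just an object."""

    def __init__(self, cuda_device=0):
        self.cuda_device = cuda_device  # baked in at construction time


class MockELMoEmbeddings(MockModule):
    def __init__(self):
        super().__init__()
        # Plain attribute: invisible to .cpu(), just like the real .ee
        self.ee = MockElmoEmbedder(cuda_device=0)


embeddings = MockELMoEmbeddings()
embeddings.cpu()
print(embeddings.device)          # "cpu"  -- the module itself moved
print(embeddings.ee.cuda_device)  # 0      -- the wrapped embedder did not
```

This matches the behavior reported above: the Flair-side weights move, but ElmoEmbedder keeps steering its inputs to the training-time device.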
Ok thanks, we'll take a look at this for version 0.5 (development starts mid-April when everyone is back from vacation). If you find out more, please share here!
I'll look into it :)
I'm not sure if there has been any progress on this issue, but I managed to train on GPU and predict on CPU.
I was getting an error quite different from the OP's, but I suppose that's due to a different hardware setup. The error was this (sorry for the screenshot; I can't copy-paste from the remote desktop):

Problem (in my case)
I started investigating the transition between Flair's and AllenNLP's calls. ELMoEmbeddings checks whether a CUDA device is available and then passes this info to AllenNLP's ElmoEmbedder: https://github.com/flairNLP/flair/blob/7cda2f280bce5b2589db5601e4b358f5ec0f7613/flair/embeddings.py#L821-L833
ElmoEmbedder then saves the CUDA device as an attribute and uses it both in __init__ and in allennlp/commands/elmo.py#batch_to_embeddings():
def __init__(
    self,
    options_file: str = DEFAULT_OPTIONS_FILE,
    weight_file: str = DEFAULT_WEIGHT_FILE,
    cuda_device: int = -1,
) -> None:
    # ...
    if cuda_device >= 0:
        self.elmo_bilm = self.elmo_bilm.cuda(device=cuda_device)
    self.cuda_device = cuda_device

def batch_to_embeddings(self, batch: List[List[str]]) -> Tuple[torch.Tensor, torch.Tensor]:
    # ...
    if self.cuda_device >= 0:
        character_ids = character_ids.cuda(device=self.cuda_device)
When the SequenceTagger is saved, the ElmoEmbedder also gets pickled with the cuda_device set at training time. Consequently, when it is unpickled, it still thinks it should use the training device.
Workaround
In my case, I'm using StackedEmbeddings, so model.embeddings.embeddings accesses the list of embeddings, where the first one is an ELMoEmbeddings:
model = SequenceTagger.load(load_path)
model.embeddings.embeddings[0].ee.cuda_device = -1
# model.predict()
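The workaround above can be generalized so you don't need to know the index of the ELMo embedding in the stack. The attribute names (`.embeddings` on StackedEmbeddings, `.ee` on ELMoEmbeddings, `cuda_device` on ElmoEmbedder) follow the code quoted in this thread; the helper itself (`force_elmo_cpu`) is a hypothetical sketch, not an existing API. The dummy classes below only exist to demonstrate it without installing Flair.

```python
def force_elmo_cpu(embeddings):
    """Reset any wrapped ElmoEmbedder in `embeddings` to CPU; return count patched."""
    # StackedEmbeddings exposes its parts via .embeddings; a single
    # embedding does not, so fall back to a one-element list.
    parts = getattr(embeddings, "embeddings", [embeddings])
    patched = 0
    for emb in parts:
        ee = getattr(emb, "ee", None)  # ELMoEmbeddings keeps its ElmoEmbedder in .ee
        if ee is not None and getattr(ee, "cuda_device", -1) >= 0:
            ee.cuda_device = -1  # -1 means CPU in allennlp's ElmoEmbedder
            patched += 1
    return patched


# Demo with dummy stand-ins (the real call would be
# force_elmo_cpu(model.embeddings) after SequenceTagger.load):
class _DummyEE:
    def __init__(self, cuda_device):
        self.cuda_device = cuda_device

class _DummyEmbedding:
    def __init__(self, ee=None):
        if ee is not None:
            self.ee = ee

class _DummyStack:
    def __init__(self, embeddings):
        self.embeddings = embeddings

stack = _DummyStack([_DummyEmbedding(_DummyEE(0)), _DummyEmbedding()])
n = force_elmo_cpu(stack)
print(n)  # 1 -- only the ELMo-like embedding was patched
```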
Fix
I think that the fix should come from Flair and not from AllenNLP. Flair is using the allennlp/commands/elmo.py module, which was intended to be run from the command line with its companion allennlp/commands/predict.py. When using these modules from the CLI, cuda_device is specified both at training and at prediction time.
I suppose the fix should be to change the load method in flair.nn.Model, or to override it in the SequenceTagger class. There, we should check for an ELMoEmbeddings instance and set its cuda_device properly.
Another alternative would be to replace the usage of allennlp/commands/elmo.py with allennlp.modules.elmo._ElmoBiLm. But this seems like unnecessary work, since we would need to reproduce the manipulations from allennlp/commands/elmo.py.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.