I'm trying to run prediction with a trained model on the CPU. I set flair.device to CPU as described here:
I then also tried to force the model's embeddings onto the CPU, similar to what is suggested here. Here is the error I got:
Traceback (most recent call last):
File "/home/pedro/repositorios/flair/script.py", line 29, in
sentences = predict_all_sentences()
File "/home/pedro/repositorios/flair/script.py", line 25, in predict_all_sentences
return tagger.predict(corpus.get_all_sentences())
File "/home/pedro/repositorios/flair/flair/models/sequence_tagger_model.py", line 300, in predict
tags, _ = self.forward_labels_and_loss(batch, sort=False)
File "/home/pedro/repositorios/flair/flair/models/sequence_tagger_model.py", line 268, in forward_labels_and_loss
feature, lengths, tags = self.forward(sentences, sort=sort)
File "/home/pedro/repositorios/flair/flair/models/sequence_tagger_model.py", line 315, in forward
self.embeddings.embed(sentences)
File "/home/pedro/repositorios/flair/flair/embeddings.py", line 130, in embed
embedding.embed(sentences)
File "/home/pedro/repositorios/flair/flair/embeddings.py", line 63, in embed
self._add_embeddings_internal(sentences)
File "/home/pedro/repositorios/flair/flair/embeddings.py", line 399, in _add_embeddings_internal
embeddings = self.ee.embed_batch(sentence_words)
File "/home/pedro/anaconda3/envs/flair/lib/python3.6/site-packages/allennlp/commands/elmo.py", line 244, in embed_batch
embeddings, mask = self.batch_to_embeddings(batch)
File "/home/pedro/anaconda3/envs/flair/lib/python3.6/site-packages/allennlp/commands/elmo.py", line 186, in batch_to_embeddings
bilm_output = self.elmo_bilm(character_ids)
File "/home/pedro/anaconda3/envs/flair/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/pedro/anaconda3/envs/flair/lib/python3.6/site-packages/allennlp/modules/elmo.py", line 605, in forward
token_embedding = self._token_embedder(inputs)
File "/home/pedro/anaconda3/envs/flair/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/pedro/anaconda3/envs/flair/lib/python3.6/site-packages/allennlp/modules/elmo.py", line 357, in forward
self._char_embedding_weights
File "/home/pedro/anaconda3/envs/flair/lib/python3.6/site-packages/torch/nn/functional.py", line 1454, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #3 'index'
Here is the code:
flair.device = torch.device('cpu')

corpus: TaggedCorpus = NLPTaskDataFetcher.load_column_corpus(...)
tagger = SequenceTagger.load_from_file('/models/flair/ner-pt-flair-elmo/best-model.pt')
tagger.embeddings.cpu()

def predict_all_sentences():
    return tagger.predict(corpus.get_all_sentences())

if __name__ == '__main__':
    sentences = predict_all_sentences()
    for sentence in sentences:
        print(sentence)
The same code works for models trained using only FlairEmbeddings, even without needing to specify tagger.embeddings.cpu(). My guess is that something else needs to be done to force ELMo onto the CPU, but I'm not sure what it could be.
Thanks!
Hello @pvcastro, yes, you are right: the flair.device parameter only affects the Flair embeddings and modules, not external libraries such as ELMo or BERT. This is probably something we should sort out in the next version; we would need to take a closer look at the dependent libraries.
Have you found out how to make ELMo use CPU instead? @stefan-it do you happen to know?
The trainer from allennlp does this:
def _move_to_gpu(self, model: Model) -> Model:
    if self._cuda_devices[0] != -1:
        return model.cuda(self._cuda_devices[0])
    else:
        return model
If I understand correctly, this applies `_apply` to the whole tree of modules in the model, which is the same thing calling tagger.embeddings.cpu() in Flair was supposed to do :thinking:
I'm not sure how I could override the pre-trained ELMo embeddings' configuration to use the CPU as well.
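A minimal pure-Python sketch (no torch, flair, or allennlp required) of why calling `.cpu()` on the embeddings does not reach ELMo: `nn.Module`'s `_apply` only recurses into registered submodules, while ElmoEmbedder is held as a plain Python attribute carrying its own `cuda_device` int. The class names below are illustrative stand-ins, not the real Flair/AllenNLP classes.

```python
class MockModule:
    """Mimics torch.nn.Module: .cpu() recurses into registered children only."""

    def __init__(self):
        self._children = {}
        self.device = "cuda"

    def register_child(self, name, child):
        self._children[name] = child

    def cpu(self):
        # Analogous to Module._apply: visits self and registered children.
        self.device = "cpu"
        for child in self._children.values():
            child.cpu()
        return self


class MockElmoEmbedder:
    """Stand-in for allennlp's ElmoEmbedder: NOT a module, just an object."""

    def __init__(self, cuda_device=0):
        self.cuda_device = cuda_device  # baked in at construction time


class MockELMoEmbeddings(MockModule):
    def __init__(self):
        super().__init__()
        # Plain attribute: invisible to .cpu(), just like the real .ee
        self.ee = MockElmoEmbedder(cuda_device=0)


embeddings = MockELMoEmbeddings()
embeddings.cpu()
print(embeddings.device)          # "cpu"  -- the module itself moved
print(embeddings.ee.cuda_device)  # 0      -- the wrapped embedder did not
```

This matches the behavior reported above: the Flair-side weights move, but ElmoEmbedder keeps steering its inputs to the training-time device.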
Ok thanks, we'll take a look at this for version 0.5 (development starts mid-April when everyone is back from vacation). If you find out more, please share here!
I'll look into it :)
I'm not sure if there has been any progress on this issue, but I managed to train on GPU and predict on CPU.
I was getting an error quite different from the OP's, but I suppose that's due to a different hardware setup. The error was this (sorry for the screenshot; I can't copy-paste from the remote desktop):

Problem (in my case)
I started investigating the transition between Flair's and AllenNLP's calls. ELMoEmbeddings checks whether a CUDA device is available and then passes this info to AllenNLP's ElmoEmbedder: https://github.com/flairNLP/flair/blob/7cda2f280bce5b2589db5601e4b358f5ec0f7613/flair/embeddings.py#L821-L833
ElmoEmbedder then saves the CUDA device as an attribute and uses it both in __init__ and in allennlp/commands/elmo.py#batch_to_embeddings():
def __init__(
    self,
    options_file: str = DEFAULT_OPTIONS_FILE,
    weight_file: str = DEFAULT_WEIGHT_FILE,
    cuda_device: int = -1,
) -> None:
    # ...
    if cuda_device >= 0:
        self.elmo_bilm = self.elmo_bilm.cuda(device=cuda_device)
    self.cuda_device = cuda_device

def batch_to_embeddings(self, batch: List[List[str]]) -> Tuple[torch.Tensor, torch.Tensor]:
    # ...
    if self.cuda_device >= 0:
        character_ids = character_ids.cuda(device=self.cuda_device)
When the SequenceTagger is saved, the ElmoEmbedder also gets pickled with the cuda_device set at training time. Consequently, when it is unpickled, it still thinks it should use the training device.
Workaround
In my case, I'm using StackedEmbeddings, so model.embeddings.embeddings accesses the list of embeddings, where the first one is an ELMoEmbeddings:
model = SequenceTagger.load(load_path)
model.embeddings.embeddings[0].ee.cuda_device = -1
# model.predict()
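The workaround above can be generalized so you don't need to know the index of the ELMo embedding in the stack. The attribute names (`.embeddings` on StackedEmbeddings, `.ee` on ELMoEmbeddings, `cuda_device` on ElmoEmbedder) follow the code quoted in this thread; the helper itself (`force_elmo_cpu`) is a hypothetical sketch, not an existing API. The dummy classes below only exist to demonstrate it without installing Flair.

```python
def force_elmo_cpu(embeddings):
    """Reset any wrapped ElmoEmbedder in `embeddings` to CPU; return count patched."""
    # StackedEmbeddings exposes its parts via .embeddings; a single
    # embedding does not, so fall back to a one-element list.
    parts = getattr(embeddings, "embeddings", [embeddings])
    patched = 0
    for emb in parts:
        ee = getattr(emb, "ee", None)  # ELMoEmbeddings keeps its ElmoEmbedder in .ee
        if ee is not None and getattr(ee, "cuda_device", -1) >= 0:
            ee.cuda_device = -1  # -1 means CPU in allennlp's ElmoEmbedder
            patched += 1
    return patched


# Demo with dummy stand-ins (the real call would be
# force_elmo_cpu(model.embeddings) after SequenceTagger.load):
class _DummyEE:
    def __init__(self, cuda_device):
        self.cuda_device = cuda_device

class _DummyEmbedding:
    def __init__(self, ee=None):
        if ee is not None:
            self.ee = ee

class _DummyStack:
    def __init__(self, embeddings):
        self.embeddings = embeddings

stack = _DummyStack([_DummyEmbedding(_DummyEE(0)), _DummyEmbedding()])
n = force_elmo_cpu(stack)
print(n)  # 1 -- only the ELMo-like embedding was patched
```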
Fix
I think that the fix should come from Flair and not from AllenNLP. Flair is using the allennlp/commands/elmo.py module, which was intended to be run from the command line with its companion allennlp/commands/predict.py. When using these modules from the CLI, cuda_device is specified both at training and at prediction time.
I suppose the fix should be to change the load method in flair.nn.Model, or to override it in the SequenceTagger class. There, we should check for an ELMoEmbeddings instance and set its cuda_device properly.
Another alternative would be to replace the usage of allennlp/commands/elmo.py with allennlp.modules.elmo._ElmoBiLm. But this seems like unnecessary work, since we would need to reproduce the manipulations from allennlp/commands/elmo.py.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.