Allennlp: Error moving allennlp.modules.elmo.Elmo to GPU

Created on 23 May 2019  ·  6 Comments  ·  Source: allenai/allennlp

System (please complete the following information):

  • OS: Linux Ubuntu 18
  • Python version: 3.6.5
  • AllenNLP version: 0.8.3
  • PyTorch version: 1.1.0

Question
I'm not sure whether this is a question, a bug, or a kind of feature.

I am using the allennlp.modules.elmo.Elmo class as a module inside a new model, and I ran into a problem when sending the entire model to a GPU device for training. Here is the code of my larger model, which contains Elmo as an embedder.

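import torch
import torch.nn as nn
from allennlp.modules.elmo import Elmo, batch_to_ids

# `config` and `_get_mean` are defined elsewhere in my project (not shown).
class ELMo(nn.Module):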

    def __init__(self, fine_tune=False, n_classes=2):
        super(ELMo, self).__init__()

        self.embedding = Elmo(config.PATH_TO_ELMO_OPTIONS,
                              config.PATH_TO_ELMO_WEIGHTS, 1, dropout=0.2,
                              requires_grad=fine_tune)
        n_ftrs = self.embedding.get_output_dim()
        self.fc = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(n_ftrs, n_classes)
        )

    def forward(self, x):
        x = [sentence.split(" ") for sentence in x]
        x = batch_to_ids(x)
        x = self.embedding(x)
        mask = x["mask"]
        x = x["elmo_representations"][0]
        x = self._get_mean(x, mask)
        x = self.fc(x)
        return x

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
elmo = ELMo()
elmo = elmo.to(device) # Using cuda
elmo(["Eu amo", "deep learning"])

Executing the last line raises an error:

RuntimeError Traceback (most recent call last)
in
6
7
----> 8 elmo(["Eu amo", "deep learning"])

/workspace/ReadOrSee/env/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)

in forward(self, x)
42 print("tIn Model: input size", len(x))
43 x = batch_to_ids(x)
---> 44 x = self.embedding(x)
45 mask = x["mask"]
46 x = x["elmo_representations"][0]

/workspace/ReadOrSee/env/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)

/workspace/ReadOrSee/env/lib/python3.6/site-packages/allennlp/modules/elmo.py in forward(self, inputs, word_inputs)
167
168 # run the biLM
--> 169 bilm_output = self._elmo_lstm(reshaped_inputs, reshaped_word_inputs)
170 layer_activations = bilm_output['activations']
171 mask_with_bos_eos = bilm_output['mask']

/workspace/ReadOrSee/env/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)

/workspace/ReadOrSee/env/lib/python3.6/site-packages/allennlp/modules/elmo.py in forward(self, inputs, word_inputs)
603 type_representation = token_embedding['token_embedding']
604 else:
--> 605 token_embedding = self._token_embedder(inputs)
606 mask = token_embedding['mask']
607 type_representation = token_embedding['token_embedding']

/workspace/ReadOrSee/env/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
491 result = self._slow_forward(*input, **kwargs)
492 else:
--> 493 result = self.forward(*input, **kwargs)
494 for hook in self._forward_hooks.values():
495 hook_result = hook(self, input, result)

/workspace/ReadOrSee/env/lib/python3.6/site-packages/allennlp/modules/elmo.py in forward(self, inputs)
355 character_embedding = torch.nn.functional.embedding(
356 character_ids_with_bos_eos.view(-1, max_chars_per_token),
--> 357 self._char_embedding_weights
358 )
359

/workspace/ReadOrSee/env/lib/python3.6/site-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1504 # remove once script supports set_grad_enabled
1505 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1506 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1507
1508

RuntimeError: Expected object of backend CUDA but got backend CPU for argument #3 'index'

It seems that we cannot send allennlp.modules.elmo.Elmo to the GPU because some of its parameters use CPU-backed variables. Is that right? Is there another way to use the GPU for my model, which relies on the allennlp.modules.elmo.Elmo class?

All 6 comments

your batch_to_ids function is generating a tensor of ids, is that tensor on the right device?

@joelgrus, you're totally right. It works now. But since batch_to_ids is called inside my forward() function, my ELMo class needs to know which device to use, right? I would like to use DataParallel, so how can forward(self, x) know which GPU to send the batch in x to?

find out what device self.embedding is on and send it to that one? (I am not a dataparallel expert)
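
A minimal sketch of that suggestion, assuming a single device. To keep it runnable without the ELMo weight files, a plain nn.Embedding stands in for the Elmo embedder and a random index tensor stands in for the output of batch_to_ids; TinyClassifier and its sizes are hypothetical names chosen for illustration only.

```python
import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in: nn.Embedding plays the role of the Elmo embedder."""

    def __init__(self, vocab_size=100, dim=8, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)
        self.fc = nn.Linear(dim, n_classes)

    def forward(self, ids):
        # Look up the device that the module's weights currently live on.
        device = next(self.parameters()).device
        # Move the index tensor there before the embedding lookup.
        ids = ids.to(device)
        x = self.embedding(ids).mean(dim=1)  # average over the token dimension
        return self.fc(x)


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = TinyClassifier().to(device)

# Created on the CPU, as a freshly built id tensor would be.
ids = torch.randint(0, 100, (2, 5))
logits = model(ids)
```

In the model from the question, the equivalent change would presumably be to call .to(...) on the result of batch_to_ids(x) before passing it to self.embedding.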

Just to leave my "solution" here, I think I need to put the following code:

x = [sentence.split(" ") for sentence in x]
x = batch_to_ids(x)

outside of my forward() function for the DataParallel to work properly. I think the batch you pass to the model needs to be tensors already, and not strings, as I was doing previously.
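
A rough sketch of that arrangement, under stated assumptions: to_ids is a hypothetical stand-in for batch_to_ids (it builds a toy vocabulary rather than character ids), and an nn.Embedding replaces the Elmo embedder so the example runs without the weight files. The point is only that the strings are turned into a tensor before the model is called, so nn.DataParallel can split the tensor along the batch dimension and place each chunk on the matching GPU.

```python
import torch
import torch.nn as nn


def to_ids(sentences, vocab, max_len=4):
    """Hypothetical stand-in for batch_to_ids: token -> integer id, padded with 0."""
    rows = []
    for sentence in sentences:
        tokens = sentence.split(" ")[:max_len]
        ids = [vocab.setdefault(tok, len(vocab) + 1) for tok in tokens]
        rows.append(ids + [0] * (max_len - len(ids)))
    return torch.tensor(rows, dtype=torch.long)


class TensorOnlyClassifier(nn.Module):
    """forward() receives tensors only, so DataParallel can scatter the batch."""

    def __init__(self, vocab_size=100, dim=8, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim, padding_idx=0)
        self.fc = nn.Linear(dim, n_classes)

    def forward(self, ids):
        return self.fc(self.embedding(ids).mean(dim=1))


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Without any GPU, DataParallel simply calls the wrapped module directly.
model = nn.DataParallel(TensorOnlyClassifier().to(device))

vocab = {}
# Tokenization and id conversion happen outside forward().
ids = to_ids(["Eu amo", "deep learning", "a b c", "d"], vocab)
logits = model(ids)
```

With this layout the model never has to decide where its input goes; the wrapper handles placement of each chunk.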

great, I'll close the issue then, reopen if you run into more problems

oops, I don't think you can reopen, just comment if you run into more problems
