I am running into an exception when loading a model on CPU in one of the example scripts. This seems to be related to loading FusedLayerNorm from apex, even when --no_cuda has been set.
https://github.com/huggingface/pytorch-pretrained-BERT/blob/8da280ebbeca5ebd7561fd05af78c65df9161f92/pytorch_pretrained_bert/modeling.py#L154
Or is this working for anybody else?
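For context, modeling.py tries apex first and only falls back to a pure-PyTorch LayerNorm when the import fails, so the fused CUDA kernel is used whenever apex is installed, regardless of --no_cuda. Roughly paraphrased from the linked line:

import torch
import torch.nn as nn

try:
    # apex's fused kernel is imported whenever the package is installed,
    # even if we intend to run on CPU
    from apex.normalization.fused_layer_norm import FusedLayerNorm as BertLayerNorm
except ImportError:
    class BertLayerNorm(nn.Module):
        """Pure-PyTorch LayerNorm, only reached when apex is absent."""
        def __init__(self, hidden_size, eps=1e-12):
            super(BertLayerNorm, self).__init__()
            self.weight = nn.Parameter(torch.ones(hidden_size))
            self.bias = nn.Parameter(torch.zeros(hidden_size))
            self.variance_epsilon = eps

        def forward(self, x):
            u = x.mean(-1, keepdim=True)
            s = (x - u).pow(2).mean(-1, keepdim=True)
            x = (x - u) / torch.sqrt(s + self.variance_epsilon)
            return self.weight * x + self.bias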
Example:
run_classifier.py --data_dir glue/CoLA --task_name CoLA --do_train --do_eval --bert_model bert-base-cased --max_seq_length 32 --train_batch_size 12 --learning_rate 2e-5 --num_train_epochs 2.0 --output_dir /tmp/mrpc_output/ --no_cuda
Exception:
[...]
File "/home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/normalization/fused_layer_norm.py", line 19, in forward
input_, self.normalized_shape, weight_, bias_, self.eps)
RuntimeError: input must be a CUDA tensor (layer_norm_affine at apex/normalization/csrc/layer_norm_cuda.cpp:120)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fe35f6e4cc5 in /home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: layer_norm_affine(at::Tensor, c10::ArrayRef<long>, at::Tensor, at::Tensor, double) + 0x4bc (0x7fe3591456ac in /home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x18db4 (0x7fe359152db4 in /home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x16505 (0x7fe359150505 in /home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
<omitting python frames>
frame #12: THPFunction_do_forward(THPFunction*, _object*) + 0x15c (0x7fe38fb7db7c in /home/mp/miniconda3/envs/bert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
Hi @tholor, apex is a GPU-specific extension.
What kind of use case do you have in which apex is installed but there is no GPU? (Also, fp16 doesn't work on CPU; it's currently not supported by PyTorch.)
The two cases in which I came across this:
1) testing whether some code works on both GPU and CPU (on a GPU machine with apex installed)
2) training/debugging small sample models on my laptop. It has a small "toy GPU" with only 2 GB of RAM, so I usually use the CPU there.
I agree that these are edge cases, but I thought the --no_cuda flag was intended for exactly such cases?
I see. It's a bit tricky because apex is loaded by default whenever it can be found, and this loading happens deep inside the library itself, not in the examples (here). I don't think it's worth adding specific logic inside the library's loading code to handle such a case.
I guess the easiest solution in your case is to have two Python environments (with conda or virtualenv) and switch to the one in which apex is not installed when you don't want to use the GPU.
Feel free to re-open the issue if this doesn't solve your problem.
Sure, then it's not worth the effort.
@thomwolf a solution would be to check torch.cuda.is_available() around the import; apex can then be disabled by launching with CUDA_VISIBLE_DEVICES=-1.
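A quick sanity check of that mechanism (a minimal sketch; the variable must be set before torch initializes CUDA):

import os
# hide all GPUs; must happen before torch touches CUDA for the first time
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import torch
print(torch.cuda.is_available())  # False, so a guarded apex import would be skipped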
Is this also related to the fact that the tests fail when apex is installed?
def forward(self, input, weight, bias):
input_ = input.contiguous()
weight_ = weight.contiguous()
bias_ = bias.contiguous()
output, mean, invvar = fused_layer_norm_cuda.forward_affine(
> input_, self.normalized_shape, weight_, bias_, self.eps)
E RuntimeError: input must be a CUDA tensor (layer_norm_affine at apex/normalization/csrc/layer_norm_cuda.cpp:120)
E frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f754d802021 in /lium/buster1/caglayan/anaconda/envs/bert/lib/python3.6/site-packages/torch/lib/libc10.so)
E frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f754d8018ea in /lium/buster1/caglayan/anaconda/envs/bert/lib/python3.6/site-packages/torch/lib/libc10.so)
E frame #2: layer_norm_affine(at::Tensor, c10::ArrayRef<long>, at::Tensor, at::Tensor, double) + 0x6b9 (0x7f754a8aafe9 in /lium/buster1/caglayan/anaconda/envs/bert/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
E frame #3: <unknown function> + 0x19b9d (0x7f754a8b8b9d in /lium/buster1/caglayan/anaconda/envs/bert/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
E frame #4: <unknown function> + 0x19d1e (0x7f754a8b8d1e in /lium/buster1/caglayan/anaconda/envs/bert/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
E frame #5: <unknown function> + 0x16971 (0x7f754a8b5971 in /lium/buster1/caglayan/anaconda/envs/bert/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
E <omitting python frames>
E frame #13: THPFunction_do_forward(THPFunction*, _object*) + 0x15c (0x7f7587d411ec in /lium/buster1/caglayan/anaconda/envs/bert/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
../../lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/normalization/fused_layer_norm.py:21: RuntimeError
_______________________________________________________________________________ OpenAIGPTModelTest.test_default
Hello @artemisart,
What do you mean by "disable apex by CUDA_VISIBLE_DEVICES=-1"? I tried that, but the import still works at this line.
@LamDang You can set the environment variable CUDA_VISIBLE_DEVICES=-1 to disable CUDA in PyTorch (e.g. launch your script in bash with CUDA_VISIBLE_DEVICES=-1 python script.py), and then wrap the apex import in an if torch.cuda.is_available() check in the script.
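A minimal sketch of the guarded import (using torch.nn.LayerNorm as a stand-in fallback; the library's own fallback class would work the same way):

import torch
import torch.nn as nn

BertLayerNorm = nn.LayerNorm  # default to the pure-PyTorch implementation
if torch.cuda.is_available():
    try:
        # the fused kernel is CUDA-only, so only try it when a GPU is visible
        from apex.normalization.fused_layer_norm import FusedLayerNorm as BertLayerNorm
    except ImportError:
        pass  # apex not installed; keep the fallback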
Hi all, I came across this issue when my GPU memory was fully loaded and I had to run some inference at the same time. For this kind of temporary need, the simplest solution for me is just to touch an empty apex.py in the working directory before the run (it shadows the installed package, so the import fails and the code falls back to the standard LayerNorm) and remove it afterwards.
Re-opening this to remember to wrap the apex import with an if torch.cuda.is_available() check in the next release, as advocated by @artemisart.
Hello, I pushed a pull request to solve this issue upstream: https://github.com/NVIDIA/apex/pull/256
Update: it has been merged into apex.
Yes please, I also struggle with apex in CPU mode. I have wrapped BertModel in my own object, and when I try to load the pretrained GPU model with torch.load(model, map_location='cpu'), it shows 'no module named apex'. But if I install apex, I get a "no CUDA" error (I'm on a CPU machine in the inference phase).
Well, it should be solved in apex now. What is the exact error message you have?
By the way, not using apex is also fine; don't worry about it if you don't need it.
I got
model = torch.load(model_file, map_location='cpu')
result = unpickler.load()
ModuleNotFoundError: No module named 'apex'
model_file is an object pretrained on GPU with a BertModel field, but I want to unpickle it in CPU mode.
Try using the PyTorch-recommended serialization practice (saving/loading the state dict):
https://pytorch.org/docs/stable/notes/serialization.html
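Something like the following (a minimal sketch; bert_weights.pt is a placeholder filename and I'm assuming the wrapped model is a plain BertModel). A state dict contains only tensors, so no apex classes need to be importable on the CPU machine:

import torch
from pytorch_pretrained_bert import BertModel

# On the GPU machine: save only the weights, never the pickled object graph
model = BertModel.from_pretrained("bert-base-cased")
torch.save(model.state_dict(), "bert_weights.pt")

# On the CPU machine: rebuild the same architecture, then load the weights
cpu_model = BertModel.from_pretrained("bert-base-cased")
cpu_model.load_state_dict(torch.load("bert_weights.pt", map_location="cpu"))
cpu_model.eval()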
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.