Apex: Error in FusedLayerNorm

Created on 15 Feb 2019 · 32 Comments · Source: NVIDIA/apex

After installing apex with the cuda extensions and running pytorch-pretrained-BERT, I get the following error in FusedLayerNormAffineFunction, apex/normalization/fused_layer_norm.py (line 21).

RuntimeError: a Tensor with 2482176 elements cannot be converted to Scalar (item at /pytorch/aten/src/ATen/native/Scalar.cpp:9)

Here are the shapes of my tensors:

input_ - [32, 101, 768]
bias_ - [768]
weight_ - [768]
self.normalized_shape - [768]
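
A quick arithmetic check on the shapes listed above, offered as a sketch rather than a diagnosis: the element count quoted in the error message equals the size of input_, which suggests that the whole input tensor, not weight_ or bias_, is what reached the scalar conversion. The helper name element_count is made up for illustration.

```python
from math import prod

# Shapes copied from the report above.
shapes = {
    "input_": [32, 101, 768],
    "bias_": [768],
    "weight_": [768],
}

# Element count quoted in the RuntimeError message.
reported_elements = 2482176


def element_count(shape):
    """Number of elements in a tensor of the given shape."""
    return prod(shape)


# Names of the tensors whose size matches the count in the error message.
matching = [name for name, shape in shapes.items()
            if element_count(shape) == reported_elements]
print(matching)  # ['input_']
```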

I'm not sure if it's a problem with pytorch-pretrained-BERT calling it incorrectly or a bug in apex. Any idea? I've also created an issue here.

I'm running Ubuntu with CUDA 9, PyTorch 0.4.1.

Full stacktrace below.

File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling.py", line 710, in forward
    embedding_output = self.embeddings(input_ids, token_type_ids)
  File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling.py", line 261, in forward
    embeddings = self.LayerNorm(embeddings)
  File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/normalization/fused_layer_norm.py", line 149, in forward
    input, self.weight, self.bias)
  File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/normalization/fused_layer_norm.py", line 21, in forward
    input_, self.normalized_shape, weight_, bias_, self.eps)

RuntimeError: a Tensor with 2482176 elements cannot be converted to Scalar (item at /pytorch/aten/src/ATen/native/Scalar.cpp:9)

frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f1aa5da3021 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f1aa5da28ea in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: at::native::item(at::Tensor const&) + 0x12c3 (0x7f1aa690d5b3 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #3: at::TypeDefault::item(at::Tensor const&) const + 0x55 (0x7f1aa6b1c905 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #4: torch::autograd::VariableType::eye_out(at::Tensor&, long, long) const + 0x184 (0x7f1aa4faeec4 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #5: <unknown function> + 0x89ca (0x7f1a82e739ca in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #6: layer_norm_affine(at::Tensor, c10::ArrayRef<long>, at::Tensor, at::Tensor, double) + 0x185 (0x7f1a82e762a5 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #7: <unknown function> + 0x18d44 (0x7f1a82e83d44 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #8: <unknown function> + 0x16495 (0x7f1a82e81495 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #9: _PyCFunction_FastCallDict + 0x154 (0x55a8f9925744 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #10: <unknown function> + 0x198610 (0x55a8f99ac610 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x30a (0x55a8f99d138a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #12: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #13: _PyFunction_FastCallDict + 0x11b (0x55a8f99a6bab in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #14: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #15: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #16: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #17: THPFunction_do_forward(THPFunction*, _object*) + 0x15c (0x7f1ae02e21ec in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #18: PyCFunction_Call + 0x5f (0x55a8f992863f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #19: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #20: <unknown function> + 0x16ba91 (0x55a8f997fa91 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #21: _PyObject_FastCallDict + 0x8b (0x55a8f992592b in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #22: <unknown function> + 0x19857e (0x55a8f99ac57e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #23: _PyEval_EvalFrameDefault + 0x30a (0x55a8f99d138a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #24: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #25: _PyFunction_FastCallDict + 0x11b (0x55a8f99a6bab in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #26: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #27: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #28: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #29: _PyEval_EvalFrameDefault + 0x19ec (0x55a8f99d2a6c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #30: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #31: <unknown function> + 0x1918e4 (0x55a8f99a58e4 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #32: _PyFunction_FastCallDict + 0x1bc (0x55a8f99a6c4c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #33: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #34: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #35: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #36: <unknown function> + 0x16ba91 (0x55a8f997fa91 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #37: _PyObject_FastCallDict + 0x8b (0x55a8f992592b in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #38: <unknown function> + 0x19857e (0x55a8f99ac57e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #39: _PyEval_EvalFrameDefault + 0x30a (0x55a8f99d138a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #40: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #41: <unknown function> + 0x1918e4 (0x55a8f99a58e4 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #42: _PyFunction_FastCallDict + 0x3da (0x55a8f99a6e6a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #43: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #44: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #45: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #46: _PyEval_EvalFrameDefault + 0x19ec (0x55a8f99d2a6c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #47: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #48: <unknown function> + 0x1918e4 (0x55a8f99a58e4 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #49: _PyFunction_FastCallDict + 0x1bc (0x55a8f99a6c4c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #50: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #51: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #52: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #53: <unknown function> + 0x16ba91 (0x55a8f997fa91 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #54: _PyObject_FastCallDict + 0x8b (0x55a8f992592b in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #55: <unknown function> + 0x19857e (0x55a8f99ac57e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #56: _PyEval_EvalFrameDefault + 0x30a (0x55a8f99d138a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #57: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #58: <unknown function> + 0x1918e4 (0x55a8f99a58e4 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #59: _PyFunction_FastCallDict + 0x3da (0x55a8f99a6e6a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #60: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #61: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #62: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #63: _PyEval_EvalFrameDefault + 0x19ec (0x55a8f99d2a6c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)

Hyperparticle

Most helpful comment

@Hyperparticle @thomwolf @geniki While I wait for the results of Thor's runs, one thing that occurs to me is that your segfault may be because, when you upgraded PyTorch, the existing (installed) Apex binaries were no longer compatible somehow. Try a full pip uninstall apex, then cd apex_repo_dir; rm -rf build; pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . and see if the segfault persists.

mcarilli on 19 Feb 2019

👍16 🎉4 🚀1
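
The clean-rebuild steps in the comment above can be scripted. The following is a minimal sketch, assuming apex_repo_dir is a local checkout of the apex source. The function name clean_rebuild_commands is hypothetical, the -y flag (skip pip's confirmation prompt) is an addition that is not in the original comment, and the commands are only assembled here, not executed.

```python
from pathlib import Path


def clean_rebuild_commands(apex_repo_dir):
    """Return the commands for a clean apex rebuild, in order.

    Nothing is executed here; each entry is an argv list meant to be
    run from inside apex_repo_dir.
    """
    build_dir = Path(apex_repo_dir) / "build"
    return [
        # Remove the installed package (-y is an addition: no prompt).
        ["pip", "uninstall", "-y", "apex"],
        # Delete build artifacts left over from the previous build.
        ["rm", "-rf", str(build_dir)],
        # Rebuild with the C++ and CUDA extensions, flags as in the comment.
        ["pip", "install", "-v", "--no-cache-dir",
         "--global-option=--cpp_ext", "--global-option=--cuda_ext", "."],
    ]


for cmd in clean_rebuild_commands("apex_repo_dir"):
    print(" ".join(cmd))
```

To actually run the steps, each list could be passed to subprocess.run(cmd, cwd=apex_repo_dir, check=True), so that a failing step stops the sequence.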

All 32 comments

Upgraded to CUDA 10.0 and PyTorch 1.0.1, now I get a segmentation fault with Apex enabled.

Hyperparticle on 15 Feb 2019

👍1

I also have this error (not on pytorch-bert). Same setup (CUDA 10 and latest PyTorch 1.0.1).

thomwolf on 17 Feb 2019

Me too - PyTorch 1.0.1, CUDA 10. It's not specific to pytorch-pretrained-BERT, the script below is enough for me:

import torch
import apex

input = torch.rand(3, 10).cuda()
fln = apex.normalization.FusedLayerNorm(10).cuda()
fln(input)

geniki on 17 Feb 2019

👍3

I got this example to fail on a V100 too.
I've now also tested on a K80, and this example works well there with CUDA 10 and PyTorch 1.0.1.post2 🤔

thomwolf on 18 Feb 2019

@geniki @thomwolf Strange, I don't get any errors with the script above, but I still get the runtime error when running pytorch-pretrained-BERT (using Titan RTX).

Hyperparticle on 18 Feb 2019

@geniki

Me too - PyTorch 1.0.1, CUDA 10. It's not specific to pytorch-pretrained-BERT, the script below is enough for me:
import torch
import apex

input = torch.rand(3, 10).cuda()
fln = apex.normalization.FusedLayerNorm(10).cuda()
fln(input)

When I run this^ I get:

ModuleNotFoundError: No module named 'fused_layer_norm_cuda'

Also getting it on a pytorch-pretrained-BERT experiment.

Not sure if these issues (mine and the one originally posted) are related though...

mrdbourke on 18 Feb 2019

@mrdbourke I think you may have compiled apex without cuda support. You need to compile it with python setup.py install --cpp_ext --cuda_ext.

thorjohnsen on 18 Feb 2019

👍5
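
Before running a model, it is possible to check whether the compiled extension named in the ModuleNotFoundError above is importable at all. This is a minimal sketch using only the standard library; the helper name extension_available is hypothetical, and the printed hint simply restates the explanation given in the comment above.

```python
import importlib.util


def extension_available(module_name="fused_layer_norm_cuda"):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


if not extension_available():
    # Same conclusion as in the thread: the CUDA extension was not built.
    print("fused_layer_norm_cuda not found; apex may have been built "
          "without --cuda_ext")
```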

@mrdbourke I think you may have compiled apex without cuda support. You need to compile it with python setup.py install --cpp_ext --cuda_ext.

Thank you, just realised I didn't use the extension... my bad.

This fixed it.

mrdbourke on 18 Feb 2019

@mcarilli any hint on a possible source of error from you guys?

thomwolf on 19 Feb 2019

Sorry for the delayed response, my bandwidth right now is completely consumed cleaning up the mixed precision API (https://github.com/NVIDIA/apex/compare/api_refactor?expand=1).** I didn't write FusedLayerNorm (it came in from our MLPerf efforts) and I haven't had time to debug it. @thorjohnsen is currently using it in our own implementation of BERT.

@geniki Thank you for the minimal repro.
@Hyperparticle @thomwolf When you say "I get a segmentation fault with Apex enabled" in https://github.com/NVIDIA/apex/issues/156#issuecomment-464115433, do you mean the segmentation fault occurs specifically when you try to use FusedLayerNorm, or at some other point?

**Unrelated, but useful: I'll be presenting a preview of the new API in a webinar tomorrow. It's working, but I don't have documentation or examples yet. I will add it to master by next week.

mcarilli on 19 Feb 2019

@mcarilli I can confirm that I do get the segmentation fault when calling the FusedLayerNorm code, but I haven't investigated exactly where. I don't get one when I use regular LayerNorm.

Hyperparticle on 19 Feb 2019

mcarilli on 19 Feb 2019

👍16 🎉4 🚀1

@mcarilli Thanks, that fixed the segfault. But now I still get the same FusedLayerNorm error.

Hyperparticle on 19 Feb 2019

@geniki The mini repro runs fine on my setup (CUDA 10 with a V100).

@Hyperparticle Can you provide some more information on how to repro this issue? Which pretrained model are you using?

A script (if possible) with the repro would be of great help.

jjsjann123 on 19 Feb 2019

Thanks @mcarilli. This fixed it for me, at least for the snippet I posted above. @Hyperparticle does the snippet above run for you?

geniki on 19 Feb 2019

@geniki @jjsjann123 The snippet works, but I'm still seeing an error for my use-case. I'm running the tutorial code from this section in pytorch-pretrained-BERT with apex enabled. I'll try to debug it and get a minimal code snippet extracted with the tensor operation.

Hyperparticle on 19 Feb 2019

Thanks a lot. We are having a hard time reproducing the bug.
Having a repro script would make it much faster for us to debug the problem. Looking forward to your update.

jjsjann123 on 19 Feb 2019

@jjsjann123 This is basically what the code is doing:

import torch
import apex
import importlib

fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")

input_ = torch.rand([32, 63, 768]).cuda()
weight_ = torch.rand(768).cuda()
bias_ = torch.rand(768).cuda()
normalized_shape = weight_.size()
eps = 1e-12

output, mean, invvar = fused_layer_norm_cuda.forward_affine(input_, normalized_shape, weight_, bias_, eps)

My GPU is now unavailable, so I can't verify if this causes the problem. If not, then it could either be the values in the tensors that are the problem (which I will have to save and upload somewhere), or some other extraneous property of the tensors.

Hyperparticle on 19 Feb 2019

root@d0c3981dfbe3:/workspace# cat repro.py 
import torch
import apex
import importlib

fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")

input_ = torch.rand([32, 63, 768]).cuda()
weight_ = torch.rand(768).cuda()
bias_ = torch.rand(768).cuda()
normalized_shape = weight_.size()
eps = 1e-12

output, mean, invvar = fused_layer_norm_cuda.forward_affine(input_, normalized_shape, weight_, bias_, eps)
torch.cuda.synchronize()
root@d0c3981dfbe3:/workspace# python repro.py 
root@d0c3981dfbe3:/workspace#

This is working fine for me as well :(

jjsjann123 on 19 Feb 2019

Seems like recompiling apex cleanly like @mcarilli indicated fixed the problem for me also!
Both @geniki's and @Hyperparticle's examples work on my setup (as does my current project).
Thanks a lot!

thomwolf on 19 Feb 2019

@thomwolf Well that sounds like a relief. As for me, I'll have to see if the old code is still lingering somewhere on my system. I'll have to test it in a couple days. @jjsjann123 If it works for others, then you can close this issue.

Hyperparticle on 19 Feb 2019

I'll close the issue. Feel free to open a new one and ping me there if things don't work out for you, @Hyperparticle.

jjsjann123 on 19 Feb 2019

Whew, this is a useful gotcha to know about. Good old emergency repair procedure number one: turn it off and on again. Glad people seem to be happy, especially since, as I said, I don't have the bandwidth to do a deep-dive debug right this second.

Note to self: make the setup.py smarter to avoid such cases in the future.

mcarilli on 19 Feb 2019

👍3
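
The note above does not say what a smarter setup.py would do. One possible approach, purely an illustration and not how apex behaves, is to record the PyTorch version at build time and compare it with the running version at import time. The function name built_for_other_torch and the idea of storing the build-time version string are assumptions; the version strings in the example are the ones mentioned in this thread.

```python
def built_for_other_torch(built_against, running):
    """Return True if the build-time and running versions differ.

    Both arguments are version strings such as "0.4.1" or "1.0.1.post2".
    Only the leading numeric components are compared, so suffixes like
    ".post2" or "+cu101" are ignored.
    """
    def numeric_prefix(version):
        parts = []
        # Drop any local version label such as "+cu101" first.
        for piece in version.split("+")[0].split("."):
            if not piece.isdigit():
                break
            parts.append(int(piece))
        return tuple(parts)

    return numeric_prefix(built_against) != numeric_prefix(running)


print(built_for_other_torch("0.4.1", "1.0.1"))        # True
print(built_for_other_torch("1.0.1", "1.0.1.post2"))  # False
```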

@mrdbourke I think you may have compiled apex without cuda support. You need to compile it with python setup.py install --cpp_ext --cuda_ext.

I could not install apex with pip, but your method works for me.

Strideradu on 7 Jun 2019

pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

I did it like this, but I still get the segmentation fault.

wyx518 on 10 Jun 2019

👍1

@wyx518 Do you get the seg fault while running some python script using apex/amp or during the install?
Either way, could you post the complete error message with the stack trace so that we can have a look?

ptrblck on 10 Jun 2019

@ptrblck First, I ran my own demo using pytorch-pretrained-BERT, but got this:
run.sh: line 3: 21713 Segmentation fault (core dumped) .
Then I run the code @jjsjann123 offered, also get the

wyx518 on 11 Jun 2019

I solved the problem: it was the GCC version. It should be 4.9+, but on Ubuntu 14.04 it is 4.8.

wyx518 on 11 Jun 2019

👍4
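
The GCC requirement reported above can be checked before building. This is a minimal sketch, assuming the 4.9 minimum stated in the comment; the helper name gcc_at_least is hypothetical, and the version strings in the example are the ones that appear in this thread.

```python
def gcc_at_least(version_text, minimum=(4, 9)):
    """Compare a GCC version string such as "4.8" against a minimum."""
    # Keep only the purely numeric dot-separated components.
    parts = tuple(int(piece)
                  for piece in version_text.strip().split(".")
                  if piece.isdigit())
    return parts >= minimum


print(gcc_at_least("4.8"))    # False
print(gcc_at_least("6.3.0"))  # True
```

The version string itself can be obtained from the compiler with gcc -dumpversion.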

@mcarilli Could you please tell me how to find "apex_repo_dir" so that I can "cd apex_repo_dir"? I keep looking for it but cannot figure it out. Thanks.

lbys on 21 Aug 2019
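
In the command quoted in the question above, apex_repo_dir reads as a placeholder for wherever the apex source was cloned, that is, the directory the final "." in the pip install command refers to, rather than a literal directory name. A minimal sketch of a sanity check follows, assuming a checkout can be recognized by a setup.py file sitting next to an apex/ package directory; the helper name looks_like_apex_checkout is hypothetical, and the temporary directory only stands in for a real clone.

```python
import tempfile
from pathlib import Path


def looks_like_apex_checkout(path):
    """Heuristic: the directory has setup.py and an apex/ subdirectory."""
    root = Path(path)
    return (root / "setup.py").is_file() and (root / "apex").is_dir()


# Demonstration on a throwaway directory standing in for a real clone.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    empty_result = looks_like_apex_checkout(root)
    (root / "setup.py").write_text("")
    (root / "apex").mkdir()
    checkout_result = looks_like_apex_checkout(root)

print(empty_result, checkout_result)  # False True
```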

Upgraded to CUDA 10.0 and PyTorch 1.0.1, now I get a segmentation fault with Apex enabled.

I also get a segmentation fault with Apex enabled, CUDA 9.0, and PyTorch 1.1.0.

widgetxp on 26 Feb 2020

Running fp16 models via fairseq and getting a segmentation fault with pytorch 1.4.0, gcc/6.3.0, cuda/10.1.105

ethanjperez on 11 May 2020

Is there a way to install apex on a windows machine with "--cpp_ext" and "--cuda_ext"? At the moment I can't, and as far as I can tell that's a general issue with windows?