Apex: No module named 'fused_layer_norm_cuda'

Created on 20 Feb 2019 · 14 Comments · Source: NVIDIA/apex

I am using apex on Google Colab. It installed with the CUDA and C++ extensions enabled.

However, I get this error when fused_layer_norm_cuda is imported: "No module named 'fused_layer_norm_cuda'"

All 14 comments

Here is my Colab notebook: https://colab.research.google.com/drive/1mzz-uyc1tq0rgm5sF5JC9s3hxlMmuHVp

As you can see, the installation seems fine.

However, you can scroll all the way down where it reports this error.

Are you sure you have CUDA 9 installed? I had the same problem because I had CUDA 8 installed on my machine, and CUDA 9 only via PyTorch (at least I think that's why).

I actually have the same problem. I have cuda 9:

  File "/home/agemagician/anaconda3/envs/tensorflow/lib/python3.6/site-packages/apex/normalization/fused_layer_norm.py", line 126, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/home/agemagician/anaconda3/envs/tensorflow/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'fused_layer_norm_cuda'

When I tried the solution here:
https://github.com/NVIDIA/apex/issues/156#issuecomment-465301976

I got a different error:

  File "/home/agemagician/anaconda3/envs/tensorflow/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/normalization/fused_layer_norm.py", line 126, in __init__
    fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  File "/home/agemagician/anaconda3/envs/tensorflow/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /home/agemagician/anaconda3/envs/tensorflow/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration

I am also facing this issue on a clean install. Python 3.6, CUDA 10, PyTorch 1.0.1.post2 (installed with conda). GPU: V100.

OK, got it: nvcc was missing, so the install was not building the CUDA extensions (check for a message at the very beginning of the installation output).
Be sure to have the CUDA Toolkit installed: https://developer.nvidia.com/cuda-downloads
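
A quick way to check for nvcc before running the install is sketched below. This is only an illustration: find_nvcc is a made-up helper name, and it assumes nvcc would be reachable either on PATH or under a CUDA_HOME / CUDA_PATH environment variable, which may not match exactly how the apex setup.py looks for it.

```python
import os
import shutil


def find_nvcc():
    """Return a path to nvcc if one can be located, else None.

    Assumption: nvcc is either on PATH or under $CUDA_HOME/bin
    (or $CUDA_PATH/bin). The real build may search differently.
    """
    on_path = shutil.which("nvcc")
    if on_path:
        return on_path
    cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate):
            return candidate
    return None


nvcc = find_nvcc()
if nvcc:
    print("nvcc found at", nvcc)
else:
    print("nvcc not found; the CUDA extensions will probably not be built")
```

If this reports that nvcc is missing, installing the CUDA Toolkit (or fixing PATH) before reinstalling apex is the first thing to try.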

I actually have it:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Wed_Apr_11_23:16:29_CDT_2018
Cuda compilation tools, release 9.2, V9.2.88

But I still get the error.

I'm also facing the same problem (ModuleNotFoundError: No module named 'fused_layer_norm_cuda') on https://colab.research.google.com/

I have successfully installed apex:

! git clone https://github.com/NVIDIA/apex.git
% cd apex
! python setup.py install --cuda_ext --cpp_ext
! pip install .
%cd ..

and I'm able to load FusedLayerNorm:

! nvcc --version
import apex
from apex.normalization.fused_layer_norm import FusedLayerNorm
print(dir(FusedLayerNorm))

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
['__call__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_apply', '_get_name', '_load_from_state_dict', '_named_members', '_register_load_state_dict_pre_hook', '_register_state_dict_hook', '_slow_forward', '_tracing_name', '_version', 'add_module', 'apply', 'buffers', 'children', 'cpu', 'cuda', 'double', 'dump_patches', 'eval', 'extra_repr', 'float', 'forward', 'half', 'load_state_dict', 'modules', 'named_buffers', 'named_children', 'named_modules', 'named_parameters', 'parameters', 'register_backward_hook', 'register_buffer', 'register_forward_hook', 'register_forward_pre_hook', 'register_parameter', 'reset_parameters', 'share_memory', 'state_dict', 'to', 'train', 'type', 'zero_grad']



---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-33-dec9113144dd> in <module>()
----> 1 model = BertForTokenClassification.from_pretrained("bert-base-uncased", num_labels=len(tag2idx))

/usr/local/lib/python3.6/dist-packages/pytorch_pretrained_bert/modeling.py in from_pretrained(cls, pretrained_model_name_or_path, state_dict, cache_dir, from_tf, *inputs, **kwargs)
    579         logger.info("Model config {}".format(config))
    580         # Instantiate model.
--> 581         model = cls(config, *inputs, **kwargs)
    582         if state_dict is None and not from_tf:
    583             weights_path = os.path.join(serialization_dir, WEIGHTS_NAME)

/usr/local/lib/python3.6/dist-packages/pytorch_pretrained_bert/modeling.py in __init__(self, config, num_labels)
   1097         super(BertForTokenClassification, self).__init__(config)
   1098         self.num_labels = num_labels
-> 1099         self.bert = BertModel(config)
   1100         self.dropout = nn.Dropout(config.hidden_dropout_prob)
   1101         self.classifier = nn.Linear(config.hidden_size, num_labels)

/usr/local/lib/python3.6/dist-packages/pytorch_pretrained_bert/modeling.py in __init__(self, config)
    683     def __init__(self, config):
    684         super(BertModel, self).__init__(config)
--> 685         self.embeddings = BertEmbeddings(config)
    686         self.encoder = BertEncoder(config)
    687         self.pooler = BertPooler(config)

/usr/local/lib/python3.6/dist-packages/pytorch_pretrained_bert/modeling.py in __init__(self, config)
    245         # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
    246         # any TensorFlow checkpoint file
--> 247         self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12)
    248         self.dropout = nn.Dropout(config.hidden_dropout_prob)
    249 

/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py in __init__(self, normalized_shape, eps, elementwise_affine)
    124 
    125         global fused_layer_norm_cuda
--> 126         fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
    127 
    128         if isinstance(normalized_shape, numbers.Integral):

/usr/lib/python3.6/importlib/__init__.py in import_module(name, package)
    124                 break
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 
    128 

/usr/lib/python3.6/importlib/_bootstrap.py in _gcd_import(name, package, level)

/usr/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)

/usr/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'fused_layer_norm_cuda'

I got this error as well.

My nvcc:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

@knrd In your install command:

! git clone https://github.com/NVIDIA/apex.git
% cd apex
! python setup.py install --cuda_ext --cpp_ext
! pip install .
%cd ..

Why are you doing an additional pip install .? I believe pip install . will silently rerun setup.py, but without the --cuda_ext and --cpp_ext options, which I suspect is the source of your error.

Try a clean reinstall as described in https://github.com/NVIDIA/apex/issues/156#issuecomment-465301976 (don't additionally do the pip install .). If you prefer to use pip over setuptools, you can use this command INSTEAD of python setup.py install --cuda_ext --cpp_ext:

pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

(thanks to @syed-ahmed for this snippet).

@alvin-leong @AlexanderYogurt could this also be the source of your errors?
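
To see whether a reinstall dropped the compiled extension, one option is to ask Python whether it can locate the module at all. The snippet below is a minimal sketch: missing_extensions is a hypothetical helper, and the list contains only the module named in this thread (apex may build other extensions too).

```python
import importlib.util

# Only the extension module named in this thread; adjust as needed.
EXTENSIONS = ["fused_layer_norm_cuda"]


def missing_extensions(names):
    """Return the names that Python cannot locate as top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_extensions(EXTENSIONS)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed extensions were found.")
```

Note that this only checks that the module can be found. It does not load the shared library, so it would not catch a link-time problem such as the undefined symbol error reported earlier in the thread.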

I did not do

! pip install .

So that was not the cause of my problem.

However, I did try to install using pip instead of setuptools:

!pip install -e "/content/apex" --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext"

It worked! Thanks!

Just dropped in to say that using the pip installer solved the problem for me too.

I'm running on Ubuntu via Databricks. In case anyone else is having trouble, this is how I got it working:

%sh
/databricks/python3/bin/pip install --upgrade --force-reinstall torch torchvision

cd /path/to/apex
rm -rf build
/databricks/python3/bin/pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

/databricks/python3/bin/pip install pytorch-pretrained-bert

git clone https://github.com/NVIDIA/apex.git && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . && cd .. && rm -rf apex

I faced the exact same issue in Colab even after following the solution here. I then figured out that, at least in my case, the issue was a CUDA version mismatch between PyTorch (cuda-9.2) and Colab (cuda-10.1). I reinstalled PyTorch with CUDA 10.1 using this command:

pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html

And then it worked fine!!
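
For anyone who wants to check for this kind of mismatch up front, here is a rough sketch. The helper names parse_nvcc_release and versions_match are made up for illustration. In a real session you would pass in the actual output of nvcc --version and the value of torch.version.cuda; here a sample line from this thread and a hard-coded "9.2" stand in so the snippet runs without PyTorch.

```python
import re


def parse_nvcc_release(version_output):
    """Extract the 'release X.Y' number from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", version_output)
    return match.group(1) if match else None


def versions_match(nvcc_release, torch_cuda):
    """Compare the major.minor parts of the two version strings."""
    if nvcc_release is None or torch_cuda is None:
        return False
    return nvcc_release.split(".")[:2] == torch_cuda.split(".")[:2]


# Sample nvcc output line taken from this thread.
sample = "Cuda compilation tools, release 10.0, V10.0.130"
release = parse_nvcc_release(sample)
print(release)                          # 10.0
# "9.2" stands in for torch.version.cuda here.
print(versions_match(release, "9.2"))   # False
```

A False result suggests that PyTorch was built against a different CUDA version than the local toolkit, which is the situation described above.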
