I am using apex on Google Colab. The install with the CUDA and C++ extensions appeared to succeed.
However, I get this error when fused_layer_norm_cuda is imported: "No module named 'fused_layer_norm_cuda'"
Here is my Colab notebook: https://colab.research.google.com/drive/1mzz-uyc1tq0rgm5sF5JC9s3hxlMmuHVp
As you can see, the installation seems fine.
However, if you scroll all the way down, the notebook reports this error.
Are you sure you have CUDA 9 installed? I had the same problem because I had CUDA 8 installed on my machine, and CUDA 9 only via PyTorch (at least I think that's why).
I actually have the same problem. I have CUDA 9:
File "/home/agemagician/anaconda3/envs/tensorflow/lib/python3.6/site-packages/apex/normalization/fused_layer_norm.py", line 126, in __init__
fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
File "/home/agemagician/anaconda3/envs/tensorflow/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'fused_layer_norm_cuda'
When I tried the solution here:
https://github.com/NVIDIA/apex/issues/156#issuecomment-465301976
I got a different error:
File "/home/agemagician/anaconda3/envs/tensorflow/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/normalization/fused_layer_norm.py", line 126, in __init__
fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
File "/home/agemagician/anaconda3/envs/tensorflow/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
File "<frozen importlib._bootstrap>", line 571, in module_from_spec
File "<frozen importlib._bootstrap_external>", line 922, in create_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /home/agemagician/anaconda3/envs/tensorflow/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: __cudaPopCallConfiguration
I am also facing this issue on a clean install. Python 3.6, CUDA 10, PyTorch 1.0.1.post2 (installed with conda). GPU: V100.
OK, got it: nvcc was missing, so the install was not building the CUDA extensions (check for a message at the very beginning of the installation output).
Be sure to have the CUDA Toolkit installed: https://developer.nvidia.com/cuda-downloads
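A minimal sketch of that check, assuming Python 3 and only the standard library: it looks for nvcc on PATH and pulls the release number out of the nvcc --version text. The helper names parse_nvcc_release and find_nvcc_release are made up for illustration and are not part of apex.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_text):
    """Return the CUDA release (e.g. "9.2") found in `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", version_text)
    return match.group(1) if match else None


def find_nvcc_release():
    """Return the release reported by the nvcc on PATH, or None if nvcc is missing."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = find_nvcc_release()
    if release is None:
        # Per the comment above, a missing nvcc meant the CUDA extensions were not built.
        print("nvcc not found on PATH; the CUDA extensions would likely be skipped")
    else:
        print("nvcc reports CUDA release", release)
```

Run it in the same environment (same shell, same conda env) that you use for the apex install, since PATH can differ between them.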
I actually have it:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Wed_Apr_11_23:16:29_CDT_2018
Cuda compilation tools, release 9.2, V9.2.88
But still have the error.
I'm also facing the same problem (ModuleNotFoundError: No module named 'fused_layer_norm_cuda') on https://colab.research.google.com/
I have successfully installed apex:
! git clone https://github.com/NVIDIA/apex.git
% cd apex
! python setup.py install --cuda_ext --cpp_ext
! pip install .
%cd ..
and I'm able to load FusedLayerNorm:
! nvcc --version
import apex
from apex.normalization.fused_layer_norm import FusedLayerNorm
print(dir(FusedLayerNorm))
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
['__call__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_apply', '_get_name', '_load_from_state_dict', '_named_members', '_register_load_state_dict_pre_hook', '_register_state_dict_hook', '_slow_forward', '_tracing_name', '_version', 'add_module', 'apply', 'buffers', 'children', 'cpu', 'cuda', 'double', 'dump_patches', 'eval', 'extra_repr', 'float', 'forward', 'half', 'load_state_dict', 'modules', 'named_buffers', 'named_children', 'named_modules', 'named_parameters', 'parameters', 'register_backward_hook', 'register_buffer', 'register_forward_hook', 'register_forward_pre_hook', 'register_parameter', 'reset_parameters', 'share_memory', 'state_dict', 'to', 'train', 'type', 'zero_grad']
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-33-dec9113144dd> in <module>()
----> 1 model = BertForTokenClassification.from_pretrained("bert-base-uncased", num_labels=len(tag2idx))
/usr/local/lib/python3.6/dist-packages/pytorch_pretrained_bert/modeling.py in from_pretrained(cls, pretrained_model_name_or_path, state_dict, cache_dir, from_tf, *inputs, **kwargs)
579 logger.info("Model config {}".format(config))
580 # Instantiate model.
--> 581 model = cls(config, *inputs, **kwargs)
582 if state_dict is None and not from_tf:
583 weights_path = os.path.join(serialization_dir, WEIGHTS_NAME)
/usr/local/lib/python3.6/dist-packages/pytorch_pretrained_bert/modeling.py in __init__(self, config, num_labels)
1097 super(BertForTokenClassification, self).__init__(config)
1098 self.num_labels = num_labels
-> 1099 self.bert = BertModel(config)
1100 self.dropout = nn.Dropout(config.hidden_dropout_prob)
1101 self.classifier = nn.Linear(config.hidden_size, num_labels)
/usr/local/lib/python3.6/dist-packages/pytorch_pretrained_bert/modeling.py in __init__(self, config)
683 def __init__(self, config):
684 super(BertModel, self).__init__(config)
--> 685 self.embeddings = BertEmbeddings(config)
686 self.encoder = BertEncoder(config)
687 self.pooler = BertPooler(config)
/usr/local/lib/python3.6/dist-packages/pytorch_pretrained_bert/modeling.py in __init__(self, config)
245 # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
246 # any TensorFlow checkpoint file
--> 247 self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12)
248 self.dropout = nn.Dropout(config.hidden_dropout_prob)
249
/usr/local/lib/python3.6/dist-packages/apex/normalization/fused_layer_norm.py in __init__(self, normalized_shape, eps, elementwise_affine)
124
125 global fused_layer_norm_cuda
--> 126 fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
127
128 if isinstance(normalized_shape, numbers.Integral):
/usr/lib/python3.6/importlib/__init__.py in import_module(name, package)
124 break
125 level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)
127
128
/usr/lib/python3.6/importlib/_bootstrap.py in _gcd_import(name, package, level)
/usr/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)
/usr/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)
ModuleNotFoundError: No module named 'fused_layer_norm_cuda'
I got this error as well.
My nvcc:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
@knrd In your install command:
! git clone https://github.com/NVIDIA/apex.git
% cd apex
! python setup.py install --cuda_ext --cpp_ext
! pip install .
%cd ..
Why are you running an additional pip install . afterwards? I believe that pip install . silently reruns setup.py, but without the --cuda_ext and --cpp_ext options, which I suspect is the source of your error.
Try a clean reinstall as described in https://github.com/NVIDIA/apex/issues/156#issuecomment-465301976 (don't additionally do the pip install .). If you prefer to use pip over setuptools, you can use this command INSTEAD of python setup.py install --cuda_ext --cpp_ext:
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
(thanks to @syed-ahmed for this snippet).
@alvin-leong @AlexanderYogurt could this also be the source of your errors?
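One way to tell whether the compiled extension actually ended up in the environment is to ask Python whether it can locate the module. This is only a sketch using the standard library; extension_available is a hypothetical helper name, not an apex function.

```python
import importlib.util


def extension_available(module_name="fused_layer_norm_cuda"):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if extension_available():
        print("fused_layer_norm_cuda was found")
    else:
        # Assumption: a missing module usually means the build ran without --cuda_ext.
        print("fused_layer_norm_cuda is missing; apex was probably built without --cuda_ext")
```

Note that this only shows the module can be found. It does not load the shared library, so a link-time problem such as an undefined symbol would still only surface on the real import.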
I did not do
! pip install .
So that was not the cause of my problem.
However, I did try to install using pip instead of setuptools:
!pip install -e "/content/apex" --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext"
It worked! Thanks!
Just dropped in to say that using the pip installer solved the problem for me too.
I'm running on Ubuntu via Databricks. In case anyone else is having trouble, this is how I got it working:
%sh
/databricks/python3/bin/pip install --upgrade --force-reinstall torch torchvision
cd /path/to/apex
rm -rf build
/databricks/python3/bin/pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
/databricks/python3/bin/pip install pytorch-pretrained-bert
git clone https://github.com/NVIDIA/apex.git && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . && cd .. && rm -rf apex
I faced the exact same issue in Colab even after following the solution here. I then figured out that, at least in my case, the issue was a CUDA version mismatch between PyTorch (CUDA 9.2) and Colab (CUDA 10.1). I reinstalled PyTorch built for CUDA 10.1 using this command:
pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
And then it worked fine!!
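To spot this kind of mismatch before rebuilding, you can compare the CUDA version PyTorch was built with (torch.version.cuda) against the release your toolkit reports. A rough sketch follows; same_cuda_release is a hypothetical helper, and the hard-coded toolkit value is an assumption you should replace with what nvcc --version prints.

```python
def same_cuda_release(torch_cuda, toolkit_cuda):
    """Compare two CUDA version strings on major.minor only, e.g. "10.1" vs "10.1.243"."""
    if not torch_cuda or not toolkit_cuda:
        # torch.version.cuda is None for CPU-only builds, so treat that as a mismatch.
        return False
    return torch_cuda.split(".")[:2] == toolkit_cuda.split(".")[:2]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        toolkit = "10.1"  # assumption: replace with the release from nvcc --version
        print("PyTorch CUDA:", torch.version.cuda)
        print("Matches toolkit:", same_cuda_release(torch.version.cuda, toolkit))
```

If the two differ, reinstalling PyTorch built for the toolkit's CUDA version, as in the command above, is what fixed it in my case.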