With the latest PyTorch binaries and the latest code from apex, I get an ImportError when trying to use the fused_layer_norm_cuda module. Specifically, the following results in an error:
In [1]: import fused_layer_norm_cuda
ImportError: <path/to/install>/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN2at19UndefinedTensorImpl10_singletonE
Following suggestions from #187, here's my system information:
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.6 LTS
Release: 16.04
Codename: xenial
Torch info:
In [1]: print(torch.__version__, torch.version.cuda, torch.utils.cpp_extension.CUDA_HOME)
1.0.1 10.0.130 /usr/local/cuda
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
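For anyone filing a similar report, the version details above can be gathered in one go. The following is a minimal sketch, not an official tool: the function name `collect_env` and the dictionary keys are made up for illustration, and it assumes `nvcc` is on the PATH if a CUDA toolkit is installed. PyTorch also ships its own reporter, runnable as `python -m torch.utils.collect_env`, which is more thorough.

```python
import shutil
import subprocess


def collect_env():
    """Return the torch / CUDA details shown above as a dict (hypothetical helper)."""
    info = {"torch_version": None, "torch_cuda": None, "cuda_home": None, "nvcc": None}

    try:
        import torch
        from torch.utils import cpp_extension

        info["torch_version"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda       # CUDA version torch was built with
        info["cuda_home"] = cpp_extension.CUDA_HOME   # toolkit used to build extensions
    except ImportError:
        pass  # torch is not installed in this environment; leave the fields as None

    nvcc = shutil.which("nvcc")
    if nvcc is not None:
        try:
            proc = subprocess.run(
                [nvcc, "--version"],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
            )
            lines = proc.stdout.strip().splitlines()
            # The last line of the nvcc output carries the release number.
            info["nvcc"] = lines[-1] if lines else None
        except OSError:
            pass  # nvcc was found but could not be executed

    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

If `torch_cuda` and the release reported by `nvcc` disagree, the extension may have been compiled with a different toolkit than the one the PyTorch binary was built against, which is worth mentioning in the report.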
Compiling from source may fix this, but is it expected that compiling against the latest pytorch conda binaries should fail?
No, that is not expected. First off, make sure you import torch before you import anything from apex (this is a common issue with extensions). If that doesn't work, we can try to repro.
I was able to reproduce this locally:
>>> import fused_layer_norm_cuda
by itself resulted in an ImportError with an undefined symbol.
To fix the error:
>>> import torch
>>> import fused_layer_norm_cuda
As I said, this is a known issue with extensions in general.
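The import-order point can be checked with a small script. This is only a sketch: `import_in_order` is a hypothetical helper, not part of torch or apex, and it simply imports the named modules in sequence and records which ones fail, so the error for the extension is reported after torch has already been loaded.

```python
import importlib


def import_in_order(names):
    """Import modules in the given order.

    Returns a dict mapping each name to the imported module, or to the
    ImportError raised while importing it.
    """
    results = {}
    for name in names:
        try:
            results[name] = importlib.import_module(name)
        except ImportError as exc:
            results[name] = exc
    return results


if __name__ == "__main__":
    # torch goes first so its shared libraries are loaded before the extension.
    outcomes = import_in_order(["torch", "fused_layer_norm_cuda"])
    for name, outcome in outcomes.items():
        if isinstance(outcome, ImportError):
            print(f"{name}: failed: {outcome}")
        else:
            print(f"{name}: ok")
```

If the extension still fails with an undefined symbol after torch imported cleanly, the cause is something other than import order.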
I also recommend using FusedLayerNorm via the wrapper module interface (apex.normalization.FusedLayerNorm). If you call the CUDA binding directly (fused_layer_norm_cuda), it will not route through an autograd function, and therefore will not be differentiable.
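A rough sketch of going through the wrapper rather than the raw binding is below. The helper `build_layer_norm` is hypothetical, and the fallback to `torch.nn.LayerNorm` when apex cannot be imported is an assumption added for illustration, not something suggested in this thread.

```python
def build_layer_norm(hidden_size, eps=1e-5):
    """Return a layer-norm module, preferring apex's autograd-aware wrapper.

    Returns None if torch itself is not installed.
    """
    try:
        import torch  # always import torch before anything from apex
    except ImportError:
        return None

    try:
        from apex.normalization import FusedLayerNorm

        # The wrapper module routes through an autograd function,
        # so gradients flow as with any other nn.Module.
        return FusedLayerNorm(hidden_size, eps=eps)
    except ImportError:
        # Assumed fallback: apex (or its compiled extension) is unavailable.
        return torch.nn.LayerNorm(hidden_size, eps=eps)


if __name__ == "__main__":
    layer = build_layer_norm(1024)
    print(type(layer).__name__ if layer is not None else "torch not installed")
```

Either returned module is called like any other `nn.Module`, for example `layer(x)` with the last dimension of `x` equal to `hidden_size`.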
Just to follow up on this for others who encounter similar issues - I hadn't imported torch in an attempt to create a minimum breaking example, not realizing this causes a different error.
Switching to the correct imports:
In [1]: import torch
In [2]: import fused_layer_norm_cuda
ImportError: <path/to/install>/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC1ENS_14SourceLocationERKSs
Some googling of this error turned up this issue. The suggestion was to create a clean conda environment, then install ipython followed by pytorch. After doing that, I was able to get things working.