Hi, I wonder how to set -D_GLIBCXX_USE_CXX11_ABI=1 when building apex. I got this error when using syncbn:
Warning: using Python fallback for SyncBatchNorm, possibly because apex was installed without --cuda_ext. The exception raised when attempting to import the cuda backend was: /usr/lib/python3.7/site-packages/syncbn.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC1ENS_14SourceLocationERKSs
Demangling the symbol gives:
$ c++filt _ZN3c105ErrorC1ENS_14SourceLocationERKSs
c10::Error::Error(c10::SourceLocation, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
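A note on reading that symbol: the trailing `Ss` is the Itanium C++ ABI shorthand for the old-ABI `std::string`, so the extension that wants this symbol was built with `_GLIBCXX_USE_CXX11_ABI=0`. With the new ABI the string type lives in `std::__cxx11`, which appears as `7__cxx11` in the mangled name. Below is a rough sketch for telling the two apart. The helper name `uses_cxx11_abi_string` is made up for illustration, and it only looks for that marker instead of fully parsing the name.

```python
def uses_cxx11_abi_string(mangled):
    """Return True if the mangled symbol references std::__cxx11 types."""
    # libstdc++ places new-ABI std::string in the inline namespace
    # std::__cxx11, mangled as the length-prefixed component "7__cxx11".
    return "7__cxx11" in mangled


# The symbol from the error message above (old ABI, ends in "Ss").
old_abi = "_ZN3c105ErrorC1ENS_14SourceLocationERKSs"

# What the same constructor looks like when built with the new ABI.
new_abi = (
    "_ZN3c105ErrorC1ENS_14SourceLocationERK"
    "NSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE"
)

print(uses_cxx11_abi_string(old_abi))  # False
print(uses_cxx11_abi_string(new_abi))  # True
```

Feeding it the output of `nm -D` on both libraries would show which side uses which ABI.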
According to this post, this is related to -D_GLIBCXX_USE_CXX11_ABI.
I also checked the ABI with:
$ strings /usr/lib/python3.7/site-packages/torch/lib/libtorch.so|grep ABI
CXXABI_1.3.8
CXXABI_1.3.5
CXXABI_1.3.7
CXXABI_1.3.3
CXXABI_1.3
$ strings /usr/lib/python3.7/site-packages/syncbn.cpython-37m-x86_64-linux-gnu.so|grep ABI
CXXABI_1.3
CXXABI_1.3.3
CXXABI_1.3.5
One solution is to compile apex with -D_GLIBCXX_USE_CXX11_ABI=1. So, how can I do that?
I have the same issue
What's your build environment?
gcc, cuda, OS, version of pytorch?
According to that same issue, https://discuss.pytorch.org/t/undefined-symbol-when-import-lltm-cpp-extension/32627/4, the pytorch extension builder should be setting that flag for you in Pytorch 1.0 and later.
@mcarilli I'm trying to package apex for ArchLinux, so I'm using gcc 7, cuda 10.0.130, cudnn 7.5.0.56, Arch Linux, pytorch 1.0.1. According to ArchLinux's PKGBUILD https://git.archlinux.org/svntogit/community.git/tree/trunk/PKGBUILD?h=packages/python-pytorch#n107, pytorch is built with -D_GLIBCXX_USE_CXX11_ABI=0, which is pytorch's default behavior. I modified /usr/lib/python3.7/site-packages/torch/utils/cpp_extension.py to set -D_GLIBCXX_USE_CXX11_ABI=1 in two places, https://github.com/pytorch/pytorch/blob/83221655a8237ca80f9673dad06a98d34c43e546/torch/utils/cpp_extension.py#L398 and https://github.com/pytorch/pytorch/blob/83221655a8237ca80f9673dad06a98d34c43e546/torch/utils/cpp_extension.py#L1012. With that change I was finally able to build and use SyncBatchNorm.
@mcarilli So in order for apex to build and function, do I first need to build pytorch with -D_GLIBCXX_USE_CXX11_ABI=1 instead of 0 in the spots you mentioned?
Typically, you don't have to do anything. My understanding is that precompiled Pytorch binaries (e.g., what you get from a pip/conda install torch) are compiled with -D_GLIBCXX_USE_CXX11_ABI=0.
The extension builder tries to check how you installed torch:
https://github.com/pytorch/pytorch/blob/master/torch/utils/cpp_extension.py#L104
_is_binary_build should return True if you installed Torch by downloading precompiled pip/conda binaries, and False if you built Torch from source on your system.
If _is_binary_build() is True (which should be the case if Pytorch was installed via pip/conda from precompiled binaries, which we expect were compiled with -D_GLIBCXX_USE_CXX11_ABI=0), then the extension builder's call to _add_gnu_abi_flag_if_binary also sets -D_GLIBCXX_USE_CXX11_ABI=0 when building the extensions, which should ensure the extensions are binary compatible with the installed Pytorch binaries.
If _is_binary_build() is False (which should be the case if you compiled Torch from source on your system), then the extension builder's call to _add_gnu_abi_flag_if_binary does NOT set -D_GLIBCXX_USE_CXX11_ABI=0. I guess it assumes you are building the extensions in the same environment that you used to compile Torch, and therefore whatever value of _GLIBCXX_USE_CXX11_ABI was in effect when you compiled Torch is also in effect while you are building the extensions. In other words, it assumes that the Torch binaries you compiled and the extensions you are now compiling will be compatible because they are/were built in the same environment.
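The two cases above can be modelled in a few lines. This is only a sketch of the decision as described in this thread, not the actual PyTorch source; the function name `abi_flags_for_extension` is hypothetical.

```python
def abi_flags_for_extension(is_binary_build):
    """Model of the decision described above (not the real PyTorch code)."""
    if is_binary_build:
        # Prebuilt pip/conda binaries are expected to use the old ABI,
        # so the extension is forced to match them.
        return ["-D_GLIBCXX_USE_CXX11_ABI=0"]
    # Source build: add nothing and let the compiler default apply,
    # on the assumption that Torch was built in the same environment.
    return []


print(abi_flags_for_extension(True))   # ['-D_GLIBCXX_USE_CXX11_ABI=0']
print(abi_flags_for_extension(False))  # []
```

The failure mode follows directly from this: if a source build is mistaken for a binary build, the extension gets the `=0` flag even though Torch itself may have been compiled with the other setting.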
I'm not sure if this helps directly, but I hope it gives a better idea what's going on. To solve the problem we need to know more about your environment (how was Torch installed, torch version, gcc version, cuda version).
https://github.com/pytorch/pytorch/commit/e0c593eae7679fbf9e08d933f695ba9781b24da2
So the problem is that pytorch is compiled from source, but the PKGBUILD manually sets the version string to the binary-release format, so the extension builder treats it as a binary build.
This issue is resolved in the commit above (3 days ago), but the fix is not in pytorch v1.0.1.
I compiled pytorch master from source with cuda/intel-mkl and compiled apex with it. I can verify the problem does not exist for pytorch master.
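For checking which ABI an installed Torch was built with, newer pytorch versions (as far as I know) expose the setting as `torch._C._GLIBCXX_USE_CXX11_ABI`. Treat that attribute name as an assumption for your version, since it may be missing on 1.0.1. A hedged sketch that turns it into the matching compiler define, with a hypothetical helper `abi_define_from` and a stand-in object so it runs without torch installed:

```python
from types import SimpleNamespace


def abi_define_from(torch_c):
    """Build the -D flag matching the ABI reported by a torch._C-like object.

    Returns None when the attribute is absent (assumed to be the case
    on older pytorch versions).
    """
    value = getattr(torch_c, "_GLIBCXX_USE_CXX11_ABI", None)
    if value is None:
        return None
    return "-D_GLIBCXX_USE_CXX11_ABI={}".format(int(value))


# Stand-in for torch._C so the example does not require torch.
fake_torch_c = SimpleNamespace(_GLIBCXX_USE_CXX11_ABI=True)
print(abi_define_from(fake_torch_c))       # -D_GLIBCXX_USE_CXX11_ABI=1
print(abi_define_from(SimpleNamespace()))  # None
```

With a real install the call would be `import torch; abi_define_from(torch._C)`.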
@mcarilli I use Ubuntu 18.04, CUDA 10, gcc 7.3 and pytorch 1.0.1 (from the pytorch anaconda channel). I did not compile pytorch from source. Changing the flag resolved the issue, at least in that environment. I'll try building pytorch master from source next week, as @leomao suggested.