We have a library that uses pybind11 to wrap its internal C++ code. We now also want to allow external extension modules to be usable with the library. However, we are noticing that when our library is built with one compiler and an extension module with another, there is a segfault within pybind11 upon import.
I am able to reproduce the bug with a small example:
a.cpp (imagine our library)
#include <pybind11/pybind11.h>
namespace py = pybind11;
struct A {
    explicit A(int y) : _y(y) {}
    int f(int x) { return x + _y; }
    int _y;
};
PYBIND11_MODULE(a, m) {
    py::class_<A>(m, "A").def(py::init<int>()).def("f", &A::f);
}
b.cpp (imagine an extension)
#include <pybind11/pybind11.h>
namespace py = pybind11;
struct B {
    explicit B(int y) : _y(y) {}
    int f(int x) { return x + _y; }
    int _y;
};
PYBIND11_MODULE(b, m) {
    py::class_<B>(m, "B").def(py::init<int>()).def("f", &B::f);
}
setup_a.py
from setuptools import setup, Extension
from setuptools.command.build_ext import build_ext
ext_modules = [
    Extension('a', ['a.cpp'], include_dirs=['../include'], language='c++'),
]
class BuildExtension(build_ext):
    """A custom build extension for adding compiler-specific options."""
    def build_extensions(self):
        for extension in self.extensions:
            extension.extra_compile_args = ['-g', '-std=c++11']
        build_ext.build_extensions(self)
setup(
    name='a',
    ext_modules=ext_modules,
    cmdclass={
        'build_ext': BuildExtension
    })
setup_b.py
from setuptools import setup, Extension
from setuptools.command.build_ext import build_ext
ext_modules = [
    Extension('b', ['b.cpp'], include_dirs=['../include'], language='c++'),
]
class BuildExtension(build_ext):
    """A custom build extension for adding compiler-specific options."""
    def build_extensions(self):
        for extension in self.extensions:
            extension.extra_compile_args = ['-g', '-std=c++11']
        build_ext.build_extensions(self)
setup(
    name='b',
    ext_modules=ext_modules,
    cmdclass={
        'build_ext': BuildExtension
    })
Then:
CXX=clang++ CC=clang python setup_a.py install
CXX=g++-7 CC=gcc-7 python setup_b.py install
Then:
$ lldb python
(lldb) target create "python"
Current executable set to 'python' (x86_64).
(lldb) run
Process 61507 launched: '/Users/psag/home/play/x/pybind11/env/bin/python' (x86_64)
Python 3.5.1 (default, Jan 24 2016, 13:26:48)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import a
>>> import b
Process 63580 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x130)
frame #0: 0x0000000102ac58ea b.cpython-35m-darwin.so`pybind11::detail::make_new_python_type(rec=0x00007fff5fbfdf60) at class.h:564
561 auto metaclass = rec.metaclass.ptr() ? (PyTypeObject *) rec.metaclass.ptr()
562 : internals.default_metaclass;
563
-> 564 auto heap_type = (PyHeapTypeObject *) metaclass->tp_alloc(metaclass, 0);
565 if (!heap_type)
566 pybind11_fail(std::string(rec.name) + ": Unable to create type object!");
567
Target 0: (python) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x5)
* frame #0: 0x0000000000000005
frame #1: 0x000000010236f3b2 b.cpython-35m-darwin.so`pybind11::detail::make_new_python_type(rec=0x00007fff5fbfe000) at class.h:564
frame #2: 0x0000000102375ac6 b.cpython-35m-darwin.so`pybind11::detail::generic_type::initialize(this=0x00007fff5fbfdff8, rec=0x00007fff5fbfe000) at pybind11.h:887
frame #3: 0x00000001023762e1 b.cpython-35m-darwin.so`::PyInit_b() [inlined] _ZN8pybind116class_I1BJEEC4IJNS_9metaclassEEEENS_6handleEPKcDpRKT_((null)=<unavailable>, name=<unavailable>, scope=handle @ 0x00007fde97dcf910, this=0x00007fff5fbfdff8) at pybind11.h:1065
frame #4: 0x0000000102376260 b.cpython-35m-darwin.so`::PyInit_b() [inlined] pybind11_init_b(m=<unavailable>)
frame #5: 0x0000000102376260 b.cpython-35m-darwin.so`::PyInit_b()
frame #6: 0x000000010014b844 Python`_PyImport_LoadDynamicModuleWithSpec + 489
frame #7: 0x000000010014b3f6 Python`_imp_create_dynamic + 252
frame #8: 0x00000001000d19d5 Python`PyCFunction_Call + 273
frame #9: 0x00000001001359bc Python`PyEval_EvalFrameEx + 24272
frame #10: 0x00000001001386f0 Python`_PyEval_EvalCodeWithName + 1884
frame #11: 0x000000010013902f Python`fast_function + 341
frame #12: 0x0000000100135100 Python`PyEval_EvalFrameEx + 22036
frame #13: 0x0000000100138faf Python`fast_function + 213
frame #14: 0x0000000100135100 Python`PyEval_EvalFrameEx + 22036
frame #15: 0x0000000100138faf Python`fast_function + 213
frame #16: 0x0000000100135100 Python`PyEval_EvalFrameEx + 22036
frame #17: 0x0000000100138faf Python`fast_function + 213
frame #18: 0x0000000100135100 Python`PyEval_EvalFrameEx + 22036
frame #19: 0x0000000100138faf Python`fast_function + 213
frame #20: 0x0000000100135100 Python`PyEval_EvalFrameEx + 22036
frame #21: 0x00000001001386f0 Python`_PyEval_EvalCodeWithName + 1884
frame #22: 0x000000010012fad7 Python`PyEval_EvalCodeEx + 78
frame #23: 0x00000001000babb0 Python`function_call + 377
frame #24: 0x000000010009905e Python`PyObject_Call + 97
frame #25: 0x00000001000998b7 Python`_PyObject_CallMethodIdObjArgs + 197
frame #26: 0x000000010014a8b0 Python`PyImport_ImportModuleLevelObject + 1780
frame #27: 0x000000010012cbc8 Python`builtin___import__ + 135
frame #28: 0x00000001000d1900 Python`PyCFunction_Call + 60
frame #29: 0x000000010009905e Python`PyObject_Call + 97
frame #30: 0x0000000100137f38 Python`PyEval_CallObjectWithKeywords + 165
frame #31: 0x0000000100133a10 Python`PyEval_EvalFrameEx + 16164
frame #32: 0x00000001001386f0 Python`_PyEval_EvalCodeWithName + 1884
frame #33: 0x000000010012fa83 Python`PyEval_EvalCode + 81
frame #34: 0x0000000100155461 Python`run_mod + 58
frame #35: 0x000000010015522e Python`PyRun_InteractiveOneObject + 569
frame #36: 0x0000000100154b88 Python`PyRun_InteractiveLoopFlags + 209
frame #37: 0x0000000100154a84 Python`PyRun_AnyFileExFlags + 60
frame #38: 0x0000000100168d72 Python`Py_Main + 3430
frame #39: 0x0000000100001e27 python`___lldb_unnamed_symbol1$$python + 224
frame #40: 0x00007fffa5a38235 libdyld.dylib`start + 1
frame #41: 0x00007fffa5a38235 libdyld.dylib`start + 1
It seems the metaclass variable is nullptr in this case. This can be confirmed by putting an assertion into that location in class.h.
This is on macOS Sierra, but we see the same on Linux. We also observe this for certain combinations of different GCC versions. In the example above I use Python 3.5, but the same is observable for Python 3.6 and Python 2.7.
In general you want everything to be built with the same compiler, and same version of that compiler, see:
https://stackoverflow.com/questions/23895081/can-you-mix-c-compiled-with-different-versions-of-the-same-compiler
STL internal data structures are not guaranteed to be compatible across compiler major versions, and definitely not across entirely different compilers. Pybind11 uses STL data structures to organize its internal state, hence it is important that extension modules are also compiled with the same compiler (otherwise, all sorts of corruption can occur).
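As a first diagnostic, it can help to see which compiler the interpreter itself was built with, since that is typically what setuptools falls back to when `CC`/`CXX` are not set in the environment. The following is a minimal sketch using only the standard library; the function name `describe_build_toolchain` is made up for this example, and the `CC`/`CXX` values may be `None` on platforms without a Makefile-based build configuration.

```python
import platform
import sysconfig


def describe_build_toolchain():
    """Collect what the running interpreter knows about its own build."""
    return {
        # Compiler banner, the same text shown in brackets in the REPL header.
        "python_compiler": platform.python_compiler(),
        # Compiler commands recorded when CPython was configured; these can be
        # None where no such configuration is shipped (for example on Windows).
        "CC": sysconfig.get_config_var("CC"),
        "CXX": sysconfig.get_config_var("CXX"),
    }


if __name__ == "__main__":
    for name, value in describe_build_toolchain().items():
        print("{}: {}".format(name, value))
```

This only describes the interpreter's defaults. An extension built with an explicit override, as in the `CXX=g++-7 CC=gcc-7` command in the reproduction, will not show up here.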
How do you recommend shipping binary python extension modules that use pybind11 if you don't know what other extensions a user might have installed that may use pybind11 and were built with another compiler version?
Is there a way to completely isolate the pybind11 state across these extensions?
The compiler version isn't quite as critical as the STL implementation (and its version). STL versions are usually, but not always, backwards-compatible with previous versions of the same STL. For instance, you're usually fine mixing modules built with gcc-5/gcc-6/gcc-7/gcc-8/clang-* on Linux, since they all use GCC's libstdc++. Mixing any of those with clang using libc++ (which is the default when using clang under macOS, but not on Linux) is asking for trouble. Very rarely the STL breaks backwards compatibility; IIRC, the last time for libstdc++ was when GCC 5 came out (and was related to C++11 compatibility), so crossing the pre-5 and post-5 gcc boundary is likely another no-no.
What you're hitting, trying to load one .so built with g++/libstdc++ and another built with clang++/libc++ at the same time in the same binary, is just something that can't work in any C++ code making use of the STL.
The only way around it is really to isolate the software: keep all your g++/libstdc++-compiled code separate from your clang++/libc++-compiled code. And while that is a nuisance, it's not something that pybind11 can realistically do anything about.
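To check which C++ standard library a given extension module was linked against, one option is to look at the output of `ldd` (Linux) or `otool -L` (macOS). The helper below is a rough sketch that classifies such a listing; the function name `classify_cxx_runtime` is hypothetical, and it assumes the usual library file names (`libstdc++.so.6`, `libc++.1.dylib` and similar). A statically linked standard library would not appear in the listing at all.

```python
def classify_cxx_runtime(linker_listing):
    """Return the set of C++ standard libraries named in ldd / otool -L output."""
    found = set()
    for line in linker_listing.splitlines():
        parts = line.split()
        if not parts:
            continue
        # The first token is the library (possibly with a path); drop the path.
        name = parts[0].rsplit("/", 1)[-1]
        if name.startswith("libstdc++"):
            found.add("libstdc++")
        elif name.startswith("libc++") and not name.startswith("libc++abi"):
            found.add("libc++")
    return found
```

Feeding it the text printed for `a` and for `b` from the reproduction should make a libc++/libstdc++ mismatch visible, if that is what is going on: two modules that report different sets are candidates for the kind of isolation described above.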
Dear all,
I've realized that this has become a bit of a painful problem, particularly when installing external packages where one may not have control over what compiler is being used.
The following commit, currently on master, namespaces pybind11's internal data structures based on the value of the __GXX_ABI_VERSION flag, if present.
https://github.com/pybind/pybind11/commit/bdf1a2cc34815c2f9ee9a5f3b5b05bfadd28dd35
My hope is that this should avoid this kind of breakage in the future. For those of you who are affected, could you let me know if this addresses the problem? My plan then would be to push this into a patch release of pybind11.
Best,
Wenzel
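The idea of namespacing shared state by an ABI tag can be pictured with a toy model. This is not pybind11's code, only an analogy in Python: the key format, the function names `internals_key` and `get_internals`, and the tag values 1002 and 1011 are all made up for illustration.

```python
def internals_key(abi_tag=None):
    """Build the name under which a module looks up the shared state."""
    suffix = "" if abi_tag is None else "_cxxabi{}".format(abi_tag)
    return "__toy_internals{}__".format(suffix)


def get_internals(shared_namespace, abi_tag=None):
    """Return the state stored under this ABI's key, creating it on first use."""
    key = internals_key(abi_tag)
    if key not in shared_namespace:
        shared_namespace[key] = {"registered_types": []}
    return shared_namespace[key]


# Stand-in for the interpreter-wide place where modules find each other's state.
shared = {}

# Without a tag, every module ends up with the very same object,
# regardless of which compiler produced it.
assert get_internals(shared) is get_internals(shared)

# With a tag, modules built against different ABIs get separate state.
state_a = get_internals(shared, abi_tag=1002)
state_b = get_internals(shared, abi_tag=1011)
state_a["registered_types"].append("A")
state_b["registered_types"].append("B")
assert state_a is not state_b
```

In this toy model the price of the separation is that `state_b` never sees the type registered in `state_a`, so the two groups of modules can coexist but no longer share registrations.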
I'll try it today. Thanks for your help
It solved my problem. Thanks!
Hi @wjakob , do you have a schedule for the patch release?
I've added another commit that provides an even stricter separation: https://github.com/pybind/pybind11/commit/c9f5a464bc8ebe91dee8578b2b4a23d9997ffefe
Released in v2.4.0 now :)
(not a patch release after all, because there are also some minor new features)
Thank you for fixing this @wjakob ! I can confirm that this patch worked for us as well.
This seems resolved. If more stuff needs to be done in this regard, please open a new issue.