Pybind11: Factory function crashes

Created on 23 Nov 2016  路  24Comments  路  Source: pybind/pybind11

I'm new in using pybind11, so I apologize if my problem is stupid. I had previously written a Python binding for this library called Pismo File Mount, that seems to be broken since I've updated Python and pybind11. The library has a functionint PfmApiFactory (PfmApi** api) which retrieves an instance of the PfmApi interface. I've written a wrapper function to call newPfmApi and return it back to Python.

PfmApi* newPfmApi(){
    PfmApi* pfm = 0;
    PfmApiFactory(&pfm);
    return pfm;
}
m.def("newPfmApi", &newPfmApi);



md5-4e1b4526a102a96de3a691ba93341933



m.def("newPfmApi", &newPfmApi, py::return_value_policy::reference);

The complete binding code is at http://paste.ubuntu.com/23520659/

Most helpful comment

The linked commit started using run-time type information: typeid(type_name) vs. typeid(variable_name). The latter looks up RTTI in the variable's vtable which should be located inside the library which defined the polymorphic type. I'm guessing the issue here is that the destination library doesn't support RTTI: it's probably a native C library with a C++ interface (or a native C++ library compiled with -fno-rtti, but C is more likely based on the API). In that case typeid is following the vtable, but it doesn't have any RTTI so it returns a junk pointer -> segfault on deference.

To confirm this, try wrapping the pointer in a simple struct and then use py::class_ with that struct, but not PfmApi* directly:

struct PfmApiWrapper {
    PfmApi* pfm = nullptr;

    PfmApiWrapper() {
        int err = PfmApiFactory(&pfm);
        if (!err)
            std::cout << pfm->Version() << std::endl;
        // you may want to throw on error here
    }
    ~PfmApiWrapper() {
        // release procedure goes here
    }
};

PYBIND11_PLUGIN(pismo) {
    py::module m("pismo", "Pismo Bindings for Python");
    py::class_<PfmApiWrapper>(m, "PfmApi")
        .def(py::init<>());
    return m.ptr();
}

If the following works correctly, then the RTTI lookup was the problem.

>>> import pismo
>>> pismo.PfmApi()

All 24 comments

since I've updated Python and pybind11

There may be some bug here, but it would be nice if you could definitely rule out the Python upgrade (e.g. by doing a completely clean build with the new Python, and by checking out the older pybind11 revision where things worked). Is it possible that you have mismatched python headers and library versions?

What you're describing seems okay (whether you want reference or not really depends on whether your factory is creating new objects that you have to destroy yourself, or returning pointers to objects that it manages internally; the former is the default assumption, the latter needs reference).

We'd be happy to take a look at this, but you will need to submit a (much smaller) self-contained example that reproduces the issue (see CONTRIBUTING.md for details). I'll close this ticket until then.

OK. The smallest example I can come up with:

#include "pybind11/pybind11.h"
#include "pfmapi.h"
#include <iostream>

namespace py = pybind11;
PfmApi* newPfmApi(){
    PfmApi* pfm = 0;
    int err = PfmApiFactory(&pfm);
    if (!err){
        std::cout << "Pismo (version " << pfm->Version() << ") loaded." << std::endl;
    }
    return pfm;
}

PYBIND11_PLUGIN(pismo) {
    py::module m("pismo", "Pismo Bindings for Python");

    py::class_<PfmApi>(m, "PfmApi");
    m.def("newPfmApi", &newPfmApi, "");
    return m.ptr();
}

As I'm running macOS and was worried that system Python and the one installed by homebrew may be conflicting, I tested it on a Linux machine:

nima@nima-vm:~/pismo/pfmkit-183/pybinding$ c++ pismo.cpp -fPIC -shared -std=c++11 -I ~/pismo/pybind11-master/include -I ../include -fpermissive `python-config --cflags --ldflags --libs` -o pismo.so
nima@nima-vm:~/pismo/pfmkit-183/pybinding$ python -c "import pismo; pismo.newPfmApi()"
Pismo (version pfm.1.0.0.183) loaded.
Segmentation fault (core dumped)

Made it work eventually! I had a number of older releases for pybind11 on my system. I checked each and the oldest one worked. I'm not sure which version it is though! It doesn't have a _version.py file and setup.py says it's version 1.0, but comparing it to v1.0 tarball I just downloaded suggests they're not identical (e.g. uses PYBIND11_* instead of PYBIND_*). I can upload this version of pybind11 if you like, so you can track the bug. Any thoughts?

Your code is fine: with a proper C++ class, everything there works. This sort of factory creation is a pretty basic feature of pybind11 that is tested under multiple variations in the test code. This definitely looks to be a problem with PfmApi.

The following is just guesswork, but might help you out.

I took a quick look at the online API docs, and I think one possible issue is that PfmApi is an abstract class, but does not have a virtual destructor (but it does have a virtual Release() method which has documentation that seems to do what I'd expect a destructor to do).

Not supporting destruction from a polymorphic base class pointer, besides being fairly bad C++ design, will break pybind11's default holder (std::unique_ptr) because that attempts to delete the allocated instance--but apparently PfmApi isn't designed to be destroyed that way. (This design does not give me a high level of confidence in its C++ code).

You can, temporarily, try this, which will basically just disable deleting of PfmApi objects controlled by pybind (they'll just be leaked instead):

struct PfmDeleter { void operator()(PfmApi *pfm) { /* FIXME: destroy/cleanup pfm */ } };
py::class_<PfmApi, std::unique_ptr<PfmApi, PfmDeleter>>(m, "PfmApi");

If that fixes the segfault, you'll need to figure out what goes in the comment to properly do the cleanup. (It might just be pfm->Release(), but I'm just basing that on the fairly inadequate documentation) so that the C++ object get cleaned up properly when the associated python object does.

Thanks @jagerman. I really appreciate the response. I asked Pismo's main developer to take a look at your comments. Just to be clear, doesn't the fact that pybind11 v1.0 doesn't crash imply that there may be something broken with newer releases of pybind11?

Agreed, this looks like a design issue with the underlying Library rather than a Pybind11 problem -- what version of pybind11 crashes in this context is not relevant.

@jagerman, what if it doesn't fix the segfault? I'm a little confused:

struct PfmDeleter { void operator()(PfmApi *pfm) { std::cout << "PfmDeleter called!" << std::endl; } };
typedef std::unique_ptr<PfmApi, PfmDeleter> PfmApiDel;

PfmApiDel newPfmApiDel(){
    PfmApi* pfm = 0;
    int err = PfmApiFactory(&pfm);
    if (!err)
        std::cout << pfm->Version() << std::endl;
    return PfmApiDel (pfm);
}

PfmApi* newPfmApi(){
    PfmApi* pfm = 0;
    int err = PfmApiFactory(&pfm);
    if (!err)
        std::cout << pfm->Version() << std::endl;
    return pfm;
}

PYBIND11_PLUGIN(pismo) {
    py::module m("pismo", "Pismo Bindings for Python");
    py::class_<PfmApi, PfmApiDel>(m, "PfmApi");
    m.def("newPfmApi", &newPfmApi, "");
    m.def("newPfmApiDel", &newPfmApiDel, "");
    return m.ptr();
}

Using pybind11 v1.0 headers:

nima@macbook$ python -c "import pismo; pismo.newPfmApi(); print 'fini'"
pfm.1.0.0.183
PfmDeleter called!
fini
nima@macbook$ python -c "import pismo; pismo.newPfmApiDel(); print 'fini'"
pfm.1.0.0.183
PfmDeleter called!
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: Unable to convert function return value to a Python type! The signature was
    () -> std::__1::unique_ptr<PfmApi, PfmDeleter>



md5-967d9ed5924dc694e9e0aae96c4d92b8



nima@macbook$ python -c "import pismo; pismo.newPfmApi(); print 'fini'"
pfm.1.0.0.183
Segmentation fault: 11



md5-fdb2fecafb0c1f5e012fc30c65a3a1da



nima@macbook$ python -c "import pismo; pismo.newPfmApiDel(); print 'fini'"
pfm.1.0.0.183
Segmentation fault: 11

@jagerman, Regarding your comments:

This definitely looks to be a problem with PfmApi.
... possible issue is that PfmApi is an abstract class, but does not have a virtual destructor (but it does have a virtual Release() method which has documentation that seems to do what I'd expect a destructor to do).
... besides being fairly bad C++ design, will break pybind11's default holder (std::unique_ptr) because that attempts to delete the allocated instance--but apparently PfmApi isn't designed to be destroyed that way. (This design does not give me a high level of confidence in its C++ code).

The factory+release pattern used in the PFM API, is very intentional, provides significant benefits for the PFM implementation and for client code, and is not that unusual. Calling this pattern "bad C++ design" demonstrates some lack of experience on your part, or at least a fairly close-minded view of software engineering. A more useful way to develop confidence in the PFM code is to measure maintainability, stability, portability, and supportability with real world projects, not by subjectively grading the Stroustrup-erificness of the cross-platform native C/C++ PFM API.

This is not the forum for discussion of the interface you choose to provide for your commercial, closed source software.

Regardless of the comments on PfmApi design, can you kindly tell me why it still crashes even when I set a custom deleter for unique_ptr? I would really appreciate it.

@nimamox - at this point I'd suggest running Python under gdb and getting a stack trace for the segfault. Right now your minimal test case involves an entire library whose source code isn't available. (If you could reproduce the segfault without using Pfm at all we would definitely address it). It might also be helpful to store the Python value in a variable to see whether the segfault is happening during construction (you will not see fini) or destruction (you will see fini before the segfault).

Also note that you don't need the newPfmApiDel function: the important part is telling pybind to use it as a holder (the second template argument to py::class_).

As you suspected, segfault occurs during construction. I forgot to mention, but I had also tested the code above with garbage collector being disabled and Python would still crash before printing fini.
Moreover, I went through the commits of pybind11, and found the exact commit, that is d7efa4f, after which the behavior changed and the factory function would crash upon return. Most of changes appear related to casting between C++ and Python types in cast.h. I toyed with functions of interest in cast.h in latest revision, and the exact line which causes the crash is

auto it = internals.registered_types_cpp.find(std::type_index(*type_info));

Any thoughts?

It may have something to do with PfmApi being a pure virtual class; I'll look into this (but I can't do so for at least a day or two).

The linked commit started using run-time type information: typeid(type_name) vs. typeid(variable_name). The latter looks up RTTI in the variable's vtable which should be located inside the library which defined the polymorphic type. I'm guessing the issue here is that the destination library doesn't support RTTI: it's probably a native C library with a C++ interface (or a native C++ library compiled with -fno-rtti, but C is more likely based on the API). In that case typeid is following the vtable, but it doesn't have any RTTI so it returns a junk pointer -> segfault on deference.

To confirm this, try wrapping the pointer in a simple struct and then use py::class_ with that struct, but not PfmApi* directly:

struct PfmApiWrapper {
    PfmApi* pfm = nullptr;

    PfmApiWrapper() {
        int err = PfmApiFactory(&pfm);
        if (!err)
            std::cout << pfm->Version() << std::endl;
        // you may want to throw on error here
    }
    ~PfmApiWrapper() {
        // release procedure goes here
    }
};

PYBIND11_PLUGIN(pismo) {
    py::module m("pismo", "Pismo Bindings for Python");
    py::class_<PfmApiWrapper>(m, "PfmApi")
        .def(py::init<>());
    return m.ptr();
}

If the following works correctly, then the RTTI lookup was the problem.

>>> import pismo
>>> pismo.PfmApi()

it's probably a native C library with a C++ interface

Thanks. This is exactly the case. The code snippet works correctly.

Ah yes, good observation @dean0x7d, that indeed seems likely here.

@dean0x7d -- I can't seem to find any reference indicating what typeid() gives back when RTTI isn't supported--it would be nice to throw an informative pybind11_fail in that case if it can somehow be detected.

I would have expected this to fail with a linker error. I suspect that this is on the fringes of compiler/linker implementations, so there probably isn't a whole lot we can do in pybind11.

@jagerman I don't think there's anything that can be detected. Disabling RTTI isn't part of the C++ standard so the typeid() call becomes undefined behavior -- no way around that. The C++ interface for the underlying C library is tricking the C++ compiler into working with the type, but the C library doesn't supply the data needed at runtime.

Yeah, I suppose so.

It's a bit surprising that the code even links properly though--if I try to compile code invoking typeid() on via linking to an .so compiled with -fno-rtti, I get a linking error (as @wjakob expected).

As far as I can see from the docs, the library is not linked at all, it's loaded at runtime. The precompiled C library emulates a vtable for C++ (which is probably UB right there or at the very least implementation defined behavior).

Currently the PFM API is implemented in a run-time loaded C++ shared library, compiled with RTTI disabled.

The C compatible interface definitions in pfmapi.h are dependent on the underlying platform ABI, different for each platform, but well defined.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

MisesEnForce picture MisesEnForce  路  3Comments

nschloe picture nschloe  路  4Comments

c-f-h picture c-f-h  路  3Comments

Xeite picture Xeite  路  4Comments

tdp2110 picture tdp2110  路  3Comments