Pybind11: [FEAT] being able to enforce the dereferencing of the smart pointer in argument conversion

Created on 29 Dec 2020  路  7Comments  路  Source: pybind/pybind11

Thank you for making PyBind11!

I have a relatively complex ("externally given") class with a funny ownership model, so things can go missing while Python has a reference. To wrap it in Python safely, I'm trying to hold instances in a smart pointer (a la shared_ptr) that checks on dereferencing whether the referred to object still exists. It does so by another pointer indirection.

The trouble is that while PyBind will create the shared ptr, I cannot seem to enforce that argument conversion always goes through the smart ptr and not the raw ptr held alongside.

I tried to do a custom caster, but that ran into "specialization after instantiation" error messages.

There is an "obvious path" to go through lambdas that all take the smart pointer and dereference themselves, but this is obviously a bit cumbersome when it needs to be done to a large number of methods.

@YannickJadoul looked at this on gitter to help clarify my thoughts (but all errors are mine). Thank you Yannick!

#include <pybind11/pybind11.h>
#include <iostream>

template <typename T>
struct Wrap {
  Wrap(T* p) : elem(p) {}
  void clear() { elem = nullptr; }
  T* elem;
};


class MyClass {
public:
  MyClass() : wrap_(std::make_shared<Wrap<MyClass>>(this)) {
  }
  std::shared_ptr<Wrap<MyClass>> wrap() {
    return wrap_;
  }
  ~MyClass() {
    wrap_->clear();
  }
private:
  std::shared_ptr<Wrap<MyClass>> wrap_;
};

template <typename T>
class unwrapping_shared_ptr {
 private:
  std::shared_ptr<Wrap<T>> impl;

 public:
  unwrapping_shared_ptr() = default;
  unwrapping_shared_ptr(std::shared_ptr<T> p) : impl(p) { std::cerr << "construct shared_ptr\n"; }
  unwrapping_shared_ptr(T* p) : impl(p->wrap()) { std::cerr << "construct T*\n"; }
  T* get() const {
    if (! impl->elem) {
      throw std::logic_error("has been invalidated");
    }
    std::cerr << "get deref!\n";
    return impl->elem;
  }
  T** operator&() {
    if (! impl->elem) {
      throw std::logic_error("has been invalidated");
    }
    std::cerr << "& deref!\n";
    return &(impl->elem);
  }
};

PYBIND11_DECLARE_HOLDER_TYPE(T, unwrapping_shared_ptr<T>, true);

namespace py = pybind11;

PYBIND11_MODULE(test_ext, m) {
  py::class_<MyClass, unwrapping_shared_ptr<MyClass>>(m, "MyClass")
    .def("__str__", [] (MyClass& m) { return "MyClass - jolly good"; });

  m.def("get_something", [] () { return new MyClass(); });
  m.def("delete_something", [] (MyClass* m) { delete m; });
  m.def("test_something", [] (unwrapping_shared_ptr<MyClass> m) {
    if (m.get()) {
      return "alive";
    } else {
      return "just resting";
    }});
}

and then in Python this gives me:

In [1]: import test_ext

In [2]: t = test_ext.get_something()
construct T*

In [4]: test_ext.test_something(t)
get deref!
Out[4]: 'alive'

In [5]: test_ext.delete_something(t)

In [6]: test_ext.test_something(t)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-6-34740f9db26d> in <module>
----> 1 test_ext.test_something(t)

RuntimeError: has been invalidated

In [7]: str(t)
Out[7]: 'MyClass - jolly good'

In [8]: 

Ideally, in [7] it would try to dereference the smart pointer, but it does not.

holders

Most helpful comment

I ran into this issue before as well. Having a custom holder type, you would actually expect that the holder's get() function is called each time its value is accessed. But that's not the case, because, in the value_and_holder struct, both the original pointer as well as the holder are stored and - as a shortcut - only the raw pointer is used in most cases.
However, that's not the only issue here: In @t-vi original post, a Python-managed object instance is passed to C++ and deleted there. This should not only delete the C++ part of the instance, but also the Python-part, i.e. the wrapper instance as well as its corresponding entry in pybind's internal registered_instances.
Furthermore, in the example code, ownership is not clear at all: Passing a raw pointer to delete_something() doesn't grant ownership and thus the right to delete the pointer!

All 7 comments

To be clear, I'm not making any promises (it's very tricky to access the holder if you don't know what the holder is, since pybind11 type-erases the holder object in ), but it's an interesting angle to take into account when working on the holders: should we always try to access contents of an object through holders? Would that solve some other things?
So I'm giving it that label (ping @EricCousineau-TRI, @rwgk, @rhaschke).

Meanwhile, I'll look at the custom type_caster solution.

should we always try to access contents of an object through holders? Would that solve some other things?

IIUC @rhaschke's comment from a couple weeks ago correctly, the raw pointer is also used in some registry, I hope Robert can chime in (I'd have to dig up that comment).

But assuming that can be resolved (if necessary), I think it's a great small-scale experiment to replace the raw pointer lookup with dereferencing the holder, and see if that has any measurable impact on runtime performance. (I feel like it will not, but you never know before you try.)

AFAIC, I'm not even sure this should be "fixed", because I'm not sure pybind11 promises anything like this (so it would need a discussion about expected behavior, etc). But it's interesting to keep in the back of our minds, when looking at the other holder types, right? Especially if we notice this results in further confusion, it's worth considering making a semi-breaking change.

small-scale experiment to replace the raw pointer lookup with dereferencing the holder

And yes, I was thinking that a pointer to the value most likely takes as much memory as a pointer to a function that extracts the pointer to the value from the holder (again, things are type-erased, so not that obvious at runtime, because you don't know the type of the holder, so you need some kind of function stored to do so, I think!).

I ran into this issue before as well. Having a custom holder type, you would actually expect that the holder's get() function is called each time its value is accessed. But that's not the case, because, in the value_and_holder struct, both the original pointer as well as the holder are stored and - as a shortcut - only the raw pointer is used in most cases.
However, that's not the only issue here: In @t-vi original post, a Python-managed object instance is passed to C++ and deleted there. This should not only delete the C++ part of the instance, but also the Python-part, i.e. the wrapper instance as well as its corresponding entry in pybind's internal registered_instances.
Furthermore, in the example code, ownership is not clear at all: Passing a raw pointer to delete_something() doesn't grant ownership and thus the right to delete the pointer!

However, that's not the only issue here: In @t-vi original post, a Python-managed object instance is passed to C++ and deleted there. This should not only delete the C++ part of the instance, but also the Python-part, i.e. the wrapper instance as well as its corresponding entry in pybind's internal registered_instances.

Well, so I cannot destruct python instances, and my goal here is to invalidate them through the smart pointer. This would work if the holder's get() function were called each time its value is accessed.

Furthermore, in the example code, ownership is not clear at all: Passing a raw pointer to delete_something() doesn't grant ownership and thus the right to delete the pointer!

This is "just" the minimal example. So what I would like to do is call clear() on the wrapper (or I could make a clear method on the smart pointer) and then subsequent Python conversions should fail. (I don't expect a solution for the lifetime issue of clearing while someone else still has the pointer.)

@t-vi, soooo, it's harder than I thought it would be, because of an implementation detail: the default holder_type<T>'s type_caster<holder_type<T>> inherits type_caster<T>. And I wanted to try create type_caster<T> by subclassing (or at least using) type_caster<holder_type<T>>. That's a pretty circular dependency.

Good news is, it's possible, though, by cutting and pasting relevant parts of copyable_holder_caster<T>. Bad news is, it looks pretty horrible; please don't try this at home. It's also (probably?) slightly hacking into implementation details, but then the whole type_caster thing is in detail and (in principle) bound to change without warning, I think. Good (?) news on that level is: no one's going to want to touch that code, anyway, so it's pretty stable...

Now, without further ado, I reveal to you, the monster hack:

namespace pybind11 {
namespace detail {

template <>
struct type_caster<MyClass> : public type_caster_base<MyClass> {
public:
    using type = MyClass;
    using holder_type = unwrapping_shared_ptr<MyClass>;

    bool load(handle src, bool convert) {
        return load_impl<type_caster<MyClass>>(src, convert);
    }

    explicit operator type *() { return static_cast<type *>(value); }
    explicit operator type &() { return *static_cast<type *>(value); }

protected:
    friend class type_caster_generic;

    bool load_value(value_and_holder &&v_h) {
        if (v_h.holder_constructed()) {
            value = v_h.template holder<holder_type>().get();
            return true;
        } else {
            throw cast_error("Unable to cast from non-held to held instance (T& to Holder<T>) "
#if defined(NDEBUG)
                             "(compile in debug mode for type information)");
#else
                             "of type '" + type_id<holder_type>() + "''");
#endif
        }
    }
};

}
}

which gives you, I believe, the output you expected?

import example

t = example.get_something()
example.test_something(t)
example.delete_something(t)
try:
    example.test_something(t)
except Exception as e:
    print(e)
print(str(t))

gives output

get deref!
get deref!
has been invalidated
Traceback (most recent call last):
  File "/home/yannick/tmp/sandbox/test.py", line 10, in <module>
    print(str(t))
RuntimeError: has been invalidated

I want to say "you're welcome", but ... I don't know if this is such a nice present, actually :sweat_smile: So I'll just keep it at: "enjoy" ;-)

Thank you! :gift: :christmas_tree: This is exactly what I had in mind and it will be useful to me.

Was this page helpful?
0 / 5 - 0 ratings