Pybind11: [QUESTION] subscripting pybind11::array_t without bounds check and without using proxy

Created on 16 Oct 2020 · 20 Comments · Source: pybind/pybind11

Hello,
I am trying to subscript a pybind11::array_t<scalar_t, pybind11::array::c_style> or a pybind11::array_t<scalar_t, pybind11::array::f_style>. I know I can use at() or mutable_at(), but those check the bounds.
I also know I can use the proxy, but in my use case I have a class wrapping a pybind11::array_t and would rather not rely on the proxy, since I am seeing some issues if I try to keep the proxy as member of my wrapper class.
So basically I would like to do something like:

template<typename scalar_t>
class MyWrapper
{
  // aliases come first so that the members below can name them
  using pbv_t = pybind11::array_t<scalar_t, pybind11::array::f_style>;
  using mp_t = decltype( std::declval<pbv_t &>().mutable_unchecked() );
  using up_t = decltype( std::declval<const pbv_t &>().unchecked() );

public:
  MyWrapper(std::size_t extent)
  : data_(extent), mP_(data_.mutable_unchecked()), uP_(...) {}

  scalar_t & operator[](std::size_t i){ return mP_(i); }
  scalar_t const & operator[](std::size_t i) const{ return uP_(i); }
  pbv_t * data() { return &data_; }
  pbv_t const * data() const { return &data_; }

private:
  pbv_t data_;
  mp_t mP_;
  up_t uP_;
};

Are there methods corresponding to at() and mutable_at(), or other ways to subscript the native array, that do not check bounds but also avoid the proxy? I looked at the source code and did not find anything.
Thanks

All 20 comments

since I am seeing some issues if I try to keep the proxy as member of my wrapper class.

@fnrizzi Which issues? I would argue this is the correct way of doing things, though.

@YannickJadoul
Thanks for your reply!
I updated the code above to explain the problem.
I have some issues because suppose I do the following:

// in python I have 
class MyTest:  
  def foo(self): return np.zeros(5) 

// in C++ I have 
MyWrapper<double > aCpp(5);
// suppose I have an object called "Obj" referencing MyTest such that I can do:
*aCpp.data() = Obj.attr("foo")();

Since the data wrapped by MyWrapper changes, I should update the proxy data members of MyWrapper. But I cannot do that unless I explicitly add a method to MyWrapper that resets the proxies, which would be bad design in my view (if one forgets to call it, then kaboom). I am also not sure whether it is possible to reset a proxy at all.
One way around this would be to pass *aCpp.data() as an argument to the Python foo. But in case I did not want to do that, I was asking whether there is a mutable_at that does not check bounds, because that way I can get around all of the above :)
Does this explain my problem?

@fnrizzi, is there an issue with making a distinction between aCpp.getData() and aCpp.setData()? Then in void MyWrapper<double>::setData(pbv_t data) you could update the wrapper as well as store the proxy.

Somehow, the *aCpp.data() = Obj.attr("foo")(); feels a bit tricky (it's basically just making pbv_t data_; public, right?).

If that's not possible you could of course implement some sort of caching functionality. But yeah, I don't think there's currently a non-checking access without the proxy.

Yes, this makes sense; it kind of points to just using the foo(a) version of things and not the .data method.
However, I would need .data when using, for instance, scipy blas, so that I can pass the object directly. So that would be a problem.
Btw, how do I update the proxy?

I can always modify the array_t class in my version of pybind11 to add a mutable_at and at methods without bounds check.
Is there a reason why such methods are not provided and the suggestion is to use the proxies? I personally think it would not hurt to have them, since it would spare one from using the proxy. Or maybe enable the check only in debug mode.

I meant doing something like this:

template<typename scalar_t>
class MyWrapper
{
  // aliases come first so that setData can name pbv_t
  using pbv_t = pybind11::array_t<scalar_t, pybind11::array::f_style>;
  using mp_t = decltype( std::declval<pbv_t &>().mutable_unchecked() );

public:
  // ...

  void setData(pbv_t data) {
     data_ = data;
     mP_ = data.mutable_unchecked();
  }

private:
  pbv_t data_;
  mp_t mP_;
};

And in the C++ code, instead of *aCpp.data() = Obj.attr("foo")();, you just call a setter aCpp.setData(Obj.attr("foo")()); (which is better practice, anyway, than returning a pointer to a private member and assigning to that - again, doing so is basically just a convoluted way of making a private member public??)
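The "caching" variant mentioned earlier can be sketched without pybind11. The following is only an illustration under stated assumptions: `buffer_t` is a hypothetical stand-in for `pybind11::array_t` (here a `std::vector`), and `CachedWrapper` is a made-up name. The setter is the single place where the buffer changes, so it can validate the extent and refresh a cached raw pointer in the same step; with pybind11 that pointer would presumably come from the array's `mutable_data()`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for pybind11::array_t<scalar_t>, for illustration only.
template <typename scalar_t>
using buffer_t = std::vector<scalar_t>;

template <typename scalar_t>
class CachedWrapper {
public:
    explicit CachedWrapper(std::size_t extent)
        : data_(extent), ptr_(data_.data()), extent_(extent) {}

    // The cached pointer refers to this object's own buffer, so copying is disabled.
    CachedWrapper(const CachedWrapper&) = delete;
    CachedWrapper& operator=(const CachedWrapper&) = delete;

    // Single entry point for replacing the buffer: validates the extent
    // and refreshes the cached pointer in the same step.
    void setData(buffer_t<scalar_t> data) {
        if (data.size() != extent_) {
            throw std::invalid_argument("extent mismatch");
        }
        data_ = std::move(data);
        ptr_ = data_.data();
    }

    const buffer_t<scalar_t>& getData() const { return data_; }

    // Unchecked in release builds: assert() is compiled out when NDEBUG is defined.
    scalar_t& operator[](std::size_t i) {
        assert(i < extent_);
        return ptr_[i];
    }
    const scalar_t& operator[](std::size_t i) const {
        assert(i < extent_);
        return ptr_[i];
    }

private:
    buffer_t<scalar_t> data_;
    scalar_t* ptr_;
    std::size_t extent_;
};
```

Because callers can only replace the buffer through `setData`, there is no separate "reset" step to forget.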

I can always modify the array_t class in my version of pybind11 to add a mutable_at and at methods without bounds check.

Of course, I can't stop you from doing that.

Is there a reason why such methods are not provided and the suggestion is to use the proxies?

Not sure; I wasn't involved when this was added to pybind11 (quite some time ago). I assume it is because Python expects bounds checks, and a segfault that crashes your whole Python interpreter isn't too advisable? And because you can still avoid the checks with these proxies, when you're sure you know what you're doing?

You really have no check in the code you've shown whether the 5 in return np.zeros(5) matches the one in MyWrapper<double > aCpp(5);. So you want to add some extra checks there, anyway, if you want to create a decent and safe Python interface. If you feel very powerful, you do have full access to ptr() to get the C API pointer.

yes that makes sense. I agree with you.
So the proxy objects are assignable?

So the proxy objects are assignable?

Good question. I think they should be. Let me check.

Oh, no... seems it's not possible, since its dims_ is const :-(

    const unsigned char *data_;
    // Storing the shape & strides in local variables (i.e. these arrays) allows the compiler to
    // make large performance gains on big, nested loops, but requires compile-time dimensions
    conditional_t<Dynamic, const ssize_t *, std::array<ssize_t, (size_t) Dims>>
            shape_, strides_;
    const ssize_t dims_;

yes, that was ringing a bell :)
It seems to me that, everything considered, adding a [] that subscripts without checking bounds is the most viable way, and it is also in line with the C++ standard library, e.g. std::vector. I can always add checks myself and enable them in debug mode only. This allows us to use the wrapper in the intuitive way.
Thanks for your help!

I've discussed this problem a bit with @YannickJadoul . First, const data members are pretty harmful. Second, you can work around it:

  1. Instead of storing the proxy objects themselves, store arrays of std::byte, appropriately aligned.
  2. Use placement new and placement delete to manage the lifetime of proxies that reside in your wrapper, constructing them within that std::byte[].
  3. Become friends with std::launder, because you'll need it.
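The three steps above can be sketched as follows. This is a minimal illustration, not pybind11 code: `Proxy` is a hypothetical stand-in whose const member makes it non-assignable in the same way as the real proxy's `dims_`, and `ProxyHolder` is a made-up name. It assumes C++17 for `std::byte` and `std::launder`.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for an unchecked proxy: the const member makes it
// non-assignable, similar to the dims_ member of the real proxy.
struct Proxy {
    double* data;
    const std::ptrdiff_t dims;
    double& operator()(std::ptrdiff_t i) const { return data[i]; }
};

class ProxyHolder {
public:
    ProxyHolder(double* data, std::ptrdiff_t dims) {
        ::new (storage_) Proxy{data, dims};  // step 2: construct inside the byte array
    }
    ~ProxyHolder() { get().~Proxy(); }

    ProxyHolder(const ProxyHolder&) = delete;
    ProxyHolder& operator=(const ProxyHolder&) = delete;

    // Replaces the proxy: end the old object's lifetime, then construct
    // a new one in the same bytes. No assignment operator is involved.
    void reset(double* data, std::ptrdiff_t dims) {
        get().~Proxy();
        ::new (storage_) Proxy{data, dims};
    }

    // Step 3: std::launder yields a pointer to the object that currently
    // lives in storage_.
    Proxy& get() { return *std::launder(reinterpret_cast<Proxy*>(storage_)); }

private:
    // Step 1: raw, suitably aligned storage instead of a Proxy member.
    alignas(Proxy) std::byte storage_[sizeof(Proxy)];
};
```

Copying is disabled in the sketch to keep the lifetime handling short; a real wrapper would need to decide what copy and move should do.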

Oooooor, maybe a PR removing this const, then, if it's so harmful? Then at least it becomes assignable ;-)

Oooooor, maybe a PR removing this const, then, if it's so harmful?

That's fair. I just wanted to offer a workaround, however ugly it may be, in case @fnrizzi can't wait for the pull request.
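A less manual alternative, which was not raised in the thread, is `std::optional` (C++17): its `emplace` destroys the held object and constructs a new one in place, so it also works for types that cannot be assigned. The sketch below uses hypothetical names (`ConstProxy`, `OptionalHolder`) and a stand-in proxy type rather than the real pybind11 one.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical stand-in for a proxy that is non-assignable due to a const member.
struct ConstProxy {
    ConstProxy(double* d, std::ptrdiff_t n) : data(d), dims(n) {}
    double& operator()(std::ptrdiff_t i) const { return data[i]; }

    double* data;
    const std::ptrdiff_t dims;
};

class OptionalHolder {
public:
    // emplace() destroys any existing proxy and constructs a new one in place;
    // ConstProxy's (deleted) assignment operator is never needed.
    void reset(double* data, std::ptrdiff_t dims) { proxy_.emplace(data, dims); }

    bool has_proxy() const { return proxy_.has_value(); }
    std::ptrdiff_t dims() const { return proxy_->dims; }

    // Precondition: reset() has been called at least once.
    double& operator[](std::ptrdiff_t i) { return (*proxy_)(i); }

private:
    std::optional<ConstProxy> proxy_;
};
```

The trade-off is an extra "engaged" flag stored next to the proxy, in exchange for not handling placement new and `std::launder` by hand.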

ok thanks for the info!
Just my thought, but maybe the [] operator for "expert" use would be very useful :)
Just curious, why "const data members are pretty harmful"? @bstaletic

Just curious, why "const data members are pretty harmful"?

They inhibit move semantics, make the whole class unassignable and don't actually protect you from anything. Try this example:

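#include <cstddef>  // offsetof
#include <new>      // placement new

struct S { const int i = 3; };
int main() {
    S s;
    // offsetof counts bytes, so the pointer arithmetic is done on a char pointer
    auto ip = new(reinterpret_cast<char *>(&s) + offsetof(S, i)) int{666}; // not actually UB as long as I only use `ip` to access `s.i`.
    return *ip; // Just so we know it worked. On POSIX it returns (*ip) % 256
}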
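The first two claims (no assignment, inhibited moves) can be checked at compile time with standard type traits. The struct names below are made up for illustration.

```cpp
#include <string>
#include <type_traits>

struct WithConst { const int i = 3; };
struct WithoutConst { int i = 3; };

// A const member deletes the implicit copy and move assignment operators...
static_assert(!std::is_copy_assignable<WithConst>::value, "not copy-assignable");
static_assert(!std::is_move_assignable<WithConst>::value, "not move-assignable");
// ...while copy construction still works.
static_assert(std::is_copy_constructible<WithConst>::value, "still copy-constructible");
static_assert(std::is_copy_assignable<WithoutConst>::value, "assignable without const");

// The restriction propagates to any class that holds such a member by value.
struct Holder { WithConst member; };
static_assert(!std::is_copy_assignable<Holder>::value, "holder is not assignable either");

// "Inhibits move semantics": a const std::string member cannot be moved from,
// so the implicit move constructor falls back to copying the string and is
// therefore no longer noexcept.
struct ConstName { const std::string name; };
struct MutableName { std::string name; };
static_assert(!std::is_nothrow_move_constructible<ConstName>::value, "move degrades to copy");
static_assert(std::is_nothrow_move_constructible<MutableName>::value, "real move");
```

This is the same mechanism that keeps a proxy with a const `dims_` member from being reassigned inside a wrapper class.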

Just my thought, but maybe the [] operator for "expert" use would be very useful :)

We have a way of doing that: unchecked and mutable_unchecked. If you really want to go crazy, just get the data() pointer and go crazy indexing yourself?

go crazy indexing yourself?

You're not supposed to be using that (or at least not to complain about it if we change it), but there's detail::byte_offset_unsafe(strides(), ssize_t(index)...) that you could mimic.
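To make "indexing yourself" concrete, here is a standalone sketch of stride-based addressing. It does not use pybind11 internals: `byte_offset` and `element_unchecked` are hypothetical helpers that assume strides are given in bytes per dimension, as NumPy reports them. Nothing is bounds-checked.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Byte offset of element (index[0], ..., index[N-1]), given one stride in
// bytes per dimension. No bounds checks are performed.
template <std::size_t N>
std::ptrdiff_t byte_offset(const std::array<std::ptrdiff_t, N>& strides,
                           const std::array<std::ptrdiff_t, N>& index) {
    std::ptrdiff_t offset = 0;
    for (std::size_t k = 0; k < N; ++k) {
        offset += index[k] * strides[k];
    }
    return offset;
}

// Unchecked element access through a raw base pointer (T must be non-const here).
template <typename T, std::size_t N>
T& element_unchecked(T* base,
                     const std::array<std::ptrdiff_t, N>& strides,
                     const std::array<std::ptrdiff_t, N>& index) {
    auto* bytes = reinterpret_cast<unsigned char*>(base);
    return *reinterpret_cast<T*>(bytes + byte_offset(strides, index));
}
```

For a 2x3 column-major (f_style) array of doubles the strides are `{sizeof(double), 2 * sizeof(double)}`, so element (1, 2) lives at linear position 1 + 2 * 2 = 5.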

Just my thought, but maybe the [] operator for "expert" use would be very useful :)

We have a way of doing that: unchecked and mutable_unchecked. If you really want to go crazy, just get the data() pointer and go crazy indexing yourself?

Yes, but that returns the proxy. I was talking about a [] that accesses elements directly, like mutable_at and at, similarly to std::vector.

yes, i agree with you in some cases :)

Yes, but that returns the proxy. I was talking about a [] that accesses elements directly, like mutable_at and at, similarly to std::vector.

Yes, but that proxy is the way to avoid bounds checks. I've explained above about the reasoning why this probably is the case.

I agree. I am just saying that in some cases it is slightly inconvenient, and even std::vector has [] for that. But I will stick to the provided API without adding my own modifications :) When the proxies become assignable, I think I will switch to storing them.
