Pybind11: pybind::array and pybind::array_t universal references

Created on 13 Nov 2016 · 28Comments · Source: pybind/pybind11

Variadic template methods of pybind::array and pybind::array_t used for accessing data, namely

data, mutable_data, offset_at, index_at, get_byte_offset, at, mutable_at

should take arguments by value instead of universal reference.

The semantic in the standard library for functions taking integral or iterator arguments is to take them by value. (For example std::vector<T>::operator[] takes its argument by value).

Source

SylvainCorlay

Most helpful comment

In some unlikely cases it can have border effects, such as an object being moved unexpectedly. For example if you have a function foo that transform the index, but sometimes only returns it unchanged as an rvalue. Then myvec[foo(i)] will cause i to be moved and not usable afterwards.

That would be a fault in foo, not in myvec::operator[]. There's really no way to accidentally move out an lvalue -- std::move is required and very loud. (Pure rvalues are moved silently, but there's nothing unexpected about that.)

The only downside of forwarding references is that they result in extra overhead (pointers to ints) in case the template function isn't inlined (unlikely for short templates, but it does happen). There's no difference for fully inlined templates.

It would probably be best to keep things simple and optimize for the most common use case. These are pretty much always going to be simple integers so value semantics seems preferable.

dean0x7d on 13 Nov 2016

👍2

All 28 comments

^@aldanor

SylvainCorlay on 13 Nov 2016

The issue is that, although all arguments get converted to size_t eventually, the variable-arguments-through-a-template-parameter-pack approach being used here doesn't allow that implicit conversion to take place.

For stl functions like vector's operator[] I get implicit conversion because the argument is a size_t, which means that I can call it with anything that is implicitly convertible to size_t, e.g.:

class SomeNonCopyableClass {
    // ...
    operator size_t() const { return /* ... */; }
};

// ...

SomeNonCopyableClass a(2);
std::cout << myvec[a] << "\n";

With pass-by-value semantics without the type being in the function signature, this fails because I can't copy my a object to pass it along: the deduced argument type will be SomeNonCopyableClass, not size_t. But the current code fails here, too, because the various argument chains actually _are_ passing by value rather than forwarding arguments. This should be fixed.

So I think, to have an stl-like interface that takes anything convertible to a size_t, the code should _keep_ the && arguments, but needs to also use perfect forwarding, e.g. by changing the calls such as next_method(index...) to either next_method(std::forward<Ix>(index)...) (to pass the conversion down the line), or next_method(size_t(index)...) to do the conversion immediately.

On a side note: there's an index_at method that current takes its arguments by lvalue reference, which looks like a typo (i.e. it's missing the second &).

jagerman on 13 Nov 2016

So I think, to have an stl-like interface that takes anything convertible to a size_t, the code should keep the && arguments, but needs to also use perfect forwarding, e.g. by changing the calls such as next_method(index...) to either next_method(std::forward(index)...) (to pass the conversion down the line), or next_method(size_t(index)...) to do the conversion immediately.

Can you think of a realistic case where people would be indexing with non-integral types convertible to size_t?

SylvainCorlay on 13 Nov 2016

Sure; perhaps I encode something else in my indices--a row title, for instance--and so my index objects are:

struct myindex {
    size_t i;
    std::string name;
    operator size_t() const { return i; }
};

Sure, there are other ways I could do this, but what is the compelling reason to _not_ allow arbitrary types that are implicitly convertible to size_t to be used? (In other words, what does pass-by-value gain over perfect forwarding?)

jagerman on 13 Nov 2016

Sure, there are other ways I could do this, but what is the compelling reason to not allow arbitrary types that are implicitly convertible to size_t to be used? (In other words, what does pass-by-value gain over perfect forwarding?)

Not much except that it is the most used semantic for integral types, so universal references go against the principle of least surprise.

SylvainCorlay on 13 Nov 2016

In some unlikely cases it can have border effects, such as an object being moved unexpectedly. For example if you have a function foo that transform the index, but sometimes only returns it unchanged as an rvalue (effectively equivalently to std::move). Then myvec[foo(i)] will cause i to be moved and not usable afterwards.

SylvainCorlay on 13 Nov 2016

Regarding your concern regarding non-copyable things, I think that it is a good assumptions / requirement for and index to have a value semantic.

SylvainCorlay on 13 Nov 2016

I agree: the ideal interface here is (size_t i_0, size_t i_1, size_t i_2, ...) where the number of arguments is variable. But I don't see an easy way to do that here (two things that occur to me: making each of the calls a recursive template so that all arguments really do have to be size_t; or using va_args). Neither is all that nice, but perhaps one or the other is worth doing anyway.

jagerman on 13 Nov 2016

In some unlikely cases it can have border effects, such as an object being moved unexpectedly. For example if you have a function foo that transform the index, but sometimes only returns it unchanged as an rvalue. Then myvec[foo(i)] will cause i to be moved and not usable afterwards.

It would probably be best to keep things simple and optimize for the most common use case. These are pretty much always going to be simple integers so value semantics seems preferable.

dean0x7d on 13 Nov 2016

👍2

@dean0x7d said:
It would probably be best to keep things simple and optimize for the most common use case. These are pretty much always going to be simple integers so value semantics seems preferable.

I strongly agree with @SylvainCorlay and @dean0x7d on that; passing by value is preferable, and follows the conventions of the C++ standard library.

@jagerman said:
I agree: the ideal interface here is (size_t i_0, size_t i_1, size_t i_2, ...) where the number of arguments is variable. But I don't see an easy way to do that here (two things that occur to me: making each of the calls a recursive template so that all arguments really do have to be size_t; or using va_args). Neither is all that nice, but perhaps one or the other is worth doing anyway.

The recursive approach is fine, it is what we use in xtensor. It is more efficient, no additional array on the stack is required, and the compiler can inline it so the assembly is just exactly the computation of the index in the underlying buffer, nothing more. And you force conversion to size_t for free.

JohanMabille on 14 Nov 2016

👍1

@wjakob what do you think of the above?

SylvainCorlay on 15 Nov 2016

It sounds like a fairly low level change -- either is fine with me I guess. This piece of code "belongs" to @aldanor, who has done an excellent job designing these APIs -- so I ultimately defer to him.

wjakob on 15 Nov 2016

There is a significant performance difference too.

Regarding the dimension checks, I would like to remove them all and only check at the top-level calls to at and data.

SylvainCorlay on 15 Nov 2016

Significant performance difference for what?

Are the checks being duplicated somewhere or do you just want to get rid of them altogether? The latter was discussed previously and went in favor of checking.

dean0x7d on 15 Nov 2016

I would assume all of this gets inlined for any decent C++ compiler (the implementations of these functions are all trivial), so I'm surprised by this statement.

wjakob on 15 Nov 2016

I would assume all of this gets inlined for any decent C++ compiler (the implementations of these functions are all trivial), so I'm surprised by this statement.

The performance issue comes from the stack array that is created in get_byte_offset which we don't need in the case of the recursive implementation.

SylvainCorlay on 15 Nov 2016

Are the checks being duplicated somewhere or do you just want to get rid of them altogether? The latter was discussed previously and went in favor of checking.

I want the checks to only be at the top-level functions (at, data) and not in get_byte_offset which we want to use without performance penalty in xtensor.

SylvainCorlay on 15 Nov 2016

PR coming in a minute.

SylvainCorlay on 15 Nov 2016

Hi all, a bit late to the party.

Re: universal refs vs by-value, I'm personally fine with whichever way it is. Do we have a consensus on whether Ix&& should be converted Ix in array methods? Note that for literal ints and ints passed by-value the assembly would already be _identical_, you can easily check that. The only difference would be when you pass e.g. a const size_t& (why?..) and now it basically has to do a dereference. If we leave it be as universal refs, could do proper forwarding as noted above.

Re: checks, it's probably fine to remove the check from the lowest-level method (get_byte_offset()?) as long all higher-level methods remain bound-checked to avoid surprises. Or, alternatively, just provide a get_byte_offset_unchecked() -- that's a very common idiom e.g. in Rust (example).

aldanor on 15 Nov 2016

👍1

Actually, I would suggest to do what Eigen and similar libraries do here:

Simply

#if !defined(NDEBUG)
....
#endif

those parts. It makes a lot of sense to have checks for debugging, but they should probably be disabled for release builds.

wjakob on 15 Nov 2016

I like this idea even better.

SylvainCorlay on 15 Nov 2016

This one I'm not sure I fully agree with, tying bounds checking with a commonly used ndebug flag is very implicit; you switch for a release build and suddenly out of bounds exceptions become bad memory access. There's basically no way to re-enable bound checks in release builds then.

aldanor on 15 Nov 2016

py::array is essentially a convenient C++ wrapper for np.array and Python doesn't have an unchecked release mode. Translating Python's runtime checks/exception into C++ undefined behavior would be pretty surprising.

dean0x7d on 15 Nov 2016

@aldanor did you see the point about avoiding the temporary array (performance)?

The following produces simpler assembly

https://github.com/QuantStack/xtensor/blob/master/include/xtensor/xcontainer.hpp#L238

SylvainCorlay on 15 Nov 2016

There's really two use cases here, I guess, hence all the bikeshedding. One is NumPy-like interface with NumPy-like behaviour which implies bound checks and no UB. The other is downstream library code like xtensor that chooses to manage bound checks and other things on its own. Hence my suggestion of providing "unchecked" variants of a few methods.

aldanor on 15 Nov 2016

👍1

@SylvainCorlay yes, recursive would result in a simpler assembly indeed, that's something we should probably do

aldanor on 15 Nov 2016

https://github.com/pybind/pybind11/pull/500

SylvainCorlay on 15 Nov 2016

So the checks are moved out of byte_offset so that xtensor can use it unchecked.

We should still decide which top-level functions should have the checks and which not.

SylvainCorlay on 15 Nov 2016

Was this page helpful?

0 / 5 - 0 ratings