Variadic template methods of pybind::array and pybind::array_t used for accessing data, namely
data, mutable_data, offset_at, index_at, get_byte_offset, at, mutable_at
should take arguments by value instead of universal reference.
The semantic in the standard library for functions taking integral or iterator arguments is to take them by value. (For example std::vector<T>::operator[] takes its argument by value).
^@aldanor
The issue is that, although all arguments get converted to size_t eventually, the variable-arguments-through-a-template-parameter-pack approach being used here doesn't allow that implicit conversion to take place.
For stl functions like vector's operator[] I get implicit conversion because the argument is a size_t, which means that I can call it with anything that is implicitly convertible to size_t, e.g.:
class SomeNonCopyableClass {
// ...
operator size_t() const { return /* ... */; }
};
// ...
SomeNonCopyableClass a(2);
std::cout << myvec[a] << "\n";
With pass-by-value semantics without the type being in the function signature, this fails because I can't copy my a object to pass it along: the deduced argument type will be SomeNonCopyableClass, not size_t. But the current code fails here, too, because the various argument chains actually _are_ passing by value rather than forwarding arguments. This should be fixed.
So I think, to have an stl-like interface that takes anything convertible to a size_t, the code should _keep_ the && arguments, but needs to also use perfect forwarding, e.g. by changing the calls such as next_method(index...) to either next_method(std::forward<Ix>(index)...) (to pass the conversion down the line), or next_method(size_t(index)...) to do the conversion immediately.
On a side note: there's an index_at method that current takes its arguments by lvalue reference, which looks like a typo (i.e. it's missing the second &).
So I think, to have an stl-like interface that takes anything convertible to a size_t, the code should keep the && arguments, but needs to also use perfect forwarding, e.g. by changing the calls such as next_method(index...) to either next_method(std::forward
(index)...) (to pass the conversion down the line), or next_method(size_t(index)...) to do the conversion immediately.
Can you think of a realistic case where people would be indexing with non-integral types convertible to size_t?
Sure; perhaps I encode something else in my indices--a row title, for instance--and so my index objects are:
struct myindex {
size_t i;
std::string name;
operator size_t() const { return i; }
};
Sure, there are other ways I could do this, but what is the compelling reason to _not_ allow arbitrary types that are implicitly convertible to size_t to be used? (In other words, what does pass-by-value gain over perfect forwarding?)
Sure, there are other ways I could do this, but what is the compelling reason to not allow arbitrary types that are implicitly convertible to size_t to be used? (In other words, what does pass-by-value gain over perfect forwarding?)
Not much except that it is the most used semantic for integral types, so universal references go against the principle of least surprise.
In some unlikely cases it can have border effects, such as an object being moved unexpectedly. For example if you have a function foo that transform the index, but sometimes only returns it unchanged as an rvalue (effectively equivalently to std::move). Then myvec[foo(i)] will cause i to be moved and not usable afterwards.
Regarding your concern regarding non-copyable things, I think that it is a good assumptions / requirement for and index to have a value semantic.
I agree: the ideal interface here is (size_t i_0, size_t i_1, size_t i_2, ...) where the number of arguments is variable. But I don't see an easy way to do that here (two things that occur to me: making each of the calls a recursive template so that all arguments really do have to be size_t; or using va_args). Neither is all that nice, but perhaps one or the other is worth doing anyway.
In some unlikely cases it can have border effects, such as an object being moved unexpectedly. For example if you have a function foo that transform the index, but sometimes only returns it unchanged as an rvalue. Then myvec[foo(i)] will cause i to be moved and not usable afterwards.
That would be a fault in foo, not in myvec::operator[]. There's really no way to accidentally move out an lvalue -- std::move is required and very loud. (Pure rvalues are moved silently, but there's nothing unexpected about that.)
The only downside of forwarding references is that they result in extra overhead (pointers to ints) in case the template function isn't inlined (unlikely for short templates, but it does happen). There's no difference for fully inlined templates.
It would probably be best to keep things simple and optimize for the most common use case. These are pretty much always going to be simple integers so value semantics seems preferable.
@dean0x7d said:
It would probably be best to keep things simple and optimize for the most common use case. These are pretty much always going to be simple integers so value semantics seems preferable.
I strongly agree with @SylvainCorlay and @dean0x7d on that; passing by value is preferable, and follows the conventions of the C++ standard library.
@jagerman said:
I agree: the ideal interface here is (size_t i_0, size_t i_1, size_t i_2, ...) where the number of arguments is variable. But I don't see an easy way to do that here (two things that occur to me: making each of the calls a recursive template so that all arguments really do have to be size_t; or using va_args). Neither is all that nice, but perhaps one or the other is worth doing anyway.
The recursive approach is fine, it is what we use in xtensor. It is more efficient, no additional array on the stack is required, and the compiler can inline it so the assembly is just exactly the computation of the index in the underlying buffer, nothing more. And you force conversion to size_t for free.
@wjakob what do you think of the above?
It sounds like a fairly low level change -- either is fine with me I guess. This piece of code "belongs" to @aldanor, who has done an excellent job designing these APIs -- so I ultimately defer to him.
There is a significant performance difference too.
Regarding the dimension checks, I would like to remove them all and only check at the top-level calls to at and data.
Significant performance difference for what?
Are the checks being duplicated somewhere or do you just want to get rid of them altogether? The latter was discussed previously and went in favor of checking.
I would assume all of this gets inlined for any decent C++ compiler (the implementations of these functions are all trivial), so I'm surprised by this statement.
I would assume all of this gets inlined for any decent C++ compiler (the implementations of these functions are all trivial), so I'm surprised by this statement.
The performance issue comes from the stack array that is created in get_byte_offset which we don't need in the case of the recursive implementation.
Are the checks being duplicated somewhere or do you just want to get rid of them altogether? The latter was discussed previously and went in favor of checking.
I want the checks to only be at the top-level functions (at, data) and not in get_byte_offset which we want to use without performance penalty in xtensor.
PR coming in a minute.
Hi all, a bit late to the party.
Re: universal refs vs by-value, I'm personally fine with whichever way it is. Do we have a consensus on whether Ix&& should be converted Ix in array methods? Note that for literal ints and ints passed by-value the assembly would already be _identical_, you can easily check that. The only difference would be when you pass e.g. a const size_t& (why?..) and now it basically has to do a dereference. If we leave it be as universal refs, could do proper forwarding as noted above.
Re: checks, it's probably fine to remove the check from the lowest-level method (get_byte_offset()?) as long all higher-level methods remain bound-checked to avoid surprises. Or, alternatively, just provide a get_byte_offset_unchecked() -- that's a very common idiom e.g. in Rust (example).
Actually, I would suggest to do what Eigen and similar libraries do here:
Simply
#if !defined(NDEBUG)
....
#endif
those parts. It makes a lot of sense to have checks for debugging, but they should probably be disabled for release builds.
I like this idea even better.
This one I'm not sure I fully agree with, tying bounds checking with a commonly used ndebug flag is very implicit; you switch for a release build and suddenly out of bounds exceptions become bad memory access. There's basically no way to re-enable bound checks in release builds then.
py::array is essentially a convenient C++ wrapper for np.array and Python doesn't have an unchecked release mode. Translating Python's runtime checks/exception into C++ undefined behavior would be pretty surprising.
@aldanor did you see the point about avoiding the temporary array (performance)?
The following produces simpler assembly
https://github.com/QuantStack/xtensor/blob/master/include/xtensor/xcontainer.hpp#L238
There's really two use cases here, I guess, hence all the bikeshedding. One is NumPy-like interface with NumPy-like behaviour which implies bound checks and no UB. The other is downstream library code like xtensor that chooses to manage bound checks and other things on its own. Hence my suggestion of providing "unchecked" variants of a few methods.
@SylvainCorlay yes, recursive would result in a simpler assembly indeed, that's something we should probably do
So the checks are moved out of byte_offset so that xtensor can use it unchecked.
We should still decide which top-level functions should have the checks and which not.
Most helpful comment
That would be a fault in
foo, not inmyvec::operator[]. There's really no way to accidentally move out an lvalue --std::moveis required and very loud. (Pure rvalues are moved silently, but there's nothing unexpected about that.)The only downside of forwarding references is that they result in extra overhead (pointers to ints) in case the template function isn't inlined (unlikely for short templates, but it does happen). There's no difference for fully inlined templates.
It would probably be best to keep things simple and optimize for the most common use case. These are pretty much always going to be simple integers so value semantics seems preferable.