This issue was raised at https://github.com/JuliaLang/julia/pull/20421#discussion_r99267449. We should define clear rules regarding when it's OK to use @inbounds
in Base code. Currently the code seems to be inconsistent.
@JeffBezanson mentioned this: "only use @inbounds
when you can be certain, from local information, that all accesses are in bounds." In a strict interpretation, that would mean @inbounds for i in eachindex(a)
is not correct when a::AbstractArray
, since an incorrect array implementation could return invalid indices, which would crash Julia.
This strict interpretation is problematic since (as @stevengj noted) often a generic AbstractArray
method is also used for Array
: if we cannot use @inbounds
there, performance suffers in the common case just to avoid possible crashes in rare cases. A mechanism to enable @inbounds
only for trusted types could help, but it would still be too bad that custom array types wouldn't benefit from bounds checking removal even when they implemented everything correctly: that would defeat the goal of making user-defined type as efficient as Base types. So I would say we need to trust custom array types, or at least allow them to opt out of bounds checking by stating that they are safe. Cf. https://github.com/JuliaLang/julia/pull/15291.
The only problem here is that @inbounds
has two implementations right now:
AbstractArray
implementations that have explicitly opted into bounds check removal with @boundscheck
blocks. It only goes down one level of inlining. This is perfect — it allows for us to safely elide bounds checks in generic code and it'll only affect arrays that expect it.Array
still uses the old version of bounds check removal, implemented in codegen. This version affects all levels of inlining. It's leaky.This means that custom array types are safe… unless they depend upon Array
's bounds checking. The solution here isn't to rip out all support for fast and generic arrays, but rather to fix Array
's bounds check elision.
julia> immutable BadVector{T} <: AbstractVector{Int}
data::T
end
Base.size(X::BadVector) = size(X.data)
Base.getindex(X::BadVector, i::Int) = X.data[i-1]
julia> BadVector(1:3)[:] # Nonscalar indexing is implemented with `@inbounds` and this is safe!
ERROR: BoundsError: attempt to access 3-element UnitRange{Int64} at index [0]
Stacktrace:
[1] throw_boundserror(::UnitRange{Int64}, ::Int64) at ./abstractarray.jl:365
[2] getindex at ./range.jl:512 [inlined]
[3] getindex(::BadVector{UnitRange{Int64}}, ::Int64) at ./REPL[12]:5
[4] macro expansion at ./multidimensional.jl:414 [inlined]
[5] macro expansion at ./cartesian.jl:62 [inlined]
[6] macro expansion at ./multidimensional.jl:411 [inlined]
[7] _unsafe_getindex! at ./multidimensional.jl:406 [inlined]
[8] macro expansion at ./multidimensional.jl:400 [inlined]
[9] _unsafe_getindex(::Base.LinearSlow, ::BadVector{UnitRange{Int64}}, ::Base.Slice{Base.OneTo{Int64}}) at ./multidimensional.jl:393
[10] macro expansion at ./multidimensional.jl:382 [inlined]
[11] _getindex at ./multidimensional.jl:378 [inlined]
[12] getindex(::BadVector{UnitRange{Int64}}, ::Colon) at ./abstractarray.jl:793
julia> BadVector([1,2,3])[:] # But this isn't because it propagates too far
3-element Array{Int64,1}:
7955998441852384046
1
2
Thanks. So at least we know where we're heading to.
Then I'd say policy should anticipate on this, and we should use @inbounds
with any type as long as the indices are built in such a way that they will be valid if the AbstractArray
implementation is correct (e.g. when using eachindex
). Crashes will only happen if 1) the implementation is incorrect and 2) its author annotated @boundscheck
sections. Sounds like a strict enough contract to me.
I've been meaning to ask, since I've been writing some LinAlg
code lately, exactly how far to push this as well.
Sometimes I'd like to be able to remove multiple checks of array sizes. For example, *
might check that two input matrices have compatible sizes and allocate an outpu matrix, but the implementation of A_mul_B!
will check the sizes again. I could use @inbounds
and @boundscheck
to make this happen just once - but would that be an abuse?
(And yes, for e.g. 3x3 matrix multiplication, the overhead of unnecessary things in Base
can be significant.)
A sub-question (raised in #21402) is whether @inbounds
is OK when iterating over any type, or only for AbstractArray
. I would advocate for the former, which avoids duplicating methods just to add @inbounds
as in #21402.
So it looks like this just needs #20485 and maybe a review of the verbiage in the @inbounds
and @boundscheck
macros.
@mbauman So do you think it's OK to add @inbounds
in Base methods which apply to any iterable, or only to for working with AbstractArray
? Cf. #21402.
Ah, I missed that this issue arose from was also connected to a non-AbstractArray
use-case. I didn't explicitly answer that in the docstring, but yes, I think that we should be able to trust that an implementation is correct if they've explicitly added their own @boundscheck
annotations.
OK, thanks. So that means for #21402 I only need to add @inbounds
annotations to the general method rather than duplicating it for AbstractArray
.
(edit, just realized this issue was closed already!)
In #27384 I looked at the example of extrema
, which accepts a generic iterator. To me it seemed rather uncomfortable to add @inbounds
to a loop iterating over any iterator.
For iterators in general, my guess is that _almost always_ iteration can be guaranteed to go inbounds by the iterator itself*, at least for concrete implementations. That way we avoid placing awkward @inbounds
annotations in functions that have no idea what iterators they're dealing with -- rather have them in the iterator itself.
* there's obviously no guarantee for AbstractArray
, for various reasons: (1) the user can provide the iterator an invalid state, (2) the array is mutated during the iteration and (3) the array has a faulty, custom implementation. But:
(1) The iteration state of v::AbstractArray
is a tuple of the eachindex(v)
iterator and its state, so it takes some work to provide an invalid iteration state.
(2) This is no worse than https://github.com/JuliaLang/julia/issues/25743 where indexing of a view is known to be unsafe and a good solution that retains performance is just not in sight -- we just seem to live with that.
(3) This was countered above :)
My hope is that with the above policy, we could implement the AbstractArray
iterator as
function iterate(A::AbstractArray, state=(eachindex(A),))
y = iterate(state...)
y === nothing && return nothing
(@inbounds A[y[1]]), (state[1], tail(y)...)
end
so that we could limit the @inbounds
annotations when iterating over generic arrays as much as possible.
Most helpful comment
Thanks. So at least we know where we're heading to.
Then I'd say policy should anticipate on this, and we should use
@inbounds
with any type as long as the indices are built in such a way that they will be valid if theAbstractArray
implementation is correct (e.g. when usingeachindex
). Crashes will only happen if 1) the implementation is incorrect and 2) its author annotated@boundscheck
sections. Sounds like a strict enough contract to me.