Julia: Policy regarding @inbounds annotations

Created on 6 Feb 2017 · 9Comments · Source: JuliaLang/julia

This issue was raised at https://github.com/JuliaLang/julia/pull/20421#discussion_r99267449. We should define clear rules regarding when it's OK to use @inbounds in Base code. Currently the code seems to be inconsistent.

@JeffBezanson mentioned this: "only use @inbounds when you can be certain, from local information, that all accesses are in bounds." In a strict interpretation, that would mean @inbounds for i in eachindex(a) is not correct when a::AbstractArray, since an incorrect array implementation could return invalid indices, which would crash Julia.

This strict interpretation is problematic since (as @stevengj noted) often a generic AbstractArray method is also used for Array: if we cannot use @inbounds there, performance suffers in the common case just to avoid possible crashes in rare cases. A mechanism to enable @inbounds only for trusted types could help, but it would still be too bad that custom array types wouldn't benefit from bounds checking removal even when they implemented everything correctly: that would defeat the goal of making user-defined type as efficient as Base types. So I would say we need to trust custom array types, or at least allow them to opt out of bounds checking by stating that they are safe. Cf. https://github.com/JuliaLang/julia/pull/15291.

arrays performance

Source

nalimilan

Most helpful comment

Thanks. So at least we know where we're heading to.

Then I'd say policy should anticipate on this, and we should use @inbounds with any type as long as the indices are built in such a way that they will be valid if the AbstractArray implementation is correct (e.g. when using eachindex). Crashes will only happen if 1) the implementation is incorrect and 2) its author annotated @boundscheck sections. Sounds like a strict enough contract to me.

nalimilan on 6 Feb 2017

👍5

All 9 comments

The only problem here is that @inbounds has two implementations right now:

There's the new swanky generic version. This version _only_ affects AbstractArray implementations that have explicitly opted into bounds check removal with @boundscheck blocks. It only goes down one level of inlining. This is perfect — it allows for us to safely elide bounds checks in generic code and it'll only affect arrays that expect it.
But… Array still uses the old version of bounds check removal, implemented in codegen. This version affects all levels of inlining. It's leaky.

This means that custom array types are safe… unless they depend upon Array's bounds checking. The solution here isn't to rip out all support for fast and generic arrays, but rather to fix Array's bounds check elision.

julia> immutable BadVector{T} <: AbstractVector{Int}
           data::T
       end
       Base.size(X::BadVector) = size(X.data)
       Base.getindex(X::BadVector, i::Int) = X.data[i-1]

julia> BadVector(1:3)[:] # Nonscalar indexing is implemented with `@inbounds` and this is safe!
ERROR: BoundsError: attempt to access 3-element UnitRange{Int64} at index [0]
Stacktrace:
 [1] throw_boundserror(::UnitRange{Int64}, ::Int64) at ./abstractarray.jl:365
 [2] getindex at ./range.jl:512 [inlined]
 [3] getindex(::BadVector{UnitRange{Int64}}, ::Int64) at ./REPL[12]:5
 [4] macro expansion at ./multidimensional.jl:414 [inlined]
 [5] macro expansion at ./cartesian.jl:62 [inlined]
 [6] macro expansion at ./multidimensional.jl:411 [inlined]
 [7] _unsafe_getindex! at ./multidimensional.jl:406 [inlined]
 [8] macro expansion at ./multidimensional.jl:400 [inlined]
 [9] _unsafe_getindex(::Base.LinearSlow, ::BadVector{UnitRange{Int64}}, ::Base.Slice{Base.OneTo{Int64}}) at ./multidimensional.jl:393
 [10] macro expansion at ./multidimensional.jl:382 [inlined]
 [11] _getindex at ./multidimensional.jl:378 [inlined]
 [12] getindex(::BadVector{UnitRange{Int64}}, ::Colon) at ./abstractarray.jl:793

julia> BadVector([1,2,3])[:] # But this isn't because it propagates too far
3-element Array{Int64,1}:
 7955998441852384046
                   1
                   2

mbauman on 6 Feb 2017

👍1

Thanks. So at least we know where we're heading to.

nalimilan on 6 Feb 2017

👍5

I've been meaning to ask, since I've been writing some LinAlg code lately, exactly how far to push this as well.

Sometimes I'd like to be able to remove multiple checks of array sizes. For example, * might check that two input matrices have compatible sizes and allocate an outpu matrix, but the implementation of A_mul_B! will check the sizes again. I could use @inbounds and @boundscheck to make this happen just once - but would that be an abuse?

(And yes, for e.g. 3x3 matrix multiplication, the overhead of unnecessary things in Base can be significant.)

andyferris on 7 Feb 2017

A sub-question (raised in #21402) is whether @inbounds is OK when iterating over any type, or only for AbstractArray. I would advocate for the former, which avoids duplicating methods just to add @inbounds as in #21402.

nalimilan on 3 May 2017

So it looks like this just needs #20485 and maybe a review of the verbiage in the @inbounds and @boundscheck macros.

mbauman on 30 Aug 2017

@mbauman So do you think it's OK to add @inbounds in Base methods which apply to any iterable, or only to for working with AbstractArray? Cf. #21402.

nalimilan on 7 Sep 2017

Ah, I missed that this issue ~~arose from~~ was also connected to a non-AbstractArray use-case. I didn't explicitly answer that in the docstring, but yes, I think that we should be able to trust that an implementation is correct if they've explicitly added their own @boundscheck annotations.

mbauman on 7 Sep 2017

OK, thanks. So that means for #21402 I only need to add @inbounds annotations to the general method rather than duplicating it for AbstractArray.

nalimilan on 7 Sep 2017

(edit, just realized this issue was closed already!)

In #27384 I looked at the example of extrema, which accepts a generic iterator. To me it seemed rather uncomfortable to add @inbounds to a loop iterating over any iterator.

For iterators in general, my guess is that _almost always_ iteration can be guaranteed to go inbounds by the iterator itself*, at least for concrete implementations. That way we avoid placing awkward @inbounds annotations in functions that have no idea what iterators they're dealing with -- rather have them in the iterator itself.

* there's obviously no guarantee for AbstractArray, for various reasons: (1) the user can provide the iterator an invalid state, (2) the array is mutated during the iteration and (3) the array has a faulty, custom implementation. But:

(1) The iteration state of v::AbstractArray is a tuple of the eachindex(v) iterator and its state, so it takes some work to provide an invalid iteration state.
(2) This is no worse than https://github.com/JuliaLang/julia/issues/25743 where indexing of a view is known to be unsafe and a good solution that retains performance is just not in sight -- we just seem to live with that.
(3) This was countered above :)

My hope is that with the above policy, we could implement the AbstractArray iterator as

function iterate(A::AbstractArray, state=(eachindex(A),))
    y = iterate(state...)
    y === nothing && return nothing
    (@inbounds A[y[1]]), (state[1], tail(y)...)
end

so that we could limit the @inbounds annotations when iterating over generic arrays as much as possible.