This issue came about when I discovered that `sort!` on a small vector of integers was slower than usual; further inspection shows that `extrema` is now 65% slower for a vector of 500 integers, as a result of bounds checks in each iteration. I would not dare to fix this by adding `@inbounds` to the loop of `extrema`, since the function is defined for iterators in general (why is it `extrema`'s responsibility to signal that the internals of the iterator will do inbounds things?). I'd say an iterator over an array should guarantee to stay inbounds and not require its call site to provide `@inbounds`.
But the issue is quite subtle. The function `iterate(A::Array, ::Int)` does something that comes close to bounds checking:

```julia
function iterate(A::Array, i=1)
    @_propagate_inbounds_meta # why require the parent function to provide inbounds
    i >= length(A) + 1 ? nothing : (A[i], i+1) # when we here (kinda) assert we are inbounds?
end
```

but it's not entirely bounds checking, because in principle `i` could overflow, so to be safe an actual check is required to guarantee `i > 0`.
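To make the overflow point concrete: the full state domain (`1 <= i <= length(A)`) can be checked with a single unsigned comparison, so the lower bound need not cost an extra branch. This is a sketch only, under that assumption; `my_iterate` is a hypothetical stand-in, not Base's actual definition.

```julia
# Hypothetical sketch: checking the whole state domain (1 <= i <= length(A))
# with one comparison, by reinterpreting the index as unsigned.
# `my_iterate` is a made-up name, not the real Base.iterate.
function my_iterate(A::Array, i::Int=1)
    # (i - 1) % UInt is < length(A) only when 1 <= i <= length(A):
    # a negative (or overflowed) i wraps around to a huge unsigned value.
    (i - 1) % UInt < length(A) % UInt || return nothing
    return (@inbounds A[i]), i + 1
end
```

With this shape, the `@inbounds` lives inside the iterator itself, justified by the check on the line above it, rather than being demanded from every caller.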
~~The issue is a result of #27079 (more robust iteration over vectors), but in fact it cheats a bit by requiring the user to always provide `@inbounds`.~~ Edit (sorry!): without `@inbounds`, `for x in xs` would do bounds checking already.
Possible solutions:

- `for x in Mutable(xs)` that allows modifying `xs` on the go; `for x in xs` is then immutable iteration by convention and can be fast even without `@inbounds`.
- `extrema(::AbstractArray)` that uses `@inbounds` (meh)
- `@inbounds` in `extrema` and document that it does that
- `checkbounds(Bool, ::AbstractArray, ::Int)`, used rather than the `i >= length(A) + 1` condition, and somehow tell the compiler `i` cannot overflow, so that the `i > 0` check can be optimized away.

Edit: seems like the new iteration protocol is enough to fix things.
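The `checkbounds`-based option could look roughly like this. It uses the real `checkbounds(Bool, A, i)` from Base (which returns `true` iff `i` is a valid index, covering both `i < 1` and `i > length(A)` in one call), but `checked_iterate` is a hypothetical name for illustration:

```julia
# Hypothetical sketch of the checkbounds-based variant from the list above.
# checkbounds(Bool, A, i) does the full validity test in one call.
function checked_iterate(A::Array, i::Int=1)
    checkbounds(Bool, A, i) || return nothing
    return (@inbounds A[i]), i + 1
end
```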
This is quite closely related to https://github.com/JuliaLang/julia/pull/21402, but indeed things are different now that we only have one method — and don't need to assume that the caller did the start/next/done dance correctly.
> extrema is now 65% slower for a vector of 500 integers

Slower than v0.6.x?
65% slower on 0.7 when comparing bounds checking enabled vs disabled. Full report:

```julia
julia> @benchmark extrema(v) setup = (v = rand(Int, 1000))
```

- 884.1 ns on 0.6.2
- 1.222 μs on 0.7.0-alpha.0
- 422.8 ns with the proposed fix of #27386 (i.e. no bounds checking)
I have to agree with @haampie's general point here... generic code should be able to be just as fast as concretely-typed code, and the interfaces available for use by functions like `extrema(::Any)` should be clear, fast and safe. (IMO we should try to avoid as much as possible the situation where a concrete version of a perfectly valid generic method is added, with extra `@inbounds` added for speed.)
Have we discussed having an `unsafe_iterate` (used when lowering `for` loops) with a default fallback to `iterate`, which was something already suggested for `unsafe_next` in #15291 (and before)? The alternative seems to be asserting that all implementations of `@inbounds iterate(...)` shouldn't be able to crash if the user doesn't mess with the iteration state (and have lowering do this for us?). Tricky, tricky...
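For concreteness, the `unsafe_iterate`-with-fallback idea could be sketched as follows. Both names and method bodies are hypothetical; nothing like this exists in Base:

```julia
# Hypothetical sketch of an unsafe_iterate protocol with a safe fallback.
# Generic iterators work unchanged, with full checks:
unsafe_iterate(itr, state...) = iterate(itr, state...)

# Arrays opt in: only the upper bound is checked, trusting the caller
# (i.e. the lowered `for` loop) never to pass a negative or stale state.
unsafe_iterate(A::Array, i::Int=1) =
    i > length(A) ? nothing : ((@inbounds A[i]), i + 1)
```

The appeal is that only lowering would call `unsafe_iterate`, so the trust boundary stays inside the compiler; the cost is a second, parallel protocol that implementors can get wrong.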
I think we can do this without an `unsafe_iterate`. What would that gain us beyond Harmen's proposal in https://github.com/JuliaLang/julia/issues/20469#issuecomment-396083994?
I guess the idea of `unsafe_iterate` is that `for x in X` would be lowered to it, which would allow getting rid of bounds checks? But maybe we can achieve the same result by lowering `for x in X` to `@inbounds iterate(...)`?
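For reference, under the new protocol `for x in X` already lowers to plain `iterate` calls, so either proposal amounts to swapping the callee (or adding `@inbounds`) in this expansion, written out here by hand:

```julia
# Hand-written equivalent of what `for x in X; body; end` lowers to.
X = [10, 20, 30]
acc = Int[]                    # collects the loop body's effect, for illustration
next = iterate(X)
while next !== nothing
    (x, state) = next
    push!(acc, x)              # the loop body
    next = iterate(X, state)
end
```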
But doesn't the new iteration protocol give the iteration implementors enough information that they can safely add the `@inbounds` themselves without any additional context from the caller? No longer do we have to trust that the caller has called `done` — `iterate` itself does that check. The only thing it doesn't protect us from is someone manually constructing and passing an invalid state that happens to pass the check but is still out of bounds.
The PR over at #27386 demonstrates (or hopefully will demonstrate once Nanosoldier is feeling better) that it's possible to check the entire domain of the state without incurring a performance penalty.
Now, it's harder (impossible?) to check the entire domain with abstract iterator implementations, but https://github.com/JuliaLang/julia/issues/20469#issuecomment-396083994 is arguing that the iterator state is sufficiently complicated that it would require the caller to specifically craft an invalid state such that it passes the "done" check but still yields an out of bounds access. I was initially inclined to agree, but as I was writing this it dawned on me that the easiest way to mess this up would be by iterating over two arrays of different lengths at the same time and inadvertently mixing up the states. Is there another approach we could take here?
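The mixed-state hazard needs no malicious crafting — just two loops over arrays of different lengths and a swapped variable. A sketch of the footgun, which is harmless today only because `iterate` still bounds-checks:

```julia
# Iterating two arrays of different lengths and mixing up the states.
a = [1, 2, 3, 4, 5]
b = [10, 20]

(xa, sa) = iterate(a)          # sa == 2
(xb, sb) = iterate(b)          # sb == 2
(xa, sa) = iterate(a, sa)      # sa == 3
(xa, sa) = iterate(a, sa)      # sa == 4

# Accidentally feeding a's state to b: with the check in place this
# returns nothing, but with @inbounds baked into iterate it would
# read past the end of b.
iterate(b, sa)
```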
I'd really love it if we could avoid either kludge (that is, either `unsafe_iterate` or `@inbounds iterate`).
> inadvertently mixing up the states

I did not think of that, and it looks pretty hard to avoid, unfortunately.
> The only thing it doesn't protect us from is someone manually constructing and passing an invalid state that happens to pass the check but is still out of bounds.

Exactly. AFAICT the previous policy is that if no one calls something called `unsafe_xxx`, uses `ccall`, or adds any `@inbounds`, Julia will basically never crash (i.e. segfault).
I am, however, happy to agree to ditch that policy - but it's worth discussing.
(To be honest, the fact that `done` and `next` are now together is only a minor mitigation of the fact that users can mess iteration up, since as mentioned they can still pass whatever state they want.)
> I'd really love it if we could avoid either kludge (that is, either `unsafe_iterate` or `@inbounds iterate`).
Rather silly idea: if we simply renamed `iterate` to `unsafe_iterate` then we are basically done (ugly, I know, but brutally honest).