We discussed at JuliaCon what to do about #8149. I'm opening this issue to capture the specific design proposed at JuliaCon (and so that others can make corrections about things I misremembered).
In the new iteration protocol, there's only one function to be overloaded:
step(it, state=...)::Nullable
with for loops getting lowered as:
for i in it
#= loop body =#
end
equivalent to
x = step(it)
while !isnull(x)
i, state = unsafe_get(x)
#= loop body =#
x = step(it, state)
end
This takes care of iterators that don't know whether they're done until trying (just return Nullable{typeof(state}()
). One concern with this approach (and approaches like it) was whether there would be an extra allocation and branch, which would be an unacceptable performance regression. (Added in EDIT by @timholy: https://github.com/JuliaLang/julia/issues/16035#issue-150794573 and #9080.) The conclusion we came to was that the compiler should be able to optimize this out. To make sure that this happens, we may want to use a slightly different lowering, such as:
1:
x = step(it)
isnull(x) ? goto 4 : goto 3
2:
x = step(it, state)
isnull(x) ? goto 4 : goto 3
3:
#= loop body =#
goto 2:
4:
#= rest of function =#
The motivation for structuring things that way, is that simple step functions where performance really matters probably look something like:
function step(it, state)
if state > length(it)
Nullable{Tuple{T,Int}}()
else
Nullable{Tuple{T,Int}}((it[state], state+1))
end
end
which after inlining, should make this a fairly straightforward jump threading optimization (if the optimization doesn't happen we should figure out why and fix that).
Another nice property of this proposal is that it's backwards compatible with the existing scheme. The fallback method is simply:
@inline function step(it,state=start(it))
done(it) ? Nullable{Tuple{Any,typeof(state)}}() : Nullable{Tuple{Any, typeof(state)}}(next(it, state))
end
Now, the Nullable{Any,}
in there is not ideal, but forced inlining together with allocation elision should be able to handle that just fine (otherwise we could also do Nullable{eltype(it),...}
, which should work also, but has slightly greater risk of breaking).
cc @timholy @JeffBezanson - Please add anything I missed
👍
I think your conditions in your lowering should be
isnull(x) ? goto 4 : goto 3
Yes, you're right. Fixing.
Needing to write the type of the state that would be returned is a slight problem. Returning Nullable()
could work ok, since the result of get(::Union{Nullable{Int},Nullable{Union{}}})
is easy to infer, but this still adds an API issue of whether the parameter of the Nullable
is reliable.
but this still adds an API issue of whether the parameter of the Nullable is reliable.
Since the nullable is not really visible to users of the API, maybe the answer is no?
Though if we're willing to go the type unstable way, we could just use the empty tuple to indicate the end and avoid having to declare type parameters entirely
A small question/comment: I think Nullable(it[state], state+1)
is supposed to be Nullable((it[state], state+1))
and Nullable{Any,typeof(state)}()
is supposed to be Nullable{Tuple{Any,typeof(state)}}()
, right?
As regards type stability, you can always declare the method's return type via ::Nullable{Tuple{T, Int}}
. This allows return Nullable()
to do the conversion automatically. Also, you don't need to give the type parameters in Nullable((it[state], state+1))
. So overall you only have to declare them once (in the signature).
Finally, @davidagold recently added an unsafe_get
function which skips the isnull
check when you've already done it manually. Could be useful if the compiler isn't able to skip the redundant second check (I haven't verified whether that's the case in your example).
Corrections made. Yes, the return type annotation is a good point. But still, having to spell out that type is annoying.
Upon further discussion, it seems that we may be ok with having step
be fundamentally type unstable and improving the compiler to the point where we don't feel the overhead. In that case,
the step function would return Union{Void,Tuple{Element,State}}
. One of the major considerations there is the ease of actually writing these functions. Having to explicitly write the element types is very onerous.
That proposal would work especially nicely with the proposed syntax for a &&
-alike that returns nothing in the else case. Examples:
step(a::Vector, i=0) = i < length(a) ?? (a[i+1], i+1)
step(a::Range1) = (first(a),first(a))
step(a::Range1, i) = (i!=last(a)) ?? (i+1,i+1)
As a side note, one interesting aspect of this proposal is that the state now is the index of the last access element rather than the state of the next element as it was with the old iteration protocol.
Advantage: never having to return invalid state.
Thanks for writing this up! Would we need Any
for Base.HasEltype
iterators?
Upon further discussion, it seems that we may be ok with having step be fundamentally type unstable and improving the compiler to the point where we don't feel the overhead.
Fast enough not to hinder vectorization? For example, things like this use SIMD instructions currently, and it would be too bad to lose this:
function mysum(X::AbstractArray)
s = zero(eltype(X))
@inbounds for x in X
s += x
end
s
end
FWIW another approach to avoid the need for explicit return-type declarations would be to have a function-wise declaration which would apply to all methods:
function step{T}(itr, state::T)::Nullable{Tuple{eltype(itr), T}} end
Cf. @StefanKarpinski's recent post.
That doesn't work because the state may change type.
cross-referencing the related https://github.com/JuliaLang/julia/issues/16878
Unlikely to happen by end-of-year feature freeze?
I think sometime shortly after v0.6 is branched we can go ahead and implement this Julep:
julia> @inline iterate(x) = next(x, start(x))
iterate (generic function with 1 method)
julia> @inline iterate(x, i) = done(x, i) ? done : next(x, i)
iterate (generic function with 2 methods)
Where the lowering of a for-loop would look like:
let state = iterate(x)
while !(state === done)
body(state[1])
state = iterate(x, state[2])
end
end
The codegen for this is already looking pretty good.
That's awesome. I noted a couple of days ago that #9080 also seems less severe (but still not perfect), and specifically that the test case in https://github.com/JuliaLang/julia/issues/16035#issuecomment-214482039 seems fixed. (With recent PRs, the situation might be even better still.)
Returning the done
function is not a good way to express this, but it's great that the new union code gen can handle this efficiently. I think this is a case where we need to use a Nullable not for performance, but because it's the only way that iteration can return any possible value. It would be very strange if the array collect([start, done, next]) == [start]
.
This handles that case just fine (return values of done
or Pair{state, value}
), while wrapping it in a Nullable wouldn't actually work (we can't infer it as well and would usually generate worse code). I agree that punning on the done
singleton isn't necessary. It may be clearer to make a new explicit type.
Right, the terminal value is a state object, not a yielded object. So I guess this pattern would imply that the done
object can never be used as a state value, but that's an acceptable restriction.
No, it would imply that the done
instance can never be a Tuple / Pair. But that's trivial.
Might as well just use nothing
then.
There was some debate over whether we should distinguish the nothingness. For example, so that typos like iterate(x::Ref, state=nothing) = @inbounds x[]
throw errors instead of assuming the return nothing
meant stop iteration.
Would it make sense to use a custom type given the specific use case (ie: IterableState
vs Nullable
and isdone(i::IterableState)
vs isnull(i::Nullable)
)?
There is no IterableState or Nullable type in the above proposal – they would just get in the way of inference and slow it down. We might make an explicit sentinel value for done however (rather than punning on done
as I did in my example above)
Why not just use nothing
since any valid value would have to be a tuple, which nothing
is not?
nothing
would be my default choice, but there is some worry about returning it accidentally.
To test this design I'd like to see implementations of the product, flatten, and filter iterators --- those being some of the trickiest and most important. Ideally we could get nicer code and better performance.
And, looks like I just volunteered to do that :)
I tried this, and ran into a current show-stopping optimization miss. It appears we will need some way to elide allocation of the tuple in types like: Union{Tuple{String, Int64}, Void}
to make this feasible. Is that going to be possible?
Sorry to reopen this issue: Could one of the core developers post the appropriate if VERSION >= ...
statement to be used in libraries that want to support both the old and new iteration protocol.
You can use contrib/commit-name.sh
to find the version number for a commit hash. In this case, the iteration protocol merge is 0.7.0-DEV.5126
.
Most helpful comment
And, looks like I just volunteered to do that :)