Julia: Julep: Revised iteration protocol

Created on 7 Oct 2016 · 32Comments · Source: JuliaLang/julia

We discussed at JuliaCon what to do about #8149. I'm opening this issue to capture the specific design proposed at JuliaCon (and so that others can make corrections about things I misremembered).

The new iteration protocol

In the new iteration protocol, there's only one function to be overloaded:

step(it, state=...)::Nullable

with for loops getting lowered as:

for i in it
    #= loop body =#
end

equivalent to

x = step(it)
while !isnull(x)
    i, state = unsafe_get(x)
    #= loop body =#
    x = step(it, state)
end

Implementation

This takes care of iterators that don't know whether they're done until trying (just return Nullable{typeof(state}()). One concern with this approach (and approaches like it) was whether there would be an extra allocation and branch, which would be an unacceptable performance regression. (Added in EDIT by @timholy: https://github.com/JuliaLang/julia/issues/16035#issue-150794573 and #9080.) The conclusion we came to was that the compiler should be able to optimize this out. To make sure that this happens, we may want to use a slightly different lowering, such as:

1:
x = step(it)
isnull(x) ? goto 4 : goto 3
2:
x = step(it, state)
isnull(x) ? goto 4 : goto 3
3:
#= loop body =#
goto 2:
4:
#= rest of function =#

The motivation for structuring things that way, is that simple step functions where performance really matters probably look something like:

function step(it, state)
if state > length(it)
Nullable{Tuple{T,Int}}()
else
Nullable{Tuple{T,Int}}((it[state], state+1))
end
end

which after inlining, should make this a fairly straightforward jump threading optimization (if the optimization doesn't happen we should figure out why and fix that).

Migration

Another nice property of this proposal is that it's backwards compatible with the existing scheme. The fallback method is simply:

@inline function step(it,state=start(it))
done(it) ? Nullable{Tuple{Any,typeof(state)}}() : Nullable{Tuple{Any, typeof(state)}}(next(it, state))
end

Now, the Nullable{Any,} in there is not ideal, but forced inlining together with allocation elision should be able to handle that just fine (otherwise we could also do Nullable{eltype(it),...}, which should work also, but has slightly greater risk of breaking).

cc @timholy @JeffBezanson - Please add anything I missed

breaking design julep

Source

Keno

👍9 ❤2

Most helpful comment

And, looks like I just volunteered to do that :)

JeffBezanson on 27 Jul 2017

🎉5 😄1

All 32 comments

👍

I think your conditions in your lowering should be

isnull(x) ? goto 4 : goto 3

kmsquire on 7 Oct 2016

Yes, you're right. Fixing.

Keno on 7 Oct 2016

Needing to write the type of the state that would be returned is a slight problem. Returning Nullable() could work ok, since the result of get(::Union{Nullable{Int},Nullable{Union{}}}) is easy to infer, but this still adds an API issue of whether the parameter of the Nullable is reliable.

JeffBezanson on 7 Oct 2016

but this still adds an API issue of whether the parameter of the Nullable is reliable.

Since the nullable is not really visible to users of the API, maybe the answer is no?

Keno on 7 Oct 2016

Though if we're willing to go the type unstable way, we could just use the empty tuple to indicate the end and avoid having to declare type parameters entirely

Keno on 7 Oct 2016

A small question/comment: I think Nullable(it[state], state+1) is supposed to be Nullable((it[state], state+1)) and Nullable{Any,typeof(state)}() is supposed to be Nullable{Tuple{Any,typeof(state)}}(), right?

As regards type stability, you can always declare the method's return type via ::Nullable{Tuple{T, Int}}. This allows return Nullable() to do the conversion automatically. Also, you don't need to give the type parameters in Nullable((it[state], state+1)). So overall you only have to declare them once (in the signature).

Finally, @davidagold recently added an unsafe_get function which skips the isnull check when you've already done it manually. Could be useful if the compiler isn't able to skip the redundant second check (I haven't verified whether that's the case in your example).

nalimilan on 7 Oct 2016

Corrections made. Yes, the return type annotation is a good point. But still, having to spell out that type is annoying.

Keno on 7 Oct 2016

Upon further discussion, it seems that we may be ok with having step be fundamentally type unstable and improving the compiler to the point where we don't feel the overhead. In that case,
the step function would return Union{Void,Tuple{Element,State}}. One of the major considerations there is the ease of actually writing these functions. Having to explicitly write the element types is very onerous.

That proposal would work especially nicely with the proposed syntax for a &&-alike that returns nothing in the else case. Examples:

step(a::Vector, i=0) = i < length(a) ?? (a[i+1], i+1)

step(a::Range1) = (first(a),first(a))
step(a::Range1, i) = (i!=last(a)) ?? (i+1,i+1)

As a side note, one interesting aspect of this proposal is that the state now is the index of the last access element rather than the state of the next element as it was with the old iteration protocol.

Keno on 13 Oct 2016

Advantage: never having to return invalid state.

StefanKarpinski on 13 Oct 2016

Thanks for writing this up! Would we need Any for Base.HasEltype iterators?

timholy on 13 Oct 2016

Upon further discussion, it seems that we may be ok with having step be fundamentally type unstable and improving the compiler to the point where we don't feel the overhead.

Fast enough not to hinder vectorization? For example, things like this use SIMD instructions currently, and it would be too bad to lose this:

function mysum(X::AbstractArray)
   s = zero(eltype(X))
   @inbounds for x in X
       s += x
   end
   s
 end

nalimilan on 13 Oct 2016

FWIW another approach to avoid the need for explicit return-type declarations would be to have a function-wise declaration which would apply to all methods:

function step{T}(itr, state::T)::Nullable{Tuple{eltype(itr), T}} end

Cf. @StefanKarpinski's recent post.

nalimilan on 13 Oct 2016

That doesn't work because the state may change type.

Keno on 13 Oct 2016

cross-referencing the related https://github.com/JuliaLang/julia/issues/16878

tkelman on 26 Oct 2016

Unlikely to happen by end-of-year feature freeze?

tkelman on 15 Dec 2016

I think sometime shortly after v0.6 is branched we can go ahead and implement this Julep:

julia> @inline iterate(x) = next(x, start(x))
iterate (generic function with 1 method)

julia> @inline iterate(x, i) = done(x, i) ? done : next(x, i)
iterate (generic function with 2 methods)

Where the lowering of a for-loop would look like:

let state = iterate(x)
    while !(state === done)
        body(state[1])
        state = iterate(x, state[2])
    end
end

The codegen for this is already looking pretty good.

vtjnash on 6 Apr 2017

That's awesome. I noted a couple of days ago that #9080 also seems less severe (but still not perfect), and specifically that the test case in https://github.com/JuliaLang/julia/issues/16035#issuecomment-214482039 seems fixed. (With recent PRs, the situation might be even better still.)

timholy on 6 Apr 2017

Returning the done function is not a good way to express this, but it's great that the new union code gen can handle this efficiently. I think this is a case where we need to use a Nullable not for performance, but because it's the only way that iteration can return any possible value. It would be very strange if the array collect([start, done, next]) == [start].

StefanKarpinski on 7 Apr 2017

This handles that case just fine (return values of done or Pair{state, value}), while wrapping it in a Nullable wouldn't actually work (we can't infer it as well and would usually generate worse code). I agree that punning on the done singleton isn't necessary. It may be clearer to make a new explicit type.

vtjnash on 7 Apr 2017

Right, the terminal value is a state object, not a yielded object. So I guess this pattern would imply that the done object can never be used as a state value, but that's an acceptable restriction.

StefanKarpinski on 7 Apr 2017

No, it would imply that the done instance can never be a Tuple / Pair. But that's trivial.

vtjnash on 7 Apr 2017

Might as well just use nothing then.

StefanKarpinski on 7 Apr 2017

There was some debate over whether we should distinguish the nothingness. For example, so that typos like iterate(x::Ref, state=nothing) = @inbounds x[] throw errors instead of assuming the return nothing meant stop iteration.

vtjnash on 7 Apr 2017

Would it make sense to use a custom type given the specific use case (ie: IterableState vs Nullable and isdone(i::IterableState) vs isnull(i::Nullable))?

rofinn on 21 Apr 2017

There is no IterableState or Nullable type in the above proposal – they would just get in the way of inference and slow it down. We might make an explicit sentinel value for done however (rather than punning on done as I did in my example above)

vtjnash on 21 Apr 2017

Why not just use nothing since any valid value would have to be a tuple, which nothing is not?

StefanKarpinski on 21 Apr 2017

nothing would be my default choice, but there is some worry about returning it accidentally.

JeffBezanson on 21 Apr 2017

To test this design I'd like to see implementations of the product, flatten, and filter iterators --- those being some of the trickiest and most important. Ideally we could get nicer code and better performance.

JeffBezanson on 27 Jul 2017

And, looks like I just volunteered to do that :)

JeffBezanson on 27 Jul 2017

🎉5 😄1

I tried this, and ran into a current show-stopping optimization miss. It appears we will need some way to elide allocation of the tuple in types like: Union{Tuple{String, Int64}, Void} to make this feasible. Is that going to be possible?

vtjnash on 3 Aug 2017

Sorry to reopen this issue: Could one of the core developers post the appropriate if VERSION >= ... statement to be used in libraries that want to support both the old and new iteration protocol.