Julia: Julep: Revised iteration protocol

Created on 7 Oct 2016  ·  32Comments  ·  Source: JuliaLang/julia

We discussed at JuliaCon what to do about #8149. I'm opening this issue to capture the specific design proposed at JuliaCon (and so that others can make corrections about things I misremembered).

The new iteration protocol

In the new iteration protocol, there's only one function to be overloaded:

step(it, state=...)::Nullable

with for loops getting lowered as:

for i in it
    #= loop body =#
end

equivalent to

x = step(it)
while !isnull(x)
    i, state = unsafe_get(x)
    #= loop body =#
    x = step(it, state)
end

Implementation

This takes care of iterators that don't know whether they're done until trying (just return Nullable{typeof(state}()). One concern with this approach (and approaches like it) was whether there would be an extra allocation and branch, which would be an unacceptable performance regression. (Added in EDIT by @timholy: https://github.com/JuliaLang/julia/issues/16035#issue-150794573 and #9080.) The conclusion we came to was that the compiler should be able to optimize this out. To make sure that this happens, we may want to use a slightly different lowering, such as:

1:
x = step(it)
isnull(x) ? goto 4 : goto 3
2:
x = step(it, state)
isnull(x) ? goto 4 : goto 3
3:
#= loop body =#
goto 2:
4:
#= rest of function =#

The motivation for structuring things that way, is that simple step functions where performance really matters probably look something like:

function step(it, state)
if state > length(it)
Nullable{Tuple{T,Int}}()
else
Nullable{Tuple{T,Int}}((it[state], state+1))
end
end

which after inlining, should make this a fairly straightforward jump threading optimization (if the optimization doesn't happen we should figure out why and fix that).

Migration

Another nice property of this proposal is that it's backwards compatible with the existing scheme. The fallback method is simply:

@inline function step(it,state=start(it))
done(it) ? Nullable{Tuple{Any,typeof(state)}}() : Nullable{Tuple{Any, typeof(state)}}(next(it, state))
end

Now, the Nullable{Any,} in there is not ideal, but forced inlining together with allocation elision should be able to handle that just fine (otherwise we could also do Nullable{eltype(it),...}, which should work also, but has slightly greater risk of breaking).

cc @timholy @JeffBezanson - Please add anything I missed

breaking design julep

Most helpful comment

And, looks like I just volunteered to do that :)

All 32 comments

👍

I think your conditions in your lowering should be

isnull(x) ? goto 4 : goto 3

Yes, you're right. Fixing.

Needing to write the type of the state that would be returned is a slight problem. Returning Nullable() could work ok, since the result of get(::Union{Nullable{Int},Nullable{Union{}}}) is easy to infer, but this still adds an API issue of whether the parameter of the Nullable is reliable.

but this still adds an API issue of whether the parameter of the Nullable is reliable.

Since the nullable is not really visible to users of the API, maybe the answer is no?

Though if we're willing to go the type unstable way, we could just use the empty tuple to indicate the end and avoid having to declare type parameters entirely

A small question/comment: I think Nullable(it[state], state+1) is supposed to be Nullable((it[state], state+1)) and Nullable{Any,typeof(state)}() is supposed to be Nullable{Tuple{Any,typeof(state)}}(), right?

As regards type stability, you can always declare the method's return type via ::Nullable{Tuple{T, Int}}. This allows return Nullable() to do the conversion automatically. Also, you don't need to give the type parameters in Nullable((it[state], state+1)). So overall you only have to declare them once (in the signature).

Finally, @davidagold recently added an unsafe_get function which skips the isnull check when you've already done it manually. Could be useful if the compiler isn't able to skip the redundant second check (I haven't verified whether that's the case in your example).

Corrections made. Yes, the return type annotation is a good point. But still, having to spell out that type is annoying.

Upon further discussion, it seems that we may be ok with having step be fundamentally type unstable and improving the compiler to the point where we don't feel the overhead. In that case,
the step function would return Union{Void,Tuple{Element,State}}. One of the major considerations there is the ease of actually writing these functions. Having to explicitly write the element types is very onerous.

That proposal would work especially nicely with the proposed syntax for a &&-alike that returns nothing in the else case. Examples:

step(a::Vector, i=0) = i < length(a) ?? (a[i+1], i+1)

step(a::Range1) = (first(a),first(a))
step(a::Range1, i) = (i!=last(a)) ?? (i+1,i+1)

As a side note, one interesting aspect of this proposal is that the state now is the index of the last access element rather than the state of the next element as it was with the old iteration protocol.

Advantage: never having to return invalid state.

Thanks for writing this up! Would we need Any for Base.HasEltype iterators?

Upon further discussion, it seems that we may be ok with having step be fundamentally type unstable and improving the compiler to the point where we don't feel the overhead.

Fast enough not to hinder vectorization? For example, things like this use SIMD instructions currently, and it would be too bad to lose this:

function mysum(X::AbstractArray)
   s = zero(eltype(X))
   @inbounds for x in X
       s += x
   end
   s
 end

FWIW another approach to avoid the need for explicit return-type declarations would be to have a function-wise declaration which would apply to all methods:

function step{T}(itr, state::T)::Nullable{Tuple{eltype(itr), T}} end

Cf. @StefanKarpinski's recent post.

That doesn't work because the state may change type.

cross-referencing the related https://github.com/JuliaLang/julia/issues/16878

Unlikely to happen by end-of-year feature freeze?

I think sometime shortly after v0.6 is branched we can go ahead and implement this Julep:

julia> @inline iterate(x) = next(x, start(x))
iterate (generic function with 1 method)

julia> @inline iterate(x, i) = done(x, i) ? done : next(x, i)
iterate (generic function with 2 methods)

Where the lowering of a for-loop would look like:

let state = iterate(x)
    while !(state === done)
        body(state[1])
        state = iterate(x, state[2])
    end
end

The codegen for this is already looking pretty good.

That's awesome. I noted a couple of days ago that #9080 also seems less severe (but still not perfect), and specifically that the test case in https://github.com/JuliaLang/julia/issues/16035#issuecomment-214482039 seems fixed. (With recent PRs, the situation might be even better still.)

Returning the done function is not a good way to express this, but it's great that the new union code gen can handle this efficiently. I think this is a case where we need to use a Nullable not for performance, but because it's the only way that iteration can return any possible value. It would be very strange if the array collect([start, done, next]) == [start].

This handles that case just fine (return values of done or Pair{state, value}), while wrapping it in a Nullable wouldn't actually work (we can't infer it as well and would usually generate worse code). I agree that punning on the done singleton isn't necessary. It may be clearer to make a new explicit type.

Right, the terminal value is a state object, not a yielded object. So I guess this pattern would imply that the done object can never be used as a state value, but that's an acceptable restriction.

No, it would imply that the done instance can never be a Tuple / Pair. But that's trivial.

Might as well just use nothing then.

There was some debate over whether we should distinguish the nothingness. For example, so that typos like iterate(x::Ref, state=nothing) = @inbounds x[] throw errors instead of assuming the return nothing meant stop iteration.

Would it make sense to use a custom type given the specific use case (ie: IterableState vs Nullable and isdone(i::IterableState) vs isnull(i::Nullable))?

There is no IterableState or Nullable type in the above proposal – they would just get in the way of inference and slow it down. We might make an explicit sentinel value for done however (rather than punning on done as I did in my example above)

Why not just use nothing since any valid value would have to be a tuple, which nothing is not?

nothing would be my default choice, but there is some worry about returning it accidentally.

To test this design I'd like to see implementations of the product, flatten, and filter iterators --- those being some of the trickiest and most important. Ideally we could get nicer code and better performance.

And, looks like I just volunteered to do that :)

I tried this, and ran into a current show-stopping optimization miss. It appears we will need some way to elide allocation of the tuple in types like: Union{Tuple{String, Int64}, Void} to make this feasible. Is that going to be possible?

Sorry to reopen this issue: Could one of the core developers post the appropriate if VERSION >= ... statement to be used in libraries that want to support both the old and new iteration protocol.

You can use contrib/commit-name.sh to find the version number for a commit hash. In this case, the iteration protocol merge is 0.7.0-DEV.5126.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

wilburtownsend picture wilburtownsend  ·  3Comments

Keno picture Keno  ·  3Comments

omus picture omus  ·  3Comments

musm picture musm  ·  3Comments

manor picture manor  ·  3Comments