Julia: generators should lower to `Iterators.map`, not `Base.Generator`

Created on 14 Jan 2020  路  11Comments  路  Source: JuliaLang/julia

This will allow dispatching to types that aren't always Base.Generator

triage

Most helpful comment

Here is another argument: If you want to define a special method for (f(x) for x in xs), it is kind of possible to do this with dispatching on Generator{<:MyContainer}. However, it does not scale well because Generator can be nested; e.g., dispatching on (g(y) for y in (f(x) for x in xs)) requires Generator{<:Generator{<:MyContainer}}. Lowering to Iterators.map helps because it lets you use arbitrary container type.

All 11 comments

Thanks for opening this. I think the same holds for Iterators.filter, Iterators.flatten, and Iterators.product.

Why?

It would be nice to be able to define function Iterators.map(a_function, thing::SpecialTypeOfArray) to return something other than a Generator. Of course, we can currently overload the Generator constructor to make something that's not a Generator, but it seems to violate the principle of least surprise. @tkf what was your reasoning?

Why do you need it to return something other than a Generator? A generator just contains the function and iterator, so no information is lost.

@bramtayl That's my thought too.

More specifically, for example, Iterators.map(very_special_function, ::SpecialTypeOfArray) can return a lazy AbstractArray if you have a compiler-agnostic way of computing output type of the very_special_function. Likewise, Iterators.product can easily return a lazy array, too. Iterators.flatten can also return an array if the input is an array of static arrays. OTOH, Generator cannot be an array subtype as it has to be able wrap other things (e.g., Iterators.filter)

Another usecase is to overload Iterators.map(f, ::MyTable) :: MyLazyTable. You can then overload show of MyLazyTable to preview computed result like Query is doing.

(But maybe the first point should be resolved using traits, as discussed in the ArrayLike PR #34196)

As I understand it right now, that'll easen the implementation of all types of operators which intend to propagate something. For example a caching iterator which automatically caches the outermost iteration call. Even if it's used as the innermost starter. This would work by dispatching on

Iterators.map(f, c::Caching) = cache(f(x) for x in c.iter)

where cache(iter) creates the caching layer.

Now, (f(x) for x in cache(1:10)) would be the same as cache(f(x) for x in 1:10)

Also, it'd allow custom return types which in turn could implement arbitrary traits. Currently, as I understand it, you can only implement that by either not using the generator syntax or by specializing the trait for ::Generator{myf, myiter}

Edit: We'd even be able to explicitly fuse multiple Generator layers (instead of nesting them):

Iterators.map(f, c::Generator) = ((f鈭榗.f)(x) for x in c.iter)

(Sure that already could be done aswell but it'd fit into the same generic style very well!)

I support multiple dispatch for construction of generic generators based on lowering to Iterators.map.

A container has a larger API surface than iteration, and if Base.map can conserve these features I don't know why generators and Iterators.map wouldn't also try to conserve indexing and other rich and useful behaviors on the lazily-mapped container. The first example that comes to mind is that a generator of arrays or dictionaries could return something indexable - ideally closely adhering to interfaces like AbstractArray or AbstractDict or, in my case, Dictionaries.AbstractDictionary. (There is the issue of it being ElTypeUnknown, of course...)

Here is another argument: If you want to define a special method for (f(x) for x in xs), it is kind of possible to do this with dispatching on Generator{<:MyContainer}. However, it does not scale well because Generator can be nested; e.g., dispatching on (g(y) for y in (f(x) for x in xs)) requires Generator{<:Generator{<:MyContainer}}. Lowering to Iterators.map helps because it lets you use arbitrary container type.

Why is that:

julia> Iterators.map(x->x^2, 1:10) isa Base.Generator
false

julia> Iterators.map==map
true

I mean, shouldn't Iterators.map be the lazy variant in the first place? Otherwise lowering the generator syntax to Iterators.map would be very game changing in the first place since it wouldn't be lazy anymore. So either way, that should be changed even before changing the lowering IMO.

This is because Iterators.map is implemented in #34352 which just got merged. So, Iterators.map === Base.map is false in Julia >= 1.6 and true in Julia <= 1.5. This PR is what to do after #34352. As you said, (f(x) for x in xs) should always be lazy.

What are the chances that this will make it into LTS (=1.6 feature freeze)? I think it would be very important to have a generic style and meaning of the Iterators submodule and to be able to hook into its functions in a way that uses context information to eliminate unnecessary work. I.e. the Iterators.<fun> should only promise that whatever will be returned will be lazy. The current implementation which just points towards the lazy variants of the eager algorithms will become the default/fallback.implementation.

For example, having Iterators.product(CartesianIndices, CartesianIndices) return something of type CartesianIndices rather than just a generic ProductIterator (EDIT: that concrete example might be a bad one since a tuple of CartesianIndex is something else than a single flattened CartesianIndex). Encourage the specialization of functions in the Iterators submodule.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

Keno picture Keno  路  3Comments

m-j-w picture m-j-w  路  3Comments

musm picture musm  路  3Comments

sbromberger picture sbromberger  路  3Comments

yurivish picture yurivish  路  3Comments