Julia: Julep: Syntax for reduction (overloadable splatting?)

Created on 10 Aug 2019 · 62 comments · Source: JuliaLang/julia

I suggest a mechanism to customize splatting behavior such that code like

+((x .- y).^2...) / length(x)  # compute MSE

can be executed efficiently without any allocations.
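
For comparison, today's allocation-free equivalent can already be spelled with existing Base functions (a sketch for reference, not part of the proposal):

mse(x, y) = sum(abs2(xi - yi) for (xi, yi) in zip(x, y)) / length(x)

The point of the proposal is to get the same efficiency out of the splatting spelling above.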

The idea is inspired by discussion in #29114 by @c42f et al.

Idea

Like dot-call syntax, I suggest lowering splatting to a series of function calls that are overloadable. One possibility is

# f(a...) is expanded to:
apply(f, Arguments(VA(splattable(a))))

# f(a, b..., c, d...) is expanded to:
apply(f, Arguments(a, VA(splattable(b)), c, VA(splattable(d))))

(To avoid recursion, this lowering should happen only when the call includes splatting.)

When dot-call syntax appears in the splatting operand, I suggest _not_ materializing the dot-call. That is to say, for example, op(f.(xs)...) is lowered to

bc = broadcasted(f, xs)  # `bc` not materialized
apply(op, Arguments(VA(splattable(bc))))

This lets us evaluate op(f.(xs)...) without any allocation once reduce/foldl supports Broadcasted objects (#31020 started tackling this).

Interface

The lowering above requires the following interface functions and types:

function apply end

splattable(x) = x

struct VA{T}
    args::T
end

struct Arguments{T <: Tuple}
    args::T
    Arguments(args...) = new{typeof(args)}(args)
end
  • apply must be dispatched on the first argument type and may be dispatched on the second argument type.
  • Per-vararg processing should be done via splattable. This is analogous to broadcastable (@yurivish suggested this in https://github.com/JuliaLang/julia/issues/29114#issuecomment-515294850).
  • The type VA (whose name can/should be improved) must be used only for defining apply; its constructor must not be overloaded. This is for making it hard to break splatting semantics.
  • The constructor for type Arguments must not be overloaded for the same reason.

Using current Core._apply, the default apply can be implemented as

# Each VA is materialized (a no-op for non-Broadcasted values) and spliced in;
# plain arguments are wrapped in 1-tuples so the outer Core._apply flattens
# everything into a single call f(args...).
apply(f, x) = Core._apply(Core._apply, (f,), _default_splattables(x.args))
_default_splattables(args) = map(x -> x isa VA ? materialize(x.args) : (x,), args)
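
As a hypothetical sanity check of this default path (assuming the definitions above, with materialize imported from Base.Broadcast and today's Core._apply):

apply(+, Arguments(1, VA((2, 3))))  # behaves like +(1, (2, 3)...) == 6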

Example overloads

Associative binary operators

Many useful operations can be expressed using splatting into associative operators

+(xs...)      == sum(xs)
*(xs...)      == prod(xs)
min(xs...)    == minimum(xs)
max(xs...)    == maximum(xs)

or "mapped-splatting"

+(xs.^2...)      == norm(xs)
+(xs .* ys...)   == dot(xs, ys)

(This is reminiscent of the "big operator" in Fortress.)

There are also various other associative binary operators in Base. Invoking reduce with splatting could be useful:

const AssociativeOperator = Union{
    typeof(*),
    typeof(+),
    typeof(&),
    typeof(|),
    typeof(min),
    typeof(max),
    typeof(intersect),
    typeof(union),
    typeof(vcat),
    typeof(hcat),
    typeof(merge),
    # what else?
}

apply(op::AssociativeOperator, args::Arguments{Tuple{<:VA}}) =
    reduce(op, args.args[1].args)

For example, concatenating vectors would be efficiently done via vcat(vectors...) thanks to #27188.
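
For instance, here is a hypothetical trace of the lowered form of vcat(vectors...), assuming the definitions above:

vectors = [[1, 2], [3], [4, 5]]
apply(vcat, Arguments(VA(splattable(vectors))))  # == reduce(vcat, vectors) == [1, 2, 3, 4, 5]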

It is also possible to support

op :: AssociativeOperator
op(a, bs..., cs...)

such that it is computed as

op(op(a, reduce(op, bs)), reduce(op, cs))

This may be implemented as

apply(op::AssociativeOperator, args::Arguments) =
    mapreduce(x -> _apply1(op, x), op, args.args)
_apply1(op::AssociativeOperator, x::VA) = reduce(op, x.args)
_apply1(op, x) = x

Note that, since reduce would degrade to foldl when the input is not an array (or not a Broadcasted after #31020), we can also use it to fuse filtering with reduction:

+((x for x in xs if x > 0)...)
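
Even before any syntax change, foldl already expresses this fusion; a sketch of what the lowering would call:

xs = [-1, 2, -3, 4]
foldl(+, (x for x in xs if x > 0))  # == 6, no intermediate array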

Non-associative binary functions

Splatting is useful for non-associative binary functions:

const BinaryFunction = Union{
    typeof(/),
    typeof(-),
    typeof(intersect!),
    typeof(union!),
    typeof(merge!),
    typeof(append!),
    typeof(push!),
    # what else?
}

function apply(op::BinaryFunction, args::Arguments{Tuple{Any, <:VA}})
    @assert !(args.args[1] isa VA)
    return foldl(op, args.args[2].args; init=args.args[1])
end
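
A hypothetical trace for push!([], items...) under this overload:

items = (1, 2, 3)
apply(push!, Arguments([], VA(items)))  # == foldl(push!, items; init=[]) == [1, 2, 3]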

or more generally

function apply(op::BinaryFunction, args::Arguments)
    @assert !(args.args[1] isa VA)
    rest = Iterators.flatten(x isa VA ? x.args : (x,) for x in args.args[2:end])
    return foldl(op, rest; init=args.args[1])
end

Matrix-vector multiplications

Not sure how many people need this, but *(matrices..., vector) can be (somewhat) efficiently evaluated by defining

apply(::typeof(*), args::Arguments{Tuple{<:VA, <:AbstractVector}}) =
    foldr(*, args.args[1].args; init=args.args[2])

(Of course, allocations could be reduced much further if we really wanted this.)
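
A hypothetical trace, given the overload above (foldr keeps everything as matrix-vector products):

A, B = rand(2, 3), rand(3, 4); v = rand(4)
apply(*, Arguments(VA((A, B)), v))  # == foldr(*, (A, B); init=v) == A * (B * v)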

A similar optimization can be done for ∘(fs...), but I'm not sure about the exact use case.

Higher-order functions (map(f, iters...) etc.)

As this mechanism lets any function optimize splatting, higher-order functions that splat a given function's arguments can also be optimized by defining their own apply specialization. For example, map(f, iters...) can be specialized as

function apply(::typeof(map), args::Arguments{Tuple{Any, <:VA}})
    f = args.args[1]
    iters = args.args[2].args
    return map(splat(f), _zipsplat(iters))
end

where _zipsplat(iters) behaves like zip(iters...) but its element type does not have to be a Tuple.
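
Here is a rough sketch of one possible _zipsplat for indexable inputs (the ZipSplat type is hypothetical, and for simplicity it yields plain tuples; a real version would relax that and also support non-indexable iterators):

struct ZipSplat{T}
    iters::T  # tuple (or other container) of indexable collections
end
Base.length(z::ZipSplat) = minimum(length, z.iters)
Base.getindex(z::ZipSplat, i) = map(it -> it[i], z.iters)  # i-th element of each input
Base.iterate(z::ZipSplat, i=1) = i > length(z) ? nothing : (z[i], i + 1)
_zipsplat(iters) = ZipSplat(iters)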

Defining a similar overload for broadcasted may be possible provided that the object returned by _zipsplat(iters) is indexable. This lets us nest reduction inside mapping and avoid allocation in some cases:

vector .= +.(eachcol(matrix)...)

Other map-like functions including map! and foreach can also implement this overload.

print-like functions

print, println and write can be invoked with varargs. We can make, e.g., println(xs...) more compiler-friendly and efficient when xs is a generic iterator. Note that apply(string, Arguments(xs)) can also be implemented in terms of apply(print, Arguments(io, xs)).
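
For instance, a hedged sketch of such an overload for print, assuming the interface above (println and write would be analogous):

function apply(::typeof(print), args::Arguments{Tuple{IO, <:VA}})
    io = args.args[1]
    foreach(x -> print(io, x), args.args[2].args)  # behaves like print(io, xs...)
    return nothing
end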

splattable

splattable may be used to solve performance problem discussed in #29114:

splattable(x::Number) = (x,)
splattable(x::StaticArray) = Tuple(x)

In case of Broadcasted, it can be used for calling instantiate:

splattable(x::Broadcasted) = instantiate(x)

All 62 comments

I'm pretty strongly against this. Actually, I think the idea is not best described as "overloadable splatting", but as "change ... to syntax for reduce instead of splatting". And that's not a bad idea --- maybe we should just add some syntax for reduce.

Splatting is supposed to be completely orthogonal to function calling. Calling functions is the core operation in the language, and splatting is just an alternate way to provide arguments. If this proposal were adopted, we would need some other way to call a function on arguments taken from a container. For example, this change makes it impossible to implement utilities like

∘(f, g) = (x...)->f(g(x...))

because there would be no way to generically take some arguments, and then pass all of them on to another function as if it were called directly with them.

So in short I'd favor coming up with some syntax for reduce rather than re-purposing ... and removing any ability to have a first-class representation of an argument list.

Would it still be a problem if there were a way to seal methods (#31222)? I imagine #31222 would allow us to construct the lowering such that it is impossible to change the behavior when splatting tuples.

I understand that function calls are so fundamental that you don't want to play any magic with them. At the same time, since they are so fundamental, many people understand their semantics, including users who are not very familiar with functional programming constructs like reduce. Although..., if there is some syntax that resembles splatting, maybe that's enough? (I am a bit ambivalent about a new syntax for reduce. It certainly is a _huge_ win if it has a concise syntax that interacts nicely with broadcasting.)

concise syntax that interacts nicely with broadcasting

Yes if we can get syntax for mapreduce which reuses broadcast syntax for the mapping part that could be compelling.

Maybe we just need more dots in more places :grimacing:

+...(xs.^2)   # mapreduce(x->x^2, +, xs)
              # [Edit: really reduce(+, broadcasted(literal_pow, ^, xs, Val(2)))]

(motivation: an ellipsis indicates more items in the list. In this case "many +'s")

@c42f Moving ... to outside is interesting! I thought comprehensions like +((x for x in xs if x > 0)...) could become hard to read when the expression is large. Putting ... right next to the binary operator is a good idea because you know that it is invoking a reduction right away: +...(x for x in xs if x > 0).

By the way, we only need the syntax for reduce and foldl because mapreduce and mapfoldl can be "derived" from reduce and foldl. That is to say, mapreduce(f, op, xs) is equivalent to reduce(op, broadcasted(f, xs)) once we have #31020. Generally speaking, it is actually redundant to define mapreduce/mapfoldl if you have transducers or iterator transforms (which include broadcasted).

we only need the syntax for reduce and foldl because mapreduce and mapfoldl can be "derived" from reduce and foldl

Agreed; all I really mean is that you want to be able to compile to something as efficient as mapreduce using existing broadcast notation combined with some reduction syntax, so that lowering can avoid emitting the intervening Base.materialize.

I checked whether +...(xs.^2) is currently a parse error.

To my surprise:

julia> :(+...(xs.^2))
:((Core.:(@int128_str))(".") .. xs .^ 2)

Haha! What.

Ok, harder problem: can we get syntax for reduce(f, A; init=x, dims=d)?

For init, the following might, maybe, be possible:

0 +... xs  # Now we're talking!
f...(0, xs)

For dims, the composable way seems to be broadcasting the reduction across slices from eachslice.

(+...).(eachslice(A, dims=2))    # reduce(+, A, dims=1) if A is a matrix
+....(eachslice(A, dims=2))      # ?? Is it even possible to parse this? :-(

(+...).(eachslice(A, dims=2).^2)  # Well, it should work with broadcast at least

That's... a lot of dots...

Though a bit tangential, this discussion also reminds me of @mbauman's comment at https://github.com/JuliaLang/julia/pull/31217#issuecomment-468782768

Dealing with init makes me think that the rule should be something like

f...(a, xs; h=1, g=2)   # reduce((x1,x2)->f(x1, x2; h=1, g=2), xs, init=a)

but that naturally leads to wondering about other arguments to the "binary" operator

f...(a, xs, b, c; h=1, g=2)  # Unsure whether this can have a sensible meaning.

That's... a lot of dots...

Maybe throw more dots? How about

+...(x.^2 .+ 1)[., ., :]

# Equivalent to:
dropdims(reduce(+, (for x.^2 .+ 1); dims=(1, 2)); dims=(1, 2))
# `.`s inside `[., ., :]` become `dims`

and

y .= +...(x.^2 .+ 1)[., ., :]

# Equivalent to:
reducedim!(+, reshape(fill!(y, 0), (1, 1, size(y)...)), (for x.^2 .+ 1))
# `.`s insert singleton dimensions in the destination array

(I'm using for broadcasting_expression notation from #31553)

The dropdims in the first example is motivated by the second example. But I'm not super sure about this.

Regarding init, I think it is the reason why splatting is so appealing:

+(a, b...)           # reduce(+, (a, reduce(+, b))) or reduce(+, b; init=a)
+(a, b..., c..., d)  # reduce(+, (a, reduce(+, b), reduce(+, c), d))
push!([], b...)      # foldl(push!, b; init=[])

Actually, [.] can be used instead of splatting?:

+(a, b[.])           # => reduce(+, (a, reduce(+, b))) or reduce(+, b; init=a)
+(a, b[.], c[.], d)  # => reduce(+, (a, reduce(+, b), reduce(+, c), d))
push!([], b[.])      # => foldl(push!, b; init=[])
write(io, a[.])      # => foreach(x -> write(io, x), a)

+(x.^2 .+ 1)[., ., :]
# => dropdims(reduce(+, (for x.^2 .+ 1); dims=(1, 2)); dims=(1, 2))

y .= +(x.^2 .+ 1)[., ., :]
# => reducedim!(+, reshape(fill!(y, 0), (1, 1, size(y)...)), (for x.^2 .+ 1))

It also motivates dropdims.

Actually, [.] can be used instead of splatting?

Very interesting idea.

I think we're tackling two somewhat orthogonal problems here — slice iteration (AKA the curse of the metastasizing dims keyword) and reduction. What if a[., ., :] was syntax for slice iteration:

a[., ., :]
# (@view(a[i,j,:]) for i=axes(a,1), j=axes(a,2))  # or equivalent iterator type

Then you can express various useful things independently from having reduction syntax

sum.(a[., ., :])
# sum(a, dims=3)
sum.(a[., :, :])
# sum(a, dims=(2,3))
sum.(a[., :, 1])
# sum(a[:,:,1], dims=2)
diff.((a .* b)[., :])
# Fused version of diff(a .* b, dims=2) ?

For diff it's hard to see how dropdims on the .s could make sense, but in other cases it might be desirable. Also, transformation of the diff version into an in-place algorithm seems like it might be difficult in general. There's also something off here about concatenation vs non-concatenation of the arrays resulting from diff, especially as applied to functions like unique which would result in a ragged array.

Still it would be neat if the dims keywords could be removed from all, any, prod, sum, maximum, minimum, argmax, argmin, cumsum, cumprod, reverse, diff etc etc.

sum.(a[., ., :])
# sum(a, dims=3)

If we are to go in this direction, I think it'd be better to drop the . in sum. and let sum(a[., ., :]) be equivalent to sum(a, dims=(1, 2)) (or dropdims of it).

That is to say, there is a lazy object like

struct Axed{T, D <: Tuple{Vararg{Int}}}
    x::T
    dims::D
end

such that a[., ., :] is lowered to

Axed(a, (1, 2))

and sum has a method

sum(a::Axed) = sum(a.x; dims=a.dims)

or maybe

sum(a::Axed) = Axed(sum(a.x; dims=a.dims), a.dims)

so that dropdims can know the dims to be dropped.
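
For instance (hypothetical, using the Axed sketch above):

a = rand(2, 3, 4)
sum(Axed(a, (1, 2)))  # == sum(a; dims=(1, 2)), i.e. what sum(a[., ., :]) would lower to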

Having said that, I think

two somewhat orthogonal problems

is not quite right (although I half agree). Specifying "loop axes" (dims) already signals that "I want something to happen along these axes." So, I think combining it with reduce syntax makes sense.

Can we have other syntax for sum(a, dims=(1, 2))? There are a few ASCII usable inside []:

x[?]
x[@]
x[=]
x[&]  # edit: actually this is currently valid and would be breaking
x[$]
x[_]  # but this one may go to anonymous function?

x[?, ?, :] kind of makes sense because ? is paired with :?

sum(x[&, &, :]) could mean "let me _refer_ to dimensions 1 and 2 in the summation." (edit: x[&] is currently valid so this can't be used.)

It looks like prefix ... is available?

+(...a)
+(a, ...b)
+(a, ...b, ...c, d)

So I guess we can use

y .= +(...(x.^2 .+ 1)[., ., :])

if "syntax orthogonalization" is desirable.

If we are to go this direction, I think it'd better to drop . in sum. and let sum(a[., ., :]) be equivalent to sum(a, dims=(1, 2)) (or dropdims of it).

Right, this makes particular sense for things like diff where there's a conflation of broadcasting (to compute the diff down a slice) and concatenation of the resulting diff'd slices together. I do think the broadcasted version makes sense for user-defined functions f returning a scalar: these can efficiently use f.(a[., ., :]) and importantly the user doesn't need to define the overload f(::Axed). Also it seems consistent and useful that things like unique.(a[., ., :]) work and return an array of arrays.

Specifying "loop axes" (dims) already signals that "I want something to happen along these axes." So, I think combining it with reduce syntax makes sense.

I agree that reduction is closely related and that the syntaxes should compose in a way which allows for lowering into efficient fused loops (i.e., allow the compiler to remove the materialize completely or at least move it outward as far as possible). But syntax is such a precious resource that it seems a shame not to have it do double duty (or really, exponential duty if it composes nicely with all N other syntaxes!)

I must admit, I'm still somewhat taken by +... if it can be parsed because of the ellipsis analogy. For example,

+(a, ...bs, ...cs, d)
# vs
a +... bs +... cs + d

— the second seems much more like natural julia syntax to me. For more complex cases it's admittedly a toss up

y .= +(...(x.^2 .+ 1)[., ., :])
# vs
y .= +...(x.^2 .+ 1)[., ., :]

The ? syntax is an attractive alternative, perhaps a bit less visually confusing than all those tiny dots.

y .= +...(x.^2 .+ 1)[?, ?, :]

I do think the broadcasted version makes sense for user-defined functions f returning a scalar

I guess you can already do this with current broadcasting facility, like this?

f.(view.(a, axes(a, 1), reshape(axes(a, 2), 1, :), Ref(:)))

Yea, I know it's super ugly. But I think syntaxes to make this cleaner is already discussed elsewhere. I think we "just" need x.[i] from #30845 and &<var> from #27563

f.(@view a.[:, :, &:])

(or maybe even without @view if non-scalar indexing can be aware of &:?)

But syntax is such a precious resource that it seems a shame not to have it do double duty (or really, exponential duty if it composes nicely with all N other syntaxes!)

Good point. Thank you for pointing it out!

For example,

+(a, ...bs, ...cs, d)
# vs
a +... bs +... cs + d

— the second seems much more like natural julia syntax to me.

I totally agree that infix syntax is more natural in Julia for operations like +. But I think we need it for functions in general, even for non-associative functions:

vcat(...f.(as), b, ...cs)
intersect(...f.(as), b, ...cs)
push!([], ...f.(as), b, ...cs)
write(io, ...f.(as), b, ...cs)

+ was just an example of an associative operator. Maybe using max in the example would have been better.

But, to make the infix version and function call versions compatible, I think

a + ...bs + ...cs + d

would make more sense (although maybe the parser would handle both cases anyway?).

Some of these proposals remind me of the (*,:) notation which JuliennedArrays used to have. Here's a quick hack to play with:

using JuliennedArrays
Base.getindex(A::Array, fs::Function...) = Slices(A, map(f -> f==(:) ? True() : False(), fs)...) 

ones(2,3)[:,*]       # 3-element Slices, ≈ eachcol(ones(2,3))
sum(ones(2,3)[:,*])  # == dropdims(sum(ones(2,3), dims=2), dims=2)
sum.(ones(2,3)[:,*]) # == dropdims(sum(ones(2,3), dims=1), dims=1)

A version of this (perhaps with a different symbol) might be nice as an alternative to dims keywords, although there is still some potential for confusion about which dimensions are which.

Reducing a broadcast like sum((x.^2 .+ 1)[:,*]) of course still materializes here. It would be great to have a lazy version, and something like y .= +...(x.^2 .+ 1)[:,*] doesn't look too crazy to me. Perhaps better than the earlier proposal of sum!(y, @: x.^2 .+ 1).

Perhaps better than the earlier proposal of sum!(y, @: x.^2 .+ 1).

@mcabbott I think non-materializing broadcast is orthogonal to the issue here because non-materializing broadcast is useful outside the context of reduction. For example, it lets you write a function

f(x) = @: x.^2 .+ 1

which does not have to care at all how this expression will be materialized (copy, copy!, reduce, etc.).

But syntax is such a precious resource that it seems a shame not to have it do double duty (or really, exponential duty if it composes nicely with all N other syntaxes!)

Good point. Thank you for pointing it out!

You're welcome; I think I should thank @andyferris for the observation that N orthogonal composable features lead to some kind of exponential functionality in N, but N non-composable features lead to only O(N) functionality.

I think we "just" need x.[i] from #30845 and &<var> from #27563

Thanks for pointing these out again, they're extremely relevant.

There seems to be some confusion over at #30845 as to whether to_indices would be called before the broadcast or afterward. Matt originally suggested that it should be called before, which I think means a.[:, :, &:] might need to be written a.[:, ^:, &:] in that proposal? If & becomes syntax for Ref it might also make some logical sense to use a simple & in place of &:. (And have Base.to_indices of & return :?):

f.(a.[:, ^:, &])
# [f(a[i,j,:]) for i = axes(a, 1), j = axes(a,2)]

At this point I'm thinking out loud and it's unclear to me how this could be fused with the reduction f. It also seems quite confusing that the reduction f happens not along the dimensions of a indexed with the :, but along the other dimension. (This is a direct result of the specialized placement of to_indices in the #30845 OP.)

@tkf I agree it would be awesome if some syntax for reduction could be used for foldl in non-associative cases. My thought there is that one could lower not to reduce, but to some function reduction(f, ...) which would dispatch on the type of the function f to determine whether to call reduce vs foldl (or maybe foldr?). (Or more specifically dispatch on a trait Associativity(f) which might default to a safe NonAssociative()). But at the moment this is more a vague thought bubble than a concrete idea.

Perhaps you could explain more why you'd like syntax for reducing multiple collections in one call? To me this seems sufficiently unusual that you could just chain the reductions (provided that the use of init is guaranteed). Is it the difference between writing the name of the reduction operator multiple times, or is there some deeper reason? That is:

vcat(...f.(as), b, ...cs)
# vs the much uglier
vcat...(vcat(vcat...(f.(as)), b), cs)

Is it the difference between writing the name of the reduction operator multiple times, or is there some deeper reason?

Actually for the case of vcat there's the should-have-been-obvious reason that you'd like to preallocate the result array. Fair enough I guess; food for thought.

a.[:, :, &:] might need to be written a.[:, ^:, &:] in that proposal

Ah, I missed that. I think you are right.

use a simple & in place of &:

In a previous comment I said x[&] was an available syntax, but I was totally wrong; it's already valid (e.g., x = Dict(); x[&] = 1). So I think using a.[:, ^:, &] is not possible (at least in 1.x) because it would be breaking (or at least confusing).

f.(a.[:, ^:, &])
# [f(a[i,j,:]) for i = axes(a, 1), j = axes(a,2)]

At this point I'm thinking out loud and it's unclear to me how this could be fused with the reduction f.

Isn't it already fused? I mean, it's a nested loop and allocating as-is, but if you specify the destination then it would be equivalent to

y .= f.(view.(a, axes(a, 1), reshape(axes(a, 2), 1, :), Ref(:)))

So "loops over axes 1 and 2" are fused with "copying to y."

It also seems quite confusing that the reduction f happens not along the dimensions of a indexed with the :, but along the other dimension. (This is a direct result of the specialized placement of to_indices in the #30845 OP.)

It may be tricky, but it's understandable once you think about the rules, isn't it? I mean, there are two nested loops (a loop inside f and a loop due to the dot calls), so I think it's natural that it's confusing at first sight. Maybe the mnemonic is "&: means to pass a reference to the index to the callee." ... or maybe it's time to give up on index-free notation and go with index notation?

My thought there is that one could lower not to reduce, but to some function reduction(f, ...) which would dispatch on the type of the function f to determine whether to call reduce vs foldl

In the OP, my proposal was to not lower directly to reduce but rather to:

# f(a, b..., c, d...) is expanded to:
apply(f, Arguments(a, VA(splattable(b)), c, VA(splattable(d))))

So, apply(::typeof(f), ...) can call whatever f wants. But I think the basic guideline is that it should:

  • call reduce if f is an associative binary operator/function (e.g., +)
  • call foldl if f is a non-associative binary function (e.g., push!)
  • call foreach if f is purely for side-effect (e.g., print)

Those properties are already satisfied for splatting. That is to say,

+(numbers...)
push!([], objects...)
print(printables...)

already follow the guideline I just mentioned. It means that users only have to learn to replace xs... with ...xs to do what they want. For example, if you know that vcat(xs, ys) concatenates vectors and how splatting works, it's a very small step to write vcat(...vector_of_vectors).

(or maybe foldr?).

It was hard to come up with a case where foldr is useful. The only one I could think of was:

...matrices * vector

(Oh, actually, this is why ... should be associated with an argument, not an operator.)

Is it the difference between writing the name of the reduction operator multiple times, or is there some deeper reason?

Other than the efficiency reason you just explained, we already have

vcat(f.(as)..., b, cs...)

which works nicely for tuples. So why not have a syntax that works with arbitrary iterators?

So, I think using a.[:, ^:, &] is not possible (at least in 1.x) because it would be breaking (or at least confusing).

It would be confusing, but : itself is a function, albeit more closely related to ranges and hence indexing. Regardless, it's not a big deal to use &: and might be clearer anyway.

Isn't it already fused? I mean, it's a nested loop and allocating as-is, but if you specify the destination then it would be equivalent to

y .= f.(view.(a, axes(a, 1), reshape(axes(a, 2), 1, :), Ref(:)))

So "loops over axes 1 and 2" are fused with "copying to y."

The allocation for the destination array y isn't what I'm thinking about. I'm wondering about how the call to view can get inserted in there. Not only that, but a very big part of the attraction of #30845 is the elimination of much of the views-vs-copies problem completely. It just doesn't appear to work in this case.

So, apply(::typeof(f), ...) can call whatever f wants. But I think the basic guideline is that it should...

Right, that seems clear. Sorry — I'd glossed over that part of the OP given that we're not going to use actual splatting for reduction. I wonder what a good default is out of reduce vs foldl vs foreach? It seems quite reasonable to have to define an Associative trait for the small number of associative functions, but choosing a good default between foldl vs foreach might be harder.

By the way, I would love to have println(...xs) or equivalent. My current workaround in the REPL is println.(xs); which I always feel a little dirty about :-)

: itself is a function

Good point. I guess it can be special-cased in .[] context, in principle.

I'm wondering about how the call to view can get inserted in there. Not only that, but a very big part of the attraction of #30845 is the elimination of much of the views-vs-copies problem completely. It just doesn't appear to work in this case.

Ah, I was assuming that @view macro does the magic and I missed that you didn't write @view. I guess you can also write f.(view.(a, :, ^:, &:)) although the definition of ^ may be unclear here. One alternative is to define it as "add a singleton dimension" https://github.com/JuliaLang/julia/issues/30845#issuecomment-458244811

I wonder what a good default is out of reduce vs foldl vs foreach?

I'd say reduce is out, since associativity is a strong condition. Actually, "foreach if side-effect" was a bit of a simplification. What I really meant was "equivalent to splatting." For example, I think write(io, ...xs) should be implemented as

acc = 0
for x in xs
    acc += write(io, x)
end
acc

to be equivalent to write(io, xs...).

I think foldl is most useful and applicable to many functions out-of-the-box. For example, anything that works with reduce automatically works with foldl. I'm actually not sure if there should be a default. Maybe this should be an opt-in feature (e.g., using some kind of traits as you suggested)? But if there were a default I think it should be foldl.
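
For example, using only Base:

foldl(+, 1:4)                  # == reduce(+, 1:4) == 10; associative ops lose nothing
foldl(push!, (1, 2); init=[])  # == [1, 2]; order matters, so reduce would be unsafe here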

I would _love_ to have println(...xs) or equivalent

OK. You will hate to know what I proposed... I wanted println(xs...) and println(...xs) to be equivalent in effect. So it's not actually foreach(println, xs) but rather foreach(print, xs); println()... But I agree foreach(println, xs) would be much more useful. I wonder if there is a principle such that foreach(println, xs) actually makes sense.

But I agree foreach(println, xs) would be much more useful. I wonder if there is a principle such that foreach(println, xs) actually makes sense.

Yes you're right; println is in no way the reduction operator, rather it's the mapping operator in a mapreduce with a kind of "null reduction". Perhaps that's just an unadorned ...println.(xs) :-)

If we had a function like

printlnio(io, x) = (println(io, x); io)

then printlnio(stdout, ...xs) works with the foldl version. In this case it makes sense to have a shortcut printlnio(...xs) = printlnio(stdout, ...xs).
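
That is, under the foldl interpretation (printlnio as defined above):

foldl(printlnio, ["a", "b"]; init=stdout)  # prints "a" then "b", returns stdout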

Alternatively, maybe having a helper function like

ln(x) = string(x, "\n")

for print(...ln.(xs)) maybe makes sense? It would be useful for write(filename, ...ln.(join.(vector_of_vectors, ","))) to do quick-and-dirty CSV writing.

This comment https://github.com/JuliaLang/julia/issues/16606#issuecomment-222233093 by @StefanKarpinski makes me realize that

there is a lazy object like

struct Axed{T, D <: Tuple{Vararg{Int}}}
    x::T
    dims::D
end

such that a[., ., :] is lowered to

Axed(a, (1, 2))

should really be

struct Axed{dims, T}
    x::T
end

a[., ., :] === Axed{(1, 2)}(a)

for type stability.

should really be

struct Axed{dims, T}
    x::T
end

a[., ., :] === Axed{(1, 2)}(a)

for type stability.

The Slices type of JuliennedArrays works a bit like this. Edit: In this notation there is no way to create Axed{(2,1)}(a), which Slices does not distinguish. There is also https://github.com/JuliaLang/julia/pull/32310 which defines an EachSlice type, with some differences, including caring about the order of dims.

I don't know if this is poor form, but this gist has some further attempts at making such slicing things work, very roughly. It can't of course parse a[.,.,:], so allows a[*,*,:] or a[&,&,:]. I wonder if the dots are really too small to see easily anyway.

Good point! A single dot . (or any other syntax) can just be lowered to a singleton object. The dimensions can then be extracted from its positions in the arguments. It's a nicer strategy as it works with function calls like view(x, ., ., 1:2) etc.

Perhaps one more relevant link is https://github.com/JuliaLang/julia/issues/29146 about mapslices. Perhaps this should be written

@views b[:, .] .= f.(a[:, .])  # foreach((x,y) -> x .= f(y), eachcol(b), eachcol(a))
c[:, .] := f.(a[:, .])         # c = mapslices(f, a, dims=1)

ideally with the f.() fused with the reduce(hcat, ...) that forms c.

(Apologies if I confused things by editing above while you were typing!)

I think mapslices is orthogonal to the issue we discuss here. I think it would/could be automatically derived from #30845 and #27563. See the discussion about &: started at https://github.com/JuliaLang/julia/issues/32860#issuecomment-520337649 above.

In this notation there is no way to create Axed{(2,1)}(a)

Does reduce actually care about the order of dims?

julia> reduce(vcat, reshape(1:12, (2, 2, 3)); dims=(1,2), init=[])
1×1×3 Array{Array{Any,1},3}:
[:, :, 1] =
 [1, 2, 3, 4]

[:, :, 2] =
 [5, 6, 7, 8]

[:, :, 3] =
 [9, 10, 11, 12]

julia> reduce(vcat, reshape(1:12, (2, 2, 3)); dims=(2,1), init=[])
1×1×3 Array{Array{Any,1},3}:
[:, :, 1] =
 [1, 2, 3, 4]

[:, :, 2] =
 [5, 6, 7, 8]

[:, :, 3] =
 [9, 10, 11, 12]

@tkf I think I finally understand your analogy between splatting and reduction. Simply:

Many functions which take varargs are reductions in disguise, vcat and + being good examples.

The reasons these functions exist in the multi arg form is

  • syntactic convenience
  • efficiency

Therefore an ideal end game for reduction syntax and lowering is to allow the removal of the multi-arg versions of these functions, without losing convenient syntax or efficiency.

That is, can we have syntax for vcat(a,b,c) when only the binary function vcat(a,b) is defined by the user? It's not just about splatting!

If we can do this, we get benefits exactly analogous to the sudden freedom of broadcast dot notation:

  • The author of a binary function doesn't need to foresee its use in reduction
  • Existing functions which take multiple args for reduction can be deleted

Thanks, that's a good way to motivate what I have in mind.

That is, can we have syntax for vcat(a,b,c) when only the binary function vcat(a,b) is defined by the user?

I guess an obvious choice would be vcat(a, ..., b, c). If the bare ... is lowered to ...() (splatting an empty container), then the definition I gave in the OP should take care of the rest.

But I think vcat(a, b, c) is still better. One way to do this automatically is to create a type hierarchy

abstract type AssociativeOperator <: Function end

(op::AssociativeOperator)(a, b, cs::Vararg) = op(op(a, b), reduce(op, cs))
# + the definition of `apply` in the OP

struct Plus <: AssociativeOperator end
const + = Plus()
+(x, y) = ...

or maybe even

abstract type FoldableFunction <: Function end
abstract type AssociativeOperator <: FoldableFunction end

struct Plus <: AssociativeOperator end
const + = Plus()

struct Push! <: FoldableFunction end
const push! = Push!()

(It would be nice if we could write function push! isa FoldableFunction end.)

  • The author of a binary function doesn't need to foresee its use in reduction

I half-agree with this. By that, I mean foldl-based reduction can be broadly applicable to many situations. However, you still need a mechanism to encode that, e.g., + is associative and has an identity. It also would be better to handle foreach-like behavior (print).

  • The author of a binary function doesn't need to foresee its use in reduction

I half-agree with this. By that, I mean foldl-based reduction can be broadly applicable to many situations. However, you still need a mechanism to encode that, e.g., + is associative and has an identity. It also would be better to handle foreach-like behavior (print).

I admit my statement was provocative ;-). In fact I do have this counter argument in mind: https://discourse.julialang.org/t/ann-transducers-jl-efficient-and-composable-algorithms-for-map-and-reduce-like-operations/19159/19

What I really mean is that there should be some foldl based default which can take many binary operators and make them useful without needing the author's consent. Then to add extra performance one should be able to extend some functions from the lowered form as relevant to this particular binary operator. Again, without the author's consent where necessary (though this would be type piracy).

Speaking of transducers, how do they fit into the picture here?

What I really mean is that there should be some foldl based default which can take many binary operators and make them useful without needing the author's consent.

Yes, I think it is worth emphasising this aspect. Maybe foldl-by-default is the way to go.

Speaking of transducers, how do they fit into the picture here?

My "secret" plan here is to make it just work. It should be easy since Transducers.jl overloads Base.reduce and Base.foldl. For example, with the definition in the OP,

+(...eduction(x^2 for x in xs if x > 0))

will be dispatched to reduce(+, Filter(x -> x > 0) |> Map(x -> x^2), xs). This will then fuse filtering, mapping, and reduction (and execute them in parallel).

(eduction is a way to bundle transducers and input collection together. It also has an overload that converts iterator comprehensions to an eduction. See: https://tkf.github.io/Transducers.jl/dev/manual/#Transducers.eduction)

Nice. Even better, perhaps, if x^2 for x in xs if x > 0 would return eduction(Filter(x -> x > 0) |> Map(x -> x^2), xs) in the future (maybe in 2.0) :smile:

I just watched the transducers talk. The way we currently use Core._apply to concatenate all arguments into a buffer before applying, eg, vcat to that buffer is exactly the problem Rich Hickey is talking about when he says that transducers avoid putting data in concrete collections to pass between steps. In this language, our multiple arguments are processed as a Cat transducer taking a Tuple of the lowered arguments:

+(...f.(as), b, ...cs)
# eventually ends up in something like ?
reduce(+, eduction(Cat(), (Base.broadcasted(identity, as), (b,), cs)))

By the way, @andyferris had the following to say to me offline:

Ignore dims, reduction is about the iteration API not indexing.

I agree — while the question of notation for slice iteration is fascinating in its own right, I do think it's orthogonal. The gist @mcabbott linked is a pretty neat way to do that. Where it gets messy is wanting to fuse all the things...

Another syntax option is to have the ... on the operator/function, but to unescape arguments which should not be splatted:

vcat...(as, &b, cs)
# vs
vcat(...as, b, ...cs)

The idea is that it allows us to express vcat(a,b,c) in terms of the binary vcat even when none of the arguments are splatted. Unfortunately it's not pretty:

vcat...(&a, &b, &c) # ow my eyes.

It's a stronger analogy with broadcasting syntax where the . is on the operator and one must unescape things which shouldn't participate. At least 1 +... xs ≡ +...(1, xs) would work without escaping the 1 because numbers are self-iterating. It seems like there could be a close analogy with broadcastable vs iterable and unescaping.

Having said all that, a cleaner syntax for this case is just to use a tuple:

vcat...((a,b,c))
# or
vcat(...(a,b,c))

These are still much uglier than vcat(a,b,c) though.

if x^2 for x in xs if x > 0 would return eduction(Filter(x -> x > 0) |> Map(x -> x^2), xs) in the future (maybe in 2.0)

I don't think there is any fundamental reason why automatic eduction cannot happen in Base's reduce/foldl. At the level of API ("expressive power"), iterators and eductions are identical. I talked about it a bit at JuliaCon (see slides 32 and 33). The difference is in how you "generate" the code for reduction. We have plenty of performance specializations in Julia. So, I don't think it's super crazy to convert an iterator transform (Iterators.Filter) to a transducer automatically in reduce/foldl.

@andyferris had the following to say to me offline:

Ignore dims, reduction is about the iteration API not indexing.

I agree

Me too. In fact, I think we already have arrived at the API where +...a[., :] just works without the "reduction syntax" knowing anything about a[., :]. It should work as long as reduce has specialization for the "container" a[., :].

(Well, I don't strictly agree with the "reduction is about the iteration API" part because reduction is much more general; e.g., it's parallelizable. But I'm guessing that's probably not what he meant.)

Another syntax option is to have the ... on the operator/function

One of the disadvantages with this approach is that we lose a chance to assign a meaning to +.(...xs) which could be useful, e.g., when xs = eachcol(matrix).

IIUC your motivation is to "auto-define" splatting-like behavior? Then why not use abstract types to do it (https://github.com/JuliaLang/julia/issues/32860#issuecomment-521444963)? It would make vararg actually work and no new language constructs have to be introduced. (But types like AssociativeOperator are highly useful for reduction too.)

Another syntax option is to have the ... on the operator/function

One of the disadvantages with this approach is that we lose a chance to assign a meaning to +.(...xs) which could be useful, e.g., when xs = eachcol(matrix).

Actually I'm confused by what you want this to mean. Is it map(x->reduce(+,x), xs)?

IIUC your motivation is to "auto-define" splatting-like behavior? Then why not use abstract types to do it (#32860 (comment))? It would make vararg actually work and no new language constructs have to be introduced. (But types like AssociativeOperator are highly useful for reduction too.)

My observation is that vcat(a,b,c) is in fact the operation reduce(vcat, (a,b,c)), so having a single syntax for reduction which covers this case and which also covers cases like reduce(vcat, xs) might make sense.

Put another way, if you want vcat(a, b, c, d, ...xs) to turn into reduction, what if the only "splatted" argument is removed? Suddenly this means something completely different which seems odd.

About the AssociativeOperator approach: it's very neat that it can be done already, and it could bring some more useful meaning to the type tree of subtypes of Function.

But it's a very different way to address the observation that varargs methods of some functions arguably "shouldn't even exist" and instead be replaced by a reduction involving the binary version of that function only (if only we had a sufficiently slick syntax; honestly I'm not sure this is possible).

Another syntax option is to have the ... on the operator/function

One of the disadvantages with this approach is that we lose a chance to assign a meaning to +.(...xs) which could be useful, e.g., when xs = eachcol(matrix).

Actually I'm confused by what you want this to mean. Is it map(x->reduce(+,x), xs)?

My mental model is "prefix ... should act like splatting when the argument is a tuple." So, I was thinking reduce((x, y) -> map(+, x, y), xs).

varargs methods of some functions arguably "shouldn't even exist"

I see. So remove vararg definitions from various functions like +, vcat, push!, print etc.? I didn't realize that that's what you've been suggesting. But I agree that it would simplify the Base/stdlib interface a lot.

But... vcat(a, b, c) is such an easy syntax that I couldn't resist suggesting we keep it. That's why I initially suggested overloading splatting and kept insisting on syntax that would _look_ like "just" a function call. This way, the vcat(a, b, c) and vcat(a, b, ...cs) syntaxes can provide a uniform feel even though they would use very different code paths to reach reduce. I don't think this is too bad because, with abstract function subtypes, we can automate this plumbing part with a few lines of code.

This way, vcat(a, b, c) and vcat(a, b, ...cs) syntaxes can provide a uniform feel even though they would use very different code paths to reach reduce.

Yes, it's that differing code path which bothers me, I think it highlights a deeper design flaw in this way of arranging things. Again, compare to broadcast where nobody needs to worry about defining "broadcastable" functions; all functions are naturally broadcastable with a uniform syntax and implementation.

To come at this from another angle: in StaticArrays we're constantly fighting ambiguities in functions like vcat where the varargs list contains a mixture of static vs dynamically sized arrays, but nobody has added the necessary trait-based dispatch to the vcat reduction implementation. Contrast this with the broadcast machinery which gives a uniform way to speak about the result array type and we just don't have this problem.

My mental model is "prefix ... should act like splatting when the argument is a tuple."

The thing I find confusing about this is it supposes that the splatted version exists and has the meaning of a reduction. This is fine for +, but less obvious for generic binary operators.

An alternative meaning of prefix-... occurs to me which would make a certain kind of sense:

println(io, ...xs, " ", ...ys)
# foreach((x,y)->println(io, x, " ", y), xs, ys)

Or in general,

foo(a, b, ...cs, d, ...es)
# foreach((c,e)->foo(a, b, c, d, e), cs, es)

I realize this diverges in rather a different direction. Coming back to an earlier point you made about printlnio; I think it would be a bit odd to have a specially named function just for this. But it's an interesting idea and I do wonder whether it's a good motivation for having println itself return the io (though I'm not sure how this could be made non-annoying at the repl).

compare to broadcast where nobody needs to worry about defining "broadcastable" functions

On the other hand, the way _arguments_ are broadcast is highly customizable; there is an intricate pipeline for processing arguments to change the semantics (e.g., some types are considered scalar by default) and to choose the best broadcasting strategy (style). Also, strictly speaking, you can tweak broadcasted(::typeof(f), ...) to change the semantics of the function itself. Of course, it would be evil to do such a thing. But my point is that broadcasting works as long as implementers stick with the canonical semantics. It's not "safe-by-design." So, as long as we can ensure that the code paths of f(xs...) and f(...xs) converge unless f provides an "evil overload" (which is easy if we can use abstract function types), no users have to worry about it more than they do for broadcasting. We can increase the safety further if the mechanism to "seal" methods is implemented (#31222).

Another way to look at it is that in broadcasting the behavior of the arguments is highly customizable, while in the prefix-... syntax the behavior of the function is.

in StaticArrays we're constantly fighting ambiguities in functions like vcat

Isn't that an argument in favor of the AssociativeOperator function type? There would be no need to fight ambiguity because the vararg version would be derived from the binary method.

If I were to point out the design flaw, it's the fact that f(...xs) may choose to do reduce, foldl or "foreach" depending on the type of f. But that's already the case for splatting, so it does not introduce any further ambiguity. I'd even say abstract types like AssociativeOperator _decrease_ ambiguity because you can know that a splatting call op(xs...) of an AssociativeOperator would do reduce (again, unless op defines an "evil overload").

My mental model is "prefix ... should act like splatting when the argument is a tuple."

The thing I find confusing about this is it supposes that the splatted version exists and has the meaning of a reduction. This is fine for +, but less obvious for generic binary operators.

I was just explaining an easy mnemonic. It is perhaps better to formalize f. as a lifted function bclift(f) which is just a curried version of broadcast (ignoring laziness).

bclift(f) = (args...) -> broadcast(f, args...)

Then

f.(...xs) == bclift(f)(...xs) == reduce(bclift(f), xs)
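
For example, with the bclift above and xs = eachcol(matrix), the hypothetical +.(...xs) would compute:

m = [1 2; 3 4]
reduce(bclift(+), eachcol(m))  # == [3, 7], the elementwise sum of the columns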

An alternative meaning of prefix-... occurs to me which would make a certain kind of sense:

println(io, ...xs, " ", ...ys)
# foreach((x,y)->println(io, x, " ", y), xs, ys)

This is broadcasting so this can be written as print(io, ...string.(xs, " ", ys, "\n")) in my proposal. As we already have dot-call, I think it's better to attach a new semantics to a new syntax.

in StaticArrays we're constantly fighting ambiguities in functions like vcat

Isn't it an argument in favor of AssociativeOperator function type? There is no need to fight ambiguity because vararg version would be derived from binary method.

It's an argument in favor of having some systematic way to treat reductions. AssociativeOperator is one way to do that and it should work out fine. If someone didn't see fit to declare their associative function f as a subtype of AssociativeOperator we'd always have f(...(a,b,c)) as a reasonable fallback.

f.(...xs) == bclift(f)(...xs) == reduce(bclift(f), xs)

Right this makes sense though I do find it confusing syntax. I think it's because the ... is inside the brackets while the . is outside. The proposed lowering flips things around with the reduction being outside and the mapping inside.

It's true that attaching the reduction to arguments does make a difference in what can be easily expressed. Roughly,

f.(...xs) == reduce((x,y)->map(f, x, y), xs)
# vs
(f...).(xs) == map(x->reduce(f, x), xs)

I didn't think of (f...).(xs) but it indeed sounds useful. I guess one way to formalize what you described is that a "modifier" (like . and ...) closer to the function lifts it first. For example, if we had f? to lift functions to Union{T, Missing}, it would make sense to write +?(...xs) to mean

reduce((x, y) -> ismissing(x) || ismissing(y) ? missing : x + y, xs)

But under the "closer modifier lifts function first" rule, it wouldn't make sense to write (f...)?(xs) (or it won't be super useful). So, I think moving ... to the argument side makes sense.

I do find it confusing syntax.

I guess this is typical for when you are composing multiple higher order functions (e.g., transducer composition "feels" like happening in the reverse order)? I think that's why a mnemonic based on splatting is important.

Now that I used the word "lift", I think I may understand one of your points. Since the transformation of a two-argument function to a reduction is some kind of "lift", the syntax should act on the function, not the arguments. Is this your point?

But I think an important observation is that we need an "unlifting" operation like &/Ref, and it is far more common in reduction than in broadcasting. In particular, you always need to unlift the first argument for push! and write. It may be possible to work around this for some functions by saying that the first argument is automatically unlifted. However, doing so with "nearly associative" binary functions append!/merge!/union!/intersect! would lose the ability to use them on a single container (of containers) like Vector{<:Vector}. So, I think expressing lifting with respect to arguments is essential.

I took the liberty of renaming this topic again, as I think it's been a very interesting exploration of reduction syntax even though the option of overriding existing splatting syntax was quickly discarded.

Since the transformation of two-argument function to a reduction is a some kind of "lift", the syntax should act on the function, not the arguments. Is this your point?

Yes, I think that's accurate. I just noticed today that this is the model used by APL where the / operator lifts an arbitrary function into a reduction (for example in APL +/ is sum). See https://www.dyalog.com/uploads/documents/MasteringDyalogAPL.pdf for example. It seems that APL doesn't express the "multiple arguments to be reduced" case directly because its functions only take up to two arguments.

I'll have to think about unlifting some more. I'm still a bit confused about how precisely this interacts with the init argument to foldl when the unlifted argument can occur in any position.

Thanks, I guess the updated title reflects better what is going on here.

the model used by APL where the / operator lifts an arbitrary function into a reduction

FYI, there is a table of foldl/foldr syntaxes of various languages in https://en.wikipedia.org/wiki/Fold_(higher-order_function)#In_various_languages

J and Scala may be interesting in that they support an initial value with APL-like syntax. I think what I suggest is closest to C++.

Reading the table, I realized that another axis of discussion can be whether or not it makes sense to support the reduce/foldl/foldr distinction at the syntax level. Languages like J, Scala, and C++ seem to support it. I think it makes sense to let the binary function implementers handle it (as in the OP), since what should happen is most of the time clear from the function and argument types.

I'm still a bit confused about how precisely this interacts with the init argument to foldl when the unlifted argument can occur in any position.

Just in case looking at an implementation helps, here is what I'd suggest for non-associative binary functions:

const FoldableFunction = Union{
    typeof(/),
    typeof(-),
    typeof(intersect!),
    typeof(union!),
    typeof(merge!),
    typeof(append!),
    typeof(push!),
}

struct ReduceMe{T}  # need to find a better name :)
    value::T
end

function apply(op::FoldableFunction, x, xs...)
    init = x isa ReduceMe ? foldl(op, x.value) : x
    return foldl(xs; init=init) do acc, input
        if input isa ReduceMe
            foldl(op, input.value, init=acc)
        else
            op(acc, input)
        end
    end
end

using Test
# push!(...[[], 1, 2])
@test apply(push!, ReduceMe([[], 1, 2])) == [1, 2]
# push!([], ...[1, 2])
@test apply(push!, [], ReduceMe([1, 2])) == [1, 2]
# push!([], ...[1, 2], 3, ...[4])
@test apply(push!, [], ReduceMe([1, 2]), 3, ReduceMe([4])) == [1, 2, 3, 4]

I am not super sure if the first argument should be allowed to be a "ReduceMe", but it is handy for "nearly symmetric" reducing functions:

# intersect!(...[[1, 2], [1], [0, 1]])
@test apply(intersect!, ReduceMe([[1, 2], [1], [0, 1]])) == [1]

An interesting data point is something. It's "almost" an associative binary operator (the "find-first" monoid) but due to its eagerly throwing nature we _don't_ have

something(xs...) === reduce(something, xs)

Instead, a more useful definition of something(...xs) might be

firstsomething(x, y) = x isa Nothing ? y : x
something(reduce(firstsomething, xs; init=nothing))
# something(reduce(right, ReduceIf(!isnothing), xs))  # using Transducers

Here is an example where using foldl for splatting becomes weird:

julia> xs = (:Base, :CoreLogging, :Info);

julia> foldl(getproperty, xs, init=Main)
Info

This code snippet is highly useful but getproperty(Main, xs...) is not defined. If we rely on the analogy, then maybe users would find it strange that getproperty(Main, ...xs) works.

Very interesting example. Again it makes me wonder if we should be looking to delete methods which have varargs versions which are implicitly reductions :-)

I think it's OK to get rid of vararg definitions as long as the "lifting" occurs on a per-argument basis rather than on the whole function (i.e., f(x, ...y) and not f...(&x, y)).

I have been thinking a bit about reductions over lazy collections. I've got a different tangent to suggest.

We have these awesome generators which can do map and filter, like

julia> (x^2 for x in 1:10 if iseven(x))
Base.Generator{Base.Iterators.Filter{typeof(iseven),UnitRange{Int64}},getfield(Main, Symbol("##41#42"))}(getfield(Main, Symbol("##41#42"))(), Base.Iterators.Filter{typeof(iseven),UnitRange{Int64}}(iseven, 1:10))

julia> collect(x^2 for x in 1:10 if iseven(x))
5-element Array{Int64,1}:
   4
  16
  36
  64
 100

So we have these beautiful almost-English sentences describing an iterator in a rather declarative way (I kind of think of it as a "SQL for iterators"). However, one operation I often want to do is reductions. Take this example:

julia> reduce(+, x^2 for x in 1:10 if iseven(x))
220

So what if I could use a "generator-style" syntax for reductions as well? I'm going to suggest the syntax/operator over for this but there are probably better ideas:

julia> + over (x^2 for x in 1:10 if iseven(x))  # Note: this doesn't actually work
220

The thing on the right of over is just a standard filtered generator. We should be able to + over anything:

julia> + over [4, 16, 36, 64, 100]
220

Therefore over becomes a binary operator for a function which basically goes like:

over(f, x) = reduce(f, x)

If you like... we could simply make reduce a binary operator.

julia> + reduce [4, 16, 36, 64, 100]
220

:)

I'm not sure how much + over xs is better than over(+, xs). As noted by Dijkstra, I think infix operators make sense primarily when the operations are associative. (Although I do write x in xs when I can write in(x, xs)...)

But the connection to comprehension expressions is interesting. It makes me think that my emphasis on "lifting per argument" is yet another example of non-orthogonal features. The observation is that what I've been proposing with f(a, ...b, ...c, d) does two things: lifting f to foldl(f, _) and flattening the arguments. So, it can be expressed as

foldl(f, (x for y in (&a, b, c, &d) for x in y))

if &x means Ref(x). Can we have a syntax for writing lazy concatenation? Maybe

.[xs; ys; zs]

for Iterators.Flatten((xs, ys, zs))? Then, f(a, ...b, ...c, d) would be written as

f...(.[&a; b; c; &d])

or maybe we can have a shorthand for this:

f.[&a; b; c; &d]

It is nice because we have syntax for orthogonal features:

  1. &x for singleton container
  2. .[xs; ys; zs] for lazy concatenation
  3. f...(xs) for lifting to foldl(f, xs) etc.

Alternatively, f...(xs) can be written as f.[xs;]. A slightly ugly part is that f.[xs] would be somewhat ambiguous since [xs] is not [xs;].

Having said that, we can also treat (...xs, ...ys, ...zs) as lazy concatenation and say that f(...xs, ...ys, ...zs) is a shorthand for f...((...xs, ...ys, ...zs)).

I think infix operators make sense primary when the operations are associative

Sure, but the generator/comprehension thing seems well-received (in more languages than Julia), and the for, in and if keywords are not associative operators (and I suspect the over thing may need to be keyword syntax rather than a simple binary operator anyway, which incidentally would allow it to interact with broadcasting .s).

I just realized that over is "implementable" already:

julia> const ^ᵒᵛᵉʳ = reduce;

julia> vcat ^ᵒᵛᵉʳ (1:x for x in 1:3)
6-element Array{Int64,1}:
 1
 1
 2
 1
 2
 3

julia> (+) ^ᵒᵛᵉʳ (x^2 for x in 1:3)  # + has to be in ()
14

Yeah, so it makes sense to make it a keyword if we go in this direction, to get rid of the extra parentheses.

I still think over has some disadvantages, though. For example, it's hard to cover foldl(f, xs; init) with over, as you need to handle three arguments (e.g., foldl(push!, (1, 2, 3); init=[])).

But one advantage might be that + over xs is the least scary syntax so far? One of my main motivations was to make reduce less scary and "easy" to use. That's why I've been trying to make an interface that hooks into people's intuition of splatting. But maybe it'd be more powerful to use English-like syntax if that's the primary goal?

Why does over need to be special syntax?

julia> struct Over{F}
           f::F
       end

julia> (o::Over)(args...; kwargs...) = reduce(o.f, args...; kwargs...)

julia> Over(push!)((1,2,3); init=[])
3-element Array{Any,1}:
 1
 2
 3

I think that's the same as asking why [f(x) for x in xs] is more "user friendly" than map(f, xs). My impression is that special syntax helps beginners understand operations like this more than higher-order functions do.

I think Julia has so much special syntax at this point that almost any new syntax that's not just a slight generalization of existing syntax (e.g. a, b... = (1,2,3)) would be user-unfriendly to add.

There's a huge amount of special forms to learn at this point.

Furthermore, I really don't see how f over x is any more clear or easy to understand than Over(f)(x). This is in stark contrast to [f(x) for x in xs] which is highly suggestive, unlike map.

reduce is the most important building block of data-parallel computation. If we can help new users learn reduce (without them noticing it) by adding a syntax, I think it is worth the complexity.

a slight generalization of existing syntax (e.g. a, b... = (1,2,3))

f(...xs) is a "generalization" of f(xs...).

f(...xs) is a "generalization" of f(xs...).

Personally, I like the idea of f(...xs), but practically, I can't help but wonder "if we took a survey of 100 random julia users who haven't seen this thread and asked them

what would you expect f(...xs) to do?

I think that you'd be lucky if 5 of them guessed correctly. From that pov, adding a syntax like that is just another step along the road to becoming perl.

The most important thing I care about at this point in this discussion is making reduce/foldl as easy to use, understand, and teach as possible. I'm mostly sticking with f(...xs) now just because it's the most "intuitive" syntax to me.

For example, Fortress has the \big operator for turning a monoid into a reduction. I think it's an interesting approach since you've seen \sum and \prod many times. It's probably not a good option for Julia, but it's an example showing that having "intuitive" syntax makes sense if you are serious about first-class data-parallel computing in your language.

Anyway, it's conceivable that "don't be Perl" is the best option possible at this point. I don't really know. But I don't think we have explored syntax enough and I think it's too early to say no more syntax.
