I suggest a mechanism to customize splatting behavior such that code like
+((x .- y).^2...) / length(x) # compute MSE
can be executed efficiently without any allocations.
The idea is inspired by discussion in #29114 by @c42f et al.
Like dot-call syntax, I suggest lowering splatting to a series of overloadable function calls. One possibility is
# f(a...) is expanded to:
apply(f, Arguments(VA(splattable(a))))
# f(a, b..., c, d...) is expanded to:
apply(f, Arguments(a, VA(splattable(b)), c, VA(splattable(d))))
(To avoid recursion, this lowering should happen only when the call includes splatting.)
When the dot-call syntax appears in the splatting operand, I suggest _not_ materializing the dot-call. That is to say, for example, `op(f.(xs)...)` is lowered to
bc = broadcasted(f, xs) # `bc` not materialized
apply(op, Arguments(VA(splattable(bc))))
This lets us evaluate `op(f.(xs)...)` without any allocation once `reduce`/`foldl` supports `Broadcasted` objects (#31020 started tackling this).
The lowering above requires the following interface functions and types:
function apply end            # overloadable entry point for lowered splatting calls
splattable(x) = x             # per-argument pre-processing hook
struct VA{T}                  # wrapper marking a splatted ("vararg") argument
    args::T
end
struct Arguments{T <: Tuple}  # the whole (partially splatted) argument list
    args::T
    Arguments(args...) = new{typeof(args)}(args)
end
- `apply` must be dispatched on the first argument type and may be dispatched on the second argument type.
- `splattable` is analogous to `broadcastable` (@yurivish suggested this in https://github.com/JuliaLang/julia/issues/29114#issuecomment-515294850).
- `VA` (whose name can/should be improved) must be used only for defining `apply`; its constructor must not be overloaded. This is for making it hard to break splatting semantics.
- `Arguments` must not be overloaded for the same reason.

Using the current `Core._apply`, the default `apply` can be implemented as
using Base.Broadcast: materialize  # identity on non-`Broadcasted` arguments

apply(f, x) = Core._apply(Core._apply, (f,), _default_splattables(x.args))
_default_splattables(args) = map(x -> x isa VA ? materialize(x.args) : (x,), args)
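For concreteness, here is how a call would evaluate under this default (a sketch; the lowering is done by hand since the syntax change is hypothetical):

```julia
args = Arguments(1, VA(splattable((2, 3))))  # hand-lowered form of `+(1, (2, 3)...)`
apply(+, args)                               # behaves like +(1, 2, 3) == 6
```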
Many useful operations can be expressed using splatting into associative operators
+(xs...) == sum(xs)
*(xs...) == prod(xs)
min(xs...) == minimum(xs)
max(xs...) == maximum(xs)
or "mapped-splatting"
+(xs.^2...) == norm(xs)^2
+(xs .* ys...) == dot(xs, ys)
(This is reminiscent of the "big operator" in Fortress.)
There are also various other associative binary operators in Base. Invoking `reduce` with splatting could be useful:
const AssociativeOperator = Union{
    typeof(*),
    typeof(+),
    typeof(&),
    typeof(|),
    typeof(min),
    typeof(max),
    typeof(intersect),
    typeof(union),
    typeof(vcat),
    typeof(hcat),
    typeof(merge),
    # what else?
}
apply(op::AssociativeOperator, args::Arguments{Tuple{<:VA}}) =
reduce(op, args.args[1].args)
For example, concatenating vectors would be efficiently done via vcat(vectors...)
thanks to #27188.
It is also possible to support, for `op :: AssociativeOperator`, calls like
op(a, bs..., cs...)
such that it is computed as
op(op(a, reduce(op, bs)), reduce(op, cs))
This may be implemented as
apply(op::AssociativeOperator, args::Arguments) =
    mapreduce(x -> _apply1(op, x), op, args.args)
_apply1(op::AssociativeOperator, x::VA) = reduce(op, x.args)
_apply1(op, x) = x
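Under these definitions, a mixed call checks out as expected (a sketch, with the lowering again done by hand):

```julia
# Hand-lowered form of `+(1, (2, 3)..., (4, 5)...)`:
apply(+, Arguments(1, VA((2, 3)), VA((4, 5))))
# == +(+(1, reduce(+, (2, 3))), reduce(+, (4, 5))) == 15
```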
Note that, since `reduce` would degrade to `foldl` when the input is not an array (or not `Broadcasted` after #31020), we can also use it to fuse filtering with reduction:
+((x for x in xs if x > 0)...)
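With the `AssociativeOperator` definition above, this comes out as a pure fold over the generator (hand-lowered sketch; `splattable` is the identity here):

```julia
xs = [1, -2, 3]
apply(+, Arguments(VA(splattable(x for x in xs if x > 0))))
# == reduce(+, (x for x in xs if x > 0)) == 4, with reduce degrading to foldl
```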
Splatting is also useful for non-associative binary functions:
const BinaryFunction = Union{
    typeof(/),
    typeof(-),
    typeof(intersect!),
    typeof(union!),
    typeof(merge!),
    typeof(append!),
    typeof(push!),
    # what else?
}
function apply(op::BinaryFunction, args::Arguments{Tuple{Any, <:VA}})
    @assert !(args.args[1] isa VA)
    return foldl(op, args.args[2].args; init=args.args[1])
end
or more generally
function apply(op::BinaryFunction, args::Arguments)
    @assert !(args.args[1] isa VA)
    rest = Iterators.flatten(x isa VA ? x.args : (x,) for x in args.args[2:end])
    return foldl(op, rest; init=args.args[1])
end
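Hand-lowering a call with a leading non-splatted argument shows the `init` role it plays (a sketch):

```julia
# Hand-lowered form of `push!([], (1, 2)..., 3)`:
apply(push!, Arguments([], VA((1, 2)), 3))
# == foldl(push!, (1, 2, 3); init=[]) == [1, 2, 3]
```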
Not sure how many people need this, but `*(matrices..., vector)` can be (somewhat) efficiently evaluated by defining
apply(::typeof(*), args::Arguments{Tuple{<:VA, <:AbstractVector}}) =
foldr(*, args.args[1].args; init=args.args[2])
(Of course, allocation could be much more minimized if we really want this.)
A similar optimization can be done for `∘(fs...)`, but I'm not sure about the exact use case.
`map(f, iters...)` etc.

As this mechanism lets any function customize splatting, higher-order functions that may be called with splatted arguments can be optimized by defining their own `apply` specialization. For example, `map(f, iters...)` can be specialized as
function apply(::typeof(map), args::Arguments{Tuple{Any, <:VA}})
    f = args.args[1]
    iters = args.args[2].args
    return map(splat(f), _zipsplat(iters))
end
where `_zipsplat(iters)` behaves like `zip(iters...)` but its element type does not have to be a `Tuple`.
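A minimal sketch of what `_zipsplat` could be, for equal-length indexable inputs (the real implementation would have to support general iterators; the body here is an assumption, only the name comes from the proposal):

```julia
# Yields one "row" per index; the element is whatever `map` over `iters`
# produces (e.g., a `Vector` of elements), not necessarily a `Tuple`.
_zipsplat(iters) = (map(it -> it[j], iters) for j in eachindex(first(iters)))
```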
Defining a similar overload for `broadcasted` may be possible, provided that the object returned by `_zipsplat(iters)` is indexable. This lets us nest reduction inside mapping and avoid allocation in some cases:
vector .= +.(eachcol(matrix)...)
Other `map`-like functions, including `map!` and `foreach`, can also implement this overload.
`print`-like functions

`print`, `println` and `write` can be invoked with varargs. We can make, e.g., `println(xs...)` more compiler-friendly and efficient when `xs` is a generic iterator. Note that `apply(string, Arguments(xs))` can also be implemented in terms of `apply(print, Arguments(io, xs))`.
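For example, the `print` overload could traverse a generic iterator directly instead of materializing a tuple (a sketch following the interface above; the exact semantics are debated further down the thread):

```julia
function apply(::typeof(print), args::Arguments{Tuple{<:IO, <:VA}})
    io = args.args[1]
    foreach(x -> print(io, x), args.args[2].args)  # same effect as print(io, xs...)
    return nothing
end
```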
`splattable`

`splattable` may be used to solve the performance problem discussed in #29114:
splattable(x::Number) = (x,)
splattable(x::StaticArray) = Tuple(x)
In the case of `Broadcasted`, it can be used for calling `instantiate`:
splattable(x::Broadcasted) = instantiate(x)
I'm pretty strongly against this. Actually, I think the idea is not best described as "overloadable splatting", but as "change `...` to syntax for reduce instead of splatting". And that's not a bad idea --- maybe we should just add some syntax for `reduce`.
Splatting is supposed to be completely orthogonal to function calling. Calling functions is the core operation in the language, and splatting is just an alternate way to provide arguments. If this proposal were adopted, we would need some other way to call a function on arguments taken from a container. For example, this change makes it impossible to implement utilities like
∘(f, g) = (x...)->f(g(x...))
because there would be no way to generically take some arguments, and then pass all of them on to another function as if it were called directly with them.
So in short I'd favor coming up with some syntax for reduce rather than re-purposing `...` and removing any ability to have a first-class representation of an argument list.
Would it be a problem if there is a way to seal methods (#31222)? I imagine that #31222 allows us to construct lowering such that it is impossible to change the behavior when splatting tuples.
I understand that function calls are so fundamental that you don't want to play any magic with them. At the same time, since they are so fundamental, many people understand their semantics, including users who are not very familiar with functional programming constructs like `reduce`. Although..., if there is some syntax that resembles splatting, maybe that's enough? (I am a bit ambivalent about a new syntax for `reduce`. It certainly is a _huge_ win if it has a concise syntax that interacts nicely with broadcasting.)
concise syntax that interacts nicely with broadcasting
Yes, if we can get syntax for `mapreduce` which reuses broadcast syntax for the mapping part, that could be compelling.
Maybe we just need more dots in more places :grimacing:
+...(xs.^2) # mapreduce(x->x^2, +, xs)
# [Edit: really reduce(+, broadcasted(literal_pow, ^, xs, Val(2)))]
(motivation: an ellipsis indicates more items in the list. In this case, "many `+`'s".)
@c42f Moving `...` to the outside is interesting! I thought comprehensions like `+((x for x in xs if x > 0)...)` could become hard to read when the expression is large. Putting `...` right next to the binary operator is a good idea because you know that it is invoking a reduction right away: `+...(x for x in xs if x > 0)`.
By the way, we only need the syntax for `reduce` and `foldl` because `mapreduce` and `mapfoldl` can be "derived" from `reduce` and `foldl`. That is to say, `mapreduce(f, op, xs)` is equivalent to `reduce(op, broadcasted(f, xs))` once we have #31020. Generally speaking, it is actually redundant to define `mapreduce`/`mapfoldl` if you have transducers or iterator transforms (which include `broadcasted`).
we only need the syntax for reduce and foldl because mapreduce and mapfoldl can be "derived" from reduce and foldl
Agreed; all I really mean is that you want to be able to compile to something as efficient as `mapreduce` using existing broadcast notation combined with some reduction syntax, so that lowering can avoid emitting the intervening `Base.materialize`.
I checked whether `+...(xs.^2)` is currently a parse error.
To my surprise:
julia> :(+...(xs.^2))
:((Core.:(@int128_str))(".") .. xs .^ 2)
Haha! What.
Ok, harder problem: can we get syntax for `reduce(f, A; init=x, dims=d)`?
For `init`, the following might, maybe, be possible:
0 +... xs # Now we're talking!
f...(0, xs)
For `dims`, the composable way seems to be broadcasting the reduction across slices from `eachslice()`.
(+...).(eachslice(A, dims=2)) # reduce(+, A, dims=1) if A is a matrix
+....(eachslice(A, dims=2)) # ?? Is it even possible to parse this? :-(
(+...).(eachslice(A, dims=2).^2) # Well, it should work with broadcast at least
That's... a lot of dots...
Though a bit tangential, this discussion also reminds me of @mbauman's comment at https://github.com/JuliaLang/julia/pull/31217#issuecomment-468782768
Dealing with `init` makes me think that the rule should be something like
f...(a, xs; h=1, g=2) # reduce((x1,x2)->f(x1, x2; h=1, g=2), xs, init=a)
but that naturally leads to wondering about other arguments to the "binary" operator
f...(a, xs, b, c; h=1, g=2) # Unsure whether this can have a sensible meaning.
That's... a lot of dots...
Maybe throw more dots? How about
+...(x.^2 .+ 1)[., ., :]
# Equivalent to:
dropdims(reduce(+, (for x.^2 .+ 1); dims=(1, 2)); dims=(1, 2))
# `.`s inside `[., ., :]` become `dims`
and
y .= +...(x.^2 .+ 1)[., ., :]
# Equivalent to:
reducedim!(+, reshape(fill!(y, 0), (1, 1, size(y)...)), (for x.^2 .+ 1))
# `.`s insert singleton dimensions in the destination array
(I'm using the `for broadcasting_expression` notation from #31553.)
The `dropdims` in the first example is motivated by the second example. But I'm not super sure about this.
Regarding `init`, I think it is the reason why splatting is so appealing:
+(a, b...) # reduce(+, (a, reduce(+, b))) or reduce(+, b; init=a)
+(a, b..., c..., d) # reduce(+, (a, reduce(+, b), reduce(+, c), d))
push!([], b...) # foldl(push!, b; init=[])
Actually, `[.]` can be used instead of splatting?:
+(a, b[.]) # => reduce(+, (a, reduce(+, b))) or reduce(+, b; init=a)
+(a, b[.], c[.], d) # => reduce(+, (a, reduce(+, b), reduce(+, c), d))
push!([], b[.]) # => foldl(push!, b; init=[])
write(io, a[.]) # => foreach(x -> write(io, x), a)
+(x.^2 .+ 1)[., ., :]
# => dropdims(reduce(+, (for x.^2 .+ 1); dims=(1, 2)); dims=(1, 2))
y .= +(x.^2 .+ 1)[., ., :]
# => reducedim!(+, reshape(fill!(y, 0), (1, 1, size(y)...)), (for x.^2 .+ 1))
It also motivates `dropdims`.
Actually, `[.]` can be used instead of splatting?
Very interesting idea.
I think we're tackling two somewhat orthogonal problems here — slice iteration (AKA the curse of the metastasizing `dims` keyword) and reduction. What if `a[., ., :]` was syntax for slice iteration:
a[., ., :]
# (@view(a[i,j,:]) for i=axes(a,1), j=axes(a,2)) # or equivalent iterator type
Then you can express various useful things independently from having reduction syntax
sum.(a[., ., :])
# sum(a, dims=3)
sum.(a[., :, :])
# sum(a, dims=(2,3))
sum.(a[., :, 1])
# sum(a[:,:,1], dims=2)
diff.((a .* b)[., :])
# Fused version of diff(a .* b, dims=2) ?
For `diff` it's hard to see how `dropdims` on the `.`s could make sense, but in other cases it might be desirable. Also, transformation of the `diff` version into an in-place algorithm seems like it might be difficult in general. There's also something off here about concatenation vs non-concatenation of the arrays resulting from `diff`. Especially as applied to functions like `unique`, which would result in a ragged array.
Still, it would be neat if the `dims` keywords could be removed from `all`, `any`, `prod`, `sum`, `maximum`, `minimum`, `argmax`, `argmin`, `cumsum`, `cumprod`, `reverse`, `diff`, etc.
sum.(a[., ., :]) # sum(a, dims=3)
If we are to go in this direction, I think it'd be better to drop the `.` in `sum.` and let `sum(a[., ., :])` be equivalent to `sum(a, dims=(1, 2))` (or `dropdims` of it).
That is to say, there is a lazy object like
struct Axed{T, D <: Tuple{Vararg{Int}}}
    x::T
    dims::D
end
such that `a[., ., :]` is lowered to
Axed(a, (1, 2))
and `sum` has a method
sum(a::Axed) = sum(a.x; dims=a.dims)
or maybe
sum(a::Axed) = Axed(sum(a.x; dims=a.dims), a.dims)
so that `dropdims` can know the `dims` to be dropped.
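Concretely, with the `Axed` sketch and the first `sum` method above (the `a[., ., :]` lowering itself is hypothetical syntax):

```julia
a = rand(2, 3, 4)
sum(Axed(a, (1, 2))) == sum(a; dims=(1, 2))  # what `sum(a[., ., :])` would mean
```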
Having said that, I think "two somewhat orthogonal problems" is not quite right (although I half agree). Specifying "loop axes" (`dims`) already signals "I want something to happen along these axes." So, I think combining it with `reduce` syntax makes sense.
Can we have other syntax for `sum(a, dims=(1, 2))`? There are a few ASCII characters usable inside `[]`:
x[?]
x[@]
x[=]
x[&] # edit: actually this is currently valid and would be breaking
x[$]
x[_] # but this one may go to anonymous function?
`x[?, ?, :]` kind of makes sense because `?` is paired with `:`? (edit: `sum(x[&, &, :])` could mean "let me _refer_ to dimensions 1 and 2 in the summation," but `x[&]` is currently valid so this can't be used.)
It looks like prefix `...` is available?
+(...a)
+(a, ...b)
+(a, ...b, ...c, d)
So I guess we can use
y .= +(...(x.^2 .+ 1)[., ., :])
if "syntax orthogonalization" is desirable.
If we are to go in this direction, I think it'd be better to drop the `.` in `sum.` and let `sum(a[., ., :])` be equivalent to `sum(a, dims=(1, 2))` (or dropdims of it).
Right, this makes particular sense for things like `diff`, where there's a conflation of broadcasting (to compute the diff down a slice) and concatenation of the resulting diff'd slices together. I do think the broadcasted version makes sense for user-defined functions `f` returning a scalar: these can efficiently use `f.(a[., ., :])`, and importantly the user doesn't need to define the overload `f(::Axed)`. Also, it seems consistent and useful that things like `unique.(a[., ., :])` work and return an array of arrays.
Specifying "loop axes" (dims) already signals that "I want something to happen along these axes." So, I think combining it with reduce syntax makes sense.
I agree that reduction is closely related and that the syntaxes should compose in a way which allows for lowering into efficient fused loops (i.e., allowing the compiler to remove the `materialize` completely, or at least move it outward as far as possible). But syntax is such a precious resource that it seems a shame not to have it do double duty (or really, exponential duty if it composes nicely with all N other syntaxes!)
I must admit, I'm still somewhat taken by `+...`, if it can be parsed, because of the ellipsis analogy. For example,
+(a, ...bs, ...cs, d)
# vs
a +... bs +... cs + d
— the second seems much more like natural julia syntax to me. For more complex cases it's admittedly a toss up
y .= +(...(x.^2 .+ 1)[., ., :])
# vs
y .= +...(x.^2 .+ 1)[., ., :]
The `?` syntax is an attractive alternative, perhaps a bit less visually confusing than all those tiny dots.
y .= +...(x.^2 .+ 1)[?, ?, :]
I do think the broadcasted version makes sense for user-defined functions `f` returning a scalar
I guess you can already do this with current broadcasting facility, like this?
f.(view.(a, axes(a, 1), reshape(axes(a, 2), 1, :), Ref(:)))
Yea, I know it's super ugly. But I think syntaxes to make this cleaner are already discussed elsewhere. I think we "just" need `x.[i]` from #30845 and `&<var>` from #27563
f.(@view a.[:, :, &:])
(or maybe even without `@view` if non-scalar indexing can be aware of `&:`?)
But syntax is such a precious resource that it seems a shame not to have it do double duty (or really, exponential duty if it composes nicely with all N other syntaxes!)
Good point. Thank you for pointing it out!
For example,
+(a, ...bs, ...cs, d) # vs a +... bs +... cs + d
— the second seems much more like natural julia syntax to me.
I totally agree that infix syntax is more natural in Julia for operations like `+`. But I think we need it for functions in general, even for non-associative functions:
vcat(...f.(as), b, ...cs)
intersect(...f.(as), b, ...cs)
push!([], ...f.(as), b, ...cs)
write(io, ...f.(as), b, ...cs)
`+` was just an example associative operator. Maybe using `max` in the example would have been better.
But, to make the infix version and function call versions compatible, I think
a + ...bs + ...cs + d
would make more sense (although maybe the parser would handle both cases anyway?).
Some of these proposals remind me of the `(*,:)` notation which JuliennedArrays used to have. Here's a quick hack to play with:
using JuliennedArrays
Base.getindex(A::Array, fs::Function...) = Slices(A, map(f -> f==(:) ? True() : False(), fs)...)
ones(2,3)[:,*] # 3-element Slices, ≈ eachcol(ones(2,3))
sum(ones(2,3)[:,*]) # == dropdims(sum(ones(2,3), dims=2), dims=2)
sum.(ones(2,3)[:,*]) # == dropdims(sum(ones(2,3), dims=1), dims=1)
A version of this (perhaps with a different symbol) might be nice as an alternative to `dims` keywords, although there is still some potential for confusion about which dimensions are which.
Reducing a broadcast like `sum((x.^2 .+ 1)[:,*])` of course still materializes here. It would be great to have a lazy version, and something like `y .= +...(x.^2 .+ 1)[:,*]` doesn't look too crazy to me. Perhaps better than the earlier proposal of `sum!(y, @: x.^2 .+ 1)`.
Perhaps better than the earlier proposal of `sum!(y, @: x.^2 .+ 1)`.
@mcabbott I think non-materializing broadcast is orthogonal to the issue here because non-materializing broadcast is useful outside the context of reduction. For example, it lets you write a function
f(x) = @: x.^2 .+ 1
which does not have to care at all how this expression will be materialized (`copy`, `copy!`, `reduce`, etc.).
But syntax is such a precious resource that it seems a shame not to have it do double duty (or really, exponential duty if it composes nicely with all N other syntaxes!)
Good point. Thank you for pointing it out!
You're welcome; I think I should thank @andyferris for the observation that N orthogonal composable features lead to some kind of exponential functionality in N, but N non-composable features lead to only O(N) functionality.
I think we "just" need
x.[i]
from #30845 and&<var>
from #27563
Thanks for pointing these out again, they're extremely relevant.
There seems to be some confusion over at #30845 as to whether `to_indices` would be called before the broadcast or afterward. Matt originally suggested that it should be called before, which I think means `a.[:, :, &:]` might need to be written `a.[:, ^:, &:]` in that proposal? If `&` becomes syntax for `Ref`, it might also make some logical sense to use a simple `&` in place of `&:` (and have `Base.to_indices` of `&` return `:`?):
f.(a.[:, ^:, &])
# [f(a[i,j,:]) for i = axes(a, 1), j = axes(a,2)]
At this point I'm thinking out loud and it's unclear to me how this could be fused with the reduction `f`. It also seems quite confusing that the reduction `f` happens not along the dimensions of `a` indexed with the `:`, but along the other dimension. (This is a direct result of the specialized placement of `to_indices` in the #30845 OP.)
@tkf I agree it would be awesome if some syntax for reduction could be used for `foldl` in non-associative cases. My thought there is that one could lower not to `reduce`, but to some function `reduction(f, ...)` which would dispatch on the type of the function `f` to determine whether to call `reduce` vs `foldl` (or maybe `foldr`?). (Or, more specifically, dispatch on a trait `Associativity(f)` which might default to a safe `NonAssociative()`.) But at the moment this is more a vague thought bubble than a concrete idea.
Perhaps you could explain more why you'd like syntax for reducing multiple collections in one call? To me this seems sufficiently unusual that you could just chain the reductions (provided that the use of `init` is guaranteed). Is it the difference between writing the name of the reduction operator multiple times, or is there some deeper reason? That is:
vcat(...f.(as), b, ...cs)
# vs the much uglier
vcat...(vcat(vcat...(f.(as)), b), cs)
Is it the difference between writing the name of the reduction operator multiple times, or is there some deeper reason?
Actually, for the case of `vcat` there's the should-have-been-obvious reason that you'd like to preallocate the result array. Fair enough I guess; food for thought.
`a.[:, :, &:]` might need to be written `a.[:, ^:, &:]` in that proposal
Ah, I missed that. I think you are right.
use a simple `&` in place of `&:`
In a previous comment I said `x[&]` was an available syntax, but I was totally lying; it's usable now (e.g., `x = Dict(); x[&] = 1`). So, I think using `a.[:, ^:, &]` is not possible (at least in 1.x) because it would be breaking (or at least confusing).
f.(a.[:, ^:, &]) # [f(a[i,j,:]) for i = axes(a, 1), j = axes(a,2)]
At this point I'm thinking out loud and it's unclear to me how this could be fused with the reduction `f`.
Isn't it already fused? I mean, it's a nested loop and allocating as-is, but if you specify the destination then it would be equivalent to
y .= f.(view.(a, axes(a, 1), reshape(axes(a, 2), 1, :), Ref(:)))
So "loops over axes 1 and 2" are fused with "copying to y
."
It also seems quite confusing that the reduction `f` happens not along the dimensions of `a` indexed with the `:`, but along the other dimension. (This is a direct result of the specialized placement of `to_indices` in the #30845 OP.)
It may be tricky, but it's understandable once you think about the rules, isn't it? I mean, there are two nested loops (a loop inside `f` and a loop due to the dot calls), so I think it's natural that it's confusing at first sight. Maybe the mnemonic is "`&:` means to pass a reference to the index to the callee." ... or maybe it's time to give up on index-free notation and go with index notation?
My thought there is that one could lower not to `reduce`, but to some function `reduction(f, ...)` which would dispatch on the type of the function `f` to determine whether to call `reduce` vs `foldl`
In the OP, my proposal was to not lower directly to `reduce` but rather to:
# f(a, b..., c, d...) is expanded to: apply(f, Arguments(a, VA(splattable(b)), c, VA(splattable(d))))
So, `apply(::typeof(f), ...)` can call whatever `f` wants. But I think the basic guideline is that it should do:

- `reduce` if `f` is an associative binary operator/function (e.g., `+`)
- `foldl` if `f` is a non-associative binary function (e.g., `push!`)
- `foreach` if `f` is purely for side-effect (e.g., `print`)

Those properties are already satisfied for splatting. That is to say,
+(numbers...)
push!([], objects...)
print(printables...)
already follow the guideline I just mentioned. It means that users only have to learn to change `xs...` to `...xs` to do what they want. For example, if you know that `vcat(xs, ys)` concatenates vectors and how splatting works, it's just a very small step to write `vcat(...vector_of_vectors)`.
(or maybe `foldr`?)
It was hard to come up with a case where `foldr` is useful. The only one I could think of was:
...matrices * vector
(Oh, actually, this is why `...` should be associated with an argument, not an operator.)
Is it the difference between writing the name of the reduction operator multiple times, or is there some deeper reason?
Other than the efficiency reason you just explained, we already have
vcat(f.(as)..., b, cs...)
which works nicely for tuples. So why not have a syntax that works with arbitrary iterators?
So, I think using a.[:, ^:, &] is not possible (at least in 1.x) because it would be breaking (or at least confusing).
It would be confusing, but `:` itself is a function, albeit more closely related to ranges and hence indexing. Regardless, it's not a big deal to use `&:`, and it might be clearer anyway.
Isn't it already fused? I mean, it's a nested loop and allocating as-is, but if you specify the destination then it would be equivalent to
y .= f.(view.(a, axes(a, 1), reshape(axes(a, 2), 1, :), Ref(:)))
So "loops over axes 1 and 2" are fused with "copying to y."
The allocation for the destination array `y` isn't what I'm thinking about. I'm wondering about how the call to `view` can get inserted in there. Not only that, but a very big part of the attraction of #30845 is the elimination of much of the views-vs-copies problem completely. It just doesn't appear to work in this case.
So, apply(::typeof(f), ...) can call whatever f wants. But I think the basic guideline is that it should...
Right, that seems clear. Sorry — I'd glossed over that part of the OP, given that we're not going to use actual splatting for reduction. I wonder what a good default is out of `reduce` vs `foldl` vs `foreach`? It seems quite reasonable to have to define an `Associative` trait for the small number of associative functions, but choosing a good default between `foldl` vs `foreach` might be harder.
By the way, I would love to have `println(...xs)` or equivalent. My current workaround in the REPL is `println.(xs);`, which I always feel a little dirty about :-)
`:` itself is a function
Good point. I guess it can be special-cased in the `.[]` context, in principle.
I'm wondering about how the call to `view` can get inserted in there. Not only that, but a very big part of the attraction of #30845 is the elimination of much of the views-vs-copies problem completely. It just doesn't appear to work in this case.
Ah, I was assuming that the `@view` macro does the magic, and I missed that you didn't write `@view`. I guess you can also write `f.(view.(a, :, ^:, &:))`, although the definition of `^` may be unclear here. One alternative is to define it as "add a singleton dimension": https://github.com/JuliaLang/julia/issues/30845#issuecomment-458244811
I wonder what a good default is out of `reduce` vs `foldl` vs `foreach`?
I'd say `reduce` is out, since associativity is a strong condition. Actually, "`foreach` if side-effect" was a bit of a simplification. What I really meant was "equivalent to splatting." For example, I think `write(io, ...xs)` should be implemented as
acc = 0
for x in xs
    acc += write(io, x)
end
acc
to be equivalent with `write(io, xs...)`.
I think `foldl` is the most useful and applicable to many functions out-of-the-box. For example, anything that works with `reduce` automatically works with `foldl`. I'm actually not sure if there should be a default. Maybe this should be an opt-in feature (e.g., using some kind of trait as you suggested)? But if there were a default, I think it should be `foldl`.
I would _love_ to have `println(...xs)` or equivalent
OK. You will hate to know what I proposed... I wanted `println(xs...)` and `println(...xs)` to be equivalent in effect. So it's not actually `foreach(println, xs)` but rather `foreach(print, xs); println()`... But I agree `foreach(println, xs)` would be much more useful. I wonder if there is a principle by which `foreach(println, xs)` actually makes sense.
But I agree foreach(println, xs) would be much more useful. I wonder if there is a principle such that foreach(println, xs) actually makes sense.
Yes you're right; `println` is in no way the reduction operator, rather it's the mapping operator in a `mapreduce` with a kind of "null reduction". Perhaps that's just an unadorned `...println.(xs)` :-)
If we had a function like
printlnio(io, x) = (println(io, x); io)
then `printlnio(stdout, ...xs)` works with the `foldl` version. In this case it makes sense to have a shortcut `printlnio(...xs) = printlnio(stdout, ...xs)`.
Alternatively, maybe having a helper function like
ln(x) = string(x, "\n")
for `print(...ln.(xs))` maybe makes sense? It's useful for `write(filename, ...ln.(join.(vector_of_vectors, ",")))` to do quick-and-dirty CSV writing.
This comment https://github.com/JuliaLang/julia/issues/16606#issuecomment-222233093 by @StefanKarpinski made me realize that
there is a lazy object like `struct Axed{T, D <: Tuple{Vararg{Int}}} x::T dims::D end` such that `a[., ., :]` is lowered to `Axed(a, (1, 2))`
should really be
struct Axed{dims, T}
    x::T
end
a[., ., :] === Axed{(1, 2)}(a)
for type stability.
should really be `struct Axed{dims, T} x::T end; a[., ., :] === Axed{(1, 2)}(a)` for type stability.
The `Slices` type of JuliennedArrays works a bit like this. Edit: In this notation there is no way to create `Axed{(2,1)}(a)`, which `Slices` does not distinguish. There is also https://github.com/JuliaLang/julia/pull/32310, which defines an `EachSlice` type, with some differences, including caring about the order of `dims`.
I don't know if this is poor form, but this gist has some further attempts at making such slicing things work, very roughly. It can't of course parse `a[.,.,:]`, so allows `a[*,*,:]` or `a[&,&,:]`. I wonder if the dots are really too small to see easily anyway.
Good point! A single dot `.` (or any other syntax) can just be lowered to a singleton object. The dimensions can then be extracted from the positions in the arguments. It's a nicer strategy, as it works with function calls like `view(x, ., ., 1:2)` etc.
Perhaps one more relevant link is https://github.com/JuliaLang/julia/issues/29146 about `mapslices`. Perhaps this should be written
@views b[:, .] .= f.(a[:, .]) # foreach((x,y) -> x .= f(y), eachcol(b), eachcol(a))
c[:, .] := f.(a[:, .]) # c = mapslices(f, a, dims=1)
ideally with the `f.()` fused with the `c = reduce(hcat, ...)`.
(Apologies if I confused things by editing above while you were typing!)
I think `mapslices` is orthogonal to the issue we discuss here. I think it would/could be automatically derived from #30845 and #27563. See the discussion about `&:` started at https://github.com/JuliaLang/julia/issues/32860#issuecomment-520337649 above.
In this notation there is no way to create `Axed{(2,1)}(a)`
Does `reduce` actually care about the order of `dims`?
julia> reduce(vcat, reshape(1:12, (2, 2, 3)); dims=(1,2), init=[])
1×1×3 Array{Array{Any,1},3}:
[:, :, 1] =
[1, 2, 3, 4]
[:, :, 2] =
[5, 6, 7, 8]
[:, :, 3] =
[9, 10, 11, 12]
julia> reduce(vcat, reshape(1:12, (2, 2, 3)); dims=(2,1), init=[])
1×1×3 Array{Array{Any,1},3}:
[:, :, 1] =
[1, 2, 3, 4]
[:, :, 2] =
[5, 6, 7, 8]
[:, :, 3] =
[9, 10, 11, 12]
@tkf I think I finally understand your analogy between splatting and reduction. Simply:
Many functions which take varargs are reductions in disguise, `vcat` and `+` being good examples.
The reasons these functions exist in the multi-arg form are convenient syntax and efficiency.
Therefore, an ideal end game for reduction syntax+lowering is to allow the removal of the multi-arg versions of these functions, but without losing convenient syntax or efficiency.
That is, can we have syntax for `vcat(a,b,c)` when only the binary function `vcat(a,b)` is defined by the user? It's not just about splatting!
If we can do this, we get benefits exactly analogous to the sudden freedom of broadcast dot notation:
Thanks, that's a good way to motivate what I have in mind.
That is, can we have syntax for `vcat(a,b,c)` when only the binary function `vcat(a,b)` is defined by the user?
I guess an obvious choice would be `vcat(a, ..., b, c)`. If `...,` is lowered to `...(),` then the definition I gave in the OP should take care of the rest.
But I think `vcat(a, b, c)` is still better. One way to do this automatically is to create a type hierarchy
abstract type AssociativeOperator <: Function end
(op::AssociativeOperator)(a, b, cs::Vararg) = op(op(a, b), reduce(op, cs))
# + the definition of `apply` in the OP
struct Plus <: AssociativeOperator end
const + = Plus()
+(x, y) = ...
or maybe even
abstract type FoldableFunction <: Function end
abstract type AssociativeOperator <: FoldableFunction end
struct Plus <: AssociativeOperator end
const + = Plus()
struct Push! <: FoldableFunction end
const push! = Push!()
(It would be nice if we can write function push! isa FoldableFunction end
.)
- The author of a binary function doesn't need to foresee its use in reduction
I half-agree with this. By that, I mean `foldl`-based reduction can be broadly applicable to many situations. However, you still need a mechanism to encode that, e.g., `+` is associative and has an identity. It also would be better to handle `foreach`-like behavior (`print`).
- The author of a binary function doesn't need to foresee its use in reduction
I half-agree with this. By that, I mean foldl-based reduction can be broadly applicable to many situations. However, you still need a mechanism to encode that, e.g., + is associative and has an identity. It also would be better to handle foreach-like behavior (print).
I admit my statement was provocative ;-). In fact I do have this counter argument in mind: https://discourse.julialang.org/t/ann-transducers-jl-efficient-and-composable-algorithms-for-map-and-reduce-like-operations/19159/19
What I really mean is that there should be some `foldl`-based default which can take many binary operators and make them useful without needing the author's consent. Then, to add extra performance, one should be able to extend some functions from the lowered form as relevant to a particular binary operator. Again, without the author's consent where necessary (though this would be type piracy).
Speaking of transducers, how do they fit into the picture here?
What I really mean is that there should be some `foldl`-based default which can take many binary operators and make them useful without needing the author's consent.
Yes, I think it is worth emphasising this aspect. Maybe `foldl`-by-default is the way to go.
Speaking of transducers, how do they fit into the picture here?
My "secrete" plan here is to make it just work. It should be easy since Transducers.jl overloads Base.reduce
and Base.foldl
. For example, with the definition in the OP,
+(...eduction(x^2 for x in xs if x > 0))
will be dispatched to reduce(+, Filter(x -> x > 0) |> Map(x -> x^2), xs)
. This will then fuse filtering, mapping, and reduction (and execute them in parallel).
(`eduction` is a way to bundle transducers and an input collection together. It also has an overload that converts iterator comprehensions to an eduction. See: https://tkf.github.io/Transducers.jl/dev/manual/#Transducers.eduction)
Nice. Even better, perhaps, if `x^2 for x in xs if x > 0` would return `eduction(Filter(x -> x > 0) |> Map(x -> x^2), xs)` in the future (maybe in 2.0) :smile:
I just watched the transducers talk. The way we currently use `Core._apply` to concatenate all arguments into a buffer before applying, e.g., `vcat` to that buffer is exactly the problem Rich Hickey is talking about when he says that transducers avoid putting data in concrete collections to pass between steps. In this language, our multiple arguments are processed as a `Cat` transducer taking a `Tuple` of the lowered arguments:
+(...f.(as), b, ...cs)
# eventually ends up in something like ?
reduce(+, eduction(Cat(), (Base.broadcasted(identity, as), (b,), cs)))
By the way, @andyferris had the following to say to me offline:
Ignore `dims`, reduction is about the iteration API, not indexing.
I agree — while the question of notation for slice iteration is fascinating in its own right, I do think it's orthogonal. The gist @mcabbott linked is a pretty neat way to do that. Where it gets messy is wanting to fuse all the things...
Another syntax option is to have the `...` on the operator/function, but to unescape arguments which should not be splatted:
vcat...(as, &b, cs)
# vs
vcat(...as, b, ...cs)
The idea is that it allows us to express `vcat(a,b,c)` in terms of the binary `vcat` even when none of the arguments are splatted. Unfortunately it's not pretty:
vcat...(&a, &b, &c) # ow my eyes.
It's a stronger analogy with broadcasting syntax, where the `.` is on the operator and one must unescape things which shouldn't participate. At least `1 +... xs ≡ +...(1, xs)` would work without escaping the `1` because numbers are self-iterating. It seems like there could be a close analogy with broadcastable vs iterable and unescaping.
Having said all that, a cleaner syntax for this case is just to use a tuple:
vcat...((a,b,c))
# or
vcat(...(a,b,c))
These are still much uglier than `vcat(a,b,c)`, though.
if `x^2 for x in xs if x > 0` would return `eduction(Filter(x -> x > 0) |> Map(x -> x^2), xs)` in the future (maybe in 2.0)
I don't think there is any fundamental reason why eduction cannot happen automatically in `Base`'s `reduce`/`foldl`. At the level of the API ("expressive power"), iterators and eductions are identical. I talked about it a bit at JuliaCon (see slides 32 and 33). The difference is in how you "generate" the code for reduction. We have plenty of performance specializations in Julia. So, I don't think it's super crazy to convert iterator transforms (`Iterators.Filter`) to transducers automatically in `reduce`/`foldl`.
@andyferris had the following to say to me offline: "Ignore `dims`, reduction is about the iteration API, not indexing." I agree
Me too. In fact, I think we have already arrived at an API where `+...a[., :]` just works without the "reduction syntax" knowing anything about `a[., :]`. It should work as long as `reduce` has a specialization for the "container" `a[., :]`.
(Well, I don't strictly agree with the "reduction is about the iteration API" part because reduction is much more general; e.g., it's parallelizable. But I'm guessing that's probably not what he meant.)
Another syntax option is to have the `...` on the operator/function
One of the disadvantages of this approach is that we lose the chance to assign a meaning to `+.(...xs)`, which could be useful, e.g., when `xs = eachcol(matrix)`.
IIUC your motivation is to "auto-define" splatting-like behavior? Then why not use abstract types to do it (https://github.com/JuliaLang/julia/issues/32860#issuecomment-521444963)? It would make varargs actually work and no new language constructs would have to be introduced. (But types like `AssociativeOperator` are highly useful for reduction too.)
Another syntax option is to have the `...` on the operator/function
One of the disadvantages of this approach is that we lose the chance to assign a meaning to `+.(...xs)`, which could be useful, e.g., when `xs = eachcol(matrix)`.
Actually I'm confused by what you want this to mean. Is it `map(x->reduce(+,x), xs)`?
IIUC your motivation is to "auto-define" splatting-like behavior? Then why not use abstract types to do it (#32860 (comment))? It would make varargs actually work and no new language constructs would have to be introduced. (But types like `AssociativeOperator` are highly useful for reduction too.)
My observation is that `vcat(a,b,c)` is in fact the operation `reduce(vcat, (a,b,c))`, so having a single syntax for reduction which covers this case and which also covers cases like `reduce(vcat, xs)` might make sense.
Put another way, if you want `vcat(a, b, c, d, ...xs)` to turn into a reduction, what happens if the only "splatted" argument is removed? Suddenly this means something completely different, which seems odd.
About the `AssociativeOperator` approach: it's very neat that this can be done already, and it could bring some more useful meaning to the type tree of subtypes of `Function`.
But it's a very different way to address the observation that varargs methods of some functions arguably "shouldn't even exist" and should instead be replaced by a reduction involving only the binary version of that function (if only we had a sufficiently slick syntax; honestly I'm not sure this is possible).
Another syntax option is to have the `...` on the operator/function
One of the disadvantages of this approach is that we lose the chance to assign a meaning to `+.(...xs)`, which could be useful, e.g., when `xs = eachcol(matrix)`.
Actually I'm confused by what you want this to mean. Is it `map(x->reduce(+,x), xs)`?
My mental model is "prefix `...` should act like splatting when the argument is a tuple." So, I was thinking `reduce((x, y) -> map(+, x, y), xs)`.
varargs methods of some functions arguably "shouldn't even exist"
I see. So remove vararg definitions from various functions like `+`, `vcat`, `push!`, `print`, etc.? I didn't realize that that's what you've been suggesting. But I agree that it would simplify the `Base`/stdlib interface a lot.
But... `vcat(a, b, c)` is such an easy syntax that I couldn't resist suggesting we keep it. That's why I initially suggested overloading splatting and kept insisting on syntax that would _look_ like "just" a function call. This way, the `vcat(a, b, c)` and `vcat(a, b, ...cs)` syntaxes can provide a uniform feel even though they would use very different code paths to reach `reduce`. I don't think this is too bad because, with abstract function subtypes, we can automate this plumbing part with a few lines of code.
This way, the `vcat(a, b, c)` and `vcat(a, b, ...cs)` syntaxes can provide a uniform feel even though they would use very different code paths to reach `reduce`.
Yes, it's that differing code path which bothers me, I think it highlights a deeper design flaw in this way of arranging things. Again, compare to broadcast where nobody needs to worry about defining "broadcastable" functions; all functions are naturally broadcastable with a uniform syntax and implementation.
To come at this from another angle: in `StaticArrays` we're constantly fighting ambiguities in functions like `vcat` where the varargs list contains a mixture of statically vs dynamically sized arrays, but nobody has added the necessary trait-based dispatch to the `vcat` reduction implementation. Contrast this with the broadcast machinery, which gives a uniform way to speak about the result array type, and we just don't have this problem.
My mental model is "prefix ... should act like splatting when the argument is a tuple."
The thing I find confusing about this is that it supposes the splatted version exists and has the meaning of a reduction. This is fine for `+`, but less obvious for generic binary operators.
An alternative meaning of prefix-`...` occurs to me which would make a certain kind of sense:
println(io, ...xs, " ", ...ys)
# foreach((x,y)->println(io, x, " ", y), xs, ys)
Or in general,
foo(a, b, ...cs, d, ...es)
# foreach((c,e)->foo(a, b, c, d, e), cs, es)
I realize this diverges in rather a different direction. Coming back to an earlier point you made about `printlnio`: I think it would be a bit odd to have a specially named function just for this. But it's an interesting idea, and I do wonder whether it's a good motivation for having `println` itself return the `io` (though I'm not sure how this could be made non-annoying at the REPL).
compare to broadcast where nobody needs to worry about defining "broadcastable" functions
On the other hand, the way _arguments_ are broadcasted is highly customizable; there is an intricate pipeline for processing arguments to change the semantics (e.g., some types are considered scalar by default) and to choose the best broadcasting strategy (style). Also, strictly speaking, you can tweak `broadcasted(::typeof(f), ...)` to change the semantics of the function itself. Of course, it would be evil to do such a thing. But my point is that broadcasting works as long as implementers stick with the canonical semantics. It's not "safe-by-design." So, as long as we can ensure that the code paths of `f(xs...)` and `f(...xs)` converge unless `f` provides an "evil overload" (which is easy to rule out if we can use abstract function types), no users have to worry about it more than they do for broadcasting. We can increase the safety further if the mechanism to "seal" methods (#31222) is implemented.
Another way to look at it is that in broadcasting the behavior of the arguments is highly customizable, while in the prefix-`...` syntax the behavior of the function is highly customizable.
in `StaticArrays` we're constantly fighting ambiguities in functions like `vcat`
Isn't this an argument in favor of the `AssociativeOperator` function type? There is no need to fight ambiguity because the vararg version would be derived from the binary method.
If I were to point out the design flaw, it's the fact that `f(...xs)` may choose to do `reduce`, `foldl` or "`foreach`" depending on the type of `f`. But that is already the case for splatting, so it does not introduce any further ambiguity. I'd even say abstract types like `AssociativeOperator` _decrease_ ambiguity, because you can know that a splatting call `op(xs...)` of an `AssociativeOperator` would do `reduce` (again, unless `op` defines an "evil overload").
My mental model is "prefix `...` should act like splatting when the argument is a tuple."
The thing I find confusing about this is that it supposes the splatted version exists and has the meaning of a reduction. This is fine for `+`, but less obvious for generic binary operators.
I was just explaining an easy mnemonic. It perhaps is better to formalize `f.` as a lifted function `bclift(f)`, which is just a curried version of `broadcast` (ignoring laziness).
bclift(f) = (args...) -> broadcast(f, args...)
Then
f.(...xs) == bclift(f)(...xs) == reduce(bclift(f), xs)
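This equivalence is checkable today without any new syntax (a sketch; `bclift` as defined above):

```julia
bclift(f) = (args...) -> broadcast(f, args...)
xs = ([1, 2], [3, 4], [5, 6])
reduce(bclift(+), xs)  # == [9, 12]; what `+.(...xs)` would mean
```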
An alternative meaning of prefix-`...` occurs to me which would make a certain kind of sense: `println(io, ...xs, " ", ...ys) # foreach((x,y)->println(io, x, " ", y), xs, ys)`
This is broadcasting, so it can be written as `print(io, ...string.(xs, " ", ys, "\n"))` in my proposal. As we already have dot-call, I think it's better to attach new semantics to new syntax.
in `StaticArrays` we're constantly fighting ambiguities in functions like `vcat`
Isn't this an argument in favor of the `AssociativeOperator` function type? There is no need to fight ambiguity because the vararg version would be derived from the binary method.
It's an argument in favor of having some systematic way to treat reductions. `AssociativeOperator` is one way to do that, and it should work out fine. If someone didn't see fit to declare their associative function `f` as a subtype of `AssociativeOperator`, we'd always have `f(...(a,b,c))` as a reasonable fallback.
f.(...xs) == bclift(f)(...xs) == reduce(bclift(f), xs)
Right, this makes sense, though I do find it confusing syntax. I think it's because the `...` is inside the brackets while the `.` is outside. The proposed lowering flips things around, with the reduction being outside and the mapping inside.
It's true that attaching the reduction to arguments does make a difference in what can be easily expressed. Roughly,
f.(...xs) == reduce((x,y)->map(f, x, y), xs)
# vs
(f...).(xs) == map(x->reduce(f, x), xs)
f.(...xs) == reduce((x,y)->map(f, x, y), xs) # vs (f...).(xs) == map(x->reduce(f, x), xs)
I didn't think of `(f...).(xs)` but it indeed sounds useful. I guess one way to formalize what you described is that a "modifier" (like `.` and `...`) closer to the function lifts it first. For example, if we had `f?` to lift functions over `Union{T, Missing}`, it would make sense to write `+?(...xs)` to mean
reduce((x, y) -> ismissing(x) || ismissing(y) ? missing : x + y, xs)
But under the "closer modifier lifts function first" rule, it wouldn't make sense to write (f...)?(xs)
(or it won't be super useful). So, I think moving ...
to the argument side makes sense.
I do find it confusing syntax.
I guess this is typical when you are composing multiple higher-order functions (e.g., transducer composition "feels" like it happens in the reverse order)? I think that's why a mnemonic based on splatting is important.
Now that I have used the word "lift", I think I may understand one of your points. Since the transformation of a two-argument function to a reduction is a kind of "lift", the syntax should act on the function, not the arguments. Is this your point?
But I think an important observation is that we need an "unlifting" operation like `&`/`Ref`, and it is far more common in reduction than in broadcasting. In particular, you always need to unlift the first argument for `push!` and `write`. It may be possible to work around this for some functions by saying that the first argument is automatically unlifted. However, doing so with "nearly associative" binary functions like `append!`/`merge!`/`union!`/`intersect!` would lose the ability to use them on a single container (of containers) like `Vector{<:Vector}`. So, I think expressing lifting with respect to arguments is essential.
I took the liberty of renaming this topic again, as I think it's been a very interesting exploration of reduction syntax even though the option of overriding existing splatting syntax was quickly discarded.
Since the transformation of two-argument function to a reduction is a some kind of "lift", the syntax should act on the function, not the arguments. Is this your point?
Yes, I think that's accurate. I just noticed today that this is the model used by APL, where the `/` operator lifts an arbitrary function into a reduction (for example, in APL `+/` is `sum`). See https://www.dyalog.com/uploads/documents/MasteringDyalogAPL.pdf for example. It seems that APL doesn't express the "multiple arguments to be reduced" case directly because its functions only take up to two arguments.
I'll have to think about unlifting some more. I'm still a bit confused about how precisely this interacts with the `init` argument to `foldl` when the unlifted argument can occur in any position.
Thanks, I guess the updated title reflects better what is going on here.
the model used by APL where the `/` operator lifts an arbitrary function into a reduction
FYI, there is a table of foldl/foldr syntaxes of various languages in https://en.wikipedia.org/wiki/Fold_(higher-order_function)#In_various_languages — J and Scala may be interesting in that they support an initial value with APL-like syntax. I think what I suggest is closest to C++.
Reading the table, I realized that another axis of discussion is whether or not it makes sense to support the reduce/foldl/foldr distinction at the syntax level. Languages like J, Scala and C++ seem to support it. I think it makes sense to let the binary function implementers handle it (as in the OP), since what should happen is most of the time clear from the function and argument types.
I'm still a bit confused about how precisely this interacts with the `init` argument to `foldl` when the unlifted argument can occur in any position.
Just in case looking at an implementation can help, here is what I'd suggest for non-associative binary functions:
const FoldableFunction = Union{
    typeof(/),
    typeof(-),
    typeof(intersect!),
    typeof(union!),
    typeof(merge!),
    typeof(append!),
    typeof(push!),
}
struct ReduceMe{T} # need to find a better name :)
    value::T
end
function apply(op::FoldableFunction, x, xs...)
    init = x isa ReduceMe ? foldl(op, x.value) : x
    return foldl(xs; init=init) do acc, input
        if input isa ReduceMe
            foldl(op, input.value, init=acc)
        else
            op(acc, input)
        end
    end
end
using Test
# push!(...[[], 1, 2])
@test apply(push!, ReduceMe([[], 1, 2])) == [1, 2]
# push!([], ...[1, 2])
@test apply(push!, [], ReduceMe([1, 2])) == [1, 2]
# push!([], ...[1, 2], 3, ...[4])
@test apply(push!, [], ReduceMe([1, 2]), 3, ReduceMe([4])) == [1, 2, 3, 4]
I am not super sure if the first argument should be allowed to be a "`ReduceMe`", but it is handy for "nearly symmetric" reducing functions:
# intersect!(...[[1, 2], [1], [0, 1]])
@test apply(intersect!, ReduceMe([[1, 2], [1], [0, 1]])) == [1]
An interesting data point is `something`. It's "almost" an associative binary operator (the "find-first" monoid), but due to its eagerly throwing nature we _don't_ have
something(xs...) === reduce(something, xs)
Instead, a more useful definition of `something(...xs)` might be
firstsomething(x, y) = x isa Nothing ? y : x
something(reduce(firstsomething, xs; init=nothing))
# something(reduce(right, ReduceIf(!isnothing), xs)) # using Transducers
Here is an example where using `foldl` for splatting becomes weird:
julia> xs = (:Base, :CoreLogging, :Info);
julia> foldl(getproperty, xs, init=Main)
Info
This code snippet is highly useful, but `getproperty(Main, xs...)` is not defined. If we rely on the analogy, then maybe users would find it strange that `getproperty(Main, ...xs)` works.
Very interesting example. Again it makes me wonder if we should be looking to delete methods which have varargs versions which are implicitly reductions :-)
I think it's OK to get rid of vararg definitions as long as the "lifting" occurs on a per-argument basis rather than on the whole function (i.e., `f(x, ...y)` and not `f...(&x, y)`).
I have been thinking a bit about reductions over lazy collections. I've got a different tangent to suggest.
We have these awesome generators which can do `map` and `filter`, like
julia> (x^2 for x in 1:10 if iseven(x))
Base.Generator{Base.Iterators.Filter{typeof(iseven),UnitRange{Int64}},getfield(Main, Symbol("##41#42"))}(getfield(Main, Symbol("##41#42"))(), Base.Iterators.Filter{typeof(iseven),UnitRange{Int64}}(iseven, 1:10))
julia> collect(x^2 for x in 1:10 if iseven(x))
5-element Array{Int64,1}:
4
16
36
64
100
So we have these beautiful almost-English sentences describing in a rather declaritive way an iterator (I kind of think of it as a "SQL for iterators"). However, one operation I often want to do is reductions. Take this example:
julia> reduce(+, x^2 for x in 1:10 if iseven(x))
220
So what if I could use a "generator-style" syntax for reductions as well? I'm going to suggest the syntax/operator `over` for this, but there are probably better ideas:
julia> + over (x^2 for x in 1:10 if iseven(x)) # Note: this doesn't actually work
220
The thing on the right of `over` is just a standard filtered generator. We should be able to `+ over` anything:
julia> + over [4, 16, 36, 64, 100]
220
Therefore `over` becomes a binary operator for a function which basically goes like:
over(f, x) = reduce(f, x)
If you like... we could simply make `reduce` a binary operator.
julia> + reduce [4, 16, 36, 64, 100]
220
:)
I'm not sure how much `+ over xs` is better than `over(+, xs)`. As noted by Dijkstra, I think infix operators make sense primarily when the operations are associative. (Although I do write `x in xs` when I could write `in(x, xs)`...)
But the connection to comprehension expressions is interesting. It makes me think that my emphasis on "lifting per argument" is yet another example of non-orthogonal features. The observation is that what I've been proposing with `f(a, ...b, ...c, d)` does two things: lifting `f` to `foldl(f, _)` and flattening the arguments. So, it can be expressed as
foldl(f, (x for y in (&a, b, c, &d) for x in y))
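This desugaring is already expressible today, with one-element tuples standing in for the hypothetical `&x`:

```julia
# f(a, ...b, ...c, d) with f = push!, a = [], d = 4:
foldl(push!, (x for y in (([],), (1, 2), (3,), (4,)) for x in y))  # == [1, 2, 3, 4]
```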
if `&x` means `Ref(x)`. Can we have a syntax for writing lazy concatenation? Maybe `.[xs; ys; zs]` for `Iterators.Flatten((xs, ys, zs))`? Then, `f(a, ...b, ...c, d)` would be written as
f...(.[&a; b; c; &d])
or maybe we can have a shorthand for this:
f.[&a; b; c; &d]
It is nice because we have syntax for orthogonal features:
- `&x` for a singleton container
- `.[xs; ys; zs]` for lazy concatenation
- `f...(xs)` for lifting to `foldl(f, xs)` etc.
etc.Alternatively, f...(xs)
can be written as f.[xs;]
. A bit ugly part is that f.[xs]
would be somewhat ambiguous since [xs]
is not [xs;]
.
Having said that, we can also treat `(...xs, ...ys, ...zs)` as lazy concatenation and say that `f(...xs, ...ys, ...zs)` is a shorthand for `f...((...xs, ...ys, ...zs))`.
I think infix operators make sense primary when the operations are associative
Sure, but the generator/comprehension thing seems well-received (in more languages than Julia), and the `for`, `in` and `if` keywords are not associative operators (and I suspect the `over` thing may need to be keyword syntax rather than a simple binary operator anyway, which incidentally would allow it to interact with broadcasting `.`s).
I just realized that `over` is "implementable" already:
julia> const ^ᵒᵛᵉʳ = reduce;
julia> vcat ^ᵒᵛᵉʳ (1:x for x in 1:3)
6-element Array{Int64,1}:
1
1
2
1
2
3
julia> (+) ^ᵒᵛᵉʳ (x^2 for x in 1:3) # + has to be in ()
14
Yeah, so it makes sense to make it a keyword if we go with this direction, to get rid of the extra parenthesis.
I still think `over` has some disadvantages, though. For example, it's hard to cover `foldl(f, xs; init)` with `over`, as you need to handle three arguments (e.g., `foldl(push!, (1, 2, 3); init=[])`).
But one advantage might be that `+ over xs` is the least scary syntax so far? One of my main motivations was to make `reduce` less scary and "easy" to use. That's why I've been trying to make an interface that hooks into people's intuition about splatting. But maybe it'd be more powerful to use English-like syntax if that's the primary goal?
Why does `over` need to be special syntax?
julia> struct Over{F}
           f::F
       end

julia> (o::Over)(args...; kwargs...) = reduce(o.f, args...; kwargs...)

julia> Over(push!)((1,2,3); init=[])
3-element Array{Any,1}:
 1
 2
 3
I think that's the same as asking why `[f(x) for x in xs]` is more "user friendly" than `map(f, xs)`. My impression is that special syntax helps beginners understand operations like this more than higher-order functions do.
I think julia has so much special syntax at this point that almost any new syntax we add that's not just a slight generalization of existing syntax (e.g. `a, b... = (1,2,3)`) would be user-unfriendly to add. There's a huge number of special forms to learn already.
Furthermore, I really don't see how `f over x` is any more clear or easy to understand than `Over(f)(x)`. This is in stark contrast to `[f(x) for x in xs]`, which is highly suggestive, unlike `map`.
`reduce` is the most important building block of data-parallel computation. If we can help new users learn `reduce` (without them noticing it) by adding a syntax, I think it is worth the complexity.
a slight generalization of existing syntax (e.g. `a, b... = (1,2,3)`)
`f(...xs)` is a "generalization" of `f(xs...)`.
`f(...xs)` is a "generalization" of `f(xs...)`.
Personally, I like the idea of `f(...xs)`, but practically, I can't help but wonder: if we took a survey of 100 random julia users who haven't seen this thread and asked them "what would you expect `f(...xs)` to do?", I think that you'd be lucky if 5 of them guessed correctly. From that pov, adding a syntax like that is just another step along the road to becoming perl.
The most important thing I care about at this point in this discussion is to make `reduce`/`foldl` as easy to use, understand, and teach as possible. I'm mostly sticking with `f(...xs)` now just because I think it's the most "intuitive" syntax for me.
For example, Fortress has the `\big` operator for turning a monoid into a reduction. I think it's an interesting approach, as you've seen `\sum` and `\prod` many times. It's probably not a good option for Julia, but it's an example showing that having "intuitive" syntax makes sense if you are serious about first-class data-parallel computing in your language.
Anyway, it's conceivable that "don't be Perl" is the best option possible at this point. I don't really know. But I don't think we have explored syntax enough and I think it's too early to say no more syntax.