Julia: array reductions (sum, mean, etc.) and dropping dimensions

Created on 27 May 2016  Â·  48Comments  Â·  Source: JuliaLang/julia

Following this discussion on julia-users: https://groups.google.com/forum/#!topic/julia-users/V-meqUZtb7k

I find the fact that sum(A,1) and A[1,:] have different shapes in 0.5 counter intuitive. I am not aware of other languages for which the equivalent expressions have different shapes. It leads to unexpected results when computing e.g. A[1,:] ./ sum(A,1). At least in the case of sum, it seems like sum(A,1) should be equivalent to A[1,:] + A[2,:] + A[3,:] + ...

However, as @timholy pointed out, this behavior can be useful for broadcasting with expressions like A ./ sum(A,1).

For me, having the default slicing and array reduction behavior be consistent seems the natural choice, meaning that sum(A,1) should drop the first dimension. But I understand the attractiveness of not dropping dimensions for broadcasting. What are others' thoughts?

arrays

Most helpful comment

Back when we were brainstorming this, it was before dims was a keyword argument. Now it seems much more obvious to me that a nice solution here could be the use of a separate and parallel squeeze keyword argument. That is:

sum(A, dims=1) -> returns a 1xNxPx… object
sum(A, squeeze=1) -> returns a NxPx… object
sum(A, dims=1, squeeze=2) -> error for simplicity's sake

My solution in #23500 never really sat well with me, so that's why I haven't pushed on it. This is — at least in this brainstorming stage — much more compelling to me.

(Via https://github.com/JuliaLang/julia/pull/27608#issuecomment-398084051, brought up there due to the possibility of a surface syntax that'd ease these woes)

All 48 comments

I agree with you. Perhaps the default behavior could be to drop (for consistency with A[1,:]) and have a keyword argument on reduction functions to keep dims.

Having a keyword argument affect the type (i.e. number of dimensions) of the result would make this function type unstable.

The way to keep the dimensions is to use a range, i.e. A[1:1,:].

I am aware that A[1:1,:] does not drop dimensions, but it seems clear that the "default" behavior now is dropping dimensions APL-style with A[1,:], and, intuitively, it seems that the "default" behavior of sum should be compatible.

My comment was a reply to @BobPortmann's comment, not to the general issue.

I think this was settled in https://github.com/JuliaLang/julia/pull/13612

Edit: "settled" is a bit too strong

I am aware of the way A[1:1,:] vs A[1,:] works and think it is nice. My comment was about the reduction functions, which can be seen as inconsistent to slicing with a scalar. There was much discussion in #3262 but that was before the scalar dimensions dropping occurred. I thought perhaps that sentiment may have changed since then but it appears not.

18271 is the strongest argument I've seen yet in favor of dropping dimensions from reductions. It's a pity we don't have an analog of A[1:1, :] to indicate that a dimension should be kept. What if we introduced NoDrop, and used mean(A, NoDrop(1)) and mean(A, NoDrop(1:2))?

Alternatively we could switch the reduction functions so that they _required_ a boolean tuple for dims, i.e., mean(A, (true,false)) and mean(A, (true,true)) rather than mean(A, 1) and mean(A, 1:2). (true= "do reduce over the dimension in this slot".) Then we could also specify that the dimension be retained with mean(A, (true:true,false)) and mean(A, (true:true, true:true)), a direct analogy to the getindex syntax. I'm a little less excited about this variant, though.

This seems not-fully-decided, so moving to 1.0.

+1 for making reductions and getindex consistent.

FWIW, just got bitten by this: I was expecting something like sum(A.*x',2) to be the same type as A*x, and it wasn't.

Half-baked idea prompted by https://github.com/JuliaLang/julia/pull/22000#issuecomment-304445583: we could pass reduction functions through the indexing syntax in order to specify which dimensions get reduced. E.g., b[sum, :, sum] == sum(b, (1,3)). This would intuitively drop dimensions.

That's an appealing approach, which reminds me of the X_{i+} notation for denoting column sums.

Huh, that's a pretty cool syntax idea.

i guess it's a dumb question, but why don't we differ the behaviour in the same way as we do for array indexing? sum(A, 3) will drop while sum(A, 3:3) won't drop the dimension

Edit: rethinking my idea showed me why: for indexing colon style is used per dimension while for sum the colon style is used across dimensions

Edit2: https://github.com/JuliaLang/julia/issues/16606#issuecomment-304924412 sounds indeed very clever

I have proposed, elsewhere squeeze(f, A, n) = squeeze(f(A, n), n) so that you can write squeeze(sum, A, 2) instead of squeeze(sum(A, 2), 2).

Since we're unlikely to change this in 1.0 and adding squeeze is non-breaking, this is no longer release-blocking.

Closed by #23500.

Oops, I thought that had been merged already.

I frequently find doing

squeeze(mapslices(f, A, dims), dims)

ie with dims repeated, eg for fs which return a scalar. I think is (loosely) related to this issue, so I am commenting here, but I can open a new one if necessary. I have no good suggestion for the syntax.

The suggestion I've made a few times for this is squeeze(f, A, dims); this was implemented by @mbauman in https://github.com/JuliaLang/julia/pull/23500 but then stalled out on some API concerns that I confess I'm not quite groking.

Removing from the milestone since the suggested squeeze methods can be added at any time.

Back when we were brainstorming this, it was before dims was a keyword argument. Now it seems much more obvious to me that a nice solution here could be the use of a separate and parallel squeeze keyword argument. That is:

sum(A, dims=1) -> returns a 1xNxPx… object
sum(A, squeeze=1) -> returns a NxPx… object
sum(A, dims=1, squeeze=2) -> error for simplicity's sake

My solution in #23500 never really sat well with me, so that's why I haven't pushed on it. This is — at least in this brainstorming stage — much more compelling to me.

(Via https://github.com/JuliaLang/julia/pull/27608#issuecomment-398084051, brought up there due to the possibility of a surface syntax that'd ease these woes)

Just to clarify, the squeeze = ... keyword argument would be equivalent to calling squeeze on the result, just with the indices remapped accordingly (depending on dim?)

Or squeeze itself would affect the reduction?

+1 to please provide some keyword argument for sum to not return an n-dimensional array if it isn't wanted so that the ugly syntax sum(..)[:] can be avoided.

@aterenin: I recommend not to use sum(..)[:] which will create a copy but rather use vec(sum(..)). I think this is clean syntax and don't think using a keyword argument makes this cleaner.

Thanks for the tip! That syntax is definitely cleaner by Julia standards, though sum(..) |> vec is perhaps even cleaner because it avoids contributing to parentheses hell. Hopefully better pipes get added in one day as the language develops.

Now we have dims as a fairly ubiquitous keyword, and users often write code with repeated dims (e.g. https://github.com/JuliaLang/julia/issues/16606#issuecomment-348700496), is there any reason not to have e.g. sum(X, dims=i, dropdims=true) as nicer syntax for dropdims(sum(X, dims=i), dims=i)?

I had proposed dropdims(sum, X, dims=i) a while ago, which seems even cleaner to me.

I had proposed dropdims(sum, X, dims=i) a while ago, which seems even cleaner to me.

I don't feel strongly about one syntax over the other :)

Overall I think I slightly prefer sum(X, drop=true, kwargs...) to dropdims(sum, X, kwargs...) for a couple reasons

  • we may want the sum of some function f: sum(f, X, kwargs...) - not sure how that'd work in the dropdims(sum, X, ...) case?
  • to my eye, it best to have "the main thing you want" be the function call, i.e. the sum (mean, std, etc.) not the dropdims e.g. XÌ„ = mean(X, kwargs...) reads nicer _to me_ than XÌ„ = dropdims(mean, X, kwargs...)

on the other hand, it does add yet another keyword, and e.g. mean(f, X) was certainly the right choice (in my opinion) over something like mean(X, transform=f)

There's a big implementation difference:

  • sum(X, dims=i, drop=true) requires adding a drop keyword to every single reducer.
  • dropdims(sum, X, dims=i) is one completely generic method

The implementation of my proposal is this method:

dropdims(f::Callable, args...; dims) = dropdims(f(args..., dims=dims), dims=dims)
julia> A = rand(0:9, 3, 4)
3×4 Array{Int64,2}:
 3  2  6  8
 6  1  4  2
 3  0  7  6

julia> dropdims(sum, A, dims=1)
4-element Array{Int64,1}:
 12
  3
 17
 16

julia> dropdims(maximum, A, dims=2)
3-element Array{Int64,1}:
 8
 6
 7

If you want to support keywords, then you can do this:

dropdims(f::Callable, args...; dims, kwargs...) =
    dropdims(f(args...; kwargs..., dims=dims), dims=dims)

Note that this method will work on all reducers everywhere. If we want all reducers to support a drop keyword, that will require everyone who has ever implemented a reducer to add support for it. It also makes reducers type unstable since the return type depends on a keyword argument.

Nice! And just for completeness (answering my own question above), the dropdims version of e.g. sum(abs2, A, dims=1) would be

julia> dropdims(sum, abs2, A, dims=1)
4-element Array{Int64,1}:
  54
   5
 101
 104

I agree that it looks a bit stranger to write, but I think this approach is the most composable one I've seen for this particular functionality.

Problem with dropdims(sum, abs2, A, dims=1)
is it removes my ability to pass a do-block to the sum

eg I can't write:

dropdims(sum, A, dims=1) do x
    if x>0
        log(x)
    else
        0
    end
end

An option I don't love would be a macro.

@dropdims sum(abs2, A, dims=1) which would become

_dims=1
dropdims(sum(abs2, A, dims=_dims), dims=_dims)
end

The original argument against the kwarg is moot now, since that will constant-fold out now days.

The argument against the keyword form is not that keyword arguments are slow (when that applied, dims also wasn't a keyword, so I don't think that was every the problem), it's that you have to add that keyword to every single reducer everywhere. So we can do this in Base but any custom reducers out there will not have support for the same pattern. It's just weird to have to implement the same completely identical functionality over and over again, even though there's a completely composable solution that requires no code changes at all, just a new function.

Also, no one has addressed the type stability argument: changing return type based on a keyword argument is not generally a good idea.

Yeah, I think the arguments for dropdims(f, args...; dims, kwargs...) are pretty good. I think @oxinabox was just being greedy (which is good!), and pointing out that we can currently use a do-block to sum

sum(A; dims=1) do x
    x > 0 ? log(x) : x
end 

and in some ways in would be nice to be able to write (something like)

@dropdims sum(A; dims=1) do x
    x > 0 ? log(x) : x
end 

rather than e.g.

dropdims(sum, x -> x > 0 ? log(x) : x, A, dims=1)

or

g(x) = x > 0 ? log(x) : x
dropdims(sum, g, A, dims=1)

But all are an approvement over the current

dropdims(
    sum(X; dims=1) do x
        x > 0 ? log(x) : x
    end,
    dims=1
)

In #32860 we are discussing possible syntaxes for reduce/foldl/foreach. One idea for supporting multidimensional reduce is to add a syntax (say)

a[., ., :]

which is lowered to

Axed{(1, 2)}(a)

where

struct Axed{dims, T}
    x::T
end

(probably it should be Axed <: AbstractArray)

This way,

dropdims(sum(a[., ., :]))

can implement a type-stable version of

dropdims(sum(a; dims=(1, 2)); dims=(1, 2))

The argument that adding drop=true to all methods would be a pain also applies to things like sum(A, squeeze=1) from https://github.com/JuliaLang/julia/issues/16606#issuecomment-398086850 , unfortunately.

Here's a quick attempt at a macro which allows sum(A; dims=2) do ... end, to try it out:

using MacroTools

macro dropdims(ex) # doesn't handle reduce(...; dims=..., init=...) etc.
    MacroTools.postwalk(ex) do x
        if @capture(x, red_(args__, dims=d_)) || @capture(x, red_(args__; dims=d_)) 
            :( dropdims($x; dims=$d) )
        elseif @capture(x, dropdims(red_(args__, dims=d1_); dims=d2_) do z_ body_ end) || 
               @capture(x, dropdims(red_(args__; dims=d1_); dims=d2_) do z_ body_ end)
            :( dropdims($red($z -> $body, $(args...); dims=$d1); dims=$d2) )
        else
            x
        end
    end |> esc
end

@dropdims begin
    a = sum(ones(3,7), dims=2)
    b = sum(10 .* randn(2,10); dims=2) do x                      
        trunc(Int, x)
    end
    (a isa Vector, b isa Vector) # (true, true)
end

Thinking about "dims syntax" https://github.com/JuliaLang/julia/issues/16606#issuecomment-520983042 a bit more, sum(a[., ., :]) should probably return a lazy object so that y .= sum(a[., ., :]) does not have to allocate anything. Note that copyto!(::Array{N}, ::Axed{dims, <:Array{M}}) where N === M - length(dims) can automatically drop dimensions.

If this a[., ., :] object is something like what eachslice makes, then I suspect reductions should always drop dimensions, like prod.(eachcol(x)) does now. Although more efficiently, without breaking y .= prod.(eachcol(x)). Notice that prod(eachcol(x)) is an error.

If this a[., ., :] object is something like what eachslice makes

I should have clarified that it is not the case. My model of Axed is just "an array paired with dims" and not an iterator-of-iterators. I'd even define simple forwarding methods like:

ndims(a::Axed) = ndims(a.x)
size(a::Axed) = size(a.x)
getindex(a::Axed, I...) = getindex(a.x, I...)

Axed acts as a type-stable way to mark dimensions that can be dropped when there is no ambiguity (i.e., N === M - length(dims)).

Ah, I didn't realise you were thinking of Axed as behaving just as label, while I was thinking of it being like Slices (views of the array). Both ways allow something like dropdims(sum(A, dims=(2,4)), dims=(2,4)) to be written more visually. Are you picturing also using it for other things, perhaps like this?

sum(p .* log.(p[:,.] ./ q)

Here log needs to act on the scalars of Axed, which under Slices you'd have to write more like sum((p .* log.(p ./ q))[:,.])

In the other thread I suggested that the Slices view might allow all sorts of generalised mapslices! operations to be done neatly:

@views b[:, .] .= f.(a[:, .])  # foreach((x,y) -> x .= f(y), eachcol(b), eachcol(a))
c[:, .] := f.(a[:, .])         # c = mapslices(f, a, dims=1)

I implemented a function ReduceDims.along to demonstrate the idea. It's somewhat inspired by bramtayl's JuliennedArrays.jl and @mcabbott's comments. The syntax is something like

sum(along(x, &, &, :))

which is equivalent to sum(x; dims=(1, 2)). [1] Here is a demo

julia> using ReduceDims

julia> x = ones(2, 3, 4);

julia> y = sum(along(x, &, :, :))  # lazy sum(x; dims=1)
mapreduce(identity, add_sum, along(::Array{Float64,3}, (1,)))

julia> copy(y) isa Array{Float64, 3}
true

julia> dropdims(y) isa Array{Float64, 2}  # type stable
true

julia> zeros(1, 3, 4) .= y;  # non-allocating

julia> zeros(3, 4) .= y;  # automatically drop dims

With #31020, you can also fuse it with broadcasting:

julia> using LazyArrays: @~  # need LazyArrays until #19198 is resolved

julia> y = sum(along(@~(x .+ 1), :, &, :))
mapreduce(identity, add_sum, along(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{3},Tuple{Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64}},typeof(+),Tuple{Array{Float64,3},Int64}}, (2,)))

julia> zeros(1, 3, 4) .= y;

[1] There is also + to indicate "repeat previous marker" (like regexp):

along(x::Array{<:Any, 2}, &, +, :) === along(x, &, :)
along(x::Array{<:Any, 3}, &, +, :) === along(x, &, &, :)
along(x::Array{<:Any, 4}, &, +, :) === along(x, &, &, &, :)
along(x::Array{<:Any, 2}, :, +, &) === along(x, :, &)
along(x::Array{<:Any, 3}, :, +, &) === along(x, :, :, &)
along(x::Array{<:Any, 4}, :, +, &) === along(x, :, :, :, &)

@mcabbott I think something like sum((p .* log.(p ./ q))[:,.]) should work with dims-as-label approach. In fact y .= sum(along(@~(p .* log.(p ./ q)), :, &)) should already work.

The reason behind dims-as-label approach is that this also support non-dropping API as well. IIUC, your nested approach would drop dimensions unless you explicitly mark the non-dropping axes in the destination (as in@views b[:, .] .= f.(a[:, .]))?

But I agree that iterator-of-iterators approach is quite useful especially because it lets you use functions that are not aware of dims. I think it's useful to have both while sharing the "axes specifier" syntax. For example, it could be implemented using syntax like eachslice(x, &, &, :).

Can we do something about this? Would be great if we can replace dropdims(sum(A; dims=1); dims=1) that I have all over my codebase with something less ugly.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

StefanKarpinski picture StefanKarpinski  Â·  3Comments

manor picture manor  Â·  3Comments

dpsanders picture dpsanders  Â·  3Comments

iamed2 picture iamed2  Â·  3Comments

tkoolen picture tkoolen  Â·  3Comments