If possible, I think we should make the `region` argument in the dimension reduction methods of `any` and `all` a keyword argument. `map` has an incredibly handy method for multiple iterators, e.g. `map(<, a, b)`. Currently `any(<, a, b)` will use the method `any(f, itr, region)`, but it would be great to be able to reclaim `any(<, a, b)` to do what it looks like it should do. That would require making `region` a keyword argument in 0.7 with a deprecation, and introducing the new method in 1.0.

The downside to this is that we have to come up with a default value for `region` in order to use it as a keyword argument, and when `region` isn't specified we want it to dispatch to the simpler, short-circuiting method that is not based on `mapreducedim`.
A similar argument could be made for other functions with `mapreducedim`-based methods, such as `sum(f, A, region)`, but I mostly care about `any`/`all` for now. (Edit: we're doin' 'em all.)
We could use `region=nothing` as a default.
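To make that concrete, here is a minimal sketch of the `nothing`-default dispatch. The names `myany`/`_myany` are hypothetical stand-ins, and the `mapreduce`-based branch is just illustrative, not the actual Base code:

```julia
# Hypothetical sketch: default the keyword to `nothing` and re-dispatch
# on its value, so the no-region case reaches the short-circuiting path.
myany(f, itr; region=nothing) = _myany(f, itr, region)

# No region given: a simple short-circuiting loop over all elements.
function _myany(f, itr, ::Nothing)
    for x in itr
        f(x) && return true
    end
    return false
end

# Region given: reduce over the requested dimensions.
_myany(f, A::AbstractArray, region) = mapreduce(f, |, A, dims=region)
```

With this, `myany(isodd, [1 2; 3 4])` short-circuits to `true`, while `myany(isodd, [1 2; 3 4], region=1)` reduces down each column.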
I'm fine with doing this, but then let's use keyword arguments for all similar functions (`sum`, etc.). Are we also sure `region` is the best term? It could also be e.g. `dim`/`dims`?
> Are we also sure `region` is the best term? It could also be e.g. `dim`/`dims`?

:100: "Region" to me means something like `A[m:n, m:n]`. `dims` would be fine. We should try to standardize on `dims` for "which dimensions" and `size` or `sizes` for "the sizes of dimensions".
Triage says :+1:. Should be applied to all relevant functions though. Issue of what name to use is still somewhat up in the air.

The steps are:

- `any` and `all`
TensorFlow uses `axis`, as another name candidate.
Methods which perform dimension reduction using this kind of API:

- `mapreducedim(f, op, A, region, v0)`
- `reducedim(op, A, region, v0)`
- `sum(A, dims)` / `sum(f, A, region)`
- `prod(A, dims)` / `prod(f, A, region)`
- `maximum(A, dims)` / `maximum(f, A, region)`
- `minimum(A, dims)` / `minimum(f, A, region)`
- `all(A, dims)` / `all(f, A, region)`
- `any(A, dims)` / `any(f, A, region)`

Interestingly, we seem to opt for the "region" terminology only for methods which map prior to reduction, and for those that don't we use "dims." (`reducedim` is the exception here; it uses "region" and does not map.)

To me, "dims" seems like the most obvious and immediately descriptive term, and it's consistent both with the names of the functions used underneath, i.e. `(map)reducedim`, as well as with our `Dims` type alias. With that in mind, here is what I would propose: for each reducing function `r` in the above list, deprecate the methods to `r(A; dims=Colon())` and `r(f, A; dims=Colon())`. Using `:` seems like the most natural way to request the current behavior when no dimension is specified, in my opinion.

I'm indifferent as to whether `(map)reducedim` should be folded into `(map)reduce` with `dims` and `v0` as keyword arguments, but that would seem reasonable to me.
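As a rough sketch of what that deprecation could look like (with `mysum` as a hypothetical stand-in for a reducing function `r`, delegating to today's `sum` for the actual work):

```julia
# Hypothetical stand-in for a reducing function `r`.
mysum(A; dims=Colon()) = _mysum(A, dims)
_mysum(A, ::Colon) = sum(A)             # no dims given: whole-array reduction
_mysum(A, dims)    = sum(A, dims=dims)  # dims given: dimension-wise reduction

# The old positional form then forwards to the keyword form with a warning.
Base.@deprecate mysum(A, region) mysum(A; dims=region)
```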
FWIW, NumPy also uses `axis` (see e.g. here). So if we decide to keep the `axes` function, we could as well adopt the axis terminology everywhere. (Though I also like `dims`.)
On the one hand, we now use `axes` to get a description of the axes/dimensions of an array, which suggests `axis`. On the other hand, we still have the following `dim` names:

- `flipdim`
- `ndims`
- `permutedims`
- `permutedims!`
- `reducedim`
- `slicedim`
- `mapreducedim`

So we're a bit internally split on this terminology. Are these reduction region arguments more similar to what `axes` tells you about or what `ndims`, etc. tells you about?
Given what `axes` does, for us an "axis" is something like `-10:10`, telling you that the index values for a dimension (see what I did there) range from -10 to 10. So I think `dims` is a good choice.
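Concretely (for ordinary 1-based arrays; an axis like `-10:10` would come from a package such as OffsetArrays):

```julia
A = zeros(2, 3)
axes(A)   # (Base.OneTo(2), Base.OneTo(3)): the index range of each dimension
ndims(A)  # 2: just how many dimensions there are
```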
I thought it was more or less decided here that we didn't want to support a variable number of iterators to `any`, `all`, etc. That being said, I'm all in favor of making `region` a keyword argument, and making the dim/region/axes nomenclature consistent. cc'ing @mbauman @Keno as they were the main proponents there.
I still think we should use a keyword argument for `any`/`all`, even if I don't particularly like the multi-arg meaning.
Well folks, we have a problem. I've implemented the change to reducing functions. Using `sum` as one example, I have `sum(x, dims=d)` set up to call `_sum(x, dims)`, which dispatches appropriately for the given dimensions (using `::Colon` for the fallback). The issue now is that the return type is no longer inferred to be concrete when a dimension is specified:

```julia
julia> @inferred(sum([1 2; 3 4], dims=1))
ERROR: return type Array{Int64,2} does not match inferred return type Union{Int64, Array{Int64,2}}
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] top-level scope
```

This may be because when a dimension is specified, it goes through the more general method `_sum(::Any, ::Any)` rather than `_sum(::Any, ::Colon)`, which in turn goes through a bunch of magic to determine whether the given dimensions are "dimension-like," rather than dispatching to some specific method, say `_sum(::Any, ::Vector{Int})`.

Any ideas? Is this still feasible? I can put up a WIP PR with my current implementation, if that would be helpful for diagnostics.
I'm still catching up with all the keyword and compiler improvements, but I think you can get around this by immediately re-dispatching to a positional argument:

```julia
julia> f(;x=nothing) = g(x)
       g(::Nothing) = 1
       g(x) = Array{Int}(uninitialized, x)

julia> h() = f(x=1)
h (generic function with 1 method)

julia> @code_warntype h()
…
end::Array{Int64,1}

julia> h(y) = f(x=y)
h (generic function with 2 methods)

julia> @code_warntype h(1)
…
end::Array{Int64,1}

julia> @code_warntype h((1,2))
…
end::Array{Int64,2}
```

It's just a little deceiving since `@code_warntype f(x=1)` shows red flags, but that's just because `code_warntype` (I think) emulates calls from a top-level scope. This kind of thing will typically be in a compiled function, at which point inlining and IPO can do their thing.
Right, that is what I'm doing: `sum(x, dims=d)` calls the positional method `_sum(x, d)`, but I'm still getting

```julia
julia> using Test

julia> X = [1 2; 3 4];

julia> @inferred Base._sum(X, 1)
1×2 Array{Int64,2}:
 4  6

julia> @inferred Base._sum(X, :)
10

julia> @inferred sum(X) # calls _sum(X, :)
10

julia> @inferred sum(X, dims=1) # calls _sum(X, 1)
ERROR: return type Array{Int64,2} does not match inferred return type Union{Int64, Array{Int64,2}}
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] top-level scope
```
This'll only work if `_sum` inlines into `sum(;dims=)`. I think you'll have to have a second intermediary function:

```julia
sum(…, ;dims=…) = _sum(…, dims)
@inline _sum(…, dims) = __sum(…, dims)
__sum(…, dims) = # real work
```

(edit: ~~or just inline your `_sum` as it is, but I'm guessing that it has multiple methods~~ Edit: this is wrong.)
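A runnable version of that sketch, with hypothetical names; the assumption is that the middle layer is small enough to inline into the keyword sorter:

```julia
mysum(A; dims=Colon()) = _mysum(A, dims)
@inline _mysum(A, dims) = __mysum(A, dims)  # tiny intermediary; inlines into the kwarg sorter
__mysum(A, ::Colon) = sum(A)                # real work: whole-array case
__mysum(A, dims)    = sum(A, dims=dims)     # real work: dimension-wise case
```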
Does `@inferred (x -> sum(x, dims=1))(X)` work?

EDIT: Ah, that's what #25835 is about. But is that the problem here or does that construct still fail?
It does indeed solve the problem.

```julia
julia> using Test

julia> X = [1 2; 3 4];

julia> @inferred (x->sum(x, dims=1))(X)
1×2 Array{Int64,2}:
 4  6

julia> @inferred sum(X, dims=1)
ERROR: return type Array{Int64,2} does not match inferred return type AbstractArray
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] top-level scope
```

(Though interestingly, the inferred return type is no longer a `Union` but rather an abstract type...)
Having been going through and implementing this, I'm actually not so sure I like it. It could be that I'm just used to writing things like `sum(X, 1)`, but that does have a nice symmetry with `size(X, 1)`, and I think `size(X, dims=1)` to maintain symmetry would be kind of unfortunate.

Really multi-arg `any` and `all` was the reason I wanted this, and if we aren't going to do that then I'm not so sure I see an overall benefit here. Is anyone a strong supporter of this change?
I think the issue is that in `sum(X, 1)` it's not clear what the `1` refers to. `sum` or `sort` might support other kinds of arguments; for example `sort(X, order)` was discussed today.
I agree that it does clarify the intent somewhat, but has there been any confusion about it in the past? Regarding "might support other kinds of arguments," isn't that just taken care of by multiple methods and dispatch?
Not necessarily; it's not ideal to re-use the same argument slot for a totally different meaning.
Losing the ability to concretely infer the return type for these pretty essential functions in global scope is pretty unfortunate. Though perhaps there's a way around that that I don't know about.
It should be fairly easy to get that to infer; I believe all we need is a constant `Bool` from `haskey`.

EDIT: This already works.
What if instead of deprecating a positional dimension argument to a keyword argument, we go all-in on `mapslices`? That is, instead of `sum(X, 1)` -> `sum(X, dims=1)`, we do `mapslices(sum, X, 1)`. Then we have a single convenient function for this kind of reduction that can apply to any function rather than going through and whitelisting all of them. I quite like that consolidation.
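For a sense of the equivalence (written with the `dims` keyword that `mapslices` takes in current Julia):

```julia
X = [1 2; 3 4]
mapslices(sum, X, dims=1)                    # 1×2 Matrix: [4 6]
mapslices(sum, X, dims=1) == sum(X, dims=1)  # true: same result via the general mechanism
```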
While I agree that would be a nice "orthogonalization" of APIs, I think this is far too common an operation on matrices to be made that much more clunky. Array reductions are kind of bread and butter when doing any numerical work.
Agreed --- it would be great for `mapslices` to be fast enough to do that, but getting it to that point is probably a significant project, and the resulting API is too verbose to fully replace the old one.
Last I looked, the one in JuliennedArrays is about as fast as it'll get (EDIT: short of being able to elide the `SubArray` creation or store the wrapper on the stack). But it will still lose compared to `sum(A, 2)` simply for reasons of cache-efficiency, unless we dispatch specifically on `sum` in `mapslices`: `sum(A, 2)` can compute its result while traversing `A` in cache-order, whereas `mapslices` does not.
Yes, I figured if we did go the `mapslices` route then we'd specialize on `::typeof(sum)` and others to use the existing implementations under the hood.
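Something along these lines, sketched with a hypothetical `mymapslices`:

```julia
# Generic path: fall back to the ordinary slice-by-slice application.
mymapslices(f, A; dims) = mapslices(f, A, dims=dims)

# Fast path: recognize `sum` and use the existing cache-friendly reduction.
mymapslices(::typeof(sum), A; dims) = sum(A, dims=dims)
```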
Re-marking for triage since, after partially implementing it, I no longer think we should do this if we aren't going to use multi-arg `any`/`all`. `sum(A, dims=1)` just seems annoying compared to `sum(A, 1)`.
Triage still mostly likes the keyword argument.
I had been very hesitant to support this, but I'm now fully on board. Two points crystallized my opinion — documenting them here for posterity:

1. What would `any(==, a, b)` do? (It errors — you likely wanted `any(a .== b)`, but that's very different.)
2. We could bring `max` and `maximum` back together in the future with this change.

If we are thinking of merging `max` and `maximum`, we may want to do some thinking about it now, since I believe we would have to fold `max` into `maximum` rather than the other way around, because of this behavior:
```julia
julia> max([1,2,3], [2,1,3])
3-element Array{Int64,1}:
 2
 2
 3
```
The behavior we would presumably want for a combined `max(args..., dims=dims)` function would be semantically to do broadcasting across all `args` and reduction over `dims` in the resulting broadcast value, although, of course, the implementation should be more efficient than actually constructing the broadcast result. This is compatible with `maximum` but not with `max`, which is why it requires consideration, since `max` is the name we probably want to keep.
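In other words, the reference semantics would be something like the following naive definition (hypothetical `combinedmax`; a real implementation would avoid materializing the broadcast):

```julia
# Broadcast `max` across the arguments, then reduce over `dims`.
combinedmax(args...; dims=Colon()) = maximum(max.(args...), dims=dims)

combinedmax([1,2,3], [2,1,3])    # 3: elementwise max, then reduced over everything
combinedmax([1 2; 3 4], dims=1)  # 1×2 Matrix: [3 4]
```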
I agree that having both the names `max` and `maximum` around is not ideal, since it's not obvious what the difference is. However, I feel they should remain separate functions, since they're just not the same operation. It would be nice if we could find a better name for `maximum`, but we can't simply combine them any more than we can combine `+` and `sum`.
Would this apply also to methods that shift data along some dimensions, rather than reducing it? For example, is the plan to have:

```julia
circshift(v, 1, dims = 2) == circshift(v, (0, 1)) # if v is a 2-dimensional array
```

Or are the two issues unrelated, so `circshift` will stay as is?
I think they're unrelated, since `circshift` accepts the amount to shift in each dimension, not a list of which dimensions to operate on.
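To illustrate the distinction:

```julia
v = [1 2 3; 4 5 6]
circshift(v, (0, 1))  # one shift amount per dimension, not a dimension selector
# 2×3 Matrix:
#  3  1  2
#  6  4  5
```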