If possible, I think we should make the `region` argument in the dimension reduction methods of `any` and `all` a keyword argument. `map` has an incredibly handy method for multiple iterators, e.g. `map(<, a, b)`. Currently `any(<, a, b)` will use the method `any(f, itr, region)`, but it would be great to be able to reclaim `any(<, a, b)` to do what it looks like it should do. That would require making `region` a keyword argument in 0.7 with a deprecation, and introducing the new method in 1.0.

The downside to this is that we have to come up with a default value for `region` in order to use it as a keyword argument, and when `region` isn't specified we want it to dispatch to the simpler, short-circuiting method that is not based on `mapreducedim`.
A similar argument could be made for other functions with `mapreducedim`-based methods, such as `sum(f, A, region)`, but I mostly care about `any`/`all` for now. (Edit: we're doin' 'em all.)
We could use `region=nothing` as a default.
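To make that concrete, here is a minimal sketch of the `nothing`-default dispatch. The names `myany`/`_myany` are hypothetical stand-ins, and the `mapreduce`-based branch is just illustrative, not the actual Base code:

```julia
# Hypothetical sketch: default the keyword to `nothing` and re-dispatch
# on its value, so the no-region case reaches the short-circuiting path.
myany(f, itr; region=nothing) = _myany(f, itr, region)

# No region given: a simple short-circuiting loop over all elements.
function _myany(f, itr, ::Nothing)
    for x in itr
        f(x) && return true
    end
    return false
end

# Region given: reduce over the requested dimensions.
_myany(f, A::AbstractArray, region) = mapreduce(f, |, A, dims=region)
```

With this, `myany(isodd, [1 2; 3 4])` short-circuits to `true`, while `myany(isodd, [1 2; 3 4], region=1)` reduces down each column.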
I'm fine with doing this, but then let's use keyword arguments for all similar functions (`sum`, etc.). Are we also sure `region` is the best term? It could also be e.g. `dim`/`dims`?
> Are we also sure `region` is the best term? It could also be e.g. `dim`/`dims`?

:100: "Region" to me means something like `A[m:n, m:n]`. `dims` would be fine. We should try to standardize on `dims` for "which dimensions" and `size` or `sizes` for "the sizes of dimensions".
Triage says :+1:. Should be applied to all relevant functions though. Issue of what name to use is still somewhat up in the air.

The steps are:

- `any` and `all`
TensorFlow uses `axis`, as another name candidate.
Methods which perform dimension reduction using this kind of API:

- `mapreducedim(f, op, A, region, v0)`
- `reducedim(op, A, region, v0)`
- `sum(A, dims)` / `sum(f, A, region)`
- `prod(A, dims)` / `prod(f, A, region)`
- `maximum(A, dims)` / `maximum(f, A, region)`
- `minimum(A, dims)` / `minimum(f, A, region)`
- `all(A, dims)` / `all(f, A, region)`
- `any(A, dims)` / `any(f, A, region)`

Interestingly, we seem to opt for the "region" terminology only for methods which map prior to reduction, and for those that don't we use "dims." (`reducedim` is the exception here; it uses "region" and does not map.)

To me, "dims" seems like the most obvious and immediately descriptive term, and it's consistent both with the names of the functions used underneath, i.e. `(map)reducedim`, as well as with our `Dims` type alias. With that in mind, here is what I would propose: for each reducing function `r` in the above list, deprecate the methods to `r(A; dims=Colon())` and `r(f, A; dims=Colon())`. Using `:` seems like the most natural way to request the current behavior when no dimension is specified, in my opinion.

I'm indifferent as to whether `(map)reducedim` should be folded into `(map)reduce` with `dims` and `v0` as keyword arguments, but that would seem reasonable to me.
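As a rough sketch of what that deprecation could look like (with `mysum` as a hypothetical stand-in for a reducing function `r`, delegating to today's `sum` for the actual work):

```julia
# Hypothetical stand-in for a reducing function `r`.
mysum(A; dims=Colon()) = _mysum(A, dims)
_mysum(A, ::Colon) = sum(A)             # no dims given: whole-array reduction
_mysum(A, dims)    = sum(A, dims=dims)  # dims given: dimension-wise reduction

# The old positional form then forwards to the keyword form with a warning.
Base.@deprecate mysum(A, region) mysum(A; dims=region)
```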
FWIW, NumPy also uses `axis` (see e.g. here). So if we decide to keep the `axes` function, we could as well adopt the axis terminology everywhere. (Though I also like `dims`.)
On the one hand, we now use `axes` to get a description of the axes/dimensions of an array, which suggests `axis`. On the other hand, we still have the following `dim` names:

- `flipdim`
- `ndims`
- `permutedims`
- `permutedims!`
- `reducedim`
- `slicedim`
- `mapreducedim`

So we're a bit internally split on this terminology. Are these reduction region arguments more similar to what `axes` tells you about or what `ndims`, etc. tells you about?
Given what `axes` does, for us an "axis" is something like `-10:10`, telling you that the index values for a dimension (see what I did there) range from -10 to 10. So I think `dims` is a good choice.
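Concretely (for ordinary 1-based arrays; an axis like `-10:10` would come from a package such as OffsetArrays):

```julia
A = zeros(2, 3)
axes(A)   # (Base.OneTo(2), Base.OneTo(3)): the index range of each dimension
ndims(A)  # 2: just how many dimensions there are
```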
I thought it was more or less decided here that we didn't want to support a variable number of iterators to `any`, `all`, etc. That being said, I'm all in favor of making `region` a keyword argument, and making the dim/region/axes nomenclature consistent. cc'ing @mbauman @Keno as they were the main proponents there.
I still think we should use a keyword argument for `any`/`all`, even if I don't particularly like the multi-arg meaning.
Well folks, we have a problem. I've implemented the change to reducing functions. Using `sum` as one example, I have `sum(x, dims=d)` set up to call `_sum(x, dims)`, which dispatches appropriately for the given dimensions (using `::Colon` for the fallback). The issue now is that the return type is no longer inferred to be concrete when a dimension is specified:

```julia
julia> @inferred(sum([1 2; 3 4], dims=1))
ERROR: return type Array{Int64,2} does not match inferred return type Union{Int64, Array{Int64,2}}
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] top-level scope
```

This may be because when a dimension is specified, it goes through the more general method `_sum(::Any, ::Any)` rather than `_sum(::Any, ::Colon)`, which in turn goes through a bunch of magic to determine whether the given dimensions are "dimension-like," rather than dispatching to some specific method, say `_sum(::Any, ::Vector{Int})`.

Any ideas? Is this still feasible? I can put up a WIP PR with my current implementation, if that would be helpful for diagnostics.
I'm still catching up with all the keyword and compiler improvements, but I think you can get around this by immediately re-dispatching to a positional argument:

```julia
julia> f(;x=nothing) = g(x)
       g(::Nothing) = 1
       g(x) = Array{Int}(uninitialized, x)

julia> h() = f(x=1)
h (generic function with 1 method)

julia> @code_warntype h()
…
end::Array{Int64,1}

julia> h(y) = f(x=y)
h (generic function with 2 methods)

julia> @code_warntype h(1)
…
end::Array{Int64,1}

julia> @code_warntype h((1,2))
…
end::Array{Int64,2}
```

It's just a little deceiving since `@code_warntype f(x=1)` shows red flags, but that's just because `code_warntype` (I think) emulates calls from a top-level scope. This kind of thing will typically be in a compiled function, at which point inlining and IPO can do their thing.
Right, that is what I'm doing: `sum(x, dims=d)` calls the positional method `_sum(x, d)`, but I'm still getting

```julia
julia> using Test

julia> X = [1 2; 3 4];

julia> @inferred Base._sum(X, 1)
1×2 Array{Int64,2}:
 4  6

julia> @inferred Base._sum(X, :)
10

julia> @inferred sum(X) # calls _sum(X, :)
10

julia> @inferred sum(X, dims=1) # calls _sum(X, 1)
ERROR: return type Array{Int64,2} does not match inferred return type Union{Int64, Array{Int64,2}}
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] top-level scope
```
This'll only work if `_sum` inlines into `sum(;dims=)`. I think you'll have to have a second intermediary function:

```julia
sum(…, ;dims=…) = _sum(…, dims)
@inline _sum(…, dims) = __sum(…, dims)
__sum(…, dims) = # real work
```

(edit: ~~or just inline your `_sum` as it is, but I'm guessing that it has multiple methods~~ Edit: this is wrong.)
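A runnable version of that sketch, with hypothetical names; the assumption is that the middle layer is small enough to inline into the keyword sorter:

```julia
mysum(A; dims=Colon()) = _mysum(A, dims)
@inline _mysum(A, dims) = __mysum(A, dims)  # tiny intermediary; inlines into the kwarg sorter
__mysum(A, ::Colon) = sum(A)                # real work: whole-array case
__mysum(A, dims)    = sum(A, dims=dims)     # real work: dimension-wise case
```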
Does `@inferred (x -> sum(x, dims=1))(X)` work?

EDIT: Ah, that's what #25835 is about. But is that the problem here or does that construct still fail?
It does indeed solve the problem.

```julia
julia> using Test

julia> X = [1 2; 3 4];

julia> @inferred (x->sum(x, dims=1))(X)
1×2 Array{Int64,2}:
 4  6

julia> @inferred sum(X, dims=1)
ERROR: return type Array{Int64,2} does not match inferred return type AbstractArray
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] top-level scope
```

(Though interestingly, the inferred return type is no longer a `Union` but rather an abstract type...)
Having been going through and implementing this, I'm actually not so sure I like it. It could be that I'm just used to writing things like `sum(X, 1)`, but that does have a nice symmetry with `size(X, 1)`, and I think `size(X, dims=1)` to maintain symmetry would be kind of unfortunate.

Really multi-arg `any` and `all` was the reason I wanted this, and if we aren't going to do that then I'm not so sure I see an overall benefit here. Is anyone a strong supporter of this change?
I think the issue is that in `sum(X, 1)` it's not clear what the `1` refers to. `sum` or `sort` might support other kinds of arguments; for example `sort(X, order)` was discussed today.
I agree that it does clarify the intent somewhat, but has there been any confusion about it in the past? Regarding "might support other kinds of arguments," isn't that just taken care of by multiple methods and dispatch?
Not necessarily; it's not ideal to re-use the same argument slot for a totally different meaning.
Losing the ability to concretely infer the return type for these pretty essential functions in global scope is pretty unfortunate. Though perhaps there's a way around that that I don't know about.
It should be fairly easy to get that to infer; I believe all we need is a constant `Bool` from `haskey`.

EDIT: This already works.
What if instead of deprecating a positional dimension argument to a keyword argument, we go all-in on `mapslices`? That is, instead of `sum(X, 1)` -> `sum(X, dims=1)`, we do `mapslices(sum, X, 1)`. Then we have a single convenient function for this kind of reduction that can apply to any function rather than going through and whitelisting all of them. I quite like that consolidation.
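For a sense of the equivalence (written with the `dims` keyword that `mapslices` takes in current Julia):

```julia
X = [1 2; 3 4]
mapslices(sum, X, dims=1)                    # 1×2 Matrix: [4 6]
mapslices(sum, X, dims=1) == sum(X, dims=1)  # true: same result via the general mechanism
```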
While I agree that would be a nice "orthogonalization" of APIs, I think this is far too common an operation on matrices to be made that much more clunky. Array reductions are kind of bread and butter when doing any numerical work.
Agreed --- it would be great for `mapslices` to be fast enough to do that, but getting it to that point is probably a significant project, and the resulting API is too verbose to fully replace the old one.
Last I looked, the one in JuliennedArrays is about as fast as it'll get (EDIT: short of being able to elide the `SubArray` creation or store the wrapper on the stack). But it will still lose compared to `sum(A, 2)` simply for reasons of cache-efficiency, unless we dispatch specifically on `sum` in `mapslices`: `sum(A, 2)` can compute its result while traversing `A` in cache-order, whereas `mapslices` does not.
Yes, I figured if we did go the `mapslices` route then we'd specialize on `::typeof(sum)` and others to use the existing implementations under the hood.
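Something along these lines, sketched with a hypothetical `mymapslices`:

```julia
# Generic path: fall back to the ordinary slice-by-slice application.
mymapslices(f, A; dims) = mapslices(f, A, dims=dims)

# Fast path: recognize `sum` and use the existing cache-friendly reduction.
mymapslices(::typeof(sum), A; dims) = sum(A, dims=dims)
```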
Re-marking for triage since, after partially implementing it, I no longer think we should do this if we aren't going to use multi-arg `any`/`all`. `sum(A, dims=1)` just seems annoying compared to `sum(A, 1)`.
Triage still mostly likes the keyword argument.
I had been very hesitant to support this, but I'm now fully on board. Two points crystallized my opinion — documenting them here for posterity:

1. What would `any(==, a, b)` do? (It errors — you likely wanted `any(a .== b)`, but that's very different.)
2. We could bring `max` and `maximum` back together in the future with this change.

If we are thinking of merging `max` and `maximum`, we may want to do some thinking about it now, since I believe we would have to fold `max` into `maximum` rather than the other way around, because of this behavior:
```julia
julia> max([1,2,3], [2,1,3])
3-element Array{Int64,1}:
 2
 2
 3
```
The behavior we would presumably want for a combined `max(args..., dims=dims)` function would be semantically to do broadcasting across all `args` and reduction over `dims` in the resulting broadcast value, although, of course, the implementation should be more efficient than actually constructing the broadcast result. This is compatible with `maximum` but not with `max`, which is why it requires consideration, since `max` is the name we probably want to keep.
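In other words, the reference semantics would be something like the following naive definition (hypothetical `combinedmax`; a real implementation would avoid materializing the broadcast):

```julia
# Broadcast `max` across the arguments, then reduce over `dims`.
combinedmax(args...; dims=Colon()) = maximum(max.(args...), dims=dims)

combinedmax([1,2,3], [2,1,3])    # 3: elementwise max, then reduced over everything
combinedmax([1 2; 3 4], dims=1)  # 1×2 Matrix: [3 4]
```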
I agree that having both the names `max` and `maximum` around is not ideal, since it's not obvious what the difference is. However, I feel they should remain separate functions, since they're just not the same operation. It would be nice if we could find a better name for `maximum`, but we can't simply combine them any more than we can combine `+` and `sum`.
Would this apply also to methods that shift data along some dimensions, rather than reducing it? For example, is the plan to have:

```julia
circshift(v, 1, dims = 2) == circshift(v, (0, 1)) # if v is a 2-dimensional array
```

Or are the two issues unrelated, so `circshift` will stay as is?
I think they're unrelated, since `circshift` accepts the amount to shift in each dimension, not a list of which dimensions to operate on.
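To illustrate the distinction:

```julia
v = [1 2 3; 4 5 6]
circshift(v, (0, 1))  # one shift amount per dimension, not a dimension selector
# 2×3 Matrix:
#  3  1  2
#  6  4  5
```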