Ref https://groups.google.com/d/msg/julia-users/0im3DGRNbG0/_oH0TZ1TAgAJ
julia> mapslices(x->tuple(copy(x)), [1 2; 3 4], 1)
1×2 Array{Tuple{Array{Int64,1}},2}:
([1,3],) ([2,4],)
julia> mapslices(x->tuple(x), [1 2; 3 4], 1)
1×2 Array{Tuple{Array{Int64,1}},2}:
([2,4],) ([2,4],)
Is there any alternative to going back to allocating a new array for each slice, even when it shouldn't be necessary?
Maybe we should use views instead. The issue with median
along a dimension was a problem with the median
implementation. It's doesn't seem right to alias all the outputs of mapslices
.
The issue with median
was not really a problem with the median
implementation. It was a problem with changing the behavior of mapslices
when existing code expects a specific behavior, which is also the problem here.
The optimal approach depends on what is being done with the slice:
mapslices
needs to pass a new array each time.mapslices
is applied along the first dimension of an Array or the function makes no more than a few traversals through the input, views are optimal. (But for most algorithms that make O(n)
memory accesses, you can probably do better than the best case for mapslices
if you traverse the array in order.)mapslices
is applied along any dimension but the first and the function traverses the input many times, reusing the same array is optimal.I think this is a good argument for finding a way to let the user choose any of the three behaviors. Which should be the default depends on the expected use case.
The default should presumably be the one that is always correct. AFAICT, that means creating a new array for each slice.
That's the correct behavior in the sense that it's the old behavior, so I guess it should be what happens when you call mapslices
, but I can't help but feel like that's giving up a lot of performance for some pretty marginal use cases.
I took a brief look at how mapslices
is actually used in public code. In most cases, there is no reason for people to be using mapslices
at all, because they are calling things like sum
, maximum
, fft
, diff
, var
, or std
that can accept dimension arguments, or reverse
which is called flipdim
when applied to a dimension of an ND array. Beyond that, common cases are indmax
, norm
, quantile
, and sortperm
. I actually didn't see any code in the first 15 pages of results that would behave differently under any of the three approaches above.
One idea is to deprecate mapslices
and have functions slices
, views
, and sliced
that return generators corresponding to the three behaviors above. Then we could replace mapslices(f, X, dims)
with map(f, slices(X, dims))
etc. But maybe this is more functions than we really want.
I rather like slices
, views
, and sliced
with map(f, slices(X, dims))
, as I find it very clear what each function is being applied to.
On second thought, my proposal of composing map
and slice/view iterators will not really work here, since mapslices
also allows functions to return AbstractArrays and will combine them into a new ND array, e.g. mapslices(x->[1, 2], randn(5, 5), 1)
is a 2 x 5 matrix.
Most helpful comment
I rather like
slices
,views
, andsliced
withmap(f, slices(X, dims))
, as I find it very clear what each function is being applied to.