Julia: Trouble from reusing temporary in mapslices

Created on 15 Sep 2016  ·  6Comments  ·  Source: JuliaLang/julia

Ref https://groups.google.com/d/msg/julia-users/0im3DGRNbG0/_oH0TZ1TAgAJ

julia> mapslices(x->tuple(copy(x)), [1 2; 3 4], 1)
1×2 Array{Tuple{Array{Int64,1}},2}:
 ([1,3],)  ([2,4],)

julia> mapslices(x->tuple(x), [1 2; 3 4], 1)
1×2 Array{Tuple{Array{Int64,1}},2}:
 ([2,4],)  ([2,4],)

Is there any alternative to going back to allocating a new array for each slice, even when it shouldn't be necessary?

design

Most helpful comment

I rather like slices, views, and sliced with map(f, slices(X, dims)), as I find it very clear what each function is being applied to.

All 6 comments

Maybe we should use views instead. The issue with median along a dimension was a problem with the median implementation. It's doesn't seem right to alias all the outputs of mapslices.

The issue with median was not really a problem with the median implementation. It was a problem with changing the behavior of mapslices when existing code expects a specific behavior, which is also the problem here.

The optimal approach depends on what is being done with the slice:

  • For functions that use the whole array in their output, mapslices needs to pass a new array each time.
  • For functions that do not mutate the input, when mapslices is applied along the first dimension of an Array or the function makes no more than a few traversals through the input, views are optimal. (But for most algorithms that make O(n) memory accesses, you can probably do better than the best case for mapslices if you traverse the array in order.)
  • For functions that may mutate the input, or when mapslices is applied along any dimension but the first and the function traverses the input many times, reusing the same array is optimal.

I think this is a good argument for finding a way to let the user choose any of the three behaviors. Which should be the default depends on the expected use case.

The default should presumably be the one that is always correct. AFAICT, that means creating a new array for each slice.

That's the correct behavior in the sense that it's the old behavior, so I guess it should be what happens when you call mapslices, but I can't help but feel like that's giving up a lot of performance for some pretty marginal use cases.

I took a brief look at how mapslices is actually used in public code. In most cases, there is no reason for people to be using mapslices at all, because they are calling things like sum, maximum, fft, diff, var, or std that can accept dimension arguments, or reverse which is called flipdim when applied to a dimension of an ND array. Beyond that, common cases are indmax, norm, quantile, and sortperm. I actually didn't see any code in the first 15 pages of results that would behave differently under any of the three approaches above.

One idea is to deprecate mapslices and have functions slices, views, and sliced that return generators corresponding to the three behaviors above. Then we could replace mapslices(f, X, dims) with map(f, slices(X, dims)) etc. But maybe this is more functions than we really want.

I rather like slices, views, and sliced with map(f, slices(X, dims)), as I find it very clear what each function is being applied to.

On second thought, my proposal of composing map and slice/view iterators will not really work here, since mapslices also allows functions to return AbstractArrays and will combine them into a new ND array, e.g. mapslices(x->[1, 2], randn(5, 5), 1) is a 2 x 5 matrix.

Was this page helpful?
0 / 5 - 0 ratings