Julia: Equivalent of mapslices with views?

Created on 12 Sep 2018  Â·  13Comments  Â·  Source: JuliaLang/julia

After some discussion on discourse it was proposed to "feature request" a version of mapslices in Julia Base that does not allocate slices. Naively I would think the behavior and syntax should be identical to mapslices except that it iterates views rather than slices of the original array (mapviews is a tentative name). I think it'd be useful for all cases where one wants to apply mapslices with a function with no side effects (or that doesn't mutate the original array) and avoid the extra allocations. A typical use case is for example summing with some filtering condition along some dimension:

mapviews(v -> sum(Iterators.filter(!isnan, v)), M, dims=2)

for which I don't think there's an easy efficient solution in Base Julia right now.

It was pointed out that similar ideas have been implemented in the package JuliennedArrays and there was some discussion of moving some of that to Base, see this comment.

arrays

Most helpful comment

I'd still be a major fan of bringing JuliennedArrays into Base. The major blocker for me has been that it requires some oblique syntax to get type stability for julienne(array, (*, :)). If we could get it type stable with constant propagation — ideally constant propagation through keywords — then I'd say we spell this as map(f, slices(M, dims=2)). Or maybe eachslice.

All 13 comments

I'd still be a major fan of bringing JuliennedArrays into Base. The major blocker for me has been that it requires some oblique syntax to get type stability for julienne(array, (*, :)). If we could get it type stable with constant propagation — ideally constant propagation through keywords — then I'd say we spell this as map(f, slices(M, dims=2)). Or maybe eachslice.

I'm also not entirely satisfied about the JuliennedArrays syntax. I'm not expert on the technical side, but doesn't sum have the same issue? Meaning that sum(v, dims=d) output type depends on the keyword argument dims, so maybe the solution that was found for sum would also work for JuliennedArrays?

No, sum doesn't drop dimensions so while the algorithm performs very differently depending on dims it doesn't actually change the resulting dimensionality.

I see, sum only changes the dimensionality between sum(v) (returning a scalar) and sum(v, dims=d) (returning an AbstractArray with the dimensionality of v) so it's a less complex situation.

Do you think this needs to drop dimensions though? I don't have a strong preference either way but mapslices keeps a dimension of length 1, so it's a bit strange to have this inconsistency, Maybe there could be a keyword squeeze to decide what to do with those dimensions (if, of course, the associated technical difficulties don't make this unfeasible)?

The key is that exposing a fast first class slices (or julienne) iterator requires type-stable access to the elements (each slice), and those elements will have a different dimensionality (or SubArray type even if they don't drop dimensions).

I don’t think it would be impossible to get ‘dims = 2’ to constant propagate. But I’ve somewhat given up on constant propagation; getting things into the type domain as soon as possible makes things SO much easier. If someone wants to give it a stab I’d welcome a PR.

Whats the progress for this issue?

I wrote a very basic mutating mapviews!(...) for myself using views and would be curious how to improve its performance and what you guys think about it. Maybe it helps someone else aswell.

"""
    mapviews!(f, from, dest, dims)

Transform the given dimensions of array `from` and store to array `dest` using function `f`. `f` is called on each slice
of `from` resp. `dest` of the form `from[...,:,...,:,...]` resp. `dest[...,:,...,:,...]`. `dims` is an integer vector specifying where the
colons go in these expressions.
For example, if `dims` is `[1,2]`, `from` is 4-dimensional and `dest` is 4-dimensional aswell, `f` is called on `view(from,:,:,i,j)` and `view(dest,:,:,i,j)`
for all `i` and `j`.
"""
function mapviews!(f, from, dest, dims)
    allDims = 1:ndims(from)
    iterDims = setdiff(allDims, dims)
    for x in CartesianIndices(axes(from)[iterDims])
        x = [Tuple(x)...]
        sliceIdx = Tuple(if n in dims; (:) else popfirst!(x) end for n in allDims)
        f(view(from, sliceIdx...), view(dest, sliceIdx...))
    end
end

We now have #29749, so this can be written as map(f, eachslice(A; dims=...)). I think that's sufficient to close this, no?

Nice job: eachslice is extremely useful!

I think it almost closes this issue in that mapslices can take multiple dimensions (say mapslices(f, a, dims = (1,2))) whereas eachslice(a, dims = (1, 2)) errors. Still, feel free to close this and track the multi dimensional implementation separately.

@mbauman does that also work for map!? Similar to the ability my function had. In order to prevent fragmentation/the need to copy around the arrays. Btw will it also cat the results according to the dimensions? (in case the function returns an array itself)

I mean, it does generate views, so yes, you with a mutating function f! you can do something like:

foreach(f!, eachslice(dest, dims=d), eachslice(from, dims=d))

You can also use map!, but that does something different — it stores the output as single elements in an array (and not slices thereof). And map will also allocate an array to store the results, so that's why I used foreach instead.

And of course neither map nor map! will do the same sort of fancy concatenation that mapslices does.

Hi, I am not sure if this is the right place to post it, but I noticed an inconsistency in the meaning of the dims argument in the functions mapslices and eachslice.

Let's say I want to sum rows of an array...

julia> mat =[ 1 2;3 4;5 6]
3×2 Array{Int64,2}:
 1  2
 3  4
 5  6

julia> map(sum , eachslice(mat ; dims=1))
3-element Array{Int64,1}:
  3
  7
 11

However

julia> dropdims(mapslices(sum, mat ; dims=1); dims = 1 )
2-element Array{Int64,1}:
  9
 12

One should use the _other_ dimension for mapslices

julia> dropdims(mapslices(sum, mat ; dims=2); dims = 2 )
3-element Array{Int64,1}:
  3
  7
 11

The documentation explains it: in mapslices dims refers to the position of the : in the array indexes, in eachslice it refers to the position that _does not_ have the :. So they cannot be simply swapped in the code without accounting for this.

I think @dylanfesta points to a very important difference between eachslice and mapslices. The syntax of the two is complementary; eachslice iterates along the _given_ dimension and constructs views on all elements in the _remaining_ dimensions; mapslices iterates over the _remaining_ dimensions and has f act on the array of all elements of the _given_ dimensions.
When handling 2-D arrays, one can easily use eachslices to produce the result, that @piever initially asked for. For higher-dimensional arrays this is not true.
So I think, the request for a non-allocating version of mapslices is still reasonable... ?
There is a factor of two for JuliennedArrays and mapslices:

A = rand(Float32, 1000, 1000, 10)
@btime map(mean, Slices(A, 2));     #  18.793 ms (10019 allocations: 664.92 KiB)
@btime mapslices(mean, A, dims=2);  #  37.548 ms (129599 allocations: 3.09 MiB)

see also this discussion.

EDIT:
It might be sufficient to improve mapslices to act on views?

Was this page helpful?
0 / 5 - 0 ratings