After some discussion on discourse it was proposed to "feature request" a version of mapslices
in Julia Base that does not allocate slices. Naively I would think the behavior and syntax should be identical to mapslices
except that it iterates views rather than slices of the original array (mapviews
is a tentative name). I think it'd be useful for all cases where one wants to apply mapslices
with a function with no side effects (or that doesn't mutate the original array) and avoid the extra allocations. A typical use case is for example summing with some filtering condition along some dimension:
mapviews(v -> sum(Iterators.filter(!isnan, v)), M, dims=2)
for which I don't think there's an easy efficient solution in Base Julia right now.
It was pointed out that similar ideas have been implemented in the package JuliennedArrays and there was some discussion of moving some of that to Base, see this comment.
I'd still be a major fan of bringing JuliennedArrays into Base. The major blocker for me has been that it requires some oblique syntax to get type stability for julienne(array, (*, :))
. If we could get it type stable with constant propagation — ideally constant propagation through keywords — then I'd say we spell this as map(f, slices(M, dims=2))
. Or maybe eachslice
.
I'm also not entirely satisfied about the JuliennedArrays syntax. I'm not expert on the technical side, but doesn't sum
have the same issue? Meaning that sum(v, dims=d)
output type depends on the keyword argument dims
, so maybe the solution that was found for sum
would also work for JuliennedArrays?
No, sum
doesn't drop dimensions so while the algorithm performs very differently depending on dims
it doesn't actually change the resulting dimensionality.
I see, sum
only changes the dimensionality between sum(v)
(returning a scalar) and sum(v, dims=d)
(returning an AbstractArray
with the dimensionality of v
) so it's a less complex situation.
Do you think this needs to drop dimensions though? I don't have a strong preference either way but mapslices
keeps a dimension of length 1
, so it's a bit strange to have this inconsistency, Maybe there could be a keyword squeeze
to decide what to do with those dimensions (if, of course, the associated technical difficulties don't make this unfeasible)?
The key is that exposing a fast first class slices
(or julienne
) iterator requires type-stable access to the elements (each slice), and those elements will have a different dimensionality (or SubArray
type even if they don't drop dimensions).
I don’t think it would be impossible to get ‘dims = 2’ to constant propagate. But I’ve somewhat given up on constant propagation; getting things into the type domain as soon as possible makes things SO much easier. If someone wants to give it a stab I’d welcome a PR.
Whats the progress for this issue?
I wrote a very basic mutating mapviews!(...)
for myself using views and would be curious how to improve its performance and what you guys think about it. Maybe it helps someone else aswell.
"""
mapviews!(f, from, dest, dims)
Transform the given dimensions of array `from` and store to array `dest` using function `f`. `f` is called on each slice
of `from` resp. `dest` of the form `from[...,:,...,:,...]` resp. `dest[...,:,...,:,...]`. `dims` is an integer vector specifying where the
colons go in these expressions.
For example, if `dims` is `[1,2]`, `from` is 4-dimensional and `dest` is 4-dimensional aswell, `f` is called on `view(from,:,:,i,j)` and `view(dest,:,:,i,j)`
for all `i` and `j`.
"""
function mapviews!(f, from, dest, dims)
allDims = 1:ndims(from)
iterDims = setdiff(allDims, dims)
for x in CartesianIndices(axes(from)[iterDims])
x = [Tuple(x)...]
sliceIdx = Tuple(if n in dims; (:) else popfirst!(x) end for n in allDims)
f(view(from, sliceIdx...), view(dest, sliceIdx...))
end
end
We now have #29749, so this can be written as map(f, eachslice(A; dims=...))
. I think that's sufficient to close this, no?
Nice job: eachslice
is extremely useful!
I think it almost closes this issue in that mapslices
can take multiple dimensions (say mapslices(f, a, dims = (1,2))
) whereas eachslice(a, dims = (1, 2))
errors. Still, feel free to close this and track the multi dimensional implementation separately.
@mbauman does that also work for map!
? Similar to the ability my function had. In order to prevent fragmentation/the need to copy around the arrays. Btw will it also cat the results according to the dimensions? (in case the function returns an array itself)
I mean, it does generate views, so yes, you with a mutating function f!
you can do something like:
foreach(f!, eachslice(dest, dims=d), eachslice(from, dims=d))
You can also use map!
, but that does something different — it stores the output as single elements in an array (and not slices thereof). And map
will also allocate an array to store the results, so that's why I used foreach
instead.
And of course neither map
nor map!
will do the same sort of fancy concatenation that mapslices
does.
Hi, I am not sure if this is the right place to post it, but I noticed an inconsistency in the meaning of the dims
argument in the functions mapslices
and eachslice
.
Let's say I want to sum rows of an array...
julia> mat =[ 1 2;3 4;5 6]
3×2 Array{Int64,2}:
1 2
3 4
5 6
julia> map(sum , eachslice(mat ; dims=1))
3-element Array{Int64,1}:
3
7
11
However
julia> dropdims(mapslices(sum, mat ; dims=1); dims = 1 )
2-element Array{Int64,1}:
9
12
One should use the _other_ dimension for mapslices
julia> dropdims(mapslices(sum, mat ; dims=2); dims = 2 )
3-element Array{Int64,1}:
3
7
11
The documentation explains it: in mapslices
dims refers to the position of the :
in the array indexes, in eachslice
it refers to the position that _does not_ have the :
. So they cannot be simply swapped in the code without accounting for this.
I think @dylanfesta points to a very important difference between eachslice
and mapslices
. The syntax of the two is complementary; eachslice
iterates along the _given_ dimension and constructs views on all elements in the _remaining_ dimensions; mapslices
iterates over the _remaining_ dimensions and has f
act on the array of all elements of the _given_ dimensions.
When handling 2-D arrays, one can easily use eachslices
to produce the result, that @piever initially asked for. For higher-dimensional arrays this is not true.
So I think, the request for a non-allocating version of mapslices is still reasonable... ?
There is a factor of two for JuliennedArrays
and mapslices
:
A = rand(Float32, 1000, 1000, 10)
@btime map(mean, Slices(A, 2)); # 18.793 ms (10019 allocations: 664.92 KiB)
@btime mapslices(mean, A, dims=2); # 37.548 ms (129599 allocations: 3.09 MiB)
see also this discussion.
EDIT:
It might be sufficient to improve mapslices
to act on views?
Most helpful comment
I'd still be a major fan of bringing JuliennedArrays into Base. The major blocker for me has been that it requires some oblique syntax to get type stability for
julienne(array, (*, :))
. If we could get it type stable with constant propagation — ideally constant propagation through keywords — then I'd say we spell this asmap(f, slices(M, dims=2))
. Or maybeeachslice
.