Julia: Should we make reshape(::Array, …) return a ReshapedArray?

Created on 20 Oct 2017  Â·  11Comments  Â·  Source: JuliaLang/julia

When we introduced ReshapedArrays, we deferred making the breaking change where reshape(::Array, …) returns a ReshapedArray. I say we change that for 1.0.

There are two reasons to do this:

  • Semantics. The wrapper makes it much more obvious that the data are shared (see, e.g., https://stackoverflow.com/questions/38777469/function-for-reshape-view), and the semantics are much more obviously "just" an indexing transformation into the parent. It seems like folks are getting used to all the array wrappers.
  • Performance. We'll eventually add more optimizations for array wrappers, allowing them all to live on the stack (right?). ReshapedArray would ride along "for free" here, whereas enabling the same optimization for the C array header would be extra work. (I'm not in a position to estimate how much work — all I know is that it's work that few would be able to do.) This would put is in the unfortunate situation where ReshapedArray(::Array, …) would be completely allocation-free whereas reshape(::Array, …) would not be. Either way, cursory benchmarks on 0.6 show the Julian ReshapedArray to already be faster than the C reshape on construction, and the two look to be comparable on access.

While breaking, I think it's only marginally so. Folks who expect f(reshape(A, 2,3,4)) to hit f(A::Array) would have to widen their signatures, and similarly for struct fields. Is there anything else?

Benchmarks are on 0.6 just because it was so much easier to do so with BenchmarkTools right now. We'll definitely need the Nanosoldier back on its feet for this one.

  | | |_| | | | (_| |  |  Version 0.6.0 (2017-06-19 13:05 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-apple-darwin13.4.0

julia> using BenchmarkTools
WARNING: Compat.UTF8String is deprecated, use String instead.
  likely near /Users/mbauman/.julia/v0.6/JLD/src/JLD.jl:971

julia> # Base._reshape, but without the ::Array override
       function reshape2(parent::AbstractArray, dims::Dims)
           n = Base._length(parent)
           prod(dims) == n || throw(DimensionMismatch("parent has $n elements, which is incompatible with size $dims"))
           Base.__reshape((parent, IndexStyle(parent)), dims)
       end
reshape2 (generic function with 1 method)

julia> A = rand(60);

julia> C = @btime reshape($A, (5,4,3));
  48.178 ns (2 allocations: 112 bytes)

julia> J = @btime reshape2($A, (5,4,3));
  19.855 ns (1 allocation: 48 bytes)

julia> @btime sum($C);
  30.455 ns (0 allocations: 0 bytes)

julia> @btime sum($J);
  30.113 ns (0 allocations: 0 bytes)

julia> A = rand(50, 4, 3);

julia> C = @btime reshape($A, (100, 6));
  45.914 ns (2 allocations: 96 bytes)

julia> J = @btime reshape2($A, (100, 6));
  17.994 ns (1 allocation: 32 bytes)

julia> @btime sum($C)
  172.636 ns (0 allocations: 0 bytes)
290.51809277377106

julia> @btime sum($J)
  172.978 ns (0 allocations: 0 bytes)
290.51809277377106
arrays decision

Most helpful comment

The Array type (e.g. if we implemented it in julia, which we hopefully will eventually) contains a buffer and a dims tuple, and already implements a reshape of the linear memory buffer to N-d. One ought to be allowed to construct a new Array with the same buffer and different dimensions. So I think we should keep this as-is.

All 11 comments

I think this is worth considering, but then I also wonder whether this should just be called ReshapedArray(A, dims).

In Images-world I've been using an informal rule: lowercase functions are "functional equivalents" and uppercase means "give me this type." So ColorView(RGB, A) will wrap A in a ColorView (an AbstractArray subtype) no matter what, but colorview(RGB, A) might return an Array (using reinterpret) when possible or a ColorView when necessary. (The choosing is done inferrably, of course.) I sort of think of reshape as being in that same camp. So would we really even keep reshape if we did this PR?

Then it should be ReinterpretedArray as well?

By my logic, yes. (Whether you agree with it is another question.) Simply put: given that we use Int(3.0) and not int(3.0), should we decide that reinterpret(Float32, a) is better spelled as ReinterpretedArray(Float32, a)? I think when the lowercase version does something different there is a clear argument for having both available, but that argument becomes less clear when one is just a synonym for the other.

Again, just throwing this out as food for thought. I don't think there's a clear decision here. The biggest downside is that ReinterpretedArray and ReshapedArray demand a certain amount of knowledge about the internal implementation; reinterpret and reshape seem to offer a conceptual abstraction even if it's not different in actual practice as of Oct 20, 2017.

Should ReshapedArray() also be called for out-of-Base arrays, e.g. ReshapedArray(x::CategoricalArray, dims)?
Right now there's Base.reshape(x::CategoricalArray, dims).

I'm torn regarding the idea of always returning a ReshapedArray.

On the negative side, taking the example of CategoricalArray, calling reshape currently returns a CategoricalArray wrapping the same underlying Array after calling reshape on it. You can call CategoricalArray-specific functions like levels or ordered! on that object. If we were to return a ReshapedArray{CategoricalArray{...}} (because we couldn't get an Array as the result of reshape), things would look significantly more complex (just the type would be hard to read), and it would not be obvious that you can still call levels or ordered! on it (and we'd have to define specific methods).

On the positive side, this is already what happens when calling view and getting a SubArray view of a CategoricalArray. And I see the advantage in terms of systematicity and clarity (you know when you have a view and when you have a copy). So I'm really not sure what's the best decision.

SubArray{CategoricalArray} is not an AbstractCategoricalArray, though. The same for ReshapedArray{CategoricalArray}.
So in some cases these derived arrays could be mistaken for not containing categories.

Great examples. I can definitely see reasons to have the lowercase to allow "permuting the nesting order." An example that keeps cropping up in my own work is mappedarray(f, ::AxisArray), where it seems better to return another AxisArray and put the MappedArray inside of it.

So I drop any reservations about keeping reshape and reinterpret as separate functions.

If ReshapedArray is faster than its C equivalent, does this mean that we just want to have C-vectors and make all Arrays in Julia as ReshapdedArray of a vector ?

It's faster for construction and some access but it's missing the shape optimization and has more indirections both of which has performance impact that's hard to measure in synthetic benchmarks that are not designed to exploid those.

The Array type (e.g. if we implemented it in julia, which we hopefully will eventually) contains a buffer and a dims tuple, and already implements a reshape of the linear memory buffer to N-d. One ought to be allowed to construct a new Array with the same buffer and different dimensions. So I think we should keep this as-is.

Triage finds Jeff's argument compelling.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

TotalVerb picture TotalVerb  Â·  3Comments

tkoolen picture tkoolen  Â·  3Comments

musm picture musm  Â·  3Comments

manor picture manor  Â·  3Comments

StefanKarpinski picture StefanKarpinski  Â·  3Comments