I would prefer that range indexing of an `Array` (e.g. `X[1:10, :]`) created a `SubArray` (a view/reference), not a copy. This seems more in the spirit of Julia's pass-by-reference semantics, and would eliminate some of the performance gotchas with range indexing.
It might also make future loop-devectorization optimizations easier, because subarray references can be devectorized back into references to the original array without worrying about changing the semantics.
It would reduce Matlab compatibility, but we already diverge with pass-by-reference semantics, so I doubt many users will be surprised to find the same reference semantics for range indexing.
This is something that has come up informally several times on the mailing list, but I didn't see any issue for it.
Thanks for making the issue. We should definitely do this.
+1
At the moment, the issue is not that it would reduce Matlab compatibility, but that it would almost certainly reduce performance: for many operations, SubArrays have nowhere near the performance of Arrays. #3503 will help when it can be merged (but there are a few "deep" issues that need to be addressed first).
@timholy, there is no technical reason (as far as I can tell) why accessing a SubArray should not generate the same performance (and in fact, almost exactly the same code) as accessing the same data directly in the original Array. If that's not the case now, then this is a performance bug that should be fixed, but I see that as a somewhat orthogonal issue to this one.
(There will be cases where it is advantageous for cache reasons to make a contiguous copy of a subarray that is discontiguous in memory, but of course the user can always call `copy` explicitly in that case.)
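In today's syntax, that explicit opt-in might look like this (an illustrative sketch, not code from the discussion):

```
A = rand(1000, 1000)
v = view(A, 1:2:999, :)   # strided view, discontiguous in memory
c = copy(v)               # materialize a contiguous Array when repeated
                          # passes make cache locality worth the copy
```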
As @timholy mentioned, we are working on an improved version of SubArray; the current implementation is way too slow.
However, the main obstacle is performance degradation. Even a very simple immutable wrapper over `Array` can lead to a 3x-5x slowdown, because the wrapped accesses often do not get inlined. I believe Jeff has been working on improving the compiler, and hopefully such issues will be resolved when that work is done.
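A sketch of the kind of thin wrapper in question, in current syntax (the type name and methods here are hypothetical, not the actual SubArray implementation):

```
# Ideally w[i] should compile to the same code as w.data[i]; the 3x-5x
# slowdown mentioned above comes from calls like this not being inlined.
struct Wrap{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(w::Wrap) = size(w.data)
Base.getindex(w::Wrap, i::Int) = w.data[i]
```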
@stevengj, there are two issues. Dahua mentioned one: inlining the wrapper.
The other principal hurdle is linear indexing, which is heavily used. If `A` is a 10x20 `Array`, I can access the memory location of each element using a simple for loop. But if `A` is a 10x20 `SubArray` of a 27x43 `Array`, then I need to do something more complicated. These operations have never been as well optimized as they could be.
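A sketch of the extra index arithmetic, in current Julia (illustrative only, assuming column-major layout and the sizes above):

```
parent = zeros(27, 43)
V = view(parent, 3:12, 5:24)    # a 10x20 SubArray

# For a plain Array, element i lives at linear offset i. For the view,
# i must be split into a (row, col) pair and re-offset into the parent:
function linear_to_parent(i)
    r = (i - 1) % 10 + 1        # row within the view
    c = (i - 1) ÷ 10 + 1        # column within the view
    (r + 2) + 27 * (c + 3)      # linear index into the 27x43 parent
end

linear_to_parent(1) == LinearIndices(parent)[3, 5]   # true
```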
Heck, even getting good performance from `Array`s for operations like `getindex` took some work, e.g., `make_arrayind_loop_nest`. I never got around to doing that for SubArrays, although in principle it's a pretty small change. The issue, in part, is that you might need something like this even for "simple" operations like taking the sum of all elements of the subarray, whereas with arrays you can confine such complexity to a few operations.
@StefanKarpinski has been doing some work on the linear indexing problem, hopefully this will help.
Regarding inlining, that is a compiler issue; there is no technical reason why subarray accesses could not be inlined with zero overhead. It seems like a bad idea to decide fundamental language semantics based on temporary performance issues.
Regarding linear indexing, exactly the same issue applies to looping over a portion of the original `Array` directly. As I said, there are cases where one would want to make a contiguous copy of a subarray for performance reasons; the appropriate performance comparison here is between looping over a portion of an `Array` versus looping over a `SubArray` abstraction of the _same memory locations_.
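Concretely, the comparison is between something like these two loops (an illustrative sketch):

```
B = rand(27, 43)

# Looping over a region of the parent directly...
s1 = 0.0
for j in 5:24, i in 3:12
    s1 += B[i, j]
end

# ...versus looping over a SubArray of the same memory:
s2 = sum(view(B, 3:12, 5:24))
s1 ≈ s2   # true; both touch exactly the same elements
```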
> Regarding inlining, that is a compiler issue; there is no technical reason why subarray accesses could not be inlined with zero overhead. It seems like a bad idea to decide fundamental language semantics based on temporary performance issues.
Absolutely. We're all for range slices producing views. These performance issues are clearly surmountable.
The linear indexing issue with subarrays was actually one reason that I started to work on the `broadcast` stuff. I think that with the proper broadcast primitives, we could handle most interesting uses of linear indexing almost as efficiently and in a more powerful way (though generating a bit more code than the linear indexing approach). But we're not there yet.
I strongly agree with having a good design and improving the compiler to match it. I know it will be possible to generate much better code for subarrays than we do now.
Is this already implemented in @lindahua's ArrayViews branch? #5556
Since we're in the chaos period, I'd be down to just rebase that branch and merge it as soon as @lindahua gets a chance to do so.
We can add it, but we can't remove SubArrays until the AbstractArray problem is solved. That's basically waiting on stagedfunctions.
To answer @quinnj's question: no, #5556 provides the infrastructure to make this happen, but the `getindex` methods still have to be adapted.
Is this issue the reason why the v0.5 release will be called "Arraymageddon"?
One of several.
@stevengj: Since you initiated this issue, it would be very interesting to hear your opinion on the recent discussion of the topic in https://github.com/JuliaLang/julia/issues/13157
Current plan is to revisit this after all immutables can be inlined. See #9150
I'm pretty sure this was decided against, and the thought now is to keep returning copies but with a more convenient syntax for views (`@view`).
@KristofferC, is there a discussion/rationale leading to this decision available somewhere online? From several issues here in this repo I got an impression that people were generally in favor of returning views by default as soon as some general performance issues are fixed. I am really curious as to why the opposite decision was finally made.
Probably some useful discussion in #7941 and #13157. We also have `@views` now and dot-fusion, so there really isn't much need for it.
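For reference, the opt-in looks like this (a minimal sketch in post-1.0 syntax):

```
A = rand(100, 100)
c = A[1:10, :]           # default semantics: allocates a copy
v = @view A[1:10, :]     # a SubArray sharing A's memory
@views w = A[1:10, :]    # @views rewrites every slice in the expression
v[1, 1] = 0.0            # writes through to the parent
A[1, 1] == 0.0           # true
```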
@timholy, thanks. I knew about `@views` (and dot-fusion seems orthogonal to this); it's just that I come from `numpy`, and almost never needed copied (or even mutable) slices in my numeric code, so the decision seemed a bit counter-intuitive. In fact, I only discovered that this is the case recently, when I got an unexpected performance drop from using slices; before that I was under the impression that slicing works just as it does in `numpy`.
There are two major issues:

- Aliasing: views share memory with the parent array, so code that mutates the result of an indexing expression would now mutate the original (a copying `getindex!` into a preallocated buffer addresses some of the same performance concerns without this hazard).
- Churn: everyone would need to audit their indexing code, adding `copy(X[Y])` to see if they want to opt back out to retain their original performance.

There'd need to be a really compelling story for it to be worth the churn. I used to think it'd be more compelling.

As it stands, every single indexing expression in Julia can be performed as a view. You can opt in to using views over large sections of code with `@views begin … end`. Even entire modules can be written this way:
```
module M @views begin

# ... entire module body here: every slicing expression in it produces a view ...

end end
```
> and dot-fusion seems orthogonal to this
One of the best reasons for returning views was to minimize the amount of copying in complex expressions; fusion is really a better solution to the same problem.
One thing that could help there is hammering out a definition for `A.[X]`, so indexing syntaxes can also participate in fusion.
The options seem to be:

- `A.[X] := getindex.(A, X)` (broadcast `getindex` over both `A` and `X`)
- `A.[X] := (x -> A[x]).(X)` (treat `A` as a scalar and broadcast over the indices `X`)

I've found myself wanting both on occasion. IIRC, there's a discussion about this somewhere.
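The two readings can be written out with today's syntax (`A.[X]` itself is not implemented; this is only to make the distinction concrete):

```
# Option 1: broadcast getindex over both arguments; natural when A is
# a container of containers:
A = [[1, 2], [3, 4]]
X = [2, 1]
getindex.(A, X)       # == [2, 3], i.e. [A[1][2], A[2][1]]

# Option 2: treat A as a scalar and broadcast over the indices
# ("fancy indexing" of a single array):
B = [10, 20, 30]
(x -> B[x]).(X)       # == [20, 10], i.e. [B[2], B[1]]
```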
Yes, as have I. I've wanted each of them (to the exclusion of the other) within a few lines of the same code when working with nested JSON-like data. #19169.
It tends to only be a performance win when you're slicing inside Matlab/Numpy style vectorized inner loops. In some other cases it can be a huge performance loss.
To clarify my usage pattern, I'm talking about something like this (a stencil-type computation, essentially). Omitting `@views` leads to roughly fivefold worse performance, which is what I would expect for large arrays, since creating a view is a constant-time operation in the size of the array, whereas copying is linear. Of course, the views in my example contain large contiguous parts; perhaps the situation is different for sparse ones.
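Something along these lines (a hypothetical minimal stencil, not the original code):

```
# Three-point moving average; without @views each slice on the right
# would allocate a fresh temporary array on every call.
function smooth!(out, x)
    @views out[2:end-1] .= (x[1:end-2] .+ x[2:end-1] .+ x[3:end]) ./ 3
    return out
end

x = rand(10^6)
smooth!(similar(x), x)
```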
> One of the best reasons for returning views was to minimize the amount of copying in complex expressions; fusion is really a better solution to the same problem.
I guess you can look at it that way. The problem is that loop fusion and views prevent the creation of temporary arrays resulting from two separate expression types: arithmetic and slicing respectively. If fusion worked for the latter type of expressions too (e.g. as @mbauman suggested), it would indeed be a better solution to this problem.
I find myself wanting to use this pattern all the time: `a .= b[:,i]`. But it seems that the right way to do this is `a .= view(b,:,i)`, which reads a bit less clearly (besides not being able to use `end`).
```
a = zeros(1000)
b = zeros(1000,10)

function f1(a,b,i)
    a[:] = b[:,i]       # slice allocates a copy, then assigns into a
    nothing
end

function f2(a,b,i)
    a .= b[:,i]         # slice still allocates a copy before broadcasting
    nothing
end

function f3(a,b,i)
    a .= view(b,:,i)    # no copy: broadcasts straight from the view
    nothing
end

@time f1(a,b,1)   # 0.000011 seconds (6 allocations: 8.109 KiB)
@time f2(a,b,1)   # 0.000006 seconds (6 allocations: 8.109 KiB)
@time f3(a,b,1)   # 0.000002 seconds (4 allocations: 160 bytes)
```
Preserving the semantics of copying on slicing, could the compiler optimise away the copy in `a .= b[:,i]`? That would seem logical for a dot-operation, barring further use of the RHS through the return value.
I find the idea of a module-wide `@views begin ... end` dangerous, as it makes the semantics modal.
It is entirely possible that the compiler could at some point recognize that it doesn't actually need to make a copy in these cases. We've also considered the syntax `b@[:,i]` for view slices, which would make the awkwardness difference considerably smaller. Some people do hate that syntax, though.
```
@views function f2(a,b,i)
    a .= b[:,i]
    nothing
end
```

isn't so bad, and that way you don't have to put it on the entire module. Optimization would, of course, be better.
`@views` on a function is more acceptable already, though I agree that optimisation would be better.
`b@[:,i]` sounds fine to me for times when the intent is to create a view that will be used as such later in the code, though I don't mind `view()` in that case either. But doesn't the dot syntax already imply that one wants to use views?
From the doc:

> If the left-hand side is an array-indexing expression, e.g. `X[2:end] .= sin.(Y)`, then it translates to `broadcast!` on a view, e.g. `broadcast!(sin, view(X, 2:endof(X)), Y)`, so that the left-hand side is updated in-place.
How about a similar rule for the RHS, e.g. indexing produces a view whenever the array is an operand of a dot call?
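For comparison, the existing left-hand-side rule already avoids a temporary (a minimal sketch); the proposal is to extend the same treatment to slices on the right:

```
X = zeros(10)
Y = rand(9)
X[2:end] .= sin.(Y)   # lowers to broadcast! into a view of X; no LHS copy

# Proposed: inside a dot-expression, RHS slices would also become views,
# so this would stop allocating a temporary for b[:, i]:
# a .= b[:, i]
```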