Indexing by end
is lowered here to lastindex(A, 2)
, which isn't correct when the other indices take up more than one dimension (or none):
julia> zeros(2,2,1)[CartesianIndex(2,2), 1]
0.0
julia> zeros(2,2,1)[CartesianIndex(2,2), end]
ERROR: BoundsError: attempt to access 2Γ2Γ1 Array{Float64,3} at index [2, 2, 2]
Stacktrace:
[1] getindex at ./array.jl:788 [inlined]
[2] getindex(::Array{Float64,3}, ::CartesianIndex{2}, ::Int64) at ./multidimensional.jl:552
[3] top-level scope at REPL[19]:1
julia> g(A, i, j) = A[i, end, j];
julia> @code_lowered g(zeros(4,3,2), CartesianIndex(1,1), CartesianIndex())
CodeInfo(
1 β %1 = Base.lastindex(A, 2)
β %2 = Base.getindex(A, i, %1, j)
βββ return %2
)
I assumed this worked, but @c42f actually checked!
I'm not really sure how this can be fixed. cc @mbauman
Another error case, from @simeonschaub:
julia> f(n) = CartesianIndex(1, n);
julia> rand(2,3)[f(end)]
ERROR: BoundsError: attempt to access 2Γ3 Array{Float64,2} at index [1, 6]
One idea is to make all such things into runtime errors. Perhaps by lowering to A[FixDim(1, f(lastindex(A,1)))]
and then checking with something like this:
function to_indices(A, ax, I::Tuple{FixDim, Vararg})
I[1].dim == ndims(A) + 1 - length(ax) || error("can't use end here")
I[1].val isa CartesianIndex && !( I[1].val isa CartesianIndex{1}) && error("can't use end here")
to_indices(A, ax, (I[1].val, tail(I)...))
end
Yeah, I donβt have a good solution here so we explicitly warn about this in the CartesianIndex docs.
Alright, there is indeed a giant yellow warning box about this!
Actually I think there's some hope to fix this, at least for non-pathological uses of end
. We just need to lower end
more conservatively by not assuming that each preceding index in a Expr(:ref)
corresponds to exactly one dimension. Rather, push the computation of the number of dimensions into a function (let's call it count_index_dims
for now):
julia> A = zeros(2,2,1)
2Γ2Γ1 Array{Float64,3}:
[:, :, 1] =
0.0 0.0
0.0 0.0
julia> c = CartesianIndex(2,2)
CartesianIndex(2, 2)
julia> A[c, end]
ERROR: BoundsError: attempt to access 2Γ2Γ1 Array{Float64,3} at index [2, 2, 2]
Stacktrace:
[1] getindex at ./array.jl:788 [inlined]
[2] getindex(::Array{Float64,3}, ::CartesianIndex{2}, ::Int64) at ./multidimensional.jl:543
[3] top-level scope at REPL[21]:1
Sketch of alternative lowering for A[c, end]
:
julia> count_index_dims(a, inds) = length(Base.to_indices(a, axes(a), inds))
count_index_dims (generic function with 1 method)
julia> getindex(A, c, lastindex(A, count_index_dims(A,(c,))+1))
0.0
For your g(A, i, j) = A[i, end, j]
example, it should be lowered to
g(A, i, j) = getindex(A, i, lastindex(A, count_index_dims(A,(i,))), j)
The case from @simeonschaub is truly diabolical, I like it! But edge cases like that shouldn't prevent us from fixing the common situations.
julia> count_index_dims(a, inds) = length(Base.to_indices(a, axes(a), inds)) count_index_dims (generic function with 1 method) julia> getindex(A, c, lastindex(A, count_index_dims(A,(c,))+1)) 0.0
Yes, perhaps I have been letting the lack of the perfect get in the way of the good here; π for re-opening this.
I don't particularly like calling to_indices
multiple times, although I suppose it's not as bad now that we use lazy logical indexing. Another option would be to simply _also_ emit a check to ensure the number of indexed dimensions is what we wanted and error otherwise. That could also be entirely hidden behind a @boundscheck
block.
I don't particularly like calling to_indices multiple times.
Right, that does seem kind of icky. I feel like we should have the end
in A[i, j, end]
become something simpler like
lastindex_afterdims(A, i, j)
Read "lastindex for the dimension which comes after dims i,j
. (Better naming suggestions extremely welcome!)
For splatting, eg, A[i, j, ks..., end]
we lower to:
lastindex_afterdims(A, i, j, ks...)
Current lowering rules just implement the following extremely simple rule:
lastindex_afterdims(A, inds...) = lastindex(length(inds) + 1)
I could have a go at doing this lowering tweak and you can separately work out exactly what the best implementation is for CartesianIndex
, etc?
How about lowering f(begin, end)
to LazyIndex(f)
and handle everything in to_indices
? That is to say, to_indices
can count leading dimensions and use it to materialize a LazyIndex
when it is encountered.
How about lowering
f(begin, end)
toLazyIndex(f)
and handle everything into_indices
That sounds kind of promising but I don't quite understand your notation. How do we handle A[i, min(j,end)]
?
What I meant was something like this:
# A[i, min(j,end)]
getindex(A, i, LazyIndex((_begin, _end) -> min(j, _end)))
# A[i, j, ks..., begin:2:end]
getindex(A, i, j, ks..., LazyIndex((_begin, _end) -> _begin:2:_end))
Hmm.... Actually, it might break some code that does something custom and bypasses to_indices
?
Yes backward compat seems quite hard with that option.
It's a really interesting idea though, I quite like the way that it allows getindex
itself to be aware of and handle the special syntax.
I can't really see how we can have a lazy version and also backward compatibility, unless we lower to some new getindex_
which handles the LazyIndex
wrappers and passes the results to normal getindex
by default. But that seems like an awful lot of machinery to fix an edge case :-/
OK, that's too bad. I thought we'll have faster closures after this as everybody will start nudging Jeff about indexing slowed down.... :trollface:
So I guess we need something like this?
indexdim(x) = indexdim(typeof(x))
indexdim(::Type) = 1 # best guess?
indexdim(::Type{T}) where {T <: CartesianIndex} = length(T)
indexdim(::Type{T}) where {T <: AbstractArray} = indexdim(eltype(T))
indexndims(args...) = sum(indexdim, args)
lastindex_afterdims(A, inds...) = lastindex(A, indexndims(inds...) + 1)
firstindex_afterdims(A, inds...) = firstindex(A, indexndims(inds...) + 1)
This issue makes me wonder if this lowering
julia> Meta.@lower x[i, f(end)...]
:($(Expr(:thunk, CodeInfo(
@ none within `top-level scope'
1 β %1 = Base.lastindex(x, 2)
β %2 = f(%1)
β %3 = Core.tuple(x, i)
β %4 = Core._apply_iterate(Base.iterate, Base.getindex, %3, %2)
βββ return %4
))))
can be improved. If x
is an Array{T, 3}
and i
is an Int
, it kind of makes sense to lower f(end)...
to f((lastindex(x, 2), lastindex(x, 3)))...
? Of course, it'd mean x[i, f(end)..., f(end)...]
would be an error at lowering time.
Can we avoid lifting indexdim
into the type domain? Also the Array case is quite troubling. For correctness It would need to be roughly:
indexdim(a::AbstractArray) =
mapreduce(indexdim, (d1,d2)->(d1==d2 ? d1 : error("xxx")), a)
Which catches terrible oddities such as A[[1, CartesianIndex(2,2)], end]
, while allowing A[Any[1,2], end]
to work correctly.
Ah, of course, Any
array is the problem. Though it sounds really hard to solve this without going to LazyIndex
direction. I agree type-level functions are ugly but I still think it's the upper limit (or something very close to it) of what we can do without breaking anything.
Which catches terrible oddities such as
A[[1, CartesianIndex(2,2)], end]
Maybe we can make something like A.[[1, CartesianIndex(2,2)], end, [CartesianIndex(3,3), 4]]
work with https://github.com/JuliaLang/julia/issues/30845. (Though I have no idea when you'd need this.)
Is there anything wrong in principle with the mapreduce
approach to solve the Any
problem? I expect it can be completely optimized away for arrays of concrete type, and indexing with an array of Any
is never going to be fast anyway.
I expect it can be completely optimized away for arrays of concrete type
How about Array{Union{Int,Nothing}}
? I think this is a likely type to get if you use findfirst
or something. Though maybe special casing Union{T,Missing}
and Union{T,Nothing}
is not so hard?
I'm also a bit worried about relying on the optimization in the happy path. I can see that
Base.@pure donothing(_) = nothing
@btime foreach(donothing, $(zeros(1000000)))
is optimized away but it looks like this happens at LLVM level, not at Julia level.
Yeah, this is why I ended up with a warning box π
I still think a reasonable option is to lower to an error function that simply asserts we're not mixing end
s/begin
s and CartesianIndex. Note that we can dispatch on AbstractArray{T} where T >: CartesianIndex
to catch all unions and Any
that can possibly contain CartesianIndex
(with a fallback no-op, of course).
asserts we're not mixing
end
s/begin
s and CartesianIndex
Where do you assert this? My guess is that doing this in to_indices
would have the same compat issue as LazyIndex
and doing this in lowering (= before getindex
) would be equivalent to lastindex_afterdims
in terms of the complexity of the code.
Currently:
julia> Meta.@lower A[end, begin, 1:end]
:($(Expr(:thunk, CodeInfo(
@ none within `top-level scope'
1 β %1 = Base.lastindex(A, 1)
β %2 = Base.axes(A, 2)
β %3 = Base.first(%2)
β %4 = Base.lastindex(A, 3)
β %5 = 1:%4
β %6 = Base.getindex(A, %1, %3, %5)
βββ return %6
))))
This would just add a line before the final %6 = getindex
that simply did assert_no_cartesians(%1, %3, %5)
. We'd only need to emit the assert_no_cartesians
if begin
/end
was present. And the entire implementation of assert_no_cartesians
can be inside a @boundscheck
block.
I see, yes, that's much simpler.
We'd only need to emit the
assert_no_cartesians
ifbegin
/end
was present
I like that this is simpler, it's true that introducing a pile of complexity for this relative edge case would be somewhat disappointing.
It still needs to handle icky situations like when [1, CartesianIndex(2,2)]
is passed though, unless it's an approximate mechanism which only produces an error in a subset of the more obvious cases.
While we wait for the scheme-heroics, EndpointRanges.jl is a simple workaround:
julia> a = reshape(1:24, 2, 3, 2, 1, 2)
2Γ3Γ2Γ1Γ2 reshape(::UnitRange{Int64}, 2, 3, 2, 1, 2) with eltype Int64:
[:, :, 1, 1, 1] =
1 3 5
2 4 6
[:, :, 2, 1, 1] =
7 9 11
8 10 12
[:, :, 1, 1, 2] =
13 15 17
14 16 18
[:, :, 2, 1, 2] =
19 21 23
20 22 24
julia> a[CartesianIndex(2, 2), end, CartesianIndex(1, 1)]
ERROR: BoundsError: attempt to access 2Γ3Γ2Γ1Γ2 reshape(::UnitRange{Int64}, 2, 3, 2, 1, 2) with eltype Int64 at index [2, 2, 3, 1, 1]
Stacktrace:
[1] throw_boundserror(A::Base.ReshapedArray{Int64, 5, UnitRange{Int64}, Tuple{}}, I::NTuple{5, Int64})
@ Base ./abstractarray.jl:601
[2] checkbounds
@ ./abstractarray.jl:566 [inlined]
[3] _getindex
@ ./abstractarray.jl:1146 [inlined]
[4] getindex(::Base.ReshapedArray{Int64, 5, UnitRange{Int64}, Tuple{}}, ::CartesianIndex{2}, ::Int64, ::CartesianIndex{2})
@ Base ./abstractarray.jl:1120
[5] top-level scope
@ REPL[31]:1
julia> using EndpointRanges
julia> a[CartesianIndex(2, 2), iend, CartesianIndex(1, 1)]
10
julia> a[2, 2, 2, 1, 1]
10
Most helpful comment
Actually I think there's some hope to fix this, at least for non-pathological uses of
end
. We just need to lowerend
more conservatively by not assuming that each preceding index in aExpr(:ref)
corresponds to exactly one dimension. Rather, push the computation of the number of dimensions into a function (let's call itcount_index_dims
for now):Sketch of alternative lowering for
A[c, end]
:For your
g(A, i, j) = A[i, end, j]
example, it should be lowered toThe case from @simeonschaub is truly diabolical, I like it! But edge cases like that shouldn't prevent us from fixing the common situations.