Julia: BoundsError when indexing an array with 2d bit array

Created on 28 Aug 2016  Â·  29Comments  Â·  Source: JuliaLang/julia

The code below:

A = rand((5, 5))
b = mean(A, 1) .> 0.5
A[:, b]

produces the following error on julia-0.5.0-rc3:

ERROR: BoundsError: attempt to access 5×5 Array{Float64,2} at index [Colon(),Bool[true true false true false]]
 in throw_boundserror(::Array{Float64,2}, ::Tuple{Colon,BitArray{2}}) at .\abstractarray.jl:355
 in checkbounds at .\abstractarray.jl:284 [inlined]
 in _getindex at .\multidimensional.jl:264 [inlined]
 in getindex(::Array{Float64,2}, ::Colon, ::BitArray{2}) at .\abstractarray.jl:752

This used to work on julia 0.4 - is the change intended?
Doing A[:, vec(b)] works of course.

Most helpful comment

Logical indexing with a non-vector does seem like a fairly strange and marginal case, so allowing/requiring boolean indexing with non-vectors with all but one dimension singular seems fairly sensible.

I don't think it's that uncommon, I've been using it pretty often in numpy and julia. I would be really sad if it's gone forever in Julia. :(

All 29 comments

Yes, this is intentional. With the new APL indexing behaviors, the shapes of the indices are significant, and so the meaning of indexing by a multidimensional logical array is a little muddled (unless it's the only index provided). See the discussion at https://github.com/JuliaLang/julia/pull/15431#issuecomment-215469927 and preceding comments which led to this decision if you're curious.

I think this could be reverted to a deprecation warning to suggest vec as you found, but it seems a little late in the game to do so… and I'm not sure I'll have time to get it in place quickly enough. Thoughts?

In retrospect it would have been nicer to have this go through a deprecation, I'm a little surprised no one said anything earlier. But it is getting very late to put that in, and some of other closely related indexing changes don't really have deprecations. Could just document the behavior change as clearly as we can?

Ok, I looked through the discussion. I'm still a bit sad and surprised that things like A[:, mean(A, 1) .> 0.5] or B[bitrand(3, 4), :] don't work (as they seem very common and convenient) but I will handle it. Thanks for clarifications!

Maybe part of the problem there is that mean(A, 1) should return a vector (#16606).

This is probably the strongest argument in favor of dropping dimensions from reductions. The harm to broadcasting is quite unfortunate, though. I'll post other discussion over at #16606.

What about advertising mean(A, 1:1) in the same way we advertise A[1:1,:] if you want to preserve the dimensions? Never mind. I forgot to read https://github.com/JuliaLang/julia/issues/16606#issuecomment-243139734 first

The restriction here is purely artificial, just aimed at making it easier to catch mistakes and giving us some leeway in deciding what to do in the future. Clearly, this wasn't an mistake.

We could certainly remove this restriction. Or we could loosen it such that it allows arrays with only one non-singleton dimension. I kinda like the latter option.

@andreasnoack, because you can also reduce over multiple dimensions at once, for example mean(A, 1:2) will calculate the mean over dimensions 1 and 2 of a 3d array.

@mbauman, won't that be inconsistent with "the output dimensions are equal to the sum of the dimensions of the indices"?

Yup, but logical indexing is already a fairly strange special case. It's a trade-off. I'm not convinced either way at the moment; just brainstorming possible solutions.

Logical indexing with a non-vector does seem like a fairly strange and marginal case, so allowing/requiring boolean indexing with non-vectors with all but one dimension singular seems fairly sensible.

On the other hand, I feel like we're going to get the "why doesn't produce a vector" question over and over again and the sum(A, 1:1) solution does what one wants. I feel like we all the more need a keep(1) operation that wraps a scalar as a lightweight single-dimensional "vector". There are actually a bunch of unary operators available, but I'm not sure we want to write something like sum(A,^1) and A[^1,:], sum(A,&1) and A[&1,:] (reclaimed), sum(A,\1) and A[\1,:]. If &x is taken to mean Ref(x) and Ref(x) is considered a single-element vector containing x then the &1 syntax could be made to work.

Maybe indexing with a one-element tuple, e.g. (1,)? That solution has been suggested at https://github.com/JuliaLang/julia/issues/18379 to prevent broadcasting.

Logical indexing with a non-vector does seem like a fairly strange and marginal case, so allowing/requiring boolean indexing with non-vectors with all but one dimension singular seems fairly sensible.

I don't think it's that uncommon, I've been using it pretty often in numpy and julia. I would be really sad if it's gone forever in Julia. :(

Stefan meant that it already violates some of our ordinary "rules," and was supporting the fix that's already been merged in #18401. In other words, your example works again on master.

I meant something different too - that I often do things like indexing 3d matrices with 2d BitArray and a normal index for the last dimension (A[A[:, :, 1] .> 0, 1] += 1). And that would not be possible in julia 0.5, right?

(but thanks for making my example work again!)

That doesn't work for 0.4, either. Hmm. It's a completely reasonable operation to want to support.

@timholy Ah, you are right - it doesn't work for 0.4, I got it mixed up with numpy, where it works. Would be really nice if 2d boolean indexing was supported :).

Thanks for reporting it, @mmagnuski! It's not something I intended to break.

Adding support for things like A[A[:, :, 1] .> 0, 1] is definitely something I'd like to see, but unfortunately I think it's at odds with ignoring the singleton dimensions. On the plus side, it's really easy to implement since we now support vectors of CartesianIndexs:

function findcartesian{N}(B::AbstractArray{Bool,N})
    idxs = Vector{CartesianIndex{N}}()
    for i in CartesianRange(indices(B))
        B[i] && push!(idxs, i)
    end
    idxs
end

julia> A = rand(-5:5, 4, 3, 2)
4×3×2 Array{Int64,3}:
[:, :, 1] =
  4   1   0
  0  -2   5
  0  -1   3
 -1  -3  -3

[:, :, 2] =
  5  -2   4
 -3  -4  -4
 -2  -3   5
 -5  -2   0

julia> A[findcartesian(A[:,:,1] .> 0), 1] += 10;
       A
4×3×2 Array{Int64,3}:
[:, :, 1] =
 14  11   0
  0  -2  15
  0  -1  13
 -1  -3  -3

[:, :, 2] =
  5  -2   4
 -3  -4  -4
 -2  -3   5
 -5  -2   0

@mbauman I like the trick with CartesianIndex, I didn't have much time before to play with it, thanks!
However - going back to boolean indexing and ignoring singleton dimensions - wouldn't it be possible (in principle) to check if the number of non-singleton dimensions is correct instead of allowing only boolean indexing with BitArrays that have only one non-singleton dimension?

I'm afraid there are some weird edge cases in combining these two functionalities… particularly when there's no non-singleton dimensions.

Here's a rather contrived example:

julia> A = reshape([-2, 5, -4, -1, -4, 5], 1, 1, 2, 3)
       A[A[:,:,1,2] .< 0, 1, 2] # This works on master! One singleton dimension is dropped, making it A[[true], 1, 2]
1-element Array{Int64,1}:
 5

julia> A[findcartesian(A[:,:,1,2] .< 0), 1, 2] # This actually returns the element below 0 as intended.
1-element Array{Int64,1}:
 -4

I don't think we can tell that case apart from the case where someone used a reduction and they wanted the singleton dimensions dropped. I think there exist less contrived examples, too.

I think we need to make reductions drop dimensions, or we need to go through a deprecation cycle to instruct users to use vec themselves to drop singleton dimensions. And then we can implement the nifty general multidimensional logical indexing.

I guess dropping singleton dimensions everywhere is the most consistent approach. That makes me want a concise syntax for keeping them all the more since now it's not just for slicing.

I should say I like your &i proposal better than anything else, if that is available.

Another option would be a binding intended for postfix multiplication — optimized multiplication with something that's effectively [1] does what we want.

Yes, that's a good idea too. Seems to be a common way to save on parentheses.

This issue followed a bit of a meandering path that's a little hard to follow in retrospect.

For clarity, let's call "ignoring singleton dimensions in logical indexing" the behavior that allows A[:, mean(A, 1) .> 0.5]. This was permitted in 0.4. In 0.5, alongside APL indexing, we restricted logical indices such that they either must be vectors or match the entire size of the array.

In response to this issue, I restored the 0.4 behavior to allow singleton dimensions in logical indices in #18401, but it was merged after 0.5 was branched and didn't make the cut for a backport. And so it was subsequently reverted in #18573, once again disallowing singleton dimensions in logical indexing.

The plus side is that breaking this behavior allows us to move towards a much more general and more consistent interpretation of multidimensional logical indices, such that the additional dimensions in a logical mask actually represent indexing into additional dimensions into the source array. This generalizes the whole-array masking operations A[mask] to multiple- and single-dimensions. In short, in this scheme logical indices are always transformed by findcartesian as I describe above.

Thanks for the update @mbauman!
So would things like A[A[:, :, 1] .> 0, 1] += 1 be allowed in Julia?

Yes, that's my vision here, and I think it's the consensus.

happy to hear that! :+1:

Was this page helpful?
0 / 5 - 0 ratings

Related issues

wilburtownsend picture wilburtownsend  Â·  3Comments

tkoolen picture tkoolen  Â·  3Comments

m-j-w picture m-j-w  Â·  3Comments

yurivish picture yurivish  Â·  3Comments

StefanKarpinski picture StefanKarpinski  Â·  3Comments