When indexing a 2D array of shape [N, M] with two 1D boolean masks of shapes N and M, certain combinations of True and False lead to a broadcasting error (particularly when one mask is all False). I'm not sure if this behaviour is expected, but it seems highly surprising and undesirable.
In the example below, x[[False, True, True], [True, True, True]] errors, while x[[False, True, True], True] and x[[False, True, True]] have the expected behavior.
import numpy as np
from itertools import product

x = np.zeros((3, 3))
mask_1d = [*product([True, False], repeat=3)]
for row_mask, col_mask in product(mask_1d, mask_1d):
    try:
        x[row_mask, col_mask]
    except IndexError as e:
        print(row_mask, col_mask)
        print(e)
(True, True, True) (True, True, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,)
(True, True, True) (True, False, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,)
(True, True, True) (False, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,)
(True, True, True) (False, False, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (0,)
(True, True, False) (True, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,)
(True, True, False) (False, False, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (0,)
(True, False, True) (True, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,)
(True, False, True) (False, False, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (0,)
(False, True, True) (True, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,)
(False, True, True) (False, False, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (0,)
(False, False, False) (True, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (3,)
(False, False, False) (True, True, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (2,)
(False, False, False) (True, False, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (2,)
(False, False, False) (False, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (2,)
1.16.2 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0]
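The shapes reported in the errors above match the number of True entries in each mask. A minimal sketch of why, assuming the documented NumPy rule that a boolean index array behaves like the result of its nonzero() in the same position (the variable names here are illustrative only):

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
row_mask = np.array([False, True, True])
col_mask = np.array([True, True, True])

# Each boolean mask behaves like the integer positions of its True entries.
row_idx = row_mask.nonzero()[0]  # positions 1 and 2
col_idx = col_mask.nonzero()[0]  # positions 0, 1 and 2

# The integer arrays are broadcast against each other, so lengths 2 and 3
# fail for the integer version just as they do for the boolean version.
try:
    x[row_idx, col_idx]
    int_error = None
except IndexError as e:
    int_error = str(e)

try:
    x[row_mask, col_mask]
    bool_error = None
except IndexError as e:
    bool_error = str(e)

# When both masks have the same number of True entries there is no error,
# but the positions are paired element-wise instead of forming a grid.
diag_mask = np.array([True, False, True])
paired = x[diag_mask, diag_mask]  # picks x[0, 0] and x[2, 2]
```

This also suggests why only some combinations fail: the masks whose True counts are equal (or where one count is 1) broadcast without error, even though the result is not the row/column sub-grid one might expect.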
With boolean arrays, the code assumes you are trying to index either a single dimension or all elements at the same time - with the choice somewhat unfortunately guessed in a way that allows a single True to be removed. I.e., it turns your row_mask, col_mask into a (2,3) boolean array and then finds that it cannot index the (3,3) array.
Part of the problem is that tuples and lists are treated as equivalent, something we're trying to move away from. Eventually, you'd handle the boolean array index by ensuring the mask was a double list.
For now, though, I fear the only solution is to do x[row_mask][:, col_mask].
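A short sketch of that workaround, with example masks chosen purely for illustration. It also shows one caveat worth keeping in mind: the first indexing step returns a copy, so the chained form selects correctly but cannot be used to assign back into x.

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
row_mask = np.array([False, True, True])
col_mask = np.array([True, False, True])

# Select rows first, then columns of the intermediate result.
sub = x[row_mask][:, col_mask]  # rows 1 and 2, columns 0 and 2

# An all-False mask yields an empty selection instead of an error.
all_false = np.zeros(3, dtype=bool)
empty = x[row_mask][:, all_false]

# Caveat: x[row_mask] is a copy, so this assignment modifies the
# temporary copy and leaves x itself untouched.
x[row_mask][:, col_mask] = -1
```

Here sub holds the 2x2 block of rows 1-2 and columns 0 and 2, and empty has shape (2, 0).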
cc @eric-wieser, who has been working to deprecate the "treat tuple as list" for indexing operations.
p.s. Most annoying I find this difference:
x = np.arange(9).reshape(3, 3)
x[[False, True, True], True]
# array([[3, 4, 5],
#        [6, 7, 8]])
x[[False, True, True], False]
# IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (0,)
Yes, x[row_mask][:, col_mask] is what I ended up doing. Thanks for the explanation, I'm glad it's something that's being looked into.
I think arr[np.ix_(*index)] is what you want/are expecting here, or in other words the outer indexing logic as described in NEP 21: https://github.com/numpy/numpy/blob/master/doc/neps/nep-0021-advanced-indexing.rst
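As a rough illustration of the np.ix_ suggestion, assuming the masks are passed as separate arguments (the mask values and the name y are just examples):

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
row_mask = [False, True, True]
col_mask = [True, False, True]

# np.ix_ accepts 1D boolean sequences and builds an open mesh of indices,
# so the result is the outer (row x column) selection.
outer = x[np.ix_(row_mask, col_mask)]

# An all-False mask gives an empty result rather than a shape mismatch.
empty = x[np.ix_(row_mask, [False, False, False])]

# Because this is a single indexing operation, assignment works too.
y = x.copy()
y[np.ix_(row_mask, col_mask)] = -1
```

Unlike the chained x[row_mask][:, col_mask] form, this indexes the array only once, which is why it can appear on the left-hand side of an assignment.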
Maybe that will be picked up some time. The NEP also says that, at least for the current indexing, multiple boolean indices should just be deprecated (I think whether to allow this specific use case may still have been contested: it is consistent, but may not have much of a use case and is pretty confusing in any case).