Pandas: BUG: "IndexingError: Too many indexers" when accessing a None value using .loc through a MultiIndex

Created on 22 May 2020 · 7 Comments · Source: pandas-dev/pandas

Steps to reproduce

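import pandas as pd

# Series holding a single None value, indexed by a two-level MultiIndex
index = pd.MultiIndex.from_arrays([['Level1'], ['Level2']])
print(pd.Series([None], index=index).loc[('Level1', 'Level2')])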

Expected output

None

Actual output

/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in __getitem__(self, key)
   1760                 except (KeyError, IndexError, AttributeError):
   1761                     pass
-> 1762             return self._getitem_tuple(key)
   1763         else:
   1764             # we by definition only have the 0th axis

/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1275 
   1276         # no multi-index, so validate all of the indexers
-> 1277         self._has_valid_tuple(tup)
   1278 
   1279         # ugly hack for GH #836

/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
    699         for i, k in enumerate(key):
    700             if i >= self.ndim:
--> 701                 raise IndexingError("Too many indexers")
    702             try:
    703                 self._validate_key(k, i)

IndexingError: Too many indexers

Additional information

If any value other than None (even np.nan) is used, the code behaves correctly.

If a single index level is used, the code behaves correctly.

Workaround

Seems to work if .loc(axis=0)[('Level1', 'Level2')] is used instead.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.8.3.candidate.1
python-bits : 64
OS : Linux
OS-release : 5.6.0-1-amd64
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.0.3
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3
Cython : None
pytest : 4.6.9
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 4.6.9
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

Bug Indexing MultiIndex

All 7 comments

@dechamps thanks for the report!

Using an example that is not of length 1 (the length doesn't seem to matter):

In [32]: midx = pd.MultiIndex.from_product([['Level1'], ['Level2_a', 'Level2_b']])

In [33]: s = pd.Series([None]*len(midx), dtype=object, index=midx) 

In [35]: s.loc[('Level1', 'Level2_a')]  
...
IndexingError: Too many indexers

In [36]: s = pd.Series([1]*len(midx), dtype=object, index=midx) 

In [37]: s.loc[('Level1', 'Level2_a')] 
Out[37]: 1

So it does indeed only happen when the Series value being accessed is actually None.

The problem lies here:

https://github.com/pandas-dev/pandas/blob/a9ad632159278701a3efd440bf4a30f1ab4d72f9/pandas/core/indexing.py#L819-L821

The _handle_lowerdim_multi_index_axis0 method returns None to indicate that an error happened, and when the result is None we continue with other code (which then incorrectly raises). However, the method can also return None when that is the actual result of indexing (as in the example here). So we will probably need another way to communicate between those methods.
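
The ambiguity described above can be reproduced outside pandas. The following is a minimal sketch in plain Python, not the pandas implementation: `lookup_or_none`, `get`, and `TooManyIndexers` are hypothetical stand-ins for the helper, its caller, and pandas' IndexingError.

```python
# Hypothetical stand-ins, not pandas code: a helper that returns None both
# for "no result" and for a stored value that happens to be None.
class TooManyIndexers(Exception):
    """Stand-in for pandas' IndexingError."""


def lookup_or_none(data, key):
    # Returns None when the key is missing, but also when the stored value is None.
    return data.get(key)


def get(data, key):
    result = lookup_or_none(data, key)
    if result is not None:
        return result
    # Fallback path: reached for a missing key AND for a stored None.
    raise TooManyIndexers("Too many indexers")


data = {("Level1", "Level2_a"): None, ("Level1", "Level2_b"): 1}

print(get(data, ("Level1", "Level2_b")))  # prints 1

try:
    get(data, ("Level1", "Level2_a"))
except TooManyIndexers as exc:
    # The stored None is mistaken for "no result".
    print("raised:", exc)
```

The caller cannot tell the two cases apart, so a legitimately stored None ends up on the error path.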

Contributions to fix this are always welcome!

Can I work on this?

@pedrooa Sure, that would be very welcome!

I've managed to get it to work by changing the default return of _handle_lowerdim_multi_index_axis0(tup) to False instead of None, and then I check it like this:

result = self._handle_lowerdim_multi_index_axis0(tup)
if result is not False:
    return result

Is this a valid solution?

The problem, I think, is that False in principle can also be a valid result (so similar problem as with None).

So I think we need another way: either raising an error and catching it in the layer above, or using a custom sentinel object like no_result = object() and then if result is not no_result: return result (an identity check, since != on a pandas object would compare element-wise).
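
The sentinel idea can be sketched in plain Python as follows. This is an illustration under assumptions, not the pandas fix: `_NO_RESULT`, `lookup`, and `get` are hypothetical names, and a dict stands in for the indexed data.

```python
# Hypothetical sketch of the sentinel approach; names are illustrative only.
_NO_RESULT = object()  # unique marker that can never be a stored value


def lookup(data, key):
    # Return the stored value (possibly None or False), or the sentinel.
    return data.get(key, _NO_RESULT)


def get(data, key):
    result = lookup(data, key)
    # Identity check: never calls a comparison operator on the result,
    # which may be overloaded on array-like objects.
    if result is not _NO_RESULT:
        return result
    raise KeyError(key)


data = {"a": None, "b": False, "c": 1}
print(get(data, "a"))  # prints None
print(get(data, "b"))  # prints False
```

Because the sentinel is a fresh object created in one place, neither None nor False can collide with it.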

Ah yes, you are correct.

I think that the cleanest solution would be raising an exception. If the call raises, the exception is caught and nothing is returned, so execution continues with the rest of the method. Like this:

try:
    result = self._handle_lowerdim_multi_index_axis0(tup)
    return result
except Exception:
    pass

It seems to solve this problem.
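
For comparison, here is a plain-Python sketch of the exception-based approach. It is not the pandas code: `NoLowerDimResult`, `lookup`, and `get` are hypothetical names. Catching a dedicated exception instead of the broad `Exception` is an assumption made here so that unrelated errors are not silently swallowed.

```python
# Hypothetical sketch of the exception approach; names are illustrative only.
class NoLowerDimResult(Exception):
    """Raised by the helper when it cannot produce a result."""


def lookup(data, key):
    if key not in data:
        raise NoLowerDimResult(key)
    return data[key]  # may legitimately be None or False


def get(data, key, default="fallback"):
    try:
        return lookup(data, key)
    except NoLowerDimResult:
        # Only the dedicated signal is swallowed; other errors propagate.
        return default


data = {"a": None}
print(get(data, "a"))        # prints None
print(get(data, "missing"))  # prints fallback
```

With this shape the return value is never used as a status flag, so any value, including None, can be a valid result.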
