Pandas: IndexSlice cannot infer missing values when an Index object is passed in

Created on 16 Jun 2017 · 4 Comments · Source: pandas-dev/pandas

Here is the code snippet:

import pandas as pd

df = pd.DataFrame({'id': [1,2,3,4,5], 
                   'age':[22,21,23,24,25], 
                   'sex':['M','F','M','F','M'],
                   'height':[165,152,166,154,176]
                  })

df = df.set_index(['id','age','sex'])
i = pd.Index([1,2,3])

df.loc[pd.IndexSlice[i, 21,], ['height']] # TypeError: unhashable type: 'Int64Index'
df.loc[pd.IndexSlice[i, 21,:], ['height']] # works
df.loc[pd.IndexSlice[[1,2,3], 21,], ['height']] # works

If an Index object is passed to IndexSlice, the omitted trailing index level is not inferred: pd.IndexSlice[i, 21,] raises a TypeError. Either an explicit colon has to be used, as in pd.IndexSlice[i, 21,:], or a plain list such as [1,2,3] has to be passed instead of the Index object.

I don't know if this is intended.
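
For reference, the two forms reported as working above can be put side by side in a minimal sketch. It rebuilds the same frame and uses only the calls from the report; the names idx, with_colon and with_list are introduced here for illustration only.

```python
import pandas as pd

# Same frame as in the report above.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "age": [22, 21, 23, 24, 25],
    "sex": ["M", "F", "M", "F", "M"],
    "height": [165, 152, 166, 154, 176],
}).set_index(["id", "age", "sex"])

i = pd.Index([1, 2, 3])
idx = pd.IndexSlice  # illustrative shorthand, not part of the report

# Workaround 1: spell out the trailing level with an explicit colon.
with_colon = df.loc[idx[i, 21, :], ["height"]]

# Workaround 2: pass a plain list instead of the Index object.
with_list = df.loc[idx[[1, 2, 3], 21], ["height"]]

print(with_colon)
print(with_list)
```

Both selections should pick the single row with id 2 and age 21.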

Labels: Bug, Indexing, MultiIndex, Needs Tests

All 4 comments

I think this should work. In general we seem to unpack anything list- or array-like, but that isn't happening on this path. PR welcome!

A couple more questionable cases:


In [38]: df.loc[i, :]
TypeError: unhashable type: 'Int64Index'

In [40]: df.loc[pd.IndexSlice[i.values, 21,], ['height']]
TypeError: unhashable type: 'numpy.ndarray'

# strangely, this works
In [39]: df.loc[i.values, :]
Out[39]: 
            height
id age sex        
1  22  M       165
2  21  F       152
3  23  M       166

Agreed, this should work: an Index should be regarded as list-like, so the result should be the same as if you replaced i with i.tolist(): df.loc[pd.IndexSlice[i.tolist(), 21,], ['height']].

Even df.loc[i] fails, so it is not necessary to pass any kind of slice to trigger this. By contrast, df.loc[[1,2,3]] and df.loc[pd.Series([1,2,3])] work fine.
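
The list-like forms mentioned in the comments above can be checked with a short sketch. It only uses forms reported as working in this thread (a plain list, i.tolist(), and i.values with plain .loc); the variable names are illustrative assumptions, not part of the original report.

```python
import pandas as pd

# Same frame as in the original report.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "age": [22, 21, 23, 24, 25],
    "sex": ["M", "F", "M", "F", "M"],
    "height": [165, 152, 166, 154, 176],
}).set_index(["id", "age", "sex"])

i = pd.Index([1, 2, 3])

# Plain .loc with list-likes selects on the first index level ("id").
by_list = df.loc[[1, 2, 3]]
by_tolist = df.loc[i.tolist()]
by_values = df.loc[i.values, :]

# Converting the Index to a list also sidesteps the IndexSlice failure.
via_slice = df.loc[pd.IndexSlice[i.tolist(), 21], ["height"]]

print(by_list)
print(via_slice)
```

Converting with i.tolist() keeps the call independent of how a given pandas version treats an Index object as an indexer.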

The bug is fixed; adding a test.
