Pandas: Slicing Datetime MultiIndex with string or datetime.date slices

Created on 5 Apr 2017 · 6Comments · Source: pandas-dev/pandas

Basically Indexing a MultiIndex with a DatetimeIndex seems only to be working if you use slices with datetime.datetime or pandas.Timestamp. One would expect it to work also with strings as well as with 'datetime.date' slices as it does for usual indizes.

This seems to be related to #3843.

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
import time
import datetime as dt

min_ts = time.mktime(dt.date(2016,10,1).timetuple())
timestamps = [min_ts + offset*86400 + noise*3600 
                  for offset, noise in enumerate(np.random.rand(100))]
time_idx = pd.to_datetime(sorted(timestamps), unit='s')
id_idx = np.random.choice(np.arange(10), 100)
df = pd.DataFrame(np.identity(100),
                 index=pd.MultiIndex.from_arrays(
                     [time_idx, id_idx]
                 ))

df.loc[dt.datetime(2016, 10, 1):]  # works
df.loc[dt.date(2016, 10, 1):]  # fails :/

Problem description

The above code raises the following exception on the last line, which is quite unexpected when one is used to indexing on single index dataframes.

Traceback (most recent call last):
  File "<input>", line 11, in <module>
  File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/core/indexing.py", line 1312, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/core/indexing.py", line 1453, in _getitem_axis
    return self._get_slice_axis(key, axis=axis)
  File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/core/indexing.py", line 1334, in _get_slice_axis
    slice_obj.step, kind=self.name)
  File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/indexes/base.py", line 2997, in slice_indexer
    kind=kind)
  File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/indexes/multi.py", line 1578, in slice_locs
    return super(MultiIndex, self).slice_locs(start, end, step, kind=kind)
  File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/indexes/base.py", line 3176, in slice_locs
    start_slice = self.get_slice_bound(start, 'left', kind)
  File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/indexes/multi.py", line 1549, in get_slice_bound
    return self._partial_tup_index(label, side=side)
  File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/indexes/multi.py", line 1594, in _partial_tup_index
    raise TypeError('Level type mismatch: %s' % lab)
TypeError: Level type mismatch: 2016-10-01

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 34.2.0
Cython: 0.25.2
numpy: 1.12.0
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

Bug Indexing MultiIndex Timeseries

Source

kayibal

👍2

Most helpful comment

reopening as doesn't work on MI as indicated above.

jreback on 28 Apr 2017

👍2

All 6 comments

datetime.date is not a first class type, so we barely support it, almost always it is coerced (and we certainly don't support indexing via it).

and strings work fine.

In [10]: df.loc['2016-10-01']
Out[10]: 
                               0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20   21   22   23   24 ...    75   76   77   78   79   80   81  \
2016-10-01 04:18:59.654717 4  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0 ...   0.0  0.0  0.0  0.0  0.0  0.0  0.0   

                               82   83   84   85   86   87   88   89   90   91   92   93   94   95   96   97   98   99  
2016-10-01 04:18:59.654717 4  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  

[1 rows x 100 columns]

jreback on 5 Apr 2017

all that said, if you want to do a PR to support indexing via datetime.date prob would take it.

jreback on 5 Apr 2017

we certainly don't support indexing via it

We actually do support it:

In [50]: df = pd.DataFrame({'A': np.random.randn(10)}, index=pd.date_range("2012-01-01", periods=10))

In [51]: df
Out[51]: 
                   A
2012-01-01  0.457670
2012-01-02 -0.871395
2012-01-03  1.878444
2012-01-04  1.106600
2012-01-05 -0.282401
2012-01-06  0.057577
2012-01-07  0.567901
2012-01-08 -0.213132
2012-01-09  0.264840
2012-01-10 -0.994729

In [54]: df.loc['2012-01-03']
Out[54]: 
A    1.878444
Name: 2012-01-03 00:00:00, dtype: float64

In [55]: df.loc[datetime.date(2012, 1, 3)]
Out[55]: 
A    1.878444
Name: 2012-01-03 00:00:00, dtype: float64

Above is on normal DatetimeIndex (the key is just coerced to a Timestamp). MultiIndex has always lagged somewhat behind on that.
So I think having it consistent would be good.

Not that I recommend using a datetime.date!

jorisvandenbossche on 6 Apr 2017

reopening as doesn't work on MI as indicated above.

jreback on 28 Apr 2017

👍2

I see this issue is still open. I have a similar problem on pandas 1.1.1:

df = pd.DataFrame({'a': ['2020-01-01', '2020-02-01', '2020-03-01'], 'b': [1, 2, 3], 'c': [10, 20, 30]})
df['a'] = pd.to_datetime(df['a'])
df = df.set_index(['a', 'b'])
df.loc[('2020-01-01', 1)]  # This works.
df.loc[[('2020-01-01', 1), ('2020-02-01', 2)]]  # This fails.
df.loc[[(pd.Timestamp(2020, 1, 1), 1), (pd.Timestamp(2020, 2, 1), 2)]]  # This works.

So, if you have a multiindex dataframe with a datetime index, you can access the datetime index as a string only if you extract one row. If you try to extract more than one, dates have to be given as timestamps.

I know this is not a very common scenario, but is there any plan to solve it, or is there any workaround? Thank you!

pabloarosado on 14 Oct 2020

👍1

I see this issue is still open. I have a similar problem on pandas 1.1.1:
df = pd.DataFrame({'a': ['2020-01-01', '2020-02-01', '2020-03-01'], 'b': [1, 2, 3], 'c': [10, 20, 30]})
df['a'] = pd.to_datetime(df['a'])
df = df.set_index(['a', 'b'])
df.loc[('2020-01-01', 1)]  # This works.
df.loc[[('2020-01-01', 1), ('2020-02-01', 2)]]  # This fails.
df.loc[[(pd.Timestamp(2020, 1, 1), 1), (pd.Timestamp(2020, 2, 1), 2)]]  # This works.
So, if you have a multiindex dataframe with a datetime index, you can access the datetime index as a string only if you extract one row. If you try to extract more than one, dates have to be given as timestamps.

I know this is not a very common scenario, but is there any plan to solve it, or is there any workaround? Thank you!

Just to build on what @pabloarosado has commented, by using _datetime.datetime_ or _datetime.date_ the multindex slicing also seems to work:

df.loc[[(datetime.datetime(2020, 1, 1), 1), (datetime.datetime(2020, 2, 1), 2)]]  # This works
df.loc[[(datetime.date(2020, 1, 1), 1), (datetime.date(2020, 2, 1), 2)]]  # This also works