Basically Indexing a MultiIndex with a DatetimeIndex seems only to be working if you use slices with datetime.datetime or pandas.Timestamp. One would expect it to work also with strings as well as with 'datetime.date' slices as it does for usual indizes.
This seems to be related to #3843.
import numpy as np
import pandas as pd
import time
import datetime as dt
min_ts = time.mktime(dt.date(2016,10,1).timetuple())
timestamps = [min_ts + offset*86400 + noise*3600
for offset, noise in enumerate(np.random.rand(100))]
time_idx = pd.to_datetime(sorted(timestamps), unit='s')
id_idx = np.random.choice(np.arange(10), 100)
df = pd.DataFrame(np.identity(100),
index=pd.MultiIndex.from_arrays(
[time_idx, id_idx]
))
df.loc[dt.datetime(2016, 10, 1):] # works
df.loc[dt.date(2016, 10, 1):] # fails :/
The above code raises the following exception on the last line, which is quite unexpected when one is used to indexing on single index dataframes.
Traceback (most recent call last):
File "<input>", line 11, in <module>
File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/core/indexing.py", line 1312, in __getitem__
return self._getitem_axis(key, axis=0)
File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/core/indexing.py", line 1453, in _getitem_axis
return self._get_slice_axis(key, axis=axis)
File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/core/indexing.py", line 1334, in _get_slice_axis
slice_obj.step, kind=self.name)
File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/indexes/base.py", line 2997, in slice_indexer
kind=kind)
File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/indexes/multi.py", line 1578, in slice_locs
return super(MultiIndex, self).slice_locs(start, end, step, kind=kind)
File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/indexes/base.py", line 3176, in slice_locs
start_slice = self.get_slice_bound(start, 'left', kind)
File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/indexes/multi.py", line 1549, in get_slice_bound
return self._partial_tup_index(label, side=side)
File "/Users/kayibal/virtualenvs/traildb-sparse/lib/python3.5/site-packages/pandas/indexes/multi.py", line 1594, in _partial_tup_index
raise TypeError('Level type mismatch: %s' % lab)
TypeError: Level type mismatch: 2016-10-01
pd.show_versions()commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 34.2.0
Cython: 0.25.2
numpy: 1.12.0
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
datetime.date is not a first class type, so we barely support it, almost always it is coerced (and we certainly don't support indexing via it).
and strings work fine.
In [10]: df.loc['2016-10-01']
Out[10]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 ... 75 76 77 78 79 80 81 \
2016-10-01 04:18:59.654717 4 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0
82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
2016-10-01 04:18:59.654717 4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
[1 rows x 100 columns]
all that said, if you want to do a PR to support indexing via datetime.date prob would take it.
we certainly don't support indexing via it
We actually do support it:
In [50]: df = pd.DataFrame({'A': np.random.randn(10)}, index=pd.date_range("2012-01-01", periods=10))
In [51]: df
Out[51]:
A
2012-01-01 0.457670
2012-01-02 -0.871395
2012-01-03 1.878444
2012-01-04 1.106600
2012-01-05 -0.282401
2012-01-06 0.057577
2012-01-07 0.567901
2012-01-08 -0.213132
2012-01-09 0.264840
2012-01-10 -0.994729
In [54]: df.loc['2012-01-03']
Out[54]:
A 1.878444
Name: 2012-01-03 00:00:00, dtype: float64
In [55]: df.loc[datetime.date(2012, 1, 3)]
Out[55]:
A 1.878444
Name: 2012-01-03 00:00:00, dtype: float64
Above is on normal DatetimeIndex (the key is just coerced to a Timestamp). MultiIndex has always lagged somewhat behind on that.
So I think having it consistent would be good.
Not that I recommend using a datetime.date!
reopening as doesn't work on MI as indicated above.
I see this issue is still open. I have a similar problem on pandas 1.1.1:
df = pd.DataFrame({'a': ['2020-01-01', '2020-02-01', '2020-03-01'], 'b': [1, 2, 3], 'c': [10, 20, 30]})
df['a'] = pd.to_datetime(df['a'])
df = df.set_index(['a', 'b'])
df.loc[('2020-01-01', 1)] # This works.
df.loc[[('2020-01-01', 1), ('2020-02-01', 2)]] # This fails.
df.loc[[(pd.Timestamp(2020, 1, 1), 1), (pd.Timestamp(2020, 2, 1), 2)]] # This works.
So, if you have a multiindex dataframe with a datetime index, you can access the datetime index as a string only if you extract one row. If you try to extract more than one, dates have to be given as timestamps.
I know this is not a very common scenario, but is there any plan to solve it, or is there any workaround? Thank you!
I see this issue is still open. I have a similar problem on pandas 1.1.1:
df = pd.DataFrame({'a': ['2020-01-01', '2020-02-01', '2020-03-01'], 'b': [1, 2, 3], 'c': [10, 20, 30]}) df['a'] = pd.to_datetime(df['a']) df = df.set_index(['a', 'b']) df.loc[('2020-01-01', 1)] # This works. df.loc[[('2020-01-01', 1), ('2020-02-01', 2)]] # This fails. df.loc[[(pd.Timestamp(2020, 1, 1), 1), (pd.Timestamp(2020, 2, 1), 2)]] # This works.So, if you have a multiindex dataframe with a datetime index, you can access the datetime index as a string only if you extract one row. If you try to extract more than one, dates have to be given as timestamps.
I know this is not a very common scenario, but is there any plan to solve it, or is there any workaround? Thank you!
Just to build on what @pabloarosado has commented, by using _datetime.datetime_ or _datetime.date_ the multindex slicing also seems to work:
df.loc[[(datetime.datetime(2020, 1, 1), 1), (datetime.datetime(2020, 2, 1), 2)]] # This works
df.loc[[(datetime.date(2020, 1, 1), 1), (datetime.date(2020, 2, 1), 2)]] # This also works
Most helpful comment
reopening as doesn't work on MI as indicated above.