# Your code here
鈥媅in]data.index
[out]
MultiIndex(levels=[[2017-11-01 00:00:00, 2017-11-02 00:00:00, 2017-11-03 00:00:00, 2017-11-06 00:00:00, 2017-11-07 00:00:00, 2017-11-08 00:00:00, 2017-11-09 00:00:00, 2017-11-10 00:00:00, 2017-11-13 00:00:00, 2017-11-14 00:00:00, 2017-11-15 00:00:00, 2017-11-16 00:00:00, 2017-11-17 00:00:00, 2017-11-20 00:00:00, 2017-11-21 00:00:00, 2017-11-22 00:00:00, 2017-11-23 00:00:00, 2017-11-24 00:00:00, 2017-11-27 00:00:00, 2017-11-28 00:00:00, 2017-11-29 00:00:00, 2017-11-30 00:00:00, 2017-12-01 00:00:00, 2017-12-04 00:00:00, 2017-12-05 00:00:00, 2017-12-06 00:00:00, 2017-12-07 00:00:00, 2017-12-08 00:00:00, 2017-12-11 00:00:00, 2017-12-12 00:00:00, 2017-12-13 00:00:00, 2017-12-14 00:00:00, 2017-12-15 00:00:00, 2017-12-18 00:00:00, 2017-12-19 00:00:00, 2017-12-20 00:00:00, 2017-12-21 00:00:00, 2017-12-22 00:00:00, 2017-12-25 00:00:00, 2017-12-26 00:00:00, 2017-12-27 00:00:00, 2017-12-28 00:00:00, 2017-12-29 00:00:00], ['000001', '000002', '000004']],
labels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]],
names=['date', 'code'])
[in] data.index.levels[0]
[out]
DatetimeIndex(['2017-11-01', '2017-11-02', '2017-11-03', '2017-11-06',
'2017-11-07', '2017-11-08', '2017-11-09', '2017-11-10',
'2017-11-13', '2017-11-14', '2017-11-15', '2017-11-16',
'2017-11-17', '2017-11-20', '2017-11-21', '2017-11-22',
'2017-11-23', '2017-11-24', '2017-11-27', '2017-11-28',
'2017-11-29', '2017-11-30', '2017-12-01', '2017-12-04',
'2017-12-05', '2017-12-06', '2017-12-07', '2017-12-08',
'2017-12-11', '2017-12-12', '2017-12-13', '2017-12-14',
'2017-12-15', '2017-12-18', '2017-12-19', '2017-12-20',
'2017-12-21', '2017-12-22', '2017-12-25', '2017-12-26',
'2017-12-27', '2017-12-28', '2017-12-29'],
dtype='datetime64[ns]', name='date', freq=None)
[in]data.xs('2017-11',0)
[out]
聽 | code | open | high | low | close | volume | date | high_limit | low_limit
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
000001 | 11.56 | 11.59 | 11.32 | 11.40 | 692617.0 | 2017-11-01 | 12.69 | 10.39
000001 | 11.36 | 11.58 | 11.26 | 11.54 | 604308.0 | 2017-11-02 | 12.53 | 10.25
000001 | 11.49 | 11.68 | 11.35 | 11.39 | 743343.0 | 2017-11-03 | 12.41 | 10.15
000001 | 11.42 | 11.42 | 11.09 | 11.28 | 1029902.0 | 2017-11-06 | 13.11 | 10.73
000001 | 11.27 | 12.09 | 11.25 | 11.92 | 2477163.0 | 2017-11-07 | 13.34 | 10.92
000001 | 12.00 | 12.59 | 11.93 | 12.13 | 4262825.0 | 2017-11-08 | 13.56 | 11.10
000001 | 12.20 | 12.57 | 12.15 | 12.33 | 2295289.0 | 2017-11-09 | 13.53 | 11.07
000001 | 12.37 | 12.55 | 12.15 | 12.30 | 1757552.0 | 2017-11-10 | 14.19 | 11.61
000001 | 12.35 | 13.10 | 12.35 | 12.90 | 2566906.0 | 2017-11-13 | 14.25 | 11.66
000001 | 12.95 | 13.26 | 12.81 | 12.95 | 1780302.0 | 2017-11-14 | 14.19 | 11.61
000001 | 12.90 | 13.13 | 12.77 | 12.90 | 1263052.0 | 2017-11-15 | 14.41 | 11.79
000001 | 12.90 | 13.12 | 12.67 | 13.10 | 1200814.0 | 2017-11-16 | 14.50 | 11.86
000001 | 13.17 | 13.46 | 13.03 | 13.18 | 2064149.0 | 2017-11-17 | 15.68 | 12.83
000001 | 13.13 | 14.25 | 13.05 | 14.25 | 2843925.0 | 2017-11-20 | 15.90 | 13.01
000001 | 14.07 | 14.79 | 14.01 | 14.45 | 2495327.0 | 2017-11-21 | 16.61 | 13.59
000001 | 14.48 | 15.24 | 14.48 | 15.10 | 2570231.0 | 2017-11-22 | 15.79 | 12.92
000001 | 15.15 | 15.24 | 14.07 | 14.35 | 2429142.0 | 2017-11-23 | 16.02 | 13.10
000001 | 14.29 | 14.59 | 13.78 | 14.56 | 2630189.0 | 2017-11-24 | 15.32 | 12.54
000001 | 14.30 | 14.30 | 13.75 | 13.93 | 2064085.0 | 2017-11-27 | 15.07 | 12.33
000001 | 13.85 | 13.85 | 13.40 | 13.70 | 1766432.0 | 2017-11-28 | 15.20 | 12.44
000001 | 13.73 | 13.93 | 13.47 | 13.82 | 1564094.0 | 2017-11-29 | 14.72 | 12.04
000001 | 13.70 | 13.73 | 13.26 | 13.38 | 1379635.0 | 2017-11-30 | 14.30 | 11.70
I got a multiindex-dataframe, and trying to select time by month
while i only use 'date' as the singal index, using data['2017-11']I can get output of 3 securities in 2017-11
horever while I use dataframe.xs for index selecting I could only get output of 1 security
seemed like dataframe.multiindex.get_loc('2017-11',level=0) could only get the same result of 1 security
Expected to get 3 security output
pd.show_versions()INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.3
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.5.0
</details>
you would need to post a copy pastable example that reproduces
I upload a hdf5 file, and store it in 'http://osnhakmay.bkt.clouddn.com/data.h5'
sorry for inconvenience and thanks for your time @jreback ~~
after download the h5 file using this code to restore:
import pandas as pd
data=pd.read_hdf('data.h5')
pls make a minimal example
well I give a simple example to reproduce @jreback
import pandas as pd
data=pd.DataFrame([['2017-01-01','000001','a',4,5],['2017-01-02','000001','b',4,5],['2017-01-01','000002','b',4,5],['2017-01-02','000002','b',4,5],['2017-02-02','000001','b',4,5],['2017-02-03','000002','b',4,5]],columns=['date','code','c','d','e'])
data=data.assign(date=pd.to_datetime(data.date)).set_index(['date','code'])
this dataframe contains multiindex with a datetime-series index('date') and a object-series index('code')
while we using data.xs('2017-01',level=0) it should get a result of both '000001' and '000002' in 2017-01 month however it just return a result of 000001, like below
data.xs('2017-01',level=0)
by the way:
date , like thisimport pandas as pd
data2=pd.DataFrame([['2017-01-01','000001','a',4,5],['2017-01-02','000001','b',4,5],['2017-01-01','000002','b',4,5],['2017-01-02','000002','b',4,5],['2017-02-02','000001','b',4,5],['2017-02-03','000002','b',4,5]],columns=['date','code','c','d','e'])
data2=data2.assign(date=pd.to_datetime(data2.date)).set_index('date')
and using data2['2017-01'] could get the month result of 2 securities
data.xsmethod could also get the 2-securities resultdata.xs('2017-01-01',level=0
try on a more recent version of pandas
you have a duplicated MI as well
I've just update the pandas to 0.22.0 and the problem still exist
pd.show_version()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
and in the example I gave you, it doesn't contain duplicated MI
data.index.has_duplicates
False
@jreback
@yutiansut can you update your original post with the minimal example?
In [22]: import pandas as pd
...: data=pd.DataFrame([['2017-01-01','000001','a',4,5],['2017-01-02','000001','b',4,5],['2017-01-01','000002','b',4,5],['2017-01-02','000002','b',4,5],['2017-02-02','000001','b',4,5],['2017-02-03','000002','b',4,5]],columns=['date','code','c','d','e'])
...:
...: data=data.assign(date=pd.to_datetime(data.date)).set_index(['date','code']).sort_index()
...:
In [23]: data.loc['2017-01', slice(None)]
Out[23]:
c d e
date code
2017-01-01 000001 a 4 5
000002 b 4 5
2017-01-02 000001 b 4 5
000002 b 4 5
In [24]: data.xs('2017-01', level=0)
Out[24]:
c d e
code
000001 a 4 5
000002 b 4 5
Maybe clean up the formatting a bit, so that it's easier to read.
@TomAugspurger could i give you a sample with hdf5 format? cause i get the data from my project and data management maybe a little difficult ths~
I zipped it for github could only support zip format not hdf5 file
with a lot of thanks
using:
unzip the file and execute:
import pandas as pd
data=pd.read_hdf('data.h5')
this is a mulit-index dataframe with date and code, however:
data.loc['2017-01',slice(None)]
@TomAugspurger @jreback ths I found the solution~
this is caused by not index-sort error
add by:
data=data.sort_index()
ths for @TomAugspurger solutions
ths @jreback
@TomAugspurger however after sort_index , the xs method and ```loc```` method give diff result, like below
data=pd.read_hdf('data.h5')
data=data.sort_index()
data.loc['2017-01',slice(None)]
```
code open high low close volume date high_limit low_limit
date code
2017-01-25 000001 000001 9.27 9.28 9.25 9.26 304401.0 2017-01-25 NaN NaN
000002 000002 20.68 20.73 20.52 20.61 154647.0 2017-01-25 10.19 8.33
000004 000004 38.54 39.00 37.62 38.25 11086.0 2017-01-25 22.75 18.61
2017-01-26 000001 000001 9.27 9.34 9.26 9.33 420712.0 2017-01-26 10.19 8.33
000002 000002 20.65 20.77 20.65 20.68 141248.0 2017-01-26 22.67 18.55
000004 000004 38.23 38.99 38.01 38.29 9294.0 2017-01-26 42.08 34.43
### data.xs exec
```python
data.xs('2017-01',level=0)
聽 | code | open | high | low | close | volume | date | high_limit | low_limit
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
000001 | 9.27 | 9.28 | 9.25 | 9.26 | 304401.0 | 2017-01-25 | NaN | NaN
000002 | 20.68 | 20.73 | 20.52 | 20.61 | 154647.0 | 2017-01-25 | 10.19 | 8.33
the result of loc method is exactly what i need, and I really don't know why xs result have only '000001' and '000002' while missing the '000004' code and the day after '2017-01-25'