Consider the simple example as follows
import pandas as pd
dataframe = pd.DataFrame({'time' : [pd.to_datetime('2016-06-06'),
pd.to_datetime('2016-06-06'),
pd.to_datetime('2016-06-07'),
pd.to_datetime('2016-06-08')],
'value' : [1,2,3,4],
'group' : ['A','B','A','B']})
dataframe.set_index('time', inplace = True)
dataframe
Out[13]:
group value
time
2016-06-06 A 1
2016-06-06 B 2
2016-06-07 A 3
2016-06-08 B 4
Now I want to use all the cool partial string indexing functions, but also be able to filter on other variables. The solution I come up with is the syntax
dataframe.loc[dataframe['group'] == 'A'].loc['2016-06-06']
which looks really horrible with the two loc combined. Is that the correct pandonic way to do so (while keeping the index?) I have read and read again the documentation but I cannot find the answer.
Thanks!
@randomgambit : Thanks for the question! I think you might be better served emailing this question out to the pandas-dev mailing list or reaching out on our Gitter channel.
I will close this issue for now unless there turns out to be a real issue (or enhancement) with the pandas codebase that can be well addressed with the example you have provided.
thanks @gfyoung but can you at least tell me if that's the correct syntax here? Its not easy to send en example via email, and there is nobody on gitter
We don't directly have an exposed partial string indexer. Instead .get_loc will do this, but it can return various things (e.g. a single index, array of booleans, or slice), so it is not directly user facing.
In [15]: dfi = dataframe.index
In [16]: dataframe.loc[(dataframe['group'] == 'A') & dfi.isin(dfi[dfi.get_loc('2016-06-06')])]
Out[16]:
group value
time
2016-06-06 A 1
@jreback thanks a lot. I think this would be a really nice feature to have. partial string indexing is so useful, but 90% of the time I need to combine if with some other logical condition so I end up doing some df.reset_index().loc[] which is super annoying.
When you say its not user facing, you mean I should not use it?
i mean there is no direct way to do this aside from the verbose code that i posted
i suppose u could make an index function to wrap this as a convenience
if u want to post a signature / doc string can take a look
Given where the discussion has gone, I'll reopen this.
but can you at least tell me if that's the correct syntax here? Its not easy to send en example via email, and there is nobody on gitter
@randomgambit : Sorry about that! I was responding from my phone, so I was hoping email or Gitter could provide you with a more comprehensive response.
@gfyoung @jreback I tried to find more conventional solutions and actually I found this funny thing:
dataframe.loc[(dataframe.reset_index()['time'] == pd.to_datetime('2016-06-06')) & (dataframe['group'] == 'A')]
FAILS
dataframe.reset_index(inplace = True)
dataframe.loc[(dataframe['time'] == pd.to_datetime('2016-06-06')) & (dataframe['group'] == 'A')]
WORKS
in purely logical terms, I dont see why 1. does not work here
@randomgambit : What indices do you get from each call (i.e. compare what is being passed in-between the outermost brackets)?
@gfyoung
dataframe.set_index('time', inplace = True)
dataframe.reset_index()['time']
Out[35]:
0 2016-06-06
1 2016-06-06
2 2016-06-07
3 2016-06-08
Name: time, dtype: datetime64[ns]
dataframe.reset_index(inplace = True)
dataframe['time']
Out[37]:
0 2016-06-06
1 2016-06-06
2 2016-06-07
3 2016-06-08
Name: time, dtype: datetime64[ns]
And in your first example, what does the index look like? The DataFrame in your two examples are not the same because one had its index reset (second example) but not the other (first example).
@gfyoung I am sorry I dont understand what you want me to do here. I get that they are not the same, but I dont see why dataframe.loc[(dataframe.reset_index()['time'] == pd.to_datetime('2016-06-06')) & (dataframe['group'] == 'A')] does not work
(dataframe.reset_index()['time'] == pd.to_datetime('2016-06-06')) is simply a boolean test
Right, but you get a Series returned from that boolean test. The indices in that Series won't match those of the dataframe.
@gfyoung got it, that works
dataframe.loc[(dataframe.reset_index()['time'] == pd.to_datetime('2016-06-06')).tolist() & (dataframe['group'] == 'A')]
Awesome! Glad we could resolve that. Feel free to post a PR to wrap the functionality you shared above so that we don't have to write this out again :smile:
Most helpful comment
Right, but you get a
Seriesreturned from that boolean test. The indices in thatSerieswon't match those of thedataframe.