Pandas: Option to return all idxmax rows in case of ties

Created on 16 May 2020 · 8 Comments · Source: pandas-dev/pandas

What do you think of adding an option to idxmax so that it returns not just the first index of the maximum but all of them in case of ties?

I think this would be really useful, especially when used with groupby, for example when you want the most recent yearly published data for an entity, say a company.

We could simply add a Boolean parameter to the function, something like returnTies, that would be False by default.
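Until such an option exists, the tie-returning behavior can be approximated with boolean masks. The following is only a sketch: `idxmax_ties` is a hypothetical helper name, not a pandas function, and the sample data is made up for illustration.

```python
import pandas as pd

s = pd.Series([3, 7, 7, 1], index=["a", "b", "c", "d"])

# Existing behavior: only the first label holding the maximum is returned.
first = s.idxmax()


def idxmax_ties(series):
    """Hypothetical helper (not part of pandas): all labels tied for the maximum."""
    return series.index[series == series.max()]


all_ties = idxmax_ties(s)

# Groupby case from the request: keep every row of the most recent year
# per company, including rows that tie on that year.
df = pd.DataFrame(
    {
        "company": ["A", "A", "A", "B", "B"],
        "year": [2018, 2019, 2019, 2017, 2018],
        "value": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)
is_latest = df["year"] == df.groupby("company")["year"].transform("max")
latest = df[is_latest]
```

Here `first` is a single label, while `all_ties` and `latest` keep every tied entry. Rows whose value is NaN never match the mask, so they are dropped from the result.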

API - Consistency API Design DataFrame Enhancement Series


All 8 comments

emm, sounds fair.

maybe add keep option similar to nlargest or nsmallest?

take

> maybe add keep option similar to nlargest or nsmallest?

so for api consistency we may also want to add a keep='last' option and also add it to argmax/argmin
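For reference, this is how the existing `keep` parameter of `Series.nlargest` treats ties. It is a sketch of the behavior a `keep` option on idxmax might mirror, not a statement of what was eventually implemented; the sample data is made up.

```python
import pandas as pd

s = pd.Series([3, 7, 7, 1], index=["a", "b", "c", "d"])

# keep="first": the first occurrence of the tied maximum wins.
first = s.nlargest(1, keep="first")

# keep="last": the last occurrence of the tied maximum wins.
last = s.nlargest(1, keep="last")

# keep="all": every tied entry is kept, so the result can be longer than n.
everything = s.nlargest(1, keep="all")
```

With `n=1`, `keep="all"` gives the "return all ties" behavior requested in this issue, but it returns the values along with the labels rather than the labels alone.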

Should any tests be written for this?

@MatthewMoye yep, tests are the first thing for PRs (especially for new feature implementations and bug fixes), and I don't think maintainers are going to review a PR that has no tests.

Honestly, I think the documentation sounds like it currently does this already! (Except that it doesn't.)

When I first read the documentation here, I assumed that what you're describing here is the default behavior. The return value is explicitly documented as "Indices of the maximum values."

I'm still not sure what the docs actually mean. They're either incorrect or they're referring to some higher-dimensional NumPy situation that simply doesn't arise in normal pandas use.

Either way, I strongly suggest fixing the docs for clarity and correctness as part of this issue.
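To make the current behavior concrete, here is a small sketch with made-up data. One possible reading of the plural "indices" is the DataFrame case, where one label is returned per column; that reading is an assumption, not something the docs confirm.

```python
import pandas as pd

# Series: a single label is returned, even though "b" and "c" tie.
s = pd.Series([3, 7, 7], index=["a", "b", "c"])
label = s.idxmax()

# DataFrame: one label per column, each the first occurrence of that
# column's maximum. Ties within a column are still not reported.
df = pd.DataFrame({"x": [1, 5, 5], "y": [9, 9, 2]}, index=["r0", "r1", "r2"])
per_column = df.idxmax()
```

In both cases ties are silently resolved in favor of the first occurrence.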

> The return value is explicitly documented as "Indices of the maximum values."

latest 1.1.0 docs are much clearer.

+1 here, I was expecting the behavior described by @nathancarter
