Pandas: Drop rows based on condition

Created on 3 May 2018 · 10 Comments · Source: pandas-dev/pandas

Here are 2 ways to drop rows from a pandas DataFrame based on a condition:

1. df = df[~condition]
2. df.drop(df[condition].index, axis=0, inplace=True)

The first one does not do it inplace, right?

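The second one does not work as expected when the index is not unique: drop() removes every row that shares a label with a matching row. The user would need to call reset_index() first, then set_index() afterwards to restore the index.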
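
A minimal sketch of the duplicate-index problem described above. The three-row frame and its labels are made up for illustration; only the two dropping approaches come from the post.

```python
import pandas as pd

# Hypothetical frame with a non-unique index: label "x" appears twice.
df = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "x", "y"])
condition = df["a"] == 1  # True only for the first row

# Label-based drop: removes every row labelled "x", not just the matching one.
by_label = df.drop(df[condition].index, axis=0)

# Boolean-mask selection: works row by row, so only the matching row goes.
by_mask = df[~condition]

print(by_label["a"].tolist())  # [3]
print(by_mask["a"].tolist())   # [2, 3]
```

The mask-based form never looks at index labels, which is why it is unaffected by duplicates.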

Question
Would it be possible to drop rows based directly on the condition?
e.g.
df.drop(condition, axis=0, inplace=True)

API Design Usage Question

sursu

👍8

All 10 comments

You could look at DataFrame.where (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.where.html#pandas.DataFrame.where). It has an inplace=True flag, so it would work for your needs.

Example:

condition = df.a == 1
df.where(cond=condition, inplace=True)

Ashishjshetty on 4 May 2018

👎6

That does not drop anything: it returns a DataFrame of the same shape, with the value of other (NaN by default) in the rows which do not meet the condition.

sursu on 4 May 2018
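
To illustrate the objection to DataFrame.where raised above, here is a small sketch on a made-up frame (the column names and values are assumptions). It shows that where() keeps the shape and that a separate dropna() step would be needed to actually remove rows.

```python
import pandas as pd

# Hypothetical data; only the use of DataFrame.where comes from the thread.
df = pd.DataFrame({"a": [1, 2, 1], "b": [10, 20, 30]})
condition = df["a"] == 1

# Same shape as df; rows where the condition is False are filled with NaN.
masked = df.where(condition)

# An extra step is needed to remove the masked rows.
kept = masked.dropna(how="all")

print(masked.shape)  # (3, 2)
print(kept.shape)    # (2, 2)
```

Note that the NaN fill also turns the integer columns into floats, another reason plain boolean indexing is usually the simpler choice.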

The first one does not do it inplace, right?

Neither does the second one really, if you're talking about memory usage. Those should both be about the same.

TomAugspurger on 4 May 2018

@TomAugspurger so, is inplace in (2) more of a placebo there?
You recommend using (1) as I understand.

sursu on 14 May 2018

Correct.

TomAugspurger on 16 May 2018

I have the same question, but it looks like this isn't solved.

zhuhuren on 29 Jun 2018

Here's my solution. (I can't get the format of the code to work properly here.)

import numpy as np

def drop_rows_by_condition(data, cond, temp_col='temp_col'):
    """Drop, in place, the rows of data where cond is True.

    When there are duplicates in the index, dropping rows by condition
    is not straightforward. If 2 rows have the same index, and one meets
    the condition while the other doesn't, both rows tend to be dropped.
    This function solves this problem by using a temporary column.
    return: dropped rows
    """
    # Positional boolean mask, so index labels play no part
    mask = np.asarray(cond, dtype=bool)
    data_meet_criteria = data[mask]
    # NaN marks the rows to drop; a float column avoids a bool-to-object upcast
    data[temp_col] = np.where(mask, np.nan, 1.0)
    data.dropna(subset=[temp_col], inplace=True)
    data.drop(columns=temp_col, inplace=True)
    return data_meet_criteria

zhuhuren on 29 Jun 2018

👎4

I would intuitively use @Ashishjshetty's solution, but I suggest adding a new keyword, drop (defaulting to False), that applies regardless of inplace. Otherwise it is not a solution.

Stercator on 19 Apr 2019

df = df.query('col_name [==, !=, >, <, ...] "something"')

GPezzuti on 30 Mar 2020
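
A concrete instance of the query() pattern above, as a sketch on made-up data (the column n and the values are assumptions). Since query() keeps the rows where the expression is True, dropping rows means writing the opposite of the drop condition.

```python
import pandas as pd

# Hypothetical frame; col_name mirrors the placeholder used in the comment.
df = pd.DataFrame({
    "col_name": ["something", "other", "something"],
    "n": [1, 2, 3],
})

# Drop the rows equal to "something" by keeping everything else.
df = df.query('col_name != "something"')

print(df["n"].tolist())  # [2]
```

Like boolean indexing, this returns a new object that is then rebound to df.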

👎1 👍1

From what I can tell, nothing needs to be done here (besides, there is talk of deprecating inplace: https://github.com/pandas-dev/pandas/issues/16529), so I am closing this. Please ping me if I've misunderstood and I'll reopen.