Here are 2 ways to drop rows from a pandas DataFrame based on a condition:
df = df[~condition]
df.drop(df[condition].index, axis=0, inplace=True)
The first one does not do it inplace, right?
The second one does not work as expected when the index is not unique, so the user would need to reset_index() and then set_index() back.
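A minimal sketch of the non-unique index problem, assuming a small made-up frame (the column name a, the index values, and the names to_drop, by_label and by_mask are illustrative only). The boolean-mask form is written with ~ so that both approaches aim to remove the same rows:

```python
import pandas as pd

# Hypothetical frame in which the index label 0 appears twice.
df = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 0, 1])
to_drop = df["a"] == 1  # only the first row meets the condition

# Way 2: drop by index label. Label 0 is shared, so both rows labelled 0 go.
by_label = df.drop(df[to_drop].index, axis=0)

# Way 1: keep the rows that do NOT meet the condition.
by_mask = df[~to_drop]

print(len(by_label))  # 1
print(len(by_mask))   # 2
```

The label-based drop removes one more row than intended because it matches on the label, not on the row position.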
Question
Would it be possible to have row dropping based directly on the condition?
e.g.
df.drop(condition, axis=0, inplace=True)
You could probably look at DataFrame.where ( https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.where.html#pandas.DataFrame.where ); it has an inplace=True flag, so it would work for your needs.
example:
condition = df.a == 1
df.where(cond=condition, inplace=True)
That returns a DataFrame of the same shape, with the value of other (NaN by default) in the rows which do not meet the condition. It does not drop those rows.
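To make the difference concrete, here is a small sketch assuming a made-up two-column frame (the names a, b, masked and filtered are illustrative). The dropna call is one possible follow-up step, not something suggested in the thread, and it would also remove rows that were entirely NaN to begin with:

```python
import pandas as pd

# Hypothetical data; only rows 0 and 2 satisfy the condition.
df = pd.DataFrame({"a": [1, 2, 1], "b": [10, 20, 30]})
condition = df["a"] == 1

# where() keeps the shape and fills non-matching rows with NaN.
masked = df.where(condition)

# A follow-up step is needed to actually remove those rows.
filtered = masked.dropna(how="all")

print(masked.shape)    # (3, 2)
print(filtered.shape)  # (2, 2)
```

Note that where() also turns the integer columns into floats here, because NaN cannot be stored in an integer column.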
The first one does not do it inplace, right?
Neither does the second one really, if you're talking about memory usage. Those should both be about the same.
@TomAugspurger so, is inplace in (2) more of a placebo there?
You recommend using (1), as I understand.
Correct.
I have the same question, but it looks like this isn't solved.
Here's my solution. (I can't get the format of the code to work properly here.)
import numpy as np

def drop_rows_by_condition(data, cond, temp_col='temp_col'):
    """When there are duplicates in the index, dropping rows by condition
    is not straightforward. If 2 rows have the same index, and one meets the
    condition and the other doesn't, both rows tend to be dropped.
    This function solves this problem by using a temporary column.
    return: dropped rows
    """
    # Rows that meet the condition; these are dropped and returned.
    data_meet_criteria = data[cond]
    # Float marker column, so NaN can be assigned without a dtype change.
    data[temp_col] = 1.0
    data.loc[cond, temp_col] = np.nan
    # Drop the marked rows, then remove the temporary column.
    data.dropna(subset=[temp_col], inplace=True, axis=0)
    data.drop(temp_col, axis=1, inplace=True)
    return data_meet_criteria
I would intuitively use @Ashishjshetty's solution, but suggest a new keyword drop=True (with "False" as default), regardless of inplace. Otherwise it is not a solution.
df = df.query('col_name [==, !=, >, <, ...] "something"')
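A concrete instance of the query pattern above, as a sketch on a made-up frame (the column n, the values and the duplicated index are assumptions for illustration). The expression states which rows to keep, and the result has to be reassigned:

```python
import pandas as pd

# Hypothetical frame with a duplicated index label.
df = pd.DataFrame(
    {"col_name": ["x", "y", "x"], "n": [1, 2, 3]},
    index=[0, 0, 1],
)

# Keep only the rows where col_name is not "x".
df = df.query('col_name != "x"')

print(df["n"].tolist())  # [2]
```

Because query filters with a boolean mask and not by label, the two rows sharing index 0 are handled separately.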
From what I can tell, there's nothing that needs to be done here (besides, there is talk of deprecating inplace: https://github.com/pandas-dev/pandas/issues/16529 ), so I am closing. Please ping if I've misunderstood and I'll reopen.