Pandas: More tolerant dataframe drop method for multiple columns deletion

Created on 23 Oct 2013 · 5 Comments · Source: pandas-dev/pandas

The current DataFrame.drop raises an error for the code below, because 'non_exist_in_df_col' is not a column of df:

df = df.drop(['col_1', 'col_2', 'non_exist_in_df_col'], axis=1)

The version below would be better, because it tolerates columns that do not exist:

from pandas import DataFrame

def drop_cols(df, del_cols):
    # Drop only the requested columns that actually exist in df.
    for col in (set(del_cols) & set(df.columns)):
        df = df.drop([col], axis=1)
    return df
DataFrame.drop_cols = drop_cols

It would also be good to add an 'inplace' option to speed up the repeated self-assignment of df.
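
As a rough sketch of the same idea without the per-column loop, the requested labels can be intersected with df.columns and dropped in a single call. The helper name drop_existing_cols and the sample frame are made up for illustration; they are not part of pandas.

```python
import pandas as pd

def drop_existing_cols(df, del_cols):
    """Drop whichever of del_cols are present in df; ignore the rest."""
    # Index.intersection keeps only the labels that exist in df.columns.
    present = df.columns.intersection(del_cols)
    # One drop call instead of one per column; a new DataFrame is returned.
    return df.drop(present, axis=1)

# Illustrative data only.
df = pd.DataFrame({"col_1": [1, 2], "col_2": [3, 4], "keep": [5, 6]})
result = drop_existing_cols(df, ["col_1", "col_2", "non_exist_in_df_col"])
print(list(result.columns))  # ['keep']
```

The input frame is left untouched, so the caller still has to assign the result back.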

API Design Indexing

halleygithub

All 5 comments

Not sure this is a good idea, because then you can easily get silent errors when you have simply misspelled something, e.g.

In [8]: df = DataFrame(randn(10,2),columns=['foo','bar'])

In [9]: df.drop('bah')
ValueError: labels ['bah'] not contained in axis

If drop were silent then this would be OK, but a no-op.

As for the inplace suggestion, that is being added in 0.14; however, it does not speed anything up,
it just makes the method work in place. Most operations require a copy to avoid data aliasing.

jreback on 23 Oct 2013
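
One possible middle ground between raising and silently ignoring, sketched here under the assumption that a warning is an acceptable signal for a typo: drop the labels that exist and warn about the ones that do not. The helper drop_cols_warn is hypothetical, not a pandas API. The sketch also shows that drop returns a new frame and leaves the original unchanged.

```python
import warnings

import pandas as pd

def drop_cols_warn(df, del_cols):
    """Hypothetical helper: drop existing columns, warn about missing ones."""
    missing = [c for c in del_cols if c not in df.columns]
    if missing:
        # A misspelled label such as 'bah' is surfaced here instead of vanishing.
        warnings.warn(f"columns not found, skipped: {missing}")
    present = [c for c in del_cols if c in df.columns]
    # drop returns a new DataFrame; df itself is not modified.
    return df.drop(present, axis=1)

# Illustrative data only.
df = pd.DataFrame({"foo": [1.0, 2.0], "bar": [3.0, 4.0]})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = drop_cols_warn(df, ["bar", "bah"])

print(list(out.columns))  # ['foo']
print(len(caught))        # 1
```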

Or add a "tolerant_option=True|False" parameter to the dataframe.drop method?

halleygithub on 23 Oct 2013

Many times I just want to make sure certain columns no longer exist in a dataframe after the drop, whether or not they exist in the input df; I just don't want to check.

halleygithub on 23 Oct 2013

👍11

Given that index is basically a hash table, you could wrap like this for the moment:

cols = ['a', 'b', 'c']
cols = [c for c in cols if c in df.index]
df.drop(cols)
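
Note that the snippet above filters against df.index, so it applies to row labels; for columns the check would be against df.columns with axis=1. Later pandas releases also accept an errors="ignore" argument on drop, which skips missing labels on either axis. A small sketch with made-up data:

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# Row labels: "z" does not exist and is skipped instead of raising.
rows_dropped = df.drop(["x", "z"], errors="ignore")

# Column labels: "c" does not exist and is skipped as well.
cols_dropped = df.drop(["a", "c"], axis=1, errors="ignore")

print(list(rows_dropped.index))    # ['y']
print(list(cols_dropped.columns))  # ['b']
```

The default is still to raise, so the misspelling concern raised earlier in the thread only applies when errors="ignore" is requested explicitly.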