Currently, DataFrame.drop raises an error for the code below, because 'non_exist_in_df_col' is not a column of df:
df = df.drop(['col_1', 'col_2', 'non_exist_in_df_col'], axis=1)
The helper below is more convenient because it tolerates missing columns:
def drop_cols(df, del_cols):
    for col in (set(del_cols) & set(df.columns)):
        df = df.drop([col], axis=1)
    return df
DataFrame.drop_cols = drop_cols
It would also be better to add an 'inplace' option, to speed up the repeated self-assignment of df.
Not sure this is a good idea, because then you can easily get silent errors when you simply misspell something, e.g.
In [8]: df = DataFrame(randn(10,2),columns=['foo','bar'])
In [9]: df.drop('bah')
ValueError: labels ['bah'] not contained in axis
If drop were silent, then this would be OK, but a no-op.
As for the inplace suggestion, that is being added in 0.14. However, it does not speed anything up;
it just makes the method work in place. Most operations require a copy to avoid data aliasing.
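To illustrate the in-place behaviour described here, a minimal sketch (the frame and its column names are made up for the example). It only shows what the call returns and which object changes; it makes no claim about speed.

```python
import pandas as pd

# Illustrative frame; the column names are assumptions for this example.
df = pd.DataFrame({"foo": [1, 2], "bar": [3, 4]})

# Without inplace, drop returns a new DataFrame and leaves df untouched.
result = df.drop(["bar"], axis=1)
columns_after_plain_drop = list(df.columns)

# With inplace=True, drop returns None and modifies df itself.
returned = df.drop(["bar"], axis=1, inplace=True)
columns_after_inplace_drop = list(df.columns)
```

So `df = df.drop(...)` and `df.drop(..., inplace=True)` end up with the same columns; the difference is only which object is rebound or mutated.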
Or add a "tolerant_option=True|False" parameter to the DataFrame.drop method?
Many times I just want to make sure certain columns no longer exist in a dataframe after the drop, whether or not they existed in the input df, and I don't want to have to check first.
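For readers on later pandas releases: as far as I know, drop eventually gained an `errors` parameter that covers this request, although not under the name proposed above. A small sketch, assuming a pandas version that has it (the column names are illustrative):

```python
import pandas as pd

# Illustrative frame; 'non_exist_in_df_col' is deliberately absent.
df = pd.DataFrame({"col_1": [1], "col_2": [2], "col_3": [3]})

# errors="ignore" skips labels that are not present instead of raising.
out = df.drop(["col_1", "col_2", "non_exist_in_df_col"], axis=1, errors="ignore")

# The default (errors="raise") still catches misspelled labels.
try:
    df.drop(["non_exist_in_df_col"], axis=1)
    raised = False
except KeyError:
    raised = True
```

Keeping "raise" as the default preserves the protection against typos discussed earlier in the thread, while the tolerant behaviour is opt-in.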
Given that an index is basically a hash table, you could wrap it like this for the moment:
cols = ['a', 'b', 'c']
cols = [c for c in cols if c in df.index]
df.drop(cols)
You can also perform set operations on two indices, e.g. df.index - Index(['a','b'])
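The same two ideas applied to columns, which is what the original request was about, might look like the sketch below. `drop_existing` is a hypothetical helper name, not part of pandas, and the example uses `Index.difference` for the set operation because in newer pandas versions the `-` operator on an Index is no longer a set difference.

```python
import pandas as pd

def drop_existing(df, labels, axis=0):
    """Hypothetical helper: drop only the labels that exist on the given axis."""
    axis_index = df.index if axis == 0 else df.columns
    # Membership tests on an Index are hash lookups, so this filter is cheap.
    present = [label for label in labels if label in axis_index]
    return df.drop(present, axis=axis)

# Illustrative frame; 'zzz' is deliberately not a column.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Filter-then-drop: the missing label is skipped.
trimmed = drop_existing(df, ["a", "zzz"], axis=1)

# Set-difference alternative: select the columns to keep instead.
kept = df.columns.difference(["a", "zzz"])
selected = df[kept]
```

Note that `Index.difference` returns a sorted result by default, so the second approach may reorder columns, whereas the filter-then-drop version keeps the original column order.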