import pandas as pd

df = pd.DataFrame({'City': ['LA', 'NYC', 'NYC', 'LA', 'Chicago', 'NYC'],
                   'isFraud': [1, 0, 0, 1, 0, 1]})
df.groupby(df['City'])['isFraud'].agg({'Fraud': sum, 'Non-Fraud': lambda x: len(x) - sum(x), 'Squared': lambda x: (sum(x))**2})
I just learnt that using a dictionary for renaming in agg is being deprecated in the latest version. My question is: what is the alternative for achieving the above, i.e. using multiple lambda functions within agg? The output I expect is:
         Non-Fraud  Fraud  Squared
City
Chicago          1      0        0
LA               0      2        4
NYC              2      1        1
See the what's new docs for some pointers on alternatives that rename afterwards: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#deprecate-groupby-agg-with-a-dictionary-when-renaming (although that will be a bit more convoluted).
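As a rough sketch of the rename-afterwards route: aggregate with a list of built-in reductions, then rename the resulting columns. The column name `Total` is my own choice for illustration, not something taken from the docs.

```python
import pandas as pd

df = pd.DataFrame({'City': ['LA', 'NYC', 'NYC', 'LA', 'Chicago', 'NYC'],
                   'isFraud': [1, 0, 0, 1, 0, 1]})

# Aggregate first; the columns are named after the reductions ('sum', 'size').
result = df.groupby('City')['isFraud'].agg(['sum', 'size'])

# Rename afterwards. 'Total' is an illustrative name, not from the thread.
result = result.rename(columns={'sum': 'Fraud', 'size': 'Total'})
print(result)
```

This only covers reductions that have a string name; arbitrary lambdas still need one of the other approaches in this thread.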
Another option is to use named functions instead of lambdas:
In [14]: def Fraud(group):
    ...:     return group.sum()
    ...:
In [15]: def NonFraud(group):
    ...:     return len(group)-sum(group)
    ...:
In [16]: def Squared(group):
    ...:     return (sum(group))**2
    ...:
In [20]: df.groupby(df['City'])['isFraud'].agg([Fraud, NonFraud, Squared])
Out[20]:
         Fraud  NonFraud  Squared
City
Chicago      0         1        0
LA           2         0        4
NYC          1         2        1
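Since agg takes the column label from each function's `__name__`, a lambda can be labelled the same way. The sketch below relies on that behaviour; `named` is a hypothetical helper written for this example, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({'City': ['LA', 'NYC', 'NYC', 'LA', 'Chicago', 'NYC'],
                   'isFraud': [1, 0, 0, 1, 0, 1]})


def named(name, func):
    """Hypothetical helper: set __name__ so agg() labels the column `name`."""
    func.__name__ = name
    return func


# Each lambda now carries its own name, so the list form of agg can tell
# them apart and uses those names as column labels.
result = df.groupby('City')['isFraud'].agg([
    named('Fraud', lambda g: g.sum()),
    named('NonFraud', lambda g: len(g) - g.sum()),
    named('Squared', lambda g: g.sum() ** 2),
])
print(result)
```

This is closer to a workaround than a recommended pattern, but it keeps everything in a single agg call.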
Another way, which I personally would use (it will also be more performant):
In [18]: df = pd.DataFrame({'City':['LA', 'NYC', 'NYC', 'LA', 'Chicago', 'NYC'],
    ...:                    'isFraud':[1, 0, 0, 1, 0, 1]})
    ...:
    ...: result = df.groupby(df['City'])['isFraud'].agg(['sum', 'size'])
    ...: result = pd.DataFrame({'Fraud': result['sum'],
    ...:                        'NonFraud': result['size']-result['sum'],
    ...:                        'Squared': result['sum']**2})
    ...: result
    ...:
Out[18]:
         Fraud  NonFraud  Squared
City
Chicago      0         1        0
LA           2         0        4
NYC          1         2        1
Thanks @jorisvandenbossche and @jreback for your answers. It seems there is no short alternative that does this in one line. It's sad to see such a handy feature removed.
So it looks like using a list of tuples, rather than a dict, is still supported?
df.groupby(df['City'])['isFraud'].agg([
    ('Fraud', sum),
    ('Non-Fraud', lambda x: len(x)-sum(x)),
    ('Squared', lambda x: (sum(x))**2)
])

         Fraud  Non-Fraud  Squared
City
Chicago      0          1        0
LA           2          0        4
NYC          1          2        1
This even looks to be supported when applying different functions to different columns, similar to the dict approach:
import numpy as np

df['severity'] = np.arange(len(df))
df.groupby(df['City']).agg({
    'isFraud': [
        ('Fraud', sum),
        ('Non-Fraud', lambda x: len(x)-sum(x)),
        ('Squared', lambda x: (sum(x))**2)
    ],
    'severity': [('avg severity', 'mean')]
})

        isFraud                       severity
          Fraud Non-Fraud Squared avg severity
City
Chicago       0         1       0     4.000000
LA            2         0       4     1.500000
NYC           1         2       1     2.666667
Will that continue to be supported?
^^ This is even what Wes's latest pandas book recommends. A bit confusing indeed.
What is the deprecation status of the tuple naming method that @bridwell mentions?
Will it continue to be supported?
How does the documentation account for @bridwell's non-warning example?
Please clarify in the documentation and add more examples as needed.
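For anyone landing here later: as far as I know, pandas 0.25 and later offer "named aggregation", which covers this use case in one call. A minimal sketch, assuming such a version is installed; the `severity` column is borrowed from the example above, and the names `spec`, `per_city` and `multi` are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'City': ['LA', 'NYC', 'NYC', 'LA', 'Chicago', 'NYC'],
                   'isFraud': [1, 0, 0, 1, 0, 1]})
df['severity'] = range(len(df))

# Series groupby: each keyword is an output column, each value an aggregation.
# The spec is unpacked from a dict because 'Non-Fraud' is not a valid
# Python identifier and so cannot be written as a plain keyword.
spec = {
    'Fraud': 'sum',
    'Non-Fraud': lambda x: len(x) - x.sum(),
    'Squared': lambda x: x.sum() ** 2,
}
per_city = df.groupby('City')['isFraud'].agg(**spec)
print(per_city)

# DataFrame groupby: each value is an (input column, aggregation) pair.
multi = df.groupby('City').agg(**{
    'Fraud': ('isFraud', 'sum'),
    'avg severity': ('severity', 'mean'),
})
print(multi)
```

Unlike the nested list-of-tuples form, the DataFrame variant produces flat column labels rather than a MultiIndex.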