Pandas: Query re Deprecation of groupby.agg() with a dictionary when renaming

Created on 12 May 2017 · 7 Comments · Source: pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({'City':['LA', 'NYC', 'NYC', 'LA', 'Chicago', 'NYC'],
       'isFraud':[1, 0, 0, 1, 0, 1]})

df.groupby(df['City'])['isFraud'].agg({'Fraud':sum, 'Non-Fraud': lambda x: len(x)-sum(x), 'Squared': lambda x: (sum(x))**2})

Problem description

I just learned that using a dictionary for renaming in agg is being deprecated in the latest version. What is the alternative for achieving the above, i.e. applying multiple lambda functions within agg while naming the output columns?

Expected Output

         Non-Fraud  Fraud  Squared
City                              
Chicago          1      0        0
LA               0      2        4
NYC              2      1        1

Groupby Usage Question

allen-q

All 7 comments

See the what's new docs for pointers to alternatives that rename the columns afterwards: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#deprecate-groupby-agg-with-a-dictionary-when-renaming (although the result will be a bit more convoluted).
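The "rename afterwards" route can be sketched roughly as follows. This is only an illustration under assumptions: it aggregates with the built-in 'sum' and 'size' names, renames the resulting columns, and derives the other two columns from them. The intermediate column name 'Total' is made up for this sketch and does not appear in the original question.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['LA', 'NYC', 'NYC', 'LA', 'Chicago', 'NYC'],
    'isFraud': [1, 0, 0, 1, 0, 1],
})

# Aggregate with built-in function names only, then rename the columns.
# 'Total' is an illustrative intermediate name, not part of the question.
result = (
    df.groupby('City')['isFraud']
      .agg(['sum', 'size'])
      .rename(columns={'sum': 'Fraud', 'size': 'Total'})
)

# Derive the remaining columns from the renamed aggregates.
result['Non-Fraud'] = result['Total'] - result['Fraud']
result['Squared'] = result['Fraud'] ** 2

# Keep only the columns from the expected output, in the same order.
result = result[['Non-Fraud', 'Fraud', 'Squared']]
print(result)
```

Because no lambdas are passed to agg, there are no unnamed functions whose output columns would need labels.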

Another option is to use 'named' functions instead of lambdas:

In [14]: def Fraud(group):
    ...:     return group.sum()
    ...: 

In [15]: def NonFraud(group):
    ...:     return len(group)-sum(group)
    ...: 

In [16]: def Squared(group):
    ...:     return (sum(group))**2
    ...: 

In [20]: df.groupby(df['City'])['isFraud'].agg([Fraud, NonFraud, Squared])
Out[20]: 
         Fraud  NonFraud  Squared
City                             
Chicago      0         1        0
LA           2         0        4
NYC          1         2        1

jorisvandenbossche on 12 May 2017

Another way, which is what I would personally do (it will also be more performant):

In [18]: df = pd.DataFrame({'City':['LA', 'NYC', 'NYC', 'LA', 'Chicago', 'NYC'],
    ...:        'isFraud':[1, 0, 0, 1, 0, 1]})
    ...: 
    ...: result = df.groupby(df['City'])['isFraud'].agg(['sum', 'size'])
    ...: result = pd.DataFrame({'Fraud': result['sum'], 
    ...:                        'NonFraud': result['size']-result['sum'], 
    ...:                        'Squared': result['sum']**2})
    ...: result
    ...: 
Out[18]: 
         Fraud  NonFraud  Squared
City                             
Chicago      0         1        0
LA           2         0        4
NYC          1         2        1

jreback on 12 May 2017

Thanks @jorisvandenbossche and @jreback for your answers. It seems there is no short, one-line alternative for this. It's sad to see such a handy feature being removed.

allen-q on 15 May 2017

👍8

So it looks like using a list of tuples, rather than a dict, is still supported?

df.groupby(df['City'])['isFraud'].agg([
    ('Fraud', sum),
    ('Non-Fraud', lambda x: len(x)-sum(x)),
    ('Squared', lambda x: (sum(x))**2)
])
         Fraud  Non-Fraud  Squared
City
Chicago      0          1        0
LA           2          0        4
NYC          1          2        1

This even looks to be supported when applying different functions to different columns, similar to the dict approach:

import numpy as np

df['severity'] = np.arange(len(df))

df.groupby(df['City']).agg({
    'isFraud': [
        ('Fraud', sum),
        ('Non-Fraud', lambda x: len(x)-sum(x)),
        ('Squared', lambda x: (sum(x))**2)
    ], 
    'severity': [('avg severity', 'mean')]    
})
        isFraud                       severity
          Fraud Non-Fraud Squared avg severity
City
Chicago       0         1       0     4.000000
LA            2         0       4     1.500000
NYC           1         2       1     2.666667

Will that continue to be supported?
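A note for readers on later releases: pandas 0.25, released after this thread, added "named aggregation", which covers the same renaming use case through keyword arguments. The sketch below assumes a recent pandas version. The output names (Fraud, NonFraud, Squared, avg_severity) are illustrative and must be valid Python identifiers when passed as keywords. It does not settle whether the list-of-tuples form will stay supported.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'City': ['LA', 'NYC', 'NYC', 'LA', 'Chicago', 'NYC'],
    'isFraud': [1, 0, 0, 1, 0, 1],
})
df['severity'] = np.arange(len(df))

# Named aggregation: each keyword is an output column name, and each value
# is a (source column, aggregation function) pair.
named = df.groupby('City').agg(
    Fraud=('isFraud', 'sum'),
    NonFraud=('isFraud', lambda x: len(x) - x.sum()),
    Squared=('isFraud', lambda x: x.sum() ** 2),
    avg_severity=('severity', 'mean'),
)
print(named)
```

The result has flat column names rather than the two-level column index produced by the dict-of-lists form above.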