Pandas: BUG: groupby.agg with more than one lambda is not allowed?

Created on 20 May 2014 · 11 Comments · Source: pandas-dev/pandas

Consider the following example:

 import numpy as np
 import pandas as pd
 from numpy.random import rand

 a = rand(100)
 b = np.floor(rand(100)*100)

 df = pd.DataFrame({'a' : a , 'b' : b})

 grp = df.groupby(df.b)

I have grouped the values in a by b.

Now, if I want to plot the trend over the groups with mean and std I can do

grp.a.agg([np.mean, lambda x : np.mean(x) + np.std(x) , lambda x :  np.mean(x) - np.std(x) ]).plot()

which gives me

SpecificationError: Function names must be unique, found multiple named <lambda>

while

  grp.a.agg([np.mean, lambda x : np.mean(x) + np.std(x) ]).plot()

which has just one lambda works ok.

Is this a bug?

In order to make the thing work I had to define real functions (i.e. in terms of def), to be put in agg.

Groupby

acorbe

Most helpful comment

This has caused me huge frustration, and I believe this should be updated to allow passing the same function several times and providing the desired name of each output column. I'm working with a custom aggregation function that takes an additional argument, supplied either through functools.partial or through multiple lambda functions. I was hoping to avoid six separate named functions, but with the current method I have to write them, even though each function differs only slightly from the others. The "workarounds" here don't save any time compared to just defining separate functions that are all very similar.

neilaronson on 11 Jan 2018

👍8
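
One way to approach the functools.partial case described above is to give each partial object its own `__name__` before passing it to agg, so that the generated column names no longer collide. This is a minimal sketch, not an official pandas recipe: `share_above` and `named_partial` are hypothetical helpers invented for illustration, and the sketch assumes pandas reads `__name__` from the callable when it builds column names.

```python
from functools import partial

import pandas as pd


def share_above(x, threshold):
    """Hypothetical aggregation: fraction of values in x above threshold."""
    return (x > threshold).mean()


def named_partial(func, name, **kwargs):
    """Bind kwargs with functools.partial and give the result its own name."""
    bound = partial(func, **kwargs)
    bound.__name__ = name  # partial objects accept new attributes
    return bound


# Toy data, made up for this sketch.
df = pd.DataFrame({"key": ["u", "u", "v", "v"], "val": [0.1, 0.9, 0.4, 0.6]})

# One uniquely named callable per threshold instead of one def per threshold.
aggs = [named_partial(share_above, f"above_{t}", threshold=t) for t in (0.25, 0.5)]
result = df.groupby("key")["val"].agg(aggs)
print(result)
```

The design choice here is to keep a single parameterised function and vary only the bound argument and the name, which avoids writing several near-identical named functions.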

All 11 comments

You can specify a dictionary; this requires named columns. I suppose passing several lambdas in a list could be made to work; I'm not 100% sure why it was done this way. It needs unique function names because the results are returned as a dictionary. In theory they could be returned as a list, which I think could simply create the columns.

In [27]: grp.a.agg({'one' : np.mean, 'two' : lambda x : np.mean(x) + np.std(x) , 'three' : lambda x :  np.mean(x) - np.std(x) })
Out[27]: 
         three       two       one
b                                 
-253  0.156897  0.156897  0.156897
-216  0.452120  0.452120  0.452120
-191  0.893074  0.893074  0.893074
-178  1.170801  1.170801  1.170801
-177 -1.324476 -1.324476 -1.324476
-162  0.835708  1.241353  1.038531
-156 -1.220583 -1.220583 -1.220583
-147 -2.301474 -2.301474 -2.301474
-136 -1.125749 -1.125749 -1.125749
-133 -0.398064 -0.398064 -0.398064
-132  0.011879  0.011879  0.011879
-129 -0.257017 -0.257017 -0.257017
-114  0.795851  0.795851  0.795851
-113 -1.697932 -1.697932 -1.697932
-111 -0.309536 -0.309536 -0.309536
-110 -0.031828 -0.031828 -0.031828
-94  -0.391354 -0.391354 -0.391354
-87  -0.010518  0.551286  0.270384
-85  -0.711772 -0.711772 -0.711772
-77  -0.147718 -0.106666 -0.127192
-73  -0.796055  0.985810  0.094878
-68  -0.249214 -0.249214 -0.249214
-65   0.897349  0.897349  0.897349
-64  -0.151405 -0.014542 -0.082973
-60  -0.305136 -0.305136 -0.305136
-52   0.084092  0.084092  0.084092
-51  -0.821255 -0.619251 -0.720253
-48  -0.542030  1.237966  0.347968
-44   0.822566  0.822566  0.822566
-43   0.165354  0.165354  0.165354
-38   1.052166  1.052166  1.052166
-33   0.649841  0.649841  0.649841
-32  -0.020592 -0.020592 -0.020592
-31  -1.340543  0.886358 -0.227093
-30   0.278267  0.278267  0.278267
-15   0.220145  0.220145  0.220145
-12  -0.247523 -0.247523 -0.247523
-9   -1.017454 -1.017454 -1.017454
-5    2.230568  2.230568  2.230568
-3   -1.258155 -1.258155 -1.258155
 1   -0.310485 -0.310485 -0.310485
 2   -0.265832 -0.265832 -0.265832
 3   -0.008983 -0.008983 -0.008983
 5   -0.320702 -0.320702 -0.320702
 13  -0.634021 -0.634021 -0.634021
 14   0.588749  0.588749  0.588749
 16  -0.843814 -0.843814 -0.843814
 18  -0.534178 -0.534178 -0.534178
 19  -0.246229 -0.246229 -0.246229
 20  -0.095204 -0.095204 -0.095204
 21  -1.586995  0.941961 -0.322517
 27  -0.054841 -0.054841 -0.054841
 38   0.108338  0.108338  0.108338
 39  -0.924176 -0.924176 -0.924176
 57  -0.562416 -0.144378 -0.353397
 60   1.074620  1.074620  1.074620
 64  -1.302721  0.358431 -0.472145
 71   0.033022  0.033022  0.033022
 75   1.088710  1.088710  1.088710
 78  -0.300983 -0.300983 -0.300983
           ...       ...       ...

jreback on 20 May 2014

@jreback

Thanks!

acorbe on 20 May 2014

Going to close this; if you feel that this really should be implemented, please reopen (and submit a PR if you can!)

jreback on 20 May 2014

The proposed workaround throws a FutureWarning in the current version of pandas. Should this bug be reopened?

BenDundee on 18 Oct 2017

👍5

That's indeed an unfortunate side effect of the deprecation.
I think the easiest solution is to use actual named functions instead of lambdas:

In [79]: def mean_plus_std(x): return np.mean(x) + np.std(x)

In [80]: def mean_minus_std(x): return np.mean(x) - np.std(x)

In [81]: grp.a.agg([np.mean, mean_plus_std, mean_minus_std])
Out[81]: 
          mean  mean_plus_std  mean_minus_std
b                                            
0.0   0.468446       0.696463        0.240430
2.0   0.032308       0.032308        0.032308
3.0   0.704209       0.874344        0.534075
...

Something else we have been discussing is to allow kwargs to be different functions, something like:

grp.a.agg(one=np.mean, two=lambda x : np.mean(x) + np.std(x) , three=lambda x :  np.mean(x) - np.std(x))

but this has not been implemented (and has some additional difficulties, such as how to deal with kwargs that could be passed to the function)

jorisvandenbossche on 19 Oct 2017

👍5
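
For readers on newer releases: the keyword form discussed above was, as far as I know, added later under the name "named aggregation" in pandas 0.25. The sketch below assumes pandas >= 0.25 and uses made-up data and output names (`mean`, `upper`, `lower`); on the versions discussed in this thread it would not run.

```python
import numpy as np
import pandas as pd

# Toy data in the spirit of the original example (seeded for repeatability).
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.random(100), "b": np.floor(rng.random(100) * 10)})
grp = df.groupby("b")

# Named aggregation (assumes pandas >= 0.25): each keyword is an output
# column name, each value is the function applied to every group of "a".
out = grp["a"].agg(
    mean="mean",
    upper=lambda x: x.mean() + x.std(ddof=0),  # ddof=0 matches np.std
    lower=lambda x: x.mean() - x.std(ddof=0),
)
print(out.head())
```

Because the output names come from the keywords, the lambdas never need names of their own.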

I found a workaround.
def p(x):
    return (1, 2)
# will return two values from one function

df.groupby(col).apply(lambda x: p(x))
# will convert the new column into two columns of different values
df[[newCol1, newCol2]] = df[df.columns.values[-1]].apply(pd.Series)

thebeancounter on 16 Nov 2017

👍1

I have found a more satisfactory workaround, specifically for the case where you want to apply multiple similar functions to the same column. You can create a function factory like so:

def ip_is(ip):
    def ipf(x):
        return (x==ip).mean()
    ipf.__name__ = 'ipf {}'.format(str(ip))
    return ipf 

ip_by_day = dfp.groupby('day').agg({'ip': [ip_is('123'), ip_is('456'), ip_is('789')]})

Here I'm checking what fraction of records per day have a certain IP. Basically, you can manually alter the name of the returned function and avoid the SpecificationError.

neilaronson on 7 Mar 2018

👍6 ❤1

I am doing something like this and I run into a similar error:

fs = [lambda x: np.percentile(x, p) for p in ptiles] + [np.sum]
off_smry = gb_off['delivery_time'].agg(fs)

Here is the error I get
SpecificationError: Function names must be unique, found multiple named <lambda>

I think it should be allowed to do something like this. In practical scenarios people could be generating multiple lambda functions to apply.

chandanshikhar1 on 27 Mar 2018
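
A possible way around this for generated percentile functions is the function-factory idea from earlier in the thread. The factory also avoids a second pitfall in the list comprehension above: every lambda closes over the same variable p, so all of them would end up using the last value in ptiles. This is only a sketch; the `percentile` helper, the `office` key, and the sample data are assumptions, since the original `ptiles` and `gb_off` are not shown.

```python
import numpy as np
import pandas as pd


def percentile(p):
    """Return an aggregation computing the p-th percentile, named 'p<p>'."""
    def agg(x):
        return np.percentile(x, p)  # p is fixed separately for each factory call
    agg.__name__ = f"p{p}"  # unique name, so the output columns do not collide
    return agg


# Illustrative stand-ins for the poster's data (not from the original post).
ptiles = [25, 50, 75]
df = pd.DataFrame({
    "office": ["a"] * 4 + ["b"] * 4,
    "delivery_time": [1, 2, 3, 4, 10, 20, 30, 40],
})
gb_off = df.groupby("office")

# The string alias "sum" is used here in place of np.sum.
fs = [percentile(p) for p in ptiles] + ["sum"]
off_smry = gb_off["delivery_time"].agg(fs)
print(off_smry)
```

Each generated function carries both its own percentile and its own name, so the result has one column per percentile plus the sum.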

I'm experiencing the same problem