Pandas: TypeError: unhashable type: 'dict' when using apply/transform?

Created on 22 Aug 2017 · 29 comments · Source: pandas-dev/pandas

Hello!

I am quite puzzled by some inconsistencies when using apply. Consider this simple example:

import pandas as pd

idx = [pd.to_datetime('2012-02-01 14:00:00'),
       pd.to_datetime('2012-02-01 14:01:00'),
       pd.to_datetime('2012-03-05 14:04:00'),
       pd.to_datetime('2012-03-05 14:01:00'),
       pd.to_datetime('2012-03-10 14:02:00'),
       pd.to_datetime('2012-03-11 14:07:50')]

test = pd.DataFrame({'value1': [1, 2, 3, 4, 5, 6],
                     'value2': [10, 20, 30, 40, 50, 60],
                     'groups': ['A', 'A', 'A', 'B', 'B', 'B']},
                    index=idx)

test
Out[22]: 
                    groups  value1  value2
2012-02-01 14:00:00      A       1      10
2012-02-01 14:01:00      A       2      20
2012-03-05 14:04:00      A       3      30
2012-03-05 14:01:00      B       4      40
2012-03-10 14:02:00      B       5      50
2012-03-11 14:07:50      B       6      60

Now, this WORKS

test.groupby('groups').apply(lambda x: x.resample('1 T', label='left', closed='left').apply(
        {'value1' : 'mean',
         'value2' : 'mean'}))

but this FAILS

test.groupby('groups').apply(
        {'value1' : 'mean',
         'value2' : 'mean'})

Traceback (most recent call last):

  File "<ipython-input-24-741304ecf105>", line 3, in <module>
    'value2' : 'mean'})

  File "C:\Users\\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 696, in apply
    func = self._is_builtin_func(func)

  File "C:\Users\\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\base.py", line 730, in _is_builtin_func
    return self._builtin_table.get(arg, arg)

TypeError: unhashable type: 'dict'

This worked in prior versions of pandas. What is the new syntax, then? A very useful variant of the code above that I used to use was:

test.groupby('groups').apply(
        {'newname1' : {'value1' : 'mean'},
         'newname2' : {'value2' : 'mean'}})

to rename the new variables on the fly. Is this still possible now? Is this a bug?

Many thanks!

Labels: Enhancement, Error Reporting, Groupby


randomgambit
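The last two frames of the traceback in the original post explain the odd wording of the error: `GroupBy.apply` first passes its argument through `_is_builtin_func`, which does `self._builtin_table.get(arg, arg)`. That is an ordinary dict lookup, so the argument must be hashable, and a dict is not; the call fails before any grouping logic runs. Below is a minimal sketch of that pattern in plain Python. `builtin_table` and its contents are assumptions standing in for the private pandas table; only the `.get(arg, arg)` call comes from the traceback.

```python
# Hypothetical stand-in for the internal table in the traceback; its real
# contents are not shown there, only the `.get(arg, arg)` lookup pattern.
builtin_table = {sum: 'sum', max: 'max', min: 'min'}


def is_builtin_func(arg):
    # Mirrors `self._builtin_table.get(arg, arg)`: known builtins map to a
    # replacement, anything else is returned unchanged. dict.get has to hash
    # `arg` first, so an unhashable argument fails right here.
    return builtin_table.get(arg, arg)


print(is_builtin_func(max))      # max
print(is_builtin_func('mean'))   # mean

try:
    is_builtin_func({'value1': 'mean', 'value2': 'mean'})
except TypeError as exc:
    print(exc)                   # message mentions: unhashable type: 'dict'
```

The same lookup explains the `unhashable type: 'list'` error reported later in the thread for lists of functions.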


All 29 comments

@jorisvandenbossche @jreback Same bug with transform:

test.groupby('groups').transform(
        {'value1' : 'mean',
         'value2' : 'mean'})

Only agg works:

test.groupby('groups').agg(
        {'value1' : 'mean',
         'value2' : 'mean'})

Is this a nasty bug?
Thanks again!

randomgambit on 22 Aug 2017
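Until `GroupBy.transform` accepts a dict, one workaround is to walk the dict yourself and call `transform` once per selected column, which is a supported single-function call. A sketch, assuming every dict value is a single function or function name; `transform_by_column` is a hypothetical helper, not a pandas API.

```python
import pandas as pd

# The example frame from the original post.
idx = pd.to_datetime(['2012-02-01 14:00:00', '2012-02-01 14:01:00',
                      '2012-03-05 14:04:00', '2012-03-05 14:01:00',
                      '2012-03-10 14:02:00', '2012-03-11 14:07:50'])
test = pd.DataFrame({'value1': [1, 2, 3, 4, 5, 6],
                     'value2': [10, 20, 30, 40, 50, 60],
                     'groups': ['A', 'A', 'A', 'B', 'B', 'B']},
                    index=idx)


def transform_by_column(grouped, spec):
    """Hypothetical helper, not a pandas API.

    Calls grouped[col].transform(func) for each col -> func pair in `spec`;
    each func must be a single function or a function name such as 'mean'.
    """
    return pd.DataFrame({col: grouped[col].transform(func)
                         for col, func in spec.items()})


out = transform_by_column(test.groupby('groups'),
                          {'value1': 'mean', 'value2': 'mean'})
print(out)
```

Each column keeps the original row index, so `out` lines up with `test` row for row.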

agg is more general than apply:

In [7]: test.groupby('groups').agg(
   ...:         {'value1' : 'mean',
   ...:          'value2' : 'mean'})
   ...: 
Out[7]: 
        value1  value2
groups                
A            2      20
B            5      50

I guess it should work.

jreback on 22 Aug 2017

@jreback Yes, thanks, that's correct; this is what I am saying as well: it works with agg.

However, I do not want to aggregate, I want to use a transform. The documentation https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.transform.html says we should be able to pass a dict mapping columns to functions.

What do you think? Thanks again!

randomgambit on 22 Aug 2017

If you want to submit a PR to fix it, by all means. (Your example did not indicate transform.)

jreback on 24 Aug 2017

Has this been fixed yet? I think transform after groupby is a very useful feature to have.

znwang25 on 26 Mar 2018

Still open. Please let us know if you want to start a PR to fix this.

TomAugspurger on 27 Mar 2018

Is there any reason the documentation says that transform takes a dictionary, when it doesn't?

zeromh on 7 Jun 2018


Transform also doesn't take a list, as the documentation says it does. To use the above example:

test.groupby('groups').value1.transform(['cumsum', 'cummax'])

...returns "TypeError: unhashable type: 'list'"

zeromh on 7 Jun 2018
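Until a list is accepted, the same result can be assembled by hand: call `transform` once per function and put the pieces side by side. A sketch, assuming the functions are given as names that `SeriesGroupBy.transform` already understands, such as `'cumsum'` and `'cummax'`:

```python
import pandas as pd

# The example frame from the original post.
idx = pd.to_datetime(['2012-02-01 14:00:00', '2012-02-01 14:01:00',
                      '2012-03-05 14:04:00', '2012-03-05 14:01:00',
                      '2012-03-10 14:02:00', '2012-03-11 14:07:50'])
test = pd.DataFrame({'value1': [1, 2, 3, 4, 5, 6],
                     'value2': [10, 20, 30, 40, 50, 60],
                     'groups': ['A', 'A', 'A', 'B', 'B', 'B']},
                    index=idx)

grouped = test.groupby('groups')['value1']
funcs = ['cumsum', 'cummax']

# One transform call per function name; pd.concat turns the dict keys into
# column labels, giving one column per function.
out = pd.concat({f: grouped.transform(f) for f in funcs}, axis=1)
print(out)
```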

I would like to see this fixed too, as an agg-style variant of transform would be very handy.

xx396 on 4 Aug 2018

I'm also confused by the documentation. Isn't there an easy way to transform just one column of a grouped DataFrame?

gsmafra on 8 Aug 2018
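For a single column, selecting it before calling `transform` already works; the result is aligned to the original rows, so it can be assigned straight back. The new column names below are illustrative only.

```python
import pandas as pd

# The example frame from the original post.
idx = pd.to_datetime(['2012-02-01 14:00:00', '2012-02-01 14:01:00',
                      '2012-03-05 14:04:00', '2012-03-05 14:01:00',
                      '2012-03-10 14:02:00', '2012-03-11 14:07:50'])
test = pd.DataFrame({'value1': [1, 2, 3, 4, 5, 6],
                     'value2': [10, 20, 30, 40, 50, 60],
                     'groups': ['A', 'A', 'A', 'B', 'B', 'B']},
                    index=idx)

g = test.groupby('groups')

# Selecting one column gives a SeriesGroupBy; transform returns a Series
# indexed like `test`, so assignment lines up row by row.
test['value1_group_mean'] = g['value1'].transform('mean')
test['value1_demeaned'] = test['value1'] - test['value1_group_mean']
print(test[['groups', 'value1', 'value1_group_mean', 'value1_demeaned']])
```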

In pandas 0.23.4, after grouping a DataFrame, you cannot pass transform a list of functions, nor rename the fields of the transformed DataFrame using a nested dictionary, but both would be very useful!

Alxe1 on 18 Sep 2018
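Both wishes (several functions per column and renaming on the fly) can be covered without nested dicts by a small wrapper that takes `new_name=(column, func)` pairs, the same shape as the named-aggregation keywords that `agg` accepts from pandas 0.25.0 onward. `named_transform` is hypothetical; `GroupBy.transform` itself has no such renaming syntax.

```python
import pandas as pd

# The example frame from the original post.
idx = pd.to_datetime(['2012-02-01 14:00:00', '2012-02-01 14:01:00',
                      '2012-03-05 14:04:00', '2012-03-05 14:01:00',
                      '2012-03-10 14:02:00', '2012-03-11 14:07:50'])
test = pd.DataFrame({'value1': [1, 2, 3, 4, 5, 6],
                     'value2': [10, 20, 30, 40, 50, 60],
                     'groups': ['A', 'A', 'A', 'B', 'B', 'B']},
                    index=idx)


def named_transform(grouped, **spec):
    """Hypothetical helper, not a pandas API.

    Each keyword is new_name=(column, func); the result has one column per
    keyword, named after it, with the same index as the original frame.
    """
    return pd.DataFrame({name: grouped[col].transform(func)
                         for name, (col, func) in spec.items()})


out = named_transform(test.groupby('groups'),
                      value1_mean=('value1', 'mean'),
                      value1_max=('value1', 'max'),
                      value2_mean=('value2', 'mean'))
print(out)
```

Flat, user-chosen column names come out directly, so there is no MultiIndex to flatten afterwards.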

@zeromh The referenced documentation, where transform accepts lists and dictionaries, is for the DataFrame method transform, not its GroupBy counterpart. The docstring for the GroupBy version correctly states that it accepts a function:

Signature: gb.transform(func, *args, **kwargs)
Docstring:
Call function producing a like-indexed DataFrame on each group and
return a DataFrame having the same indexes as the original object
filled with the transformed values

Parameters
----------
f : function
    Function to apply to each subframe

colinalexander on 18 Sep 2018
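The distinction is easy to check on the example frame: `DataFrame.transform` (added in pandas 0.20) accepts a dict of column to function, while the GroupBy version takes one function and applies it per group. A sketch using cumulative sums so the two results visibly differ:

```python
import pandas as pd

# The example frame from the original post.
idx = pd.to_datetime(['2012-02-01 14:00:00', '2012-02-01 14:01:00',
                      '2012-03-05 14:04:00', '2012-03-05 14:01:00',
                      '2012-03-10 14:02:00', '2012-03-11 14:07:50'])
test = pd.DataFrame({'value1': [1, 2, 3, 4, 5, 6],
                     'value2': [10, 20, 30, 40, 50, 60],
                     'groups': ['A', 'A', 'A', 'B', 'B', 'B']},
                    index=idx)

# DataFrame.transform: a dict of column -> function is accepted
# (cumulative sums over the whole frame, ignoring groups).
frame_level = test[['value1', 'value2']].transform({'value1': 'cumsum',
                                                   'value2': 'cumsum'})

# DataFrameGroupBy.transform: pass a single function; it runs per group.
group_level = test.groupby('groups')[['value1', 'value2']].transform('cumsum')

print(frame_level)
print(group_level)
```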

Can this then be taken as a feature request, so that the same apply/transform usage works on both DataFrame and GroupBy objects?

sainathadapa on 18 Sep 2018

+1 to @sainathadapa's request that the same apply/transform usage work on both DataFrame and GroupBy objects.

I vote for it! It would be very useful.

Alxe1 on 18 Sep 2018

@colin1alexander
Ah, my bad. Thanks for the clarification.

zeromh on 19 Sep 2018

@jreback @TomAugspurger
I'm interested in tackling this. My understanding is that NDFrameGroupBy.transform() and SeriesGroupBy.transform() would need to be rewritten to accept a dict with column names as keys and functions as values, similar to NDFrameGroupBy.aggregate(). It seems like using SeriesGroupBy._aggregate_multiple_funcs() as a guideline for writing a multiple-function transform method might be a good idea?

brianhuey on 13 Oct 2018

Yeah, that sounds about right. @WillAyd may have better thoughts on how to start.

Keep in mind, doing this for .apply may be difficult / impossible because it doesn't place any restrictions on the output shape.

With .agg and .transform we at least know what the return shape should be, so we can know ahead of time what the output shape of a dict of functions will be.

TomAugspurger on 13 Oct 2018
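The shape contracts mentioned above can be seen directly on the example frame: `agg` returns one row per group and `transform` one row per input row, while `apply` returns whatever shape the function produces, so there is no fixed shape to build a dict-of-functions result against.

```python
import pandas as pd

# The example frame from the original post.
idx = pd.to_datetime(['2012-02-01 14:00:00', '2012-02-01 14:01:00',
                      '2012-03-05 14:04:00', '2012-03-05 14:01:00',
                      '2012-03-10 14:02:00', '2012-03-11 14:07:50'])
test = pd.DataFrame({'value1': [1, 2, 3, 4, 5, 6],
                     'value2': [10, 20, 30, 40, 50, 60],
                     'groups': ['A', 'A', 'A', 'B', 'B', 'B']},
                    index=idx)

g = test.groupby('groups')[['value1', 'value2']]

# agg: one row per group, whatever the function.
print(g.agg('mean').shape)                    # (2, 2)

# transform: one row per original row, in the original order.
print(g.transform('mean').shape)              # (6, 2)

# apply: the shape depends entirely on what the function returns.
print(g.apply(lambda df: df.iloc[:2]).shape)  # (4, 2)
```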

Reading through the comments here, I think quite a few things have been discussed, but just so we are on the same page, I assume we are explicitly talking about changing transform to allow a dict where the key is the column name and the value(s) are the functions to be applied.

I'm not opposed to it, though I think it makes more sense to first update transform to accept a sequence, as I don't think users will expect the values of a dict to be limited to just one function. @brianhuey, if you want to try your hand at that, it would make sense to open it as a separate PR first, get that one through, and then come back to this.

WillAyd on 14 Oct 2018

Guys, as the OP and a lifelong pandas supporter, let me reiterate that it would be very useful to have apply, transform, and agg work like this:

test.groupby('groups').transform(
        {'value1' : {'value1_mean' : 'mean', 'value1_max' : 'max'},
         'value2' : {'value2_mean' : 'mean'}})

This used to work back in the day with the good old agg. It does not anymore.

This is very unfortunate because in one go I was able to use multiple functions on a single column (here mean and max on value1) as well as rename them on the fly (so that these variables have the names I have chosen and the DataFrame does not end up with awkward MultiIndex columns).

Do you think that syntax could be used in apply, transform and agg? This syntax was just a great idea.

Thanks!!

randomgambit on 14 Oct 2018

We have a separate issue for an alternative to the deprecated dict of dicts in agg. Hoping to have that for 0.24.


TomAugspurger on 14 Oct 2018
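For `agg`, the alternative to the deprecated dict of dicts arrived as named aggregation in pandas 0.25.0: keyword arguments of the form `new_name=(column, function)`. It covers the renaming use case from the original post, but only for `agg`; `transform` and `apply` do not accept it. A sketch on the example frame:

```python
import pandas as pd

# The example frame from the original post.
idx = pd.to_datetime(['2012-02-01 14:00:00', '2012-02-01 14:01:00',
                      '2012-03-05 14:04:00', '2012-03-05 14:01:00',
                      '2012-03-10 14:02:00', '2012-03-11 14:07:50'])
test = pd.DataFrame({'value1': [1, 2, 3, 4, 5, 6],
                     'value2': [10, 20, 30, 40, 50, 60],
                     'groups': ['A', 'A', 'A', 'B', 'B', 'B']},
                    index=idx)

# Requires pandas >= 0.25.0 (named aggregation).
out = test.groupby('groups').agg(
    value1_mean=('value1', 'mean'),   # new_name=(column, function)
    value1_max=('value1', 'max'),
    value2_mean=('value2', 'mean'),
)
print(out)
```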

@TomAugspurger Thanks, but we're talking about extending that to apply, transform, and agg, right?

randomgambit on 14 Oct 2018

@WillAyd
Just so I'm clear, you're suggesting something like:

test.groupby('groups').transform({'value1': [np.mean, max], 'value2': max})

which should return something like:

                    value1    value2
                      mean max   max
2012-02-01 14:00:00      2   3    30
2012-02-01 14:01:00      2   3    30
2012-03-05 14:04:00      2   3    30
2012-03-05 14:01:00      5   6    60
2012-03-10 14:02:00      5   6    60
2012-03-11 14:07:50      5   6    60

brianhuey on 15 Oct 2018


My point is that it would make more sense to make sure this works:

test.groupby('groups').transform([np.mean, max])

Before attempting:

test.groupby('groups').transform({'value1': [np.mean, max]})

Because the mechanisms that ensure a list of functions is acceptable will probably be reused when it comes time to accept a dictionary value that is a list.

Somewhat of a side note, but the hierarchical column structure of the result is going to be somewhat entangled with the discussion in https://github.com/pandas-dev/pandas/issues/18366#issuecomment-425212844. I don't believe that should be a blocker, just a consideration for devs.

WillAyd on 15 Oct 2018
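The suggested order can be sketched outside pandas: first a list-of-functions transform for one grouped column, then a dict version that normalises each value to a list and reuses it. Both helpers are hypothetical, not pandas internals. Functions are passed as names (`'mean'`, `'max'`) rather than `np.mean`/`max` to keep the sketch free of version-specific warnings about callables. The result has the (column, function) MultiIndex columns from the expected output shown earlier in the thread.

```python
import pandas as pd

# The example frame from the original post.
idx = pd.to_datetime(['2012-02-01 14:00:00', '2012-02-01 14:01:00',
                      '2012-03-05 14:04:00', '2012-03-05 14:01:00',
                      '2012-03-10 14:02:00', '2012-03-11 14:07:50'])
test = pd.DataFrame({'value1': [1, 2, 3, 4, 5, 6],
                     'value2': [10, 20, 30, 40, 50, 60],
                     'groups': ['A', 'A', 'A', 'B', 'B', 'B']},
                    index=idx)


def _label(func):
    # Column label for a function: the string itself, or the callable's name.
    return func if isinstance(func, str) else getattr(func, '__name__', repr(func))


def transform_list(grouped_col, funcs):
    """Hypothetical: list-of-functions transform on one grouped column."""
    return pd.concat({_label(f): grouped_col.transform(f) for f in funcs}, axis=1)


def transform_dict(grouped, spec):
    """Hypothetical: dict transform that reuses transform_list per column."""
    parts = {}
    for col, funcs in spec.items():
        if not isinstance(funcs, (list, tuple)):
            funcs = [funcs]          # a single function becomes a one-item list
        parts[col] = transform_list(grouped[col], funcs)
    # Outer column level = column name, inner level = function label.
    return pd.concat(parts, axis=1)


out = transform_dict(test.groupby('groups'),
                     {'value1': ['mean', 'max'], 'value2': 'max'})
print(out)
```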

Hi everyone,
I just stumbled upon the same issue. It would be very important imo to cover this in the documentation. At least I have been very confused by it, since the only entry in the docs regarding transform clearly says that lists and dicts of functions can be passed as an argument. It was not clear to me that the same syntax does not apply to grouped objects.

FelixAntonSchneider on 4 Dec 2018

I just stumbled upon this, and after checking the docs for pandas 0.24.2 DataFrame.transform, I see that they still say dict is supported as the func value. I'm guessing from this discussion that this is because DataFrame.transform does accept it but GroupBy.transform does not. It's very confusing; is there a quick fix for this documentation issue?

elpablete on 28 Jun 2019

Also, is there any progress on getting the desired feature into the next release? I've been using pandas for a while now but have never actually attempted to contribute. I can try to implement this with a little guidance if someone is willing to help me out.

elpablete on 28 Jun 2019

@elpablete you linked to DataFrame.transform. That would be a different issue. This is about DataFrameGroupBy.transform.

TomAugspurger on 8 Jul 2019

@TomAugspurger I cannot find the docs for "DataFrameGroupBy.transform". I found pandas.core.groupby.GroupBy.transform, which I assume documents the same method, but that page is empty, so one would be inclined to think it has the same interface as pandas.DataFrame.transform.

That's my point when I say it's very confusing.

elpablete on 11 Jul 2019

Maybe we could provide a more helpful error message (with a link to the groupby.transform/apply docs), and perhaps raise NotImplementedError in the short term.
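A short-term version of that suggestion might look like the check below, run before the argument reaches the hashing lookup that produces the current `unhashable type` error. The function name and message wording are hypothetical; a real fix would live inside the pandas GroupBy code.

```python
def check_transform_func(func):
    """Hypothetical pre-check, not pandas code.

    Rejects containers of functions up front with an explanatory
    NotImplementedError instead of the opaque "unhashable type" TypeError.
    """
    if isinstance(func, (dict, list, tuple)):
        raise NotImplementedError(
            f"GroupBy.transform does not accept a {type(func).__name__} of "
            "functions; pass a single function or function name, or select "
            "columns and call transform once per function (see the "
            "GroupBy.transform and GroupBy.apply docs)."
        )
    return func


print(check_transform_func('mean'))   # mean
try:
    check_transform_func({'value1': 'mean'})
except NotImplementedError as exc:
    print(exc)
```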