Hello!
I am quite puzzled by some inconsistencies when using apply. Consider this simple example:
import pandas as pd

idx = [pd.to_datetime('2012-02-01 14:00:00'),
       pd.to_datetime('2012-02-01 14:01:00'),
       pd.to_datetime('2012-03-05 14:04:00'),
       pd.to_datetime('2012-03-05 14:01:00'),
       pd.to_datetime('2012-03-10 14:02:00'),
       pd.to_datetime('2012-03-11 14:07:50')
       ]
test = pd.DataFrame({'value1': [1, 2, 3, 4, 5, 6],
                     'value2': [10, 20, 30, 40, 50, 60],
                     'groups': ['A', 'A', 'A', 'B', 'B', 'B']},
                    index=idx)
test
Out[22]:
                    groups  value1  value2
2012-02-01 14:00:00      A       1      10
2012-02-01 14:01:00      A       2      20
2012-03-05 14:04:00      A       3      30
2012-03-05 14:01:00      B       4      40
2012-03-10 14:02:00      B       5      50
2012-03-11 14:07:50      B       6      60
Now, this WORKS
test.groupby('groups').apply(lambda x: x.resample('1 T', label='left', closed='left').apply(
{'value1' : 'mean',
'value2' : 'mean'}))
but this FAILS
test.groupby('groups').apply(
{'value1' : 'mean',
'value2' : 'mean'})
Traceback (most recent call last):
File "<ipython-input-24-741304ecf105>", line 3, in <module>
'value2' : 'mean'})
File "C:\Users\\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 696, in apply
func = self._is_builtin_func(func)
File "C:\Users\\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\base.py", line 730, in _is_builtin_func
return self._builtin_table.get(arg, arg)
TypeError: unhashable type: 'dict'
This worked in prior versions of pandas. What is the new syntax, then? A very useful variant of the code above that I used to rely on was:
test.groupby('groups').apply(
{'newname1' : {'value1' : 'mean'},
'newname2' : {'value2' : 'mean'}})
to rename the new variables on the fly. Is this still possible now? Is this a bug?
Many thanks!
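A minimal sketch of why the traceback above ends in "unhashable type: 'dict'": the last frame does a dictionary lookup keyed by the argument itself, and a dict cannot be used as a dictionary key. The table below is a hypothetical stand-in; the real contents of `_builtin_table` are not shown in the traceback.

```python
# Hypothetical stand-in for the lookup table seen in the traceback;
# its real contents are an assumption here.
builtin_table = {sum: "sum", max: "max", min: "min"}


def is_builtin_func(arg):
    # Same shape as the failing line: a dict lookup keyed by the argument.
    return builtin_table.get(arg, arg)


spec = {"value1": "mean", "value2": "mean"}

message = None
try:
    is_builtin_func(spec)  # hashing a dict raises before any lookup happens
except TypeError as exc:
    message = str(exc)

print(message)

# A hashable argument (a function or a string) passes straight through.
print(is_builtin_func("mean"))
```

So the error says nothing about the functions named in the dict; it is raised before they are ever looked at.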
@jorisvandenbossche @jreback same bug with transform
test.groupby('groups').transform(
{'value1' : 'mean',
'value2' : 'mean'})
only agg works
test.groupby('groups').agg(
{'value1' : 'mean',
'value2' : 'mean'})
is this a nasty bug?
thanks again!
agg is more general than apply
In [7]: test.groupby('groups').agg(
...: {'value1' : 'mean',
...: 'value2' : 'mean'})
...:
Out[7]:
value1 value2
groups
A 2 20
B 5 50
i guess it should work
@jreback yes, thanks, that's correct, and it is what I am saying as well: it works with agg.
However, I do not want to aggregate; I want to use a transform. The documentation https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.transform.html says we should be able to pass a dict mapping columns to functions.
What do you think? Thanks again!
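Until a dict is accepted directly, one possible workaround is to transform each column separately and reassemble the result. This is only a sketch, assuming each dict value is a single function name given as a string; the `spec` and `transformed` names are illustrative, and the default integer index is used instead of the timestamps for brevity.

```python
import pandas as pd

test = pd.DataFrame(
    {"value1": [1, 2, 3, 4, 5, 6],
     "value2": [10, 20, 30, 40, 50, 60],
     "groups": ["A", "A", "A", "B", "B", "B"]}
)

# Column name -> name of the function to broadcast back over each group.
spec = {"value1": "mean", "value2": "max"}

grouped = test.groupby("groups")

# Selecting one column gives a SeriesGroupBy, whose transform accepts a
# single function name; each result is indexed like the original frame.
transformed = pd.DataFrame(
    {col: grouped[col].transform(func) for col, func in spec.items()}
)

print(transformed)
```

The result has one row per original row, which is the difference from agg that matters here.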
if you want to submit a PR to fix it, by all means. (your example did not indicate transform)
Has this been fixed yet? I think transform after groupby is a very useful feature to have.
Still open. Please let us know if you want to start a PR to fix this.
Is there any reason the documentation says that transform takes a dictionary, when it doesn't?
Transform also doesn't take a list, as the documentation says it does. To use the above example:
test.groupby('groups').value1.transform(['cumsum', 'cummax'])
...returns "TypeError: unhashable type: 'list'"
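As a stopgap for the list case, the functions can be applied one at a time and the results placed side by side. A sketch only, assuming the functions are given by name; `funcs` and `result` are illustrative names.

```python
import pandas as pd

test = pd.DataFrame(
    {"value1": [1, 2, 3, 4, 5, 6],
     "groups": ["A", "A", "A", "B", "B", "B"]}
)

funcs = ["cumsum", "cummax"]
grouped = test.groupby("groups")["value1"]

# One transform call per function; the dict keys become the column names.
result = pd.concat({name: grouped.transform(name) for name in funcs}, axis=1)

print(result)
```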
Would like to see this fixed too as an aggregate variant of transform would be very handy
I'm also confused by the documentation. Isn't there an easy way to transform just one column of a grouped DataFrame?
In pandas version 0.23.4, after grouping a dataframe, you cannot pass a list of functions to the transform method, and you cannot rename the fields of the transformed dataframe using a nested dictionary. Both would be very useful!
@zeromh The referenced documentation where transform accepts lists and dictionaries is for the dataframe method of transform, not its groupby cousin version. The doc string for the groupby version correctly states that it accepts a function:
Signature: gb.transform(func, *args, **kwargs)
Docstring:
Call function producing a like-indexed DataFrame on each group and
return a DataFrame having the same indexes as the original object
filled with the transformed values
Parameters
----------
f : function
Function to apply to each subframe
Can this then be taken as a feature request, so that the same kind of apply/transform usage be used on both DataFrame and GroupBy objects?
Vote it! It is very useful
@colin1alexander
Ah, my bad. Thanks for the clarification.
@jreback @TomAugspurger
I'm interested in tackling this. My understanding is that NDFrameGroupBy.transform() and SeriesGroupBy.transform() would need to be rewritten to accept a dict with column names as keys and functions as values, similar to NDFrameGroupBy.aggregate(). It seems like using SeriesGroupBy._aggregate_multiple_funcs() as a guideline for writing a multiple-function transform method might be a good idea?
Yeah, that sounds about right. @WillAyd may have better thoughts on how to start.
Keep in mind, doing this for .apply may be difficult / impossible because it doesn't place any restrictions on the output shape.
With .agg and .transform we at least know what the return shape should be, so we can know ahead of time what the output shape of a dict of functions will be.
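The point about output shapes can be illustrated with a small sketch on the example data (default index assumed for brevity): agg always yields one row per group, transform always yields one row per input row, while apply yields whatever the function returns.

```python
import pandas as pd

test = pd.DataFrame(
    {"value1": [1, 2, 3, 4, 5, 6],
     "value2": [10, 20, 30, 40, 50, 60],
     "groups": ["A", "A", "A", "B", "B", "B"]}
)

grouped = test.groupby("groups")

# agg: one row per group, known before any function runs.
agg_out = grouped.agg({"value1": "mean", "value2": "mean"})

# transform: same number of rows as the input, also known in advance.
transform_out = grouped[["value1", "value2"]].transform("mean")

# apply: the shape depends entirely on what the function returns.
head_out = grouped[["value1"]].apply(lambda g: g.head(2))      # 2 rows per group
total_out = grouped[["value1"]].apply(lambda g: g["value1"].sum())  # 1 scalar per group

print(agg_out.shape, transform_out.shape, head_out.shape, total_out.shape)
```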
Reading through the comments here I think there have been quite a few things talked about, but just so we are on the same page I assume we are explicitly talking about changing transform to allow a dict where the key is the column name and the value(s) are the functions to be applied.
I'm not opposed to it, though I think it makes more sense to update transform to accept a sequence first, as I don't think users will expect the values of a dict to be limited to just one function. @brianhuey if you want to try your hand at that, it would make sense to open it as a separate PR first, get that one through, and then come back to this.
guys, as the original OP and lifelong pandas supporter, let me reiterate that it would be very useful to have apply, transform, and agg be able to work like this:
test.groupby('groups').transform(
{'value1' : {'value1_mean' : 'mean', 'value1_max' : 'max'},
'value2' : {'value2_mean' : 'mean'}})
This used to work back in the days with the good old agg. It does not anymore.
This is very unfortunate because in one go I was able to use multiple functions on a single column (here mean and max on value1) as well as rename them on the fly (so that these variables have the names I have chosen and the dataframe does not have some weird multicolumn index)
Do you think that syntax could be used in apply, transform and agg? This syntax was just a great idea.
Thanks!!
We have a separate issue for an alternative to the deprecated dict of dicts in agg. Hoping to have that for 0.24.
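For readers on a pandas release that supports named aggregation in agg (0.25 and later, to my knowledge), renaming on the fly can be sketched as below. This covers agg only; the join back onto the original rows is just one possible way to get a transform-like result and is not something the thread proposes.

```python
import pandas as pd

test = pd.DataFrame(
    {"value1": [1, 2, 3, 4, 5, 6],
     "value2": [10, 20, 30, 40, 50, 60],
     "groups": ["A", "A", "A", "B", "B", "B"]}
)

# Keyword = output column name, value = (input column, function).
summary = test.groupby("groups").agg(
    value1_mean=("value1", "mean"),
    value1_max=("value1", "max"),
    value2_mean=("value2", "mean"),
)

# Broadcast the per-group values back onto every original row.
broadcast = test[["groups"]].join(summary, on="groups")

print(summary)
print(broadcast)
```

The output columns are flat and carry the chosen names, so no hierarchical column index is produced.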
@TomAugspurger thanks, but we're talking about extending that to apply, transform and agg, right?
@WillAyd
Just so I'm clear, you're suggesting something like:
test.groupby('groups').transform({'value1': [np.mean, max], 'value2': max})
which should return something like:
                    value1      value2
                      mean  max    max
2012-02-01 14:00:00      2    3     30
2012-02-01 14:01:00      2    3     30
2012-03-05 14:04:00      2    3     30
2012-03-05 14:01:00      5    6     60
2012-03-10 14:02:00      5    6     60
2012-03-11 14:07:50      5    6     60
My point is that it would make more sense to make sure this works:
test.groupby('groups').transform([np.mean, max])
Before attempting:
test.groupby('groups').transform({'value1': [np.mean, max]})
Because the mechanisms that ensure the list of functions is acceptable will probably be "reused" when it comes time to accept a dictionary value that is a list.
Somewhat of a side note, but the hierarchical column structure of the result is going to be somewhat entangled with the discussion at https://github.com/pandas-dev/pandas/issues/18366#issuecomment-425212844. I don't believe that should be a blocker, just a consideration point for devs.
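The "list first, then dict" ordering can be sketched in user code. The two helpers below are hypothetical (they are not pandas API) and assume functions are passed by name; the dict helper simply reuses the list helper per column and produces hierarchical columns.

```python
import pandas as pd


def transform_list(grouped_col, funcs):
    """Hypothetical helper: one transformed column per function name."""
    return pd.concat({f: grouped_col.transform(f) for f in funcs}, axis=1)


def transform_dict(df, by, spec):
    """Hypothetical helper: spec maps column -> function name or list of names."""
    grouped = df.groupby(by)
    pieces = {}
    for col, funcs in spec.items():
        if isinstance(funcs, str):
            funcs = [funcs]  # normalise a single name to a list
        pieces[col] = transform_list(grouped[col], funcs)
    # Outer keys (column names) become the first level of the column index.
    return pd.concat(pieces, axis=1)


test = pd.DataFrame(
    {"value1": [1, 2, 3, 4, 5, 6],
     "value2": [10, 20, 30, 40, 50, 60],
     "groups": ["A", "A", "A", "B", "B", "B"]}
)

out = transform_dict(test, "groups", {"value1": ["mean", "max"], "value2": "max"})
print(out)
```

The values match the table sketched earlier in the thread; only the index differs, since the default integer index is used here.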
Hi everyone,
I just stumbled upon the same issue. It would be very important imo to cover this in the documentation. At least I have been very confused by it, since the only entry in the docs regarding transform clearly says that lists and dicts of functions can be passed as an argument. It was not clear to me that the same syntax does not apply to grouped objects.
I just stumbled upon this, and after checking the docs for pandas 0.24.2 DataFrame.transform I see that it still says that a dict is supported as the func value. I'm guessing from this discussion that this is because DataFrame.transform does accept it but GroupBy.transform does not. It's very confusing. Is there any quick fix for this documentation issue?
Also, is there any progress on getting the desired feature into an upcoming release? I've been using pandas for a while now but have never actually attempted to contribute. I can try to implement this with a little guidance if someone is willing to help me out.
@elpablete you linked to DataFrame.transform. That would be a different issue. This is about DataFrameGroupBy.transform.
@TomAugspurger I cannot find the docs for "DataFrameGroupBy.transform". I found pandas.core.groupby.GroupBy.transform which I would think are the same, but still, those are empty and thus, one would be inclined to think they have the same interface as pandas.DataFrame.transform.
That's my point when I say it's very confusing.
maybe could provide a more helpful error message (with link to groupby.transform/apply docs) and maybe raise NotImplementedError in the short term