Why is it that
df.groupby('A')[['B','C']].apply(lambda x: x.to_records())
throws: TypeError: Series.name must be a hashable type while
df.groupby('A').apply(lambda x: x[['B','C']].to_records())
works?
Please show a copy-pastable example.
import pandas as pd
df = pd.DataFrame({'A':{'a':1,'b':2},'B':{'a':1,'b':2},'C':{'a':1,'b':2}})
df.groupby('A')[['B','C']].apply(lambda x: x.to_records()) # => TypeError
df.groupby('A').apply(lambda x: x[['B','C']].to_records()) # => Succeeds
I had some time this morning so I looked at this. Here's what I found out.
This looks like the line where things head south.
https://github.com/pandas-dev/pandas/blob/ee9c7e9b19a1fa3c4499d10885dec72f1d651832/pandas/core/series.py#L248
The successful line below has name = None
df.groupby('A').apply(lambda x: x[['B','C']].to_records()) # => Succeeds
and the line that errors has a name = ['B', 'C']
df.groupby('A')[['B','C']].apply(lambda x: x.to_records()) # => TypeError
P.S. Swapping out to_records for something similar, like to_dict, does work:
df.groupby('A')['B','C'].apply(lambda x: x.to_dict())
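The two name values mentioned above can be compared with a minimal sketch that uses only the standard library. The helper is_hashable is a hypothetical stand-in written for this illustration, not the check pandas runs at the linked line; it only shows that None can be hashed while a list such as ['B', 'C'] cannot, which matches the wording of the error.

```python
def is_hashable(obj):
    """Hypothetical helper: return True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable(None))        # the name seen in the call that succeeds
print(is_hashable(['B', 'C']))  # the name seen in the call that raises
print(is_hashable(('B', 'C')))  # a tuple of the same labels, for comparison
```

A list is mutable and therefore unhashable in Python, so any code path that ends up passing the selected column list as a Series name would run into this.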
Hmm, it may be that the recarray is not handled correctly.
Here's what I'm thinking:
Modify the NDFrameGroupBy method _wrap_applied_output to deal with recarrays.
Maybe something like the following, which fixes this issue while all current tests in test_groupby.py still pass. (The diff is taken against upstream/master, so the two lines I added show up with a leading "-".)
Any feedback?
git diff ..upstream/master -- pandas/core/groupby.py
diff --git a/pandas/core/groupby.py b/pandas/core/groupby.py
index 6a2f49a89..041239ed0 100644
--- a/pandas/core/groupby.py
+++ b/pandas/core/groupby.py
@@ -3977,8 +3977,6 @@ class NDFrameGroupBy(GroupBy):
return self._concat_objects(
keys, values, not_indexed_same=True,
)
- if isinstance(v, np.recarray):
- return Series(values, index=key_index)
try:
if self.axis == 0:
No. Don't deal with recarrays specifically; you could possibly handle an array separately, though (recarray is a subclass of ndarray).
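The subclass relationship mentioned above can be checked directly. This is a minimal sketch, assuming NumPy is installed; the sample record array and the describe function are made up for illustration and are not pandas code. It shows that a check for np.ndarray already matches a recarray, so a branch written for plain arrays would also cover it.

```python
import numpy as np

# A small record array; DataFrame.to_records() also returns an np.recarray.
rec = np.rec.array([(1, 1), (2, 2)], dtype=[('B', 'i8'), ('C', 'i8')])


def describe(v):
    """Illustrative dispatch: the ndarray branch also catches recarrays."""
    if isinstance(v, np.ndarray):
        return 'array'
    return 'other'


print(type(rec).__name__)
print(issubclass(np.recarray, np.ndarray))
print(describe(rec))
```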
This is all above my pay-grade, but I'd like to suggest that part of the fix involves adding a test which covers this call pattern. From what I'm reading here, it sounds like this behavior isn't covered.
Hi Jeff, thanks for the quick reply. Let me attempt to add some context. I was trying to follow the existing logic found here:
if isinstance(v, (np.ndarray, Index, Series)):
    if isinstance(v, Series):
        ...
    if isinstance(v, np.recarray):  # proposed addition
        return Series(values, index=key_index)
Hi @zfrenchee, probably above mine too :-)
But yes, it's typical to add tests to cover additions. Thanks for submitting this, by the way.