+1
I was just trying to do something similar. It would be nice if rolling_apply and expanding_apply had an option to work over the whole DataFrame. It doesn't even have to pass frames; it could just roll over the whole 0 axis instead of one series at a time.
That sounds equivalent to the split-apply(-combine) approach of groupby, only pandas doesn't currently provide that sort of split function.
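To make the split-apply-combine idea concrete, here is a minimal pure-Python sketch of what such a split over row windows could look like. The helper name rolling_frame_apply and its min_periods argument are made up for illustration and are not part of pandas; it slices with iloc, so it will be much slower than the Cython rolling functions.

```python
import numpy as np
import pandas as pd


def rolling_frame_apply(df, window, func, min_periods=None):
    """Hypothetical helper (not pandas API): call func on each row window of df."""
    if min_periods is None:
        min_periods = window
    out = pd.Series(np.nan, index=df.index, dtype="float64")
    for end in range(1, len(df) + 1):
        start = max(0, end - window)
        chunk = df.iloc[start:end]  # "split": a DataFrame slice covering all columns
        if len(chunk) >= min_periods:
            out.iloc[end - 1] = func(chunk)  # "apply", then "combine" into out
    return out


df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]})
# func sees both columns at once, which per-column rolling apply cannot do
result = rolling_frame_apply(df, 2, lambda w: (w["A"] * w["B"]).sum())
print(result)
```

The function passed in receives a DataFrame, so it can combine columns within a window; the first row stays NaN because the window is not yet full.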
related #4059
Just ran into the same issue.
same issue here
@jreback What's the best way to do this?
If I try and change the _apply method on _Rolling to take pandas objects rather than numpy arrays, a few of the standard functions fail (e.g. _zsqrt):
...
return _zsqrt(algos.roll_var(arg, window, minp, ddof))
TypeError: Argument 'input' has incorrect type (expected numpy.ndarray, got Series)
Could this be done in roll_generic? Or with an additional path, other than the standard _apply, for user-supplied functions? Neither seems that compelling.
So, just to have an example:
In [32]: df = DataFrame({'A' : np.random.randn(5), 'B' : np.random.randint(0,10,size=5)})
In [33]: def f(x):
   ....:     print type(x)
   ....:     return x.sum()
   ....:
In [34]: df.rolling(2).apply(f)
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
Out[34]:
A B
0 NaN NaN
1 -0.414646 15.0
2 1.007150 8.0
3 1.822979 2.0
4 0.884894 4.0
The issue is that you need to pass a constructed object to algos.roll_generic (or maybe a new function) which does the windowing.
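For comparison, later pandas releases added a raw keyword to Rolling.apply (as far as I know from 0.23 onwards). The sketch below assumes such a version and records what type the function receives in each mode. Note that even with raw=False the function still gets one column at a time as a Series, never the whole frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})
seen = []  # type names observed inside f


def f(x):
    seen.append(type(x).__name__)
    return x.sum()


# raw=False: each window of each column arrives as a Series
as_series = df.rolling(2).apply(f, raw=False)
series_types = set(seen)

seen.clear()

# raw=True: each window of each column arrives as a numpy array
as_array = df.rolling(2).apply(f, raw=True)
array_types = set(seen)

print(series_types, array_types)
```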
Is this doable with roll_generic? It seems that requires an array:
In [28]: series=pd.Series(range(10),dtype='float64')
In [29]: roll_generic(series, win=2, minp=2, offset=0, func=lambda x: x.sum(), args=[], kwargs={})
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-29-3ec0f9465dad> in <module>()
----> 1 roll_generic(series, win=2, minp=2, offset=0, func=lambda x: x.sum(), args=[], kwargs={})
TypeError: Argument 'input' has incorrect type (expected numpy.ndarray, got Series)
Does that mean we need a parallel function which operates on Series?
I could imagine having a function that generated the groups - then it would actually be a groupby. But haven't thought through it enough and performance may be an issue.
No, you have to change roll_generic to take an object.
Doing it with GroupBy is a whole separate idea. I may do that, but it's orthogonal (and the reason is different from this one).
OK, I haven't worked with Cython before and I'm not sure how it handles non-numpy arrays, but I can have a go. Probably won't have immediate results.
Almost 3 years and it's still an issue :'(
import pandas as pd
import numpy as np

def distance_sum(df):
    print df
    df['norm1']=df.ix[:,0]/df.ix[0,0]
    df['norm2']=df.ix[:,1]/df.ix[0,1]
    return np.sum(np.square(df['norm1']-df['norm2']))

df=pd.DataFrame({'a':np.array([1,2,3]),'b':np.array([10,20,30])})
df.rolling(center=False,window=2).apply(distance_sum)
AttributeError Traceback (most recent call last)
9
10 df=pd.DataFrame({'a':np.array([1,2,3]),'b':np.array([10,20,30])})
---> 11 df.rolling(center=False,window=2).apply(distance_sum)
/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in __getattr__(self, name)
2358 return self[name]
2359 raise AttributeError("'%s' object has no attribute '%s'" %
-> 2360 (type(self).__name__, name))
2361
2362 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'rolling'
OR
AttributeError Traceback (most recent call last)
14
15 t=pd.DataFrame({'a':a,'b':b})
---> 16 t.rolling(center=False,window=2).apply(test_distance_sum)
/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in apply(self, func, args, kwargs)
/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in apply(self, func, args, kwargs)
/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in _apply(self, func, name, window, center, check_minp, how, **kwargs)
/usr/local/lib/python2.7/dist-packages/numpy/lib/shape_base.pyc in apply_along_axis(func1d, axis, arr, *args, **kwargs)
89 outshape = asarray(arr.shape).take(indlist)
90 i.put(indlist, ind)
---> 91 res = func1d(arr[tuple(i.tolist())], *args, **kwargs)
92 # if res is a number, then we have a smaller output array
93 if isscalar(res):
/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in calc(x)
/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in f(arg, window, min_periods)
pandas/algos.pyx in pandas.algos.roll_generic (pandas/algos.c:51577)()
9 def test_distance_sum(df):
10 print df
---> 11 df['pxnorm1']=df.ix[:,0]/df.ix[0,0]
12 df['pxnorm2']=df.ix[:,1]/df.ix[0,1]
13 return np.mean(df)#np.sum(np.square(df['pxnorm1']-df['pxnorm2']))
AttributeError: 'numpy.ndarray' object has no attribute 'ix'
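Until rolling apply can hand over a frame, one possible workaround for a function like distance_sum is to slice the windows by hand. This is only a sketch: it assumes a fixed window of 2 rows, uses iloc in place of ix, returns the value without adding columns to the window, and is written for Python 3. The variable names win and values are mine.

```python
import numpy as np
import pandas as pd


def distance_sum(window):
    """Sum of squared differences of the two columns, each normalised by its first row."""
    norm1 = window.iloc[:, 0] / window.iloc[0, 0]
    norm2 = window.iloc[:, 1] / window.iloc[0, 1]
    return float(np.sum(np.square(norm1 - norm2)))


df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
win = 2  # window length in rows (assumed for this example)

# Build each full window as a DataFrame slice; incomplete windows give NaN
values = [
    distance_sum(df.iloc[i - win + 1 : i + 1]) if i >= win - 1 else np.nan
    for i in range(len(df))
]
result = pd.Series(values, index=df.index)
print(result)
```

Because column b is exactly ten times column a here, the normalised columns coincide in every window and each full window gives 0.0.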