Pandas: ENH/API: rolling_apply to pass frames to the rolled function (rather than ndarrays)

Created on 1 Oct 2013 · 10 Comments · Source: pandas-dev/pandas

Most helpful comment

Almost 3 years and it's still an issue :'(
```python
import pandas as pd
import numpy as np

def distance_sum(df):
    print df
    df['norm1']=df.ix[:,0]/df.ix[0,0]
    df['norm2']=df.ix[:,1]/df.ix[0,1]
    return np.sum(np.square(df['norm1']-df['norm2']))

df=pd.DataFrame({'a':np.array([1,2,3]),'b':np.array([10,20,30])})
df.rolling(center=False,window=2).apply(distance_sum)
```

AttributeError Traceback (most recent call last)
in ()
9
10 df=pd.DataFrame({'a':np.array([1,2,3]),'b':np.array([10,20,30])})
---> 11 df.rolling(center=False,window=2).apply(distance_sum)

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in __getattr__(self, name)
2358 return self[name]
2359 raise AttributeError("'%s' object has no attribute '%s'" %
-> 2360 (type(self).__name__, name))
2361
2362 def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'rolling'

OR


AttributeError Traceback (most recent call last)
in ()
14
15 t=pd.DataFrame({'a':a,'b':b})
---> 16 t.rolling(center=False,window=2).apply(test_distance_sum)

/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in apply(self, func, args, kwargs)

/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in apply(self, func, args, kwargs)

/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in _apply(self, func, name, window, center, check_minp, how, **kwargs)

/usr/local/lib/python2.7/dist-packages/numpy/lib/shape_base.pyc in apply_along_axis(func1d, axis, arr, *args, **kwargs)
89 outshape = asarray(arr.shape).take(indlist)
90 i.put(indlist, ind)
---> 91 res = func1d(arr[tuple(i.tolist())], *args, **kwargs)
92 # if res is a number, then we have a smaller output array
93 if isscalar(res):

/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in calc(x)

/usr/local/lib/python2.7/dist-packages/pandas/core/window.pyc in f(arg, window, min_periods)

pandas/algos.pyx in pandas.algos.roll_generic (pandas/algos.c:51577)()

in test_distance_sum(df)
9 def test_distance_sum(df):
10 print df
---> 11 df['pxnorm1']=df.ix[:,0]/df.ix[0,0]
12 df['pxnorm2']=df.ix[:,1]/df.ix[0,1]
13 return np.mean(df)#np.sum(np.square(df['pxnorm1']-df['pxnorm2']))

AttributeError: 'numpy.ndarray' object has no attribute 'ix'
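The second traceback happens because the rolled function receives a 1-D ndarray for a single column, not a frame. One possible workaround, sketched here under assumptions, is to roll over the row positions and look the sub-frame up inside the function. This assumes a pandas version where `Rolling.apply` accepts the `raw` keyword (it is not mentioned in this thread), and it swaps the deprecated `.ix` for `.iloc`. The `positions` helper series is hypothetical, introduced only for this illustration.

```python
import numpy as np
import pandas as pd

def distance_sum(window_df):
    # Normalise each column by its first value in the window,
    # without adding columns to (mutating) the input frame.
    norm1 = window_df.iloc[:, 0] / window_df.iloc[0, 0]
    norm2 = window_df.iloc[:, 1] / window_df.iloc[0, 1]
    return float(np.sum(np.square(norm1 - norm2)))

df = pd.DataFrame({"a": np.array([1, 2, 3]), "b": np.array([10, 20, 30])})

# Assumption: roll over integer row positions instead of the data itself.
# Each window then arrives as an ndarray of positions, which is used to
# slice the original frame so that distance_sum sees a real DataFrame.
positions = pd.Series(np.arange(len(df)), index=df.index, dtype="float64")
result = positions.rolling(window=2).apply(
    lambda pos: distance_sum(df.iloc[pos.astype(int)]), raw=True
)
print(result)
```

The first row is NaN because the window is incomplete; each later row holds the value computed from a two-row sub-frame. The function is called once per window in Python, so this is unlikely to be fast on large frames.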

All 10 comments

+1

I was just trying to do something similar. It would be nice if rolling_apply and expanding_apply had an option to work over the whole DataFrame. It doesn't even have to pass frames, but rather just roll over the whole 0 axis instead of one series at a time.

That sounds equivalent to the split-apply(-combine) approach of groupby, only pandas doesn't
currently provide that sort of split function.
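To make the split-apply-combine analogy concrete, here is a minimal sketch of what such a split function could look like in user code. The names `iter_windows` and `rolling_frame_apply` are hypothetical (they are not pandas APIs), and the sketch assumes a unique index and a function that returns one scalar per window.

```python
import numpy as np
import pandas as pd

def iter_windows(df, window):
    """Split: yield (label of last row, sub-frame) for each complete window."""
    for end in range(window, len(df) + 1):
        yield df.index[end - 1], df.iloc[end - window:end]

def rolling_frame_apply(df, window, func):
    """Apply func to each sub-frame, then combine the scalars into a Series."""
    results = {label: func(chunk) for label, chunk in iter_windows(df, window)}
    # Rows without a complete window come back as NaN after reindexing.
    return pd.Series(results, dtype="float64").reindex(df.index)

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# The function sees both columns of each two-row window at once.
result = rolling_frame_apply(df, 2, lambda w: (w["a"] * w["b"]).sum())
print(result)
```

Keeping the split step as a separate generator means the same windows could feed other combine steps, for example building a frame from functions that return several values.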

related #4059

Just ran into the same issue.

same issue here

@jreback What's the best way to do this?

If I try and change the _apply method on _Rolling to take pandas objects rather than numpy arrays, a few of the standard functions fail (e.g. _zsqrt):

...
return _zsqrt(algos.roll_var(arg, window, minp, ddof))
TypeError: Argument 'input' has incorrect type (expected numpy.ndarray, got Series)

Could this be done in roll_generic? Or with an additional path, other than the standard _apply, for user-supplied functions? Neither seems that compelling.

So just to have an example

In [32]: df = DataFrame({'A' : np.random.randn(5), 'B' : np.random.randint(0,10,size=5)})

In [33]: def f(x):
   ....:     print type(x)
   ....:     return x.sum()
   ....: 

In [34]: df.rolling(2).apply(f)
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
<type 'numpy.ndarray'>
Out[34]: 
          A     B
0       NaN   NaN
1 -0.414646  15.0
2  1.007150   8.0
3  1.822979   2.0
4  0.884894   4.0

The issue is that you need to pass a constructed object to algos.roll_generic (or maybe a new function) which does the windowing.

here
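For comparison, later pandas releases added a `raw` keyword to `Rolling.apply` (it does not appear anywhere in this thread, so treat its availability as an assumption about your installed version). The sketch below records which type the function receives in each mode. Note that even with `raw=False` the function gets one column at a time as a Series, not the frame this issue asks for.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

seen = []

def f(x):
    # Record the type passed in for each window of each column.
    seen.append(type(x).__name__)
    return x.sum()

out_raw = df.rolling(2).apply(f, raw=True)      # windows arrive as ndarrays
raw_types = set(seen)

seen.clear()
out_series = df.rolling(2).apply(f, raw=False)  # windows arrive as Series
series_types = set(seen)

print(raw_types, series_types)
```

Both calls produce the same numbers; only the object handed to `f` differs.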

Is this do-able with roll_generic? It seems that requires an array:

In [28]: series=pd.Series(range(10),dtype='float64')

In [29]: roll_generic(series, win=2, minp=2, offset=0, func=lambda x: x.sum(), args=[], kwargs={})
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-29-3ec0f9465dad> in <module>()
----> 1 roll_generic(series, win=2, minp=2, offset=0, func=lambda x: x.sum(), args=[], kwargs={})

TypeError: Argument 'input' has incorrect type (expected numpy.ndarray, got Series)

Does that mean we need a parallel function which operates on Series?

I could imagine having a function that generated the groups - then it would actually be a groupby. But haven't thought through it enough and performance may be an issue.
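On the performance concern: if the per-window computation can be written with array operations, one way to sidestep a Python-level loop entirely is NumPy's `sliding_window_view`. This is a different technique from changing roll_generic, it assumes NumPy 1.20 or newer, and the data below is made up for illustration. It rolls over axis 0 of a two-column array and computes a normalised squared distance like the one in the example at the top of the page.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative data: column 0 and column 1 of a small two-column table.
values = np.array([[1.0, 10.0],
                   [2.0, 30.0],
                   [3.0, 60.0]])

# Windows of 2 rows along axis 0. The window dimension is appended last,
# so the shape is (n_windows, n_columns, window) = (2, 2, 2).
windows = sliding_window_view(values, window_shape=2, axis=0)

# Normalise every column by its first value within each window.
normed = windows / windows[:, :, :1]

# Sum of squared differences between the two normalised columns.
dist = np.square(normed[:, 0, :] - normed[:, 1, :]).sum(axis=1)

# Pad the front with NaN so the result lines up with the original rows.
result = np.concatenate([[np.nan], dist])
print(result)
```

The view itself copies no data; only the normalisation step allocates a new array.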

No, you have to change roll_generic to take an object.

Doing it with GroupBy is a whole separate idea - I may do that, but it's orthogonal (and the reason for it is different from this one).

OK, I haven't worked with Cython before, and not sure how it handles non-numpy arrays, but I can have a go. Probably won't have immediate results.
