Dask: Compute forward difference with Dask DataFrame

Created on 8 Nov 2016 · 5 Comments · Source: dask/dask

I'm really excited about Dask. A huge THANK YOU to all the contributors :)

I'd like to make a feature request for a new method: dask.DataFrame.diff(). It would work like pandas.DataFrame.diff().

Mathematically, the operation is very simple: subtract from a column vector a copy of itself shifted down by one or more rows.
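As a plain-Python illustration of that definition (a sketch only; the function name `forward_diff` is hypothetical), element `i` of the result is `x[i] - x[i - periods]`. The first `periods` entries have no predecessor, so this sketch leaves them as `None`, where pandas would use `NaN`:

```python
def forward_diff(xs, periods=1):
    """Return xs[i] - xs[i - periods] for each i; the first `periods` entries are None."""
    if periods < 1:
        raise ValueError("this sketch only handles positive periods")
    # Rows with no predecessor `periods` steps back get a placeholder.
    head = [None] * min(periods, len(xs))
    # Every other row is "current minus earlier".
    tail = [xs[i] - xs[i - periods] for i in range(periods, len(xs))]
    return head + tail


print(forward_diff([1, 4, 9, 16]))     # [None, 3, 5, 7]
print(forward_diff([1, 4, 9, 16], 2))  # [None, None, 8, 12]
```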

I have tried implementing diff() using Dask in the following ways, none of which works (yet):

  • df - df.shift(periods=1) works in Pandas. But Dask DataFrame doesn't have a shift() method.
  • df.values[1:] - df.values[:-1] works in Pandas. But I can't see how to index into a Dask DataFrame by position.
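In pandas, both approaches agree with `DataFrame.diff()` as long as the operands are ordered "current minus previous". The positional version must be `values[1:] - values[:-1]`, because the reverse order gives the negated result. It also omits the first row rather than filling it with `NaN`. Here is a small check on made-up example data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"power": [10.0, 12.0, 11.0, 15.0]})

# Label-aligned: subtract a copy shifted down by one row (first row becomes NaN).
via_shift = df - df.shift(periods=1)

# Positional: later rows minus earlier rows; the result has one fewer row.
via_slices = df.values[1:] - df.values[:-1]

expected = df.diff()
print(via_shift.equals(expected))                       # True
print(np.array_equal(via_slices, expected.values[1:]))  # True
```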

My current best idea for implementing diff() would be to wrap some custom code in dask.dataframe.rolling.wrap_rolling, as suggested in this Stack Overflow answer (although I haven't been able to find any documentation on how to do this). Or should I wrap some custom code using Dask Delayed? Any other thoughts?


All 5 comments

It would be nice to have a higher level function around rolling that accepted user defined functions, much in the same way we do for reduction today.
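Here is a rough sketch of what such a helper might look like. It simulates a Dask DataFrame's partitions as a plain list of pandas DataFrames. The name `overlap_apply` and its `before` parameter are hypothetical, not Dask API. Each partition is prefixed with the last `before` rows of the preceding partition. The user's function then runs on the combined frame, and the borrowed rows are trimmed off again. The sketch assumes every partition has at least `before` rows:

```python
import pandas as pd


def overlap_apply(partitions, func, before):
    """Apply `func` to each partition with `before` rows borrowed from its predecessor.

    Hypothetical helper: `partitions` is a list of pandas DataFrames standing in
    for the partitions of a Dask DataFrame, in index order. Assumes each
    partition has at least `before` rows.
    """
    results = []
    for i, part in enumerate(partitions):
        if i == 0 or before == 0:
            # Nothing to borrow for the first partition (or a zero-width window).
            results.append(func(part))
            continue
        # Borrow just enough rows from the previous partition.
        prev_tail = partitions[i - 1].iloc[-before:]
        combined = pd.concat([prev_tail, part])
        # Apply the user's function, then drop the borrowed rows again.
        results.append(func(combined).iloc[len(prev_tail):])
    return results


df = pd.DataFrame({"x": [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]})
parts = [df.iloc[0:2], df.iloc[2:4], df.iloc[4:6]]

diffed = pd.concat(overlap_apply(parts, lambda d: d.diff(), before=1))
means = pd.concat(overlap_apply(parts, lambda d: d.rolling(3).mean(), before=2))
print(diffed.equals(df.diff()))            # True
print(means.equals(df.rolling(3).mean()))  # True
```

The same helper covers both `diff()` and a rolling mean because the only thing it needs to know about the user's function is how many preceding rows (`before`) the function looks back over.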

That would be amazing! The project I'm working on (NILMTK) requires quite a few custom 'rolling' functions, so it would be great to be able to pass these custom functions to Dask in a simple way, without having to reach into Dask's internals for each new one.

While we're looking at the current implementation, this would also be a good time to minimize communication by splitting the work into two separate tasks: one that slices off the tail end of the previous partition, and another that applies the function and removes the head.
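One way to picture that split is as a toy task graph: a dict mapping keys to `(callable, *args)` tuples, loosely modeled on Dask's graph convention. Everything here is hypothetical and for illustration only, including `take_tail`, `apply_and_trim`, the tiny recursive `get` evaluator, and the key names. The point is that each "tail" task outputs only the rows the next partition needs. Only those rows would have to move between workers, rather than the whole preceding partition:

```python
import pandas as pd


def take_tail(part, n):
    # Task 1: slice off only the rows the next partition needs.
    return part.iloc[-n:] if n > 0 else part.iloc[0:0]


def apply_and_trim(prev_tail, part, func):
    # Task 2: run the function on tail + partition, then drop the borrowed head.
    out = func(pd.concat([prev_tail, part]))
    return out.iloc[len(prev_tail):]


def get(graph, key):
    """Toy evaluator: resolve string arguments that are graph keys, then call."""
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        args = [get(graph, a) if isinstance(a, str) and a in graph else a
                for a in task[1:]]
        return task[0](*args)
    return task  # plain data, e.g. a partition


def first_diff(d):
    return d.diff()


df = pd.DataFrame({"x": [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]})
graph = {"part-0": df.iloc[0:2], "part-1": df.iloc[2:4], "part-2": df.iloc[4:6]}
graph["out-0"] = (first_diff, "part-0")
for i in (1, 2):
    graph[f"tail-{i - 1}"] = (take_tail, f"part-{i - 1}", 1)
    graph[f"out-{i}"] = (apply_and_trim, f"tail-{i - 1}", f"part-{i}", first_diff)

result = pd.concat([get(graph, f"out-{i}") for i in range(3)])
print(result.equals(df.diff()))  # True
```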

cc @jcrist, this seems like the sort of problem that might interest him.

Sure, I can take this on. Definitely agree that a general purpose function would be good here.

See #1769.
