Xarray: Shape preserving `diff` via new keywords

Created on 27 Mar 2017  Â·  10Comments  Â·  Source: pydata/xarray

Currently, an operation such as ds.diff('x') will result in a smaller size dimension, e.g.,

In [1]: import xarray as xr

In [2]: ds = xr.Dataset({'foo': (('x',), [1, 2, 3])}, {'x': [1, 2, 3]})

In [3]: ds
Out[3]: 
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 1 2 3
Data variables:
    foo      (x) int64 1 2 3

In [4]: ds.diff('x')
Out[4]: 
<xarray.Dataset>
Dimensions:  (x: 2)
Coordinates:
  * x        (x) int64 2 3
Data variables:
    foo      (x) int64 1 1

However, there are cases where the same size would be beneficial to keep so that you would get

In [1]: import xarray as xr

In [2]: ds = xr.Dataset({'foo': (('x',), [1, 2, 3])}, {'x': [1, 2, 3]})

In [3]: ds.diff('x', preserve_shape=True, empty_value=0)
Out[3]: 
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 1 2 3
Data variables:
    foo      (x) int64 0 1 1

Is there interest in addition of a preserve_shape=True keyword such that it results in this shape-preserving behavior? I'm proposing you could use this with label='upper' and label='lower'.

empty_value could be a value or empty_index could be an index for the fill value. If empty_value=None and empty_index=None, it would produce a nan.

The reason I'm asking the community is because this is at least the second time I've encountered an application where this behavior would be helpful, e.g., computing ocean layer thicknesses from bottom depths. A previous application was computation of a time step from time slice output and the desire to use this product in an approximated integral, e.g.,

y*diff(t, label='lower', preserve_shape=True)

where y and t are both of size n, which is effectively a left-sided Riemann sum.

Most helpful comment

As I mentioned in #1288, I think basic functionality like integrate and gradient is totally within appropriate scope for xarray.

I recall now that people have requested similar functionality for numpy.diff: https://github.com/numpy/numpy/issues/8132. It would be nice to resolve this upstream in NumPy first (e.g., with https://github.com/numpy/numpy/pull/8206), and then simply copy the API design in xarray.

All 10 comments

Would it fill the same need to add a wrapped version of np.gradient, which already works similarly to diff but preserves input shape?

FYI @pwolfram, if you are interested in more "grid-aware" finite differencing, you might have a look at xgcm. We are working on it again and have implemented some basic difference and interpolation operators:
http://xgcm.readthedocs.io/en/latest/grids.html#grid-objects

@shoyer, I'm not sure we want to wrap np.gradient. It seems like other approaches like @rabernat 's xgcm would be more appropriate as a superset of xarray.

Fundamentally, I want something that is like an inverse of cumsum and the proposed change could be used in that context. It is just super inconvenient to do array resizing following the diff of a time vector to get timesteps, but maybe this use case is too niche to be useful for the community.

It is just super inconvenient to do array resizing following the diff of a time vector to get timesteps, but maybe this use case is too niche to be useful for the community.

Maybe for the xarray community, but not the xgcm community ;) We definitely want round-trip-compatibly diff and cumsum operations (see xgcm/xgcm#49, which is very close to implemented)

@rabernat, do you think that the proposed keyword additions should be included in xarray or not? I personally would like to see them in xarray but don't know if it is just me or not. If you think they should be in xarray, are you ok with the api above?

I'm not sure we want to wrap np.gradient. It seems like other approaches like @rabernat 's xgcm would be more appropriate as a superset of xarray.

Certainly grid-aware differencing and integral operators are preferred when the grid information is known and available, but I'm not sure that therefore a more naive version akin to np.gradient would not be useful. It's quite likely that there are xarray users (e.g. in non climate/weather/ocean-related fields) wherein a 'c' grid is meaningless to them, yet they still would appreciate being able to easily compute derivatives via xarray operations.

But then we're back to the valid questions raised before re: what is the appropriate scope of xarray functionality, c.f. https://github.com/pydata/xarray/issues/1288#issuecomment-283062107 and subsequent in that thread

As I mentioned in #1288, I think basic functionality like integrate and gradient is totally within appropriate scope for xarray.

I recall now that people have requested similar functionality for numpy.diff: https://github.com/numpy/numpy/issues/8132. It would be nice to resolve this upstream in NumPy first (e.g., with https://github.com/numpy/numpy/pull/8206), and then simply copy the API design in xarray.

@rabernat, do you think that the proposed keyword additions should be included in xarray or not?

If you are going to have diff return a same-length array, then the keywords should allow the user to specify the boundary condition, e.g. extend, reflect, periodic, blank, etc. I agree this would be useful to have in xarray.

I also agree we should copy numpy wherever possible.

I'm not sure where we stand on this issue, but I think since numpy.gradient already exists, it makes more sense to wrap that function for the sake of simplicity instead of making xr.diff differ from the original premise of np.diff.

On the same topic, it bothers me that xr.diff only accepts "upper" and "lower" arguments for the label. The most obvious (and useful) value in my opinion would be "middle", which would correspond to a centered finite differences. Is there any special reason why that option isn't there?

I agree that we should have a "middle" or "centered" option for xarray.diff.
On Mon, Sep 3, 2018 at 9:14 AM T. Chor notifications@github.com wrote:

I'm not sure where we stand on this issue, but I think since
numpy.gradient already exists, it makes more sense to wrap that function
for the sake of simplicity instead of making xr.diff differ from the
original premise of np.diff.

On the same topic, it bothers me that xr.diff only accepts "upper" and
"lower" arguments for the label. The most obvious (and useful) value in
my opinion would be "middle", which would correspond to a centered finite
differences. Is there any special reason why that option isn't there?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/pydata/xarray/issues/1332#issuecomment-418154705, or mute
the thread
https://github.com/notifications/unsubscribe-auth/ABKS1pWdcdpNtLK18J8v6z8AimuBryT_ks5uXVV3gaJpZM4Mq3x-
.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

phausamann picture phausamann  Â·  3Comments

benbovy picture benbovy  Â·  3Comments

duncanwp picture duncanwp  Â·  4Comments

tfurf picture tfurf  Â·  4Comments

Zac-HD picture Zac-HD  Â·  3Comments