Xarray: Improving documentation on `apply_ufunc`

Created on 13 Mar 2019 · 4 Comments · Source: pydata/xarray

This is just a suggestion to improve the documentation on apply_ufunc. The way I see it, this is one of the most powerful functions that xarray has to offer, but (IMHO) the documentation is really short and has only a few examples.

From personal experience, every time I have to use it I get confused and it takes me a long time to figure out what important keywords like input_core_dims and output_core_dims actually do. After talking to some colleagues of mine I found that they share the same opinion so it appears that I'm not the only one.

PS: I honestly still don't quite understand what most (if not all) of the keywords do, so I'm not the man for the job.

__EDIT__

An example of something that could be improved upon is this function taken from the documentation:

def mean(obj, dim):
    # note: apply always moves core dimensions to the end
    return apply_ufunc(np.mean, obj,
                       input_core_dims=[[dim]],
                       kwargs={'axis': -1})

It's really easy to understand, but if I want to use it with more than one axis it doesn't work:

import numpy as np
import xarray as xr

a = xr.DataArray(np.random.randn(3, 3, 3), dims=("x", "y", "z"))

def mean(obj, dims):
    # note: apply always moves core dimensions to the end
    return xr.apply_ufunc(np.mean, obj,
                          input_core_dims=[dims],
                          kwargs={'axis': -1})

mean(a, "x")  # works
mean(a, ("x", "y"))  # raises ValueError: applied function returned data with unexpected number of dimensions: 2 vs 1, for dimensions ('z',)

And I have no idea how to make the second, more general example work.
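For reference, here is a minimal sketch of one way the general case can be made to work. It rests on the note in the docs example that core dimensions are moved to the end: with two core dimensions, np.mean has to reduce over the last two axes, not just axis -1. With axis=-1 the function hands back a 2D array where xarray expects only ('z',), which seems to be what the "2 vs 1" in the error refers to. The core_axes name and the string handling are my own additions, not something taken from the xarray docs.

```python
import numpy as np
import xarray as xr


def mean(obj, dims):
    # Accept a single dimension name or a sequence of names.
    dims = [dims] if isinstance(dims, str) else list(dims)
    # apply_ufunc moves the core dimensions to the end, in the order listed,
    # so they occupy the last len(dims) axes of the array np.mean receives.
    core_axes = tuple(range(-len(dims), 0))
    return xr.apply_ufunc(
        np.mean,
        obj,
        input_core_dims=[dims],
        kwargs={"axis": core_axes},
    )


a = xr.DataArray(np.random.randn(3, 3, 3), dims=("x", "y", "z"))
one_dim = mean(a, "x")          # remaining dims: ('y', 'z')
two_dims = mean(a, ("x", "y"))  # remaining dims: ('z',)
```

The results can be checked against the built-in reduction, e.g. a.mean(("x", "y")).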

documentation help wanted

All 4 comments

I agree, this is a powerful but complex function. Probably the best approach is a longer tutorial (e.g., on a dedicated docs page), including even more examples.

Contributions would be very welcome here!

I have ended up using apply_ufunc on several occasions and have developed a love/hate relationship with it. Often it turned out to be the simplest and most powerful option ... once I figured out how to use it.

So thumbs up for improved documentation.

Undertaking this task seems daunting to me, however, mostly because there are many different ways of using apply_ufunc that I am not familiar with.
Maybe that's the case for other users as well ...?

If so, shouldn't we 1/ gather clean versions of our examples in a temporary place, 2/ sort these examples, and 3/ consider pushing them as a doc?

I'd be interested in contributing an example on how to apply a function to each image in a time series within a DataArray, but I can't get my function to be applied. Details are in https://stackoverflow.com/questions/57419541/how-to-calculate-histogram-bins-for-each-image-in-an-xarray-dataarray-time-serie

Maybe we could include apply_ufunc examples on this issue or another github issue?

Ryan Abernathey gave a helpful answer on how to apply a pixel-wise function using dask and apply_ufunc: https://stackoverflow.com/questions/57419541/how-to-use-apply-ufunc-with-numpy-digitize-for-each-image-along-time-dimension-o/57513184#57513184

I think the docs could improve on showing how to use apply_ufunc if we have a function that needs to be applied image-wise, like an image filter or segmentation, if we are chunking by time. Or, if the function needs to be applied window-wise, in which case the chunks are spatial (maybe DataArray.rolling and DataArray.reduce solve this case, but DataArray.reduce lacks an example).

Having examples that speak to these 2 specific use cases would, I think, help newcomers (like myself) that are coming from any domain that works with 2D ('x', 'y') or 3D ('x', 'y', 'time') arrays.
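To make the image-wise case concrete, here is a rough sketch of what such an example might look like. The stretch_contrast function and the stack array are made up for illustration, and the sketch uses a plain in-memory NumPy array. For a dask array chunked along time, I believe the same call is usually combined with dask="parallelized", but that part is not shown here.

```python
import numpy as np
import xarray as xr


def stretch_contrast(image):
    # Hypothetical per-image function: rescale one 2D image to [0, 1].
    lo, hi = image.min(), image.max()
    return (image - lo) / (hi - lo)


rng = np.random.default_rng(0)
stack = xr.DataArray(rng.normal(size=(8, 6, 5)), dims=("x", "y", "time"))

result = xr.apply_ufunc(
    stretch_contrast,
    stack,
    input_core_dims=[["x", "y"]],   # each call sees one (x, y) image
    output_core_dims=[["x", "y"]],  # and returns an image of the same shape
    vectorize=True,                 # loop over the remaining dim (time)
)

# Core dimensions come back last, so result has dims ('time', 'x', 'y');
# transpose to restore the original order.
stretched = result.transpose("x", "y", "time")
```

The point of the sketch is that input_core_dims names the dimensions one function call consumes, and vectorize=True takes care of looping over everything else.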

Currently the two examples in the docs show how to use apply_ufunc with a 1D array
http://xarray.pydata.org/en/stable/computation.html#comput-wrapping-custom

And two 2D arrays ('place', 'time')
http://xarray.pydata.org/en/stable/dask.html#automatic-parallelization

Some other comments on my, and possibly others', points of confusion.

  1. I'm not sure what a gufunc is, or whether it is different from a ufunc (see the spearman_correlation function)
  2. After rereading both pages and the numpy docs to understand universal functions, I have some intuition about what input_core_dims does, but I still don't understand it well enough to know how to use apply_ufunc on 3D arrays that are chunked by time or space.
  3. The API reference for apply_ufunc renders such that some arg names have no whitespace between the name and its type. http://xarray.pydata.org/en/stable/generated/xarray.apply_ufunc.html
    I can submit a PR to take care of this later.
  4. apply_ufunc seems to have the flexibility to support operations that output DataArrays of reduced shape, with arguments named like output_core_dims and exclude_dims. However, I tried to use it with a custom function that takes as input a single 3D image ('x', 'y', 'band') in my time series and returns a tuple of an intercept and slope computed from regressing the blue and red bands of that image. I tried various arguments but kept running into errors. I think an example that shows how to use apply_ufunc where the output has a different, reduced shape than any of the inputs would be valuable.
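For point 4, here is a sketch of how a reduced-shape, two-output case might be written. Everything specific in it is an assumption for illustration: the regress_blue_on_red helper, the band order (index 0 = blue, index 1 = red), and the choice to regress blue on red. The idea is that each output gets its own entry in output_core_dims, and an empty list means that output is a scalar per image.

```python
import numpy as np
import xarray as xr


def regress_blue_on_red(image):
    # image is a plain NumPy array of shape (x, y, band).
    # Assumed band order: index 0 is blue, index 1 is red.
    blue = image[..., 0].ravel()
    red = image[..., 1].ravel()
    # np.polyfit returns the highest-degree coefficient first.
    slope, intercept = np.polyfit(red, blue, deg=1)
    return intercept, slope


rng = np.random.default_rng(0)
series = xr.DataArray(
    rng.normal(size=(4, 5, 6, 2)),
    dims=("time", "x", "y", "band"),
)

intercept, slope = xr.apply_ufunc(
    regress_blue_on_red,
    series,
    input_core_dims=[["x", "y", "band"]],  # consumed by each call
    output_core_dims=[[], []],             # two outputs, each a scalar
    vectorize=True,                        # loop over time
)
```

Both intercept and slope come back as DataArrays with only the time dimension left. This also touches on point 1: as far as I understand, vectorize=True wraps the function with np.vectorize so that it behaves like a gufunc, i.e. it works on whole core-dimension arrays and loops over the remaining dimensions, rather than working element by element like a plain ufunc.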