Xarray: rolling with periodic boundary conditions

Created on 22 Mar 2018  路  13Comments  路  Source: pydata/xarray

Code Sample, a copy-pastable example if possible

import numpy as np
import xarray as xr

x = np.arange(1, 366)
y = np.random.randn(365)
ds = xr.DataArray(y, dims=dict(dayofyear=x))

ds.rolling(center=True, dayofyear=31).mean()

Problem description

rolling cannot directly handle periodic boundary conditions (lon, dayofyear, ...), but could be very helpful to e.g. calculate climate indices. Also I cannot really think of an easy way to append the first elements to the end of the dataset and then calculate rolling.

Is there a way to do this? Should xarray support this feature?

This might also belong to SO...

All 13 comments

Very useful suggestion.

Also I cannot really think of an easy way to append the first elements to the end of the dataset and then calculate rolling.

We already support a different type of "rolling" with periodicity
http://xarray.pydata.org/en/latest/generated/xarray.DataArray.roll.html?highlight=roll
and it is straightforward to apply roll operations at the variable level:
https://github.com/pydata/xarray/blob/master/xarray/core/variable.py#L1007-L1026

I suspect this would not be too hard to implement.

ds.rolling(center=True, dayofyear=31).mean()

What do you mean by rolling here? As a reduce operation over a rolling window, like a moving average (rolling in xarray)? Or rolling around the end of a dataarray when shifting (roll in xarray)? Or a mix of both?

Specifically, what's the dayofyear=31 doing?

Probably a mix of both - I want to compute a moving average, but with periodic boundaries. rolling sets window_length - 1 elements to nan. However I want to calculate these like so:

running_mean_0 = xr.concat([ds[-15:], ds[:16]]).mean()
running_mean_1 = xr.concat([ds[-14:], ds[:17]]).mean()

and so on...

I think what I want is like the filter function in R with circular=True.

I found two possibilities but they are quite "hand made" and certainly not very efficient

Solution with slicing:

# take the last and first elements and append/ prepend them
first = ds[:15]
last = ds[-15:]
extended = xr.concat([last, ds, first], 'dayofyear')
# do the rolling on the extended ds and get rid of NaNs
sol1 = extended.rolling(dayofyear=31, center=True).mean().dropna('dayofyear')

Solution with roll:

roll1 = ds.roll(dayofyear=150).rolling(dayofyear=31, center=True).mean()
roll2 = ds.rolling(dayofyear=31, center=True).mean()
sol2 = xr.concat([roll1, roll2], dim='r').mean('r')

Call rolling on original and rolled dataset, and put them together again.

I think the implementation would be not so difficult by supporting more flexible fill_value option in xr.DataArrayRolling.construct method.

Maybe fill_value='periodic' would be a possible API,

da.rolling(dayofyear=31).construct('window', fill_value='periodic').mean('window')

@mathause, any interest in contributing?

Though I'm not sure you need the construct machinery.

IIUC you need to copy a window-sized amount of data from the front of the array onto the back.

You could do that with construct-like machinery, which would save a copy - though not a large copy

@maxim-lian , do you agree to add this feature?
Although the same behavior can be realized by adding head/tail values to the original array and truncate them after the computation, the periodic option would significantly simplify this.

@fujiisoup Yes for sure - I think it would be good. I think there are two salient questions:

  • Where this lives in the API: Should this be under construct? I don't think the kwarg should be called fill_value - that traditionally has a specific meaning of "the value to replace NaN with". I don't understand the periodic reference, but likely I'm missing something. Could be wrap=True, or roll=True?
  • How it's implemented - do you have a view here? Can you use the construct machinery? Or we make a new array and run rolling over it?

I don't think the kwarg should be called fill_value - that traditionally has a specific meaning of "the value to replace NaN with".

Agreed. dask.ghost has boundaries keyword, for which we can choose between periodic, reflect, and any constant.
I think this would be a good reference.
Maybe we can deprecate fill_value keyword and replace it by boundaries?
(I slightly regret that I choose fill_value keyword in construct).

How it's implemented - do you have a view here?

Only a slight modification of construct machinery realizes this (see #2011).
I think this option should be available only in construct method (not in the traditional rolling constructor) for the sake of simplicity (according to this comment).

2011 looks good - I didn't realize numpy already had pad

Agree with your other comments. Thanks as ever @fujiisoup

I was going to suggest this feature so glad others are interested.

In my use case I would like to smooth a daily climatology. My colleague uses matlab and uses https://www.mathworks.com/matlabcentral/fileexchange/52688-nan-tolerant-fast-smooth

Using the slice solution as @mathause showed above, it would look something like (using code from http://xarray.pydata.org/en/stable/examples/weather-data.html#toy-weather-data)

import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range('2000-01-01', '2010-12-31', name='time')
annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 366 - 0.28))
noise = 15 * np.random.rand(annual_cycle.size)
data = 10 + (15 * annual_cycle) + noise
da = xr.DataArray(data, coords=[times], dims='time')
#da.plot()
#Check variability at one day
#da.groupby('time.dayofyear').std('time')[0]
da_clim = da.groupby('time.dayofyear').mean('time')
_da_clim = xr.concat([da_clim[-15:], da_clim, da_clim[:15]], 'dayofyear')
da_clim_smooth = _da_clim.rolling(dayofyear=31, center=True).mean().dropna('dayofyear')
#da_clim_smooth.plot()

I am coming back to @shoyer suggestion in #2011 - your idea would be to do first a pad and then a rolling operation as e.g.:

import numpy as np
import xarray as xr

x = np.arange(1, 366)
y = np.random.randn(365)
ds = xr.DataArray(y, dims=dict(dayofyear=x))

ds.pad(dayofyear=15, mode='wrap').rolling(center=True, dayofyear=31).mean()

I think you may need to do cropping afterwards, too, before taking the mean.

Was this page helpful?
0 / 5 - 0 ratings