Xarray: rolling with periodic boundary conditions

Created on 22 Mar 2018 · 13Comments · Source: pydata/xarray

Code Sample, a copy-pastable example if possible

import numpy as np
import xarray as xr

x = np.arange(1, 366)
y = np.random.randn(365)
ds = xr.DataArray(y, dims=dict(dayofyear=x))

ds.rolling(center=True, dayofyear=31).mean()

Problem description

rolling cannot directly handle periodic boundary conditions (lon, dayofyear, ...), but could be very helpful to e.g. calculate climate indices. Also I cannot really think of an easy way to append the first elements to the end of the dataset and then calculate rolling.

Is there a way to do this? Should xarray support this feature?

This might also belong to SO...

Source

mathause

👍2

All 13 comments

Very useful suggestion.

Also I cannot really think of an easy way to append the first elements to the end of the dataset and then calculate rolling.

We already support a different type of "rolling" with periodicity
http://xarray.pydata.org/en/latest/generated/xarray.DataArray.roll.html?highlight=roll
and it is straightforward to apply roll operations at the variable level:
https://github.com/pydata/xarray/blob/master/xarray/core/variable.py#L1007-L1026

I suspect this would not be too hard to implement.

rabernat on 22 Mar 2018

ds.rolling(center=True, dayofyear=31).mean()

What do you mean by rolling here? As a reduce operation over a rolling window, like a moving average (rolling in xarray)? Or rolling around the end of a dataarray when shifting (roll in xarray)? Or a mix of both?

Specifically, what's the dayofyear=31 doing?

max-sixty on 22 Mar 2018

Probably a mix of both - I want to compute a moving average, but with periodic boundaries. rolling sets window_length - 1 elements to nan. However I want to calculate these like so:

running_mean_0 = xr.concat([ds[-15:], ds[:16]]).mean()
running_mean_1 = xr.concat([ds[-14:], ds[:17]]).mean()

and so on...

mathause on 22 Mar 2018

I think what I want is like the filter function in R with circular=True.

I found two possibilities but they are quite "hand made" and certainly not very efficient

Solution with slicing:

# take the last and first elements and append/ prepend them
first = ds[:15]
last = ds[-15:]
extended = xr.concat([last, ds, first], 'dayofyear')
# do the rolling on the extended ds and get rid of NaNs
sol1 = extended.rolling(dayofyear=31, center=True).mean().dropna('dayofyear')

Solution with roll:

roll1 = ds.roll(dayofyear=150).rolling(dayofyear=31, center=True).mean()
roll2 = ds.rolling(dayofyear=31, center=True).mean()
sol2 = xr.concat([roll1, roll2], dim='r').mean('r')

Call rolling on original and rolled dataset, and put them together again.

mathause on 22 Mar 2018

I think the implementation would be not so difficult by supporting more flexible fill_value option in xr.DataArrayRolling.construct method.

Maybe fill_value='periodic' would be a possible API,

da.rolling(dayofyear=31).construct('window', fill_value='periodic').mean('window')

@mathause, any interest in contributing?

fujiisoup on 22 Mar 2018

Though I'm not sure you need the construct machinery.

IIUC you need to copy a window-sized amount of data from the front of the array onto the back.

You could do that with construct-like machinery, which would save a copy - though not a large copy

max-sixty on 22 Mar 2018

👍1

@maxim-lian , do you agree to add this feature?
Although the same behavior can be realized by adding head/tail values to the original array and truncate them after the computation, the periodic option would significantly simplify this.

fujiisoup on 23 Mar 2018

@fujiisoup Yes for sure - I think it would be good. I think there are two salient questions:

Where this lives in the API: Should this be under construct? I don't think the kwarg should be called fill_value - that traditionally has a specific meaning of "the value to replace NaN with". I don't understand the periodic reference, but likely I'm missing something. Could be wrap=True, or roll=True?
How it's implemented - do you have a view here? Can you use the construct machinery? Or we make a new array and run rolling over it?

max-sixty on 23 Mar 2018

I don't think the kwarg should be called fill_value - that traditionally has a specific meaning of "the value to replace NaN with".

Agreed. dask.ghost has boundaries keyword, for which we can choose between periodic, reflect, and any constant.
I think this would be a good reference.
Maybe we can deprecate fill_value keyword and replace it by boundaries?
(I slightly regret that I choose fill_value keyword in construct).

How it's implemented - do you have a view here?

Only a slight modification of construct machinery realizes this (see #2011).
I think this option should be available only in construct method (not in the traditional rolling constructor) for the sake of simplicity (according to this comment).

fujiisoup on 24 Mar 2018

2011 looks good - I didn't realize numpy already had `pad`

Agree with your other comments. Thanks as ever @fujiisoup

max-sixty on 24 Mar 2018

👍1

I was going to suggest this feature so glad others are interested.

In my use case I would like to smooth a daily climatology. My colleague uses matlab and uses https://www.mathworks.com/matlabcentral/fileexchange/52688-nan-tolerant-fast-smooth

Using the slice solution as @mathause showed above, it would look something like (using code from http://xarray.pydata.org/en/stable/examples/weather-data.html#toy-weather-data)

import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range('2000-01-01', '2010-12-31', name='time')
annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 366 - 0.28))
noise = 15 * np.random.rand(annual_cycle.size)
data = 10 + (15 * annual_cycle) + noise
da = xr.DataArray(data, coords=[times], dims='time')
#da.plot()
#Check variability at one day
#da.groupby('time.dayofyear').std('time')[0]
da_clim = da.groupby('time.dayofyear').mean('time')
_da_clim = xr.concat([da_clim[-15:], da_clim, da_clim[:15]], 'dayofyear')
da_clim_smooth = _da_clim.rolling(dayofyear=31, center=True).mean().dropna('dayofyear')
#da_clim_smooth.plot()

raybellwaves on 25 May 2018

I am coming back to @shoyer suggestion in #2011 - your idea would be to do first a pad and then a rolling operation as e.g.:

import numpy as np
import xarray as xr

x = np.arange(1, 366)
y = np.random.randn(365)
ds = xr.DataArray(y, dims=dict(dayofyear=x))

ds.pad(dayofyear=15, mode='wrap').rolling(center=True, dayofyear=31).mean()

mathause on 6 Jun 2019

👍1

I think you may need to do cropping afterwards, too, before taking the mean.