import numpy as np
import xarray as xr
x = np.arange(1, 366)
y = np.random.randn(365)
ds = xr.DataArray(y, dims=dict(dayofyear=x))
ds.rolling(center=True, dayofyear=31).mean()
rolling cannot directly handle periodic boundary conditions (lon, dayofyear, ...), but could be very helpful to e.g. calculate climate indices. Also I cannot really think of an easy way to append the first elements to the end of the dataset and then calculate rolling.
Is there a way to do this? Should xarray support this feature?
This might also belong to SO...
Very useful suggestion.
Also I cannot really think of an easy way to append the first elements to the end of the dataset and then calculate rolling.
We already support a different type of "rolling" with periodicity
http://xarray.pydata.org/en/latest/generated/xarray.DataArray.roll.html?highlight=roll
and it is straightforward to apply roll operations at the variable level:
https://github.com/pydata/xarray/blob/master/xarray/core/variable.py#L1007-L1026
I suspect this would not be too hard to implement.
ds.rolling(center=True, dayofyear=31).mean()
What do you mean by rolling here? As a reduce operation over a rolling window, like a moving average (rolling in xarray)? Or rolling around the end of a dataarray when shifting (roll in xarray)? Or a mix of both?
Specifically, what's the dayofyear=31 doing?
Probably a mix of both - I want to compute a moving average, but with periodic boundaries. rolling sets window_length - 1 elements to nan. However I want to calculate these like so:
running_mean_0 = xr.concat([ds[-15:], ds[:16]]).mean()
running_mean_1 = xr.concat([ds[-14:], ds[:17]]).mean()
and so on...
I think what I want is like the filter function in R with circular=True.
I found two possibilities but they are quite "hand made" and certainly not very efficient
Solution with slicing:
# take the last and first elements and append/ prepend them
first = ds[:15]
last = ds[-15:]
extended = xr.concat([last, ds, first], 'dayofyear')
# do the rolling on the extended ds and get rid of NaNs
sol1 = extended.rolling(dayofyear=31, center=True).mean().dropna('dayofyear')
Solution with roll:
roll1 = ds.roll(dayofyear=150).rolling(dayofyear=31, center=True).mean()
roll2 = ds.rolling(dayofyear=31, center=True).mean()
sol2 = xr.concat([roll1, roll2], dim='r').mean('r')
Call rolling on original and rolled dataset, and put them together again.
I think the implementation would be not so difficult by supporting more flexible fill_value option in xr.DataArrayRolling.construct method.
Maybe fill_value='periodic' would be a possible API,
da.rolling(dayofyear=31).construct('window', fill_value='periodic').mean('window')
@mathause, any interest in contributing?
Though I'm not sure you need the construct machinery.
IIUC you need to copy a window-sized amount of data from the front of the array onto the back.
You could do that with construct-like machinery, which would save a copy - though not a large copy
@maxim-lian , do you agree to add this feature?
Although the same behavior can be realized by adding head/tail values to the original array and truncate them after the computation, the periodic option would significantly simplify this.
@fujiisoup Yes for sure - I think it would be good. I think there are two salient questions:
construct? I don't think the kwarg should be called fill_value - that traditionally has a specific meaning of "the value to replace NaN with". I don't understand the periodic reference, but likely I'm missing something. Could be wrap=True, or roll=True?construct machinery? Or we make a new array and run rolling over it?I don't think the kwarg should be called fill_value - that traditionally has a specific meaning of "the value to replace NaN with".
Agreed. dask.ghost has boundaries keyword, for which we can choose between periodic, reflect, and any constant.
I think this would be a good reference.
Maybe we can deprecate fill_value keyword and replace it by boundaries?
(I slightly regret that I choose fill_value keyword in construct).
How it's implemented - do you have a view here?
Only a slight modification of construct machinery realizes this (see #2011).
I think this option should be available only in construct method (not in the traditional rolling constructor) for the sake of simplicity (according to this comment).
padAgree with your other comments. Thanks as ever @fujiisoup
I was going to suggest this feature so glad others are interested.
In my use case I would like to smooth a daily climatology. My colleague uses matlab and uses https://www.mathworks.com/matlabcentral/fileexchange/52688-nan-tolerant-fast-smooth
Using the slice solution as @mathause showed above, it would look something like (using code from http://xarray.pydata.org/en/stable/examples/weather-data.html#toy-weather-data)
import numpy as np
import pandas as pd
import xarray as xr
times = pd.date_range('2000-01-01', '2010-12-31', name='time')
annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 366 - 0.28))
noise = 15 * np.random.rand(annual_cycle.size)
data = 10 + (15 * annual_cycle) + noise
da = xr.DataArray(data, coords=[times], dims='time')
#da.plot()
#Check variability at one day
#da.groupby('time.dayofyear').std('time')[0]
da_clim = da.groupby('time.dayofyear').mean('time')
_da_clim = xr.concat([da_clim[-15:], da_clim, da_clim[:15]], 'dayofyear')
da_clim_smooth = _da_clim.rolling(dayofyear=31, center=True).mean().dropna('dayofyear')
#da_clim_smooth.plot()
I am coming back to @shoyer suggestion in #2011 - your idea would be to do first a pad and then a rolling operation as e.g.:
import numpy as np
import xarray as xr
x = np.arange(1, 366)
y = np.random.randn(365)
ds = xr.DataArray(y, dims=dict(dayofyear=x))
ds.pad(dayofyear=15, mode='wrap').rolling(center=True, dayofyear=31).mean()
I think you may need to do cropping afterwards, too, before taking the mean.