# Your code here
import xarray as xr
ds = xr.tutorial.load_dataset("air_temperature")
ds2 = xr.tutorial.load_dataset("air_temperature")
xr.testing.assert_identical(ds, ds2) # passes
ds.coarsen(lat=5).mean()
xr.testing.assert_identical(ds, ds2) # fails
Bug: coarsen reductions are modifying the original dataset by deleting all attributes
AssertionErrorTraceback (most recent call last)
<ipython-input-21-b0a179f01c99> in <module>
48 xr.testing.assert_identical(ds, ds2)
49 ds.coarsen(lat=5).mean()
---> 50 xr.testing.assert_identical(ds, ds2)
~/work/python/xarray/xarray/testing.py in assert_identical(a, b)
87 assert a.identical(b), formatting.diff_array_repr(a, b, "identical")
88 elif isinstance(a, (Dataset, Variable)):
---> 89 assert a.identical(b), formatting.diff_dataset_repr(a, b, "identical")
90 else:
91 raise TypeError("{} not supported by assertion comparison".format(type(a)))
AssertionError: Left and right Dataset objects are not identical
Differing coordinates:
L * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
R * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
standard_name: latitude
long_name: Latitude
units: degrees_north
axis: Y
Differing data variables:
L air (time, lat, lon) float32 241.2 242.5 243.5 ... 296.49 296.19 295.69
R air (time, lat, lon) float32 241.2 242.5 243.5 ... 296.49 296.19 295.69
long_name: 4xDaily Air temperature at sigma level 995
units: degK
precision: 2
GRIB_id: 11
GRIB_name: TMP
var_desc: Air temperature
dataset: NMC Reanalysis
level_desc: Surface
statistic: Individual Obs
parent_stat: Other
actual_range: [185.16 322.1 ]
After some testing, I discovered that:
# Your code here
import xarray as xr
xr.set_options(keep_attrs=True) # NOTE GLOBAL OPTION!!!!
ds = xr.tutorial.load_dataset("air_temperature")
ds2 = xr.tutorial.load_dataset("air_temperature")
xr.testing.assert_identical(ds, ds2) # passes
ds.coarsen(lat=5).mean()
xr.testing.assert_identical(ds, ds2) # passes
makes your example pass. So, it seems that somewhere along the chain of functions the keep_attrs parameter is being lost or modified.
@dcherian @jukent: After a little walk through the code, I think the problem is in xarray/core/variable.py. If you look at the first part of the coarsen function in this file:
you will see that **kwargs is not being pass into the _coarsen_reshape function:
And down at the bottom of that function:
it retrieves the keep_attrs parameter from global options, and doesn't check for a passed-in argument.
@jukent, why don't you draft a PR to fix this problem?
Also, while I was walking through the logic in this problem, I found that in the _reduce_method functions of the DataArrayCoarsen and DatasetCoarsen classes, the kwargs variable is being shadowed:
So, you can see that the skipna option is just being ignored. That's another pretty easy fix, but it will change existing behavior. It might be prudent to look to see if there are any known bug reports referencing the skipna parameter being ignored.
Thanks @kmpaul
These are two issues. But the third one is that the _original_ dataset ds is being modified inplace by ds.coarsen. This should never happen.
Yeah. That's true. I did overlook that. Thanks!
@dcherian Could this be because the return of variable.py/coarsen or is it likely happening earlier in the fx?
return self._replace(data=func(reshaped, axis=axes, **kwargs))
That could be the issue. If it is, you should be able to fix that using
return self.copy(data=func(reshaped, axis=axes, **kwargs))
~I think you are right. _replace calls the constructor with self.variable, ..., fastpath=True which does a direct assignment to DataArray._variable; so you have a new DataArray but with the same underlying data.~
EDIT: Ignore I was looking at dataarray.py; not variable.py
My doubts on this are because self._replace are used elsewhere in the code.
Changing from _replace to copy causes the tests on coarsen to fail, I am looking more into this now.
It seems that a condition of copy is that the data shapes match (variable.py line 947).
If I run
ds = xr.tutorial.load_dataset("air_temperature")
ds.air.coarsen(lat=5)
there is no problem, but once I add a .mean() to the end
ds = xr.tutorial.load_dataset("air_temperature")
ds.air.coarsen(lat=5).mean()
The error is
ValueError: Data shape (2920, 5, 53) must match shape of object (2920, 25, 53)
These lines are suspicious.
Maybe we should copy attrs here not geting its reference.
My last post was wrong.
I think this part overwrites the attrs,
https://github.com/pydata/xarray/blob/43a2a4bdf3a492d89aae9f2c5b0867932ff51cef/xarray/core/variable.py#L2028
https://github.com/pydata/xarray/blob/43a2a4bdf3a492d89aae9f2c5b0867932ff51cef/xarray/core/variable.py#L2073-L2076
The first line should be replaced by variable = self.copy(deep=False)
I just checked, and I think the issue can indeed be solved by replacing variable = self with variable = self.copy() (deep=False is the default, so no need to explicitly set it).
Also, I think _replace is a bit confusing: the name seems to imply a inplace operation, but it returns a new object without changing the original object.
Also, I think
_replaceis a bit confusing: the name seems to imply a inplace operation, but it returns a new object without changing the original object.
I think this mea culpa IIRC... I would be very open to consolidating this and .copy into something that returns a (generally shallow) copy of the object with some optional replacements. (The general approach is better than calling a constructor with every attribute, even if the names are not great)
Most helpful comment
I just checked, and I think the issue can indeed be solved by replacing
variable = selfwithvariable = self.copy()(deep=Falseis the default, so no need to explicitly set it).Also, I think
_replaceis a bit confusing: the name seems to imply a inplace operation, but it returns a new object without changing the original object.