Xarray: coarsen deletes attrs on original object

Created on 3 Jun 2020  路  13Comments  路  Source: pydata/xarray

MCVE Code Sample

# Your code here
import xarray as xr

ds = xr.tutorial.load_dataset("air_temperature")
ds2 = xr.tutorial.load_dataset("air_temperature")

xr.testing.assert_identical(ds, ds2) # passes
ds.coarsen(lat=5).mean()
xr.testing.assert_identical(ds, ds2) # fails

Bug: coarsen reductions are modifying the original dataset by deleting all attributes

AssertionErrorTraceback (most recent call last)
<ipython-input-21-b0a179f01c99> in <module>
     48 xr.testing.assert_identical(ds, ds2)
     49 ds.coarsen(lat=5).mean()
---> 50 xr.testing.assert_identical(ds, ds2)

~/work/python/xarray/xarray/testing.py in assert_identical(a, b)
     87         assert a.identical(b), formatting.diff_array_repr(a, b, "identical")
     88     elif isinstance(a, (Dataset, Variable)):
---> 89         assert a.identical(b), formatting.diff_dataset_repr(a, b, "identical")
     90     else:
     91         raise TypeError("{} not supported by assertion comparison".format(type(a)))

AssertionError: Left and right Dataset objects are not identical

Differing coordinates:
L * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
R * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
    standard_name: latitude
    long_name: Latitude
    units: degrees_north
    axis: Y
Differing data variables:
L   air      (time, lat, lon) float32 241.2 242.5 243.5 ... 296.49 296.19 295.69
R   air      (time, lat, lon) float32 241.2 242.5 243.5 ... 296.49 296.19 295.69
    long_name: 4xDaily Air temperature at sigma level 995
    units: degK
    precision: 2
    GRIB_id: 11
    GRIB_name: TMP
    var_desc: Air temperature
    dataset: NMC Reanalysis
    level_desc: Surface
    statistic: Individual Obs
    parent_stat: Other
    actual_range: [185.16 322.1 ]
bug metadata

Most helpful comment

I just checked, and I think the issue can indeed be solved by replacing variable = self with variable = self.copy() (deep=False is the default, so no need to explicitly set it).

Also, I think _replace is a bit confusing: the name seems to imply a inplace operation, but it returns a new object without changing the original object.

All 13 comments

After some testing, I discovered that:

# Your code here
import xarray as xr

xr.set_options(keep_attrs=True) # NOTE GLOBAL OPTION!!!!

ds = xr.tutorial.load_dataset("air_temperature")
ds2 = xr.tutorial.load_dataset("air_temperature")

xr.testing.assert_identical(ds, ds2) # passes
ds.coarsen(lat=5).mean()
xr.testing.assert_identical(ds, ds2) # passes

makes your example pass. So, it seems that somewhere along the chain of functions the keep_attrs parameter is being lost or modified.

@dcherian @jukent: After a little walk through the code, I think the problem is in xarray/core/variable.py. If you look at the first part of the coarsen function in this file:

https://github.com/pydata/xarray/blob/df7b2eae3a26c1e86bd5f1dd7dab9cc8c4e53914/xarray/core/variable.py#L1945-L1953

you will see that **kwargs is not being pass into the _coarsen_reshape function:

https://github.com/pydata/xarray/blob/df7b2eae3a26c1e86bd5f1dd7dab9cc8c4e53914/xarray/core/variable.py#L1961

And down at the bottom of that function:

https://github.com/pydata/xarray/blob/df7b2eae3a26c1e86bd5f1dd7dab9cc8c4e53914/xarray/core/variable.py#L2024-L2025

it retrieves the keep_attrs parameter from global options, and doesn't check for a passed-in argument.

@jukent, why don't you draft a PR to fix this problem?

Also, while I was walking through the logic in this problem, I found that in the _reduce_method functions of the DataArrayCoarsen and DatasetCoarsen classes, the kwargs variable is being shadowed:

https://github.com/pydata/xarray/blob/df7b2eae3a26c1e86bd5f1dd7dab9cc8c4e53914/xarray/core/rolling.py#L692-L701

So, you can see that the skipna option is just being ignored. That's another pretty easy fix, but it will change existing behavior. It might be prudent to look to see if there are any known bug reports referencing the skipna parameter being ignored.

Thanks @kmpaul

These are two issues. But the third one is that the _original_ dataset ds is being modified inplace by ds.coarsen. This should never happen.

Yeah. That's true. I did overlook that. Thanks!

@dcherian Could this be because the return of variable.py/coarsen or is it likely happening earlier in the fx?

return self._replace(data=func(reshaped, axis=axes, **kwargs))

That could be the issue. If it is, you should be able to fix that using

return self.copy(data=func(reshaped, axis=axes, **kwargs))

~I think you are right. _replace calls the constructor with self.variable, ..., fastpath=True which does a direct assignment to DataArray._variable; so you have a new DataArray but with the same underlying data.~
EDIT: Ignore I was looking at dataarray.py; not variable.py

My doubts on this are because self._replace are used elsewhere in the code.

Changing from _replace to copy causes the tests on coarsen to fail, I am looking more into this now.


It seems that a condition of copy is that the data shapes match (variable.py line 947).

If I run

ds = xr.tutorial.load_dataset("air_temperature")
ds.air.coarsen(lat=5)

there is no problem, but once I add a .mean() to the end

ds = xr.tutorial.load_dataset("air_temperature")
ds.air.coarsen(lat=5).mean()

The error is

ValueError: Data shape (2920, 5, 53) must match shape of object (2920, 25, 53)

I just checked, and I think the issue can indeed be solved by replacing variable = self with variable = self.copy() (deep=False is the default, so no need to explicitly set it).

Also, I think _replace is a bit confusing: the name seems to imply a inplace operation, but it returns a new object without changing the original object.

Also, I think _replace is a bit confusing: the name seems to imply a inplace operation, but it returns a new object without changing the original object.

I think this mea culpa IIRC... I would be very open to consolidating this and .copy into something that returns a (generally shallow) copy of the object with some optional replacements. (The general approach is better than calling a constructor with every attribute, even if the names are not great)

Was this page helpful?
0 / 5 - 0 ratings