Xarray: Keep attributes across operations

Created on 29 Nov 2018 · 6 Comments · Source: pydata/xarray

The Problem

When I have two DataArrays and apply a standard arithmetic operation (+, -, *, /), the attributes vanish. I think that should not be the case. It happens even when using set_options, as suggested:

import numpy as np
import xarray as xr
a = xr.DataArray(np.random.randn(3,3), dims=('x','y'), name='temp', attrs={'units':'K'})
b = xr.DataArray(np.random.randn(3,3), dims=('x','y'), name='temp', attrs={'units':'K'})
print(a)
<xarray.DataArray 'temp' (x: 3, y: 3)>
array([[ 1.207407, -1.9429  ,  3.168454],
       [-0.773912, -0.121835, -0.139538],
       [ 1.823002,  0.185846,  0.53569 ]])
Dimensions without coordinates: x, y
Attributes:
    units:    K
print(a-b)
<xarray.DataArray 'temp' (x: 3, y: 3)>
array([[ 1.280892, -1.097781,  2.150318],
       [-0.208202, -0.03856 ,  0.805856],
       [ 2.192506,  1.049181,  2.277078]])
Dimensions without coordinates: x, y

with xr.set_options(keep_attrs=True):
    print(a-b)

<xarray.DataArray 'temp' (x: 3, y: 3)>
array([[ 1.280892, -1.097781,  2.150318],
       [-0.208202, -0.03856 ,  0.805856],
       [ 2.192506,  1.049181,  2.277078]])
Dimensions without coordinates: x, y

Problem description

Attributes vanish when an ordinary arithmetic operation is applied, even with keep_attrs=True!
From the docs of set_options:
keep_attrs: rule for whether to keep attributes on xarray
Datasets/dataarrays after operations. Either True to always keep
attrs, False to always discard them, or 'default' to use original
logic that attrs should only be kept in unambiguous circumstances.
Default: 'default'.

Expected Output

The attributes should remain. Maybe keep only the attributes from the left array?
Please adjust or advise me.
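
As a stopgap, the "keep the left array's attributes" behaviour can be reproduced by hand, independently of set_options. This is a minimal sketch that reuses the array names from the example above; copying from the left operand is an assumption made for illustration, not something xarray does on its own here.

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.random.randn(3, 3), dims=('x', 'y'), name='temp', attrs={'units': 'K'})
b = xr.DataArray(np.random.randn(3, 3), dims=('x', 'y'), name='temp', attrs={'units': 'K'})

result = a - b
# Copy the left operand's attributes onto the result explicitly.
# dict(...) makes a shallow copy, so later edits to result.attrs leave a.attrs alone.
result.attrs = dict(a.attrs)
print(result.attrs)
```

This only makes sense when the operation does not change the meaning of the attributes: a difference of two temperatures in K is still in K, but a product would not be.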

Output of xr.show_versions()


```
xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-39-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.11.0
pandas: 0.23.4
numpy: 1.15.4
scipy: 1.1.0
netCDF4: 1.4.2
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.2.1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.20.2
distributed: 1.24.2
matplotlib: 3.0.1
cartopy: 0.16.0
seaborn: 0.9.0
setuptools: 40.6.2
pip: 18.1
conda: 4.5.11
pytest: 4.0.0
IPython: 7.1.1
sphinx: 1.8.2
```

bug

Most helpful comment

Thanks for the quick reply.
Not sure what a PR is. (Sorry, I'm not that advanced in coding.)
Judging from code you use in other places, I figure something like this

@staticmethod
def _binary_op(f, reflexive=False, **ignored_kwargs):
    @functools.wraps(f)
    def func(self, other):
        if isinstance(other, (xr.DataArray, xr.Dataset)):
            return NotImplemented
        self_data, other_data, dims = _broadcast_compat_data(self, other)
        # Proposed addition: keep self's attrs when keep_attrs is set
        keep_attrs = _get_keep_attrs(default=False)
        attrs = self._attrs if keep_attrs else None

        with np.errstate(all='ignore'):
            new_data = (f(self_data, other_data)
                        if not reflexive
                        else f(other_data, self_data))
        result = Variable(dims, new_data, attrs=attrs)
        return result
    return func

should do the trick, right?
I cloned the latest version and tried out the new code. It works! :)
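
The mechanism in the snippet above can be shown in isolation with a toy model in plain Python. Everything in this sketch (ToyVariable, OPTIONS, and this set_options) is a hypothetical stand-in written for illustration, not xarray's actual classes or option machinery. It only shows how a flag read at call time decides whether the left operand's attributes are passed to the result.

```python
import contextlib
import functools
import operator

# Toy stand-ins, NOT xarray's real classes or option machinery.
OPTIONS = {"keep_attrs": False}


@contextlib.contextmanager
def set_options(**kwargs):
    """Temporarily override entries in OPTIONS, restoring them on exit."""
    old = dict(OPTIONS)
    OPTIONS.update(kwargs)
    try:
        yield
    finally:
        OPTIONS.clear()
        OPTIONS.update(old)


class ToyVariable:
    def __init__(self, data, attrs=None):
        self.data = list(data)
        self.attrs = dict(attrs or {})

    @staticmethod
    def _binary_op(f, reflexive=False):
        @functools.wraps(f)
        def func(self, other):
            if not isinstance(other, ToyVariable):
                return NotImplemented
            # Read the option at call time; keep only the left operand's attrs.
            attrs = self.attrs if OPTIONS["keep_attrs"] else None
            pairs = zip(self.data, other.data)
            new_data = [f(y, x) if reflexive else f(x, y) for x, y in pairs]
            return ToyVariable(new_data, attrs=attrs)
        return func


# Attach the operator after the class body, where the staticmethod is callable.
ToyVariable.__sub__ = ToyVariable._binary_op(operator.sub)

a = ToyVariable([3.0, 2.0], attrs={"units": "K"})
b = ToyVariable([1.0, 1.0], attrs={"units": "K"})

print((a - b).attrs)              # option off: {}
with set_options(keep_attrs=True):
    print((a - b).attrs)          # option on: {'units': 'K'}
```

Reading the option inside func rather than when the operator is created is what lets the context manager take effect for operations performed inside the with block.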


xr.show_versions()

INSTALLED VERSIONS

commit: 0d6056e8816e3d367a64f36c7f1a5c4e1ce4ed4e
python: 3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-39-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.1

xarray: 0.11.0+10.g0d6056e8.dirty
pandas: 0.23.4
numpy: 1.15.4
scipy: 1.1.0
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.2.1
PseudonetCDF: None
rasterio: None
cfgrib: installed
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.20.2
distributed: 1.24.2
matplotlib: 3.0.1
cartopy: 0.16.0
seaborn: 0.9.0
setuptools: 40.6.2
pip: 18.1
conda: 4.5.11
pytest: 4.0.0
IPython: 7.1.1
sphinx: 1.8.2

When the option is not set, the behavior is the same as before:

print(a-b)                                                           
<xarray.DataArray 'temp' (x: 3, y: 3)>
array([[ 0.133102, -1.275794,  1.331784],
       [ 0.995555, -0.509624,  0.188597],
       [ 1.922048, -0.053253, -0.293245]])
Dimensions without coordinates: x, y

With the option set:

with xr.set_options(keep_attrs=True):
    print(a-b)

<xarray.DataArray 'temp' (x: 3, y: 3)>
array([[ 0.133102, -1.275794,  1.331784],
       [ 0.995555, -0.509624,  0.188597],
       [ 1.922048, -0.053253, -0.293245]])
Dimensions without coordinates: x, y
Attributes:
    units:    K

It works. Hope that helps you.

All 6 comments

Thanks for the report! It looks like we definitely overlooked this in arithmetic operations. I agree that keep_attrs=True should mean that attributes are maintained in arithmetic.

Any interest in putting together a PR?

> Not sure what a PR is. (Sorry I'm not that advanced in coding)

A PR is a pull request! If you open a PR with your code, we can merge it into the repo. It would be greatly appreciated, and you'd be an xarray contributor. Let us know if we can help guide you through the mechanics.

@MBlaschek This might help: https://help.github.com/articles/proposing-changes-to-your-work-with-pull-requests/. You'd start by creating a fork, then create a branch with your changes, push it to GitHub, and then open a pull request.

Hi @MBlaschek, almost there! You'll need to open your pull request in this repository :).

You'll also need to add some tests to make sure your changes keep working as the code is updated in the future. E.g. https://github.com/pydata/xarray/blob/0d6056e8816e3d367a64f36c7f1a5c4e1ce4ed4e/xarray/tests/test_variable.py#L1533
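
A regression test for this change could look roughly like the sketch below. It is not the test from the linked file or from the eventual pull request, and the function name is made up for illustration. It assumes an xarray version in which arithmetic honors keep_attrs, and it gives both operands identical attributes so the expected result does not depend on how differing attributes would be combined.

```python
import numpy as np
import xarray as xr


def test_subtraction_keeps_attrs():
    # Hypothetical test name and attribute values, chosen for illustration.
    attrs = {'units': 'K', 'long_name': 'temperature'}
    a = xr.Variable(['x', 'y'], np.random.randn(3, 3), attrs)
    b = xr.Variable(['x', 'y'], np.random.randn(3, 3), attrs)

    # With the option enabled, the result should carry the attributes.
    with xr.set_options(keep_attrs=True):
        d = a - b
    assert dict(d.attrs) == attrs


# pytest would collect the function by its name; calling it directly also works.
test_subtraction_keeps_attrs()
```

The sketch deliberately checks only the keep_attrs=True case, since what happens without the option is a matter of xarray's defaults rather than of this fix.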

Hi.
OK, sorry. I had no idea what I was doing. I hope I have fixed it the way you wanted. I added a test routine, test_binary_ops_keep_attrs,
and created a new pull request, as I could not reopen the old one.
