Xarray: open_mfdataset fails on variable attributes with 'list' type

Created on 19 Jun 2019 · 6 comments · Source: pydata/xarray

Using open_mfdataset on a series of netCDF files whose variables have attributes of type list fails with the following exception when those attributes have different values from one file to another:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()


    ncf = xarray.open_mfdataset(files)
      File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/backends/api.py", line 658, in open_mfdataset
        ids=ids)
      File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/combine.py", line 553, in _auto_combine
        data_vars=data_vars, coords=coords)
      File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/combine.py", line 474, in _combine_nd
        compat=compat)
      File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/combine.py", line 492, in _auto_combine_all_along_first_dim
        data_vars, coords)
      File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/combine.py", line 510, in _auto_combine_1d
        for id, ds_group in grouped_by_vars]
      File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/combine.py", line 368, in _auto_concat
        return concat(datasets, dim=dim, data_vars=data_vars, coords=coords)
      File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/combine.py", line 122, in concat
        return f(objs, dim, data_vars, coords, compat, positions)
      File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/combine.py", line 307, in _dataset_concat
        combined = concat_vars(vars, dim, positions)
      File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/variable.py", line 1982, in concat
        return Variable.concat(variables, dim, positions, shortcut)
      File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/variable.py", line 1433, in concat
        utils.remove_incompatible_items(attrs, var.attrs)
      File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/utils.py", line 184, in remove_incompatible_items
        not compat(first_dict[k], second_dict[k]))):
      File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/utils.py", line 133, in equivalent
        (pd.isnull(first) and pd.isnull(second)))
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

An example of such a variable is shown below:

    double sea_ice_fraction(time) ;
        sea_ice_fraction:least_significant_digit = 2LL ;
        sea_ice_fraction:_FillValue = 1.e+20 ;
        sea_ice_fraction:long_name = "sea ice fraction" ;
        sea_ice_fraction:standard_name = "sea_ice_fraction" ;
        sea_ice_fraction:authority = "CF 1.7" ;
        sea_ice_fraction:units = "1" ;
        sea_ice_fraction:coverage_content_type = "auxiliaryInformation" ;
        sea_ice_fraction:coordinates = "time lon lat" ;
        sea_ice_fraction:source = "CCI Sea Ice" ;
        sea_ice_fraction:institution = "ESA" ;
        string sea_ice_fraction:source_files = "ice_conc_nh_ease2-250_cdr-v2p0_199912011200.nc", "ice_conc_sh_ease2-250_cdr-v2p0_199912011200.nc" ;

The exception occurs when the source_files attribute has different values across the files in the time series I am trying to concatenate. As a workaround, I had to use the preprocess argument to remove this attribute before concatenation.
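A minimal sketch of that workaround (the drop_source_files helper name is my own; source_files is the attribute from the dump above):

```python
import xarray as xr

def drop_source_files(ds):
    # hypothetical preprocess helper: strip the list-valued attribute
    # from every variable (and the dataset itself) before xarray
    # tries to compare and combine attributes
    for var in ds.variables.values():
        var.attrs.pop("source_files", None)
    ds.attrs.pop("source_files", None)
    return ds

# with real files you would call:
#     xr.open_mfdataset(files, preprocess=drop_source_files)
# in-memory demonstration:
ds = xr.Dataset({"v": ("x", [1, 2, 3])})
ds["v"].attrs["source_files"] = ["a.nc", "b.nc"]
drop_source_files(ds)
```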

This is caused by the equivalent function in xarray/core/utils.py, which does not account for this case:

    def equivalent(first, second):
        """Compare two objects for equivalence (identity or equality), using
        array_equiv if either object is an ndarray
        """
        # TODO: refactor to avoid circular import
        from . import duck_array_ops
        if isinstance(first, np.ndarray) or isinstance(second, np.ndarray):
            return duck_array_ops.array_equiv(first, second)
        else:
            return ((first is second) or
                    (first == second) or
                    (pd.isnull(first) and pd.isnull(second)))
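The failure can be reproduced with pandas alone: for two unequal lists of length greater than one, pd.isnull returns an element-wise boolean array, and the `and` in the last clause forces that array through bool(), which raises:

```python
import pandas as pd

first = ["a.nc", "b.nc"]
second = ["c.nc", "d.nc"]
try:
    # same expression as the final clause of equivalent()
    result = ((first is second) or (first == second) or
              (pd.isnull(first) and pd.isnull(second)))
except ValueError as err:
    # ValueError: The truth value of an array with more than one
    # element is ambiguous. Use a.any() or a.all()
    result = None
```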
Labels: bug, good first issue

All 6 comments

We could probably fix this by adding another case to equivalent for lists, e.g.:

    elif isinstance(first, list) or isinstance(second, list):
        # verify both first and second are lists of the same length
        # with pairwise-equivalent elements
        return (isinstance(first, list) and isinstance(second, list)
                and len(first) == len(second)
                and all(equivalent(f, s) for f, s in zip(first, second)))

Any interest in putting together a pull request for this? See http://xarray.pydata.org/en/stable/contributing.html
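For illustration, a self-contained sketch of equivalent with such a list branch added (my own illustration under the suggestion above, not the merged fix; the real function delegates ndarray comparison to duck_array_ops.array_equiv):

```python
import numpy as np
import pandas as pd

def equivalent(first, second):
    # sketch: ndarray case approximated with np.array_equiv here
    if isinstance(first, np.ndarray) or isinstance(second, np.ndarray):
        return bool(np.array_equiv(first, second))
    # new branch: compare lists element-wise, recursively
    if isinstance(first, list) or isinstance(second, list):
        return (isinstance(first, list) and isinstance(second, list)
                and len(first) == len(second)
                and all(equivalent(f, s) for f, s in zip(first, second)))
    return bool((first is second) or (first == second) or
                (pd.isnull(first) and pd.isnull(second)))

print(equivalent(["a.nc", "b.nc"], ["a.nc", "b.nc"]))  # True
print(equivalent(["a.nc", "b.nc"], ["c.nc", "d.nc"]))  # False, no ValueError
```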

I will handle this issue this Saturday if no one else handles it by then.

Thanks @HasanAhmadQ7, that would be fantastic!

@jfpiolle Would you please share the files, or the code generating them?

As far as I understand, we cannot have an attribute of type list. The closest thing I can think of is a variable-length type, as below:

    import random

    import numpy
    from netCDF4 import Dataset

    f = Dataset("example.nc", "w")
    vlen_t = f.createVLType(numpy.int32, "phony_vlen")
    x = f.createDimension("x", 3)
    y = f.createDimension("y", 4)
    vlvar = f.createVariable("phony_vlen_var", vlen_t, ("y", "x"))
    data = []
    for n in range(0, len(x) * len(y)):
        data.append(numpy.arange(random.randint(1, 10), dtype="int32"))
    data = numpy.reshape(data, (len(y), len(x)))
    vlvar[:] = data
    f.close()

Using files generated this way, I did not get an error using open_mfdataset.

@HasanAhmadQ7 You can have attributes of type list. See for instance this code:

    import numpy
    from netCDF4 import Dataset

    f = Dataset("example.nc", "w")
    x = f.createDimension("x", 3)
    vlvar = f.createVariable("test_var", numpy.int32, ("x"))

    # here create an attribute as a list
    vlvar.test_attr = ["string a", "string b"]

    vlvar[:] = numpy.arange(3)
    f.close()

Having files with different values for the test_attr attribute will cause open_mfdataset to fail.

thanks for addressing this!

@jfpiolle Thank you very much for the clarification
