Using open_mfdataset on a series of netCDF files whose variable attributes have type list will fail with the following exception when these attributes have different values from one file to another:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
ncf = xarray.open_mfdataset(files)
  File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/backends/api.py", line 658, in open_mfdataset
    ids=ids)
  File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/combine.py", line 553, in _auto_combine
    data_vars=data_vars, coords=coords)
  File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/combine.py", line 474, in _combine_nd
    compat=compat)
  File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/combine.py", line 492, in _auto_combine_all_along_first_dim
    data_vars, coords)
  File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/combine.py", line 510, in _auto_combine_1d
    for id, ds_group in grouped_by_vars]
  File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/combine.py", line 368, in _auto_concat
    return concat(datasets, dim=dim, data_vars=data_vars, coords=coords)
  File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/combine.py", line 122, in concat
    return f(objs, dim, data_vars, coords, compat, positions)
  File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/combine.py", line 307, in _dataset_concat
    combined = concat_vars(vars, dim, positions)
  File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/variable.py", line 1982, in concat
    return Variable.concat(variables, dim, positions, shortcut)
  File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/variable.py", line 1433, in concat
    utils.remove_incompatible_items(attrs, var.attrs)
  File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/utils.py", line 184, in remove_incompatible_items
    not compat(first_dict[k], second_dict[k]))):
  File "/home/ananda/jfpiolle/miniconda2/envs/cerbere/lib/python2.7/site-packages/xarray/core/utils.py", line 133, in equivalent
    (pd.isnull(first) and pd.isnull(second)))
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
An example of such a variable is provided below:
double sea_ice_fraction(time) ;
        sea_ice_fraction:least_significant_digit = 2LL ;
        sea_ice_fraction:_FillValue = 1.e+20 ;
        sea_ice_fraction:long_name = "sea ice fraction" ;
        sea_ice_fraction:standard_name = "sea_ice_fraction" ;
        sea_ice_fraction:authority = "CF 1.7" ;
        sea_ice_fraction:units = "1" ;
        sea_ice_fraction:coverage_content_type = "auxiliaryInformation" ;
        sea_ice_fraction:coordinates = "time lon lat" ;
        sea_ice_fraction:source = "CCI Sea Ice" ;
        sea_ice_fraction:institution = "ESA" ;
        string sea_ice_fraction:source_files = "ice_conc_nh_ease2-250_cdr-v2p0_199912011200.nc", "ice_conc_sh_ease2-250_cdr-v2p0_199912011200.nc" ;
The exception occurs when the source_files attribute has different values across the files in the time series I am trying to concatenate. To avoid it, I had to use the preprocess argument to remove this attribute before concatenation.
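For reference, a preprocess callback along these lines works around the problem; the function name drop_list_attrs is my own illustrative choice, not part of xarray:

```python
def drop_list_attrs(ds):
    """Preprocess hook: strip the list-valued attribute that differs
    between files, so xarray never tries to compare it."""
    for var in ds.variables.values():
        var.attrs.pop("source_files", None)
    # the same attribute may also exist globally
    ds.attrs.pop("source_files", None)
    return ds

# ncf = xarray.open_mfdataset(files, preprocess=drop_list_attrs)
```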
This is caused by the equivalent function in xarray/core/utils.py, which does not account for this case:
def equivalent(first, second):
    """Compare two objects for equivalence (identity or equality), using
    array_equiv if either object is an ndarray
    """
    # TODO: refactor to avoid circular import
    from . import duck_array_ops
    if isinstance(first, np.ndarray) or isinstance(second, np.ndarray):
        return duck_array_ops.array_equiv(first, second)
    else:
        return ((first is second) or
                (first == second) or
                (pd.isnull(first) and pd.isnull(second)))
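To see why this raises: a list attribute fails the ndarray isinstance check, so the else branch runs. If the two lists are not equal, `first == second` is False, and pd.isnull applied to a list returns an element-wise boolean ndarray; using that array as an operand of `and` calls bool() on it, which is the ambiguous-truth-value error. A minimal sketch (the numpy array below stands in for what pd.isnull returns on a list, which is my reading of the traceback):

```python
import numpy as np

# pd.isnull(["a", "b"]) returns an element-wise boolean array like this one
mask = np.array([False, False])

try:
    mask and mask  # `and` calls bool() on a multi-element array
except ValueError as err:
    print(err)  # "The truth value of an array with more than one element is ambiguous..."
```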
We could probably fix this by adding another case to equivalent for lists, e.g.:
elif isinstance(first, list) or isinstance(second, list):
    # verify both first and second are lists of the same length with the same elements
    # (by calling equivalent() on all elements)
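A concrete version of that branch might look like the standalone sketch below; it deliberately omits the ndarray and NaN handling of the real function to show just the list comparison:

```python
def equivalent(first, second):
    """Simplified sketch of the proposed fix: compare lists element-wise,
    recursing so nested lists are handled too."""
    if isinstance(first, list) or isinstance(second, list):
        # both must be lists of the same length with pairwise-equivalent elements
        return (isinstance(first, list) and isinstance(second, list)
                and len(first) == len(second)
                and all(equivalent(f, s) for f, s in zip(first, second)))
    return first is second or first == second

print(equivalent(["string a", "string b"], ["string a", "string b"]))  # True
print(equivalent(["string a", "string b"], ["string a", "string c"]))  # False
```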
Any interest in putting together a pull request for this? See http://xarray.pydata.org/en/stable/contributing.html
I will handle this issue this Saturday if no one else handles it by then.
Thanks @HasanAhmadQ7, that would be fantastic!
@jfpiolle Would you please share the files, or the code generating them?
As far as I understand, we cannot have an attribute of type list. The closest thing I can think of is a variable-length type, as below:
from netCDF4 import Dataset
import numpy
import random

f = Dataset("example.nc", "w")
vlen_t = f.createVLType(numpy.int32, "phony_vlen")
x = f.createDimension("x", 3)
y = f.createDimension("y", 4)
vlvar = f.createVariable("phony_vlen_var", vlen_t, ("y", "x"))
data = []
for n in range(0, len(x) * len(y)):
    data.append(numpy.arange(random.randint(1, 10), dtype="int32"))
data = numpy.reshape(data, (len(y), len(x)))
vlvar[:] = data
f.close()
Using files generated this way, I did not get an error with open_mfdataset.
@HasanAhmadQ7 You can have attributes of type list. See for instance this code:
from netCDF4 import Dataset
import numpy
f = Dataset("example.nc","w")
x = f.createDimension("x",3)
vlvar = f.createVariable("test_var", numpy.int32, ("x"))
# here create an attribute as a list
vlvar.test_attr = ["string a", "string b"]
vlvar[:] = numpy.arange(3)
f.close()
Having files with different values for the test_attr attribute will cause open_mfdataset to fail.
Thanks for addressing this!
@jfpiolle Thank you very much for the clarification