import xarray

file = 'temp_048.nc'

# Works fine with open_dataset, both decoding on open and as a separate step
ds = xarray.open_dataset(file, decode_cf=True)
ds = xarray.open_dataset(file, decode_cf=False)
ds = xarray.decode_cf(ds)

# With open_mfdataset, decoding on open works ...
ds = xarray.open_mfdataset(file, decode_cf=True)
ds = xarray.open_mfdataset(file, decode_cf=False)
# ... but decoding as a separate step throws an exception
ds = xarray.decode_cf(ds)
Expected output: nothing, i.e. no error.
When opening data with open_mfdataset, calling decode_cf as a separate step throws an error, but decoding works when done as part of the open_mfdataset call itself.
The error is:
Traceback (most recent call last):
File "tmp.py", line 11, in <module>
ds = xarray.decode_cf(ds)
File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 479, in decode_cf
decode_coords, drop_variables=drop_variables, use_cftime=use_cftime)
File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 401, in decode_cf_variables
stack_char_dim=stack_char_dim, use_cftime=use_cftime)
File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 306, in decode_cf_variable
var = coder.decode(var, name=name)
File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/coding/times.py", line 419, in decode
self.use_cftime)
File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/coding/times.py", line 90, in _decode_cf_datetime_dtype
last_item(values) or [0]])
File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/core/formatting.py", line 99, in last_item
return np.ravel(array[indexer]).tolist()
AttributeError: 'Array' object has no attribute 'tolist'
Output of xr.show_versions():
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Jul 2 2019, 02:18:42)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.21.3.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_AU.utf8
LANG: C
LOCALE: en_AU.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.12.1
pandas: 0.25.0
numpy: 1.17.0
scipy: 1.2.1
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: 1.5.5
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: 1.2.0
PseudonetCDF: None
rasterio: None
cfgrib: 0.9.7.1
iris: 2.2.1dev0
bottleneck: 1.2.1
dask: 2.2.0
distributed: 2.2.0
matplotlib: 2.2.4
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 41.0.1
pip: 19.1.1
conda: installed
pytest: 5.0.1
IPython: 7.7.0
sphinx: None
There is no error using an older version of numpy with the same xarray version:
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.21.3.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_AU.utf8
LANG: C
LOCALE: en_AU.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.2.1
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: 1.2.0
PseudonetCDF: None
rasterio: None
cfgrib: 0.9.7
iris: 2.2.1dev0
bottleneck: 1.2.1
dask: 1.2.2
distributed: 1.28.1
matplotlib: 2.2.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 41.0.1
pip: 19.1.1
conda: installed
pytest: 4.6.3
IPython: 7.5.0
sphinx: None
Looks like the tolist() method has disappeared from something, but even in the debugger it isn't obvious to me exactly why this is happening. I can call list() on np.ravel(array[indexer]) at the same point and it works.
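The observation that list() works where .tolist() fails makes sense: list() only needs the object to be iterable, whereas .tolist() is a method that NumPy arrays provide and dask arrays deliberately omit. A minimal sketch of the distinction (the NoToList class is made up purely for illustration):

```python
# A toy array-like object that is iterable but, like a dask Array,
# provides no tolist() method.
class NoToList:
    def __init__(self, data):
        self.data = list(data)

    def __iter__(self):
        return iter(self.data)

x = NoToList([1, 2, 3])
print(list(x))               # [1, 2, 3] -- list() only requires iterability
print(hasattr(x, "tolist"))  # False -- tolist() is not part of any protocol
```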
The netCDF file I am using can be recreated from this CDL dump:
netcdf temp_048 {
dimensions:
time = UNLIMITED ; // (5 currently)
nv = 2 ;
variables:
double average_T1(time) ;
average_T1:long_name = "Start time for average period" ;
average_T1:units = "days since 1958-01-01 00:00:00" ;
average_T1:missing_value = 1.e+20 ;
average_T1:_FillValue = 1.e+20 ;
double time(time) ;
time:long_name = "time" ;
time:units = "days since 1958-01-01 00:00:00" ;
time:cartesian_axis = "T" ;
time:calendar_type = "GREGORIAN" ;
time:calendar = "GREGORIAN" ;
time:bounds = "time_bounds" ;
double time_bounds(time, nv) ;
time_bounds:long_name = "time axis boundaries" ;
time_bounds:units = "days" ;
time_bounds:missing_value = 1.e+20 ;
time_bounds:_FillValue = 1.e+20 ;
// global attributes:
:filename = "ocean.nc" ;
:title = "MOM5" ;
:grid_type = "mosaic" ;
:grid_tile = "1" ;
:history = "Wed Aug 14 16:38:53 2019: ncks -O -v average_T1 /g/data3/hh5/tmp/cosima/access-om2/1deg_jra55v13_iaf_spinup1_B1_lastcycle/output048/ocean/ocean.nc temp_048.nc" ;
:NCO = "netCDF Operators version 4.7.7 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
data:
average_T1 = 87659, 88024, 88389, 88754, 89119 ;
time = 87841.5, 88206.5, 88571.5, 88936.5, 89301.5 ;
time_bounds =
87659, 88024,
88024, 88389,
88389, 88754,
88754, 89119,
89119, 89484 ;
}
I'm having the same issue with xarray 0.12.3, numpy 1.17.0, python 3.7.3.
I think this is being thrown by dask; here is an even more minimal example:
>>> import dask.array as da
>>> da.from_array([]).tolist()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Array' object has no attribute 'tolist'
Seems that this is by design, from the dask documentation:
Dask Array doesn't implement operations like tolist that would be very inefficient for larger datasets. Likewise, it is very inefficient to iterate over a Dask array with for loops.
So dask arrays have never had a tolist method: in one code path the object is a dask array, but in the other it is not.
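For contrast, a plain NumPy array does implement tolist(), which is why the code path that ends up with an ndarray succeeds:

```python
import numpy as np

# ndarray provides tolist(), so the equivalent calls are fine
print(np.array([]).tolist())            # []
print(np.ravel(np.arange(4)).tolist())  # [0, 1, 2, 3]
```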
I still don't understand why it fails only when decode_cf is called separately. That suggests a different code path, as all the underlying packages are identical.
This is definitely due to NumPy 1.17 / __array_function__, which means that np.ravel now calls out to a dask function rather than coercing to NumPy.
I think the right fix is to add explicit casting to NumPy, e.g., np.ravel(np.asarray(array[indexer]))
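A hedged sketch of what changed: under NumPy 1.17's __array_function__ protocol, functions like np.ravel dispatch to the argument's own implementation instead of coercing to ndarray first. The DuckArray class below is a hypothetical stand-in for dask.array.Array, just to show the dispatch and why wrapping in np.asarray restores a real ndarray with a working .tolist():

```python
import numpy as np

# Hypothetical stand-in for a dask Array: because it implements
# __array_function__, np.ravel returns a DuckArray, not an ndarray.
class DuckArray:
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.ravel:
            return DuckArray(self.data.ravel())
        return NotImplemented

    def __array__(self, dtype=None, copy=None):
        # np.asarray() uses this to coerce back to a real ndarray
        return np.asarray(self.data, dtype=dtype)

duck = DuckArray([[1, 2], [3, 4]])

raveled = np.ravel(duck)
print(type(raveled).__name__)      # DuckArray -- no .tolist() here
print(hasattr(raveled, "tolist"))  # False

# The suggested fix: coerce to NumPy first, then ravel
fixed = np.ravel(np.asarray(duck))
print(fixed.tolist())              # [1, 2, 3, 4]
```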
A short term work around is to set the environment variable NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0 before importing NumPy.
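For the workaround to take effect, the variable has to be set before NumPy is imported for the first time anywhere in the process. One way is in the shell (NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0 python tmp.py); the equivalent inside a script would be something like:

```python
import os

# Must run before the first `import numpy` anywhere in the process;
# on NumPy 1.17 this disables the __array_function__ protocol
# (later NumPy releases enable the protocol unconditionally).
os.environ["NUMPY_EXPERIMENTAL_ARRAY_FUNCTION"] = "0"

import numpy as np  # imported only after the variable is set
print(np.__version__)
```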
Thanks @shoyer.
I still don't understand the different code paths between decode_cf=True and decode_cf=False + explicit call to xarray.decode_cf()
I still don't understand the different code paths between
decode_cf=True and decode_cf=False + explicit call to xarray.decode_cf()
In the first case, CF conventions decoding is done on xarray's internal lazy BackendArray objects.
In the second case, CF conventions decoding is done on dask arrays. There are a few cases where this can be slower / require loading more data to look at a small slice of array data.
Thanks for the explanation.
Confirmed that NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0 fixes this issue for me.
Have submitted a PR with your suggested fix https://github.com/pydata/xarray/pull/3220
Confirmed the submitted code fixes my issue.