Xarray: decode_cf called on mfdataset throws error: 'Array' object has no attribute 'tolist'

Created on 14 Aug 2019 · 9 Comments · Source: pydata/xarray

MCVE Code Sample

import xarray

file = 'temp_048.nc'

# Works ok with open_dataset
ds = xarray.open_dataset(file, decode_cf=True)
ds = xarray.open_dataset(file, decode_cf=False)
ds = xarray.decode_cf(ds)

# With open_mfdataset, decoding as part of the call works
ds = xarray.open_mfdataset(file, decode_cf=True)
# ... but decoding as a separate step fails
ds = xarray.open_mfdataset(file, decode_cf=False)
# This line throws an exception
ds = xarray.decode_cf(ds)

Expected Output

Nothing

Problem Description


When a dataset is opened with open_mfdataset, calling decode_cf as a separate step throws an error, but decoding works when it is done as part of the open_mfdataset call (decode_cf=True).
The error is:

Traceback (most recent call last):
  File "tmp.py", line 11, in <module>
    ds = xarray.decode_cf(ds)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 479, in decode_cf
    decode_coords, drop_variables=drop_variables, use_cftime=use_cftime)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 401, in decode_cf_variables
    stack_char_dim=stack_char_dim, use_cftime=use_cftime)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 306, in decode_cf_variable
    var = coder.decode(var, name=name)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/coding/times.py", line 419, in decode
    self.use_cftime)
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/coding/times.py", line 90, in _decode_cf_datetime_dtype
    last_item(values) or [0]])
  File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/core/formatting.py", line 99, in last_item
    return np.ravel(array[indexer]).tolist()
AttributeError: 'Array' object has no attribute 'tolist'

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.7 | packaged by conda-forge | (default, Jul 2 2019, 02:18:42)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.21.3.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_AU.utf8
LANG: C
LOCALE: en_AU.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.1
pandas: 0.25.0
numpy: 1.17.0
scipy: 1.2.1
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: 1.5.5
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: 1.2.0
PseudonetCDF: None
rasterio: None
cfgrib: 0.9.7.1
iris: 2.2.1dev0
bottleneck: 1.2.1
dask: 2.2.0
distributed: 2.2.0
matplotlib: 2.2.4
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 41.0.1
pip: 19.1.1
conda: installed
pytest: 5.0.1
IPython: 7.7.0
sphinx: None

There is no error using an older version of numpy with the same xarray version:

INSTALLED VERSIONS

commit: None
python: 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.21.3.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_AU.utf8
LANG: C
LOCALE: en_AU.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.2.1
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: 1.2.0
PseudonetCDF: None
rasterio: None
cfgrib: 0.9.7
iris: 2.2.1dev0
bottleneck: 1.2.1
dask: 1.2.2
distributed: 1.28.1
matplotlib: 2.2.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 41.0.1
pip: 19.1.1
conda: installed
pytest: 4.6.3
IPython: 7.5.0
sphinx: None

It looks like the tolist() method has disappeared from something, but even in the debugger it isn't obvious to me exactly why this is happening. I can call list on np.ravel(array[indexer]) at the same point and it works.

The netcdf file I am using can be recreated from this CDL dump

netcdf temp_048 {
dimensions:
        time = UNLIMITED ; // (5 currently)
        nv = 2 ;
variables:
        double average_T1(time) ;
                average_T1:long_name = "Start time for average period" ;
                average_T1:units = "days since 1958-01-01 00:00:00" ;
                average_T1:missing_value = 1.e+20 ;
                average_T1:_FillValue = 1.e+20 ;
        double time(time) ;
                time:long_name = "time" ;
                time:units = "days since 1958-01-01 00:00:00" ;
                time:cartesian_axis = "T" ;
                time:calendar_type = "GREGORIAN" ;
                time:calendar = "GREGORIAN" ;
                time:bounds = "time_bounds" ;
        double time_bounds(time, nv) ;
                time_bounds:long_name = "time axis boundaries" ;
                time_bounds:units = "days" ;
                time_bounds:missing_value = 1.e+20 ;
                time_bounds:_FillValue = 1.e+20 ;

// global attributes:
                :filename = "ocean.nc" ;
                :title = "MOM5" ;
                :grid_type = "mosaic" ;
                :grid_tile = "1" ;
                :history = "Wed Aug 14 16:38:53 2019: ncks -O -v average_T1 /g/data3/hh5/tmp/cosima/access-om2/1deg_jra55v13_iaf_spinup1_B1_lastcycle/output048/ocean/ocean.nc temp_048.nc" ;
                :NCO = "netCDF Operators version 4.7.7 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
data:

 average_T1 = 87659, 88024, 88389, 88754, 89119 ;

 time = 87841.5, 88206.5, 88571.5, 88936.5, 89301.5 ;

 time_bounds =
  87659, 88024,
  88024, 88389,
  88389, 88754,
  88754, 89119,
  89119, 89484 ;
}

All 9 comments

I'm having the same issue with xarray 0.12.3, numpy 1.17.0, python 3.7.3.

I think this is being thrown by dask; here is an even more minimal example:

>>> import numpy as np
>>> import dask.array as da
>>> da.from_array(np.array([]), chunks=1).tolist()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Array' object has no attribute 'tolist'

It seems that this is by design. From the dask documentation:

Dask Array doesn't implement operations like tolist that would be very inefficient for larger datasets. Likewise, it is very inefficient to iterate over a Dask array with for loops

So dask has never had a tolist method, which means the object reaching last_item is a dask array in the failing case but not in the working one.

I still don't understand why it fails when decode_cf is called separately. That suggests there is a different code path, since all the underlying packages are identical.

This is definitely due to NumPy 1.17 / __array_function__, which means that np.ravel now calls out to a dask function rather than coercing to NumPy.

I think the right fix is to add explicit casting to NumPy, e.g., np.ravel(np.asarray(array[indexer]))
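To make the suggested cast concrete, here is a minimal sketch that does not need dask. DuckArray is a hypothetical stand-in (not a dask or xarray class) that implements __array_function__ so that np.ravel returns another DuckArray without a tolist method, which is roughly what happens with a dask array under NumPy 1.17. The two last_item variants are modelled on the line in the traceback and are not a copy of xarray's source.

```python
import numpy as np


class DuckArray:
    """Hypothetical stand-in for a dask array: array-like, but no tolist()."""

    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def ndim(self):
        return self._data.ndim

    def __getitem__(self, key):
        return DuckArray(self._data[key])

    def __array__(self, dtype=None, copy=None):
        # Used by np.asarray to coerce this object to a real ndarray.
        return np.asarray(self._data, dtype=dtype)

    def __array_function__(self, func, types, args, kwargs):
        # With the protocol enabled, np.ravel is dispatched here
        # instead of coercing the argument to a NumPy array.
        if func is np.ravel:
            return DuckArray(self._data.ravel())
        return NotImplemented


def last_item_unsafe(array):
    # Same shape as the failing line: np.ravel may return a duck array.
    indexer = (slice(-1, None),) * array.ndim
    return np.ravel(array[indexer]).tolist()


def last_item_fixed(array):
    # Explicit cast first, so np.ravel always sees a plain ndarray.
    indexer = (slice(-1, None),) * array.ndim
    return np.ravel(np.asarray(array[indexer])).tolist()


arr = DuckArray([[1.0, 2.0], [3.0, 4.0]])
print(last_item_fixed(arr))
```

With the protocol active, last_item_unsafe(arr) raises AttributeError because the raveled result is still a DuckArray, while last_item_fixed(arr) returns a plain Python list.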

A short term work around is to set the environment variable NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0 before importing NumPy.
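As a sketch of that workaround from inside a script rather than the shell: the variable has to be set before the first import of NumPy anywhere in the process, including indirect imports through xarray or dask. This assumes a NumPy 1.17-era release; newer NumPy releases may no longer honour the variable.

```python
import os

# Must happen before NumPy is imported for the first time in this process.
os.environ["NUMPY_EXPERIMENTAL_ARRAY_FUNCTION"] = "0"

import numpy as np  # noqa: E402  (imported only after the variable is set)

# Any later imports that pull in NumPy (xarray, dask, ...) go below this point.
print(os.environ["NUMPY_EXPERIMENTAL_ARRAY_FUNCTION"])
```

Setting the variable in the shell before starting Python has the same effect and avoids any dependence on import order.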

Thanks @shoyer.

I still don't understand the different code paths between decode_cf=True and decode_cf=False + explicit call to xarray.decode_cf()

I still don't understand the different code paths between decode_cf=True and decode_cf=False + explicit call to xarray.decode_cf()

In the first case, CF conventions decoding is done on xarray's internal lazy BackendArray objects.

In the second case, CF conventions decoding is done on dask arrays. There are a few cases where this can be slower / require loading more data to look at a small slice of array data.

Thanks for the explanation.

Confirmed that NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0 fixes this issue for me.

Have submitted a PR with your suggested fix https://github.com/pydata/xarray/pull/3220

Confirmed the submitted code fixes my issue.
