import xarray

file = 'temp_048.nc'

# Works fine with open_dataset, both decoding on open and as a separate step
ds = xarray.open_dataset(file, decode_cf=True)
ds = xarray.open_dataset(file, decode_cf=False)
ds = xarray.decode_cf(ds)

# With open_mfdataset, decoding on open works ...
ds = xarray.open_mfdataset(file, decode_cf=True)
ds = xarray.open_mfdataset(file, decode_cf=False)
# ... but decoding as a separate step throws an exception
ds = xarray.decode_cf(ds)
Expected output: nothing, i.e. no error.
When opening data with open_mfdataset, calling decode_cf as a separate step throws an error, but decoding works when done as part of the open_mfdataset call itself.
The error is:
Traceback (most recent call last):
File "tmp.py", line 11, in <module>
ds = xarray.decode_cf(ds)
File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 479, in decode_cf
decode_coords, drop_variables=drop_variables, use_cftime=use_cftime)
File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 401, in decode_cf_variables
stack_char_dim=stack_char_dim, use_cftime=use_cftime)
File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/conventions.py", line 306, in decode_cf_variable
var = coder.decode(var, name=name)
File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/coding/times.py", line 419, in decode
self.use_cftime)
File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/coding/times.py", line 90, in _decode_cf_datetime_dtype
last_item(values) or [0]])
File "/g/data3/hh5/public/apps/miniconda3/envs/analysis3-19.07/lib/python3.6/site-packages/xarray/core/formatting.py", line 99, in last_item
return np.ravel(array[indexer]).tolist()
AttributeError: 'Array' object has no attribute 'tolist'
Output of xr.show_versions():
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Jul 2 2019, 02:18:42)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.21.3.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_AU.utf8
LANG: C
LOCALE: en_AU.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.12.1
pandas: 0.25.0
numpy: 1.17.0
scipy: 1.2.1
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: 1.5.5
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: 1.2.0
PseudonetCDF: None
rasterio: None
cfgrib: 0.9.7.1
iris: 2.2.1dev0
bottleneck: 1.2.1
dask: 2.2.0
distributed: 2.2.0
matplotlib: 2.2.4
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 41.0.1
pip: 19.1.1
conda: installed
pytest: 5.0.1
IPython: 7.7.0
sphinx: None
There is no error using an older version of numpy with the same xarray version:
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Feb 28 2019, 09:07:38)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.21.3.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_AU.utf8
LANG: C
LOCALE: en_AU.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.2.1
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: 1.2.0
PseudonetCDF: None
rasterio: None
cfgrib: 0.9.7
iris: 2.2.1dev0
bottleneck: 1.2.1
dask: 1.2.2
distributed: 1.28.1
matplotlib: 2.2.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 41.0.1
pip: 19.1.1
conda: installed
pytest: 4.6.3
IPython: 7.5.0
sphinx: None
Looks like the tolist() method has disappeared from something, but even in the debugger it isn't obvious to me exactly why this is happening. I can call list() on np.ravel(array[indexer]) at the same point and it works.
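The observation that list() works where .tolist() fails makes sense: list() only needs the object to be iterable, whereas .tolist() is a method that NumPy arrays provide and dask arrays deliberately omit. A minimal sketch of the distinction (the NoToList class is made up purely for illustration):

```python
# A toy array-like object that is iterable but, like a dask Array,
# provides no tolist() method.
class NoToList:
    def __init__(self, data):
        self.data = list(data)

    def __iter__(self):
        return iter(self.data)

x = NoToList([1, 2, 3])
print(list(x))               # [1, 2, 3] -- list() only requires iterability
print(hasattr(x, "tolist"))  # False -- tolist() is not part of any protocol
```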
The netCDF file I am using can be recreated from this CDL dump:
netcdf temp_048 {
dimensions:
time = UNLIMITED ; // (5 currently)
nv = 2 ;
variables:
double average_T1(time) ;
average_T1:long_name = "Start time for average period" ;
average_T1:units = "days since 1958-01-01 00:00:00" ;
average_T1:missing_value = 1.e+20 ;
average_T1:_FillValue = 1.e+20 ;
double time(time) ;
time:long_name = "time" ;
time:units = "days since 1958-01-01 00:00:00" ;
time:cartesian_axis = "T" ;
time:calendar_type = "GREGORIAN" ;
time:calendar = "GREGORIAN" ;
time:bounds = "time_bounds" ;
double time_bounds(time, nv) ;
time_bounds:long_name = "time axis boundaries" ;
time_bounds:units = "days" ;
time_bounds:missing_value = 1.e+20 ;
time_bounds:_FillValue = 1.e+20 ;
// global attributes:
:filename = "ocean.nc" ;
:title = "MOM5" ;
:grid_type = "mosaic" ;
:grid_tile = "1" ;
:history = "Wed Aug 14 16:38:53 2019: ncks -O -v average_T1 /g/data3/hh5/tmp/cosima/access-om2/1deg_jra55v13_iaf_spinup1_B1_lastcycle/output048/ocean/ocean.nc temp_048.nc" ;
:NCO = "netCDF Operators version 4.7.7 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
data:
average_T1 = 87659, 88024, 88389, 88754, 89119 ;
time = 87841.5, 88206.5, 88571.5, 88936.5, 89301.5 ;
time_bounds =
87659, 88024,
88024, 88389,
88389, 88754,
88754, 89119,
89119, 89484 ;
}
I'm having the same issue with xarray 0.12.3, numpy 1.17.0, python 3.7.3.
I think this is being thrown by dask; here is an even more minimal example:
>>> import dask.array as da
>>> da.from_array([]).tolist()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Array' object has no attribute 'tolist'
Seems that this is by design, from the dask documentation:
Dask Array doesn't implement operations like tolist that would be very inefficient for larger datasets. Likewise, it is very inefficient to iterate over a Dask array with for loops.
So dask arrays have never had a tolist method: in one code path the object is a dask array, but in the other it is not.
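For contrast, a plain NumPy array does implement tolist(), which is why the code path that ends up with an ndarray succeeds:

```python
import numpy as np

# ndarray provides tolist(), so the equivalent calls are fine
print(np.array([]).tolist())            # []
print(np.ravel(np.arange(4)).tolist())  # [0, 1, 2, 3]
```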
I still don't understand why it fails only when decode_cf is called separately. That suggests a different code path, as all the underlying packages are identical.
This is definitely due to NumPy 1.17 / __array_function__, which means that np.ravel now calls out to a dask function rather than coercing to NumPy.
I think the right fix is to add explicit casting to NumPy, e.g., np.ravel(np.asarray(array[indexer]))
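A hedged sketch of what changed: under NumPy 1.17's __array_function__ protocol, functions like np.ravel dispatch to the argument's own implementation instead of coercing to ndarray first. The DuckArray class below is a hypothetical stand-in for dask.array.Array, just to show the dispatch and why wrapping in np.asarray restores a real ndarray with a working .tolist():

```python
import numpy as np

# Hypothetical stand-in for a dask Array: because it implements
# __array_function__, np.ravel returns a DuckArray, not an ndarray.
class DuckArray:
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.ravel:
            return DuckArray(self.data.ravel())
        return NotImplemented

    def __array__(self, dtype=None, copy=None):
        # np.asarray() uses this to coerce back to a real ndarray
        return np.asarray(self.data, dtype=dtype)

duck = DuckArray([[1, 2], [3, 4]])

raveled = np.ravel(duck)
print(type(raveled).__name__)      # DuckArray -- no .tolist() here
print(hasattr(raveled, "tolist"))  # False

# The suggested fix: coerce to NumPy first, then ravel
fixed = np.ravel(np.asarray(duck))
print(fixed.tolist())              # [1, 2, 3, 4]
```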
A short term work around is to set the environment variable NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0 before importing NumPy.
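For the workaround to take effect, the variable has to be set before NumPy is imported for the first time anywhere in the process. One way is in the shell (NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0 python tmp.py); the equivalent inside a script would be something like:

```python
import os

# Must run before the first `import numpy` anywhere in the process;
# on NumPy 1.17 this disables the __array_function__ protocol
# (later NumPy releases enable the protocol unconditionally).
os.environ["NUMPY_EXPERIMENTAL_ARRAY_FUNCTION"] = "0"

import numpy as np  # imported only after the variable is set
print(np.__version__)
```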
Thanks @shoyer.
I still don't understand the different code paths between decode_cf=True and decode_cf=False + explicit call to xarray.decode_cf()
I still don't understand the different code paths between
decode_cf=True and decode_cf=False + explicit call to xarray.decode_cf()
In the first case, CF conventions decoding is done on xarray's internal lazy BackendArray objects.
In the second case, CF conventions decoding is done on dask arrays. There are a few cases where this can be slower / require loading more data to look at a small slice of array data.
Thanks for the explanation.
Confirmed that NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0 fixes this issue for me.
Have submitted a PR with your suggested fix https://github.com/pydata/xarray/pull/3220
Confirmed the submitted code fixes my issue.