Xarray: Documentation of DataArray does not warn that inferring dimension names is deprecated

Created on 2 Mar 2020 · 8 Comments · Source: pydata/xarray

The documentation states:

coords...
...
Additionally, it is possible to define a coord whose name does not match the dimension name, or a coord based on multiple dimensions, with one of the following notations:

  • mapping {coord name: DataArray}
  • mapping {coord name: Variable}
  • mapping {coord name: (dimension name, array-like)}
  • mapping {coord name: (tuple of dimension names, array-like)}

dims (hashable or sequence of hashable, optional) – Name(s) of the data dimension(s). Must be either a hashable (only for 1D data) or a sequence of hashables with length equal to the number of dimensions. If this argument is omitted, dimension names are taken from coords (if possible) and otherwise default to ['dim_0', ... 'dim_n'].

This no longer seems to be the case: with dict-like coords, dimension names are not inferred and a ValueError is raised instead.

MCVE Code Sample

da = xr.DataArray(np.zeros((2, 2)), coords={'x': [1, 2], 'y': [1, 2]})
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-0d63eed9a72f> in <module>
----> 1 da = xr.DataArray(np.zeros((2, 2)), coords={'x': [1, 2], 'y': [1, 2]})

/scratch/local/lib/python3.8/site-packages/xarray/core/dataarray.py in __init__(self, data, coords, dims, name, attrs, encoding, indexes, fastpath)
    366             data = _check_data_shape(data, coords, dims)
    367             data = as_compatible_data(data)
--> 368             coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
    369             variable = Variable(dims, data, attrs, encoding, fastpath=True)
    370 

/scratch/local/lib/python3.8/site-packages/xarray/core/dataarray.py in _infer_coords_and_dims(shape, coords, dims)
    105             if utils.is_dict_like(coords):
    106                 # deprecated in GH993, removed in GH1539
--> 107                 raise ValueError(
    108                     "inferring DataArray dimensions from "
    109                     "dictionary like ``coords`` is no longer "

ValueError: inferring DataArray dimensions from dictionary like ``coords`` is no longer supported. Use an explicit list of ``dims`` instead.
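
As the error message suggests, passing dims explicitly alongside the dict-like coords avoids the error. A minimal sketch, reusing the data from the sample above; the explicit dims list is the only addition:

```python
import numpy as np
import xarray as xr

# Same data and coords as the failing call, plus an explicit list of dims.
da = xr.DataArray(
    np.zeros((2, 2)),
    coords={"x": [1, 2], "y": [1, 2]},
    dims=["x", "y"],
)
print(da.dims)  # ('x', 'y')
```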

Expected Output

An update of the documentation to correctly specify the current behavior. (I'll propose a PR later today)

Problem Description

A mismatch between the API specification and its behavior seems like a problem to me :)

Output of xr.show_versions()


xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.1 (default, Feb 11 2020, 12:54:25)
[GCC 5.4.0 20160609]
python-bits: 64
OS: Linux
OS-release: 4.15.0-88-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_IE.UTF-8
LOCALE: en_IE.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3

xarray: 0.14.1
pandas: 0.25.3
numpy: 1.18.0
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.1.2
cartopy: None
seaborn: None
numbagg: None
setuptools: 45.2.0
pip: 20.0.2
conda: None
pytest: None
IPython: 7.11.1
sphinx: 2.3.1

All 8 comments

Agree! Thanks for spotting this, @sjvrijn. A PR is appreciated, thanks for the offer!

For the PR: in general we don't need to document deprecated behavior, so you can remove that description rather than caveat it.

@max-sixty Thanks for the tip. In the end it meant just changing the last line on dims. The paragraph on the coords argument is still valid after all.

On a related note: according to #727 (PR #993), this was deprecated because key order in dictionaries was arbitrary at the time of that issue. However, dictionary order has been fixed since Python 3.7, as noted in the Python documentation:

_Changed in version 3.7_: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6.
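
The ordering guarantee quoted above can be checked with plain dictionaries. This is only an illustration of the language behavior on Python 3.7+, not of anything xarray does internally; the variable names are made up for the example:

```python
# Keys are returned in insertion order (guaranteed from Python 3.7).
coords = {"x": [1, 2], "y": [1, 2]}
print(list(coords))  # ['x', 'y']

# Inserting the same keys in the opposite order yields the opposite order.
swapped = {"y": [1, 2], "x": [1, 2]}
print(list(swapped))  # ['y', 'x']
```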

I guess it's still too soon to 'un-deprecate' this behavior again? 👼

> I guess it's still too soon to 'un-deprecate' this behavior again?

Ha! We do support 3.6. I think that almost everyone is using xarray with CPython, so it's almost certainly safe. But maybe it's not worth the tradeoff given that the benefit is marginal. What are your thoughts?

I think that inferring dimension-names from the coords-dict is the most intuitive way to define a DataArray.

Passing a dictionary for coords is in my opinion the clearest way to indicate which coordinates belong to which dimension, so then why do I have to specify the same dimension names again?

An example of how I create them from my current project:

values = xr.DataArray(
    values,
    coords={'n_high': n_highs,
            'n_low': n_lows,
            'rep': repetitions,
            'model': models,
            'idx': range(n_test_samples),},
    dims=['n_high', 'n_low', 'rep', 'model', 'idx'],  # <-- repeated dim names
    attrs=attributes,
)

If you expect almost everyone to use CPython or 3.7+ anyway, then I don't actually see any drawbacks, while it would regularly make code shorter and less repetitive.
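
Until dict-based inference is available, one way to avoid typing the names twice is to derive dims from the keys of the coords dictionary. A rough sketch, assuming Python 3.7+ (or CPython 3.6) so that key order is insertion order; the coordinate values and sizes below are placeholders, not the ones from the project above:

```python
import numpy as np
import xarray as xr

# Placeholder coordinates; names mirror the example above, values are made up.
coords = {
    "n_high": [10, 20],
    "n_low": [1, 2, 3],
    "rep": [0, 1, 2, 3],
}
values = np.zeros(tuple(len(v) for v in coords.values()))

# list(coords) yields the keys in insertion order, so the dimension
# names only have to be written once.
da = xr.DataArray(values, coords=coords, dims=list(coords))
print(da.dims)  # ('n_high', 'n_low', 'rep')
```

This only works when every key in coords is a dimension coordinate and the keys are inserted in the same order as the axes of the data.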

I definitely agree it's better if possible. I think the tradeoff is between better for almost everyone vs the potential of a break for a small minority.

Currently we support Python versions for 42 months, and Python 3.6 was released in December 2016. That means we support it until July 2020. Given that timeframe, I think we wait until then and add this back?
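
The 42-month window can be worked out with a few lines of date arithmetic. The sketch below assumes the Python 3.6.0 release date of 23 December 2016, which comes from Python's release history rather than from this thread; `add_months` is a helper written for this example:

```python
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, keeping the day of the month."""
    index = start.year * 12 + (start.month - 1) + months
    return start.replace(year=index // 12, month=index % 12 + 1)


python36_release = date(2016, 12, 23)  # assumed release date of 3.6.0
window_closes = add_months(python36_release, 42)
print(window_closes)  # 2020-06-23
```

The window closes in late June 2020, which is consistent with dropping 3.6 from July 2020 onwards.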

Waiting a few more months until it will definitely not be a problem for anyone seems fair to me :+1:

Note that inferring dimensions from coords when it is a list of tuples does still work (with no deprecation warning):

In [1]: import numpy as np, xarray as xr

In [2]: xr.DataArray(np.zeros((2, 2)), coords=[('x', [1, 2]), ('y', [1, 2])])
Out[2]: 
<xarray.DataArray (x: 2, y: 2)>
array([[0., 0.],
       [0., 0.]])
Coordinates:
  * x        (x) int64 1 2
  * y        (y) int64 1 2