Xarray: Set one-dimensional data variable as dimension coordinate?

Created on 4 Oct 2018 · 13 comments · Source: pydata/xarray

Code Sample

I have this dataset, and I'd like to make it indexable by time:

<xarray.Dataset>
Dimensions:                (station_observations: 46862)
Dimensions without coordinates: station_observations
Data variables:
    time                   (station_observations) datetime64[ns] ...
    SNOW_ON_THE_GROUND     (station_observations) float64 ...
    ONE_DAY_SNOW           (station_observations) float64 ...
    ONE_DAY_RAIN           (station_observations) float64 ...
    ONE_DAY_PRECIPITATION  (station_observations) float64 ...
    MIN_TEMP               (station_observations) float64 ...
    MAX_TEMP               (station_observations) float64 ...
Attributes:
    elevation:     15.0

Problem description

I expected to be able to use ds.set_coords to make the time variable an indexable coordinate. The variable IS converted to a coordinate, but it is not a dimension coordinate, so I can't index with it. I can use assign_coords(station_observations=ds.time) to make station_observations indexable by time, but then the name is semantically wrong, and the time variable still exists, which makes the code harder to maintain.

Expected Output

ds.set_coords('time', inplace=True)
<xarray.Dataset>
Dimensions:                (station_observations: 46862)
Coordinates:
    time                   (station_observations) datetime64[ns] ...
Dimensions without coordinates: station_observations
Data variables:
    SNOW_ON_THE_GROUND     (station_observations) float64 ...
    ONE_DAY_SNOW           (station_observations) float64 ...
    ONE_DAY_RAIN           (station_observations) float64 ...
    ONE_DAY_PRECIPITATION  (station_observations) float64 ...
    MIN_TEMP               (station_observations) float64 ...
    MAX_TEMP               (station_observations) float64 ...
Attributes:
    elevation:     15.0

In [95]: ds.sel(time='1896')
ValueError: dimensions or multi-index levels ['time'] do not exist

With assign_coords:

In [97]: ds=ds.assign_coords(station_observations=ds.time)

In [98]: ds.sel(station_observations='1896')
Out[98]: 
<xarray.Dataset>
Dimensions:                (station_observations: 366)
Coordinates:
  * station_observations   (station_observations) datetime64[ns] 1896-01-01 ...
Data variables:
    time                   (station_observations) datetime64[ns] ...
    SNOW_ON_THE_GROUND     (station_observations) float64 ...
    ONE_DAY_SNOW           (station_observations) float64 ...
    ONE_DAY_RAIN           (station_observations) float64 ...
    ONE_DAY_PRECIPITATION  (station_observations) float64 ...
    MIN_TEMP               (station_observations) float64 ...
    MAX_TEMP               (station_observations) float64 ...
Attributes:
    elevation:     15.0

This works correctly, but looks ugly. It would be nice if the time variable could be assigned as a dimension directly. I can drop the time variable and rename station_observations, but it's a little annoying to do so.
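The drop-and-rename workaround described above can be sketched as follows. This is a minimal sketch on a hypothetical six-day stand-in for the station dataset (only `time` and `MAX_TEMP` are kept), and it assumes a newer xarray release where `Dataset.drop_vars` exists; older releases spelled this `Dataset.drop`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the dataset in the question: "time" is an ordinary
# data variable along the dimension "station_observations".
times = pd.date_range("1895-12-30", periods=6, freq="D")
ds = xr.Dataset(
    {
        "time": ("station_observations", times),
        "MAX_TEMP": ("station_observations", np.arange(6.0)),
    }
)

# 1. Use the time values as the index of the existing dimension.
# 2. Drop the now-redundant "time" data variable.
# 3. Rename the dimension (and its coordinate) to "time".
ds_by_time = (
    ds.assign_coords(station_observations=ds["time"].values)
    .drop_vars("time")
    .rename({"station_observations": "time"})
)

# Partial-date selection now works on the "time" dimension.
print(ds_by_time.sel(time="1896"))
```

Three chained calls are needed here, which is exactly the boilerplate the issue is asking to avoid.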

Output of `xr.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.16.0-041600-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8

xarray: 0.10.2
pandas: 0.22.0
numpy: 1.13.3
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: 1.2.0
cyordereddict: None
dask: 0.16.0
distributed: None
matplotlib: 2.1.1
cartopy: None
seaborn: None
setuptools: 39.0.1
pip: 9.0.1
conda: None
pytest: None
IPython: 5.5.0
sphinx: None

usage question

Source

nedclimaterisk

All 13 comments

Hi @nedclimaterisk.
Thanks for raising an issue.

In that case, you can use swap_dims:

In [1]: import xarray as xr
   ...: ds = xr.Dataset({'x': ('i', [0, 1, 2]), 'y': ('i', ['a', 'b', 'c'])})
   ...: ds
   ...: 
   ...: 
Out[1]: 
<xarray.Dataset>
Dimensions:  (i: 3)
Dimensions without coordinates: i
Data variables:
    x        (i) int64 0 1 2
    y        (i) <U1 'a' 'b' 'c'

In [2]: ds.swap_dims({'i': 'x'})
Out[2]: 
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 0 1 2
Data variables:
    y        (x) <U1 'a' 'b' 'c'

fujiisoup on 4 Oct 2018

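Applied to a dataset shaped like the one in the question, the `swap_dims` suggestion looks roughly like this. It is a sketch on a hypothetical six-day stand-in for the station data: `swap_dims` turns the existing 1-D `time` data variable into the indexed dimension coordinate and removes the `station_observations` dimension in one call.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the station dataset in the question.
times = pd.date_range("1895-12-30", periods=6, freq="D")
ds = xr.Dataset(
    {
        "time": ("station_observations", times),
        "MAX_TEMP": ("station_observations", np.arange(6.0)),
    }
)

# Replace the dimension "station_observations" with "time"; the 1-D "time"
# variable becomes the dimension coordinate (the starred one in the repr).
ds = ds.swap_dims({"station_observations": "time"})

# Partial-string selection on the datetime index now works directly.
subset = ds.sel(time="1896")
print(subset)
```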

I'm closing this issue, but if you have further issues, do not hesitate to reopen it.

fujiisoup on 4 Oct 2018

Awesome, thank you @fujiisoup.

It might be worth putting a "see also" note in the assign_coords and set_coords documentation for this. I tried searching quite a bit, but did not find this.

nedclimaterisk on 4 Oct 2018

> It might be worth putting a "see also" note in the assign_coords and set_coords documentation for this. I tried searching quite a bit, but did not find this.

Thanks for the suggestion. It sounds like a good idea. Would you mind sending a PR for this?

fujiisoup on 4 Oct 2018

@fujiisoup This method works when trying to add a single coordinate, but what about when you're trying to add multiple coordinates? Example:

import xarray as xr
ds = xr.Dataset({'x': ('i', [0, 1, 2]), 'y': ('i', ['a', 'b', 'c']), 'z': ('i', ['a', 'b', 'c'])})
ds = ds.set_coords(['y', 'z'])
ds.swap_dims({'i': 'y', 'i': 'z'})  # doesn't work (the repeated key 'i' leaves only {'i': 'z'})

ds = xr.Dataset({'x': ('i', [0, 1, 2]), 'y': ('i', ['a', 'b', 'c']), 'z': ('i', ['a', 'b', 'c'])})
ds = ds.set_coords('y')
ds = ds.swap_dims({'i': 'y'})
ds.set_coords('z')  # doesn't work either (z is a coordinate along y, not a dimension coordinate)

Is there any reason that this is the default behavior? This is a bit frustrating to work with after creating an xarray dataset from pandas.

M-Harrington on 22 Nov 2019
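A single 1-D dimension can have only one dimension coordinate, so `swap_dims` cannot promote both `y` and `z` at once. One alternative not raised in this thread is to combine them into a `pandas.MultiIndex` along `i` with `Dataset.set_index`, after which either level can be used as a label in `.sel()`. This is a minimal sketch on the toy data from the question; it only inspects the selected values, not the shape of the reduced index.

```python
import xarray as xr

ds = xr.Dataset(
    {"x": ("i", [0, 1, 2]), "y": ("i", ["a", "b", "c"]), "z": ("i", ["a", "b", "c"])}
)

# Turn y and z into coordinates, then combine them into a single
# pandas.MultiIndex on dimension "i" with levels "y" and "z".
indexed = ds.set_coords(["y", "z"]).set_index(i=["y", "z"])

# Each level can now be used as a selection label.
x_where_y_is_b = indexed.sel(y="b")["x"].values.ravel()
x_where_z_is_c = indexed.sel(z="c")["x"].values.ravel()
print(x_where_y_is_b, x_where_z_is_c)
```

The data stay 1-D here; the pandas `set_index(...).to_xarray()` route discussed later in the thread instead produces a 2-D grid.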

ds.rename_dims({"i": "y"})

Is this what you want?

dcherian on 22 Nov 2019

@dcherian not quite because I want z and y to both have the star next to them (or bolded in your screenshot) so that they're proper coordinates. I likewise thought that the answer would be as simple as:

data=pd.DataFrame({'x': [0, 1, 2], 'y': ['a', 'b', 'c'],'z': ['a', 'b', 'c']})
xr.DataArray(data.x, coords={'y': data.y, 'x':data.z})

But again, I haven't gotten anywhere this way either, and I get the error ValueError: coordinate y has dimensions ('y',), but these are not a subset of the DataArray dimensions ('dim_0',)

M-Harrington on 22 Nov 2019

So that's not possible. You can't have both y and z be 1D coordinate variables for x since x is 1D.

What are you ultimately trying to do after the conversion to DataArray?

dcherian on 22 Nov 2019

This also doesn't work when passing y and z as data.y.values and data.z.values, which are 1-D arrays.

Ultimately, I want to merge it with another dataset that has the same coordinates. It seems like there's something obvious I'm missing here, but I haven't been able to figure out what it is.

Ah, I see: in this example I need a dataset that's 3x3. Let me fix the example and see if it's still relevant to my issue.

M-Harrington on 22 Nov 2019

To get your example to work, use this:

import pandas as pd
import xarray as xr
data = pd.DataFrame({'x': [0, 1, 2], 'y': ['a', 'b', 'c'], 'z': ['a', 'b', 'c']})
xr.DataArray(data.x, coords={'y': data.y, 'z': ('y', data.z)}, dims="y")  # z is a non-dimension coordinate along y

To get both as dimensions, use:

df = pd.DataFrame({'x': [0, 1, 2], 'y': ['a', 'b', 'c'], 'z': ['a', 'b', 'c']})
ds = df.set_index(["y", "z"]).to_xarray()

keewis on 22 Nov 2019


This worked perfectly, thanks so much!

M-Harrington on 22 Nov 2019

Just note that in the end the result is still 2D, with the missing values filled with NaN.

keewis on 22 Nov 2019
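The NaN fill mentioned above can be seen by inspecting the result of the `set_index(...).to_xarray()` approach on the same toy DataFrame. A minimal sketch (it needs both pandas and xarray installed, since `DataFrame.to_xarray` delegates to xarray): every (y, z) pair that does not occur as a row becomes NaN, which also forces the integer column `x` to float64.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [0, 1, 2], "y": ["a", "b", "c"], "z": ["a", "b", "c"]})

# Each index level becomes its own dimension; x is laid out on the full
# y-by-z grid, and combinations missing from df are filled with NaN.
ds = df.set_index(["y", "z"]).to_xarray()

x = ds["x"]
print(x.dims)    # ('y', 'z')
print(x.values)  # 3x3 array: 0, 1, 2 on the diagonal, NaN elsewhere
```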

Right, that's actually the desired behavior to begin with, so this works out.

M-Harrington on 22 Nov 2019
