I have this dataset, and I'd like to make it indexable by time:
<xarray.Dataset>
Dimensions: (station_observations: 46862)
Dimensions without coordinates: station_observations
Data variables:
time (station_observations) datetime64[ns] ...
SNOW_ON_THE_GROUND (station_observations) float64 ...
ONE_DAY_SNOW (station_observations) float64 ...
ONE_DAY_RAIN (station_observations) float64 ...
ONE_DAY_PRECIPITATION (station_observations) float64 ...
MIN_TEMP (station_observations) float64 ...
MAX_TEMP (station_observations) float64 ...
Attributes:
elevation: 15.0
I expected to be able to use ds.set_coords to make the time variable an indexable coordinate. The variable is converted to a coordinate, but it is not a dimension coordinate, so I can't index with it. I can use assign_coords(station_observations=ds.time) to make station_observations indexable by time, but then the name is semantically wrong, and the time variable still exists, which makes the code harder to maintain.
ds.set_coords('time', inplace=True)
<xarray.Dataset>
Dimensions: (station_observations: 46862)
Coordinates:
time (station_observations) datetime64[ns] ...
Dimensions without coordinates: station_observations
Data variables:
SNOW_ON_THE_GROUND (station_observations) float64 ...
ONE_DAY_SNOW (station_observations) float64 ...
ONE_DAY_RAIN (station_observations) float64 ...
ONE_DAY_PRECIPITATION (station_observations) float64 ...
MIN_TEMP (station_observations) float64 ...
MAX_TEMP (station_observations) float64 ...
Attributes:
elevation: 15.0
In [95]: ds.sel(time='1896')
ValueError: dimensions or multi-index levels ['time'] do not exist
with assign_coords:
In [97]: ds=ds.assign_coords(station_observations=ds.time)
In [98]: ds.sel(station_observations='1896')
Out[98]:
<xarray.Dataset>
Dimensions: (station_observations: 366)
Coordinates:
* station_observations (station_observations) datetime64[ns] 1896-01-01 ...
Data variables:
time (station_observations) datetime64[ns] ...
SNOW_ON_THE_GROUND (station_observations) float64 ...
ONE_DAY_SNOW (station_observations) float64 ...
ONE_DAY_RAIN (station_observations) float64 ...
ONE_DAY_PRECIPITATION (station_observations) float64 ...
MIN_TEMP (station_observations) float64 ...
MAX_TEMP (station_observations) float64 ...
Attributes:
elevation: 15.0
works correctly, but looks ugly. It would be nice if the time variable could be assigned as a dimension directly. I can drop the time variable and rename the station_observations, but it's a little annoying to do so.
xr.show_versions()
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.16.0-041600-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8
xarray: 0.10.2
pandas: 0.22.0
numpy: 1.13.3
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: None
h5py: None
Nio: None
zarr: None
bottleneck: 1.2.0
cyordereddict: None
dask: 0.16.0
distributed: None
matplotlib: 2.1.1
cartopy: None
seaborn: None
setuptools: 39.0.1
pip: 9.0.1
conda: None
pytest: None
IPython: 5.5.0
sphinx: None
Hi @nedclimaterisk.
Thanks for raising an issue.
In that case, you can use swap_dims:
In [1]: import xarray as xr
...: ds = xr.Dataset({'x': ('i', [0, 1, 2]), 'y': ('i', ['a', 'b', 'c'])})
...: ds
...:
...:
Out[1]:
<xarray.Dataset>
Dimensions: (i: 3)
Dimensions without coordinates: i
Data variables:
x (i) int64 0 1 2
y (i) <U1 'a' 'b' 'c'
In [2]: ds.swap_dims({'i': 'x'})
Out[2]:
<xarray.Dataset>
Dimensions: (x: 3)
Coordinates:
* x (x) int64 0 1 2
Data variables:
y (x) <U1 'a' 'b' 'c'
I'm closing this issue, but if you have further issues, do not hesitate to reopen it.
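Applied to the original question, the swap_dims suggestion can be sketched as follows. This is a minimal illustration, not the poster's actual data: the dataset below is a hypothetical stand-in with a made-up date range and made-up MAX_TEMP values, and the names by_time and year_1896 are invented for the example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the station dataset: one observation per day.
times = pd.date_range("1895-12-30", periods=400, freq="D")
ds = xr.Dataset(
    {
        "time": ("station_observations", times),
        "MAX_TEMP": ("station_observations", np.arange(400, dtype="float64")),
    }
)

# Replace the unlabeled dimension with `time`, which becomes a dimension coordinate.
by_time = ds.swap_dims({"station_observations": "time"})

# Selecting by a partial date string now works on the time index.
year_1896 = by_time.sel(time="1896")
print(year_1896.sizes["time"])  # 366 (1896 is a leap year)
```

After the swap there is no leftover station_observations dimension and no duplicate time data variable, so no separate drop or rename step is needed.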
Awesome, thank you @fujiisoup.
It might be worth putting a "see also" note in the assign_coords and set_coords documentation for this. I tried searching quite a bit, but did not find this.
Thanks for the suggestion. It sounds like a good idea. Would you mind sending a PR for this?
@fujiisoup This method works when trying to add a single coordinate, but what about when you're trying to add multiple coordinates? Example:
import xarray as xr
ds = xr.Dataset({'x': ('i', [0, 1, 2]), 'y': ('i', ['a', 'b', 'c']),'z': ('i', ['a', 'b', 'c'])})
ds= ds.set_coords(['y','z'])
ds.swap_dims({'i':'y','i':'z'}) #doesn't work
ds = xr.Dataset({'x': ('i', [0, 1, 2]), 'y': ('i', ['a', 'b', 'c']),'z': ('i', ['a', 'b', 'c'])})
ds= ds.set_coords('y')
ds=ds.swap_dims({'i':'y'})
ds.set_coords('z') #doesn't work either
Is there any reason that this is the default behavior? This is a bit frustrating to work with after creating an xarray dataset from pandas.
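One detail in the first attempt above is independent of xarray: a Python dict literal cannot hold the same key twice, so {'i':'y','i':'z'} only ever carries the last entry. The sketch below shows this on the same toy dataset; the names mapping and swapped are chosen for illustration only.

```python
import xarray as xr

# A dict literal with a repeated key keeps only the last value for that key.
mapping = {"i": "y", "i": "z"}
print(mapping)  # {'i': 'z'}

ds = xr.Dataset(
    {"x": ("i", [0, 1, 2]), "y": ("i", ["a", "b", "c"]), "z": ("i", ["a", "b", "c"])}
)

# swap_dims therefore only sees a request to swap `i` for `z`.
swapped = ds.swap_dims(mapping)
print(swapped.dims)
```

So the call is equivalent to ds.swap_dims({'i': 'z'}), and y never had a chance to become a dimension.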
ds.rename_dims({"i": "y"})

Is this what you want?
@dcherian not quite because I want z and y to both have the star next to them (or bolded in your screenshot) so that they're proper coordinates. I likewise thought that the answer would be as simple as:
data=pd.DataFrame({'x': [0, 1, 2], 'y': ['a', 'b', 'c'],'z': ['a', 'b', 'c']})
xr.DataArray(data.x, coords={'y': data.y, 'x':data.z})
But again, I haven't gotten anywhere this way either, and I get the error ValueError: coordinate y has dimensions ('y',), but these are not a subset of the DataArray dimensions ('dim_0',)
So that's not possible. You can't have both y and z be 1D coordinate variables for x since x is 1D.
What are you ultimately trying to do after the conversion to DataArray?
This also doesn't work feeding y and z as data.y.values and data.z.values which are 1d arrays.
Ultimately merge to another dataset with the same coordinates. Seems like there's something obvious I'm missing here but I haven't been able to figure out what it is.
Ah, I see: in this example I need a dataset that's 3x3. Let me fix the example and see if it's still relevant to my issue.
to get your example to work, use this:
data=pd.DataFrame({'x': [0, 1, 2], 'y': ['a', 'b', 'c'],'z': ['a', 'b', 'c']})
xr.DataArray(data.x, coords={'y': data.y, 'x':data.z}, dims="y")
to get both as dimensions, use
df = pd.DataFrame({'x': [0, 1, 2], 'y': ['a', 'b', 'c'],'z': ['a', 'b', 'c']})
ds = df.set_index(["y", "z"]).to_xarray()
This worked perfectly, thanks so much!
just note that in the end the result is still 2D with the missing values filled with nan
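The note about the 2D result can be checked with a short sketch that uses the same toy DataFrame as the set_index example above. The variable names are illustrative, and to_xarray requires xarray to be installed.

```python
import pandas as pd

df = pd.DataFrame({"x": [0, 1, 2], "y": ["a", "b", "c"], "z": ["a", "b", "c"]})

# Both index levels become dimensions, so `x` is expanded onto a y-by-z grid.
ds = df.set_index(["y", "z"]).to_xarray()

print(ds["x"].shape)                 # (3, 3)
print(int(ds["x"].isnull().sum()))   # 6 grid cells have no source row and hold NaN
print(float(ds["x"].sel(y="b", z="b")))  # 1.0
```

Only the three (y, z) pairs present in the DataFrame carry values. The remaining cells are NaN, which also means x is stored as float64 rather than as integers.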
Right that's actually desired behavior to begin with so this works out