expand_dims functionality
Apparently, expand_dims can only create a dimension for a point coordinate, i.e. it promotes a scalar coordinate into a 1D coordinate. Here is an example:
>>> coords = {"b": range(5), "c": range(3)}
>>> da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords.keys()))
>>> da
<xarray.DataArray (b: 5, c: 3)>
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
Coordinates:
* b (b) int64 0 1 2 3 4
* c (c) int64 0 1 2
>>> da["a"] = 0 # create a point coordinate
>>> da
<xarray.DataArray (b: 5, c: 3)>
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
Coordinates:
* b (b) int64 0 1 2 3 4
* c (c) int64 0 1 2
a int64 0
>>> da.expand_dims("a") # create a new dimension "a" for the point coordinate
<xarray.DataArray (a: 1, b: 5, c: 3)>
array([[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]]])
Coordinates:
* b (b) int64 0 1 2 3 4
* c (c) int64 0 1 2
* a (a) int64 0
>>>
I want to be able to do 2 more things with expand_dims or maybe a related/similar method:
1) broadcast the data across 1 or more new dimensions
2) expand an existing dimension to include 1 or more new coordinates
from collections import OrderedDict

import numpy as np
import xarray as xr


def expand_dimensions(data, fill_value=np.nan, **new_coords):
    """Expand (or add if it doesn't yet exist) the data array to fill in new
    coordinates across multiple dimensions.

    If a dimension doesn't exist in the dataarray yet, then the result will be
    `data`, broadcasted across this dimension.

    >>> da = xr.DataArray([1, 2, 3], dims="a", coords=[[0, 1, 2]])
    >>> expand_dimensions(da, b=[1, 2, 3, 4, 5])
    <xarray.DataArray (a: 3, b: 5)>
    array([[ 1., 1., 1., 1., 1.],
           [ 2., 2., 2., 2., 2.],
           [ 3., 3., 3., 3., 3.]])
    Coordinates:
    * a (a) int64 0 1 2
    * b (b) int64 1 2 3 4 5

    Or, if `dim` is already a dimension in `data`, then any new coordinate
    values in `new_coords` that are not yet in `data[dim]` will be added,
    and the values corresponding to those new coordinates will be `fill_value`.

    >>> da = xr.DataArray([1, 2, 3], dims="a", coords=[[0, 1, 2]])
    >>> expand_dimensions(da, a=[1, 2, 3, 4, 5])
    <xarray.DataArray (a: 6)>
    array([ 1., 2., 3., nan, nan, nan])
    Coordinates:
    * a (a) int64 0 1 2 3 4 5

    Args:
        data (xarray.DataArray):
            Data that needs dimensions expanded.
        fill_value (scalar, xarray.DataArray, optional):
            If expanding new coords this is the value of the new datum.
            Defaults to `np.nan`.
        **new_coords (list[int | str]):
            The keywords are arbitrary dimensions and the values are
            coordinates of those dimensions that the data will include after it
            has been expanded.

    Returns:
        xarray.DataArray:
            Data that had its dimensions expanded to include the new
            coordinates.
    """
    ordered_coord_dict = OrderedDict(new_coords)
    # Template array whose only job is to carry the requested dims and coords.
    shape_da = xr.DataArray(
        np.zeros(list(map(len, ordered_coord_dict.values()))),
        coords=ordered_coord_dict,
        dims=list(ordered_coord_dict))
    # Broadcasting against the template adds new dims and outer-joins existing ones.
    expanded_data = xr.broadcast(data, shape_da)[0].fillna(fill_value)
    return expanded_data
Here's an example of broadcasting data across a new dimension:
>>> coords = {"b": range(5), "c": range(3)}
>>> da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords.keys()))
>>> expand_dimensions(da, a=[0, 1, 2])
<xarray.DataArray (b: 5, c: 3, a: 3)>
array([[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]],
[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]],
[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]],
[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]],
[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]]])
Coordinates:
* b (b) int64 0 1 2 3 4
* c (c) int64 0 1 2
* a (a) int64 0 1 2
Here's an example of expanding an existing dimension to include new coordinates:
>>> expand_dimensions(da, b=[5, 6])
<xarray.DataArray (b: 7, c: 3)>
array([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
[nan, nan, nan],
[nan, nan, nan]])
Coordinates:
* b (b) int64 0 1 2 3 4 5 6
* c (c) int64 0 1 2
If no one else is already working on this, and if it seems like a useful addition to xarray, then I would be more than happy to work on this. Please let me know.
Thank you,
Martin
> broadcast the data across 1 or more new dimensions
Yes, this feels in scope for expand_dims(). But I think there are two separate features here:
I think we would want both to be supported -- you should not be required to supply coordinate labels in order to expand to a dimension of size > 1. We can imagine the first being spelled like da.expand_dims({'a': 3}) or da.expand_dims(a=3).
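For reference, here is a small sketch of the two spellings mentioned above. It assumes an xarray release in which this feature has landed; the variable names are only for illustration.

```python
import numpy as np
import xarray as xr

coords = {"b": range(5), "c": range(3)}
da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords.keys()))

# Two spellings of "insert a new dimension 'a' of size 3".
from_dict = da.expand_dims({"a": 3})
from_kwargs = da.expand_dims(a=3)

print(from_dict.sizes)
print(from_dict.equals(from_kwargs))
```

The dict form is handy when the dimension name is not a valid Python identifier or is only known at runtime.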
> expand an existing dimension to include 1 or more new coordinates
This feels a little different from expand_dims to me. Here the fundamental operation is alignment/reindexing, not broadcasting across a new dimension. The result also looks different, because you get all the NaN values.
I would probably write this with reindex, e.g.,
In [12]: da.reindex(b=list(da.b.values)+[5, 6])
Out[12]:
<xarray.DataArray (b: 7, c: 3)>
array([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
[nan, nan, nan],
[nan, nan, nan]])
Coordinates:
* b (b) int64 0 1 2 3 4 5 6
* c (c) int64 0 1 2
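A slightly more general sketch of the same reindex idea, taking the union of the existing and new labels so that duplicates are not a concern. The `fill_value` argument is assumed to be available in the installed xarray (older releases may not have it).

```python
import numpy as np
import xarray as xr

coords = {"b": range(5), "c": range(3)}
da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords.keys()))

# Union of existing and new labels along "b"; new rows are filled with NaN.
new_b = np.union1d(da["b"].values, [5, 6])
extended = da.reindex(b=new_b)

# Assumption: this xarray version supports fill_value in reindex.
zero_filled = da.reindex(b=new_b, fill_value=0.0)

print(extended.sizes)
```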
Hi,
Thanks for replying. I see what you mean about the 2 separate features.
Would it be alright if I opened a PR sometime soon that upgraded expand_dims to support the inserting/broadcasting dimensions with size > 1 (the first feature)?
I would use your suggested API, i.e. not requiring explicit coordinate names -- that makes sense. However, it feels like the dimension kwargs (i.e. the new dimension/dimensions) should be allowed to be given implicit or explicit coordinates, in case the user doesn't want 0-based integer coordinates for the new dimension. For example,
da.expand_dims(a=3)
is equivalent to
da.expand_dims(a=[0, 1, 2])
but this will also work
da.expand_dims(a=['w', 'x', 'y', 'z'])
where da is
```
>>> coords = {"b": range(5), "c": range(3)}
>>> da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords.keys()))
>>> da
<xarray.DataArray (b: 5, c: 3)>
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
Coordinates:
* b (b) int64 0 1 2 3 4
* c (c) int64 0 1 2
```
Does that make sense?
Thank you!
Martin
da.expand_dims(a=3) should not be equivalent to da.expand_dims(a=[0, 1, 2]) because the latter will also create a co-ordinate a. Am I understanding this right?
Those _would_ be equivalent, I think, assuming they're both manipulating the same da object (I meant for them to be separate calls not sequential, but even if they were sequential, expand_dims doesn't and wouldn't alter da, but instead return a new xarray object). I edited my above post to clarify what da is.
Well then I think they should be different.
Currently, da.expand_dims('a') gives
<xarray.DataArray (a: 1, b: 5, c: 3)>
array([[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]]])
Coordinates:
* b (b) int64 0 1 2 3 4
* c (c) int64 0 1 2
Dimensions without coordinates: a
da.expand_dims(a=3) should give
<xarray.DataArray (a: 3, b: 5, c: 3)>
...
Coordinates:
* b (b) int64 0 1 2 3 4
* c (c) int64 0 1 2
Dimensions without coordinates: a
da.expand_dims(a=[9, 10, 11]) should give
<xarray.DataArray (a: 3, b: 5, c: 3)>
...
Coordinates:
* a (a) int64 9 10 11
* b (b) int64 0 1 2 3 4
* c (c) int64 0 1 2
i.e. in this last case, the user has specified co-ordinate labels and so the returned DataArray has a new co-ordinate a.
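A quick way to check the distinction described above, as a sketch that assumes an xarray version where expand_dims accepts sizes and labels as keyword values:

```python
import numpy as np
import xarray as xr

coords = {"b": range(5), "c": range(3)}
da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords.keys()))

sized = da.expand_dims(a=3)               # size only: no coordinate for "a"
labelled = da.expand_dims(a=[9, 10, 11])  # labels given: "a" becomes a coordinate

print("a" in sized.coords)
print("a" in labelled.coords)
```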
Oh I see what you're saying. Yeah, that makes sense.
To get the equivalent of da.expand_dims(a=[9, 10, 11]), you'd do
>>> new = da.expand_dims(a=3)
>>> new
<xarray.DataArray (a: 3, b: 5, c: 3)>
...
Coordinates:
* b (b) int64 0 1 2 3 4
* c (c) int64 0 1 2
Dimensions without coordinates: a
>>> new["a"] = [9, 10, 11]
> Would it be alright if I opened a PR sometime soon that upgraded expand_dims to support the inserting/broadcasting dimensions with size > 1 (the first feature)?
Yes, that sounds welcome to me!
I think much of the underlying logic should already exist on the Variable.set_dims() method. See also the either_dict_or_kwargs utility in xarray.core.utils.
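To illustrate the dict-or-kwargs idea in isolation, here is a standalone sketch. It is not xarray's own code: `dict_or_kwargs` and `expand_spec` are hypothetical names, and the error messages are made up for the example.

```python
def dict_or_kwargs(positional, keywords, func_name):
    """Hypothetical helper: accept either a mapping or keyword arguments, not both."""
    if positional is None:
        return dict(keywords)
    if not hasattr(positional, "keys"):
        raise ValueError(f"the first argument to .{func_name} must be a dictionary")
    if keywords:
        raise ValueError(
            f"cannot specify both keyword and positional arguments to .{func_name}"
        )
    return dict(positional)


def expand_spec(dim=None, **dim_kwargs):
    """Hypothetical front end: return the requested {name: size_or_labels} mapping."""
    return dict_or_kwargs(dim, dim_kwargs, "expand_spec")


print(expand_spec({"a": 3}))
print(expand_spec(a=[9, 10, 11]))
```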
Unfortunately this most recent change has broken my workflow. I was using expand_dims to add a named dimension back onto a DataArray, when the dimension had been previously removed with the sel method. I realize this may not be the best way of doing things, but I wanted to point out that there is a loss of functionality here.
import xarray as xr
da = xr.DataArray([0,1,2], dims=['dim1'], coords={'dim1':['a','b','c']})
print(da.dims) # returns ('dim1',)
da = da.sel({'dim1':'a'})
print(da.dims) # returns ()
da = da.expand_dims(da.coords) # fails in 0.12.1
print(da.dims) # returns ('dim1',) in 0.12.0
@barkls I think da.expand_dims(list(da.coords)) should work for this use-case.
Previously, we only used the argument to expand_dims() as a sequence, but now we distinguish between mappings and other sequences.
I don't know what the best resolution would be here, but this seems to be a hazard of duck-typing. I did not anticipate that some users would already be iterating over mappings like .coords.
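A sketch of the workaround on the example above, assuming a recent xarray: passing a list of names (rather than the `.coords` mapping itself) keeps the old sequence behaviour.

```python
import xarray as xr

da = xr.DataArray([0, 1, 2], dims=["dim1"], coords={"dim1": ["a", "b", "c"]})

# After a scalar selection, "dim1" survives only as a scalar coordinate.
scalar = da.sel(dim1="a")

# Pass the coordinate names as a list, not the mapping.
restored = scalar.expand_dims(list(scalar.coords))

print(restored.dims)
```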
Another solution could be adding support for da.sel(dim1='a', squeeze=False) to avoid losing the dim1 dimension/coordinate in the first place
> Another solution could be adding support for da.sel(dim1='a', squeeze=False) to avoid losing the dim1 dimension/coordinate in the first place
Or equivalently, you could just do
da.sel(dim1=['a'])
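For comparison, a short sketch of scalar versus list selection on the same array (variable names are illustrative only):

```python
import xarray as xr

da = xr.DataArray([0, 1, 2], dims=["dim1"], coords={"dim1": ["a", "b", "c"]})

dropped = da.sel(dim1="a")   # scalar label: "dim1" is squeezed out
kept = da.sel(dim1=["a"])    # list label: "dim1" stays as a length-1 dimension

print(dropped.dims)
print(kept.dims)
```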
@pletchm that is the solution I found as well. Thanks all for the suggestions!
@pletchm was this issue closed by #2757?
Yes, @TomNicholas. My PR got merged but I forgot to close the issue -- closing it now. Thanks for checking.