I am using the expand_dims API that was recently updated in this PR: https://github.com/pydata/xarray/pull/2757. However, the flag-setting behaviour can also be observed with the old API syntax.
>>> import numpy as np
>>> import xarray as xr
>>> expanded_da = xr.DataArray(np.random.rand(3,3), coords={'x': np.arange(3), 'y': np.arange(3)}, dims=('x', 'y')) # Create a 2D DataArray
>>> expanded_da
<xarray.DataArray (x: 3, y: 3)>
array([[0.148579, 0.463005, 0.224993],
[0.633511, 0.056746, 0.28119 ],
[0.390596, 0.298519, 0.286853]])
Coordinates:
* x (x) int64 0 1 2
* y (y) int64 0 1 2
>>> expanded_da.data.flags # Check current state of numpy flags
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
>>> expanded_da.loc[0, 0] = 2.22 # Set a single value before expanding
>>> expanded_da # It works, the single value is set
<xarray.DataArray (x: 3, y: 3)>
array([[2.22 , 0.463005, 0.224993],
[0.633511, 0.056746, 0.28119 ],
[0.390596, 0.298519, 0.286853]])
Coordinates:
* x (x) int64 0 1 2
* y (y) int64 0 1 2
>>> expanded_da = expanded_da.expand_dims({'z': 3}, -1) # Add a new dimension 'z'
>>> expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[2.22 , 2.22 , 2.22 ],
[0.463005, 0.463005, 0.463005],
[0.224993, 0.224993, 0.224993]],
[[0.633511, 0.633511, 0.633511],
[0.056746, 0.056746, 0.056746],
[0.28119 , 0.28119 , 0.28119 ]],
[[0.390596, 0.390596, 0.390596],
[0.298519, 0.298519, 0.298519],
[0.286853, 0.286853, 0.286853]]])
Coordinates:
* x (x) int64 0 1 2
* y (y) int64 0 1 2
Dimensions without coordinates: z
>>> expanded_da['z'] = np.arange(3) # Add new coordinates to the new dimension 'z'
>>> expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[2.22 , 2.22 , 2.22 ],
[0.463005, 0.463005, 0.463005],
[0.224993, 0.224993, 0.224993]],
[[0.633511, 0.633511, 0.633511],
[0.056746, 0.056746, 0.056746],
[0.28119 , 0.28119 , 0.28119 ]],
[[0.390596, 0.390596, 0.390596],
[0.298519, 0.298519, 0.298519],
[0.286853, 0.286853, 0.286853]]])
Coordinates:
* x (x) int64 0 1 2
* y (y) int64 0 1 2
* z (z) int64 0 1 2
>>> expanded_da.loc[0, 0, 0] = 9.99 # Attempt to set a single value, get 'read-only' error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/dhemming/.ve/unidata_notebooks/lib/python3.6/site-packages/xarray/core/dataarray.py", line 113, in __setitem__
self.data_array[pos_indexers] = value
File "/Users/dhemming/.ve/unidata_notebooks/lib/python3.6/site-packages/xarray/core/dataarray.py", line 494, in __setitem__
self.variable[key] = value
File "/Users/dhemming/.ve/unidata_notebooks/lib/python3.6/site-packages/xarray/core/variable.py", line 714, in __setitem__
indexable[index_tuple] = value
File "/Users/dhemming/.ve/unidata_notebooks/lib/python3.6/site-packages/xarray/core/indexing.py", line 1174, in __setitem__
array[key] = value
ValueError: assignment destination is read-only
>>> expanded_da.data.flags # Check flags on the DataArray, notice they have changed
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : False
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
>>> expanded_da.data.setflags(write = 1) # Make array writeable again
>>> expanded_da.data.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
>>> expanded_da.loc[0, 0, 0] # Check the value I want to overwrite
<xarray.DataArray ()>
array(2.22)
Coordinates:
x int64 0
y int64 0
z int64 0
>>> expanded_da.loc[0, 0, 0] = 9.99 # Attempt to overwrite single value, instead it overwrites all values in the array located at [0, 0]
>>> expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[9.99 , 9.99 , 9.99 ],
[0.463005, 0.463005, 0.463005],
[0.224993, 0.224993, 0.224993]],
[[0.633511, 0.633511, 0.633511],
[0.056746, 0.056746, 0.056746],
[0.28119 , 0.28119 , 0.28119 ]],
[[0.390596, 0.390596, 0.390596],
[0.298519, 0.298519, 0.298519],
[0.286853, 0.286853, 0.286853]]])
Coordinates:
* x (x) int64 0 1 2
* y (y) int64 0 1 2
* z (z) int64 0 1 2
When applying the operation 'expand_dims({'z': 3}, -1)' to a DataArray, the flags of the underlying Numpy array change: 'C_CONTIGUOUS', 'WRITEABLE' and 'OWNDATA' are all set to False. After changing 'WRITEABLE' back to True, setting a single value in the DataArray with the '.loc' operator instead sets all the values in the selected inner array.
I am new to Xarray so I can't be entirely sure whether this is expected behaviour. Regardless, I would expect that adding a new dimension to an array would not make it 'read-only'. I would also not expect '.loc' to behave differently from how it otherwise would.
It's also not congruent with the Numpy 'expand_dims' operation: when I call 'np.expand_dims(np_arr, axis=-1)', the 'C_CONTIGUOUS' and 'WRITEABLE' flags are not modified.
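One possible reason for the difference, sketched with plain Numpy: 'np.expand_dims' only inserts a length-1 axis, whereas repeating values along a new axis of length 3 without copying requires a broadcast view. The sketch below assumes that xarray's 'expand_dims' is backed by something like 'np.broadcast_to'; the thread does not state this, so treat it as an illustration rather than a description of xarray's internals.

```python
import numpy as np

base = np.zeros((3, 3))

# np.expand_dims only inserts a length-1 axis, so the result is an
# ordinary writeable view of `base`.
unit_axis = np.expand_dims(base, axis=-1)

# Repeating values along the new axis without copying needs a broadcast
# view. np.broadcast_to returns it read-only, with stride 0 on that axis.
repeated = np.broadcast_to(unit_axis, (3, 3, 3))

print(unit_axis.shape, unit_axis.flags.writeable)  # (3, 3, 1) True
print(repeated.shape, repeated.flags.writeable)    # (3, 3, 3) False
print(repeated.strides[-1])                        # 0
```

The flags on 'repeated' match the ones reported above after 'expand_dims({'z': 3}, -1)': not C-contiguous, not writeable, and not owning its data.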
Here is a similar flow of operations that demonstrates the behaviour I would expect from the DataArray after applying 'expand_dims':
>>> non_expanded_da = xr.DataArray(np.random.rand(3,3,3), coords={'x': np.arange(3), 'y': np.arange(3)}, dims=('x', 'y', 'z')) # Create the new DataArray to be in the same state as I would expect it to be in after applying the operation 'expand_dims({'z': 3}, -1)'
>>> non_expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[0.017221, 0.374267, 0.231979],
[0.678884, 0.512903, 0.737573],
[0.985872, 0.1373 , 0.4603 ]],
[[0.764227, 0.825059, 0.847694],
[0.482841, 0.708206, 0.486576],
[0.726265, 0.860627, 0.435101]],
[[0.117904, 0.40569 , 0.274288],
[0.079321, 0.647562, 0.847459],
[0.57494 , 0.578745, 0.125309]]])
Coordinates:
* x (x) int64 0 1 2
* y (y) int64 0 1 2
Dimensions without coordinates: z
>>> non_expanded_da.data.flags # Check flags
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
>>> non_expanded_da['z'] = np.arange(3) # Set coordinate for dimension 'z'
>>> non_expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[0.017221, 0.374267, 0.231979],
[0.678884, 0.512903, 0.737573],
[0.985872, 0.1373 , 0.4603 ]],
[[0.764227, 0.825059, 0.847694],
[0.482841, 0.708206, 0.486576],
[0.726265, 0.860627, 0.435101]],
[[0.117904, 0.40569 , 0.274288],
[0.079321, 0.647562, 0.847459],
[0.57494 , 0.578745, 0.125309]]])
Coordinates:
* x (x) int64 0 1 2
* y (y) int64 0 1 2
* z (z) int64 0 1 2
>>> non_expanded_da.loc[0, 0, 0] = 2.22 # Set value using .loc method
>>> non_expanded_da # The single value referenced is set which is what I expect to happen
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[2.22 , 0.374267, 0.231979],
[0.678884, 0.512903, 0.737573],
[0.985872, 0.1373 , 0.4603 ]],
[[0.764227, 0.825059, 0.847694],
[0.482841, 0.708206, 0.486576],
[0.726265, 0.860627, 0.435101]],
[[0.117904, 0.40569 , 0.274288],
[0.079321, 0.647562, 0.847459],
[0.57494 , 0.578745, 0.125309]]])
Coordinates:
* x (x) int64 0 1 2
* y (y) int64 0 1 2
* z (z) int64 0 1 2
xr.show_versions()
commit: None
python: 3.6.7 (default, Dec 29 2018, 12:05:36)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)]
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.4.1.1
xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.2.1
netCDF4: 1.5.0
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: 0.9.6.1.post1
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: None
setuptools: 39.0.1
pip: 10.0.1
conda: None
pytest: None
IPython: None
sphinx: None
As you've noticed, these arrays are "read only" because otherwise indexing can modify more than the original values, e.g., consider:
import numpy as np
import xarray as xr
original = xr.DataArray(np.zeros(3), dims='x')
result = original.expand_dims(y=2)
result.data.flags.writeable = True
result[0, 0] = 1
print(result)
Both "y" values were set to 1!
<xarray.DataArray (y: 2, x: 3)>
array([[1., 0., 0.],
[1., 0., 0.]])
Dimensions without coordinates: y, x
The workaround is to call .copy() on the array after calling expand_dims(), e.g.,
original = xr.DataArray(np.zeros(3), dims='x')
result = original.expand_dims(y=2).copy()
result[0, 0] = 1
print(result)
Now the correct result is printed:
<xarray.DataArray (y: 2, x: 3)>
array([[1., 0., 0.],
[0., 0., 0.]])
Dimensions without coordinates: y, x
This is indeed intended behavior: by making the result read-only, we can expand dimensions without copying the original data, and without needing to worry about indexing modifying the wrong values.
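The trade-off described here can be illustrated with plain Numpy. This is a minimal sketch that uses 'np.broadcast_to' as a stand-in for a zero-copy expansion; whether xarray uses exactly this call internally is an assumption.

```python
import numpy as np

original = np.zeros(3)

# A zero-copy expansion: both rows of `view` point at the same three floats.
view = np.broadcast_to(original, (2, 3))
print(np.shares_memory(original, view))  # True
print(view.strides[0])                   # 0: stepping along axis 0 moves nowhere

# .copy() materialises six independent elements, so a single-element
# write touches exactly one value and leaves `original` alone.
independent = view.copy()
independent[0, 0] = 1.0
print(independent)
print(original)
```

Because the first stride of 'view' is 0, any write through it would land on every row at once, which is why the view is handed out read-only.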
That said, we could certainly improve the usability of this feature in xarray. Some options:
1. Mention .copy() in the error message xarray prints when an array is read-only.
2. Add a copy argument to expand_dims, so users can write copy=True if they want a writeable result.
3. Change expand_dims() so that its result is sometimes but not always writeable?
Thank you @shoyer for taking the time to educate me on this. I understand completely now.
I agree that solutions one and two would be helpful for future developers new to Xarray and the expand_dims operation when they eventually encounter this behaviour. I also agree that option three would be confusing; had it been implemented that way, I would still have found myself asking a similar question about why it behaves like that.
Another option to consider which might be easier still would be to just update the expand_dims documentation to include a note about this behaviour and the copy solution.
Thanks again!
OK, we would definitely welcome a pull request to improve this error message and the documentation for expand_dims!
I lean slightly against adding the copy argument since it's just as easy to add .copy() afterwards (that's one less function argument).
Yes good point. I do also think that the expand_dims interface should really only be responsible for that one single operation. If you then want to make a copy then go ahead and use that separate method afterwards.
Ok great, if I get time later today I'll see if I can't pick that up; that is if someone hasn't already done so in the meantime.
I am also affected in some code which used to work with earlier versions of xarray. In this case, I call ds.expand_dims('new_dim') on some dataset (not DataArray), e.g.:
ds = xr.Dataset({'testvar': (['x'], np.zeros(3))})
ds1 = ds.expand_dims('y').copy()
ds1.testvar.data.flags
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : False
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
The .copy() workaround does not help in this case, and I am not sure how to fix it.
I have just realized that .copy(deep=True) is a possible fix for datasets.
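A minimal sketch of the deep-copy fix for datasets, reusing the example above. It only checks the deep copy; whether a shallow .copy() stays read-only may differ between xarray versions, so that part is left as reported for 0.12.1.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"testvar": (["x"], np.zeros(3))})

# A deep copy duplicates the underlying array data instead of reusing the
# read-only buffer produced by expand_dims.
ds_deep = ds.expand_dims("y").copy(deep=True)
print(ds_deep["testvar"].data.flags.writeable)

# In-place assignment on the deep copy leaves the source dataset untouched.
ds_deep["testvar"][0, 0] = 1.0
print(ds_deep["testvar"].values)
print(ds["testvar"].values)
```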