Xarray: expand_dims() sets numpy.ndarray.flags to read-only; after manually reverting the flag, setting a single inner value with .loc sets all of the inner array values instead

Created on 13 Apr 2019 · 6 Comments · Source: pydata/xarray

I am using the expand_dims API that was recently updated in this PR: https://github.com/pydata/xarray/pull/2757. However, the flag-setting behaviour can also be observed with the old API syntax.

>>> import numpy as np
>>> import xarray as xr
>>> expanded_da = xr.DataArray(np.random.rand(3,3), coords={'x': np.arange(3), 'y': np.arange(3)}, dims=('x', 'y')) # Create a 2D DataArray
>>> expanded_da
<xarray.DataArray (x: 3, y: 3)>
array([[0.148579, 0.463005, 0.224993],
       [0.633511, 0.056746, 0.28119 ],
       [0.390596, 0.298519, 0.286853]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2

>>> expanded_da.data.flags # Check current state of numpy flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

>>> expanded_da.loc[0, 0] = 2.22 # Set a single value before expanding
>>> expanded_da # It works, the single value is set
<xarray.DataArray (x: 3, y: 3)>
array([[2.22    , 0.463005, 0.224993],
       [0.633511, 0.056746, 0.28119 ],
       [0.390596, 0.298519, 0.286853]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2

>>> expanded_da = expanded_da.expand_dims({'z': 3}, -1) # Add a new dimension 'z'
>>> expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[2.22    , 2.22    , 2.22    ],
        [0.463005, 0.463005, 0.463005],
        [0.224993, 0.224993, 0.224993]],

       [[0.633511, 0.633511, 0.633511],
        [0.056746, 0.056746, 0.056746],
        [0.28119 , 0.28119 , 0.28119 ]],

       [[0.390596, 0.390596, 0.390596],
        [0.298519, 0.298519, 0.298519],
        [0.286853, 0.286853, 0.286853]]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2
Dimensions without coordinates: z

>>> expanded_da['z'] = np.arange(3) # Add new coordinates to the new dimension 'z'
>>> expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[2.22    , 2.22    , 2.22    ],
        [0.463005, 0.463005, 0.463005],
        [0.224993, 0.224993, 0.224993]],

       [[0.633511, 0.633511, 0.633511],
        [0.056746, 0.056746, 0.056746],
        [0.28119 , 0.28119 , 0.28119 ]],

       [[0.390596, 0.390596, 0.390596],
        [0.298519, 0.298519, 0.298519],
        [0.286853, 0.286853, 0.286853]]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2
  * z        (z) int64 0 1 2

>>> expanded_da.loc[0, 0, 0] = 9.99 # Attempt to set a single value, get 'read-only' error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dhemming/.ve/unidata_notebooks/lib/python3.6/site-packages/xarray/core/dataarray.py", line 113, in __setitem__
    self.data_array[pos_indexers] = value
  File "/Users/dhemming/.ve/unidata_notebooks/lib/python3.6/site-packages/xarray/core/dataarray.py", line 494, in __setitem__
    self.variable[key] = value
  File "/Users/dhemming/.ve/unidata_notebooks/lib/python3.6/site-packages/xarray/core/variable.py", line 714, in __setitem__
    indexable[index_tuple] = value
  File "/Users/dhemming/.ve/unidata_notebooks/lib/python3.6/site-packages/xarray/core/indexing.py", line 1174, in __setitem__
    array[key] = value
ValueError: assignment destination is read-only

>>> expanded_da.data.flags # Check flags on the DataArray, notice they have changed
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

>>> expanded_da.data.setflags(write = 1) # Make array writeable again
>>> expanded_da.data.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

>>> expanded_da.loc[0, 0, 0] # Check the value I want to overwrite
<xarray.DataArray ()>
array(2.22)
Coordinates:
    x        int64 0
    y        int64 0
    z        int64 0

>>> expanded_da.loc[0, 0, 0] = 9.99 # Attempt to overwrite single value, instead it overwrites all values in the array located at [0, 0]
>>> expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[9.99    , 9.99    , 9.99    ],
        [0.463005, 0.463005, 0.463005],
        [0.224993, 0.224993, 0.224993]],

       [[0.633511, 0.633511, 0.633511],
        [0.056746, 0.056746, 0.056746],
        [0.28119 , 0.28119 , 0.28119 ]],

       [[0.390596, 0.390596, 0.390596],
        [0.298519, 0.298519, 0.298519],
        [0.286853, 0.286853, 0.286853]]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2
  * z        (z) int64 0 1 2

Problem description

When applying the operation 'expand_dims({'z': 3}, -1)' on a DataArray, the flags of the underlying NumPy array change: 'C_CONTIGUOUS', 'WRITEABLE' and 'OWNDATA' are all set to False. After changing 'WRITEABLE' back to True, setting a single value in the DataArray with the '.loc' operator instead sets all the values in the selected inner array.

I am new to Xarray, so I can't be entirely sure whether this is expected behaviour. Regardless, I would expect that adding a new dimension to an array would not make it read-only. I would also not expect '.loc' to behave differently afterwards.

It is also inconsistent with the NumPy 'expand_dims' operation: when I call 'np.expand_dims(np_arr, axis=-1)', the 'C_CONTIGUOUS' and 'WRITEABLE' flags are not modified.
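
The difference may come down to which NumPy operation is being mirrored. Below is a minimal NumPy-only sketch. It assumes (this report does not show it) that expanding to a size of 3 behaves more like np.broadcast_to than like np.expand_dims. The variable names are illustrative only.

```python
import numpy as np

base = np.random.rand(3, 3)

# np.expand_dims only inserts a length-1 axis, so every element of the
# result still maps to its own memory location and the view stays writeable.
inserted = np.expand_dims(base, axis=-1)
print(inserted.shape, inserted.flags.writeable)

# np.broadcast_to repeats the data along the new axis without copying.
# The new axis has stride 0, so the repeated elements share memory and
# NumPy marks the view read-only.
repeated = np.broadcast_to(base[..., np.newaxis], (3, 3, 3))
print(repeated.shape, repeated.flags.writeable, repeated.strides)
```

If that assumption holds, the changed flags are a consequence of repeating data along a new axis of length 3, which np.expand_dims never does.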

Expected Output

Here is a similar flow of operations that demonstrates the behaviour I would expect from the DataArray after applying 'expand_dims':

>>> non_expanded_da = xr.DataArray(np.random.rand(3,3,3), coords={'x': np.arange(3), 'y': np.arange(3)}, dims=('x', 'y', 'z')) # Create the new DataArray to be in the same state as I would expect it to be in after applying the operation 'expand_dims({'z': 3}, -1)'
>>> non_expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[0.017221, 0.374267, 0.231979],
        [0.678884, 0.512903, 0.737573],
        [0.985872, 0.1373  , 0.4603  ]],

       [[0.764227, 0.825059, 0.847694],
        [0.482841, 0.708206, 0.486576],
        [0.726265, 0.860627, 0.435101]],

       [[0.117904, 0.40569 , 0.274288],
        [0.079321, 0.647562, 0.847459],
        [0.57494 , 0.578745, 0.125309]]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2
Dimensions without coordinates: z

>>> non_expanded_da.data.flags # Check flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

>>> non_expanded_da['z'] = np.arange(3) # Set coordinate for dimension 'z'
>>> non_expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[0.017221, 0.374267, 0.231979],
        [0.678884, 0.512903, 0.737573],
        [0.985872, 0.1373  , 0.4603  ]],

       [[0.764227, 0.825059, 0.847694],
        [0.482841, 0.708206, 0.486576],
        [0.726265, 0.860627, 0.435101]],

       [[0.117904, 0.40569 , 0.274288],
        [0.079321, 0.647562, 0.847459],
        [0.57494 , 0.578745, 0.125309]]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2
  * z        (z) int64 0 1 2

>>> non_expanded_da.loc[0, 0, 0] = 2.22 # Set value using .loc method
>>> non_expanded_da # The single value referenced is set which is what I expect to happen
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[2.22    , 0.374267, 0.231979],
        [0.678884, 0.512903, 0.737573],
        [0.985872, 0.1373  , 0.4603  ]],

       [[0.764227, 0.825059, 0.847694],
        [0.482841, 0.708206, 0.486576],
        [0.726265, 0.860627, 0.435101]],

       [[0.117904, 0.40569 , 0.274288],
        [0.079321, 0.647562, 0.847459],
        [0.57494 , 0.578745, 0.125309]]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2
  * z        (z) int64 0 1 2

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.7 (default, Dec 29 2018, 12:05:36)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)]
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: en_AU.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.4.1.1

xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.2.1
netCDF4: 1.5.0
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: 0.9.6.1.post1
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: None
setuptools: 39.0.1
pip: 10.0.1
conda: None
pytest: None
IPython: None
sphinx: None

design question error reporting good first issue help wanted

Most helpful comment

As you've noticed, these arrays are "read only" because otherwise indexing can modify more than the original values, e.g., consider:

original = xr.DataArray(np.zeros(3), dims='x')
result = original.expand_dims(y=2)
result.data.flags.writeable = True
result[0, 0] = 1
print(result)

Both "y" values were set to 1!

<xarray.DataArray (y: 2, x: 3)>
array([[1., 0., 0.],
       [1., 0., 0.]])
Dimensions without coordinates: y, x

The workaround is to call .copy() on the array after calling expand_dims(), e.g.,

original = xr.DataArray(np.zeros(3), dims='x')
result = original.expand_dims(y=2).copy()
result[0, 0] = 1
print(result)

Now the correct result is printed:

<xarray.DataArray (y: 2, x: 3)>
array([[1., 0., 0.],
       [0., 0., 0.]])
Dimensions without coordinates: y, x

This is indeed intended behavior: by making the result read-only, we can expand dimensions without copying the original data, and without needing to worry about indexing modifying the wrong values.
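
The aliasing behind this can be reproduced with NumPy alone. The sketch below is not xarray's implementation; it uses numpy.lib.stride_tricks.as_strided to build a writeable zero-stride view by hand, purely to show why writing to such a view touches several "elements" at once.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

base = np.zeros(3)

# Build a (2, 3) view whose first axis has stride 0: both "rows" point at
# the same three floats in `base`. as_strided returns a writeable view by
# default, which is what makes it easy to misuse.
aliased = as_strided(base, shape=(2, 3), strides=(0, base.strides[0]))

aliased[0, 0] = 1
print(aliased)  # both rows change
print(base)     # and so does the original array
```

Marking such a view read-only, as described above, rules out this kind of accidental write.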

That said, we could certainly improve the usability of this feature in xarray. Some options:

  • Mention the work-around of using .copy() in the error message xarray prints when an array is read-only.
  • Add a copy argument to expand_dims, so users can write copy=True if they want a writeable result.
  • Consider changing the behavior when a dimension is inserted with size 1 to make the result writeable -- in this case, individual elements of result can be modified (which will also modify the input array). But maybe it would be surprising to users if the result of expand_dims() is sometimes but not always writeable?
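
For illustration, the second option can be approximated in user code today. The helper name expand_dims_writeable is hypothetical and not part of xarray; it simply chains the two existing calls.

```python
import numpy as np
import xarray as xr


def expand_dims_writeable(da, **dim_sizes):
    """Hypothetical helper: expand dimensions, then copy so the result
    owns its data and can be written to safely."""
    return da.expand_dims(**dim_sizes).copy()


original = xr.DataArray(np.zeros(3), dims='x')
result = expand_dims_writeable(original, y=2)
result[0, 0] = 1
print(result.values)    # only one element is modified
print(original.values)  # the input array is left untouched
```

The copy costs memory proportional to the expanded size, which is the trade-off the read-only default avoids.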

Thank you @shoyer for taking the time to educate me on this. I understand completely now.

I agree that solutions one and two would be helpful for future developers new to Xarray and the expand_dims operation when they eventually encounter this behaviour. I also agree that option three would be confusing; had it been implemented that way, I would still have found myself asking a similar question about why it behaves like that.

Another option to consider which might be easier still would be to just update the expand_dims documentation to include a note about this behaviour and the copy solution.

Thanks again!

OK, we would definitely welcome a pull request to improve this error message and the documentation for expand_dims!

I lean slightly against adding the copy argument since it's just as easy to add .copy() afterwards (that's one less function argument).

Yes good point. I do also think that the expand_dims interface should really only be responsible for that one single operation. If you then want to make a copy then go ahead and use that separate method afterwards.

Ok great, if I get time later today I'll see if I can't pick that up; that is if someone hasn't already done so in the meantime.

I am also affected in some code which used to work with earlier versions of xarray. In this case, I call ds.expand_dims('new_dim') on some dataset (not DataArray), e.g.:

ds = xr.Dataset({'testvar': (['x'], np.zeros(3))})
ds1 = ds.expand_dims('y').copy()
ds1.testvar.data.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

The .copy() workaround does not help in this case, and I am not sure how to fix this.

I have just realized that .copy(deep=True) is a possible fix for datasets.
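
A likely explanation, taken from the documented defaults rather than from this thread: DataArray.copy() defaults to deep=True, whereas Dataset.copy() defaults to deep=False, so a plain Dataset copy still wraps the same read-only array. A minimal sketch of the difference:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'testvar': (['x'], np.zeros(3))})

# Dataset.copy() is shallow by default, so the data is not duplicated.
shallow = ds.expand_dims('y').copy()
# deep=True copies the underlying arrays, giving a writeable result.
deep = ds.expand_dims('y').copy(deep=True)

print(shallow.testvar.data.flags.writeable)
print(deep.testvar.data.flags.writeable)

deep.testvar[0, 0] = 1  # safe: `deep` owns its data
print(ds.testvar.values)  # the original dataset is unchanged
```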
