Xarray: Masking and preserving int type

Created on 8 Apr 2020 · 3 Comments · Source: pydata/xarray

When a DataArray is masked with .where(), its dtype is converted to float64.

However, to use the DataArray output of .where() as an indexer in .isel(), the dtype must be an integer type.
(#3949)

MCVE Code Sample

import numpy as np
import xarray as xr

val_arr = xr.DataArray(np.arange(27).reshape(3, 3, 3),
                       dims=['z', 'y', 'x'])

z_indices = xr.DataArray(np.array([[1, 0, 2],
                                  [0, 0, 1],
                                  [-2222, 0, 1]]),
                         dims=['y', 'x'])

fill_value = -2222
sub = z_indices.where(z_indices != fill_value)
indexed_array = val_arr.isel(z=sub)

Expected Output

array([[ 1,  0,  2],
       [ 0,  0,  1],
       [nan,  0,  1]])

Problem Description

  File "E:\miniconda3\envs\satpy\lib\site-packages\xarray\core\indexing.py", line 446, in __init__
    f"invalid indexer array, does not have integer dtype: {k!r}"
TypeError: invalid indexer array, does not have integer dtype: array([[ 1.,  0.,  2.],
       [ 0.,  0.,  1.],
       [nan,  0.,  1.]])

pandas currently supports missing values in integer data through its nullable integer dtype. Is this possible in xarray, or is there another way around it?
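
For reference, the pandas behaviour mentioned above can be sketched as follows. This is a minimal illustration, assuming a pandas version that provides the nullable "Int64" extension dtype; the variable names are made up for the example.

```python
import pandas as pd

fill_value = -2222
s = pd.Series([1, 0, fill_value])

# With NumPy-backed integers, masking promotes to float64 so that NaN fits.
masked_float = s.where(s != fill_value)
print(masked_float.dtype)  # float64

# With the nullable extension dtype, the masked entry becomes <NA>
# and the integer dtype is kept.
masked_int = s.astype("Int64").where(s != fill_value)
print(masked_int.dtype)  # Int64
```

The first result mirrors what .where() does in xarray today; the second is the behaviour being asked about.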

Source

zxdawn

All 3 comments

The int vs. NaN problem has been discussed a lot in the past; see, for example, issue #1194. I would also like to ask the xarray devs whether there are any plans to adopt the pandas scheme.

In the meantime, you could go the other way round (isel before where) with this little hack:

# overwrite fill_values with 0
sub = xr.where(z_indices == fill_value, 0, z_indices)
# isel with sub and mask with where
indexed_array = val_arr.isel(z=sub).where(z_indices != fill_value)

Update: Never mind, this will make indexed_array a float. You can use the same where machinery and overwrite the masked positions with a fill_value of your liking:

# overwrite fill_values with 0
sub = xr.where(z_indices == fill_value, 0, z_indices)
# isel with sub and mask with where
indexed_array = val_arr.isel(z=sub)
indexed_array = xr.where(z_indices == fill_value, fill_value, indexed_array)

I can't immediately see, but there might be a cleaner way to achieve this.
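
Put together with the arrays from the original report, the second snippet can be sketched as a self-contained script. The helper name isel_z_with_fill is hypothetical (it is not part of xarray), and the sketch assumes the sentinel never occurs as a real value in the data.

```python
import numpy as np
import xarray as xr


def isel_z_with_fill(values, z_indices, fill_value):
    """Pick values along 'z' per (y, x) cell; hypothetical helper, not an xarray API."""
    invalid = z_indices == fill_value
    # Replace the sentinel with a valid index (0) so the indexer stays integer.
    safe = xr.where(invalid, 0, z_indices)
    picked = values.isel(z=safe)
    # Put the sentinel back where the index was invalid; no NaN, so no float cast.
    return xr.where(invalid, fill_value, picked)


val_arr = xr.DataArray(np.arange(27).reshape(3, 3, 3), dims=["z", "y", "x"])
z_indices = xr.DataArray(
    np.array([[1, 0, 2], [0, 0, 1], [-2222, 0, 1]]), dims=["y", "x"]
)

result = isel_z_with_fill(val_arr, z_indices, fill_value=-2222)
print(result.dtype.kind)  # 'i' (an integer dtype)
print(result.values)
```

Because the masked cell carries the sentinel instead of NaN, the result keeps an integer dtype, at the cost of having to remember which value means "missing".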

kmuehlbauer on 8 Apr 2020

@kmuehlbauer Thanks, nice trick! It works well for this situation.

zxdawn on 8 Apr 2020

I would love to have support for integer NA values in xarray, but I don't think we want to build it into xarray.

Ideally this would either be built into NumPy (i.e., as a custom dtype, which will require some work before it's possible), or someone could build an "integer with NA" duck array that implements the various NumPy protocols such as __array_function__. The latter is a bit less elegant but could be done today with very few changes in xarray.
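
The "integer values plus a separate NA marker" idea can be illustrated with NumPy's existing masked arrays, which store a boolean mask next to the data and therefore keep the integer dtype. This is only a sketch of the concept: it is not the duck array proposed above, and it makes no claim about how well xarray handles np.ma inputs.

```python
import numpy as np

fill_value = -2222
z_indices = np.array([[1, 0, 2], [0, 0, 1], [fill_value, 0, 1]])

# Mask the sentinel; the underlying data keeps its integer dtype.
masked = np.ma.masked_equal(z_indices, fill_value)
print(masked.dtype.kind)  # 'i'
print(masked.mask)        # True only where the sentinel was

# Fill masked entries with a valid index to get a plain integer array again.
safe = masked.filled(0)
print(safe)
```

The mask plays the role that NaN plays for floats, so no promotion to float64 is needed to record which entries are missing.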