Xarray: "TypeError: invalid type promotion" when masking some datasets with xr.where

Created on 17 Nov 2020  路  2Comments  路  Source: pydata/xarray

Hi!
I'm trying to mask some climate data, which usually works well with the ds.where() command in the case of rectangles. In this case, I'd like to mask out an area of this blue shape:

Screenshot 2020-11-17 at 08 57 52

To do this, I'm using this combination of ds.where and xr.where:

import xarray as xr
import numpy as np
CCSM = xr.open_dataset('datafiles/CCSM/CCSM.nc', decode_times=True)
CCSM['lon_2'] = np.linspace(60,110,num=21)
BHIST = xr.open_dataset('datafiles/CESM2.1.BHIST/CESM2.1.BHIST.nc', decode_times=True)
BHIST['lon_2'] = np.linspace(60,110,num=41)

def selectarea( input_data):
    output = input_data.where(input_data.lat<40).where(input_data.lat>28.5).where(input_data.lon_2>92.5).where(input_data.lon_2<104)
    output = xr.where((output.lat<32.5) & (output.lon_2<98),np.nan,output)
    return output;

This works with all my datasets, except CCSM. When working normally, only the blue shape area is left, the rest is NaN.
Running selectarea(CCSM), however, results in this error code:


TypeError Traceback (most recent call last)
in
10 return output;
11
---> 12 selectarea(CCSM)

in selectarea(input_data)
7 def selectarea( input_data):
8 output = input_data.where(input_data.lat<40).where(input_data.lat>28.5).where(input_data.lon_2>92.5).where(input_data.lon_2<104)
----> 9 output = xr.where((output.lat<32.5) & (output.lon_2<98),np.nan,output)
10 return output;
11

/anaconda3/envs/geocat/lib/python3.7/site-packages/xarray/core/computation.py in where(cond, x, y)
1499 join="exact",
1500 dataset_join="exact",
-> 1501 dask="allowed",
1502 )
1503

/anaconda3/envs/geocat/lib/python3.7/site-packages/xarray/core/computation.py in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, *args)
1053 dataset_join=dataset_join,
1054 fill_value=dataset_fill_value,
-> 1055 keep_attrs=keep_attrs,
1056 )
1057 elif any(isinstance(a, DataArray) for a in args):

/anaconda3/envs/geocat/lib/python3.7/site-packages/xarray/core/computation.py in apply_dataset_vfunc(func, signature, join, dataset_join, fill_value, exclude_dims, keep_attrs, *args)
380
381 result_vars = apply_dict_of_variables_vfunc(
--> 382 func, *args, signature=signature, join=dataset_join, fill_value=fill_value
383 )
384

/anaconda3/envs/geocat/lib/python3.7/site-packages/xarray/core/computation.py in apply_dict_of_variables_vfunc(func, signature, join, fill_value, args)
325 result_vars = {}
326 for name, variable_args in zip(names, grouped_by_name):
--> 327 result_vars[name] = func(
variable_args)
328
329 if signature.num_outputs > 1:

/anaconda3/envs/geocat/lib/python3.7/site-packages/xarray/core/computation.py in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, output_sizes, keep_attrs, meta, args)
602 "apply_ufunc: {}".format(dask)
603 )
--> 604 result_data = func(
input_data)
605
606 if signature.num_outputs == 1:

/anaconda3/envs/geocat/lib/python3.7/site-packages/xarray/core/duck_array_ops.py in where(condition, x, y)
266 def where(condition, x, y):
267 """Three argument where() with better dtype promotion rules."""
--> 268 return _where(condition, *as_shared_dtype([x, y]))
269
270

/anaconda3/envs/geocat/lib/python3.7/site-packages/xarray/core/duck_array_ops.py in as_shared_dtype(scalars_or_arrays)
174 # Note that result_type() safely gets the dtype from dask arrays without
175 # evaluating them.
--> 176 out_type = dtypes.result_type(*arrays)
177 return [x.astype(out_type, copy=False) for x in arrays]
178

/anaconda3/envs/geocat/lib/python3.7/site-packages/xarray/core/dtypes.py in result_type(arrays_and_dtypes)
167 return np.dtype(object)
168
--> 169 return np.result_type(
arrays_and_dtypes)

<__array_function__ internals> in result_type(args, *kwargs)

TypeError: invalid type promotion

If I comment out the xr.where line, the error disappears.

Do you have any suggestion why this is? Could it be a bug in xr.where? Or any potential fix?

Here are the two datasets included in this example:

https://liveuclac-my.sharepoint.com/:u:/g/personal/zcaptbs_ucl_ac_uk/ERYTNQ4NmcpPrNjpdwUuxWUBrlwX7CkZvIBCLZZQQ9s0HA?e=ERH9pq

Thanks!
//Tor

usage question

All 2 comments

the issue is the dtype of the time_bnds variable: while in BHIST it is of dtype object (it contains cftime.DatetimeNoLeap objects), in CCSM it is of dtype datetime64. object dtypes can contain anything so passing in a np.nan to xr.where is not an issue, but datetime64 only supports np.NaT.
Dataset.where will automatically choose the correct fill value, which is why only the call to x.where fails.

Also, there is an issue with your selection: try plotting it, the shape is not the one in the picture. I'd use something like

mask = (
    (a.lat < 40) & (a.lat > 28.5) & (a.lon_2 > 92.5) & (a.lon_2 < 104)
    & np.logical_not((a.lat < 32.5) & (a.lon_2 < 98))
)
output = input_data.where(mask)

or maybe divide the shape into two rectangles and join both together using |

Thanks a lot, @keewis ! Your suggested mask works like a dream - this puts an end to many frustrations :)

//Tor

Was this page helpful?
0 / 5 - 0 ratings

Related issues

aseyboldt picture aseyboldt  路  5Comments

ray306 picture ray306  路  4Comments

mathause picture mathause  路  4Comments

blaylockbk picture blaylockbk  路  4Comments

benbovy picture benbovy  路  3Comments