Xarray: Dataset.drop() no longer works with a set as an argument

Created on 20 Nov 2019  路  15Comments  路  Source: pydata/xarray

MCVE Code Sample

# Your code here
#!/usr/bin/env python

import xarray
import numpy

ds = xarray.Dataset()
ds['x'] = ('x', numpy.linspace(0., 1., 100))
ds['y'] = ('y', numpy.linspace(0., 1., 50))
print(ds)
ds = ds.drop(set('x'))
print(ds)

Expected Output

<xarray.Dataset>
Dimensions:  (x: 100, y: 50)
Coordinates:
  * x        (x) float64 0.0 0.0101 0.0202 0.0303 ... 0.9697 0.9798 0.9899 1.0
  * y        (y) float64 0.0 0.02041 0.04082 0.06122 ... 0.9592 0.9796 1.0
Data variables:
    *empty*
<xarray.Dataset>
Dimensions:  (y: 50)
Coordinates:
  * y        (y) float64 0.0 0.02041 0.04082 0.06122 ... 0.9592 0.9796 1.0
Data variables:
    *empty*

Problem Description

In versions before xarray 0.14.1, the code above involving calls to Dataset.drop() with a set as an argument worked as expected. With the new release, the result is an error as shown below. This is breaking backwards compatibility with our software (MPAS-Analysis).

<xarray.Dataset>
Dimensions:  (x: 100, y: 50)
Coordinates:
  * x        (x) float64 0.0 0.0101 0.0202 0.0303 ... 0.9697 0.9798 0.9899 1.0
  * y        (y) float64 0.0 0.02041 0.04082 0.06122 ... 0.9592 0.9796 1.0
Data variables:
    *empty*
Traceback (most recent call last):
  File "./drop_issue.py", line 10, in <module>
    ds = ds.drop(set('x'))
  File "/home/xylar/miniconda3/envs/test/lib/python3.7/site-packages/xarray/core/dataset.py", line 3643, in drop
    return self.drop_sel(labels, errors=errors)
  File "/home/xylar/miniconda3/envs/test/lib/python3.7/site-packages/xarray/core/dataset.py", line 3689, in drop_sel
    labels = either_dict_or_kwargs(labels, labels_kwargs, "drop")
  File "/home/xylar/miniconda3/envs/test/lib/python3.7/site-packages/xarray/core/utils.py", line 257, in either_dict_or_kwargs
    "the first argument to .%s must be a dictionary" % func_name
ValueError: the first argument to .drop must be a dictionary

Output of xr.show_versions()

Paste the output here xr.show_versions() here

INSTALLED VERSIONS

commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-1063-oem
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.1

xarray: 0.14.1
pandas: 0.25.3
numpy: 1.17.3
scipy: 1.3.2
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.1
dask: 2.8.0
distributed: 2.8.0
matplotlib: 3.1.2
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 41.6.0.post20191101
pip: 19.3.1
conda: None
pytest: 5.3.0
IPython: None
sphinx: None

bug

Most helpful comment

the reason for set raising an error is that in drop is_list_like(labels) or is_scalar(labels) is used instead of isinstance(labels, str) or not isinstance(labels, Iterable). Since is_list_like explicitly checks for tuple and list, passing set obviously won't call the right method.

All 15 comments

I believe a set() object should be a valid labels argument according to the documentation of drop():
http://xarray.pydata.org/en/v0.14.0/generated/xarray.Dataset.drop.html#xarray.Dataset.drop

How about drop_vars instead?

@dcherian - are you suggesting that @xylar change his downstream code to use drop_vars instead? drop_vars has been deprecated. I don't think that's the right response here.

Instead we should fix this bug. The docs for drop suggest that you should be able to provide a set as an input. Somehow we made that not work in the latest release.

Yes yes I agree!

drop_vars would be a quick unblocking fix...

@dcherian - are you suggesting that @xylar change his downstream code to use drop_vars instead? drop_vars has been deprecated. I don't think that's the right response here.

I'm confused. In 0.14.1, drop_vars seems to have just been introduced and drop seems to have been dropped.

How about drop_vars instead?

Yes, I switched to drop_vars but first I had to make a new build of my conda package that constrains the dependency xarray <0.14.0 because it's broken with 0.14.1. So that's why I'm concerned. I have to do a new release before this fix propagates to my users.

So @rabernat and @dcherian, please clarify if I should be switching to drop_vars or if I am going to run into issues yet again if I do so?

I think drop was deprecated in favor of drop_sel and drop_vars (#3475). Using v0.14.1, this works:

ds = xr.Dataset({"a": ("b", [0, 1, 2]), "c": [-1, 2, 4]}, coords={"b": list("abc")})
ds.drop_vars({"a", "b"})
ds.drop_vars(["a", "b"])
ds.drop(["a", "b"])

Additionally, I think we have a documentation bug: I can't reach the documentation on drop.

so I guess we need to figure out why sets don't work anymore?

Additionally, I think we have a documentation bug: I can't reach the documentation on drop.

Yes, I noticed that, too. I assumed it wasn't being documented anymore because it was considered deprecated.

I apologize for this slight tangent but it's related to why we use sets in drop(). Is there a good way in xarray to only keep a list of variables? Our approach is to create a set of all variable and subtract from it a set of variables we want to keep and to drop the resulting set of variables we don't want to keep. But it's hard for me to imagine others don't also need this functionality.

explicitly selecting should work like this:

ds = xr.Dataset(
    {"a": ("b", [0, 1, 2]), "c": ("b", [-1, 2, 4]), "d": ("b", [5, 1, 3])},
    coords={"b": list("abc")},
)
variables = ["a", "d"]
ds[variables]

I think Dataset.__getitem__ allows any kind of sequence here (e.g. sets)

Ah, thanks very much! I'll switch to using that syntax instead.

~drop_vars has been deprecated.~ Apparently that is wrong. Sorry for adding to the noise.

the reason for set raising an error is that in drop is_list_like(labels) or is_scalar(labels) is used instead of isinstance(labels, str) or not isinstance(labels, Iterable). Since is_list_like explicitly checks for tuple and list, passing set obviously won't call the right method.

drop_vars was just added! Let me know any questions / issues. We should fix set, though for your case it's best to switch to drop_vars

馃憤 re @keewis 's point re ds[variables]

We should fix set, though for your case it's best to switch to drop_vars

This also happens with other iterables like dict_keys.

Was this page helpful?
0 / 5 - 0 ratings