Xarray: Inconsistency between sel and isel when working with slice

Created on 20 Oct 2020 · 6 Comments · Source: pydata/xarray

What happened:

Slices do not have the same effect when used with sel and isel.
The stop bound is excluded by isel but included by sel.

What you expected to happen:
For sel and isel to treat the "stop" bound of the slice the same way: either both select it or neither does.

Minimal Complete Verifiable Example:

import xarray as xr

# Build a small dataset with an integer "lat" coordinate
ds = xr.Dataset()
ds.coords["lat"] = [0, 1, 2, 3]
ds["value"] = ("lat", ds["lat"].values + 10)

# isel slices by position, sel slices by label
print("Isel result is \n %s \n\n" % ds.isel(lat=slice(1, 2)))
print("Sel result is \n %s" % ds.sel(lat=slice(1, 2)))

This gives the following results:

Isel result is 
 <xarray.Dataset>
Dimensions: (lat: 1)
Coordinates:
  * lat      (lat) int64 1
Data variables:
    value    (lat) int64 11 


Sel result is 
 <xarray.Dataset>
Dimensions:  (lat: 2)
Coordinates:
  * lat      (lat) int64 1 2
Data variables:
    value    (lat) int64 11 12

Anything else we need to know?:

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.2 | packaged by conda-forge | (default, Mar 5 2020, 17:11:00)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-118-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3

xarray: 0.15.0
pandas: 1.0.1
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.3
iris: None
bottleneck: None
dask: 2.12.0
distributed: 2.12.0
matplotlib: 3.2.0
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 46.0.0.post20200308
pip: 20.0.2
conda: None
pytest: 5.4.1
IPython: 7.13.0
sphinx: None

All 6 comments

I understand this can be surprising at first glance, but the alternatives are more confusing: a label wouldn't select itself, so selecting a range including a label would require knowing what label followed the end of the range.

From the docs:

Like pandas, label based indexing in xarray is inclusive of both the start and stop bounds.

While the design is fixed, we'd take a PR to make the documentation clearer, if you have a view on how to improve it.
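To make the point about labels concrete, here is a minimal sketch using pandas, whose convention the quoted docs say xarray follows. The Series and its string labels are made up for illustration. With labels like these there is no obvious "next label after the end", which is the argument for including the stop bound in label-based slices.

```python
import pandas as pd

# Illustrative data: string labels have no natural "label after 'c'".
s = pd.Series([10, 11, 12, 13], index=["a", "b", "c", "d"])

by_label = s.loc["b":"c"]    # label-based: both bounds are included
by_position = s.iloc[1:2]    # position-based: the stop bound is excluded

print(list(by_label.index))     # ['b', 'c']
print(list(by_position.index))  # ['b']
```

To get the same two elements by position, the caller has to write `s.iloc[1:3]`, i.e. they have to know what follows `'c'`.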

Honestly, I think this is a design mistake in pandas that we copied blindly in xarray. I don't know how we could feasibly fix this, though: we'd have to come up with our own replacement for slice (xarray.Slice?) and deal with breaking lots of old code in subtle ways.

This feels like a minor inconsistency where cleaning things up would not be worth the trouble.

What would your ideal design be, @shoyer? That sel operates the same as isel?

> Honestly, I think this is a design mistake in pandas

~Honestly, I think this is a design mistake in C: 0-based indexing! 😀~ Edit: Not exactly applicable here

> we'd have to come up with our own replacement for slice (xarray.Slice?) and deal with breaking lots of old code in subtle ways.

One elegant way that Rust deals with this is multiple range types (Range, RangeInclusive, etc.), one for each permutation of open / closed, with convenient syntax to represent them.

It wouldn't be impossible to copy this approach (inherit from slice and implement these) without backward-incompatible changes. But it would be lots of work with (I think) modest benefits.
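A rough sketch of what Rust-style range types could look like in Python follows. Everything here is hypothetical: `Range`, `RangeInclusive` and `to_positions` are illustrative names, not part of xarray or pandas. One caveat: CPython's built-in `slice` cannot be subclassed, so this sketch uses standalone classes that translate label bounds into an ordinary positional slice, assuming the labels are sorted in ascending order.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass(frozen=True)
class Range:
    """Hypothetical half-open label range: start <= label < stop."""
    start: Any
    stop: Any

    def to_positions(self, labels: Sequence[Any]) -> slice:
        # Assumes `labels` is sorted in ascending order.
        return slice(bisect_left(labels, self.start),
                     bisect_left(labels, self.stop))


@dataclass(frozen=True)
class RangeInclusive:
    """Hypothetical closed label range: start <= label <= stop."""
    start: Any
    stop: Any

    def to_positions(self, labels: Sequence[Any]) -> slice:
        # bisect_right moves past labels equal to `stop`, so they are kept.
        return slice(bisect_left(labels, self.start),
                     bisect_right(labels, self.stop))


lat = [0, 1, 2, 3]
print(lat[Range(1, 2).to_positions(lat)])           # [1]
print(lat[RangeInclusive(1, 2).to_positions(lat)])  # [1, 2]
```

With types like these, the caller states the open / closed choice explicitly instead of relying on whether the method is label-based or position-based.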

> What would your ideal design be @shoyer ? That sel operates the same as isel?

Yes, I think this is probably the easiest mode to use programmatically, especially when using slicing to divide a large dataset into many pieces: that way you only need to identify "split points" rather than lower and upper bounds for each range.
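The "split points" argument can be illustrated with a small sketch. It uses pandas as a stand-in for xarray, on the assumption (taken from the docs quote earlier in the thread) that both follow the same inclusive convention for label-based slices. The data and split points are made up for illustration.

```python
import pandas as pd

# Illustrative data: six values with integer labels 0..5.
s = pd.Series(range(10, 16), index=[0, 1, 2, 3, 4, 5])
split_points = [0, 2, 4, 6]
bounds = list(zip(split_points[:-1], split_points[1:]))

# Positional (half-open) slices: consecutive split points partition the data.
pieces_iloc = [s.iloc[a:b] for a, b in bounds]

# Label-based (inclusive) slices with the same bounds overlap at labels 2 and 4.
pieces_loc = [s.loc[a:b] for a, b in bounds]

print(sum(len(p) for p in pieces_iloc))  # 6
print(sum(len(p) for p in pieces_loc))   # 8
```

With inclusive slicing, each piece needs its own upper bound (here 1, 3 and 5) to avoid counting the boundary elements twice.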

> Honestly, I think this is a design mistake in pandas that we copied blindly in xarray. I don't know how we could feasibly fix this, though: we'd have to come up with our own replacement for slice (xarray.Slice?) and deal with breaking lots of old code in subtle ways.
>
> This feels like a minor inconsistency where cleaning things up would not be worth the trouble.

I agree that the inconsistency is minor and not worth the pain of correcting it (as it would break a lot of code).
However, noting this difference in the documentation with a small example may help.
I can try to do that if that's OK with you.

thanks, @vincentchabot, that would be great
