What happened:
Slices do not have the same effect with sel and isel.
The stop bound is excluded by isel but included by sel.
What you expected to happen:
Either select or exclude the "stop" bound of the slice in both cases.
Minimal Complete Verifiable Example:
import xarray as xr
da = xr.Dataset()
da.coords["lat"] = [0, 1, 2, 3]
da["value"] = (('lat'),da["lat"].values+10)
print("Isel result is \n %s \n\n"%da.isel(lat=slice(1,2)))
print("Sel result is \n %s"%da.sel(lat=slice(1,2)))
Gives the following results
Isel result is
<xarray.Dataset>
Dimensions: (lat: 1)
Coordinates:
* lat (lat) int64 1
Data variables:
value (lat) int64 11
Sel result is
<xarray.Dataset>
Dimensions: (lat: 2)
Coordinates:
* lat (lat) int64 1 2
Data variables:
value (lat) int64 11 12
Anything else we need to know?:
Environment:
Output of xr.show_versions()
commit: None
python: 3.8.2 | packaged by conda-forge | (default, Mar 5 2020, 17:11:00)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-118-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.0
pandas: 1.0.1
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.3
iris: None
bottleneck: None
dask: 2.12.0
distributed: 2.12.0
matplotlib: 3.2.0
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 46.0.0.post20200308
pip: 20.0.2
conda: None
pytest: 5.4.1
IPython: 7.13.0
sphinx: None
I understand this can be surprising at first glance, but the alternatives are more confusing: a label wouldn't select itself, so selecting a range including a label would require knowing what label followed the end of the range.
From the docs:
Like pandas, label based indexing in xarray is inclusive of both the start and stop bounds.
While the design is fixed, we'd take a PR to make the documentation clearer, if you have a view on how to improve it.
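The point about a label not selecting itself can be illustrated without xarray. The following is only a sketch in plain Python: the helper names `select_closed` and `select_half_open` and the label values are made up for illustration, and `bisect` on a sorted list stands in for a real index lookup.

```python
from bisect import bisect_left, bisect_right

# Sorted coordinate labels and their data (made-up values for illustration).
labels = [0.0, 0.5, 1.5, 4.0]
values = [10, 11, 12, 13]


def select_closed(start, stop):
    """Label slice that includes both bounds, like sel."""
    lo = bisect_left(labels, start)
    hi = bisect_right(labels, stop)
    return values[lo:hi]


def select_half_open(start, stop):
    """Hypothetical label slice that excludes the stop label."""
    lo = bisect_left(labels, start)
    hi = bisect_left(labels, stop)
    return values[lo:hi]


# Closed bounds: asking for 0.5 to 1.5 returns both labelled points.
closed = select_closed(0.5, 1.5)

# Half-open bounds: the stop label 1.5 is not part of the result...
half_open = select_half_open(0.5, 1.5)

# ...so including it means knowing the label that follows it (4.0 here).
workaround = select_half_open(0.5, 4.0)

# A half-open slice from a label to itself selects nothing.
single = select_half_open(1.5, 1.5)

print(closed, half_open, workaround, single)
```

With irregularly spaced labels, as here, the "next label" cannot be guessed from the stop value, which is the difficulty described above.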
Honestly, I think this is a design mistake in pandas that we copied blindly in xarray. I don't know how we could feasibly fix this, though -- we'd have to come up with our own replacement for slice (xarray.Slice?) and deal with breaking lots of old code in subtle ways.
This feels like a minor inconsistency where cleaning things up would not be worth the trouble.
What would your ideal design be @shoyer ? That sel operates the same as isel?
Honestly, I think this is a design mistake in pandas
~Honestly, I think this is a design mistake in C: 0-based indexing!~ Edit: Not exactly applicable here
we'd have to come up with our own replacement for slice (xarray.Slice?) and deal with breaking lots of old code in subtle ways.
One elegant way that Rust deals with this is multiple range types (Range, RangeInclusive, etc.), one for each permutation of open / closed, and convenient syntax to represent these.
It wouldn't be impossible to copy this approach (inherit from slice and implement these) without backward-incompatible changes. But it would be lots of work with (I think) modest benefits.
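As a rough sketch of what Rust-style range types could look like in Python: the classes `Range` and `RangeInclusive` and the method `to_positions` below are hypothetical names, not part of xarray or pandas. Note that CPython does not allow subclassing the built-in `slice`, so this sketch uses standalone classes that convert themselves into a positional slice against a sorted index.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass(frozen=True)
class Range:
    """Hypothetical half-open label range: start <= label < stop."""
    start: Any
    stop: Any

    def to_positions(self, index: Sequence) -> slice:
        # Both bounds use bisect_left, so the stop label is excluded.
        return slice(bisect_left(index, self.start),
                     bisect_left(index, self.stop))


@dataclass(frozen=True)
class RangeInclusive:
    """Hypothetical closed label range: start <= label <= stop."""
    start: Any
    stop: Any

    def to_positions(self, index: Sequence) -> slice:
        # bisect_right on the stop bound keeps the stop label.
        return slice(bisect_left(index, self.start),
                     bisect_right(index, self.stop))


# Same data as the example at the top of this issue.
lat = [0, 1, 2, 3]
value = [10, 11, 12, 13]

exclusive = value[Range(1, 2).to_positions(lat)]
inclusive = value[RangeInclusive(1, 2).to_positions(lat)]
print(exclusive, inclusive)
```

Here `Range(1, 2)` reproduces the isel result from the report and `RangeInclusive(1, 2)` reproduces the sel result, with the choice made explicit by the type rather than by the method.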
What would your ideal design be @shoyer ? That sel operates the same as isel?
Yes, I think this is probably the easiest mode to use programmatically, especially when using slicing to divide a large dataset into many pieces: that way you only need to identify "split points" rather than lower and upper bounds for each range.
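A small plain-Python sketch of the "split points" argument, using a list as a stand-in for a dataset (the data and split positions are made up for illustration):

```python
# Stand-in for a large dataset and some arbitrary split positions.
data = list(range(10, 20))
split_points = [0, 3, 7, 10]

# Half-open slices: consecutive split points are enough, and the
# pieces tile the data with no overlap and no gaps.
half_open = [data[a:b] for a, b in zip(split_points, split_points[1:])]

# Emulating closed bounds with the same split points duplicates the
# element at every interior boundary.
closed = [data[a:b + 1] for a, b in zip(split_points, split_points[1:])]

print(half_open)
print(closed)
```

With closed bounds, avoiding the duplicated boundary elements means computing a separate upper bound for each piece, which is the extra bookkeeping mentioned above.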
Honestly, I think this is a design mistake in pandas that we copied blindly in xarray. I don't know how we could feasibly fix this, though -- we'd have to come up with our own replacement for slice (xarray.Slice?) and deal with breaking lots of old code in subtle ways.
This feels like a minor inconsistency where cleaning things up would not be worth the trouble.
I agree that the inconsistency is minor and not worth the pain of correcting it (as that would break a lot of code).
However, noting this difference in the documentation with a small example may help.
I can try to do that if that's OK with you.
thanks, @vincentchabot, that would be great