Xarray: Add head(), tail() and thin() methods?

Created on 11 Feb 2015 · 10 Comments · Source: pydata/xarray

These would be shortcuts for isel/slice syntax:

  • ds.head(time=5) -> ds.isel(time=slice(5)): select the first five time values
  • ds.tail(time=5) -> ds.isel(time=slice(-5, None)): select the last five time values
  • ds.thin(time=5) -> ds.isel(time=slice(None, None, 5)): select every 5th time value
Labels: enhancement, good first issue, indexing
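To make the three proposed mappings concrete, here is a minimal sketch in plain Python. The helper names `head_indexers`, `tail_indexers` and `thin_indexers` are hypothetical and not part of xarray; they only build the dictionaries of slice objects that would be handed to `isel`, and a plain list stands in for the `time` axis.

```python
def head_indexers(**dims):
    """Map e.g. time=5 to {"time": slice(5)}: the first n values."""
    return {dim: slice(n) for dim, n in dims.items()}


def tail_indexers(**dims):
    """Map e.g. time=5 to {"time": slice(-5, None)}: the last n values.

    Assumes n > 0; n == 0 would need separate handling because -0 == 0.
    """
    return {dim: slice(-n, None) for dim, n in dims.items()}


def thin_indexers(**dims):
    """Map e.g. time=5 to {"time": slice(None, None, 5)}: every nth value."""
    return {dim: slice(None, None, n) for dim, n in dims.items()}


# A plain list stands in for the "time" axis of a dataset.
time = list(range(12))
first_five = time[head_indexers(time=5)["time"]]
last_five = time[tail_indexers(time=5)["time"]]
every_fifth = time[thin_indexers(time=5)["time"]]
```

With a real dataset, the same dictionaries could be passed along as `ds.isel(**head_indexers(time=5))`.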

All 10 comments

Clojure conventions: .take and .take_last get the first n and last n elements.
pandas/ndarray conventions: .take([3,4,5]) selects rows 3, 4 and 5.

We probably want to be consistent with one of these.
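The two meanings of "take" can be contrasted in a small pure-Python sketch. The function names here are illustrative only and do not belong to any of the libraries mentioned; lists stand in for rows.

```python
def take_first(seq, n):
    """Count-based convention: the first n items."""
    return seq[:n]


def take_last(seq, n):
    """Count-based convention: the last n items (an empty list for n == 0)."""
    return seq[max(len(seq) - n, 0):]


def take_positions(seq, positions):
    """Position-based convention: the items at the given integer positions."""
    return [seq[i] for i in positions]


rows = list("abcdefgh")
first_three = take_first(rows, 3)
last_three = take_last(rows, 3)
picked = take_positions(rows, [3, 4, 5])
```

The same word therefore takes either a count or a list of positions, which is why reusing it for both would be confusing.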

@ebrevdo yes, I think we probably need to stick to the pandas/numpy convention for the meaning of take.

My inspiration for these names was the head() and tail() methods in pandas, which are quite convenient. But it's not entirely clear how/if these generalize cleanly to N-dimensions. I suppose take_first and take_last could be an improvement over head/tail.

It seems like for, e.g., head, you can pass either a single dimension or multiple ones (e.g., either as **kwargs or a dictionary) and use those as the start dimension.

That said, about naming conventions, I think for tensors the most common convention is definitely slice() (which is implemented as isel). head/tail can be implemented in terms of slice().

e.g.:
ds.slice(dim1=3, dim2=(1,4), dim3=(1,None,5)) -- or --
ds.slice({'dim1': 3, 'dim2': (1,4), 'dim3': (1, None, 5)})

head/tail/whatever are easy calls to this and you can have that in the documentation. As a result, people won't get confused because they understand slice.
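A rough sketch of how such a slice()-style call could normalize its arguments, accepting either keyword arguments or a dictionary. The function name `to_indexers` is hypothetical, and the reading of a bare integer as "the first n values" is an assumption, since the comment does not say what `dim1=3` should mean.

```python
def to_indexers(spec=None, **dims):
    """Turn {dim: int or tuple} into {dim: slice}.

    Assumption: a bare integer n means slice(n); a tuple is unpacked
    as slice(start, stop[, step]).
    """
    if spec is not None and dims:
        raise ValueError("pass either a dictionary or keyword arguments, not both")
    spec = dims if spec is None else spec
    indexers = {}
    for dim, value in spec.items():
        if isinstance(value, tuple):
            indexers[dim] = slice(*value)
        else:
            indexers[dim] = slice(value)
    return indexers


from_kwargs = to_indexers(dim1=3, dim2=(1, 4), dim3=(1, None, 5))
from_dict = to_indexers({"dim1": 3, "dim2": (1, 4), "dim3": (1, None, 5)})
```

Both call styles produce the same dictionary of slice objects, which could then be forwarded to `isel`.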

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

I think this is still worth doing. I actually prefer @shoyer's original method names and API. I also think this would be a good first issue.

One virtue of tail() in particular is that we can define array.tail(x=0) to consistently return an array with an x dimension of size 0. This requires special-case logic with slicing, since slice(-0, None) is the same as slice(0, None), which selects every element instead of none.
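The zero-length case can be checked with a plain list. In the sketch below, `tail_slice` is a hypothetical helper, not an xarray function; it shows one way to special-case n == 0 so that the result is empty.

```python
def tail_slice(n):
    """Return a slice selecting the last n items, and nothing for n == 0."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        # -0 == 0, so slice(-0, None) would select everything.
        return slice(0, 0)
    return slice(-n, None)


values = list(range(5))
naive_zero = values[slice(-0, None)]  # naive translation of tail(0)
safe_zero = values[tail_slice(0)]     # special-cased translation
last_two = values[tail_slice(2)]
```

Here `naive_zero` is the whole list, while `safe_zero` is empty, which is the behaviour described for `array.tail(x=0)`.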

So yes, I agree that these would definitely be more readable than the equivalent operations with isel().

I will work on this issue in the next three weeks if no one else does before then.

Is this being worked on?

If not, I can send a PR today since I have some code ready that might help add this functionality.

Go for it!
