Julia: Range `first` and `last` can be misleading

Created on 13 Jun 2017 · 24Comments · Source: JuliaLang/julia

I noticed that first(r) and last(r) where r is one of various range types can give spurious results in the case the range is empty.

julia> first(1:-1:2)
1

julia> last(1:-1:2)
2

I can see these are used to enable nice generic code for ranges, for various Base functions like isempty, issorted and so-on. However, to me, this seems to violate the iterator interface pretty badly, and I would expect a BoundsError thrown. Would it be more appropriate to use some specialized functions like rangestart and rangestop as an interface to get these numbers for ranges?

decision

Source

andyferris

👍1

Most helpful comment

Unfortunately @inbounds won't work here since it's not a guaranteed transform (e.g., running in --boundcheck=yes will break it).

I think we definitely need a completely new name for this — the question is which one: rangestart, r.start, or Range.first (or some such module as a namespace). Note that there's a third property in this set, step, which is a name we've talked about removing in the past since it's not really a general enough to be worth an exported generic function. Whichever scheme we choose, we may want to fold that one into it, too.

mbauman on 4 Jan 2018

👍2 ❤1

All 24 comments

:+1: to the BoundsError. ~~I'm not sure we need a replacement?~~ EDIT: non-exported makes sense for implementing the methods you mention.

timholy on 13 Jun 2017

👍1

I recall we looked at this once _quite_ a while ago, and I think fixing it caused some performance problems. Maybe we'll fare better now though.

JeffBezanson on 13 Jun 2017

In particular it might make sense to put it inside a @boundcheck?

timholy on 13 Jun 2017

An even more interesting/disturbing case:

julia> last(1:-1:3)
2

julia> 1:-1:3 # Explanation
1:-1:2

nalimilan on 13 Jun 2017

There should probably be a special display for an empty range:

julia> 1:1:-3
1:1:0

julia> 0:1:-3
0:1:-1

julia> -3:-1:1
-3:-1:-2

omus on 13 Jun 2017

I agree this is strange, but it's used in APIs like searchsorted, where you're interested in the insertion point of an empty range. It's documented:

  first(coll)

  Get the first element of an iterable collection. Returns the start point of a Range even if it is empty.

It'd definitely be worth seeing if we can replace it with a special function.

mbauman on 13 Jun 2017

I'd favor undocumenting that, though, and going the route suggested by @andyferris of adding rangestart and rangestop.

timholy on 13 Jun 2017

👍3

I agree this is strange, but it's used in APIs like searchsorted, where you're interested in the insertion point of an empty range.

I'm responsible for the return of a range by searchsorted, and I think that the combination of searchsorted, searchsortedfirst, and searchsortedlast should be cleaned up. The names are awkward, and I'm pretty sure they were added before keyword arguments were part of the language.

kmsquire on 14 Jun 2017

I'm responsible for the return of a range by searchsorted, and I think that the combination of searchsorted, searchsortedfirst, and searchsortedlast should be cleaned up. The names are awkward, and I'm pretty sure they were added before keyword arguments were part of the language.

Funny how often this comes up this week. In the Search & Find Julep a solution to this was to wrap sorted vectors in a SortedVector object, which would allow generic find/search functions to be used instead of the searchsorted* functions. Keyword arguments wouldn't be enough since the return type (range vs. vector) would have to change, which would create a type instability.

But AFAICT it would still be useful to know the insertion point, so a rangestart function would be needed.

nalimilan on 14 Jun 2017

For triage: I think we'd better raise a BoundsError for now, and add rangestart and rangestop functions for the searchsorted use case. We could even recommend using r.start for searchsorted, since the result is necessarily a UnitRange{Int}: that would leave us more time to decide whether adding rangestart is a good idea.

nalimilan on 29 Nov 2017

I don't love the rangestart and rangestop functions, but agree it would be nice to explore giving a BoundsError here. I agree that since rangestart is range-specific anyway, you might as well just access r.start directly. This can be considered if somebody happens to get to it in time.

JeffBezanson on 30 Nov 2017

But not all ranges have a start field:

julia> fieldnames(StepRangeLen)
4-element Array{Symbol,1}:
 :ref   
 :step  
 :len   
 :offset

(Explanation: for AbstractFloat, ref is the smallest-magnitude element in the range, even if that element occurs somewhere in the middle. This is necessary for high-precision arithmetic, otherwise you might, e.g., miss zero by an eps.)

timholy on 1 Dec 2017

👍1

Is this curious behavior only required for our implementation of searchsorted and so-on?

andyferris on 3 Dec 2017

I may be misunderstanding your question here, but it's used in packages too. For example, I'm almost sure it's exploited in AxisArrays.

timholy on 3 Dec 2017

For example, I'm almost sure it's exploited in AxisArrays.

For what purposes?

Regarding searchsorted, I'd like to be sure the current behavior is really needed. See https://github.com/JuliaLang/julia/issues/24883. If you want to know whether an element is already in the vector and to insert it if it isn't, wouldn't you use searchsortedfirst instead (which returns a single index)? At least what I can say is that searchsorted is used only once in Base, and no insertion happens.

nalimilan on 3 Dec 2017

Looks like it's basically the same purpose as in Base: https://github.com/JuliaArrays/AxisArrays.jl/blob/master/src/search.jl

timholy on 3 Dec 2017

OK. Though AFAICT the returned value is not used to insert entries, so we still don't have an example where the start of an empty index is useful.

nalimilan on 3 Dec 2017

What's up with adding this to the milestone without triage, @nalimilan? And yes, I do keep a list of 1.0 issues open and refresh it several times a day to see if there are any more or less issues on it.

StefanKarpinski on 29 Dec 2017

😄1

Sorry, I was going to add a comment saying that the milestone should probably have been added when this issue was triaged. I then started to see whether it would be hard to make a PR and got distracted by other things.

nalimilan on 29 Dec 2017

See PR https://github.com/JuliaLang/julia/pull/25385. It would be cool if triage could decide which approach is most appropriate.

nalimilan on 4 Jan 2018

Nice.

I had been thinking of making a simple PR which kind-of puns off @inbounds and @boundscheck for first and last (so is safe and correct by default), but I still felt a little conflicted if that was good...

andyferris on 4 Jan 2018

Unfortunately @inbounds won't work here since it's not a guaranteed transform (e.g., running in --boundcheck=yes will break it).

mbauman on 4 Jan 2018

👍2 ❤1

In the other thread I just proposed using properties to do r.start, r.step and r.stop uniformly across different range types.

StefanKarpinski on 4 Jan 2018

❤2

I think we can punt on this for 1.0 --- it doesn't seem to be an urgent problem, largely a theoretical concern. In fact I think #25385 demonstrates that the existing behavior was largely ok and convenient.

JeffBezanson on 5 Jan 2018

Was this page helpful?

0 / 5 - 0 ratings