Julia: Range `first` and `last` can be misleading

Created on 13 Jun 2017  路  24Comments  路  Source: JuliaLang/julia

I noticed that first(r) and last(r) where r is one of various range types can give spurious results in the case the range is empty.

julia> first(1:-1:2)
1

julia> last(1:-1:2)
2

I can see these are used to enable nice generic code for ranges, for various Base functions like isempty, issorted and so-on. However, to me, this seems to violate the iterator interface pretty badly, and I would expect a BoundsError thrown. Would it be more appropriate to use some specialized functions like rangestart and rangestop as an interface to get these numbers for ranges?

decision

Most helpful comment

Unfortunately @inbounds won't work here since it's not a guaranteed transform (e.g., running in --boundcheck=yes will break it).

I think we definitely need a completely new name for this 鈥斅爐he question is which one: rangestart, r.start, or Range.first (or some such module as a namespace). Note that there's a third property in this set, step, which is a name we've talked about removing in the past since it's not really a general enough to be worth an exported generic function. Whichever scheme we choose, we may want to fold that one into it, too.

All 24 comments

:+1: to the BoundsError. I'm not sure we need a replacement? EDIT: non-exported makes sense for implementing the methods you mention.

I recall we looked at this once _quite_ a while ago, and I think fixing it caused some performance problems. Maybe we'll fare better now though.

In particular it might make sense to put it inside a @boundcheck?

An even more interesting/disturbing case:

julia> last(1:-1:3)
2

julia> 1:-1:3 # Explanation
1:-1:2

There should probably be a special display for an empty range:

julia> 1:1:-3
1:1:0

julia> 0:1:-3
0:1:-1

julia> -3:-1:1
-3:-1:-2

I agree this is strange, but it's used in APIs like searchsorted, where you're interested in the insertion point of an empty range. It's documented:

  first(coll)

  Get the first element of an iterable collection. Returns the start point of a Range even if it is empty.

It'd definitely be worth seeing if we can replace it with a special function.

I'd favor undocumenting that, though, and going the route suggested by @andyferris of adding rangestart and rangestop.

I agree this is strange, but it's used in APIs like searchsorted, where you're interested in the insertion point of an empty range.

I'm responsible for the return of a range by searchsorted, and I think that the combination of searchsorted, searchsortedfirst, and searchsortedlast should be cleaned up. The names are awkward, and I'm pretty sure they were added before keyword arguments were part of the language.

I'm responsible for the return of a range by searchsorted, and I think that the combination of searchsorted, searchsortedfirst, and searchsortedlast should be cleaned up. The names are awkward, and I'm pretty sure they were added before keyword arguments were part of the language.

Funny how often this comes up this week. In the Search & Find Julep a solution to this was to wrap sorted vectors in a SortedVector object, which would allow generic find/search functions to be used instead of the searchsorted* functions. Keyword arguments wouldn't be enough since the return type (range vs. vector) would have to change, which would create a type instability.

But AFAICT it would still be useful to know the insertion point, so a rangestart function would be needed.

For triage: I think we'd better raise a BoundsError for now, and add rangestart and rangestop functions for the searchsorted use case. We could even recommend using r.start for searchsorted, since the result is necessarily a UnitRange{Int}: that would leave us more time to decide whether adding rangestart is a good idea.

I don't love the rangestart and rangestop functions, but agree it would be nice to explore giving a BoundsError here. I agree that since rangestart is range-specific anyway, you might as well just access r.start directly. This can be considered if somebody happens to get to it in time.

But not all ranges have a start field:

julia> fieldnames(StepRangeLen)
4-element Array{Symbol,1}:
 :ref   
 :step  
 :len   
 :offset

(Explanation: for AbstractFloat, ref is the smallest-magnitude element in the range, even if that element occurs somewhere in the middle. This is necessary for high-precision arithmetic, otherwise you might, e.g., miss zero by an eps.)

Is this curious behavior only required for our implementation of searchsorted and so-on?

I may be misunderstanding your question here, but it's used in packages too. For example, I'm almost sure it's exploited in AxisArrays.

For example, I'm almost sure it's exploited in AxisArrays.

For what purposes?

Regarding searchsorted, I'd like to be sure the current behavior is really needed. See https://github.com/JuliaLang/julia/issues/24883. If you want to know whether an element is already in the vector and to insert it if it isn't, wouldn't you use searchsortedfirst instead (which returns a single index)? At least what I can say is that searchsorted is used only once in Base, and no insertion happens.

Looks like it's basically the same purpose as in Base: https://github.com/JuliaArrays/AxisArrays.jl/blob/master/src/search.jl

OK. Though AFAICT the returned value is not used to insert entries, so we still don't have an example where the start of an empty index is useful.

What's up with adding this to the milestone without triage, @nalimilan? And yes, I do keep a list of 1.0 issues open and refresh it several times a day to see if there are any more or less issues on it.

Sorry, I was going to add a comment saying that the milestone should probably have been added when this issue was triaged. I then started to see whether it would be hard to make a PR and got distracted by other things.

See PR https://github.com/JuliaLang/julia/pull/25385. It would be cool if triage could decide which approach is most appropriate.

Nice.

I had been thinking of making a simple PR which kind-of puns off @inbounds and @boundscheck for first and last (so is safe and correct by default), but I still felt a little conflicted if that was good...

Unfortunately @inbounds won't work here since it's not a guaranteed transform (e.g., running in --boundcheck=yes will break it).

I think we definitely need a completely new name for this 鈥斅爐he question is which one: rangestart, r.start, or Range.first (or some such module as a namespace). Note that there's a third property in this set, step, which is a name we've talked about removing in the past since it's not really a general enough to be worth an exported generic function. Whichever scheme we choose, we may want to fold that one into it, too.

In the other thread I just proposed using properties to do r.start, r.step and r.stop uniformly across different range types.

I think we can punt on this for 1.0 --- it doesn't seem to be an urgent problem, largely a theoretical concern. In fact I think #25385 demonstrates that the existing behavior was largely ok and convenient.

Was this page helpful?
0 / 5 - 0 ratings