I noticed that first(r)
and last(r)
where r
is one of various range types can give spurious results in the case the range is empty.
julia> first(1:-1:2)
1
julia> last(1:-1:2)
2
I can see these are used to enable nice generic code for ranges, for various Base
functions like isempty
, issorted
and so-on. However, to me, this seems to violate the iterator interface pretty badly, and I would expect a BoundsError
thrown. Would it be more appropriate to use some specialized functions like rangestart
and rangestop
as an interface to get these numbers for ranges?
:+1: to the BoundsError
. I'm not sure we need a replacement? EDIT: non-exported makes sense for implementing the methods you mention.
I recall we looked at this once _quite_ a while ago, and I think fixing it caused some performance problems. Maybe we'll fare better now though.
In particular it might make sense to put it inside a @boundcheck
?
An even more interesting/disturbing case:
julia> last(1:-1:3)
2
julia> 1:-1:3 # Explanation
1:-1:2
There should probably be a special display for an empty range:
julia> 1:1:-3
1:1:0
julia> 0:1:-3
0:1:-1
julia> -3:-1:1
-3:-1:-2
I agree this is strange, but it's used in APIs like searchsorted
, where you're interested in the insertion point of an empty range. It's documented:
first(coll)
Get the first element of an iterable collection. Returns the start point of a Range even if it is empty.
It'd definitely be worth seeing if we can replace it with a special function.
I'd favor undocumenting that, though, and going the route suggested by @andyferris of adding rangestart
and rangestop
.
I agree this is strange, but it's used in APIs like
searchsorted
, where you're interested in the insertion point of an empty range.
I'm responsible for the return of a range by searchsorted
, and I think that the combination of searchsorted
, searchsortedfirst
, and searchsortedlast
should be cleaned up. The names are awkward, and I'm pretty sure they were added before keyword arguments were part of the language.
I'm responsible for the return of a range by searchsorted, and I think that the combination of searchsorted, searchsortedfirst, and searchsortedlast should be cleaned up. The names are awkward, and I'm pretty sure they were added before keyword arguments were part of the language.
Funny how often this comes up this week. In the Search & Find Julep a solution to this was to wrap sorted vectors in a SortedVector
object, which would allow generic find
/search
functions to be used instead of the searchsorted*
functions. Keyword arguments wouldn't be enough since the return type (range vs. vector) would have to change, which would create a type instability.
But AFAICT it would still be useful to know the insertion point, so a rangestart
function would be needed.
For triage: I think we'd better raise a BoundsError
for now, and add rangestart
and rangestop
functions for the searchsorted
use case. We could even recommend using r.start
for searchsorted
, since the result is necessarily a UnitRange{Int}
: that would leave us more time to decide whether adding rangestart
is a good idea.
I don't love the rangestart
and rangestop
functions, but agree it would be nice to explore giving a BoundsError here. I agree that since rangestart
is range-specific anyway, you might as well just access r.start
directly. This can be considered if somebody happens to get to it in time.
But not all ranges have a start
field:
julia> fieldnames(StepRangeLen)
4-element Array{Symbol,1}:
:ref
:step
:len
:offset
(Explanation: for AbstractFloat
, ref
is the smallest-magnitude element in the range, even if that element occurs somewhere in the middle. This is necessary for high-precision arithmetic, otherwise you might, e.g., miss zero by an eps.)
Is this curious behavior only required for our implementation of searchsorted
and so-on?
I may be misunderstanding your question here, but it's used in packages too. For example, I'm almost sure it's exploited in AxisArrays.
For example, I'm almost sure it's exploited in AxisArrays.
For what purposes?
Regarding searchsorted
, I'd like to be sure the current behavior is really needed. See https://github.com/JuliaLang/julia/issues/24883. If you want to know whether an element is already in the vector and to insert it if it isn't, wouldn't you use searchsortedfirst
instead (which returns a single index)? At least what I can say is that searchsorted
is used only once in Base, and no insertion happens.
Looks like it's basically the same purpose as in Base: https://github.com/JuliaArrays/AxisArrays.jl/blob/master/src/search.jl
OK. Though AFAICT the returned value is not used to insert entries, so we still don't have an example where the start of an empty index is useful.
What's up with adding this to the milestone without triage, @nalimilan? And yes, I do keep a list of 1.0 issues open and refresh it several times a day to see if there are any more or less issues on it.
Sorry, I was going to add a comment saying that the milestone should probably have been added when this issue was triaged. I then started to see whether it would be hard to make a PR and got distracted by other things.
See PR https://github.com/JuliaLang/julia/pull/25385. It would be cool if triage could decide which approach is most appropriate.
Nice.
I had been thinking of making a simple PR which kind-of puns off @inbounds
and @boundscheck
for first
and last
(so is safe and correct by default), but I still felt a little conflicted if that was good...
Unfortunately @inbounds
won't work here since it's not a guaranteed transform (e.g., running in --boundcheck=yes
will break it).
I think we definitely need a completely new name for this 鈥斅爐he question is which one: rangestart
, r.start
, or Range.first
(or some such module as a namespace). Note that there's a third property in this set, step
, which is a name we've talked about removing in the past since it's not really a general enough to be worth an exported generic function. Whichever scheme we choose, we may want to fold that one into it, too.
In the other thread I just proposed using properties to do r.start
, r.step
and r.stop
uniformly across different range types.
I think we can punt on this for 1.0 --- it doesn't seem to be an urgent problem, largely a theoretical concern. In fact I think #25385 demonstrates that the existing behavior was largely ok and convenient.
Most helpful comment
Unfortunately
@inbounds
won't work here since it's not a guaranteed transform (e.g., running in--boundcheck=yes
will break it).I think we definitely need a completely new name for this 鈥斅爐he question is which one:
rangestart
,r.start
, orRange.first
(or some such module as a namespace). Note that there's a third property in this set,step
, which is a name we've talked about removing in the past since it's not really a general enough to be worth an exported generic function. Whichever scheme we choose, we may want to fold that one into it, too.