rust 🚀 - Tracking issue for sorted collection ranges

CC @bluss @apasel422

Gankra on 13 Aug 2015

Will this still allow for efficient implementations of things like just the
predecessor (without iteration)?

On Wednesday, August 12, 2015, Alexis Beingessner notifications@github.com
wrote:

CC @bluss https://github.com/bluss @apasel422
https://github.com/apasel422

—
Reply to this email directly or view it on GitHub
https://github.com/rust-lang/rust/issues/27787#issuecomment-130498741.

apasel422 on 13 Aug 2015

@apasel422 Yes the builder is completely lazy -- get would just extract the one set bound and do lt/le/gt/whatever as appropriate.

Gankra on 13 Aug 2015

I should note the builder API is _considerably_ more hairy internally. Basically requires converting all the static types into dynamic types and then dispatching to the existing impls (for IntoIter) or those proposed in rust-lang/rfcs#1195

Gankra on 13 Aug 2015

Holy shit it actually works:

Gankra on 13 Aug 2015

Shoutouts to @zentner-kyle for proposing this API. Big improvement.

Gankra on 13 Aug 2015

In practice, with the parent pointer design I anticipate the guts of BTreeMap having the query API just do raw pointers and be an internal impl detail: get_lt(&BTreeMap, &Q) -> Handle<*mut Node>

A range knows how to compute handles from bounds by dispatching to the query API, and then get or into_iter are just munging up handles into a (K, V) pair or Iter(Handle, Handle).

Gankra on 13 Aug 2015

@apasel422 does this sound good to you?

Gankra on 13 Aug 2015

CC @cgaebel @glaebhoerl @jooert

Gankra on 13 Aug 2015

I have made an RFC to this affect: https://github.com/rust-lang/rfcs/pull/1254

Gankra on 14 Aug 2015

Removing the B-unstable tag as I believe these haven't been implemented.

alexcrichton on 27 Jan 2016

@alexcrichton range and range_mut have been around since 1.0, I think. But they won't be stabilized as-is.

Gankra on 28 Jan 2016

Oh oops there are indeed some unstable APIs tagged with this.

alexcrichton on 28 Jan 2016

What's the state of this? I'm using btree ranges in production code and would be happy to help as much as I can to stabilize this feature.

naktinis on 29 Feb 2016

👍4

I also would like to know the status as I'm building a program which uses ranges and if there is an idiomatic rust solution I'd love to use that over my own implementation.

calebmer on 11 Mar 2016

My understanding is that we're waiting on a new RFC in lieu of rust-lang/rfcs#1254.

apasel422 on 12 Apr 2016

How can we contribute to the discussion or at least keep track of the remaining issues?

naktinis on 13 Apr 2016

@naktinis A pre-RFC or general discussion thread on https://internals.rust-lang.org/ would probably work.

apasel422 on 13 Apr 2016

cc https://github.com/rust-lang/rust/issues/33197, a case where the APIs currently segfault apparently

alexcrichton on 25 Apr 2016

It seems to me that the extension Bound provides over Range is very small. I wonder why .range() doesn't take Range instead.

ticki on 9 May 2016

@ticki Range is [start, end), but Bound provides inclusive/exclusive/unbounded on both ends.

apasel422 on 9 May 2016

It would make sense to make it generic over a trait implemented for both Range and (Bound, Bound)

ticki on 9 May 2016

One important usecase not covered by the Range and Bound proposals is retrieving floor and ceiling entries and iterators. Java's Navigable[Map|Set] provides this, and C++ can do it through lower_bound plus an iterator decrement.

danburkert on 26 Jun 2016

I would suggest using a Cursor-based API for this, like I did in intrusive-collections. In particular the upper_bound and lower_bound methods on RBTree return a cursor that points to the first element satisfying the given bound (which can be Bound::Included or Bound::Excluded). Of course this is in addition to providing the range method which works just like the one in BTreeMap and returns a normal iterator.

Amanieu on 26 Jun 2016

While this remains unfixed there's effectively no stable sorted-map in std::collections :-(.

rocallahan on 26 Jun 2016

👍2

Here's another use-case not handled by range() as it stands: let the key type be (A,B) ordered lexicographically for two ordered types A and B. Unless B has min and max elements (which not all types do) we cannot express bounds for a range which matches (a, _anything_) i.e. a specific value from A.

If instead range bounds were expressed as 'monotone predicates' you could easily handle cases like the one above (which I mention because it's a real requirement for something I want to do).

Alternatively, if you could bisect using a monotone predicate, something like first_true(&self, p: P) -> Option<&K> that would give you all you'd need to find concrete range bounds.

ogoodman on 3 Jul 2016

Unless B has min and max elements (which not all types do) we cannot express bounds for a range which matches (a, anything) i.e. a specific value from A.

Actually, I think this is kinda-supported by the current signature, though it's a bit hard to use.

The trick is that the upper and lower bounds are not required to be of type K. They can be independent types Min and Max with the constraint that the key type K can be "borrowed" into the Min/Max type, and the ordering over Min/Max is used. I think you can use (abuse?) this to make Min/Max extend K with artificial values representing any boundary points you want. In your example, Min/Max could be a tuple (A, B') where B' is an enum extending B with artificial min and max values. Then you can implement Borrow on K to inject K into Min/Max.

This would be a bit clearer if BTreeMap was more like the bst crate and was parameterized with an explicit Compare object whose operators don't require the LHS and RHS to have the same type.

rocallahan on 8 Jul 2016

@rocallahan threw down the gauntlet on his blog. Maybe we can move this forward.

brson on 11 Jul 2016

🔔 This issue is now entering a cycle-long final comment period 🔔

After much deliberation the libs team has come to a conclusion about what to do with these methods. They have gone through much iteration but there are key drawbacks with the current few proposals we have:

The methods today, which take two enums, are wildly unergonomic.
The methods proposed in https://github.com/rust-lang/rfcs/pull/1254 work as a builder style, but one day we would also like language syntax to express all kinds of ranges which would obsolete these methods.
Waiting for language syntax for all kinds of ranges is likely to take quite awhile.

With all that in mind, we've come up with the following steps for stabilizing these APIs:

First, the RangeArgument trait will change to return a Bound from each of its methods. This will allow it to express all kinds of ranges that we'd like to specify.
Second, the current BTreeMap::range methods will be changed to take T: RangeArgument, similar to Vec::drain today.
Next, we'll implement RangeArgument for (Bound<T>, Bound<T>) so all possible bounds are expressible today, even without language syntax just yet.
After a cycle, stabilize the BTreeMap::range methods
At a future point, also stabilize the RangeArgument trait.

The idea here is that we can roughly have our cake and eat it too. We get nice and ergonomic access to the range methods which are expressive enough for all kinds of queries on the B-Tree types. We also get the ability to express _all_ kinds of ranges at the same time not only for B-Tree collections but also all other collections which take T: RangeArgument. Finally, if and when we do add language syntax for all kinds of ranges, we simply need to implement the RangeArgument trait for the new structs and you'll be able to immediately start using the new syntax for all the queries.

Our current plan is to implement these changes to RangeArgument very soon this cycle with an eye towards stabilizing BTreeMap::range this cycle as well. Note that RangeArgument will likely remain unstable for another cycle or two.

Now with all that out of the way, the libs team is very curious to hear others' thoughts!

alexcrichton on 13 Jul 2016

👍1

I don't find the current syntax "wildly unergonomic", personally, but your plan sounds good.

rocallahan on 13 Jul 2016

I have no objection to anything proposed by alexcrichton, but I do still think it would be useful to be able to construct a range from a pair of monotone predicates.

While rocallahan's suggestion works for the case I brought up, it is extra work for the user to provide an extension of the key type, and I'm not sure whether it can be done in complete generality. (Maybe the extension type could add in the monotone predicates, and you could probably provide an incomplete implementation of Ord knowing that it would never be used to compare two predicates.)

The implementation of range does not directly use the endpoints in the bounds: it uses predicates like |x: &K| x > k. So I think that making range (or a similarly-named sibling) work for predicates should make the implementation slightly simpler while increasing the set of use-cases that are handled.

ogoodman on 13 Jul 2016

@ogoodman I assume you mean a signature like the following:

impl<K: Ord, V> BTreeMap<K, V> {
    pub fn range<From, To>(&self, from: From, to: To) -> Range<K, V> where
        From: FnMut(&K) -> bool,
        To: FnMut(&K) -> bool;
}

with usage like

let map: BTreeMap<String, i32> = ...

for (k, v) in map.range(|k| k >= "bar", |k| k < "foo") {
    ...
}

Is that right?

apasel422 on 13 Jul 2016

@rocallahan hm true, I may have exaggerated a bit :)

@ogoodman do you have an example in mind where some form of range isn't suitable? That is, today's instance of these methods should be equally as expressive as what we're thinking of implementing, but it sounds like you want functionality beyond what's implemented even today?

alexcrichton on 13 Jul 2016

I commonly use sorted maps to store intervals, but it requires having a floor operation (find greatest key less than or equal to some value). Perhaps the Bound enum could be extended to have Floor and Ceil variants. A sketch of how it would work:

data = { 1, 3, 5, 7, 9 }

data.range(Floor(4), Ceil(6)) => { 3, 5, 7 }
data.range(Ceil(2), Floor(6)) => { 3, 5 } // a Ceil(n) lower bound is equivalent to Included(n) lower bound
                                          // a Floor(n) upper bound is equivalent to Included(n) upper bound

danburkert on 13 Jul 2016

and to be clear about what I mean about storing intervals, consider a data set with the following intervals:

[0, 10)
[20, 25)
[100, 200)

and I frequently want to ask questions like get me the interval (if it exists) which contains the value 22, or give me all intervals which intersect with the interval [22, 125):

data = btree_map!((0, 10), (20, 25), (100, 200));

let contains_22 = data.range(Floor(22), Included(22)).next();
let intersection = data.range(Floor(22), Floor(125));

danburkert on 13 Jul 2016

@danburkert I don't understand, can't you already do this with the existing bounds?

data = btree_map!((0, 10), (20, 25), (100, 200));

let contains_22 = data.range(Included(22), Included(22)).next();
let intersection = data.range(Included(22), Excluded(125));

EDIT: Nevermind, I understand what you're getting at. However what you want in this case is an interval tree, which is a different data structure.

Amanieu on 13 Jul 2016

An interval tree is used for overlapping intervals, and is consequently much more complex. What I'm asking for is common, and supported by the Java NavigableMap API and the C++ map API.

danburkert on 13 Jul 2016

@danburkert In that case what you're probably after is a Cursor interface to a BTreeMap. Have a look at my RBTree type in intrusive-collections. The lower_bound and upper_bound methods return a Cursor objects which points to a single element in the tree. You can then move the cursor to the next/previous element and look at them.

There are also _mut versions of these methods which return a CursorMut that allows removing/replacing/inserting elements at the cursor position.

Amanieu on 13 Jul 2016

@Amanieu do you have any reasons _not_ to have Floor and Ceil bound variants? I agree that cursors would be also solve my specific usecase, but as far as I can tell they aren't being considered for the standard library.

danburkert on 13 Jul 2016

@danburkert It's just that I don't really understand what you mean by those bounds. How exactly do they differ from Included & Excluded. Also they don't really fit in with the other enum variants. I should probably write an RFC for cursors at some point...

Amanieu on 13 Jul 2016

I think my interval example is muddying the waters. The definition of floor and ceiling bounds are:

Floor(n) - a bound inclusive of the greatest element less than or equal to n
Ceil(n) - a bound inclusive of the smallest element greater than or equal to n

There is no way to simulate this with just Inclusive or Exclusive unless the resulting Range can be expanded (similar to a cursor or C++ iterator).

danburkert on 13 Jul 2016

@ogoodman:

(Maybe the extension type could add in the monotone predicates, and you could probably provide an incomplete implementation of Ord knowing that it would never be used to compare two predicates.)

Yes, I'm pretty sure that works and could be packaged into something quite convenient to use (outside std).

@danburkert:
I'm using this interface for intervals myself. I find the greatest element less than or equal to n using range(Unbounded, Included(n)).next_back(), and the smallest element greater than or equal to n using range(Included(n), Unbounded).next().

rocallahan on 13 Jul 2016

@apasel422 Yes, that looks about right. I'm still learning rust so I didn't feel in a position to propose a signature.

@alexcrichton My first message here gave an example which is difficult with the current range bounds, namely: given a key type of (A,B) lexicographically ordered, come up with a range which matches a fixed a and any b in B. If B has minimum and maximum elements you can use inclusive range bounds (a, b_min) to (a, b_max) but not all types have min and max elements, e.g. String has no max, and arbitrary precision numbers have neither.

ogoodman on 13 Jul 2016

Recently, I've been trying to use this API and ran across a limitation (probably the same as @ogoodman's) in the interaction between range, Borrow and Ord. Basically, given some set BTreeSet<(A, B, C)> it would be nice to be able to (generically) query for all (a, b, *). This isn't currently possible without knowing some C::MIN, C::MAX. So, I'd like to propose an alternative solution: Instead of taking a range that has two bounds, take some opaque type that implements some OrdQuery (excuse the name) trait:

/// Applied some sorted sequence of values of type `T`, `OrdQuery::cmp` should
///  yield zero or more `Ordering::Less`, then zero or more `Ordering::Equal`, then
/// zero or more `Ordering::Greater`.
trait OrdQuery<T> where T: Ord {
    fn cmp(&self, &T) -> Ordering;
}

This means we can support queries like BTreeSet::<(A, B, C)>::query((a, b)) (note, I'm using query instead of range because it better describes the operation) in addition to things like my_set.query(0..100). Users could even define this trait on custom types to allow for custom lookups:

#[derive(Ord, PartialOrd, Eq, PartialEq, Debug)]
struct Person {
    last_name: String.
    first_name: String,
    age: u32,
}

struct ByFamilyName<'a>(&'a str)
struct ByName<'a>(&'a str,&'a str);

trait OrdQuery<Person> for ByFamilyName {
    fn cmp(&self, record: &Person) -> Ordering {
        self.0.cmp(&record.last_name)
    }
}
trait OrdQuery<Person> for ByName {
    fn cmp(&self, record: &Person) -> Ordering {
        (self.1, self.0).cmp(&(&record.last_name, &record.first_name))
    }
}

//...
map.query(ByName("Steven", "Allen"));
map.query(ByFamilyName("Allen"));

Stebalien on 21 Jul 2016

Something like:

use std::cmp::Ordering;
use std::ops::{Range, RangeFull};

trait OrderedRange<T: ?Sized> where T: Ord {
    fn range_cmp(&self, other: &T) -> Ordering;
}

impl<'a, T: 'a, A> OrderedRange<A> for &'a T
    where T: OrderedRange<A>, A: Ord
{
    fn range_cmp(&self, other: &A) -> Ordering {
        (&**self).range_cmp(&other)
    }
}

// Small tuple prefixes. Would use a macro to do more...
// Also implement for (&A, ...) so we can form tuples without cloning/copying.

impl<A, B> OrderedRange<(A, B)> for (A,)
    where A: Ord, B: Ord
{
    fn range_cmp(&self, other: &(A, B)) -> Ordering {
        self.0.cmp(&other.0)
    }
}

impl<'a, A, B> OrderedRange<(A, B)> for (&'a A,)
    where A: Ord, B: Ord
{
    fn range_cmp(&self, other: &(A, B)) -> Ordering {
        self.0.cmp(&other.0)
    }
}

impl<A, B, C> OrderedRange<(A, B, C)> for (A,)
    where A: Ord, B: Ord, C: Ord
{
    fn range_cmp(&self, other: &(A, B, C)) -> Ordering {
        self.0.cmp(&other.0)
    }
}

impl<'a, A, B, C> OrderedRange<(A, B, C)> for (&'a A,)
    where A: Ord, B: Ord, C: Ord
{
    fn range_cmp(&self, other: &(A, B, C)) -> Ordering {
        self.0.cmp(&other.0)
    }
}

impl<A, B, C> OrderedRange<(A, B, C)> for (A, B)
    where A: Ord, B: Ord, C: Ord
{
    fn range_cmp(&self, other: &(A, B, C)) -> Ordering {
        (&self.0, &self.1).cmp(&(&other.0, &other.1))
    }
}

impl<'a, A, B, C> OrderedRange<(A, B, C)> for (&'a A, &'a B)
    where A: Ord, B: Ord, C: Ord
{
    fn range_cmp(&self, other: &(A, B, C)) -> Ordering {
        (self.0, self.1).cmp(&(&other.0, &other.1))
    }
}

// Cover the ranges...

impl<T> OrderedRange<T> for Range<T>
    where T: Ord
{
    fn range_cmp(&self, other: &T) -> Ordering {
        if *other <= self.start {
            Ordering::Greater
        } else if *other > self.end {
            Ordering::Less
        } else {
            Ordering::Equal
        }
    }
}

impl<'a, T: 'a> OrderedRange<T> for Range<&'a T>
    where T: Ord
{
    fn range_cmp(&self, other: &T) -> Ordering {
        if *other <= *self.start {
            Ordering::Greater
        } else if *other > *self.end {
            Ordering::Less
        } else {
            Ordering::Equal
        }
    }
}

// Would like to use `Borrow` here but can't (not even specialization helps)...
impl<'a> OrderedRange<String> for Range<&'a str> {
    fn range_cmp(&self, other: &String) -> Ordering {
        if &**other <= self.start {
            Ordering::Greater
        } else if &**other > self.end {
            Ordering::Less
        } else {
            Ordering::Equal
        }
    }
}

impl<T> OrderedRange<T> for RangeFull where T: Ord {
    fn range_cmp(&self, _: &T) -> Ordering {
        Ordering::Equal
    }
}

// And for the other ranges...

// Now, for BTreeSet:

impl<T> BTreeSet<T> where T: Ord {
    fn query<Q: OrderedRange<T>>(&self, range: Q) -> btree_set::Range {
        /* ... */
    }
}

Stebalien on 21 Jul 2016

@Stebalien why is https://github.com/rust-lang/rust/issues/27787#issuecomment-231277344 not working for you?

rocallahan on 22 Jul 2016

@rocallahan the Borrow trait explicitly forbids this. That is, a.borrow().cmp(b.borrow) == a.cmp(b) for all a and b (the same is true for Hash and Eq). Also, Borrow can only return a reference to values so you can't just "make" things. You _can_ play some games with transmute and newtypes but this is very limited and not recommended.

Stebalien on 22 Jul 2016

OK, thanks for pointing that out.

Pulling https://crates.io/crates/compare into the standard library would be another way to fix this.

rocallahan on 22 Jul 2016

@rocallahan I don't really see how that helps.

Stebalien on 22 Jul 2016

bst::TreeMap is parameterized by a type C which is a subtype of Compare<K, K> _and_ Compare<Min,K>/Compare<Max,K> for the range bounds, and it definitely allows you to use any types Min/Max as long as your C parameter supports them.

rocallahan on 22 Jul 2016

So BTreeMap could use the same approach ... if Compare is pulled into the standard library.

rocallahan on 22 Jul 2016

I see. However, that's not nearly as flexible.

It doesn't allow one to write custom queries over external maps/sets (the query logic is built-in to the map/set, not the query).
The solution I posted above is significantly more ergonomic: set.query(1..100), set.query(("LastName",)), etc...

Here's a version that attempts to abstract over Borrow (unfortunately, it's a bit messy...). Usage examples at the bottom.

Stebalien on 22 Jul 2016

Unfortunately we were unable to get around to implementing these APIs this cycle, so we're going to punt this to the next.

alexcrichton on 11 Aug 2016

I'm highly against specialized queries as recommended by @Stebalien for range queries. The range API should exclusively depend on Ord for selecting from collections. Queries are fine, but there's an algorithmic cost, since you are adding an implicit filter/sort step to the process, and that should be made explicit.

All that I expect from range is that it can walk the tree from a starting point to an ending point in the same order the tree is constructed in, which is the simplest form of this operation, and requires no knowledge of the structure of the data other than that it implements Ord. Allowing other orderings requires re-sorting the map, an expensive operation.

I'm not saying it's not a good idea to allow other queries through other means, because it will be needed in some cases, but let's keep the default simple, especially since this will simplify implementation, signature, and speed.

I also think it's a little unnecessary to have inclusive/exclusive bounds, since operators like .. already set a precedent for it. Just let BTreeMap<T: Ord>::range(T,T) -> Range<T> suffice.

OtterCode on 19 Aug 2016

Actually, following both the .. and ... convention would be a decent way to avoid having to wrap an additional type. Just have a range method and a range_inclusive method, similar to here: Range Expressions.

OtterCode on 19 Aug 2016

@OtterCode It's actually a bit more complicated since both the start and end can be either inclusive or exclusive, whereas .. and ... always make the start inclusive.

Amanieu on 19 Aug 2016

@Amanieu But why? Inclusiveness in the dot ranges is also arbitrary, but necessary to simplify the interface. We can make the same sorts of decisions here. It's trivial to drop the first element of a range, a single skip is all that's required. It's also not difficult to construct most reasonable objects with a defined Ord to be one less or one more than any useful category.

We have to sacrifice some use cases to prevent the interface from creeping out of control. This should already be implemented and stable. I believe that one of the reasons it isn't, is because the definition keeps expanding, and the goal posts keep moving. Using the same definitions as .. and ... also has the massive benefit of symmetry, keeping the basics of the language consistent, which is a huge relief on the cognitive load.

Admittedly, this would make some operations harder than they need to be, but it would leave us with a great minimal default.

OtterCode on 19 Aug 2016

@OtterCode

Queries are fine, but there's an algorithmic cost, since you are adding an implicit filter/sort step to the process, and that should be made explicit.

I'm not sure what you're trying to get at here. These collections are already sorted and range queries don't let you filter anything. The algorithm would be pretty much identical to the current one.

Stebalien on 19 Aug 2016

@Stebalien

Well, take the following:

#[derive(Ord)]
struct Name {
    last_name: String,
    first_name: String
}

Notice that the derived Ord will sort first by last name, and second by first name.

Now say that you want to get a query where you don't care what the last name is, you just want to get every first name from Aaron to Bob. To make your range, you have to iterate over every possible last name, extracting the first names that fit inside [Aaron, Bob). You have to skip nodes in the middle, and you can't be sure you're done until you've walked the whole tree, that's a filter operation. Your example doesn't cover that, but what you're offering will lead newcomers to expect that they can ignore parts of an Ord definition at will, without cost. If they are only allowed to ignore parts of Ord in the order of their weight, that will introduce frustration and confusion. Moreover, this ignores the possibility of a more complex implementation of Ord with dependent fields or strange conditions. All we know about the contents of a BTreeMap is that it satisfies Ord, and it really should be left at that.

If an author wants to filter, then reduce, that's what they should do. Those are vastly more intensive operations than simply walking the tree until you bump against a failing leaf, but that's the cost of greater power and flexibility.

OtterCode on 19 Aug 2016

The range in "range query" is important. These aren't general purpose filters, they're queries that can select sorted ranges from sorted collections; in other words, they're queries that describe a sorted range of values. It's already possible to do this in rust using some fancy Borrow implementations but it violates Borrow's specification.

Stebalien on 19 Aug 2016

That's exactly what I'm arguing for, staying close to the simplest definition of range. I believe that the extended semantics obfuscate, rather than clarify that fact, and they additionally get in the way of getting this feature implemented.

OtterCode on 19 Aug 2016

@OtterCode I think this is a problem of naming. Would OrdRange (or OrderedRange) be less confusing?

The problem with just going ahead with a poor/inflexible design is that we can't just fix it later. One way to solve this is to do the same thing we did with RangeArgument: keep the trait unstable and only implement it on simple ranges for now. This would allow the user to query, e.g., a..b and Bound::Inclusive(a)..Bound::Inclusive(b) but not define their own queries.

Stebalien on 19 Aug 2016

@Stebalien That's a fair point. It also brings up the question, should we even implement this at all? It can be solved by iter().skip_while().take_while(). Is that enough?

As you say, and as the comment history clearly indicates, there is no single, concise definition of how to express a range in this context. Covering all possible ranges, such as (), (], [), and [], and layering your partial queries on top of that might be counter productive, and I believe it is. It certainly takes the interface farther and farther from anything I might consider using. I don't want to import an enum simply to define bounds on a range.

I still vote strongly for picking one, or maybe two definitions, and leaving it at that, unless we want a whole suite of range tools, in which case, perhaps, it's better to have a separate implementation of Range, and just call my_b_tree.into_range(), and then chain onto that in the same way the iter trait does, as was originally suggested, I believe.

OtterCode on 19 Aug 2016

That's a fair point. It also brings up the question, should we even implement this at all? It can be solved by iter().skip_while().take_while(). Is that enough?

No for the reason you mentioned in your first post: that's way too slow especially when the range is small.

As you say, and as the comment history clearly indicates, there is no single, concise definition of how to express a range in this context. Covering all possible ranges, such as (), (], [), and [], and layering your partial queries on top of that might be counter productive, and I believe it is. It certainly takes the interface farther and farther from anything I might consider using. I don't want to import an enum simply to define bounds on a range.

The point of my suggested design is that most users will be able to write my_map.range("first".."until") and be done with it. However, those who need more expressive queries can write them.

Stebalien on 19 Aug 2016

I feel like we're talking in circles. Is there another voice willing to weigh in?

OtterCode on 19 Aug 2016

A b-tree is a collection sorted by keys. It, like other tree-based data structures, allows item lookup in O(log n) given a key (where n is the number of elements in the collection).

Calling .iter() returns an iterator over all key-value pairs in the collection. Using skip_while and take_while always gets you an O(n) operation since the iterator walks over _all_ pairs, not just the ones in the range (so you'll always walk over the elements before the range you're interested in, just because the iterator happens to start there).

Using "proper" range queries, you can basically do 2 lookups in O(log n) (for the start and end of the range) and yield all elements within (which in itself isn't depending on the size of the collection but the number of elements in the range).

Caveat: I might be getting all of this wrong.

jonas-schievink on 19 Aug 2016

Nothing anyone has said convinces me that https://github.com/rust-lang/rust/issues/27787#issuecomment-232217282 is a bad plan.

I've been writing code using the current range API (as implemented in stable_bst). It feels good. It looks like intervals in mathematics. The Included/Excluded/Unbounded trichotomy is easy to write and easy to read --- much clearer than the equivalent code using the C++ map API, and clearer than having to use explicit successor/predecessor operators on keys to create exclusive bounds (even when that can be done). It's also clearer than using min/max keys to get unbounded endpoints. Simplifying the API semantics further, e.g. by removing control over inclusion/exclusion/unbounded as suggested by @OtterCode, would make my code a lot less readable and more error-prone.

I have been convinced that it would be good to offer a _separate_ API that creates a range whose endpoints are given by query predicates expressed as closures. I don't see the point of trying to merge that into the Bound-based API; that would make the overwhelmingly more common cases harder.

Queries are fine, but there's an algorithmic cost, since you are adding an implicit filter/sort step to the process, and that should be made explicit.

I don't understand. A query predicate, however it's expressed, has the same performance characteristics as a Bound: you perform O(log n) comparison operations to find a position in the collection.

Now say that you want to get a query where you don't care what the last name is, you just want to get every first name from Aaron to Bob.

You just can't do this efficiently in a balanced tree ordered by last-name first. If you need to do this efficiently, you need to create a different tree ordered by first-name first. This is a case where it would be convenient to parameterize the BTreeMap with a Compare trait like stable_bst does. Otherwise you have to convert the values to a new struct with the right implementation of Ord.

rocallahan on 20 Aug 2016

👍1

Is it out of scope to also add a drain method along with range? drain was included in the original builder-based API, but I haven't seen it mentioned since.

danburkert on 20 Aug 2016

@rocallahan Fair enough, not my preference, but it wouldn't be terribly hard to just wrap a function around the bounds selectors for a particular use case. What's holding this back? The comments just suggest that it wasn't a priority. Is there anything I can do to help, inexperience notwithstanding?

OtterCode on 20 Aug 2016

FWIW (having suggested a "pair of predicates" range method) I can't see anything wrong with @Stebalien 's suggestion: a single monotone 3-valued function (provided by a trait implementation) seems like a perfectly reasonable way to solve the problem. (The problem being that sentinel elements might be hard to find or non-existent, even though the range test is straightforward.)

ogoodman on 20 Aug 2016

Question: should my_btree_map[start..end] work as a default with the normal Range convention?

nerdrew on 18 Oct 2016

I need this in Servo. Please stop bikeshedding and stabilize this.

pcwalton on 7 Nov 2016

👍6

It needs to be implemented before it can be stabilized. The plan is https://github.com/rust-lang/rust/issues/27787#issuecomment-232217282

bluss on 7 Nov 2016

I'll put up a PR implementing the plan in the next day or two.

aturon on 7 Nov 2016

@aturon Is this being worked on? I would be happy to help in what way I can but I don't want to duplicate work if it's already in progress

djzin on 27 Nov 2016

Now that the changes to the surface API have landed, I think we're ready to consider stabilization.

@rfcbot rfc merge

aturon on 3 Feb 2017

Grr:

@rfcbot fcp merge

aturon on 3 Feb 2017

Team member @aturon has proposed to merge this. The next step is review by the rest of the tagged teams:

[x] @BurntSushi
[x] @Kimundi
[x] @alexcrichton
[x] @aturon
[x] @brson
[x] @sfackler

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot on 3 Feb 2017

@aturon, to clarify, what APIs are you thinking of stabilizing as part of this? Just BTreeMap::range{,_mut} or also the RangeArgument trait and associated types?

alexcrichton on 3 Feb 2017

@alexcrichton I think we should start with just the method for now, and let the trait bake a while longer.

aturon on 3 Feb 2017

In that case you have my checkmark!

alexcrichton on 3 Feb 2017

There is a small ergonomic issue I came across while writing tests for this relating to taking a range by borrowed key; say you have a map with strings as keys, then it would be nice to just use range syntax to specify the range. Unfortunately this doesn't work, because you would need a Range<str> which is unsized. Changing the bounds required type annotations in the common case. Instead what I did is implement RangeArgument<T> for (Bound<&T>, Bound<&T>), allowing you to do this, but it still requires type annotations.

#![feature(btree_range)]
#![feature(collections_bound)]

use std::collections::btree_map::BTreeMap;
use std::collections::Bound::*;

fn main() {
    let mut map = BTreeMap::new();
    map.insert("key".to_string(), "value".to_string());
    map.insert("lee".to_string(), "walue".to_string());
    map.insert("me".to_string(), "xalue".to_string());
    // let range = map.range("key".."me"); // doesn't work
    // let range = map.range((Included("key"), Excluded("me"))); // type annotations required
    let range = map.range::<str, _>((Included("key"), Excluded("me"))); // finally
    for (key, value) in range {
        println!("({}, {})", key, value);
    }
}

I suppose this is covered by not stabilizing RangeArgument, though?

djzin on 4 Feb 2017

@djzin We have to be a bit careful: once the range method is stabilized, we're committing to it working for at least all of the arguments allowed by today's definition of RangeArgument. That is, we can change the definition and impls for RangeArgument but only if all uses of range will continue to work.

aturon on 6 Feb 2017

I propose that #33197 should be fixed before stabilization (unfortunately), but it's being fixed.

bluss on 7 Feb 2017

:bell: This is now entering its final comment period, as per the review above. :bell:

rfcbot on 13 Feb 2017

:bell: This is now entering its final comment period, as per the review above. :bell:

rfcbot on 13 Feb 2017

:bell: This is now entering its final comment period, as per the review above. :bell:

rfcbot on 13 Feb 2017

The first issue I listed above (not being able to use range syntax) may be fixed at a later time by adding impls such as

impl<'a, T: ?Sized> RangeArgument<T> for Range<&'a T> {
    fn start(&self) -> Bound<&T> { Included(self.start) }
    fn end(&self) -> Bound<&T> { Excluded(self.end) }
}

Then one may use map.range::<str, _>("a".."z").
A similar impl could be provided for &'a mut T if that were a thing.

For some reason, even though there is only one type that can fit here (as far as I can tell), rust still requires type annotations for the str. This seems to me like improvements in rust's type inference down the line may fix this ergonomic issue.

A related problem occurs with RangeFull aka .. - rust cannot derive the type for a String key. In this case there are multiple options, all with the same result. I feel as though this is a non-issue as selecting the whole range is a no-op.

None of these issues affect BTreeMaps with an i64 key (or any key that has no nontrivial implementation of Borrow), and the main ones are solvable in future backwards-compatibly. So I think, as this feature has been in unstable limbo for so long, it's due its chance to shine in stable rust-land :)

djzin on 18 Feb 2017

The final comment period is now complete.

rfcbot on 23 Feb 2017

!?!? REALLY?

Gankra on 30 Mar 2017

🎉5

@Gankro I was shocked too: https://github.com/solson/miri/pull/155 :)

this feature has been stable since ____. Attribute no longer needed is the best warning in rustc.

solson on 30 Mar 2017

wow
On Wed, Mar 29, 2017 at 7:35 PM Scott Olson notifications@github.com
wrote:

@Gankro https://github.com/Gankro I was shocked too: solson/miri@c6a18ce

diff-b4aea3e418ccdb71239b96952d9cddb6L2

https://github.com/solson/miri/pull/156/commits/c6a18cead8885b229f5c7bc5d11cf6ed42d4dc6c#diff-b4aea3e418ccdb71239b96952d9cddb6L2
:)

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/rust-lang/rust/issues/27787#issuecomment-290258210,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAH3yVetV_d9Radh_Efyx_wJP4PucsrDks5rqurGgaJpZM4FqpeG
.

cgaebel on 30 Mar 2017

Rust: Tracking issue for sorted collection ranges

Most helpful comment

All 92 comments

diff-b4aea3e418ccdb71239b96952d9cddb6L2

Related issues