Rust: paper cut: is a slice / Iterator sorted?

Created on 6 Sep 2017 · 21Comments · Source: rust-lang/rust

I just used binary_search inside a loop and realized that it would be nice to check before the loop if my immutable slice was sorted. So I spent 10 minutes going through the standard library just to find out that there aren't any is_sorted, is_sorted_by, is_sorted_by_key, etc. algorithms implemented for neither slices nor iterators.

We should add:

[ ] Iterator::is_sorted
[ ] Iterator::is_sorted_by
[ ] Iterator::is_sorted_by_key

and since Iterator is implemented for slices, we can add those methods to slices as well by just forwarding to the Iterator implementation.

A-collections C-feature-accepted T-libs

Source

gnzlbg

👍10 ❤1

Most helpful comment

It would have to consume the entire iterator to get the result,

Of course.

leaving you only with the information that the iterator _that you now can't use anymore_ would indeed have been sorted.

There are many useful algorithms that consume an iterator. Iterators don't necessarily own the items; some iterators do, some don't. If you don't want the items to get lost then don't consume an iterator that owns them. You can either copy/clone the iterator before you consume it, or you can create it again. For example, given:

let v: Vec<u32> = vec![0, 1, 2, 3, 4, 5];

then this works

if v.iter().is_sorted() {
  for i in v.iter().map(|i| i* 2) {
    println!("{}", i);
  }
}

but this won't compile

if v.into_iter().is_sorted() { // vec is consumed here
  for i in v.iter().map(|i| i * 2) {
    println!("{}", i);
  }
}

Given that iterating over an Iterator consumes the iterator:

let v: Vec<u32> = vec![0, 1, 2, 3, 4, 5];
let it = v.iter().map(|i| i* 2);
for i in it { println!("{}", i); }  // it is moved here
for i in it { println!("{}", i); }  // ERROR: it was already moved

and that is_sorted must, by necessity, look at each element on the sequence, maybe you could elaborate on why you think that consuming an iterator in this algorithm is bad, and suggest what could be done about it instead?

.is_sorted_by_key(select_the_key) seems particularly redundant, given that it would just be .map(select_the_key).is_sorted().

I've added them for both convenience and consistency. The stable and unstable sort, sort_by, and sort_by_key functions are redundant as well, yet they are very convenient. To be consistent with the sort algorithms, I think is_sorted should provide these alternatives as well. However, I also think that having them is convenient.

gnzlbg on 7 Sep 2017

👍2

All 21 comments

We also shouldn't change it to be O(n) on debug-assertions builds - debug-assertions should be for the last 2x of performance, not for potential order-of-magnitude slowdowns.

arielb1 on 6 Sep 2017

@arielb1 sorry this issue turned out to be two issues, I've split that part to #44371.

gnzlbg on 6 Sep 2017

In the D language standard library there is also a wrapper for sorted iterables (that performs statistical tests inside its constructor), with it you can create functions that document in the code that return/accept sorted iterables.

leonardo-m on 6 Sep 2017

and since Iterator is implemented for slices those methods should work on slices as well.

Just to help clarify here: slices are iterable (IntoIterator for &[T], &mut [T]), but not themselves iterators.

bluss on 6 Sep 2017

@bluss what I meant is that we should probably implement those methods on slices as well, but that implementation should just be forwarded to the Iterator version. I've reworded the issue.

gnzlbg on 7 Sep 2017

How would it make sense for this operation to be defined on an iterator? It would have to consume the entire iterator to get the result, leaving you only with the information that the iterator that you now can't use anymore would indeed have been sorted. .is_sorted_by_key(select_the_key) seems particularly redundant, given that it would just be .map(select_the_key).is_sorted().

Thiez on 7 Sep 2017

👍2

It would have to consume the entire iterator to get the result,

Of course.

leaving you only with the information that the iterator _that you now can't use anymore_ would indeed have been sorted.

let v: Vec<u32> = vec![0, 1, 2, 3, 4, 5];

then this works

if v.iter().is_sorted() {
  for i in v.iter().map(|i| i* 2) {
    println!("{}", i);
  }
}

but this won't compile

if v.into_iter().is_sorted() { // vec is consumed here
  for i in v.iter().map(|i| i * 2) {
    println!("{}", i);
  }
}

Given that iterating over an Iterator consumes the iterator:

let v: Vec<u32> = vec![0, 1, 2, 3, 4, 5];
let it = v.iter().map(|i| i* 2);
for i in it { println!("{}", i); }  // it is moved here
for i in it { println!("{}", i); }  // ERROR: it was already moved

.is_sorted_by_key(select_the_key) seems particularly redundant, given that it would just be .map(select_the_key).is_sorted().

gnzlbg on 7 Sep 2017

👍2

How about a non-terminal .while_sorted() operation? All it would have to do is compare adjacent elements and stop iterating when they are not monotonic. .require_sorted() could panic instead. Maybe this could be generalized into .while_pair(|a,b| ...) and put into itertools?

the8472 on 7 Sep 2017

@the8472 itertools already has .tuple_window which combined with take_while is more generic than .while_pair.

Anyhow, I am neither in favour nor opposed to doing that, but I think that discussion should belong in a different issue.

This issue is about std neither checking the precondition of binary_search nor providing users the algorithm to check it themselves.

There is some prior art for is_sorted in other standard libraries, e.g., in C++, although there it is implemented on top of is_sorted_until.

gnzlbg on 7 Sep 2017

is_sorted_until seems like a good model. In the rust case that could be sorted_prefix() -> (impl Iterator, impl Iterator). Then a consumer could do a binary search with the prefix and either panic if the tail is non-empty or do something else with it.

the8472 on 7 Sep 2017

That seems like a lot of complexity to avoid a single call to sort() (or collect::<Vec<_>>() followed by sort()). Especially since your proposed sorted_prefix would have to perform hidden allocations to do its work anyway.

Thiez on 7 Sep 2017

Ok, how about num_sorted() -> usize for iterators? That could get you the sorted prefix of a slice for example. It's more powerful than is_sorted() without additional cost.

the8472 on 7 Sep 2017

Are there any concrete use cases for a "check sorted-ness while iterating" method like is_sorted_until() or while_sorted()? I can imagine those theoretically being used for optimizations like allowing if v is sorted { for x in v ... } to do only one iteration over v, or abort the iteration very early if v is likely to be randomly shuffled, but I have no idea what use case would require those particular optimizations. In particular, I don't know of a use case where "iterate only over the sorted prefix" is part of the desired semantics. So methods like that seem niche enough that they should probably start out in itertools or some other crate.

In contrast, it's quite obvious what the is_sorted{,_by{,_key}} trio would be useful for. So much so that it's almost surprising they aren't in std already.

Ixrec on 17 Sep 2017

👍1

I agree that it would be good to have an easy way to check whether a slice is sorted. This exists in Go as sort.IsSorted, C++ as std::is_sorted, D std.algorithm.sorting.is_sorted. Plenty of people are already defining their own version of this in Rust (135 results at present).

I am a bit skeptical of the equivalent on Iterator just because the return value does not seem actionable -- you aren't going to "sort" the iterator after you find out it is not already sorted. What are some use cases for this in real code that does not involve iterating over a slice?

dtolnay on 15 Nov 2017

I don't think it is a problem o consume the iterator, we make and consume iterators for many different tasks. For example any use of Iterator::position consumes an iterator just to answer the position question.

bluss on 15 Nov 2017

@dtolnay

Consuming an Iterator is orthogonal to consuming an underlying slice:

let mut v: Vec<_> = vec![1,3,2,4];
let not_sorted: bool = !v.iter().is_sorted();
if  not_sorted {
  v.sort();  // works
}
let not_sorted: bool = !v.iter_mut().is_sorted();
if not_sorted {
  v.sort(); // works
}

vs:

let mut v: Vec<_> = vec![1,3,2,4];
let not_sorted: bool = !v.into_iter().is_sorted();
if  not_sorted {
  v.sort();  // ERROR: v has been moved
}

The point of implementing this on Iterator is that it allows the algorithm to work on everything that can be iterated upon, not only slices.

You mentioned:

This exists in Go as sort.IsSorted, C++ as std::is_sorted, D std.algorithm.sorting.is_sorted

C++ std::is_sorted and D's std.algorithm.sorting.is_sorted require only a ForwardRange, that is, anything that can be Iterated except for streaming iterators.

If they were to work only on slices they would require a ContiguousRange or a RandomAccessRange in C++ parlance.

gnzlbg on 15 Nov 2017

I would be prepared to consider a PR or RFC for this (whichever is easier). Either way please include a list of design alternatives that you evaluated -- method on slice, trait method on Iterator, free function like std::cmp::is_sorted that takes an IntoIterator, etc.

dtolnay on 19 Nov 2017

I've implemented this in this PR for the itertools crate in case someone wants to play with them: https://github.com/bluss/rust-itertools/pull/261

gnzlbg on 5 Feb 2018

Since Iterator/Itertools conflicts are pretty thorny, I'd prefer if we can merge them to Iterator in libcore directly if that's an option. But they do make good companions with wherever the .sorted() method is on an Iterator too (the method that sorts the elements and returns an iterator).

bluss on 6 Feb 2018

👍1

Just FYI: I'll try to write an RFC about this in the next couple of days. I hope to summarize everything that has been said in this thread. I will link it here, once I'm finished (or update this issue if I give up :<).