I just used binary_search inside a loop and realized that it would be nice to check before the loop if my immutable slice was sorted. So I spent 10 minutes going through the standard library just to find out that there aren't any is_sorted, is_sorted_by, is_sorted_by_key, etc. algorithms implemented for neither slices nor iterators.
We should add:
Iterator::is_sortedIterator::is_sorted_byIterator::is_sorted_by_keyand since Iterator is implemented for slices, we can add those methods to slices as well by just forwarding to the Iterator implementation.
We also shouldn't change it to be O(n) on debug-assertions builds - debug-assertions should be for the last 2x of performance, not for potential order-of-magnitude slowdowns.
@arielb1 sorry this issue turned out to be two issues, I've split that part to #44371.
In the D language standard library there is also a wrapper for sorted iterables (that performs statistical tests inside its constructor), with it you can create functions that document in the code that return/accept sorted iterables.
and since Iterator is implemented for slices those methods should work on slices as well.
Just to help clarify here: slices are iterable (IntoIterator for &[T], &mut [T]), but not themselves iterators.
@bluss what I meant is that we should probably implement those methods on slices as well, but that implementation should just be forwarded to the Iterator version. I've reworded the issue.
How would it make sense for this operation to be defined on an iterator? It would have to consume the entire iterator to get the result, leaving you only with the information that the iterator that you now can't use anymore would indeed have been sorted. .is_sorted_by_key(select_the_key) seems particularly redundant, given that it would just be .map(select_the_key).is_sorted().
It would have to consume the entire iterator to get the result,
Of course.
leaving you only with the information that the iterator _that you now can't use anymore_ would indeed have been sorted.
There are many useful algorithms that consume an iterator. Iterators don't necessarily own the items; some iterators do, some don't. If you don't want the items to get lost then don't consume an iterator that owns them. You can either copy/clone the iterator before you consume it, or you can create it again. For example, given:
let v: Vec<u32> = vec![0, 1, 2, 3, 4, 5];
then this works
if v.iter().is_sorted() {
for i in v.iter().map(|i| i* 2) {
println!("{}", i);
}
}
but this won't compile
if v.into_iter().is_sorted() { // vec is consumed here
for i in v.iter().map(|i| i * 2) {
println!("{}", i);
}
}
Given that iterating over an Iterator consumes the iterator:
let v: Vec<u32> = vec![0, 1, 2, 3, 4, 5];
let it = v.iter().map(|i| i* 2);
for i in it { println!("{}", i); } // it is moved here
for i in it { println!("{}", i); } // ERROR: it was already moved
and that is_sorted must, by necessity, look at each element on the sequence, maybe you could elaborate on why you think that consuming an iterator in this algorithm is bad, and suggest what could be done about it instead?
.is_sorted_by_key(select_the_key) seems particularly redundant, given that it would just be .map(select_the_key).is_sorted().
I've added them for both convenience and consistency. The stable and unstable sort, sort_by, and sort_by_key functions are redundant as well, yet they are very convenient. To be consistent with the sort algorithms, I think is_sorted should provide these alternatives as well. However, I also think that having them is convenient.
How about a non-terminal .while_sorted() operation? All it would have to do is compare adjacent elements and stop iterating when they are not monotonic. .require_sorted() could panic instead. Maybe this could be generalized into .while_pair(|a,b| ...) and put into itertools?
@the8472 itertools already has .tuple_window which combined with take_while is more generic than .while_pair.
Anyhow, I am neither in favour nor opposed to doing that, but I think that discussion should belong in a different issue.
This issue is about std neither checking the precondition of binary_search nor providing users the algorithm to check it themselves.
There is some prior art for is_sorted in other standard libraries, e.g., in C++, although there it is implemented on top of is_sorted_until.
is_sorted_until seems like a good model. In the rust case that could be sorted_prefix() -> (impl Iterator, impl Iterator). Then a consumer could do a binary search with the prefix and either panic if the tail is non-empty or do something else with it.
That seems like a lot of complexity to avoid a single call to sort() (or collect::<Vec<_>>() followed by sort()). Especially since your proposed sorted_prefix would have to perform hidden allocations to do its work anyway.
Ok, how about num_sorted() -> usize for iterators? That could get you the sorted prefix of a slice for example. It's more powerful than is_sorted() without additional cost.
Are there any concrete use cases for a "check sorted-ness while iterating" method like is_sorted_until() or while_sorted()? I can imagine those theoretically being used for optimizations like allowing if v is sorted { for x in v ... } to do only one iteration over v, or abort the iteration very early if v is likely to be randomly shuffled, but I have no idea what use case would require those particular optimizations. In particular, I don't know of a use case where "iterate only over the sorted prefix" is part of the desired semantics. So methods like that seem niche enough that they should probably start out in itertools or some other crate.
In contrast, it's quite obvious what the is_sorted{,_by{,_key}} trio would be useful for. So much so that it's almost surprising they aren't in std already.
I agree that it would be good to have an easy way to check whether a slice is sorted. This exists in Go as sort.IsSorted, C++ as std::is_sorted, D std.algorithm.sorting.is_sorted. Plenty of people are already defining their own version of this in Rust (135 results at present).
I am a bit skeptical of the equivalent on Iterator just because the return value does not seem actionable -- you aren't going to "sort" the iterator after you find out it is not already sorted. What are some use cases for this in real code that does not involve iterating over a slice?
I don't think it is a problem o consume the iterator, we make and consume iterators for many different tasks. For example any use of Iterator::position consumes an iterator just to answer the position question.
@dtolnay
Consuming an Iterator is orthogonal to consuming an underlying slice:
let mut v: Vec<_> = vec![1,3,2,4];
let not_sorted: bool = !v.iter().is_sorted();
if not_sorted {
v.sort(); // works
}
let not_sorted: bool = !v.iter_mut().is_sorted();
if not_sorted {
v.sort(); // works
}
vs:
let mut v: Vec<_> = vec![1,3,2,4];
let not_sorted: bool = !v.into_iter().is_sorted();
if not_sorted {
v.sort(); // ERROR: v has been moved
}
The point of implementing this on Iterator is that it allows the algorithm to work on everything that can be iterated upon, not only slices.
You mentioned:
This exists in Go as sort.IsSorted, C++ as std::is_sorted, D std.algorithm.sorting.is_sorted
C++ std::is_sorted and D's std.algorithm.sorting.is_sorted require only a ForwardRange, that is, anything that can be Iterated except for streaming iterators.
If they were to work only on slices they would require a ContiguousRange or a RandomAccessRange in C++ parlance.
I would be prepared to consider a PR or RFC for this (whichever is easier). Either way please include a list of design alternatives that you evaluated -- method on slice, trait method on Iterator, free function like std::cmp::is_sorted that takes an IntoIterator, etc.
I've implemented this in this PR for the itertools crate in case someone wants to play with them: https://github.com/bluss/rust-itertools/pull/261
Since Iterator/Itertools conflicts are pretty thorny, I'd prefer if we can merge them to Iterator in libcore directly if that's an option. But they do make good companions with wherever the .sorted() method is on an Iterator too (the method that sorts the elements and returns an iterator).
Just FYI: I'll try to write an RFC about this in the next couple of days. I hope to summarize everything that has been said in this thread. I will link it here, once I'm finished (or update this issue if I give up :<).
The RFC has been merged: https://github.com/rust-lang/rust/issues/53485
Most helpful comment
Of course.
There are many useful algorithms that consume an iterator. Iterators don't necessarily own the items; some iterators do, some don't. If you don't want the items to get lost then don't consume an iterator that owns them. You can either copy/clone the iterator before you consume it, or you can create it again. For example, given:
then this works
but this won't compile
Given that iterating over an
Iteratorconsumes the iterator:and that
is_sortedmust, by necessity, look at each element on the sequence, maybe you could elaborate on why you think that consuming an iterator in this algorithm is bad, and suggest what could be done about it instead?I've added them for both convenience and consistency. The stable and unstable
sort,sort_by, andsort_by_keyfunctions are redundant as well, yet they are very convenient. To be consistent with the sort algorithms, I thinkis_sortedshould provide these alternatives as well. However, I also think that having them is convenient.