Rust: Tracking issue for `BinaryHeap` sorted iterator methods

Created on 18 Mar 2019  路  13Comments  路  Source: rust-lang/rust

This tracks the stabilization of BinaryHeap::into_iter_sorted (binary_heap_into_iter_sorted) and BinaryHeap::drain_sorted (binary_heap_drain_sorted) implemented in https://github.com/rust-lang/rust/pull/65091.

EDIT much later:

  • [ ] https://github.com/rust-lang/rust/pull/76234 looked at stabilizing part of this, but didn't come to a conclusion that it was ready-to-go as-is. The questions from there (such as consistency in order with into_sorted_vec) need to be addressed before a future attempt to stabilize.

Original feature request:


Currently, both BinaryHeap::into_iter and BinaryHeap::drain yield the heap elements in an arbitrary order. This seems counter-intuitive given the nature of a heap. Sadly, we can't simply change their behavior behind the scenes, as we would have to remove the implementation of DoubleEndedIterator. Should we perhaps add into_ordered_iter and drain_ordered? The implementation is pretty straightforward so I'd be happy to submit a PR.

A-collections A-iterators B-unstable C-tracking-issue Libs-Tracked T-libs

Most helpful comment

@clarfon No, because getting the largest k items from that is O(n lg n + k), whereas with into_ordered_iter() it would be O(k lg n).

All 13 comments

Isn't into_sorted_vec().into_iter() sufficient?

@clarfon No, because getting the largest k items from that is O(n lg n + k), whereas with into_ordered_iter() it would be O(k lg n).

@clarfon In addition to what @scottmcm said, into_sorted_vec().into_iter() takes up space proportional to the number of items in the heap because you have to create the Vec first. That means that you've roughly doubled the amount of memory you need, which could be a concern either when you are on more limited systems, or when you're dealing with very large amounts of data.

@ckaran into_sorted_vec actually doesn't take up any space at all; it reuses the same buffer.

@clarfon Sorry, I was being unclear. I was thinking that you'd want to continue using the heap afterwards. IIRC, into_sorted_vec() will consume the collection (which, you're right, is done in place), which means that you need to do a clone on the heap first, which is what I was really commenting on. Sorry about being unclear.

Hi,

I've implemented both API in the local rust tree.
I see some problems in the API and I'm proposing the better version at the bottom of this comment.

Any ideas?

Currently proposed API

.into_ordered_iter(self)

    /// Returns an iterator which retrieves elements in heap order.
    /// This method consumes the original heap.
    ///
    /// # Examples
    ///
    /// Basic usage:
    ///
    /// ```
    /// #![feature(binary_heap_into_ordered_iter)]
    /// use std::collections::BinaryHeap;
    /// let heap = BinaryHeap::from(vec![1, 2, 3, 4, 5]);
    ///
    /// assert_eq!(heap.into_ordered_iter().take(2).collect::<Vec<_>>(), vec![5, 4]);
    /// ```
    #[unstable(feature = "binary_heap_into_ordered_iter", issue = "59278")]
    pub fn into_ordered_iter(self) -> IntoOrderedIter<T> {
        IntoOrderedIter {
            inner: self,
        }
    }

.drain_ordered(&mut self)

    /// Returns an iterator which retrieves elements in heap order.
    /// The retrieved elements will be removed from the original heap.
    ///
    /// Note: Unlike other `.drain()` methods, this method removes elements *lazily*.
    /// Only the retrieved elements will be removed.
    ///
    /// # Examples
    ///
    /// Basic usage:
    ///
    /// ```
    /// #![feature(binary_heap_drain_ordered)]
    /// use std::collections::BinaryHeap;
    ///
    /// let mut heap = BinaryHeap::from(vec![1, 2, 3, 4, 5]);
    /// assert_eq!(heap.len(), 5);
    ///
    /// let removed = heap.drain_ordered().take(2).collect::<Vec<_>>(); 
    /// assert_eq!(removed, vec![5, 4]);
    /// assert_eq!(heap.len(), 3);
    ///
    /// let removed = heap.drain_ordered().take(3).collect::<Vec<_>>(); 
    /// assert_eq!(removed, vec![3, 2, 1]);
    /// assert_eq!(heap.len(), 0);
    /// ```
    #[inline]
    #[unstable(feature = "binary_heap_drain_ordered", issue = "59278")]
    pub fn drain_ordered(&mut self) -> DrainOrdered<'_, T> {
        DrainOrdered {
            inner: self,
        }
    }

Problems I see

  • (P1) Too many iters: New users won't get to the right one.
    => Edit: BinaryHeap doc example could be updated after this feature would be stabilized.
  • (P2) Code duplication: The impl of .into_ordered_iter and .drain_ordered is essentially the same. They are duplicated codes.
    => @scottmcm: Both owning and borrowing versions are useful.
  • (P3) .drain() inconsistency: The semantics of .drain_ordered() is lazy. Other all .drain() method immediately removes elements.
    => @scottmcm: I agree that drop(foo.drain_ordered()); should result in an empty foo. But that can be accomplished by just clearing in the iterator's drop (assuming it's safe so doesn't need to amplify).

~Better API (Solution)~

  • ~Implement one method .into_ordered_iter(&mut self).
    This is the mix of above .into_ordered_iter(self) and .drain_ordered(&mut self). I picked more useful one and changed the name.~
  • ~Do not implement .drain_ordered() method.~

-> ~Solves P2 and P3. Partially solves P1 (discoverability).~

Both owning and borrowing versions are useful. I can't return a version that borrows from a local variable, but I can return the version that moves it.

That said, I agree that drop(foo.drain_ordered()); should result in an empty foo. But that can be accomplished by just clearing in the iterator's drop (assuming it's safe so doesn't need to amplify).

(I'm not worried about discoverability here; people have presumably been taught about the difference in things like Vec, which also has .into_iter() and .drain(..).)

Thanks @scottmcm for the comment. I updated my comment accordingly.

@jonhoo, are you still working on PR for this?

@sekineh nope, not currently working on a PR

I've updated the issue to reflect that this now tracks the features implemented in https://github.com/rust-lang/rust/pull/65091.

@KodrAus (not sure if you're the right person to ping for this) This should probably get C-tracking.

I kind of feel like in-order iteration should be the default for a heap. I was surprised to learn that the IntoIterator impl yields items in arbitrary order.

Also the destructor of DrainSorted will currently leak items if one of their destructors panics.

Was this page helpful?
0 / 5 - 0 ratings