Rust: add pop() to HashSet etc.?

Created on 13 Aug 2015  路  19Comments  路  Source: rust-lang/rust

In Python I've often found the set.pop() method useful which would remove and return a single arbitrary element of the set (or raise if empty). In Rust this should return an Option<T>, of course.

I haven't found such a method on HashSet, have I just not looked hard enough?

A-collections C-feature-accepted T-libs

Most helpful comment

Yes please! Lots of worklist algorithms use a set, and repeatedly pop an arbitrary element.

All 19 comments

cc @Gankro

Pop doesn't make sense for our HashMaps -- it would be an O(m/n) expected linear search where m is the capacity and n the len.

If you want a good pop you should use https://libraries.io/cargo/linked-hash-map

BTree* could cleanly provide such functionality (pop_min, pop_max)

I understand the desire to implement only efficient operations, but in that case my question is really how to get elements out of the set at all (without turning it into an iterator).

O(m/n) doesn't sound too bad for me, btw.

Just saw a reimplementation of this: http://jneem.github.io/regex-dfa/src/regex_dfa/dfa.rs.html (PopArbitrary at the top).

For the BTreeSet it could be implemented in less than linear time, it's a much better alternative.

Having just written code that needs some equivalent of pop on an arbitrary HashMap, I'd like to second this proposal. Since HashMap is used pervasively, sometimes that is simply what is one has to deal with, and using an alternative data-structure (e.g. BTreeMap) isn't possible. At the moment, the only way of doing this is via something like M.iter().next().unwrap().clone() followed by M.remove(...) which is ugly, inefficient, and obfuscatory. I imagine the same functionality is needed by HashSet and the like, though I see HashMap used much more often.

[Although it's a little off-topic, I have doubts about the performance and memory trade-offs of a linked hashmap. A more cache-friendly technique for ordered maps was first described -- as far as I know -- [by Raymond Hettinger](http://code.activestate.com/recipes/578375/). It was then independently implemented (and possibly reinvented) in PHP (who [have a nice description of it](https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html)) and PyPy (who [have a slightly shorter but still interesting description](http://morepypy.blogspot.co.uk/2015/01/faster-more-memory-efficient-and-more.html)). Still, none of this helps if one has been passed a HashMap.]

I need one for whatever set this is possible to implement on.

@Gankro that depends on frequency of operation. Even though this particular operation may be slow if the operation is performed unfrequently HashSet may be still a better option overall.

(And if you have a datatype which does not/cannot be Clone you cannot roll this primitive on your own.)

second the pop on the HashSet. I see how drain( range ) is not a good idea but it's very painful right now to iterate over elements and then remove them in another pass performance-wise.

@ltratt ordermap now exists as a Rust implementation of that idea. It's got an efficient .pop() for this reason.

I've wanted this a few times. If I wrote a PR for it is it likely to get merged?

It needs to be a constant time (average) method, doesn't it? In that case it requires implementation changes in the hashmap and deciding that is full of tradeoffs.

I think that if it could have been added without such major changes, you could just post a PR and we'd have it stable soon.

I was assuming it would be O(m/n), but maybe it would be better to have some kind of iterator which allows removing elements from the set. eg. an iterator of OccupiedEntry. That would be more versatile and would sometimes be more efficient to use.

The O(m/n) behaviour is a DOS footgun, I don't think it would be acceptable to ship. In the same vein I don't think this is important enough to possibly redesign hashmap to make efficient.

How about shrinking the table when it is more empty then a given threshold?
This would ensure that O(m/n) is not that bad.
(AFAICT even shrinking only in pop should work from an amortised cost perspective)

I would be interested in seeing an implementation of this in a PR. I don't necessarily agree with https://github.com/rust-lang/rust/issues/27804#issuecomment-339040278 and https://github.com/rust-lang/rust/issues/27804#issuecomment-339695492 that the implementation needs to be constant amortized time. Let's look at some benchmarks once there is an implementation to see how bad it gets.

Agree with the need of a take_arbitrary method.

Imagine the following situation: you have a pool (set) of elements to handle, and while handling each, you may find yourself handling other elements, and would thus wish to remove those other encountered elements from the pool.
You cannot simply iterate over the pool, since the body will attempt to mutate the pool, you must then iterate manually.
Here is an example in Python pseudo-code:

# to_visit = set(...)
while to_visit:  # while non-empty:
    current_one = to_visit.pop()
    do_slow_stuff_with(current_one)
    if some_cond(current_one):
        another_one = some_fast_function(current_one)
        to_visit.discard(another_one)

I have implemented such a method the most efficiently I could (no Copy or Clone whatsoever), but sadly it has required the use of an unsafe block to hide the origin of a reference to the compiler, since it implies to mutably borrow the set while having a shared/borrowed value from that very set:

use std::hash::{
    Hash,
    BuildHasher,
};
use std::collections::HashSet;

/// Takes an arbitrary element from a `HashSet`, or None if empty
pub fn hashset_take_arbitrary<K, S> (
    set: &mut HashSet<K, S>,
) -> Option<K>
where
    K: Hash + Eq,
    S: BuildHasher,
{
    let key_ref = {
        if let Some(key_ref) = set.iter().next() {
            /* must hide the origin of this borrow ... */
            unsafe { &*(key_ref as *const _) }
        } else {
            return None
        }
    };
    /* ... so that we may be able to mutably borrow the set here
       despite key_ref existence */
    set.take(key_ref)
}

That way we can iterate as expected with

while let Some(current_one) = hashset_take_arbitrary(&mut to_visit) {
  /* ... let another_one = ... */
  (&mut to_visit).remove(&another_one);
}

See also rust-lang/rfcs#1800.

Yes please! Lots of worklist algorithms use a set, and repeatedly pop an arbitrary element.

Was this page helpful?
0 / 5 - 0 ratings