Rfcs: pre-RFC: HashSet::pop()

Created on 24 Nov 2016  路  28Comments  路  Source: rust-lang/rfcs

(Originally filed as https://github.com/rust-lang/rust/issues/37986, but a pre-RFC belongs in the RFC repo, not the rust repo. Re-filing it here.)

I'd like to have a pop() method available for HashSet, which removes an item from the set if any, and returns an Option<T> (returning None if empty). That would make it easy to iterate over a set with while let element = set.pop() { ... }, without holding a mutable reference to the set as HashSet::drain() does, so you can insert more items into the set as you iterate.

Does that seem like a reasonable method to add to HashSet? Does this need an RFC, or just a patch to std?

T-libs

Most helpful comment

I'm still interested in this and would still like to see such a method exist, or some other alternative that allows iteration without holding a reference.

All 28 comments

Only issue is that it'd have to search linearly through the table for a non-empty bucket, so calling it in a loop would be O(n\^2)ish.

@comex If you stored a "hint" index for the last location with an item, you should get O(n) amortized time to pop every item out of a set.

If you had a loop that also inserted new items, you should still avoid quadratic behavior, as long as the hash doesn't place items immediately behind the hint index every time.

@joshtriplett Sounds more like what a draining iterator might do.

@eddyb, @joshtriplett specifically requested this so that they can add more items into the set during iteration, so, drain is out for this use case

This is efficient (O(1) average) with a hashmap implementation like OrderMap's, but not with Rust's current HashMap.

crates.io already provides ordermap and linked-hash-map for two different use cases, but both have O(1) pop.

@bluss An OrderSet would solve this for me. It'd be nice to have those in core Rust, but I don't mind pulling in another crate.

@joshtriplett These are all part of the contain-rs organisation, so, I'd talk to them about that. A lot of these data structures are deemed as non-essential and many were actually removed from libstd, but if you can make a compelling case otherwise you could probably write up an RFC for it.

I don't know where the original discussion on this was but you could probably search for it.

Triage ping @joshtriplett

I'm still interested in this and would still like to see such a method exist, or some other alternative that allows iteration without holding a reference.

@joshtriplett What I mean is: do you have any plans on making a PR / writing an RFC? 馃槂

@Centril Making a PR, no. Writing an RFC, maybe; depends heavily on whether one would be welcomed or not, and based on this pre-RFC I'm not sure.

@joshtriplett Personally I like the idea. cc @rust-lang/libs -- what do you think?

Does the switch to hashbrown affect this at all? Would that make this easier/harder?

@joshtriplett I suspect it could make this significantly faster, at least the naive linear scan.
cc @Amanieu

Yes the linear scan would be a bit faster, but you would still end up with O(n^2) complexity for a pop loop. I don't think that this is something that we want to encourage.

I wonder if it would be feasible to add this under a name like scan_remove_next to highlight the potential cost?

IMO anyone that needs this functionality should be using IndexMap instead, which guarantees O(1) pop. Incidentally this is similar to the hash table that Python uses for its dict.

That's what I'm now using in the places where I can, and you're right that that's what to use where possible! It's perhaps somewhat less discoverable, so maybe we should point people towards that from the HashMap docs (there are other "nice" features it adds that people may be used to coming from things like dict)? Unfortunately, IndexMap has the downside that items are popped in (reverse?) insertion order, which means it doesn't fit all use-cases (see, e.g., https://github.com/Amanieu/hashbrown/issues/43). That said, maybe those cases are too niche to really worry about.

IMO anyone that needs this functionality should be using IndexMap instead, which guarantees O(1) pop. Incidentally this is similar to the hash table that Python uses for its dict.

Unfortunately that's not very discoverable, if all you know is that you need a pop function.

I'm nervous about such an innocuous sounding method being O(n) time too.

After NLL lands then you could write pop() like

hm.keys().next().and_then( |k| hm.remove_entry(k) )

If you need several sequential pops, or need to skip some, then adapting this code snippet gives better performance. Could both it and IndexMap, etc. be mentioned in some book?

On Tue, Apr 23, 2019 at 09:06:41AM -0700, Jeff Burdges wrote:

I'm nervous about such an innocuous sounding method being O(n) time too.

I would ideally like such a method to be (amortized) O(1), which AFAICT
would just require storing a hint value.

@joshtriplett I don't quite see how that's true? After the value in question has been removed, you need to search for the _next_ hint value, which is the same cost as the original search anyway.

Not quite; you can skip re-searching large empty areas of the table. That'd help avoid O(n**2) behavior when using pop in a loop, as long as you didn't happen to insert new elements right behind the hint every time.

(I'd also be happy with an equivalent to drain() that doesn't prevent adding new items in the body of the loop.)

I'd think some Drain::abort method achieves that easiest, no?

I'm worried that adding your hint hurts almost all HashMap usages. Also, inserting corrupts the hint for some use cases like resuming after a partial drain. It's kinda neat your hint makes pop like an undo for insert, except this sounds fragile and encourages bugs.

Instead, one could maybe expose the index, like returning it from Drain::abort, and provide some Drain::resume(HashMap<..>, usize) -> Drain method? In resume, there are explicitly no guarantees about behavior since resizing could make an index invalid, but if you want that behavior then the method exists.

I wonder if it could work to have the drain iterator provide a couple of methods to modify the HashMap that it holds a mutable reference to?

Suppose you could do this:

let iter = foo.drain();
for x in iter {
    // ...
    if bar {
        iter.modify(|foo| foo.insert(...))
    }
}

I don't think Drain is the right place for this since it keeps the table in an invalid state while iterating (items have been moved out but not marked as such in the metadata).

Wouldn't it be good if there existed a data structure like indexmap::IndexMap in Rust std library? I've always found it surprising that the default hashmap in Rust has outright bad performance for iteration.

I suppose std HashMap is much faster for inserts and lookups, which means we can't just change std HashMap implementation to that of IndexMap.

But, for certain kinds of problems, for instance breadth-first searches in graphs, the usecase where IndexMap excels is very common. It's unsatisfactory that standard C# beats Rust performance wise for these problems.

Couldn't we arrange for IndexMap to be available in the standard library? And have the documentation for HashMap hint that if iterations is needed, IndexMap may give better performance?

The O(n^2) for a pop loop is sufficiently surprising that I don't believe pop should be in that form in std.

As best I recall, every time I've wanted pop it's been for a pop loop that may return early and can't take ownership of the remainder of the entries. Is it possible to expose iteration of OccupiedEntrys? In which case that use case could be handled like this:

for (k, v) in map.entries().map(OccupiedEntry::remove_entry) {
    ...
}

Or, to avoid borrowing map. O(n^2) again but at least it's more obvious:

while let Some((k, v)) = map.entries().map(OccupiedEntry::remove_entry).next() {
    ...
}

In many cases of course IndexSet/Map is more appropriate. Perhaps that crate can be referenced in the HashSet/Map docs as providing .pop()?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

silversolver1 picture silversolver1  路  3Comments

onelson picture onelson  路  3Comments

mqudsi picture mqudsi  路  3Comments

burdges picture burdges  路  3Comments

steveklabnik picture steveklabnik  路  4Comments