Rust: Tracking issue for HashMap::raw_entry

Created on 22 Nov 2018  Â·  12Comments  Â·  Source: rust-lang/rust

Added in #54043.


As of 6ecad338381cc3b8d56e2df22e5971a598eddd6c / 2019-01-09, this feature covers:

impl<K, V, S> HashMap<K, V, S>
    where K: Eq + Hash,
          S: BuildHasher
{
    pub fn raw_entry(&self) -> RawEntryBuilder<K, V, S> {…}
    pub fn raw_entry_mut(&mut self) -> RawEntryBuilderMut<K, V, S> {…}
}

pub struct RawEntryBuilder<'a, K: 'a, V: 'a, S: 'a> {…} // Methods return Option<(&'a K, &'a V)>
pub struct RawEntryBuilderMut<'a, K: 'a, V: 'a, S: 'a> {…} // Methods return RawEntryMut<'a, K, V, S>
pub enum RawEntryMut<'a, K: 'a, V: 'a, S: 'a> {
    Occupied(RawOccupiedEntryMut<'a, K, V>),
    Vacant(RawVacantEntryMut<'a, K, V, S>),
}
pub struct RawOccupiedEntryMut<'a, K: 'a, V: 'a> {…}
pub struct RawVacantEntryMut<'a, K: 'a, V: 'a, S: 'a> {…}

… as well as Debug impls for each 5 new types, and their inherent methods.

A-collections B-unstable C-tracking-issue I-nominated Libs-Tracked T-libs

Most helpful comment

This is a really great API, it's also what keeps crates (hashlink for example) using hashbrown instead of the stdlib hash map -- since hashbrown exposes this.

What could be next steps here towards stabilization?

All 12 comments

What is the motivation for having separate from_hash and search_bucket methods? It seems that the only difference is whether the hash value is checked before calling is_match. However if the table does not store full hashes (i.e. hashbrown) then there is no difference between these methods.

Could we consider merging these methods into a single one? Or is there some use case where the difference in behavior is useful?

I am also extremely confused by this distinction, as my original designs didn't include them (I think?) and the documentation that was written is very unclear.

cc @fintelia

The reason I added search_bucket was because I wanted to be able to delete a random element from a HashMap in O(1) time, without storing an extra copy of all the keys. Basically, instead of doing something like this:

let key = map.iter().nth(rand() % map.len()).0.clone();
map.remove(&key);

I wanted to just be able to pick a random "bucket" and then get an entry/raw entry to the first element in it if any:

loop {
    if let Occupied(o) = map.raw_entry_mut().search_bucket(rand(), || true) {
        o.remove();
        break;
    }
}

(the probabilities aren't uniform in the second version, but close enough for my purposes)

I continue to not want to support the "random deletion" usecase in std's HashMap. You really, really, really, should be using a linked hashmap or otherwise ordered map for that.

I have removed this method in the hashbrown PR (#56241). Your code snippet for random deletion won't work with hashbrown anyways since it always checks the hash as part of the search process.

I can avoid unnecessary clones inherent to the original entry API which is nice. But unless I'm mistaken, the current raw_entry API seems to hash the keys twice in this simple use case:

#![feature(hash_raw_entry)]

use std::collections::HashMap;

let mut map = HashMap::new();

map.raw_entry_mut()
   .from_key("poneyland")
   .or_insert("poneyland", 3);

Currently I use the following function to hash once and automatically provide an owned key if necessary (somewhat similar to what was discussed in https://github.com/rust-lang/rfcs/pull/1769):

use std::borrow::Borrow;
use std::collections::hash_map::RawEntryMut;
use std::hash::{BuildHasher, Hash, Hasher};

fn get_mut_or_insert_with<'a, K, V, Q, F>(
    map: &'a mut HashMap<K, V>,
    key: &Q,
    default: F,
) -> &'a mut V
where
    K: Eq + Hash + Borrow<Q>,
    Q: Eq + Hash + ToOwned<Owned = K>,
    F: FnOnce() -> V,
{
    let mut hasher = map.hasher().build_hasher();
    key.hash(&mut hasher);
    let hash = hasher.finish();

    match map.raw_entry_mut().from_key_hashed_nocheck(hash, key) {
        RawEntryMut::Occupied(entry) => entry.into_mut(),
        RawEntryMut::Vacant(entry) => {
            entry
                .insert_hashed_nocheck(hash, key.to_owned(), default())
                .1
        }
    }
}

Given k1 and k2 with the same type K such that hash(k1) != hash(k2), is there a use-case for calling RawEntryBuilderMut::from_key_hashed_nocheck with hash(k1), &k1 and then inserting with RawVacantEntry::or_insert using k2 ?

If there isn't, why not saving the hash in RawVacantEntryMut and using it inside RawVacantEntryMut::insert ? It would even be possible to assert in debug builds that the owned key has indeed the same hash as the borrowed key used to lookup the entry.

I'm not yet very familiar with this API, but what @gdouezangrard suggested seems like a great idea to me. What even happens currently if the two hashes don't match, is the key-value pair then inserted into the wrong bucket? It's not clear to me from (quickly) reading the source code.

I submitted https://github.com/rust-lang/hashbrown/pull/54 to support using a K that doesn't implement Hash via the raw entry API. See https://github.com/rust-lang/hashbrown/issues/44 for the original motivation. Now that hashbrown is merged into std, could we expose this functionality on the std::collections::hash_map types as well?

If so, I'd be happy to submit a PR!

This is a really great API, it's also what keeps crates (hashlink for example) using hashbrown instead of the stdlib hash map -- since hashbrown exposes this.

What could be next steps here towards stabilization?

Just gonna add another ping here -- what's blocking this right now?

I see a few things that need to be resolved:

I would recommend prototyping in the hashbrown crate first, which can then be ported back in the the std HashMap.

Was this page helpful?
0 / 5 - 0 ratings