Rfcs: Put HashMap's hasher in an Result and Default::default() it to Err(Default::default)

Created on 24 Sep 2018 · 24Comments · Source: rust-lang/rfcs

HashMap implements Clone, but cloning HashMap::default() should NOT produce the same hasher state. Instead, there should be a distinction between a default HashMap and an empty/cleared HashMap. Both are empty, but the former also has the hasher set to Err(Default::default).

This is important because one could easily do:

let v: Vec<HashSet<&str>> = vec![];
// some stuff here
if v.len() < my.len() {
    v.resize(my.len(), HashSet::default());
}

and accidentally introduce a DoS exploit to their code. (this would be partially solved by resize_default, but I don't think it's a proper solution.)

T-libs

Source

SoniEx2

👍1

Most helpful comment

We already established that your proposal cannot work @SoniEx2 You'd require some BuildBuildHasher machinery.

Afaik, there is no actual danger here so long as you use hashers like SipHasher24 that provide cryptographic security assurances for their key material, so afaik RandomState remains safe.

I'd argue running any adversary controlled data through some weak hasher like fnv inherently creates a DoS vector. If you do that while arguing otherwise based on an ephemeral hasher, then you could create exactly the same problem by cloning a full hashmap, like by using Cow<HashMap<K,V>>.

Afaik, there is nothing special about empty hash maps here and no nice way to exploit them even if so.

burdges on 7 Oct 2018

👍2

All 24 comments

@rust-lang/libs This suggestion would help prevent one possible cause of denial-of-service vulnerabilities. Is it worth the cost of adding a Option to HashMap (which I think would increase its size from 5 words to 6)? Or is there an alternate mitigation without that cost?

mbrubeck on 29 Sep 2018

There's no issue if one writes

v.resize_with(my.len(), HashSet::default);

right?

Is this just in the category with .or_default(foo()) where function calls should lint to use the *_with version? We could presumably have something like an attribute on the parameter to say #[suggest_if_direct_call("resize_with")] or something.

scottmcm on 29 Sep 2018

👍1

I prefer "right by default". It seems dangerous to hide a potential DoS exploit behind an innocent-looking lint that can be easily disabled.

(resize_with isn't stable yet, either)

SoniEx2 on 29 Sep 2018

We could always replace the hasher when cloning empty HashMaps @mbrubeck so no new Option required. I've no idea if RandomState has reseeding, etc. like rand, but at worst this incurs one syscall per clone.

Is this worse than v.resize(my.len(), ["dog"].iter().collect()) though? We could actually replace the hasher when cloning any HashMap, at the cost of rehashing everything.

I do like @scottmcm answer that lints should encourage explicitness, but actually no answer above handles the perfectly reasonable Cow<'a,HashMap<K,V>> well, so..

We could add a clone mode via an optional const parameter to HashMap or a associated constant in BuildHasher:

pub enum RekeyingMode {
    FastClone,  // Never reseed
    CheapClones,  // Reseed only if empty, probably unnecessary
    SecureClone,  // Reseed even if non-empty
}

I dislike this because although explicit it mostly just shifts blame onto users.

Can we instead simply lint against using Clone for HashMap entirely, even when buried inside Cow or whatever? If you need to clone a HashMap then maybe you should use some wrapper with the desirable behavior.

burdges on 29 Sep 2018

👍1

Looking at this again, I don't think the original proposed change is allowable, regardless of whether we want to, because there's no way to get an S: BuildHasher from None in insert and friends:

impl<K, V, S> HashMap<K, V, S> where K: Eq + Hash, S: BuildHasher {
    pub fn insert(&mut self, k: K, v: V) -> Option<V>;
}

It would additionally need something like S: Default, which is a breaking change and not desirable anyway. Not to mention that Option<S> isn't enough, since HashMap is Send, so

pub fn hasher(&self) -> &S

means it would need instead to be something like Mutex<Option<S>> to lazy-init safely.

scottmcm on 29 Sep 2018

👍1

Add some dependent typing and we can easily solve this. It would only be None in cases where Default was called, or in HashMap::new().

Altho I do wonder if you could have two impls of HashMap, one where it is Default and one where it isn't.

SoniEx2 on 29 Sep 2018

Instances where this becomes dangerous sound rare, thanks to the strong protections provided by SipHash24. If the user switch to weaker hash functions like say fnv for performance, then clone becomes more risky. If the user justifies their weaker hash functions by keying them separately, then clone actually becomes dangerous.

We should just warn against using HashMap's Clone so I started writing an RFC for that:

Summary

Allow implementations of traits to warn against unexpected behavior or limitations by specifing lints against their own use. Also, provide a warning that cloning a HashMap also clones its BuildHasher.

Motivation

We encounter scenarions where implementing some trait makes sense, and sems desirable in general, but caveats about specific use cases still exist. We should expect such scenarios to encourage abuse of heavy handed techniques like hidden trait implementations (RFC #2529).

As an example, we want HashMap: Clone when possible, but if the HashMap uses a non-cryptographic hasher then this may create a DoS vector, even if no individual clone does so. We could address this shortcoming:

#[derive(Clone)]
pub struct HashMap<K, V, S = RandomState> {
    ...
}

#[caveat("A cloned HashMap can become a DoS vector, provided its Hasher is not cryptographic, because cloned BuildHasher instance can facilitate exposing the Hasher's key.  RandomState has cryptographic key protections, making clone safer.")]
default impl<K: Clone, V: Clone, S: Clone> Clone for HashMap<K, V, S>

#[no_caveat]
impl<K: Clone, V: Clone> Clone for HashMap<K, V, RandomState>

burdges on 3 Oct 2018

Hmm... If we're gonna be using another word for it anyway, we could always go with Result<H, fn() -> H>?

fn new() -> Foo {
    Foo { hasher: Err(<H as Default>::default) }
}

SoniEx2 on 7 Oct 2018

@burdges Bikeshed: add names to #[caveat]s, in case one impl ends up with more than one:

#[caveat("hashmap_clone_dos", "A cloned HashMap can become a DoS vector, [etc]")] default impl<K: Clone, V: Clone, S: Clone> Clone for HashMap<K, V, S> { .. }

#[no_caveat("hashmap_clone_dos")] impl<K: Clone, V: Clone> Clone for HashMap<K, V, RandomState> { .. }

Moongoodgirl on 7 Oct 2018

👍1

This is not about caveats in impls, please keep that separate.

SoniEx2 on 7 Oct 2018

We already established that your proposal cannot work @SoniEx2 You'd require some BuildBuildHasher machinery.

Afaik, there is no actual danger here so long as you use hashers like SipHasher24 that provide cryptographic security assurances for their key material, so afaik RandomState remains safe.

Afaik, there is nothing special about empty hash maps here and no nice way to exploit them even if so.

burdges on 7 Oct 2018

👍2

"We" already established you can't use Option, because there's no function pointer to make the BuildHasher.

I said you can use an Result<S, fn() -> S> instead of an Option and it'd work. https://play.rust-lang.org/?gist=9c724f67bb9379c9f89568bbbfc8bb3c&version=stable&mode=debug&edition=2015

SoniEx2 on 7 Oct 2018

You're now proposing what I called a "buildbuildhasher" scheme, but..

We've no strong reason to believe this improves security over using the current default of S = RandomState. Also, these schemes cannot cleanly address cloning non-empty HashMaps when S != RandomState either. In principle, these scheme also create vulnerabilities or new DoS vectors in low entropy settings too.

The right answer is to warn anyone using the HashMap: Clone bound with S != RandomState.

burdges on 7 Oct 2018

RandomState doesn't guarantee DoS-proofing. And, in fact, RandomState isn't affected by low-entropy settings! See here.

So, yes, it would improve things. At the very least, it would be no worse than what we currently have.

SoniEx2 on 7 Oct 2018

Very nice! :) I hadn't know about these attacks outside the siphash threat model, so scheme like you propose actually do help, but only with empty hash maps.

I'm afraid lints against the HashMap: Clone bound remain the only solution that covers all cases, although now we cannot exclude S = RandomState so the lints become more intrusive.

burdges on 8 Oct 2018

you could create exactly the same problem by cloning a full hashmap

Well, what would it take to solve that? We could re-hash everything in the hashmap on clone, right? If S::clone changed the hashing seed and we re-hashed everything, I think it would be fine.

But here's an interesting observation: _There's no requirement from HashMap that S::clone return a new BuildHasher that produces Hashers that compute the same hashes as the original one!_

BuildHasher specifically says that Hashers _from the same instance_ produce the same hashes, but doesn't say anything about other instances, even Clones. Hash has the "if you're implementing this and Eq" rule on it, but there's nothing that says what Clone is supposed to mean.

So arguably it's a bug today that we're _not_ re-hashing everything on HashMap::clone...

scottmcm on 8 Oct 2018

Yes exactly @scottmcm but..

Right now HashMap: Clone costs only an allocation and a memcpy. If you change the Hasher then you must rehashing everything, which becomes extremely slow in comparison. If you want that, then you should just write old_hash_map.iter().collect::<HashMap<_,_,_>>(). Also, there are scenarios where quickly coping a HashMap makes sense, so another method must provide Clone

I still like your original idea about using lints, but we should lint against the HashMap: Clone bound itself, not just methods like Vec::resize.

I also like lints here because hidden impls #2529 make me nervous. I'd hope lints giving caveats about impls should help prevent over usage of hidden impls.

burdges on 8 Oct 2018

@burdges

Right now HashMap: Clone costs only an allocation and a memcpy.

That seems implausible to me. HashMaps can have Strings in them.

scottmcm on 8 Oct 2018

Again this isn't about lints please leave those elsewhere.

SoniEx2 on 8 Oct 2018

@scottmcm Yes, HashMap: Clone actually clones all the entries, whatever that means, but if those entries are simple enough then it should just copy memory, but rehashing a large table could be quite expensive.

burdges on 8 Oct 2018

As an aside, BuildHasherDefault is kinda an anti-pattern since people commonly envision Default as constant. And the example even uses #[derive(Default)].

burdges on 8 Oct 2018

hmm, is it a logic error for a key's Clone to change its hash?

SoniEx2 on 28 Oct 2018

From rereading the official documentation for Clone and Hash, no it doesn't seem to be a logic error. So I think HashMap's clone() does need to rerun hash() on every key value just in case the hashes changed.

I think there might even be legitimate use cases for that. Say the key type is some kind of Box or Gc type or whatever that contains a pointer. In that case, hash() simply returning the wrapped pointer would probably be a pretty decent implementation, and that pointer value would obviously change on clone(). The part I'm not sure about is why you'd ever want to use smart pointers as HashMap keys, but I suspect someone has an answer to that.

In order for it to be a logic error, you'd need a special rule just for types implementing both Hash and Clone, like the rule we have for Hash and Eq. I don't think you could add a Clone rule and a Hash rule that naturally combine to form this guarantee, because there's not a lot Clone can guarantee about a cloned value's semantics (for instance, a rule like "a value and its clone must have no observable difference in behavior" would immediately invalidate all smart pointers, and probably tons of other stuff).

Ixrec on 28 Oct 2018

okay, so it's (IMO) reasonable to make BuildHasher also change on Clone. it'd be a better solution to this security issue and doesn't make HashMap any bigger.

SoniEx2 on 28 Oct 2018

Was this page helpful?

0 / 5 - 0 ratings