HashMap implements Clone, but cloning HashMap::default() should NOT produce the same hasher state. Instead, there should be a distinction between a default HashMap and an empty/cleared HashMap. Both are empty, but the former also has the hasher set to Err(Default::default).
This is important because one could easily do:
let v: Vec<HashSet<&str>> = vec![];
// some stuff here
if v.len() < my.len() {
v.resize(my.len(), HashSet::default());
}
and accidentally introduce a DoS exploit to their code. (this would be partially solved by resize_default, but I don't think it's a proper solution.)
@rust-lang/libs This suggestion would help prevent one possible cause of denial-of-service vulnerabilities. Is it worth the cost of adding a Option to HashMap (which I think would increase its size from 5 words to 6)? Or is there an alternate mitigation without that cost?
There's no issue if one writes
v.resize_with(my.len(), HashSet::default);
right?
Is this just in the category with .or_default(foo()) where function calls should lint to use the *_with version? We could presumably have something like an attribute on the parameter to say #[suggest_if_direct_call("resize_with")] or something.
I prefer "right by default". It seems dangerous to hide a potential DoS exploit behind an innocent-looking lint that can be easily disabled.
(resize_with isn't stable yet, either)
We could always replace the hasher when cloning empty HashMaps @mbrubeck so no new Option required. I've no idea if RandomState has reseeding, etc. like rand, but at worst this incurs one syscall per clone.
Is this worse than v.resize(my.len(), ["dog"].iter().collect()) though? We could actually replace the hasher when cloning any HashMap, at the cost of rehashing everything.
I do like @scottmcm answer that lints should encourage explicitness, but actually no answer above handles the perfectly reasonable Cow<'a,HashMap<K,V>> well, so..
We could add a clone mode via an optional const parameter to HashMap or a associated constant in BuildHasher:
pub enum RekeyingMode {
FastClone, // Never reseed
CheapClones, // Reseed only if empty, probably unnecessary
SecureClone, // Reseed even if non-empty
}
I dislike this because although explicit it mostly just shifts blame onto users.
Can we instead simply lint against using Clone for HashMap entirely, even when buried inside Cow or whatever? If you need to clone a HashMap then maybe you should use some wrapper with the desirable behavior.
Looking at this again, I don't think the original proposed change is allowable, regardless of whether we want to, because there's no way to get an S: BuildHasher from None in insert and friends:
impl<K, V, S> HashMap<K, V, S> where K: Eq + Hash, S: BuildHasher {
pub fn insert(&mut self, k: K, v: V) -> Option<V>;
}
It would additionally need something like S: Default, which is a breaking change and not desirable anyway. Not to mention that Option<S> isn't enough, since HashMap is Send, so
pub fn hasher(&self) -> &S
means it would need instead to be something like Mutex<Option<S>> to lazy-init safely.
Add some dependent typing and we can easily solve this. It would only be None in cases where Default was called, or in HashMap::new().
Altho I do wonder if you could have two impls of HashMap, one where it is Default and one where it isn't.
Instances where this becomes dangerous sound rare, thanks to the strong protections provided by SipHash24. If the user switch to weaker hash functions like say fnv for performance, then clone becomes more risky. If the user justifies their weaker hash functions by keying them separately, then clone actually becomes dangerous.
We should just warn against using HashMap's Clone so I started writing an RFC for that:
Allow implementations of traits to warn against unexpected behavior or limitations by specifing lints against their own use. Also, provide a warning that cloning a HashMap also clones its BuildHasher.
We encounter scenarions where implementing some trait makes sense, and sems desirable in general, but caveats about specific use cases still exist. We should expect such scenarios to encourage abuse of heavy handed techniques like hidden trait implementations (RFC #2529).
As an example, we want HashMap: Clone when possible, but if the HashMap uses a non-cryptographic hasher then this may create a DoS vector, even if no individual clone does so. We could address this shortcoming:
#[derive(Clone)]
pub struct HashMap<K, V, S = RandomState> {
...
}
#[caveat("A cloned HashMap can become a DoS vector, provided its Hasher is not cryptographic, because cloned BuildHasher instance can facilitate exposing the Hasher's key. RandomState has cryptographic key protections, making clone safer.")]
default impl<K: Clone, V: Clone, S: Clone> Clone for HashMap<K, V, S>
#[no_caveat]
impl<K: Clone, V: Clone> Clone for HashMap<K, V, RandomState>
Hmm... If we're gonna be using another word for it anyway, we could always go with Result<H, fn() -> H>?
fn new() -> Foo {
Foo { hasher: Err(<H as Default>::default) }
}
@burdges Bikeshed: add names to #[caveat]s, in case one impl ends up with more than one:
#[caveat("hashmap_clone_dos", "A cloned HashMap can become a DoS vector, [etc]")] default impl<K: Clone, V: Clone, S: Clone> Clone for HashMap<K, V, S> { .. }
#[no_caveat("hashmap_clone_dos")] impl<K: Clone, V: Clone> Clone for HashMap<K, V, RandomState> { .. }
This is not about caveats in impls, please keep that separate.
We already established that your proposal cannot work @SoniEx2 You'd require some BuildBuildHasher machinery.
Afaik, there is no actual danger here so long as you use hashers like SipHasher24 that provide cryptographic security assurances for their key material, so afaik RandomState remains safe.
I'd argue running any adversary controlled data through some weak hasher like fnv inherently creates a DoS vector. If you do that while arguing otherwise based on an ephemeral hasher, then you could create exactly the same problem by cloning a full hashmap, like by using Cow<HashMap<K,V>>.
Afaik, there is nothing special about empty hash maps here and no nice way to exploit them even if so.
"We" already established you can't use Option, because there's no function pointer to make the BuildHasher.
I said you can use an Result<S, fn() -> S> instead of an Option and it'd work. https://play.rust-lang.org/?gist=9c724f67bb9379c9f89568bbbfc8bb3c&version=stable&mode=debug&edition=2015
You're now proposing what I called a "buildbuildhasher" scheme, but..
We've no strong reason to believe this improves security over using the current default of S = RandomState. Also, these schemes cannot cleanly address cloning non-empty HashMaps when S != RandomState either. In principle, these scheme also create vulnerabilities or new DoS vectors in low entropy settings too.
The right answer is to warn anyone using the HashMap: Clone bound with S != RandomState.
RandomState doesn't guarantee DoS-proofing. And, in fact, RandomState isn't affected by low-entropy settings! See here.
So, yes, it would improve things. At the very least, it would be no worse than what we currently have.
Very nice! :) I hadn't know about these attacks outside the siphash threat model, so scheme like you propose actually do help, but only with empty hash maps.
I'm afraid lints against the HashMap: Clone bound remain the only solution that covers all cases, although now we cannot exclude S = RandomState so the lints become more intrusive.
you could create exactly the same problem by cloning a full hashmap
Well, what would it take to solve that? We could re-hash everything in the hashmap on clone, right? If S::clone changed the hashing seed and we re-hashed everything, I think it would be fine.
But here's an interesting observation: _There's no requirement from HashMap that S::clone return a new BuildHasher that produces Hashers that compute the same hashes as the original one!_
BuildHasher specifically says that Hashers _from the same instance_ produce the same hashes, but doesn't say anything about other instances, even Clones. Hash has the "if you're implementing this and Eq" rule on it, but there's nothing that says what Clone is supposed to mean.
So arguably it's a bug today that we're _not_ re-hashing everything on HashMap::clone...
Yes exactly @scottmcm but..
Right now HashMap: Clone costs only an allocation and a memcpy. If you change the Hasher then you must rehashing everything, which becomes extremely slow in comparison. If you want that, then you should just write old_hash_map.iter().collect::<HashMap<_,_,_>>(). Also, there are scenarios where quickly coping a HashMap makes sense, so another method must provide Clone
I still like your original idea about using lints, but we should lint against the HashMap: Clone bound itself, not just methods like Vec::resize.
I also like lints here because hidden impls #2529 make me nervous. I'd hope lints giving caveats about impls should help prevent over usage of hidden impls.
@burdges
Right now
HashMap: Clonecosts only an allocation and a memcpy.
That seems implausible to me. HashMaps can have Strings in them.
Again this isn't about lints please leave those elsewhere.
@scottmcm Yes, HashMap: Clone actually clones all the entries, whatever that means, but if those entries are simple enough then it should just copy memory, but rehashing a large table could be quite expensive.
As an aside, BuildHasherDefault is kinda an anti-pattern since people commonly envision Default as constant. And the example even uses #[derive(Default)].
hmm, is it a logic error for a key's Clone to change its hash?
From rereading the official documentation for Clone and Hash, no it doesn't seem to be a logic error. So I think HashMap's clone() does need to rerun hash() on every key value just in case the hashes changed.
I think there might even be legitimate use cases for that. Say the key type is some kind of Box or Gc type or whatever that contains a pointer. In that case, hash() simply returning the wrapped pointer would probably be a pretty decent implementation, and that pointer value would obviously change on clone(). The part I'm not sure about is why you'd ever want to use smart pointers as HashMap keys, but I suspect someone has an answer to that.
In order for it to be a logic error, you'd need a special rule just for types implementing both Hash and Clone, like the rule we have for Hash and Eq. I don't think you could add a Clone rule and a Hash rule that naturally combine to form this guarantee, because there's not a lot Clone can guarantee about a cloned value's semantics (for instance, a rule like "a value and its clone must have no observable difference in behavior" would immediately invalidate all smart pointers, and probably tons of other stuff).
okay, so it's (IMO) reasonable to make BuildHasher also change on Clone. it'd be a better solution to this security issue and doesn't make HashMap any bigger.
Most helpful comment
We already established that your proposal cannot work @SoniEx2 You'd require some
BuildBuildHashermachinery.Afaik, there is no actual danger here so long as you use hashers like SipHasher24 that provide cryptographic security assurances for their key material, so afaik
RandomStateremains safe.I'd argue running any adversary controlled data through some weak hasher like fnv inherently creates a DoS vector. If you do that while arguing otherwise based on an ephemeral hasher, then you could create exactly the same problem by cloning a full hashmap, like by using
Cow<HashMap<K,V>>.Afaik, there is nothing special about empty hash maps here and no nice way to exploit them even if so.