Rust: Exposure of HashMap iteration order allows for O(n²) blowup.

Created on 15 Sep 2016 · 97 Comments · Source: rust-lang/rust

Exposing HashMap's iteration order can cause O(n²) blowup even in innocent-looking code _without_ the presence of an attacker. In the presence of an attacker, access to the order of a dictionary allows HashDoS-like attacks with only two requests in common scenarios.

Without an attacker

Consider a user with two possibly-disjoint hash maps

let first_map: HashMap<u64, _> = (0..900000).map(|i| (i, ())).collect();
let second_map: HashMap<u64, _> = (900000..1800000).map(|i| (i, ())).collect();

The user wants to merge the hash maps, and does so naïvely,

let mut merged = first_map;
merged.extend(second_map);

Time for merge when second_map is shuffled: 0.4s

Time for merge when second_map is not shuffled: 40s (x100 amplification)

This effect is noticeably more pronounced when merging with a round robin strategy.

With an attacker

The threat model here is simple. The attacker is able to send JSON to the server. The server parses the JSON into a HashMap and through whatever means - an error message including the formatted map or explicit listing of the contents of the map - may reveal the order of the map.

The attack on this model requires two requests. The first sends some large JSON to the server

let first_request: HashMap<u64, _> = (0..900000).map(|i| (i, ())).collect();

and receives an ordering

let returned_data: Vec<_> = first_request.into_iter().collect();

The attacker then sends the first and third quarter of this list in a new JSON object.

let second_request: HashMap<u64, _> =
    returned_data[..225000].iter()
        .chain(&returned_data[450000..675000])
        .cloned().collect();

Total time _without_ second request: 0.1s

Total time _with_ second request: 200s (x2000 amplification)


Solutions, near-solutions and non-solutions

These solutions should not be considered to be ordered, necessarily disjoint, nor an exhaustive list of options.

Fall back to BTreeMap

It should be clear that we cannot treat hash maps as a solved problem just because we use SipHash. In fact, SipHash is entirely insufficient to solve the problem. My first suggestion is thus to stop trying to make the hasher secure, and instead fall back to BTreeMap when nonlinear behaviour is detected.

This guarantees a minimum level of performance regardless of the capabilities of the attacker, and allows usage of a faster hashing algorithm by default. Hash maps should get faster by default as a result. This does not prevent having to consider the issue, since the fallback is costly and must be rare, but this is an easier problem than entirely securing the hash map.
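To make the shape of that concrete, here is a minimal sketch of an adaptive map, assuming the table can report how far an insertion had to probe. The `AdaptiveMap` name, the `probe_cost` parameter and the threshold are all invented for illustration, not a proposed API:

```rust
use std::collections::{BTreeMap, HashMap};
use std::hash::Hash;

// Illustrative only: a map that starts out hashed and migrates to a BTreeMap
// once an insertion looks pathologically expensive. `probe_cost` stands in
// for the displacement the real table would already know about.
enum AdaptiveMap<K, V> {
    Hashed(HashMap<K, V>),
    Fallback(BTreeMap<K, V>),
}

const DEGENERATE_PROBES: usize = 128; // invented threshold

impl<K: Hash + Eq + Ord, V> AdaptiveMap<K, V> {
    fn new() -> Self {
        AdaptiveMap::Hashed(HashMap::new())
    }

    fn insert(&mut self, key: K, value: V, probe_cost: usize) {
        if probe_cost > DEGENERATE_PROBES {
            if let AdaptiveMap::Hashed(map) = self {
                // Nonlinear behaviour detected: migrate once, after which
                // every operation is bounded by the BTreeMap's O(log n).
                let tree: BTreeMap<K, V> = map.drain().collect();
                *self = AdaptiveMap::Fallback(tree);
            }
        }
        match self {
            AdaptiveMap::Hashed(map) => {
                map.insert(key, value);
            }
            AdaptiveMap::Fallback(tree) => {
                tree.insert(key, value);
            }
        }
    }
}
```

The fallback requires `K: Ord`, which is one of the costs of this route discussed further down in the thread.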

Use a hash map without problematic blowup, or less affected by it

Java solves this problem by using a hash map with chaining and converting large buckets to tree maps. This mitigates the impact of degradation, but does not seem to allow using contiguous hash maps by default.

As far as I am aware, the blowup cannot be resolved by moving to another common form of open addressing, although quadratic probing would be significantly less affected by some of these attacks. Chaining alone also defeats the attacks given here, but still requires a secure hash and fails with attackers with more capabilities.
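For comparison, a toy sketch of the Java-style mitigation just described: a chained table whose buckets are plain vectors until they grow suspiciously long, at which point they are promoted to trees. The names, the threshold, and the missing operations (lookup, removal, resizing, duplicate handling) are all illustrative assumptions:

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// A flooded bucket costs O(log n) instead of O(n) once it is treeified.
const TREEIFY_AT: usize = 8; // Java uses 8; the value here is illustrative

enum Bucket<K, V> {
    List(Vec<(K, V)>),
    Tree(BTreeMap<K, V>),
}

struct ChainedMap<K, V> {
    buckets: Vec<Bucket<K, V>>,
    hasher: RandomState,
}

impl<K: Hash + Ord, V> ChainedMap<K, V> {
    fn with_buckets(n: usize) -> Self {
        ChainedMap {
            buckets: (0..n).map(|_| Bucket::List(Vec::new())).collect(),
            hasher: RandomState::new(),
        }
    }

    fn insert(&mut self, key: K, value: V) {
        let mut h = self.hasher.build_hasher();
        key.hash(&mut h);
        let idx = (h.finish() as usize) % self.buckets.len();

        // Promote an over-long list bucket before inserting into it.
        if let Bucket::List(list) = &mut self.buckets[idx] {
            if list.len() >= TREEIFY_AT {
                let tree: BTreeMap<K, V> = list.drain(..).collect();
                self.buckets[idx] = Bucket::Tree(tree);
            }
        }
        match &mut self.buckets[idx] {
            Bucket::List(list) => list.push((key, value)),
            Bucket::Tree(tree) => {
                tree.insert(key, value);
            }
        }
    }
}
```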

Use different seeds for each hash map

Pull requests #31356 (closed) and #33318 (merged) first proposed incrementing the thread local seed for each hash map. This was later removed when no threat model was seen, but would prevent both attacks listed here.

This still allows attacks when hash maps are reused.

Randomize iteration order

I am not sure how one would randomize iteration order efficiently. However, it should solve the problem unless hashes are exposed through other means.

Ignore the problem

Given that Rust went so far as to use SipHash, quadratic blowup on code as simple as fst.extend(snd) seems too extreme to ignore.

A-collections C-bug P-medium T-libs

Most helpful comment

We'd also have to mandate a sort in JSON serialization, which seems somewhat unreasonable.

The fact that simply merging two maps can run into sadness makes it seem to me like we have to do something about this.

All 97 comments

It seems to me like we should basically just revert #33318 for now and use different seeds per map. Ideally we could find a strategy that avoids key reuse but also doesn't regress hashmap creation performance too much, but as @briansmith noted in #33318 there are risks to using multiple correlated keys as well.

cc @rust-lang/libs

The threat model here is simple. The attacker is able to send JSON to the server. The server parses the JSON into a HashMap and through whatever means - an error message including the formatted map or explicit listing of the contents of the map - may reveal the order of the map.

Perhaps we can resolve this by just saying this is outside the threat model. And/or, the Debug implementation for a HashMap isn't performance-sensitive in most applications, and so it could be redone to always output the values in sorted order by key--i.e. it could make a copy of itself into a BTreeMap and then Debug::fmt the BTreeMap. This seems reasonable since either the HashMap is small or it is probably too gigantic to reasonably expect to format in the first place. (Note that HashMap doesn't implement Display, nor should it.)

We'd also have to mandate a sort in JSON serialization, which seems somewhat unreasonable.

The fact that simply merging two maps can run into sadness makes it seem to me like we have to do something about this.

You can't always sort a hashmap because the keys aren't required to be Ord. This doesn't matter for the BTreeMap fallback because you can sort by hash code (note: this means it's required to use a cryptographic hash), but sorting by hash is exactly the opposite of what you want to do when you don't have that fallback.

A shuffle would do fine though I think.

@briansmith: Debug is just a trivial example of how the information might leak, though, not the actual issue.

Another example would be if a web service validates the JSON it's given key-by-key, by iterating over (key, value) pairs and delegating to a validation function.

  • If it returns the first error it encounters, the attacker can learn the order.
  • If it errors early, even without a descriptive message, the attacker can learn the order.
  • If it collects the errors and returns them without shuffling (or sorting), the attacker can learn the order.

It's a footgun.
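To make the footgun concrete, here is a sketch of the kind of handler being described (the validation rule and names are invented): even though it only ever reports one error, it reports whichever key the map happened to visit first, which is exactly the ordering information an attacker needs.

```rust
use std::collections::HashMap;

// Invented example: validate a parsed JSON object field by field.
// Returning only the *first* offending key still leaks iteration order,
// because the attacker can vary which keys are invalid across requests.
fn validate(body: &HashMap<String, String>) -> Result<(), String> {
    for (key, value) in body {
        if value.is_empty() {
            // Surfaces whichever key the map visited first.
            return Err(format!("field `{}` must not be empty", key));
        }
    }
    Ok(())
}
```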

We could also use Python's technique to preserve insertion order. It introduces a little more indirection, but the cost doesn't seem to be excessive.

This also makes hash map order deterministic, which is a nice usability bonus. Note that tombstones aren't required for determinism, but are required to preserve insertion order through deletions. Either way prevents the attacks given.
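For readers unfamiliar with that layout, here is a rough sketch of the two-array idea (insert, growth and deletion/tombstones omitted; this is an illustration, not the ordermap or CPython code): the probe sequence only touches a small index array, while the entries themselves sit in a dense vector in insertion order, so iteration is a plain scan.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// Sketch of the "compact dict" layout: `indices` is the hash-table part and
// only stores positions; `entries` holds (cached hash, key, value) tuples
// contiguously in insertion order.
struct CompactMap<K, V> {
    indices: Vec<Option<usize>>, // slot -> position in `entries`
    entries: Vec<(u64, K, V)>,   // insertion-ordered storage
    hasher: RandomState,
}

impl<K: Hash + Eq, V> CompactMap<K, V> {
    fn hash_of(&self, key: &K) -> u64 {
        let mut h = self.hasher.build_hasher();
        key.hash(&mut h);
        h.finish()
    }

    fn get(&self, key: &K) -> Option<&V> {
        let mask = self.indices.len() - 1; // capacity assumed a power of two
        let mut slot = (self.hash_of(key) as usize) & mask;
        loop {
            match self.indices[slot] {
                None => return None,
                Some(pos) if self.entries[pos].1 == *key => {
                    return Some(&self.entries[pos].2)
                }
                Some(_) => slot = (slot + 1) & mask, // linear probe onward
            }
        }
    }
}
```

The extra hop from `indices` into `entries` is the indirection cost mentioned above.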

Revisiting #33318 sounds like the reasonable way out.

HashMap creation needs to be fast in the common case. I think we need a HashMap that defaults to FNV but rehashes with a random SIP key when it gets too many collisions.

TBH, I'm also curious how alternate approaches (a BTreeMap whose nodes are stored in an overallocated Vec, for similar allocation amortization as HashMap; a trie or crit-bit tree keyed on hashes with the same allocation style, etc) would perform.

Both would likely avoid the pathological merging behavior, and the amortized allocations may bring them closer to HashMap.

So the problem here is not hard collisions, but rather soft collisions?

@Veedrac It has some wins too, since the data that needs to be swapped by bucket stealing while inserting is smaller, and iteration is much faster since it's on a mostly compact space. But the indirection costs in lookup may not be acceptable at all. (I made a prototype that demonstrates exactly these differences with a +6% hit to lookup.)

Maybe if one moves the hash field to be next to the index, or even special case smaller than 2**32 maps so that they keep 32-bit hashes and 32-bit indices next to each other?

@bluss The new layout should allow you to lower the occupancy (perhaps cap it at 80%?) without paying much of a hit, since the key/value arrays can safely cap out at 100% - only the index array needs to grow. This should get you some speed back, as well as make pathological insertions a lot rarer.

@arielb1 Exactly; the first half of the table is ending up with significantly over 100% occupancy.

Some extra discussion is available in this Reddit thread, including back of the envelope analysis of the asymptotics and commentary on a similar attack mentioned in the SipHash paper.

The consistent-order map prototype is here: https://github.com/bluss/ordermap. I actually began that _not_ because of this thread but due to curiosity about the new dict representation in Python 3.6.

Removal breaks the insert order property. Otherwise you need tombstones (ideas/forks for efficient way of doing that are welcome).

Conclusions: iteration can be so much faster than libstd HashMap. Growth too. Lookup is slightly slower (more than the 6% I said above if the map is large and out of cache). It solves the merge pathology.

That looks pretty awesome at first glance.

I don't think we need to guarantee insertion order iteration, but it would be nice to make it iterate more obviously in not-insertion order - maybe alternate adding to the front and back of the iteration list or something like that? Probably not worth doing if it'd get complex.

However it looks in this issue, the current HashMap impl has a lot of aspects where its performance is already great. I don't think it's that easy to replace.

Oh yeah, absolutely. It's just good to see that there are reasonable routes forward.

@bluss Safe, stable Rust and only 6% slower insertion normally? Amazing work.

The particular representation in ordermap maps well to safe rust (using vectors without holes). Making accurate benchmarks is tricky (there are so many different scenarios), I can only ask for help there, and my general summary is in the readme.

The libs team discussed this during triage yesterday and the conclusion was that while we'd definitely like to fix this it's not absolutely critical we do so immediately. We don't mind taking a bit of time to explore our options like @bluss's crate.

Tagging as P-high though to ensure that this comes up during triage. Our thinking though is to revisit this in 6 weeks at the latest.

I've posted a thread on internals about this as well.

Huh, I once considered collision attacks based on iteration order but decided they shouldn't be a problem, for some reason. Guess I was wrong.

But I don't understand this attack at all. What is the significance of the first and third quarter? The first quarter should represent the entries whose hashes have the high bits 00, and the third quarter 10. But if the problem is collisions when the table is smaller, as it is being built, then why would this matter, when the modulo takes bits starting at the _low_ end? In the first example, why is it a problem to insert in order of hash value, when in a smaller table that should be equivalent to cycling linearly through the buckets - which sounds more like an ideal load than a pathological one? With a simpler hasher, couldn't you reproduce that by simply inserting numeric keys in order? yet that is hardly unusual behavior, nor slow in any other common hash map.

Edit: This seems to be a regression. The first test case is slow on Rust 1.12.0, but not on some development version around 1.8.0 that NixOS decided to install on my system. To be clear, this is what I ran:

use std::collections::HashMap;
fn main() {
    let first_map: HashMap<u64, _> = (0..900000).map(|i| (i, ())).collect();
    let second_map: HashMap<u64, _> = (900000..1800000).map(|i| (i, ())).collect();

    let mut merged = first_map;
    println!("-->");
    merged.extend(second_map);
    println!("{}", merged.len());
}

@comex Yes, it's a regression because in older rust, every hashmap is created with its own random key for SipHash. In current rust, all hashmaps in the same thread share the same key.

(Cloning a hashmap always creates another one with the same key, regardless).

@comex When in a hash map half the size, the first and second halves overlap. The first and third quarters will thus fill the first half of a half-size map up twice (well, ~1.8 times), and never put values in the second half.

Ah, I see. I was thinking that 2x overlap per bucket wasn't so bad, nothing like the original attack involving many keys with the same hash. But of course with open addressing, if the first half of the table is already mostly full, adding more entries that belong there will soon turn the first half into a single enormous chain, at which point later insertions will have to linear probe through almost the whole thing before finding a free slot in the second half.

Randomizing the seed - maybe re-randomizing on growth - would of course be enough to solve this if a secure hash function is being used. This could be optimized, as is being discussed in the internals thread, but with an adaptive hash table that only switches to a secure function upon being attacked, usage of seeds in the first place should be rare enough that just randomizing every time should be fine.

That is, if the attack detection lives up to its name and doesn't switch just because someone is trying to extend a HashMap with another one. It would also be nice if maps overridden to use a weak hasher (unconditionally) would both cope well with this case, and also at least try to deal with a lot of buckets with similar hash values being inserted.

Copying Python's insertion-order-preserving indirection would fix extend, but aside from wasting a bit of memory, it would also mandate an extra cache lookup per map lookup/insert compared to what I'd consider optimal: which is not what HashMap currently does, but that's a different topic. (In short, I think each bucket's hash, key, and value should definitely all be stored adjacently in at least some cases, if not all, depending on size and statistics I don't have yet.) That scheme is very nice if you want order, but in the long term, suboptimal for the common case. And once adopted, it would be very hard to go back on due to the risk of breaking code that expects the order property - whether it be officially guaranteed by the documentation, or not.

But there are simpler solutions. One is to permute the hash value based on the table size, so that items that are close together at one size will be randomly distributed at a different size. (Randomly enough, anyway, given that the cure for an intentional attacker is separate.) The best way to do that is probably to just include the table size in the data to be hashed for each key. Alternately, there could be an explicit post-hasher step. This could be as simple as a bit rotate, although that is problematic if the hasher is _really_ bad, e.g. hash(k: i32) = k. May not be worth caring about that use case. A more robust option is multiplying by some large number and modulo by a prime close to the maximum integer, which is at least almost a permutation... except that that requires u128 and may be very slow on some platforms... eh, better to go with putting it in the hashed data.
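A sketch of the "include the table size in the hashed data" variant, assuming a `BuildHasher` is at hand (this is not what libstd currently does; the function name is invented):

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// Mixing the current capacity into the hash means the key-to-bucket mapping
// is effectively re-chosen on every resize, so an ordering observed at one
// table size says nothing about placement at another size.
fn bucket_index<K: Hash>(state: &RandomState, key: &K, capacity: usize) -> usize {
    let mut h = state.build_hasher();
    h.write_usize(capacity); // fold the table size into the hashed data
    key.hash(&mut h);
    (h.finish() as usize) % capacity
}
```

As the next comment points out, this does not help when halves of two different maps of the same size are combined.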

One is to permute the hash value based on the table size, so that items that are close together at one size will be randomly distributed at a different size.

That won't work if you can have multiple source dictionaries, since you can combine two halves of different dictionaries to make one the original size instead of taking two quarters from one dictionary.

In that case, the adaptive hash code should detect the excessive chain length and switch to a secure hash. If you think combining halves of separate maps (halves determined by iteration order, _not_ picking a subset of the keys) is likely to come up _outside_ of an attack scenario, which I suppose is possible, then the adaptive code can mostly deal with this by trying a few doublings with the insecure hash before giving up and using a secure one. It should probably do this anyway.

Edit: But I guess this means chain length detection should be present even on non-adaptive maps.

Oh, OK, I misread your original comment.

Aren't there plans to replace SipHash anyway?

Also, why not cuckoo hashing instead of robin hood? edit: links

basic theory web.stanford.edu/class/cs166/lectures/13/Small13.pdf
original paper (.pdf)
some concurrent work https://da-data.blogspot.nl/2013/03/optimistic-cuckoo-hashing-for.html

@Ofekmeister one just has to prove the Cuckoo benefits and open the PR.

Personally I think cuckoo hashing will perform worse on average. There's this excellent paper with a comprehensive-ish analysis https://infosys.cs.uni-saarland.de/publications/p249-richter.pdf but they don't store the hash in the table, and that's really a shame because doing so really unlocks the best optimizations for RH. I'm working on optimizing the current implementation, for instance https://github.com/rust-lang/rust/pull/36692

The libs team discussed this during triage yesterday and the conclusion was to avoid a rewrite for now and instead restart investigation of unique siphash keys per map. The discussion has brought up that the "plus one" strategy is unlikely to be insecure, but we could also try a strategy where we use isaac seeded at the beginning of a process to generate all further keys. This should still be fast and also provide a nice distribution of keys for hash maps with no correlation.

Ah we did think that it would be worthwhile to benchmark this, however, to see _how_ much slower using isaac would be vs just adding one.

Out of curiosity, would there be volunteers to test out these benchmarks and send a PR? I'd at least be willing to help out if any is needed of course!

Shouldn't it be a thread-local instead?

@arthurprs yes right now the keys are thread-local, and the rng/+1 strategy would likely continue to do the same
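For reference, roughly what the thread-local "+1" scheme amounts to (a sketch; the seed constants here are arbitrary placeholders, whereas the real RandomState seeds each thread from the OS):

```rust
use std::cell::Cell;

thread_local! {
    // Seeded once per thread in the real implementation; constants used here
    // only to keep the sketch self-contained.
    static KEYS: Cell<(u64, u64)> =
        Cell::new((0x243f_6a88_85a3_08d3, 0x1319_8a2e_0370_7344));
}

/// Hand out a fresh SipHash key pair for each new map: the cost is one
/// thread-local read plus an increment, but two maps on the same thread
/// no longer share a seed.
fn next_map_keys() -> (u64, u64) {
    KEYS.with(|cell| {
        let (k0, k1) = cell.get();
        cell.set((k0.wrapping_add(1), k1)); // the "+1" strategy
        (k0, k1)
    })
}
```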

Anyone working on this? If not I can pick this up.

@arthurprs all yours!

Results are as follow:

test new_isaac_1::new                       ... bench:           6 ns/iter (+/- 0)
test new_isaac_1::new_insert                ... bench:          58 ns/iter (+/- 18)
test new_issac_2::new                       ... bench:          15 ns/iter (+/- 1)
test new_issac_2::new_insert                ... bench:          61 ns/iter (+/- 3)
test new_seq::new                           ... bench:           2 ns/iter (+/- 0)
test new_seq::new_insert                    ... bench:          50 ns/iter (+/- 5)

new_insert is new() + 1 insert()

The difference between _1 and _2 is that the first caches one of the seeds in thread local so on new() it only has to generate 1 u64 instead of 2. Also, I realized that this will hurt 32bit systems more as it takes 2x+ the instructions to generate each u64 seed.

Personally I'd go with the +1 and call it a day. There's no actual benefit from making it more complicated.

Ok, thanks for generating those benchmarks! Seems reasonable to me to go with the +1 strategy.

Cool, I'll make the PR later.

Maybe we should leave this open until we figure out something that helps custom Hashers too. Or maybe not; Rust isn't obliged to guarantee security for those.

I think that I ran into a related problem, but I didn't get to the bottom of it. Today I was transforming one HashMap into another (with natural-language words as keys), iterating over the keys of the first HashMap and inserting them into the second with different values. This was prohibitively slow (I just stopped it). I tried to simulate this using a small program generating strings of length [1,10):

```rust
extern crate rand;

use std::collections::hash_set::HashSet;
use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();

    let mut one = HashSet::new();
    for _ in 1..1000000 {
        let len = rng.gen_range(1, 10);
        let s = rng.gen_ascii_chars().take(len).collect::<String>();
        one.insert(s);
    }

    let mut two = HashSet::new();
    for v in one {
        two.insert(v);
    }
}
```

The first loop is near-instantaneous; however, the second loop is very slow (I didn't time it). Interestingly enough, it works fine when the random strings are of a fixed length.

I think not reusing random seeds would solve this case as well, since using FNV for the second HashMap makes it fast again in the original program.

@danieldk, yep, that looks like this issue. We are going to be adjusting the random seed per map, which should fix things for your use case: #37470

But what if just changing the order of iteration would fix this?

We can use the simplest LCG generator for the walking order:

  // Pseudo code for iterator
  if (hash.size == 0) return
  i := 0
  do {
    if (slot_occupied(hash, i))
       yield hash.key[i]
    i = (i * 5 + 1) & hash.mask
  } until (i == 0) /* returned to the first position, so we walked the whole hash */

@funny-falcon that will certainly not defend against malicious actors.

@sfackler why? It certainly randomize iteration order, so robin-hood hashing will shine again.

It does not seem particularly difficult to take a list of entries returned in that order and reconstruct the internal storage order from that.

Ok, it doesn't "randomize", but it changes the order in a way Robin Hood hashing will benefit from. And it will work with any hasher function, even non-seeded ones.

@sfackler, ah I see: given the output of iteration, one may construct new input values in the desired order to DoS the application.

Ok, lcg is very tunable, so we can show different order on every iteration:

  // Pseudo code for iterator
  if (hash.size == 0) return
  ptis = ptis * 5 + 1 //per_thread_iteration_seed
  delta := ptis * 2 + 1
  mult := ((ptis ^ (ptis>>16)) * 8) | 5
  i := 0
  do {
    if (slot_occupied(hash, i))
       yield hash.key[i]
    i = (i * mult + delta) & hash.mask
   } until (i == 0)

@arthurprs's point is about the performance of iteration, and I agree that cache misses on a huge hashtable will destroy performance. So, my suggestion is not acceptable.

I'm not going to read everything up there ---^ but I want to make sure people are aware of the limits of the "random seed per HashMap" approach:

If you clone a HashMap then will both HashMaps have equivalent RandomState? It seems if every HashMap needs a different RandomState then Clone should modify the RandomState. See the Twitter thread: https://twitter.com/BRIAN_____/status/801609167559458816

Also, consider the following attack:

  1. Force the applications to insert (large) N elements into a HashMap.
  2. Force the application to expose the iteration order for those N elements, and save that.
  3. Force the application to clear the HashMap and shrink_to_fit(). (I have seen a few applications auto-shrink-to-fit on clearing a HashMap, and there's even libraries to help automate this, so this is pretty realistic.)
  4. Re-insert all in same HT in same order.

It seems like any operation that shrinks a HashMap's storage would need to rekey the HashMap to prevent such an attack.

Similarly, I think if you can force an application to insert (large) N elements into a HashMap, drain the HashMap, and then forget() the drain, then attempting to re-insert the N elements may also trigger the quadratic behavior.

It's likely there are more variants on this theme. It's not clear where people are drawing the line.

To add to what @briansmith just said, you can skip the shrink_to_fit step if you allow filling the table twice before the attack. If you get two equal-length ordered listings from the same hash table, and insert the first half of each into the original hash table, you get the O(n²) behaviour back.

I wasn't actually aware that Clone preserved the hash seed. That makes the problem much easier to trigger. Consider a minor variation of @danieldk's version of the bug, where one is cloned early.

let mut one = HashMap::new();
let mut two = one.clone();

fill(&mut one);
for (k, v) in one {
    two.insert(k, v);
}

So, maybe just decrease the fill factor? And use cuckoo hashing? Or even quadratic probing?

It doesn't pay to "be the single language in a world which has open-addressing hash tables with a 90% fill rate".

Lowering the fill factor to 0.857 results in a 12~18% speedup in grow benchmarks (insert N keys into a 0-capacity map), not bad.

I've been thinking about a possible solution, while keeping the current Robin Hood hashing scheme.

Others have commented about detecting the excessive chain length and switching to a different hash or hash seed. The idea I had is instead: detect the excessive chain length and grow the table early, that is, temporarily lower the fill factor.

My proposal depends on the following reasonable assumptions:

  • The hash function is uniformly distributed;
  • The hash seed is unknown outside the HashMap, meaning that while the iteration order can be observed, the actual hash values can't.

I haven't done the math, but my thinking is: in the scenario where you insert a sequence of keys in iteration order, you will have long sequences of elements with no hole where the insertion can stop. If the hash function is uniformly distributed, doubling the table size should open up holes uniformly distributed over that long sequence, allowing insertions to go quickly until the "leftmost" holes are filled up again. When that happens, the algorithm can double the table size again, opening up new holes. The extra growth will probably end when the table is approximately the size it would have if the keys were inserted in a random order.

I believe that, by choosing the correct average insertion length before the table is grown early, it should be possible to limit the insertion time to no more than O(n log n), while keeping the final table size to no more than 2x what it would be without the early growth, and with no effect (other than extra math on insertion) on non-pathological cases. That should be similar to how introsort switches from quicksort to heapsort when it detects a pathological case.
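A sketch of what that early-growth trigger could look like, with the displacement accounting hand-waved (in the real table it would come from the Robin Hood bookkeeping) and the threshold invented:

```rust
// Pseudo-Rust sketch: grow early when insertions start probing too far,
// instead of switching hashers. `displacement` is what the Robin Hood table
// already knows for each insert.
struct EarlyGrowth {
    inserts_since_grow: u64,
    probes_since_grow: u64,
}

impl EarlyGrowth {
    const MAX_AVG_PROBES: u64 = 16; // illustrative threshold

    /// Record one insertion; returns true if the table should double now,
    /// even though the normal load factor has not been reached.
    fn record_insert(&mut self, displacement: u64) -> bool {
        self.inserts_since_grow += 1;
        self.probes_since_grow += displacement;
        let pathological =
            self.probes_since_grow > Self::MAX_AVG_PROBES * self.inserts_since_grow;
        if pathological {
            // Reset the counters so a one-off burst doesn't keep forcing growth.
            self.inserts_since_grow = 0;
            self.probes_since_grow = 0;
        }
        pathological
    }
}
```

Doubling then opens up holes across the long runs, as argued above, and the counters reset after each early growth.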

@cesarb: @pczarn has something very similar here https://github.com/contain-rs/hashmap2/pull/5 but it changes the Hasher instead of early growing as you propose. I think it could work. Edit: there's the RFC as well https://github.com/rust-lang/rfcs/pull/1796

I don't know any 100% safe hashmap, you can degenerate Cuckoo or Hopscotch if you know enough. Every type needs adaptations to work around the worst case.

Still: decrease the fill rate. It's a common memory/CPU tradeoff, and I doubt the current CPU slowdown is worth the current memory gain.

Ok, I'm just a troll, you may ignore me.

@funny-falcon you can always discuss it in this issue #38003

I'm not sure if shifting (using the high bits) or masking (using the low bits) of the hash is used to get the initial probe index, but one option would be to use the last bit from the other end of the hash for the direction the linear probing takes, which wouldn't hurt locality, and would add symmetry to the algorithm. It's the asymmetry here that causes long probe sequences before the load gets high enough to trigger a resize. The downside is that the shift-out logic for removing an item becomes more complicated.

On a side note, has anyone checked the performance impact of making the linear probe sequence wrap around at L1 cache, L2 cache, and memory page boundaries?

@kmag I could be mistaken, but I think Robin-Hood hashing relies on asymmetry.

What if position depends both on hash and capacity?

   position = mix(hash(key), capacity) & (capacity - 1);

Mix could simple multiplicative:

    mix(h, capa) {
      t = ((h^capa)*0xf123456789abcd);
      return t ^ (t >> 32);
    }

Or even better: if we store logarithm of capacity instead of capacity, then we may simply rotate hashsum:

    position = rot_right(hash(key), logcapa) & ((1<<logcapa) - 1)
    // Note: it is important to rotate right here
    // So next capacity has two new high bits (instead of one low bit)

(but with rotation, there is no way to use "bad" hash sums, such as the identity for integers)

Proposed fix in #38368

Can this be closed?

Seems so!

I'm not convinced this should be closed. The problem is a lot less bad than it was, but IIUC we still expose iteration order and this can still cause O(n²) behaviour. We should make sure we're convinced that we're happy with the risks as they stand before closing, and I haven't seen an official statement as such.

Ah, okay. I wasn't aware that O(n²) behavior could still occur. I'll reopen in that case.

I don't think the situation can be improved further without a substantial trade-off (all changes so far had essentially no cost in either memory or runtime), but I'm curious what other people think.

OrderMap hides iteration order and seems to be faster in most scenarios than std's HashMap according to other people. It's not totally without drawbacks, but it seems worth consideration.

Then you get into the substantial trade-off part: it's faster for iteration/inserts/deletes, but you pay with more memory consumption and slower lookups.

The Github page says

Lookup is faster than libstd HashMap for "small" tables (below something like 100 000 key-value pairs), but suffers under load more due to the index vec to entries vec indirection still.

Memory should be less for large elements, too, so it's not clear-cut there either.

That's not true anymore, I think I removed most of the overhead of the std hashmap bucket interface. In the current state of things it's not uncommon to have order-map consuming _(see comment below)_ the memory.

The _unif benchmarks can look up any key from the key set, and the non-_unif ones access only 10% of the key set (Zipf-like).

test ord_::lookup_100_000             ... bench:     356,040 ns/iter (+/- 25,036)
test ord_::lookup_100_000_unif        ... bench:     363,378 ns/iter (+/- 480,153)
test ord_::lookup_1_000_000           ... bench:  13,376,270 ns/iter (+/- 1,429,406)
test ord_::lookup_1_000_000_unif      ... bench:  13,502,113 ns/iter (+/- 1,349,924)
test std_::lookup_100_000             ... bench:     344,022 ns/iter (+/- 20,305)
test std_::lookup_100_000_unif        ... bench:     346,890 ns/iter (+/- 150,954)
test std_::lookup_1_000_000           ... bench:   8,534,857 ns/iter (+/- 592,409)
test std_::lookup_1_000_000_unif      ... bench:   8,731,387 ns/iter (+/- 893,121)

--- edit: math is hard

Ok 2x memory is totally off, I think it's more like
y = ( 8 / (0.75 * 0.75) + (8 + x) / (1.0 * 0.75) ) / ( (8 + x) / (0.91 * 0.75) )

Which is ~11% for a (K, V) of 40 bytes.

While I was at it and for the sake of completeness

std -> ordermap

➜  hashmap2 git:(adapt) ✗ cargo benchcmp std_ ord_ bench.txt
 name                          std_ ns/iter  ord_ ns/iter  diff ns/iter   diff % 
 ::grow_100_000                8,950,163     6,774,414       -2,175,749  -24.31% 
 ::grow_10_000                 731,130       475,714           -255,416  -34.93% 
 ::grow_big_value_10_000       1,644,737     962,251           -682,486  -41.50% 
 ::insert_100_000              4,358,280     4,104,093         -254,187   -5.83% 
 ::insert_10_000               269,920       283,011             13,091    4.85% 
 ::insert_int_bigvalue_10_000  701,788       581,711           -120,077  -17.11% 
 ::insert_str_10_000           274,906       288,666             13,760    5.01% 
 ::insert_string_10_000        746,560       683,109            -63,451   -8.50% 
 ::iterate_100_000             386,084       67,899            -318,185  -82.41% 
 ::lookup_100_000              368,827       376,037              7,210    1.95% 
 ::lookup_100_000_unif         445,130       521,884             76,754   17.24% 
 ::lookup_1_000_000            8,457,146     12,819,619       4,362,473   51.58% 
 ::lookup_1_000_000_unif       8,761,012     13,058,826       4,297,814   49.06% 

86% load factor std lib -> ordmap

➜  hashmap2 git:(adapt) ✗ cargo benchcmp s86_ ord_ bench.txt
 name                          s86_ ns/iter  ord_ ns/iter  diff ns/iter   diff % 
 ::grow_100_000                7,975,130     6,774,414       -1,200,716  -15.06% 
 ::grow_10_000                 597,496       475,714           -121,782  -20.38% 
 ::grow_big_value_10_000       1,453,406     962,251           -491,155  -33.79% 
 ::insert_100_000              4,420,082     4,104,093         -315,989   -7.15% 
 ::insert_10_000               273,364       283,011              9,647    3.53% 
 ::insert_int_bigvalue_10_000  702,656       581,711           -120,945  -17.21% 
 ::insert_str_10_000           279,262       288,666              9,404    3.37% 
 ::insert_string_10_000        735,912       683,109            -52,803   -7.18% 
 ::iterate_100_000             394,462       67,899            -326,563  -82.79% 
 ::lookup_100_000              366,745       376,037              9,292    2.53% 
 ::lookup_100_000_unif         434,933       521,884             86,951   19.99% 
 ::lookup_1_000_000            8,465,497     12,819,619       4,354,122   51.43% 
 ::lookup_1_000_000_unif       8,668,845     13,058,826       4,389,981   50.64% 

stdlib -> 86% load factor std lib

➜  hashmap2 git:(adapt) ✗ cargo benchcmp std_ s86_ bench.txt
 name                          std_ ns/iter  s86_ ns/iter  diff ns/iter   diff % 
 ::grow_100_000                8,950,163     7,975,130         -975,033  -10.89% 
 ::grow_10_000                 731,130       597,496           -133,634  -18.28% 
 ::grow_big_value_10_000       1,644,737     1,453,406         -191,331  -11.63% 
 ::insert_100_000              4,358,280     4,420,082           61,802    1.42% 
 ::insert_10_000               269,920       273,364              3,444    1.28% 
 ::insert_int_bigvalue_10_000  701,788       702,656                868    0.12% 
 ::insert_str_10_000           274,906       279,262              4,356    1.58% 
 ::insert_string_10_000        746,560       735,912            -10,648   -1.43% 
 ::iterate_100_000             386,084       394,462              8,378    2.17% 
 ::lookup_100_000              368,827       366,745             -2,082   -0.56% 
 ::lookup_100_000_unif         445,130       434,933            -10,197   -2.29% 
 ::lookup_1_000_000            8,457,146     8,465,497            8,351    0.10% 
 ::lookup_1_000_000_unif       8,761,012     8,668,845          -92,167   -1.05% 

code: https://github.com/arthurprs/hashmap2/blob/adapt/benches/bench.rs

Can you spell out the memory calculations more explicitly?

I'm not sure I got that 100% right but here it goes:

Assume you're calculating the average case (between resizes, relative load factor is 1 when full and 0.5 after resize). For that multiply the actual factors by Z = (1+0.5)/2 = 0.75.

Ordmap = (size of index) / (indexes_load * Z) + (size of hash + size of pair) / (pairs_load * Z) = 8 / (0.75 * 0.75) + (8 + x) / (1.0 * 0.75)
Std = (size of hash + size of pair) / (load_factor * Z) = (8 + x) / (0.91 * 0.75)
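Plugging x = 40 into those formulas reproduces the ~11% figure quoted above; a quick self-contained check:

```rust
// Reproduces the ~11% overhead quoted above for a 40-byte (K, V).
fn main() {
    let x = 40.0_f64;          // size of (K, V) in bytes
    let z = (1.0 + 0.5) / 2.0; // average relative load between resizes
    let ordmap = 8.0 / (0.75 * z) + (8.0 + x) / (1.0 * z);
    let std = (8.0 + x) / (0.91 * z);
    println!("ordermap / std = {:.3}", ordmap / std); // ≈ 1.11, i.e. ~11% more
}
```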

Thanks

Any progress on this? Do we have a plan for fixing the DoS unsafety?

There's a bit of history in this thread, so I'll try to summarize the current state below.

https://github.com/rust-lang/rust/pull/38368 improved things immensely by detecting degenerate probe lengths and/or forward shifting on insert. There's also a check for minimum occupancy in order to trigger the early resize (so a runtime attack doesn't become a memory attack). This is somewhat similar to Java's solution. They convert the linked-list collision chain into a balanced tree when the length crosses a threshold.

So there's still a small area that an attacker can exploit, by crafting inputs that cause extra work below those limits. That's really small though and I didn't test if one can actually exploit it.

Edit: Also, default hasher gives each hashmap a different seed (clone() retains the seed though), but that's orthogonal to the above.

Here's a general solution to the problem: after every rehash or clone, reseed the hasher. It will regress the performance of rehash, but it will solve the O(n²) problem. Luckily, in production code (as a whole) lookups dominate insertions by a large margin, so the tradeoff is actually sound.

Note: if you end up going this route it might no longer make sense to store the full hashes in the hashtable as they are not going to help when rehashing (they are still going to save comparisons but it is likely that the costs outweigh the benefits). You can instead store the distance from the ideal position in a u8. This will reduce the amount of memory touched until finding the element or terminating the search.

@alkis

it might no longer make sense to store the full hashes ...
they are still going to save comparisons

Storing 8 bits of the hash is enough to save comparisons. Actually, I even stored 7 bits with great results (using the 8th bit for another purpose).
If this "hashik" is always non-zero (i.e. calculated as hashik = hash % 251 + 1), then it can also serve as a "presence" flag.
(Modulo by a constant is optimized by the compiler into two multiplications.)

~That's not general as it requires support from the hasher, most just initialize to the same seed.~ Edit: can be made general by hashing the table address as well https://github.com/rust-lang/rust/issues/36481#issuecomment-338910908

I don't think that's a reasonable tradeoff for a stdlib hasher, growing and cloning the hashmap becomes much much slower. As the slot moving is essentially randomized it'll require a lot of random access and Robin hood adjustments as things go into the new table. The fact that sip isn't the fastest hasher around just makes it even worse.

@funny-falcon if we store 8-bits of hash for comparisons then we can't bail out early from probe chains based on distance from ideal position. It basically voids the Robin Hood. Or perhaps I misunderstood your suggestion.

@arthurprs yes either BuildHasher or Hasher traits need an additional requirement. BuildHasher can support it by requiring Default and Hasher by allowing setting the seed. You are right there is going to be a performance hit - I already mentioned it "it regresses the performance of rehash". I posit there is no win-win solution here. You either take performance hit by shuffling the order on rehash or you have O(n²) on insertion. As already mentioned, in the global scheme of things this does not matter: insertions happen orders of magnitude less often than lookups. Also a lot of the regression can be mitigated by using reserve().

@alkis, I mean storing both 8 bits of hash + 8 bits of distance, so 16 bits in total.
Storing just 16 bits of hash is less meaningful, because in a big table they will all be equal for the same position.
(That is another reason why "hashik" is modulo a prime: for the same table position there will be 251 different "hashik" values.)

@funny-falcon This makes sense :+1:

That's not reasonable, it's too easy to trip on that land-mine.

The most reasonable trade-off I've seen thus far is the two array map (OrderMap), that "fixes" the problem by making the degenerate case cheap to a point it can't really be exploited.

@arthurprs there is a way to "reseed" without requiring support from either BuildHasher or Hasher and placing no extra requirements to the user. Instead of hashing the key like this:

let mut state = hash_state.build_hasher();
t.hash(&mut state);
state.finish()

use the pointer to the buckets as additional bytes (this becomes a naturally occurring seed that changes at all the right places):

let mut state = hash_state.build_hasher();
state.write_usize(table.hash_start as usize);
t.hash(&mut state);
state.finish()

EDIT: The disadvantage to this solution is that if hash(usize) + hash(T) is slower than set_seed(seed) + hash(T) we get a suboptimal implementation. If there was an API in BuildHasher to create a Hasher with a specific seed then this could be done more efficiently across all hashers. This can be done by changing BuildHasher::build_hasher to take an additional argument. I am not sure how feasible this is at this point.

@alkis neat! That makes it general.

Any takers to implement this approach and close this bug once and forall? :-)

One has to provide proof that the performance hit isn't huge. I'd not be surprised if this fix yields a ~500% hit on grow/clone and a 20+% hit on lookup when using sip13.

There is even no need to calculate new hash value each time.
(Sorry, I don't know rust well, so I'll try to copy from above)

// key hash calculated once and stored
let mut state = hash_state.build_hasher();
t.hash(&mut state);
hashvalue = SafeHash::new(state.finish());
// position calculated dependent on storage pointer
// (example for 64bit)
poshash = hashvalue + (table.hash_start as usize);
poshash ^= poshash >> 32;
poshash *= 0xdeadcafe71febeef; // or some other "well known" odd number (from Murmur, CityHash, etc)
poshash ^= poshash >> 32;

It will be enough to fix the bad case, and there is no need to recalculate SipHash.

@funny-falcon I do not think this will be very good. The pointer to hash_start has low entropy and the difference between one hash_start and another can be very small. I think mixing this way will weaken the composite hash function. Let's see how it works without this optimization first and then decide if this tradeoff is worth it. Perhaps some hash function experts can weigh in as well.

@alkis, just try. It will be just enough to fix bad case. There is no need to be better.

I think mixing this way will weaken the composite hashfunction

No way. hashvalue is calculated with SipHash, and poshash is calculated in a bijective way (i.e. it is invertible).

Multiplying by a "random looking" constant and shifting provides really good results. This is a well-known technique (Knuth multiplicative hashing).

Simulation (in C):

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

uint64_t poshash(uint64_t hv, uint64_t hs) {
    uint64_t ph;
    ph = hv + hs;
    ph ^= ph >> 32;
    ph *= 0xdeadcafe71febeeful;
    ph ^= ph >> 32;
    return ph;
}

int main(int ac, char** av) {
    int i, j;
    uint64_t hv1, hv2, hv3, hashstart;
    uint64_t ph1, ph2, ph3;
    uint64_t masks[3] = {0xffff, 0x7fff, 0x3fff};

    hv1 = 0x123456789abcdef0ul;
    hv2 = 0x98523aed81fcdef0ul;
    hv3 = 0x3fa83924dcbcdef0ul;
    hashstart = 0x00007fe342e50000ul;
    printf("\thv1\thv2\thv3\n");
    for (j=0; j<3; j++) {
        uint64_t mask = masks[j];
        printf("%lx\t%lx\t%lx\t%lx\n",
            mask, hv1 & mask, hv2 & mask, hv3 & mask);
    }
    for (i=0; i<5; i++) {
        uint64_t ph1, ph2, ph3;
        ph1 = poshash(hv1, hashstart);
        ph2 = poshash(hv2, hashstart);
        ph3 = poshash(hv3, hashstart);
        printf("\t----\t----\t----\n");
        for (j=0; j<3; j++) {
            uint64_t mask = masks[j];
            printf("%lx\t%04lx\t%04lx\t%04lx\n",
                mask, ph1 & mask, ph2 & mask, ph3 & mask);
        }
        hashstart += 0x100ul; /* next allocation */
    }
}

results:

mask | hv1 | hv2 | hv3
-----|-------|-------|------
ffff | def0 | def0 | def0
7fff | 5ef0 | 5ef0 | 5ef0
3fff | 1ef0 | 1ef0 | 1ef0
-----|-------|-------|------
ffff | 8b86 | f7a3 | ae2e
7fff | 0b86 | 77a3 | 2e2e
3fff | 0b86 | 37a3 | 2e2e
-----|-------|-------|------
ffff | fd87 | 41a6 | 202d
7fff | 7d87 | 41a6 | 202d
3fff | 3d87 | 01a6 | 202d
-----|-------|-------|------
ffff | 6f85 | d3a5 | 522c
7fff | 6f85 | 53a5 | 522c
3fff | 2f85 | 13a5 | 122c
-----|-------|-------|------
ffff | 518c | 1ddf | c42a
7fff | 518c | 1ddf | 442a
3fff | 118c | 1ddf | 042a
-----|-------|-------|------
ffff | c38d | afde | 7629
7fff | 438d | 2fde | 7629
3fff | 038d | 2fde | 3629

https://en.wikipedia.org/wiki/Universal_hashing#Constructions

It is possible to use a more sophisticated "multiply-add-shift":

ph = hashvalue + (table.hash_start as usize);
ph ^= ph >> 32;
ph *= 0xacd5ad43274593b9;
ph += 0x6956ab76ed268a3d;
ph ^= ph >> 32;

But really there is no need for this use case; just "multiply-shift" is more than enough.

Now that https://github.com/rust-lang/rust/pull/58623 is merged, the HashMap implementation uses quadratic probing instead of linear or Robin-Hood. Is this bug still open?

