Rescript-compiler: Hashtbl.hash ignores fractional parts of numbers

Created on 15 Jan 2018  路  6Comments  路  Source: rescript-lang/rescript-compiler

Js.log(Hashtbl.hash(1.1));
Js.log(Hashtbl.hash(1.2));
Js.log(Hashtbl.hash(1.3));
Js.log(Hashtbl.hash(2.1));
Js.log(Hashtbl.hash(2.2));
Js.log(Hashtbl.hash(2.3));
Js.log(Hashtbl.hash(3.1));
Js.log(Hashtbl.hash(3.2));
Js.log(Hashtbl.hash(3.3));

output -

-190020389
-190020389
648017920
648017920
648017920
-1994976299
-1994976299
-1994976299

(works fine in utop/rtop)

a little spelunking showed that caml_hash purposely removes the fractional part https://github.com/BuckleScript/bucklescript/blob/8ab4045d6be037d20d0319444ab10e971f80b8b5/lib/js/caml_hash.js#L52

to reproduce - https://reasonml.github.io/en/try.html?reason=FIZwdANg9g5gFACQIYgBYBcBGEypauARjEIEpSBuAKFElkXyxzzSLACZzrbp5k0mufGwDMXGuF4MB2Ia3YlxPevwyyWBBZ0oS6fRuuEKxO5fpnNhIxackqDl1te3c75tY4LWT1IA

discussion

Most helpful comment

I see and understand the reasoning. It did catch me by surprise tho. Should this possibly be documented, and/or could there be a compile/runtime warning, if only in dev?

All 6 comments

this is actually intentional, in very rare cases, you would make float as your key, since float equivalence is hard, using float as key makes your code fragile, you probably want to provide a customized equal and hash function

Check this out in rtop:

Hashtbl.hash(3. *. 1.1); /* 121349504 */
Hashtbl.hash(3.3); /* 680861864 */

The Hashtbl.hash documentation says:

It is guaranteed that if x = y or Pervasives.compare x y = 0, then hash x = hash y.

Interestingly, BuckleScript is _not_ breaking this guarantee. So, in BuckleScript, hash(1.1) == hash(1.1).

The documentation does not guarantee, however, that other input values may not generate the same hash value. (And it obviously can't, because hash collisions are part and parcel of hashing.) It's just that BuckleScript is deliberately colliding hashes of floats with the same whole number portion.

The doc does mention something else that's interesting:

A hash table that is created with ~random:false uses a fixed hash function (Hashtbl.hash) to distribute keys among buckets. As a consequence, collisions between keys happen deterministically. In Web-facing applications or other security-sensitive applications, the deterministic collision patterns can be exploited by a malicious user to create a denial-of-service attack: the attacker sends input crafted to create many collisions in the table, slowing the application down.

(It goes on to explain that setting random to true is more secure against this kind of attack.)

Moral of the story, don't use Hashtbl.hash for user-facing stuff.

it's not a bug, it is just a low quality (depends on your view) hash function

I see and understand the reasoning. It did catch me by surprise tho. Should this possibly be documented, and/or could there be a compile/runtime warning, if only in dev?

closed as not too much can be done on our side

Was this page helpful?
0 / 5 - 0 ratings

Related issues

TheSpyder picture TheSpyder  路  5Comments

chenglou picture chenglou  路  4Comments

glennsl picture glennsl  路  3Comments

paparga picture paparga  路  5Comments

alexfedoseev picture alexfedoseev  路  5Comments