Js.log(Hashtbl.hash(1.1));
Js.log(Hashtbl.hash(1.2));
Js.log(Hashtbl.hash(1.3));
Js.log(Hashtbl.hash(2.1));
Js.log(Hashtbl.hash(2.2));
Js.log(Hashtbl.hash(2.3));
Js.log(Hashtbl.hash(3.1));
Js.log(Hashtbl.hash(3.2));
Js.log(Hashtbl.hash(3.3));
output -
-190020389
-190020389
648017920
648017920
648017920
-1994976299
-1994976299
-1994976299
(works fine in utop/rtop)
a little spelunking showed that caml_hash purposely removes the fractional part https://github.com/BuckleScript/bucklescript/blob/8ab4045d6be037d20d0319444ab10e971f80b8b5/lib/js/caml_hash.js#L52
this is actually intentional, in very rare cases, you would make float as your key, since float equivalence is hard, using float as key makes your code fragile, you probably want to provide a customized equal and hash function
Check this out in rtop:
Hashtbl.hash(3. *. 1.1); /* 121349504 */
Hashtbl.hash(3.3); /* 680861864 */
The Hashtbl.hash documentation says:
It is guaranteed that if
x = yorPervasives.compare x y = 0, thenhash x = hash y.
Interestingly, BuckleScript is _not_ breaking this guarantee. So, in BuckleScript, hash(1.1) == hash(1.1).
The documentation does not guarantee, however, that other input values may not generate the same hash value. (And it obviously can't, because hash collisions are part and parcel of hashing.) It's just that BuckleScript is deliberately colliding hashes of floats with the same whole number portion.
The doc does mention something else that's interesting:
A hash table that is created with
~random:falseuses a fixed hash function (Hashtbl.hash) to distribute keys among buckets. As a consequence, collisions between keys happen deterministically. In Web-facing applications or other security-sensitive applications, the deterministic collision patterns can be exploited by a malicious user to create a denial-of-service attack: the attacker sends input crafted to create many collisions in the table, slowing the application down.
(It goes on to explain that setting random to true is more secure against this kind of attack.)
Moral of the story, don't use Hashtbl.hash for user-facing stuff.
it's not a bug, it is just a low quality (depends on your view) hash function
I see and understand the reasoning. It did catch me by surprise tho. Should this possibly be documented, and/or could there be a compile/runtime warning, if only in dev?
closed as not too much can be done on our side
Most helpful comment
I see and understand the reasoning. It did catch me by surprise tho. Should this possibly be documented, and/or could there be a compile/runtime warning, if only in dev?