Crystal: Hashes mishandle NaN keys

Created on 21 Jan 2017 · 17Comments · Source: crystal-lang/crystal

Hashes (and Sets) fail to find NaN when performing lookup.

h = {Float64::NAN => "value"}  # {nan.0 => "value"}
# h[Float64::NAN]              # Missing hash key: nan.0
h.delete(Float64::NAN)         # nil
h                              # {nan.0 => "value"}

Presumably, the lookup code is confused by NaN == NaN being false.

discussion

Source

ezrast

Most helpful comment

I'll add that it's easy for me to think of programs where I'd compute a key for a hash, and that key would turn out to be NaN for valid reasons (which is to say "not due to bugs").

But if that's the case, I'd almost certainly want to handle those NaNs via some special processing. Maybe I'd map every one of those NaN-keys to a single constant-value key, so all of those cases would map to a single key-value pair. Or maybe I'd do something to store why the key ended up as NaN. But I wouldn't want to store each key-value pair as a separate entry in the hash, where all those entries would be unreachable via standard means.

drosehn on 13 Jun 2017

👍3

All 17 comments

This by itself is not really a bug (Float64::NAN != Float64::NAN, http://stackoverflow.com/questions/1565164/what-is-the-rationale-for-all-comparisons-returning-false-for-ieee754-nan-values).

The real bug is that h.delete_if &.nan? will not function properly, because delete_if uses delete with the key, which obviously doesn't work. h.select &.nan? works properly.

Basically, either delete_if on Hash should be modified to not use delete, or it should have a special NaN case, or Crystal should not follow the IEEE 754 standard, which is the least ideal option.

zatherz on 21 Jan 2017

You don't need to break the standard, you just need some means of comparing Set/Hash keys for identity that is distinct from the standards-conformant equality operator. I think it's unreasonable not to consider Set{Float64::NAN}.includes?(Float64::NAN) == false as a bug.

Ruby Hashes, for example, determine identity through the #eql? method (see Hash Keys and Object#eql?). That's clearly not the whole story since Float::NAN.eql? Float::NAN is false over there (I think there's another identity check that happens at the C level before falling back to #eql?) but it shows you what I mean.

ezrast on 22 Jan 2017

== is implemented as an overloadable operator in both Ruby and Crystal.

class Test
  def ==(other : Test)
    true
  end
en

Test.new == Test.new # => true
Test.new != Test.new # => false

class Test2
end

Test2.new == Test2.new # => false
Test2.new != Test2.new # => true

The standard requires NaN to not equal itself.

This is the implementation of Ruby's Hash#delete:

VALUE
rb_hash_delete_entry(VALUE hash, VALUE key)
{
    st_data_t ktmp = (st_data_t)key, val;

    if (!RHASH(hash)->ntbl) {
  return Qundef;
    }
    else if (RHASH_ITER_LEV(hash) > 0 &&
       (st_delete_safe(RHASH(hash)->ntbl, &ktmp, &val, (st_data_t)Qundef))) {
  FL_SET(hash, HASH_DELETED);
  return (VALUE)val;
    }
    else if (st_delete(RHASH(hash)->ntbl, &ktmp, &val)) {
  return (VALUE)val;
    }
    else {
  return Qundef;
    }
}
// some comments omitted here
VALUE
rb_hash_delete(VALUE hash, VALUE key)
{
    VALUE deleted_value = rb_hash_delete_entry(hash, key);

    if (deleted_value != Qundef) { /* likely pass */
  return deleted_value;
    }
    else {
  return Qnil;
    }
}

st_delete_safe calls st_general_delete which calls find_entry which uses PTR_EQUAL which uses EQUAL which looks like this:

#define EQUAL(tab,x,y) ((x) == (y) || (*(tab)->type->compare)((x),(y)) == 0)

This uses some sort of special function called compare which I assume is what causes NaN to equal NaN but only in that specific situation.

There is also a deeper problem: the bucket_index method of Hash which is used by delete overflows with the hash of Float64::NAN (9221120237041090560). It works if you convert the to_u32 and to_i calls to to_u64 and to_i64, for which I'm gonna submit a pull request tomorrow.

From that point it'll take just adding a special case for NaN or using some special method with that special case.

zatherz on 22 Jan 2017

The x and y passed into that EQUAL macro are object ID's, not the actual contents of any variable. Since individual numeric values always have the same object ID throughout the lifetime of a Ruby program, the (x) == (y) part will always be true when x and y refer to the same object or the same number, even if that "number" is NAN. Any equality or hashing methods defined in Ruby space (which eventually get called by that compare function, I believe) are irrelevant at that point. If NAN were a special case you couldn't get away with this: [edit: this is misleading; see MaxLap's comment below]

class Evil
  def ==(other); false; end
  def eql?(other); false; end
  def equal?(other); false; end
  def hash; rand; end
end

e = Evil.new
h = {}
h[e] = "one"
h[e] = "two"  # this looks like it should create a second entry in the hash table
# but ruby handles it properly
puts h  # prints {#<Evil:0x00561482427f38>=>"two"}
h.delete e
puts h  # prints {}

I have no idea if any of this is relevant to how things could be implemented in Crystal, but there it is.

Sorry if my issue was unclear initially. I'm aware of IEEE 754 and didn't mean to suggest that something was wrong with your float implementation. I just wanted to point out some surprising behavior I noticed while teaching myself Crystal.

ezrast on 22 Jan 2017

You're right. That makes a lot of sense.

class Evil
  def ==(other); false; end
  def eql?(other); false; end
  def equal?(other); false; end
  def hash; rand; end
end

e = Evil.new
h = {} of Evil => String
h[e] = "one"
h[e] = "two"
puts h
h.delete e
puts h

Output:

{#<Evil:0xac8fc0> => "one", #<Evil:0xac8fc0> => "two"}
{#<Evil:0xac8fc0> => "one", #<Evil:0xac8fc0> => "two"}

zatherz on 22 Jan 2017

Ruby's handling of your Evil is because of 2 things:

rand returns a 0 to 0.999... float, which is turned to an int, so 0. Meaning it always land in the same bucket. If you add a *1000 to your rand, you will get 2 entries in the hash (most of the time).
Then, ruby does a pointer comparison between the objects before even calling eql? (ruby doesn't call ==)

I'll just add that because 0 and 0.0 have a hash collision, there is the following problem:

puts({ 0 => 1, 0.0 => 1000, 1.0 => 10, 1 => 100 }) #=> {0 => 1000, 1.0 => 10, 1 => 100}

Notice that the 0.0 key is missing, and it's value was assigned to the 0 key. This is the same problem as the NAN not matching, caused by using the == operator. In this case, it's because 0 == 0.0, which is not something that you want in a hash key context!

There needs to be a special method dedicated to hash equality. In ruby, this is the terribly named #eql? method. Please use a more explicit name like hash_same_key? or hash_equal_key?

MaxLap on 22 Jan 2017

👍1

I'm not quite sure I understand the issue here, but my initial reaction is that crystal is handing NaN the way it should be handled, even when it used as a hash keys. The language has to treat every instance of NaN as if it is a different value than all other NaNs, unless it were to keep track of how each NaN was generated. (and that could get pretty painful...)

drosehn on 23 Jan 2017

My original issue is that it is possible to add a key to a hash table that cannot be retrieved through normal means. That seems antithetical to the purposes of Hashes and Sets. That Set{x}.includes?(x) can ever be false strikes me as a least-surprise violation and I don't see the use case for that behavior.

I do see from the discussion here and in chat that I held some incorrect presumptions about how NaN behaves in general and in Ruby, so if this is deemed a non-issue I won't object further. zatherz's and MaxLap's criticisms of the current implementation are still valid, of course.

There's some interesting reading on the issue here, which does support the current paradigm (tl;dr Go buys fully into the practice of making NaN keys irretrievable and hashes them randomly).

ezrast on 24 Jan 2017

👍1

If we accept that NaN should never be equal to itself, then shouldn't the correct behaviour for NaN's hashcode be to raise? There's no way to create a valid hashcode if it is invalid to compare NaN values, so why not raise? There as already some precedent for something like this in that NaN and Infinity raise when trying to serialize to JSON.

RX14 on 24 Jan 2017

👍3

Btw, I meant to add that I agree with @zatherz on this:

The real bug is that h.delete_if &.nan? will not function properly, because delete_if uses delete with the key, which obviously doesn't work. h.select &.nan? works properly.

And that given these three options:

Basically, either (1) delete_if on Hash should be modified to not use delete, or (2) it should have a special NaN case, or (3) Crystal should not follow the IEEE 754 standard, which is the least ideal option.

I would not want to see #3. I was also going to suggest that it might be good to document this issue with NaN keys in Hash#delete_if. However, I do like the idea of raise-ing an error if NaN is used as a key in a Hash. That will make the issue explicitly clear to the programmer at compile time, and then the programmer can decide exactly how it should be handled in their specific situation.

drosehn on 24 Jan 2017

I like the option of raising on NaN hashcode, but isn't there an assumption that hash functions should not raise? I think I'd prefer to go go's way (no pun intended) and have a random hash for NaN, and "fix" delete_if.

spalladino on 11 Feb 2017

Looks like we need to replace == comparisons in insert/find etc. with another one (that's == for most cases except NaN and INF).

akzhan on 13 Jun 2017

To make delete_if work properly, there needs to be a way to delete an entry given just the entry. The only sane way to do that would be to add a previous reference in Entry to the last entry in the bucket's linked list. This seems to me like a sign we should implement this along with open addressing which would make deletion much easier (only have to manage the insertion-order linked list).

I still maintain that NaN#hashcode should raise. I don't think there's an expectation that anything doesn't raise.

RX14 on 13 Jun 2017

NAN and INF are good candidates for keys sometimes. So I prefer to NAN.hash not raised.

akzhan on 13 Jun 2017

It seems to me that it would be very rare that NaN would be useful as the value for a key. Let's say a program is processing data where multiple key-values are NaN. Each one of those will go into a hash as a separate key, since NaN != NaN. And when you try to get the value of somehash[ NaN ], that will never succeed, no matter how many entries there are in the hash where the key is NaN.

It's one thing if you know a key might have the value of NaN, but I'm also thinking about the case where the value of a key is computed, and due to mistakes or oversights the program is creating NaN keys when the programmer is completely unaware that that is possible. And thus the program will run into odd behaviors due to the way that NaN-valued keys should be handled.

I guess I'm thinking about this from my own history. While I can imagine there might be some situation where it is useful to have NaN as a key that behaves as a NaN-value should behave, I have never wanted that. And if I wrote a program here NaN-values were being generated for keys to a hash, then I'd want the language to force me to make it explicit that NaN values were expected.

drosehn on 13 Jun 2017

I'll add that it's easy for me to think of programs where I'd compute a key for a hash, and that key would turn out to be NaN for valid reasons (which is to say "not due to bugs").

drosehn on 13 Jun 2017

👍3

FWIW, Rust solves this by having constraints on the key type that float types don't fulfill. There, HashMap has a K: Hash + Ord constraint (where K is the key type), and f32 / f64 implement neither. You can however still compare floats, because there is the PartialOrd trait, which is implemented for f32 and f64 and which comparison operators are based on.