In Ruby there's Hash#compare_by_identity. What it basically does is to stop comparing things with eql? and use equal?.
In Crystal it would stop using == and compare object_id of Reference types (and I guess we could try to give a compile-time error when calling this method and the key is not a Reference type).
In the compiler many times I want to have a Hash with a key of ASTNode but I don't want to compare nodes by their content, I want to compare by their object_id. In that case I create a Hash of UInt64 (the type of object_id) and then I have to remember to put and ask always for obj.object_id which is a bit annoying and also bug-prone.
I also recently fixed a bug in Ruby's irb where I needed this: https://github.com/ruby/ruby/pull/2555 . I didn't know compare_by_identity existed until nobu suggested it in the issue, and that's where I learned about it.
Now, another alternative is to let Hash hold a general comparer so that one could provide any kind of key comparison. But I think we could still eventually do that, and compare_by_identity would just set a comparer that compares by identity. But maybe we don't need custom comparers except this one (at least in the code I wrote so far, I never needed a custom comparer by I did needed to compare by object_id).
This is very easy to implement in Hash.
Thoughts?
We'd also add Set#compare_by_identity (Ruby has it too).
Some places where we could use this:
(and others...)
Maybe better create new class? ReferenceHash?
Maybe better create new class? ReferenceHash?
It's a valid idea, but it means replicating all the methods from Hash. If we want to avoid that we have to create a base type or module (like we have Indexable) and include that in both. Then code should have this BaseHash as a restriction for maximum generic code.
It's possible, but I think it's more complex than just adding a single method that changes the way Hash works internally.
Eventually we might want a generic interface for maps as well. There are other valid implementations besides a hash table. But I agree, having an additional type for this would be quite complex.
But would it already help to quickly compare object ids before comparing values? I mean this could be a general improvement for other cases as well.
But would it already help to quickly compare object ids before comparing values?
The thing is, from a given key we compute its hash value. Then we can go and look at the hash entries for that hash key. There's no way we can do that for the key and for its object_id at the same time.
The compare_by_identity would change Hash to do key.object_id.hash instead of key.hash and then the algorithm is the same.
I'm fine with this but it should be a constructor argument.
I agree with @RX14
Yes, I was thinking the same thing after I proposed this 馃憤
I'm fine with this but it should be a constructor argument
On second thought, Hash has like three constructors which means we have to add this named argument to all three of them.
I think it's much simpler to have it as a separate method:
comparer= method to set a different comparer, but doing that as a named argument too would bloat the signatures too much, and it would be conflicting with compare_by_identitySo I'll soon go ahead and add it as a method.
Note that in Ruby it's also fine to call this when the Hash already has values, in which case the Hash is simply rehashed. I think this is also great because you can do {a => b, c => d}.compare_by_identity but with a named argument in the constructor you won't be able to use the hash syntax.
Modifying the behaviour after initialization introduces an (potentially unexpected) side effect. I'm not sure how much we should try to avoid that, but it should be considered.
This should definitely have still been a constructor argument :C
This should definitely have still been a constructor argument :C
I agree with that sentiment, but that removes the possibility of using a hash literal with compare_by_identity.
@asterite I can't imagine a usecase? It's not like compare_by_identity is commonly used.
Also, when a hash from a literal is filled with values before compare_by_identity is set, it's less efficient because it needs to be re-hashed. And it even leads to unexpected results when the literal contains keys that are equal but not identical, because initial insert will be based on equality. For example {"foo" => 1, "f" + "oo" => 2}.compare_by_identity results in {"foo" => 2}, which is certainly not intended. So using hash literals with compare_by_identity is broken anyway.