Crystal: [RFC] Add `Hash#compare_by_identity`

Created on 15 Oct 2019 · 14Comments · Source: crystal-lang/crystal

In Ruby there's Hash#compare_by_identity. What it basically does is to stop comparing things with eql? and use equal?.

In Crystal it would stop using == and compare object_id of Reference types (and I guess we could try to give a compile-time error when calling this method and the key is not a Reference type).

In the compiler many times I want to have a Hash with a key of ASTNode but I don't want to compare nodes by their content, I want to compare by their object_id. In that case I create a Hash of UInt64 (the type of object_id) and then I have to remember to put and ask always for obj.object_id which is a bit annoying and also bug-prone.

I also recently fixed a bug in Ruby's irb where I needed this: https://github.com/ruby/ruby/pull/2555 . I didn't know compare_by_identity existed until nobu suggested it in the issue, and that's where I learned about it.

Now, another alternative is to let Hash hold a general comparer so that one could provide any kind of key comparison. But I think we could still eventually do that, and compare_by_identity would just set a comparer that compares by identity. But maybe we don't need custom comparers except this one (at least in the code I wrote so far, I never needed a custom comparer by I did needed to compare by object_id).

This is very easy to implement in Hash.

Thoughts?

discussion topiccollection

Source

asterite

👀1 👍1

All 14 comments

We'd also add Set#compare_by_identity (Ruby has it too).

Some places where we could use this:

(and others...)

asterite on 15 Oct 2019

Maybe better create new class? ReferenceHash?

kostya on 15 Oct 2019

Maybe better create new class? ReferenceHash?

It's a valid idea, but it means replicating all the methods from Hash. If we want to avoid that we have to create a base type or module (like we have Indexable) and include that in both. Then code should have this BaseHash as a restriction for maximum generic code.

It's possible, but I think it's more complex than just adding a single method that changes the way Hash works internally.

asterite on 15 Oct 2019

👍1

Eventually we might want a generic interface for maps as well. There are other valid implementations besides a hash table. But I agree, having an additional type for this would be quite complex.

But would it already help to quickly compare object ids before comparing values? I mean this could be a general improvement for other cases as well.

straight-shoota on 15 Oct 2019

But would it already help to quickly compare object ids before comparing values?

The thing is, from a given key we compute its hash value. Then we can go and look at the hash entries for that hash key. There's no way we can do that for the key and for its object_id at the same time.

The compare_by_identity would change Hash to do key.object_id.hash instead of key.hash and then the algorithm is the same.

asterite on 15 Oct 2019

👍1

I'm fine with this but it should be a constructor argument.

RX14 on 29 Oct 2019

👍1

I agree with @RX14

bcardiff on 29 Oct 2019

Yes, I was thinking the same thing after I proposed this 👍

asterite on 29 Oct 2019

I'm fine with this but it should be a constructor argument

On second thought, Hash has like three constructors which means we have to add this named argument to all three of them.

I think it's much simpler to have it as a separate method:

it's unobtrusive (no need to modify existing API)
it can be documented in a single place
eventually we can add a comparer= method to set a different comparer, but doing that as a named argument too would bloat the signatures too much, and it would be conflicting with compare_by_identity

So I'll soon go ahead and add it as a method.

Note that in Ruby it's also fine to call this when the Hash already has values, in which case the Hash is simply rehashed. I think this is also great because you can do {a => b, c => d}.compare_by_identity but with a named argument in the constructor you won't be able to use the hash syntax.

asterite on 5 Nov 2019

Modifying the behaviour after initialization introduces an (potentially unexpected) side effect. I'm not sure how much we should try to avoid that, but it should be considered.

straight-shoota on 5 Nov 2019

This should definitely have still been a constructor argument :C

RX14 on 13 Nov 2019

This should definitely have still been a constructor argument :C

I agree with that sentiment, but that removes the possibility of using a hash literal with compare_by_identity.

asterite on 13 Nov 2019

@asterite I can't imagine a usecase? It's not like compare_by_identity is commonly used.

RX14 on 13 Nov 2019

Also, when a hash from a literal is filled with values before compare_by_identity is set, it's less efficient because it needs to be re-hashed. And it even leads to unexpected results when the literal contains keys that are equal but not identical, because initial insert will be based on equality. For example {"foo" => 1, "f" + "oo" => 2}.compare_by_identity results in {"foo" => 2}, which is certainly not intended. So using hash literals with compare_by_identity is broken anyway.

straight-shoota on 11 Dec 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Compiler crash: "Module validation failed: Terminator found in the middle of a basic block!"

will · 3Comments

Float JSON representation is wrong

RX14 · 3Comments

Proposal: Rename File::Stat to FileInfo

lbguilherme · 3Comments

Gracefully stopping fibers

Papierkorb · 3Comments

Lazy getter macro

asterite · 3Comments