Julia: == for immutables should recursively call == on its fields

Created on 26 Oct 2013  Â·  56Comments  Â·  Source: JuliaLang/julia

This doesn't make much sense:

julia> immutable Foo{T}
         bar::T
       end

julia> Foo("baz") == Foo("baz")
false

julia> Foo("baz").bar == Foo("baz").bar
true

julia> Foo(1) == Foo(1)
true

If the fields are == to each other then the objects should be == to each other.

breaking decision help wanted

Most helpful comment

If we don't call == recursively, I think it should be a method error; then it's at least obvious that you need to define it instead of getting surprising behavior (sometimes much later than needed!) and having to go back and define == because === doesn't give you what you expect.

All 56 comments

I would also argue that for mutable types maybe we should throw an error in the absence of specialized ==.

Tuples already do the right thing:

julia> ("baz",) == ("baz",)
true

Arrays also behave this way

julia> {"abc",5} == {"abc",5}
true

julia> {"abc",5} == {"abc",4}
false

I am not sure if I think it is a good thing to introduce more differences between mutable and immutable types, but a default implementation of error for == on user defined types is something I like. How would you define == for different user defined types?

I can agree that the current default == is not clearly better than the one proposed here. Tuples are an easy case since we know they just "pass through" to their values. Throwing an error is certainly safe; in my view == depends on the abstract meaning of its arguments, so without a definition we really don't know what it means.

If we throw an error if the comparison function is undefined, the code for comparing arbitrary objects will be longer.

try
    return a == b
catch
    return false
end

or

applicable(==, a, b) && a == b

@StefanKarpinski, did this get implemented as part of the hashing changes? I seem to recall not needing == definitions for some immutable types after the changes.

Immutables already call === recursively on fields, so that might have been it. This hasn't been implemented.

But doesn't this definition mean it is?

==(x,y) = x === y

That calls ===, which then calls === recursively. The issue asks for == to call == recursively.

Ah, got it!

https://stackoverflow.com/questions/44324800/how-to-use-haskey-and-in-functions-with-complex-dictionnary-keys-in-julia/44333559#44333559

I still think we should do this. All three equality predicates should probably call themselves recursively.

Should == require equality of the type parameters? Eg is Foo(1) == Foo(1.0) in the example above?

Maybe a simple solution to this issue is to make "hello" === "hello" return true. This would be a patch on the function ===, making it treat string specially.

Making "hello" === "hello" is certainly a life goal, but it's not so simple, unfortunately (#22193).

I'm not a big fan of this change since I think we shouldn't try to guess what == means for an arbitrary type. For example it's not clear what to do with type parameters, and it's not clear whether == should also recur into mutable values. (It's also not easy to implement efficiently; currently we'd need a generated function.)

This seems to be close to a duplicate of https://github.com/JuliaLang/julia/issues/12198#issuecomment-122426956?

That's already fixed on master, but it's certainly a larger issue than string (e.g. BigInts).

That's done --- we now have "abc" === "abc". But surely this is a more general issue, not specific to strings.

Fixing only the examples in this thread is not at all what this issue is about.

If we don't call == recursively, I think it should be a method error; then it's at least obvious that you need to define it instead of getting surprising behavior (sometimes much later than needed!) and having to go back and define == because === doesn't give you what you expect.

Yes I think making it a method error is worth exploring. That was basically the conclusion of #12198 for the hash function.

Most promising approach from triage: only provide a default == fallback if the type satisfies the following recursively defined predicate, P:

  • P is true of strings, symbols, and bits types
  • P is true of tuples of values such that P is true of all its values
  • P is true of immutable types such that P is true of all its fields
  • P is false otherwise

This essentially amounts to only automatically providing == when no one would be surprised by its behavior. Otherwise you'll get a no method error telling you that you need to define it.

Yes that sounds good to me.

Erm, perhaps the above should also have the condition for the bits types and immutables that no custom == definition is provided.

Well, that sort of breaks the plan. If we're going to do that we might as well just call == recursively.

Well that was what I wanted in the first place, so, ok :D

Here a prototype on how a trait-based system could look like (works on 0.6 and 0.7): https://gist.github.com/mauro3/6774cc7ced71b854d4ccb5a9eb513eab

The idea is that two types having an equality relation defined would just do as they do now.
However, if there is no specific equality method then belonging to the trait EqualWhenEqualFields would compare their fields to test for equality, whereas on belonging to EqualWhenEqual would not (with fall-back to ===). The comparison of the fields would then again be according to the same rules. Immutables could be in EqualWhenEqualFields by default, mutables not. So "somewhat" similar to Stefan's rule suggestion.

Note, in the gist, two unequal types with the same fields can compare equal, which is probably not what we want (but easy to change).

I haven't figured out yet how this meshes with hashing.

Thoughts? I could try to race this to the finish line.

I updated the example gist(rev. 2) to include hashing. In a nutshell this would behave like:

struct A
    a
    b
end
# As A is immutable and thus automatically compares by
# comparing == of fields:
A(1,[2])==A(1,[2]) # -> true (currently false)
# because it is by default part of the trait:
EqualityKind(A(1,[2])) == CompareFields() # -> true

# Type A can opt-out of field-wise comparison
EqualityKind(::A) = ManuallyDefinedComparison()
A(1,[2]) == A(1,[2]) # -> false , as now uses the fall-back of == which is ===


mutable struct AM
    a
    b
end
# As AM is mutable and thus falls back to === comparison:
AM(1,[2])==AM(1,[2]) # -> false (currently also false)
# because it is by default part of the trait:
EqualityKind(A(1,[2])) == ManuallyDefinedComparison() # -> true

# AM can opt-in on field-wise comparison
EqualityKind(::AM) = CompareFields()
AM(1,[2])==AM(1,[2]) # -> true

# Hashing follows suit and is defined field-wise if
# type is part of CompareFields:
hash(AM(1,[2])) == hash(AM(1,[2]))  # -> true (currently false)
# If the type is part of ManuallyDefinedComparison then an error is thrown
# (following jb's lead to remove the fall-back hash method)
hash(A(1,[2])) # -> error (currently this give an object_id based hash)

# One can still shoot oneself in the foot like so
struct Bad
    a
end
==(::Bad, ::Bad) = true
# this breaks the hashing as that is still done field-wise as
EqualityKind(Bad(1)) == CompareFields()
# and thus hashing is done field-wise.

This code works with the gist (rev. 2) by replacing == -> eq and hash => ha.

I think this could strike a good middle ground between convenience and safeness, as most == are either working as expected or can be made to do so with a EqualityKind(::AM) = CompareFields() and hashing works then too. The removal of the fall back (#24354) would thus be less of an inconvenience. However, it is still not 100% save with respect to hashing (see above), but I suspect that will be hard to make safe without interfaces.

Alternatives to minimize foot-shooting further at the expense of convenience could be:

  • to make the field-wise comparison and hashing opt-in for all types and have no fall-back hashing for any type.
  • de-couple comparison and hashing trait and make the hashing error by error. Although, that has potential to allow different types of foot-shooting.

Edit: this should probably be thought through with respect to missing, x-ref #24653.

@mauro3, this is really interesting and could be a nice default for comparisons. Could this code live (at least initially) in a package? Maybe with some other operator name, so as not to pirate types.

The gist uses other operator names, so building a package Equality on this could be possible. But I'm not sure what the value of that would be: You'd have to tell people using a package which makes use of Equality to not use == to compare instances of types of that package but eq. No, this would need to be in Base and it is breaking, so either 1.0 or 2.0.

I spent a few brain-cycles over the last few days pondering this issue and will try to summarize here.

First, the war for equality and hashing is fought on many fronts, this is one of them. Other currently active fronts:

  • #9381 "Should in use isequal" (and PR #24563)
  • #12198 "Easy to break hashing" (and its PR #24354 removing the fall-back hash)

Second, here I don't look into the difference between == and isequal and always write isequal (as that is tied to hashing). But I think defaults should apply equally to == as the two should be as similar as possible.

A few observations about current equality handling in Julia:

  • There is a fallback to === for Any, thus any types can be compared.
  • When custom isequal methods are defined, then similar types generally compare their values and return equal even when their types are different. Examples: isequal(SparseVector([1,2]), [1,2]) is true, isequal((1,2), [1,2]) is false as (presumably) Tuples are different from Vectors. A notable exception is isequal(1:2, [1,2]) is false (#13565, as otherwise hashing would need to be inefficient for ranges)
  • there is a tension with hashing as isequal and hash need to be in-sync.

I can see three "reasonable" defaults for isequal and ==:

  1. no default (i.e. throw an error)
  2. === (current default)
  3. field-by-field comparison. This can be subdivided into
    a) only compare "equal" types field by field (what "equal" means needs defining)
    b) compare all types field by field

The default maybe different for different types, here it is suggested that 3a maybe good for immutables but definitely not mutables. Also, I think the rules suggested in above comment are too complicated.

The default needs to either create a compatible hash method, or no hash method (which is what's proposed PR #24354). Ideally, if there is a default isequal & hash and a user defines a new isequal method, then the default hash method should be trashed. But I don't think that is possible in Julia.

Solutions:

  • remove default hash do nothing about isequal. However, Stefan said: "Resolution: remove the default hash method and make sure the API for defining a new hash function is nice and simple to use." Which I think entails an equally easy API to create isequal.
  • Stefan's (old) suggestion of a identity_parts function. But suffers from "oh, if you want to define ==, you should actually define this other non-obvious function instead." ref.
  • A traits system to opt in-to/out-of one of the three defaults above. This also suffers from the "oh, if you want to define ==, you should actually define this other non-obvious function instead." but would have a cleaner implementation.

Forward:

  • add equality/hash stuff to the "Interfaces" section in the docs
  • provide an easy to use field-by-field comparison and hashing function and a suggestion to use it in the doc-string (I'll put a PR together)
  • maybe use a traits system to opt-in/out of one of the three options above

(EDIT: I updated the example gist to take points 1-3 into account)

For what it's worth, I still favor the approach where you define a function that tells you what the "essence" of an object is with reasonable defaults. My reasoning is this: we have these two functions == and hash which both relate to some deeper notion of identity but neither one can be correctly computed from the other. So what do we do in situations like that? We introduce a deeper function from which they can both be computed correctly and tell people to overload that. I don't think that will be terribly confusing to people.

Thanks Stefan for your comments! I had a few more thoughts:

  • equality can be seen as a property/trait of a type alone (this only occurred to me now...): an object needs to specify which are its "properties" which need to be compared (or hashed) and the order in which they need to be compared (or hashed). These "comp.-properties" can be fields, values returned by functions, e.g. getindex, type-parameters (others?).
  • there are some performance tricks to catch un-equal things early on, e.g. comparing the length of two arrays before comparing all their values. Probably these could just be absorbed into the comp.-properties.

So, a general "essence" function would need to be able to somehow specify this. I put a prototype of this together in eq-v2.jl.

In the prototype, the essence function is a Holy-trait (I think it has to be) which also stores the properties which need to be compared as functions:

struct C
    t1
    t2
end
Essence(::Type{C}) = Essence{C}((x->Int, x->getfield(x,:t2), x->getfield(x,:t1), x->2))

This says that the hash would be hash(Int, hash(x.t2, hash(x.t1, hash(2)))). Similarly a comparison returns true if another object also has four comparison-properties and they also return the values (Int, x.t2, x.t1, 2).

I think this could work with having a default field-by-field comparison for immutables. Did you have something like this in mind?

Ok, I don't think this trait-based equality can be done before feature freeze (if it is in 2 days).

However, I'll prepare a PR simply updating the default isequal to use field-by-field comparison for immutables.

@mauro3, are you still working on this? If not, I would like to make a try, as I really want to see this in v0.7. Wondering if this is the right track:

  1. can I use generated functions? they seem to compile into efficient code
  2. the fallbacks seem to work fine
  3. I am not allowing types with different parameters to be equal, should I? What would be the right way?
import Base: ==, hash
using Test

@generated function struct_recursive_equal(a::T, b::T) where T
    mapreduce(F -> :((a.$F == b.$F)), (A, B) -> :($A && $B), fieldnames(a))
end

@generated function struct_recursive_hash(a, h)
    mapreduce(F -> :(a.$F), (A, B) -> :(hash($B, $A)), :h, fieldnames(a))
end

function ==(a::T, b::T) where T
    if Base.isstruct(T) && Base.isimmutable(T)
        struct_recursive_equal(a, b)
    else
        a === b
    end
end

function hash(a::T, h::Uint) where T
    if Base.isstruct(T) && Base.isimmutable(T)
        struct_recursive_hash(a, hash(string(T)))
    else
        hash_uint(3h - object_id(x))
    end
end

struct Foo{S, T}
    a::S
    b::T
end

@test Foo(1, 2) == Foo(1, 2)
@test Foo(1.0, 2) ≠ Foo(1, 2)   # NOTE different

@test hash(Foo(1, 2)) == hash(Foo(1, 2))
@test hash(Foo(1.0, 2)) ≠ hash(Foo(1, 2)) # NOTE different

:+1: Cool, that looks correct (assuming we want to do this).

This should use if @generated (there are some examples in Base).

I'm not sure what to do about type parameters. The safe thing is to require types to match. A fancy option is to ignore type parameters that appear in field types (idea being those are covered by comparing field values).

OK, will do it soon. if @generated is very nice, did not know about it. A question: the hash method should go into hashing.jl, but what about the ==? Is essentials.jl the right place for it?

This would go with the existing == fallback in operators.jl.

I'm not sure what to do about type parameters. The safe thing is to require types to match. A fancy option is to ignore type parameters that appear in field types (idea being those are covered by comparing field values).

That does sound really bad. If you wanted this to be correct for Complex, for example, you would need to know that the parameters should be ignored. But would also need to know that in other cases (e.g. Val{T}), they are significant. And for yet other cases (such as RefArray{T, A<:AbstractArray{T}}, the dependence is through another typevar – although == is perhaps correctly just === for this type?), and finally for some unusually cases, the dependence seems like it may be conditional (either because it's a lower bound like T >: S or the field type is a union like ::Union{T, Nothing}.

Yep, it's a problem. Certainly types with no fields should just be compared by ===. When there are fields, maybe better to ignore type parameters.

@vtjnash 's example of RefArray is worrying --- that's a case where the type is immutable but we want it compared by === by default. Of course the type could be made mutable, but it shouldn't have to be, and that's kind of a gotcha.

I think that defining == to be true only when the type parameters match strictly is going to disappoint a lot of expectations.

What's the canonical way to "normalize" a possibly parametrized type to the type name in v0.7? I can only think of T.name.

Anyhow, I am still happy to do a PR, will do the non-strict version (not comparing parameters) unless there is an objection.

Yes, T.name should work (and can be type inferred). But the more @vtjnash and I discuss this the more we return to thinking it might not be a good idea...

Not T.name, but Base.typename can be used. However, it is definitely not advisable / canonical to rip apart a type like that.

OK, I will wait for the outcome of the discussion then. I think @mauro3's suggestion of enabling == with traits on demand for certain types was a very solid one; hope it is taken up at some point.

The trait idea is orthogonal; the question would remain as to what the default behavior should be in the absence of trait declarations for a new type.

Sorry for not having followed up on my promise, life caught up... I don't think I will have time to put work into this just now. @tpapp, you're welcome to work on the traits-stuff if you fancy.

Anyway, now that I've spent some time thinking about this, my 2-cents:

  • a rule like proposed above is too complicated. I still don't fully understand it.
  • equality should not be recursive. Each field-type needs to decide itself how to be compared. Otherwise, it could happen that a == b even though a.f != b.f or vice-versa.
  • I think it would be confusing to have default-equality behave differently on whether a type is mutable or immutable. I might be wrong, but my impression is that the motivation to have immutables and mutables comparing differently by default, is hash-table use. However, that issue is addressed adequately by removing the fall-back hash method.
  • I agree with above discussion that type parameters should not participate in equality (by default), as that is how (almost?) all == methods work.

Thus, this leaves the two solutions:

  1. default comparison field-by-field for all types, or
  2. no default method defined for comparison

equality should not be recursive. Each field-type needs to decide itself how to be compared. Otherwise, it could happen that a == b even though a.f != b.f or vice-versa.

Do you mean equality should be recursive?

my impression is that the motivation to have immutables and mutables comparing differently by default, is hash-table use

That's not the reason. The reason is that immutable values with the same contents cannot be distinguished, while mutable values with the same contents can be distinguished by mutating one and seeing if the "other" one also changes.

Do you mean equality should be recursive?

This is probably a miss-understanding. I mean that the following is not desirable:

struct A
  a
  b
end
==(a::A,aa::A) = a.a==a.a # disregard b

struct B
  a::A
end

# with recursion:
B(A(1,2)) != B(A(1,1))
# even though
A(1,2) == A(1,1)

Thus I think "field-by-field comparison" describes a system where B(A(1,2))==B(A(1,1)) better than "recursion".

That's not the reason. The reason is that immutable values with the same contents cannot be distinguished, while mutable values with the same contents can be distinguished by mutating one and seeing if the "other" one also changes.

But if I care about this I should use ===. When == comparing two arrays, I don't expect that they stay == forever after.

Thus I think "field-by-field comparison" describes a system where B(A(1,2))==B(A(1,1)) better than "recursion".

Ah, I see now. I don't think anybody has proposed this. We have been talking about making == on B recursively call == on B's a field, which would call the == method defined for A. It's recursive because == would internally call == on a smaller object.

A minimally invasive yet reasonably convenient extension of the status quo could be the following:

  1. 3 possible traits for ==:
    a. requiring strict ===, this is the default
    b. requiring types to match exactly, all fields compared by == then, according to their own methods
    c. only Base.typename is required to match, all fields are compared by ==.
  2. hashing is always compatible with the above definition.

This would preserve existing behavior, avoid spurious equality when not intended (eg making a Dict-like type compare every field), yet allow very convenient one-line extensions for the most frequently requested form of equality.

This would also not be a breaking change, and could be done later. Still, if this is acceptable, I would make a try at this, I would love to see this in v1.0. It is asked for very frequently.

Triage prefers the following solution, which was spelt out above in parts, but not in one place

  • Have the default ==, recursively call isequal
  • Have it ignore any type parameters that appear in the field definitions
  • Have it compare type parameters that do not

This is basically what AutoHashEqual does, except that it seems slightly better
in the (somewhat unusual) case that you have type parameters just used for tagging,
but not in the fields.

I just remembered the original issue calls for doing this for immutable types, but it seems we're leaning towards using it for everything? That brings up (1) object cycles, (2) strange default behavior for things like Pipes.

The way I see it, there are thousands of julia types out there, call the number N. Some number E need to have boilerplate == methods defined. There's a cognitive bias: as soon as you've written that boilerplate 3 or 4 times, it seems annoying, but in fact E ≪ N. If we change the default, then N - E types need a == method that calls === to be safe. Otherwise you can get stack overflows or problematic spurious equalities.

So it was not discussed (much?) on the call, but do we want to limit this to immutables?

Pondering this some more after the call, I'm thinking that the current default of not-equal has been a much better default than field-equality (if we're going to define equals at all). For example, it's seems odd that we'll now need to define ==(a::IO, b::IO) = (a === b) to avoid spurious equivalences, such that (after this change) x in [y] might return true for x !== y.

It seems like there's probably many other cases like that too, such as Condition (will be now considered equal if Tasks with the same starting function + state are waiting on them) and Channel (will be equal if they contain the same data, for example, if they're both empty).

I would continue to favor doing this for mutable types, though I feel less strongly about that. I'm not really concerned about getting StackOverflowErrors for circular references by default, since e.g. the same happens here:

julia> a = Any[]
0-element Array{Any,1}

julia> push!(a, a)
1-element Array{Any,1}:
 Any[Any[#= circular reference @-1 =#]]

julia> a == a
ERROR: StackOverflowError:

But it can cause stack overflows in cases where people were relying on the === fallback.

I think this is still controversial and doesn't need to be done right now. Moving to 2.0.

Was this page helpful?
5 / 5 - 1 ratings