Julia: add a function to test sets equality?

Created on 29 Dec 2017  Â·  11Comments  Â·  Source: JuliaLang/julia

Currently we can do length(a) == length(b) && issubset(a, b) (which replicates ==(a::Set, b::Set)), but this is not ideal.

Should we add a issetequal function? Also, would we want to have sets containing the same elements be isequal ? If so, we could just widen the == signature for Set to AbstractSet, and similarly for hash (this would be a huge performance regression for hash(::BitSet), but this would not be slower than for Set, and speed is maybe not so important there).

And probably similarly for <, ⊆ etc. Currently, there is some inconsistency: for example Set(1) ⊆ BitSet(1) yields true, but using ⊊ is a method error.

Even then, we could still have an issetequal function for arrays and iterators.

I'm not sure how breaking a change this would be for 1.x. I can volunteer to make a PR in the next few days if needed.

collections

Most helpful comment

I think that we probably do want Set([1]) == BitSet([1]) since they are different representations of the same set, although the implementation is trickier than one might expect.

All 11 comments

Yes, I've thought the same before. I remember trying to do things like detect whether two AbstractDicts contain the same keys (the keys may be in different iteration orders, may have been collected, etc) and there's no great answer...

Why not use == for this? Is the only reason the fact that hash would have to be made consistent for all sets, which would be slow for BitSet? I think it matters more to use the natural operator == than being able to hash some kinds of sets efficiently.

We could also find a hashing algorithm which works for Set, but can be efficient for the special case of BitSet, just like we just did for arrays and ranges (https://github.com/JuliaLang/julia/pull/16401). This kind of optimization can always been done after 1.0 since it's not breaking as long as the behavior of == doesn't change.

I think there's more than just an optimization at play here. You might have vector = collect(keys(dict)) and want to do a set-like comparison with another set, vector, tuple, etc. Since == for indexables generally depends on matching key-value pairs, you may sometimes want a separate issetequal(a, b) = (a ⊆ b) && (b ⊆ a) type of operation (which might happen to behave the same as ==(::AbstractSet, ::AbstractSet) but certainly not ==(::AbstractVector, ::AbstractVector)).

Yes, but that doesn't mean AbstractSet objects shouldn't be considered as equal by == when it makes sense. Looks like we need both.

I agree we need both. It would not be obvious for me that we want Set([1]) == BitSet([1]), but this is the direction we take apparently, with the recent [1] == 1:1. issetequal could be 1.x, but ==(::AbstractSet, ::AbstractSet) would be better in 1.0.

That sounds like a good plan to me.

In terms of equality of Set and BitSet, I feel it is crucial to generic programming that an interface for AbstractSet is documented and followed. Set and BitSet should just be implementations of the interface with certain performance characteristics.

If this weren’t true, it would be hard to create functions that dispatch on AbstractSet and understand/predict how that generic code might behave. In which case, better not to have AbstractSet at all (just a bunch of unrelated set-ish implementations).

I think that we probably do want Set([1]) == BitSet([1]) since they are different representations of the same set, although the implementation is trickier than one might expect.

although the implementation is trickier than one might expect

Do you mean the (efficient) implementation of the == method?

I realize also that AbstractDict also implements generic hash and == functions, which also confirms we should just go with this.
Also in the docstring of ==: "Should be implemented for all types with a notion of equality, based on the
abstract value that an instance represents" (emphasize mine)

I suggest adding a generic (and ordered independent) ==(::AbstractSet, ::AbstractSet) to v1.0 so we don't need to make a breaking change to == after that.

Fixed by #25368

Was this page helpful?
0 / 5 - 0 ratings

Related issues

tkoolen picture tkoolen  Â·  3Comments

StefanKarpinski picture StefanKarpinski  Â·  3Comments

felixrehren picture felixrehren  Â·  3Comments

TotalVerb picture TotalVerb  Â·  3Comments

iamed2 picture iamed2  Â·  3Comments