Currently we can do length(a) == length(b) && issubset(a, b)
(which replicates ==(a::Set, b::Set)
), but this is not ideal.
Should we add a issetequal
function? Also, would we want to have sets containing the same elements be isequal
? If so, we could just widen the ==
signature for Set
to AbstractSet
, and similarly for hash
(this would be a huge performance regression for hash(::BitSet)
, but this would not be slower than for Set
, and speed is maybe not so important there).
And probably similarly for <
, ⊆
etc. Currently, there is some inconsistency: for example Set(1) ⊆ BitSet(1)
yields true
, but using ⊊
is a method error.
Even then, we could still have an issetequal
function for arrays and iterators.
I'm not sure how breaking a change this would be for 1.x. I can volunteer to make a PR in the next few days if needed.
Yes, I've thought the same before. I remember trying to do things like detect whether two AbstractDict
s contain the same keys (the keys may be in different iteration orders, may have been collect
ed, etc) and there's no great answer...
Why not use ==
for this? Is the only reason the fact that hash
would have to be made consistent for all sets, which would be slow for BitSet
? I think it matters more to use the natural operator ==
than being able to hash some kinds of sets efficiently.
We could also find a hashing algorithm which works for Set
, but can be efficient for the special case of BitSet
, just like we just did for arrays and ranges (https://github.com/JuliaLang/julia/pull/16401). This kind of optimization can always been done after 1.0 since it's not breaking as long as the behavior of ==
doesn't change.
I think there's more than just an optimization at play here. You might have vector = collect(keys(dict))
and want to do a set-like comparison with another set, vector, tuple, etc. Since ==
for indexables generally depends on matching key-value pairs, you may sometimes want a separate issetequal(a, b) = (a ⊆ b) && (b ⊆ a)
type of operation (which might happen to behave the same as ==(::AbstractSet, ::AbstractSet)
but certainly not ==(::AbstractVector, ::AbstractVector)
).
Yes, but that doesn't mean AbstractSet
objects shouldn't be considered as equal by ==
when it makes sense. Looks like we need both.
I agree we need both. It would not be obvious for me that we want Set([1]) == BitSet([1])
, but this is the direction we take apparently, with the recent [1] == 1:1
. issetequal
could be 1.x, but ==(::AbstractSet, ::AbstractSet)
would be better in 1.0.
That sounds like a good plan to me.
In terms of equality of Set
and BitSet
, I feel it is crucial to generic programming that an interface for AbstractSet
is documented and followed. Set
and BitSet
should just be implementations of the interface with certain performance characteristics.
If this weren’t true, it would be hard to create functions that dispatch on AbstractSet
and understand/predict how that generic code might behave. In which case, better not to have AbstractSet
at all (just a bunch of unrelated set-ish implementations).
I think that we probably do want Set([1]) == BitSet([1])
since they are different representations of the same set, although the implementation is trickier than one might expect.
although the implementation is trickier than one might expect
Do you mean the (efficient) implementation of the ==
method?
I realize also that AbstractDict
also implements generic hash
and ==
functions, which also confirms we should just go with this.
Also in the docstring of ==
: "Should be implemented for all types with a notion of equality, based on the
abstract value that an instance represents" (emphasize mine)
I suggest adding a generic (and ordered independent) ==(::AbstractSet, ::AbstractSet)
to v1.0 so we don't need to make a breaking change to ==
after that.
Fixed by #25368
Most helpful comment
I think that we probably do want
Set([1]) == BitSet([1])
since they are different representations of the same set, although the implementation is trickier than one might expect.