julia> s = Set(rand(1_000_000));
julia> s1 = Set([1.0])
Set([1.0])
julia> @time setdiff!(s1, s)
0.049974 seconds (19.83 k allocations: 1.146 MiB)
Set([1.0])
julia> @time setdiff!(s1, s)
0.023034 seconds (4 allocations: 160 bytes)
Set([1.0])
julia> s1
Set([1.0])
julia> @time setdiff(s1, s)
0.012756 seconds (5.46 k allocations: 325.882 KiB)
Set([1.0])
julia> @time setdiff(s1, s)
0.000005 seconds (9 allocations: 656 bytes)
Set([1.0])
setdiff!
iterates through all of b
. setdiff
iterates through all of a
. Both functions could be specialized for b::Set
.
On
Version 0.7.0-DEV.3096
Commit 3216445381* (10 days old master)
If I'm not mistaken, the recently merged #23528 have made setdiff
as slow as setdiff!
:sweat_smile:
The reason is that it's convenient to implement setdiff
in terms of setdiff!
. But as you note, the performance are inversed when you swap a
and b
, so indeed it can be specialized when they are both sets, but there has to be some tuning, to find a good threshold deciding whether you iterate a
or b
. Maybe choosing the smaller one is a good enough start.
@rfourquet is this issue still open? I really want to take this issue.I'm new to this organisation.
Yes, it's still open and help would certainly be welcome.
Okay! I want to work on this issue.Can you please guide me with the required files and changes?
Can you please guide me with the required files and changes?
For the files, you can use the @which
macro, e.g. @which setdiff(Set([1]), Set([2]))
. For the required changes, you need to familiarize yourself with how the functions work now, and I guess introduce two branches depending on the size of the arguments.
Hi @firedranzer, are you still working on this issue? If not, can I start work on it?
Go for it.
As a note, it is for some reason extremely rare for people who ask in issues if they can work on something to actually come through and make a PR. Be the exception :)
I see similar asymmetric problems with symdiff
, intersect
and union
. It might be worthwhile to invest in a more generic solution for this problem.
@laborg: your work on this so far on this is great鈥攊t would be amazing if you wanted to continue to work away at this and make all of these functions cleverer and more efficient!
Most helpful comment
Go for it.
As a note, it is for some reason extremely rare for people who ask in issues if they can work on something to actually come through and make a PR. Be the exception :)