Julia: setdiff!(a,b) slower than setdiff(a,b) when b is large (and viceversa when a is large)

Created on 29 Dec 2017  路  9Comments  路  Source: JuliaLang/julia

julia> s = Set(rand(1_000_000));

julia> s1 = Set([1.0])
Set([1.0])

julia> @time setdiff!(s1, s)
  0.049974 seconds (19.83 k allocations: 1.146 MiB)
Set([1.0])

julia> @time setdiff!(s1, s)
  0.023034 seconds (4 allocations: 160 bytes)
Set([1.0])

julia> s1
Set([1.0])

julia> @time setdiff(s1, s)
  0.012756 seconds (5.46 k allocations: 325.882 KiB)
Set([1.0])

julia> @time setdiff(s1, s)
  0.000005 seconds (9 allocations: 656 bytes)
Set([1.0])

setdiff! iterates through all of b. setdiff iterates through all of a. Both functions could be specialized for b::Set.

On
Version 0.7.0-DEV.3096
Commit 3216445381* (10 days old master)

collections good first issue performance

Most helpful comment

Go for it.

As a note, it is for some reason extremely rare for people who ask in issues if they can work on something to actually come through and make a PR. Be the exception :)

All 9 comments

If I'm not mistaken, the recently merged #23528 have made setdiff as slow as setdiff! :sweat_smile:
The reason is that it's convenient to implement setdiff in terms of setdiff!. But as you note, the performance are inversed when you swap a and b, so indeed it can be specialized when they are both sets, but there has to be some tuning, to find a good threshold deciding whether you iterate a or b. Maybe choosing the smaller one is a good enough start.

@rfourquet is this issue still open? I really want to take this issue.I'm new to this organisation.

Yes, it's still open and help would certainly be welcome.

Okay! I want to work on this issue.Can you please guide me with the required files and changes?

Can you please guide me with the required files and changes?

For the files, you can use the @which macro, e.g. @which setdiff(Set([1]), Set([2])). For the required changes, you need to familiarize yourself with how the functions work now, and I guess introduce two branches depending on the size of the arguments.

Hi @firedranzer, are you still working on this issue? If not, can I start work on it?

Go for it.

As a note, it is for some reason extremely rare for people who ask in issues if they can work on something to actually come through and make a PR. Be the exception :)

I see similar asymmetric problems with symdiff, intersect and union. It might be worthwhile to invest in a more generic solution for this problem.

@laborg: your work on this so far on this is great鈥攊t would be amazing if you wanted to continue to work away at this and make all of these functions cleverer and more efficient!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

omus picture omus  路  3Comments

StefanKarpinski picture StefanKarpinski  路  3Comments

iamed2 picture iamed2  路  3Comments

helgee picture helgee  路  3Comments

i-apellaniz picture i-apellaniz  路  3Comments