There appears to be quite poor performance when adding a Diagonal matrix to a matrix of type SparseMatrixCSC. MWE:
N = 63001
@time Diagonal(ones(N)) + spdiagm(0 => ones(N));
@time spdiagm(0 => ones(N)) + spdiagm(0 => ones(N));
gives
julia> @time Diagonal(ones(N)) + spdiagm(0 => ones(N))
10.323352 seconds (77 allocations: 12.579 MiB)
julia> @time spdiagm(0 => ones(N)) + spdiagm(0 => ones(N))
0.053357 seconds (71 allocations: 12.982 MiB, 80.98% gc time)
@briochemc also reports the same (see #linear-algebra in the JuliaLang Slack):
julia> v = ones(1000) ;
julia> @btime spdiagm(0 => $v) + spdiagm(0 => $v) ; # fast
85.206 μs (40 allocations: 198.98 KiB)
julia> @btime sparse(Diagonal($v)) + spdiagm(0 => $v) ; # fastest
55.500 μs (26 allocations: 143.19 KiB)
julia> @btime Diagonal($v) + spdiagm(0 => $v) ; # slow
5.564 ms (45 allocations: 191.73 KiB)
Just converting the Diagonal to sparse seems like a good solution? I don't think the solution here is to add a specialized version, because there is a practically infinite number of combinations of operators and wrappers of abstract arrays that would need to be considered.
Is this a case automatically covered by #31563?
This is actually a really easy fix: we're already attempting to convert the Diagonal to sparse, but it seems we're doing it in a really poor way.
# This is the function broadcast uses to convert structured array types to sparse
julia> @time SparseArrays.HigherOrderFns._sparsifystructured(Diagonal(ones(63001)));
8.233826 seconds (42 allocations: 2.963 MiB)
# This is what it uses to do its work — it's a naive O(N^2) algorithm!
julia> @time SparseMatrixCSC(Diagonal(ones(63001)));
8.216935 seconds (88 allocations: 2.966 MiB)
# But we have this specialization that's not being used!
julia> @time sparse(Diagonal(ones(63001)));
0.043160 seconds (70.57 k allocations: 5.605 MiB)
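For reference, the reason the specialized path is so much faster is that a Diagonal maps directly onto the three CSC buffers with exactly one stored entry per column, so the conversion is O(N) rather than O(N^2). A rough sketch of that idea (diag_to_csc is a made-up name, not the actual stdlib code):
using SparseArrays, LinearAlgebra
# O(N) Diagonal-to-CSC conversion sketch: fill the three CSC buffers
# directly instead of scanning all N^2 entries the way the generic
# SparseMatrixCSC(::AbstractMatrix) constructor does.
function diag_to_csc(D::Diagonal{Tv}) where Tv
    n = size(D, 1)
    colptr = collect(1:n + 1)  # column j holds exactly one stored entry
    rowval = collect(1:n)      # and that entry sits on row j
    nzval  = copy(D.diag)      # the diagonal values themselves
    return SparseMatrixCSC(n, n, colptr, rowval, nzval)
end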
For sure. I think expected behavior here should be the same as calling sparse(Diagonal(v)) before the sum. In particular, as @mbauman mentioned, the resulting matrix is indeed of type SparseMatrixCSC (not dense), but the conversion is extremely slow, which is wholly unexpected. I agree with @mbauman's suggestion to make sure the conversion is fast.
I also thought I should point out to @angeris, to be exhaustive, that Diagonal(v) + Diagonal(v) is much faster than all other cases:
julia> v = ones(1000) ;
julia> @btime spdiagm(0 => $v) + spdiagm(0 => $v) ; # fast
79.443 μs (44 allocations: 199.05 KiB)
julia> @btime sparse(Diagonal($v)) + spdiagm(0 => $v) ; # fastest
49.983 μs (26 allocations: 143.19 KiB)
julia> @btime Diagonal($v) + spdiagm(0 => $v) ; # slow
5.556 ms (45 allocations: 191.73 KiB)
julia> @btime Diagonal($v) + Diagonal($v) ; # faster than fastest ;)
1.310 μs (2 allocations: 7.95 KiB)
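(For completeness: this one is fast because the sum never leaves the Diagonal type. Roughly, LinearAlgebra's method boils down to a single O(N) vector add of the stored diagonals; diag_plus below is a made-up name for illustration.)
using LinearAlgebra
# A Diagonal-plus-Diagonal sum needs no sparse structure at all: add the
# two stored diagonal vectors and wrap the result in another Diagonal.
diag_plus(Da::Diagonal, Db::Diagonal) = Diagonal(Da.diag + Db.diag)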
So it depends what one plans to do with the matrix because sometimes it is better not to do the conversion to sparse. For example,
julia> v1, v2, v3 = rand(1000), rand(1000), rand(1000) ;
julia> @btime $v1' * Diagonal($v2) * $v3 ;
1.751 μs (3 allocations: 64 bytes)
julia> @btime $v1' * sparse(Diagonal($v2)) * $v3 ;
8.863 μs (6 allocations: 31.88 KiB)
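The gap comes from the fact that the Diagonal version reduces to one fused loop over the three vectors with essentially no allocation, while the sparse version first has to materialize a SparseMatrixCSC and then run a sparse matrix-vector product. A sketch of what the fast path amounts to (tridot is a made-up name; the real method dispatches on the wrapped types, as explained below):
using LinearAlgebra
# Fused v1' * Diagonal(v2) * v3: a single pass over the three vectors,
# accumulating v1[i] * v2[i] * v3[i], with no temporary matrix or vector.
function tridot(v1::AbstractVector, D::Diagonal, v3::AbstractVector)
    d = D.diag
    s = zero(promote_type(eltype(v1), eltype(d), eltype(v3)))
    @inbounds for i in eachindex(v1, d, v3)
        s += v1[i] * d[i] * v3[i]
    end
    return s
end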
@briochemc Indeed! This was an MWE which came up because I was generating a SparseMatrixCSC from a large, 2D Laplacian-type matrix (rather than a diagonal one, via spdiagm) and was then adding it to one of type Diagonal :)
Thanks for the latter example... it's super interesting: I didn't realize the performance would be about 5 times worse with the sparse conversion!
What do you think about the proposal in https://github.com/JuliaLang/julia/pull/31563, which would simplify the solution considerably in view of the problem that "there is a practically infinite number of combinations of operators and wrappers of abstract arrays"?
julia> v1, v2, v3 = rand(1000), rand(1000), rand(1000) ;
julia> @btime $v1' * Diagonal($v2) * $v3 ;
1.751 μs (3 allocations: 64 bytes)
is fast because it has a special method which, in a single loop, goes through v1, v2, and v3, takes the products, and adds them up. That method has the signature (::Adjoint, ::Diagonal, ::(Abstract/Strided)Vector) and won't be affected by whatever we do with the broadcasting. So, how about adding these:
SparseMatrixCSC(T::SymTridiagonal) = sparse(T)
SparseMatrixCSC(T::Tridiagonal) = sparse(T)
SparseMatrixCSC(B::Bidiagonal) = sparse(B)
SparseMatrixCSC(D::Diagonal) = sparse(D)
The other, equivalent option is to add in HigherOrderFns
_sparsifystructured(D::Diagonal{<:Number}) = sparse(D)
which would restrict its effect to the broadcasting machinery. The equivalence stems from the fact that we have
_sparsifystructured(M::AbstractMatrix) = SparseMatrixCSC(M)
and since the structured matrices are not Matrix but AbstractMatrix, calling SparseMatrixCSC on them does not end up in the fast but in the slow conversion. But I think redirecting the constructor SparseMatrixCSC to the fast, specialized sparse makes sense more generally. Any opinions? I can make a PR.
Edit: As an alternative, we could move the current, specific code from sparse to the constructors SparseMatrixCSC(::StructuredMatrix), and let sparse call SparseMatrixCSC. I think this is how it's done for ordinary matrices, and we might get rid of the specialized sparse methods anyway, since the sparse(::AbstractMatrix) method would handle it and pass to the specialized constructor. Since there are many ways to Rome here, I'd appreciate some advice on which way is most conformant with existing style conventions. In any case, we should treat all structured matrices, not only Diagonals.
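For concreteness, the "edit" alternative could look roughly like this inside SparseArrays (a sketch only, shown for Diagonal; Bidiagonal, Tridiagonal, and SymTridiagonal would get analogous constructors, and the real code needs to handle index types and promotion):
# Sketch: the fast conversion lives in the SparseMatrixCSC constructor,
# and sparse() merely delegates to it.
function SparseMatrixCSC(D::Diagonal{Tv}) where Tv
    n = size(D, 1)
    return SparseMatrixCSC(n, n, collect(1:n + 1), collect(1:n), Vector{Tv}(D.diag))
end
sparse(D::Diagonal) = SparseMatrixCSC(D)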
I would fully support moving the relevant sparse specializations onto SparseMatrixCSC (your "edit" alternative).