Julia: Very Poor Performance of Diagonal + SparseMatrixCSC

Created on 19 Apr 2019 · 9 comments · Source: JuliaLang/julia

There appears to be quite poor performance when adding a Diagonal matrix to a matrix of type SparseMatrixCSC. MWE:

using LinearAlgebra, SparseArrays

N = 63001
@time Diagonal(ones(N)) + spdiagm(0 => ones(N));
@time spdiagm(0 => ones(N)) + spdiagm(0 => ones(N));

gives

julia> @time Diagonal(ones(N)) + spdiagm(0 => ones(N))
 10.323352 seconds (77 allocations: 12.579 MiB)
julia> @time spdiagm(0 => ones(N)) + spdiagm(0 => ones(N))
  0.053357 seconds (71 allocations: 12.982 MiB, 80.98% gc time)

@briochemc also reports the same (see #linear-algebra in the JuliaLang Slack):

julia> using BenchmarkTools, LinearAlgebra, SparseArrays

julia> v = ones(1000) ;

julia> @btime spdiagm(0 => $v) + spdiagm(0 => $v) ; # fast
  85.206 μs (40 allocations: 198.98 KiB)

julia> @btime sparse(Diagonal($v)) + spdiagm(0 => $v) ; # fastest
  55.500 μs (26 allocations: 143.19 KiB)

julia> @btime Diagonal($v) + spdiagm(0 => $v) ; # slow
  5.564 ms (45 allocations: 191.73 KiB)
Labels: broadcast, performance, sparse

All 9 comments

Just converting the Diagonal to sparse seems like a good solution? I don't think the solution here is to add a specialized version, because there is a practically infinite number of combinations of operators and wrappers of abstract arrays that would need to be considered.
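Concretely, that workaround is a one-line change at the call site. A sketch using the sizes from the MWE above:

using LinearAlgebra, SparseArrays

N = 63001
D = Diagonal(ones(N))
S = spdiagm(0 => ones(N))
A = sparse(D) + S  # convert first: this hits the fast Diagonal-to-sparse path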

Is this a case automatically covered by #31563?

This is actually a really easy fix — we're already attempting to convert the Diagonal to sparse but it seems we're doing it in a really poor way.

# This is the function broadcast uses to convert structured array types to sparse
julia> @time SparseArrays.HigherOrderFns._sparsifystructured(Diagonal(ones(63001)));
  8.233826 seconds (42 allocations: 2.963 MiB)

# This is what it uses to do its work — it's a naive O(N^2) algorithm!
julia> @time SparseMatrixCSC(Diagonal(ones(63001)));
  8.216935 seconds (88 allocations: 2.966 MiB)

# But we have this specialization that's not being used!
julia> @time sparse(Diagonal(ones(63001)));
  0.043160 seconds (70.57 k allocations: 5.605 MiB)
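For reference, the fast specialization can afford to be O(N) because a Diagonal maps directly onto CSC storage. A minimal sketch, with a hypothetical name and not the actual SparseArrays code:

using LinearAlgebra, SparseArrays

# Build the CSC arrays for a Diagonal directly: one stored entry per column.
function diagonal_to_csc(D::Diagonal{Tv}) where {Tv}
    n = size(D, 1)
    colptr = collect(1:n+1)   # column j holds exactly one entry
    rowval = collect(1:n)     # that entry sits on row j
    nzval  = copy(D.diag)     # the diagonal values themselves
    return SparseMatrixCSC(n, n, colptr, rowval, nzval)
end

(This stores structural zeros on the diagonal explicitly; the real method can be more careful, but the point is that only N entries are ever touched.)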

For sure. I think the expected behavior here should be the same as calling sparse(Diagonal(v)) before the sum. In particular, as @mbauman mentioned, the resulting matrix is indeed of type SparseMatrixCSC (not dense), but the conversion is extremely slow, which is wholly unexpected.

I agree with @mbauman's suggestion to make sure the conversion is fast.

I also thought I should point out to @angeris, to be exhaustive, that Diagonal(v) + Diagonal(v) is much faster than all other cases:

julia> v = ones(1000) ;

julia> @btime spdiagm(0 => $v) + spdiagm(0 => $v) ; # fast
  79.443 μs (44 allocations: 199.05 KiB)

julia> @btime sparse(Diagonal($v)) + spdiagm(0 => $v) ; # fastest
  49.983 μs (26 allocations: 143.19 KiB)

julia> @btime Diagonal($v) + spdiagm(0 => $v) ; # slow
  5.556 ms (45 allocations: 191.73 KiB)

julia> @btime Diagonal($v) + Diagonal($v) ; # faster than fastest ;)
  1.310 μs (2 allocations: 7.95 KiB)
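That last case is fast because adding two Diagonals only touches the two length-N diagonal vectors and returns another Diagonal. Conceptually it amounts to something like the following (a sketch, not the actual LinearAlgebra method):

using LinearAlgebra

# Conceptual sketch: the sum of two Diagonals is the sum of their diagonals.
diag_plus_diag(A::Diagonal, B::Diagonal) = Diagonal(A.diag .+ B.diag)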

So it depends on what one plans to do with the matrix, because sometimes it is better not to do the conversion to sparse at all. For example,

julia> v1, v2, v3 = rand(1000), rand(1000), rand(1000) ;

julia> @btime $v1' * Diagonal($v2) * $v3 ;
  1.751 μs (3 allocations: 64 bytes)

julia> @btime $v1' * sparse(Diagonal($v2)) * $v3 ;
  8.863 μs (6 allocations: 31.88 KiB)

@briochemc Indeed! This was an MWE which came up because I was generating a SparseMatrixCSC from a large, 2D Laplacian-type matrix (rather than a diagonal one via spdiagm) and was then adding it to one of type Diagonal :)

Thanks for the latter example... it's super interesting: I didn't realize the performance would be about five times worse with the sparse conversion!

What do you think about the proposal in https://github.com/JuliaLang/julia/pull/31563? It would simplify the solution considerably, given that "there is a practically infinite number of combinations of operators and wrappers of abstract arrays". Note that the example

julia> v1, v2, v3 = rand(1000), rand(1000), rand(1000) ;

julia> @btime $v1' * Diagonal($v2) * $v3 ;
  1.751 μs (3 allocations: 64 bytes)

is fast because it has a special method which, in a single loop, runs through v1, v2, and v3, taking the elementwise products and summing them up. That method has a signature of the form (::Adjoint, ::Diagonal, ::(Abstract/Strided)Vector), and won't be affected by whatever we do with the broadcasting.
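Roughly, that kernel amounts to the following (a sketch with a hypothetical name, not the actual LinearAlgebra method):

# Fused triple product v1' * Diagonal(d) * v3: one pass, no temporary arrays.
function adj_diag_dot(v1::AbstractVector, d::AbstractVector, v3::AbstractVector)
    s = zero(promote_type(eltype(v1), eltype(d), eltype(v3)))
    @inbounds for i in eachindex(v1, d, v3)
        s += conj(v1[i]) * d[i] * v3[i]  # one multiply-accumulate per element
    end
    return s
end

So, how about adding these: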

SparseMatrixCSC(T::SymTridiagonal) = sparse(T)
SparseMatrixCSC(T::Tridiagonal) = sparse(T)
SparseMatrixCSC(B::Bidiagonal) = sparse(B)
SparseMatrixCSC(D::Diagonal) = sparse(D)

The other, equivalent option is to add the following in HigherOrderFns:

_sparsifystructured(D::Diagonal{<:Number}) = sparse(D)

which would restrict its effect to the broadcasting machinery. The equivalence stems from the fact that we have

_sparsifystructured(M::AbstractMatrix) = SparseMatrixCSC(M)

and since the structured matrices are not Matrix but AbstractMatrix, this fallback sends them to the slow generic SparseMatrixCSC conversion instead of the fast, specialized sparse methods. But I think redirecting the constructor SparseMatrixCSC to the fast, specialized sparse methods makes sense more generally. Any opinions? I can make a PR.

Edit: As an alternative, we could move the current specialized code from sparse into the constructors SparseMatrixCSC(::StructuredMatrix), and let sparse call SparseMatrixCSC. I think this is how it's done for ordinary matrices, and we might get rid of the specialized sparse methods anyway, since the sparse(::AbstractMatrix) method would handle everything and dispatch to the specialized constructor. Since there are many roads to Rome here, I'd appreciate some advice on which way is more conformant with the prevailing style. In any case, we should treat all structured matrices, not only Diagonal.
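A minimal sketch of that shape, with hypothetical names (MyCSC, my_sparse) so as not to overwrite the real SparseArrays methods:

using LinearAlgebra, SparseArrays

# The specialized code lives in a constructor method per structured type
# (here only Diagonal, reusing the O(N) construction sketched earlier)...
MyCSC(D::Diagonal) = SparseMatrixCSC(size(D, 1), size(D, 2),
                                     collect(1:size(D, 1) + 1),
                                     collect(1:size(D, 1)), copy(D.diag))
MyCSC(M::AbstractMatrix) = SparseMatrixCSC(M)  # generic fallback stays as-is

# ...and the entry point just forwards, so every caller (including
# broadcast's _sparsifystructured) hits the best available constructor.
my_sparse(M::AbstractMatrix) = MyCSC(M)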

I would fully support moving the relevant sparse specializations onto SparseMatrixCSC (your "edit" alternative).
