There appears to be quite poor performance when adding a Diagonal matrix to a matrix of type SparseMatrixCSC. MWE:
N = 63001
@time Diagonal(ones(N)) + spdiagm(0 => ones(N));
@time spdiagm(0 => ones(N)) + spdiagm(0 => ones(N));
gives
julia> @time Diagonal(ones(N)) + spdiagm(0 => ones(N))
10.323352 seconds (77 allocations: 12.579 MiB)
julia> @time spdiagm(0 => ones(N)) + spdiagm(0 => ones(N))
0.053357 seconds (71 allocations: 12.982 MiB, 80.98% gc time)
@briochemc also reports the same (see #linear-algebra in the JuliaLang Slack):
julia> v = ones(1000) ;
julia> @btime spdiagm(0 => $v) + spdiagm(0 => $v) ; # fast
85.206 μs (40 allocations: 198.98 KiB)
julia> @btime sparse(Diagonal($v)) + spdiagm(0 => $v) ; # fastest
55.500 μs (26 allocations: 143.19 KiB)
julia> @btime Diagonal($v) + spdiagm(0 => $v) ; # slow
5.564 ms (45 allocations: 191.73 KiB)
Just converting the Diagonal to sparse seems like a good solution? I don't think the solution here is to add a specialized version, because there is a practically infinite number of combinations of operators and wrappers of abstract arrays that would need to be considered.
Is this a case automatically covered by #31563?
This is actually a really easy fix: we're already attempting to convert the Diagonal to sparse, but it seems we're doing it in a really poor way.
# This is the function broadcast uses to convert structured array types to sparse
julia> @time SparseArrays.HigherOrderFns._sparsifystructured(Diagonal(ones(63001)));
8.233826 seconds (42 allocations: 2.963 MiB)
# This is what it uses to do its work — it's a naive O(N^2) algorithm!
julia> @time SparseMatrixCSC(Diagonal(ones(63001)));
8.216935 seconds (88 allocations: 2.966 MiB)
# But we have this specialization that's not being used!
julia> @time sparse(Diagonal(ones(63001)));
0.043160 seconds (70.57 k allocations: 5.605 MiB)
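For reference, the reason the specialized path is so much faster is that a Diagonal maps directly onto the three CSC buffers with exactly one stored entry per column, so the conversion is O(N) rather than O(N^2). A rough sketch of that idea (diag_to_csc is a made-up name, not the actual stdlib code):
using SparseArrays, LinearAlgebra
# O(N) Diagonal-to-CSC conversion sketch: fill the three CSC buffers
# directly instead of scanning all N^2 entries the way the generic
# SparseMatrixCSC(::AbstractMatrix) constructor does.
function diag_to_csc(D::Diagonal{Tv}) where Tv
    n = size(D, 1)
    colptr = collect(1:n + 1)  # column j holds exactly one stored entry
    rowval = collect(1:n)      # and that entry sits on row j
    nzval  = copy(D.diag)      # the diagonal values themselves
    return SparseMatrixCSC(n, n, colptr, rowval, nzval)
end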
For sure. I think expected behavior here should be the same as calling sparse(Diagonal(v)) before the sum. In particular, as @mbauman mentioned, the resulting matrix is indeed of type SparseMatrixCSC (not dense), but the conversion is extremely slow, which is wholly unexpected. I agree with @mbauman's suggestion to make sure the conversion is fast.
I also thought I should point out to @angeris, to be exhaustive, that Diagonal(v) + Diagonal(v) is much faster than all other cases:
julia> v = ones(1000) ;
julia> @btime spdiagm(0 => $v) + spdiagm(0 => $v) ; # fast
79.443 μs (44 allocations: 199.05 KiB)
julia> @btime sparse(Diagonal($v)) + spdiagm(0 => $v) ; # fastest
49.983 μs (26 allocations: 143.19 KiB)
julia> @btime Diagonal($v) + spdiagm(0 => $v) ; # slow
5.556 ms (45 allocations: 191.73 KiB)
julia> @btime Diagonal($v) + Diagonal($v) ; # faster than fastest ;)
1.310 μs (2 allocations: 7.95 KiB)
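(For completeness: this one is fast because the sum never leaves the Diagonal type. Roughly, LinearAlgebra's method boils down to a single O(N) vector add of the stored diagonals; diag_plus below is a made-up name for illustration.)
using LinearAlgebra
# A Diagonal-plus-Diagonal sum needs no sparse structure at all: add the
# two stored diagonal vectors and wrap the result in another Diagonal.
diag_plus(Da::Diagonal, Db::Diagonal) = Diagonal(Da.diag + Db.diag)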
So it depends what one plans to do with the matrix because sometimes it is better not to do the conversion to sparse. For example,
julia> v1, v2, v3 = rand(1000), rand(1000), rand(1000) ;
julia> @btime $v1' * Diagonal($v2) * $v3 ;
1.751 μs (3 allocations: 64 bytes)
julia> @btime $v1' * sparse(Diagonal($v2)) * $v3 ;
8.863 μs (6 allocations: 31.88 KiB)
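The gap comes from the fact that the Diagonal version reduces to one fused loop over the three vectors with essentially no allocation, while the sparse version first has to materialize a SparseMatrixCSC and then run a sparse matrix-vector product. A sketch of what the fast path amounts to (tridot is a made-up name; the real method dispatches on the wrapped types, as explained below):
using LinearAlgebra
# Fused v1' * Diagonal(v2) * v3: a single pass over the three vectors,
# accumulating v1[i] * v2[i] * v3[i], with no temporary matrix or vector.
function tridot(v1::AbstractVector, D::Diagonal, v3::AbstractVector)
    d = D.diag
    s = zero(promote_type(eltype(v1), eltype(d), eltype(v3)))
    @inbounds for i in eachindex(v1, d, v3)
        s += v1[i] * d[i] * v3[i]
    end
    return s
end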
@briochemc Indeed! This was an MWE which came up because I was generating a SparseMatrixCSC from a large, 2D Laplacian-type matrix (rather than a diagonal one, via spdiagm) and was then adding it to one of type Diagonal :)
Thanks for the latter example... it's super interesting: I didn't realize the performance would be about 5 times worse with the sparse conversion!
What do you think about the proposal in https://github.com/JuliaLang/julia/pull/31563, which would simplify the solution considerably in view of the problem that "there is a practically infinite number of combinations of operators and wrappers of abstract arrays"?
julia> v1, v2, v3 = rand(1000), rand(1000), rand(1000) ;
julia> @btime $v1' * Diagonal($v2) * $v3 ;
1.751 μs (3 allocations: 64 bytes)
is fast because it has a special method which, in a single loop, goes through v1, v2, and v3, takes the products, and adds them up. That method has the signature (::Adjoint, ::Diagonal, ::(Abstract/Strided)Vector) and won't be affected by whatever we do with the broadcasting. So, how about adding these:
SparseMatrixCSC(T::SymTridiagonal) = sparse(T)
SparseMatrixCSC(T::Tridiagonal) = sparse(T)
SparseMatrixCSC(B::Bidiagonal) = sparse(B)
SparseMatrixCSC(D::Diagonal) = sparse(D)
The other, equivalent option is to add in HigherOrderFns
_sparsifystructured(D::Diagonal{<:Number}) = sparse(D)
which would restrict its effect to the broadcasting machinery. The equivalence stems from the fact that we have
_sparsifystructured(M::AbstractMatrix) = SparseMatrixCSC(M)
and since the structured matrices are not Matrix but AbstractMatrix, calling SparseMatrixCSC on them does not end up in the fast but in the slow conversion. But I think redirecting the constructor SparseMatrixCSC to the fast, specialized sparse makes sense more generally. Any opinions? I can make a PR.
Edit: As an alternative, we could move the current, specific code from sparse to the constructors SparseMatrixCSC(::StructuredMatrix), and let sparse call SparseMatrixCSC. I think this is how it's done for ordinary matrices, and we might get rid of the specialized sparse methods anyway, since the sparse(::AbstractMatrix) method would handle it and pass to the specialized constructor. Since there are many ways to Rome here, I'd appreciate some advice on which way is most conformant with existing style conventions. In any case, we should treat all structured matrices, not only Diagonals.
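For concreteness, the "edit" alternative could look roughly like this inside SparseArrays (a sketch only, shown for Diagonal; Bidiagonal, Tridiagonal, and SymTridiagonal would get analogous constructors, and the real code needs to handle index types and promotion):
# Sketch: the fast conversion lives in the SparseMatrixCSC constructor,
# and sparse() merely delegates to it.
function SparseMatrixCSC(D::Diagonal{Tv}) where Tv
    n = size(D, 1)
    return SparseMatrixCSC(n, n, collect(1:n + 1), collect(1:n), Vector{Tv}(D.diag))
end
sparse(D::Diagonal) = SparseMatrixCSC(D)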
I would fully support moving the relevant sparse specializations onto SparseMatrixCSC (your "edit" alternative).