I'm opening this issue following brief discussions on Slack with @mbauman, @StefanKarpinski, and @MikeInnes.
```julia
using LinearAlgebra, SparseArrays, Random

rng = MersenneTwister(123456)
A = sparse(I, 3, 3)
B = broadcast(_ -> randn(rng), A)
```
makes `B` a sparse matrix with entries
```
3×3 SparseMatrixCSC{Any,Int64} with 9 stored entries:
  [1, 1]  =  1.82299
  [2, 1]  =  -0.620399
  [3, 1]  =  -0.620399
  [1, 2]  =  -0.620399
  [2, 2]  =  1.06094
  [3, 2]  =  -0.620399
  [1, 3]  =  -0.620399
  [2, 3]  =  -0.620399
  [3, 3]  =  0.54373
```
where a separate value has been sampled for each entry that was non-zero, but a single value has been sampled and reused to populate all of the previously-zero elements. This is apparently a design choice, which can be summarised as assuming that a function broadcasted over a sparse array is pure. I believe it is reasonable to revisit this choice, as
```julia
using LinearAlgebra, SparseArrays, Random

rng, rng′ = MersenneTwister(123456), MersenneTwister(123456)
broadcast(_ -> randn(rng), Matrix(sparse(I, 3, 3)))
Matrix(broadcast(_ -> randn(rng′), sparse(I, 3, 3)))
```
do not yield the same answers, i.e. the result of our operations depends crucially on the order in which we convert between (supposedly) equivalent representations of the same matrix.
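To make the divergence concrete, here is a small counting sketch (the `calls` counter and the variable names are mine, purely for illustration). Based on the output above, the function appears to be evaluated once per stored entry plus once for the shared zero, i.e. four times on the sparse input versus nine on the dense one:

```julia
using LinearAlgebra, SparseArrays

calls = Ref(0)
f = _ -> (calls[] += 1; 1.0)       # impure: counts how often it is invoked

broadcast(f, sparse(I, 3, 3)); n_sparse = calls[]          # expect 4 (3 stored entries + 1 shared zero)
calls[] = 0
broadcast(f, Matrix(sparse(I, 3, 3))); n_dense = calls[]   # expect 9 (one call per element)
```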
In short, I would propose changing this behaviour in favour of consistency. (It's possible that the output should automatically be converted to a dense matrix.)
Thanks for putting this issue together. I would tend to agree — the savings from avoiding the repeated function calls and filling with the same value are likely to be completely swamped by the huge performance cost that comes from blowing up the storage of the sparse array. I would suspect that's why this hasn't garnered much attention; it's really something I'd ideally avoid doing in the first place.
A simpler example is `sparse(I, 3, 3) .+ rand.()`.
CC @Sacha0.
> by the huge performance cost that comes from blowing up the storage of the sparse array

And what if it isn't blown up (`f(0) = 0`)?
Ah, right, of course — thanks. We need to assume that `f(0)` is constant and side-effect free if we want the O(nnz) optimization when the first evaluation of `f(0) == 0`.
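For what it's worth, a small sketch of what that optimization buys (the matrix size, density, and variable names here are mine, chosen for illustration): when `f(0) == 0`, only the stored entries need to be computed and stored, so the result keeps its sparse storage.

```julia
using SparseArrays

A = sprandn(1_000, 1_000, 0.001)  # ≈1_000 stored entries out of 1_000_000
B = sin.(A)   # sin(0) == 0: O(nnz) work, result stays sparse
C = cos.(A)   # cos(0) == 1: every one of the 1_000_000 entries must be stored
nnz(A), nnz(B), nnz(C)
```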
I agree that optimising for `f(0) == 0` makes a lot of sense, provided its purity can be guaranteed, as @mbauman points out. We cannot safely assume this in general, though.
Without that optimization, broadcasting on sparse arrays is pretty much useless. There is an argument for consistency, but a more important argument to me is usability. Making things consistent at the cost of making them unusable is not a good trade-off in my world.
Striving for `f(m::SparseArray) == f(Array(m))` is a good idea, but it always needs to be kept in check by the reason we are using sparse data structures in the first place.
As another example:
```julia
julia> spzeros(2,2) * fill(NaN, 2)
2-element Array{Float64,1}:
 0.0
 0.0
```
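For contrast, the dense equivalent performs every multiplication, and `0.0 * NaN` is `NaN`, so the `NaN` propagates:

```julia
julia> Matrix(spzeros(2,2)) * fill(NaN, 2)
2-element Array{Float64,1}:
 NaN
 NaN
```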
Yup. We simply don't have any generic mechanism that flags or otherwise detects a "pure" function. One interesting thought would be to ask inference if constant propagation led it to infer a constant zero, but that would still be seriously limiting.
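To illustrate what such a mechanism might look like if it existed, here is a purely hypothetical sketch; `ispure` is an invented name, and nothing like it exists in Base or SparseArrays:

```julia
# Hypothetical opt-in purity trait ("ispure" is an invented name, not a real API):
ispure(f) = false                # conservative default: assume impure
ispure(::typeof(sin)) = true     # a library or user could opt in per function

# Sparse broadcast could then, in principle, take the O(nnz) fast path only when
#     ispure(f) && iszero(f(zero(eltype(A))))
# holds, and otherwise fall back to evaluating f once per element.
```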
> We need to assume that `f(0)` is constant and side-effect free if we want the O(nnz) optimization when the first evaluation of `f(0) == 0`.

> Without that optimization, broadcasting on sparse arrays is pretty much useless.

> We simply don't have any generic mechanism that flags or otherwise detects a "pure" function.
^ These :).
Okay, I take your point regarding usability for the case that `f(0) == 0`. It is a shame that we don't have a mechanism for determining purity straightforwardly...
Could this assumption regarding purity please be highlighted in both `SparseArrays` and `broadcast`'s documentation / docstrings? If there's an overhaul planned, then maybe now is a good time to do it?
> Could this assumption regarding purity please be highlighted in both `SparseArrays` and `broadcast`'s documentation / docstrings?
Pull requests to make this more explicit would be welcome. You're in a good position to write them, since you know what users in a similar situation to yours might be looking for.
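For example, such a PR might add a note along these lines (suggested wording only, not actual docstring text, using a Documenter-style admonition):

```
!!! note
    Broadcasting over sparse arrays assumes the broadcasted function is pure:
    the result for the zero value may be computed only once and reused for all
    non-stored entries. Functions with side effects (e.g. closures over an RNG)
    may therefore give results that differ from broadcasting over the dense
    equivalent.
```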
See also https://github.com/JuliaLang/julia/issues/22733.