IEEE 754 has the unfortunate consequence that floating point numbers don't satisfy `x*0 == 0`. That `x*0 == 0` holds is fundamental to the decoupling between the symbolic and numerical computations for sparse matrices, which is arguably one of the most important optimizations for sparse matrix algorithms. Right now, Julia's sparse linear algebra code uses the sparsity pattern optimization extensively and is therefore not IEEE compliant, e.g.
```julia
julia> eye(2)*[NaN,1.0]
2-element Array{Float64,1}:
 NaN
 NaN

julia> speye(2)*[NaN,1.0]
2-element Array{Float64,1}:
 NaN
 1.0
```
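For intuition, the sparse result follows from the CSC mat-vec kernel only ever visiting stored entries, so the product `0 * NaN` at unstored positions is never formed. A minimal sketch of such a kernel (my own simplified version, not the Base implementation):

```julia
# Simplified CSC mat-vec: only stored entries participate, so an unstored
# A[i,j] never multiplies x[j] and NaN/Inf in x cannot leak through it.
function csc_matvec(colptr, rowval, nzval, x, m)
    y = zeros(eltype(nzval), m)
    for j in 1:length(x)
        for p in colptr[j]:(colptr[j+1]-1)
            y[rowval[p]] += nzval[p] * x[j]
        end
    end
    return y
end
```

For `speye(2)`, column 1 stores only row 1, so `y[2]` never sees `0.0 * NaN` and stays `1.0`, whereas the dense kernel computes `0.0*NaN + 1.0*1.0 = NaN`.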
Another example is
```julia
julia> (-eye(2))[1,2]
-0.0

julia> (-speye(2))[1,2]
0.0
```
Notice that IEEE compliance for the latter example (with the current sparse matrix implementation) would require storing `2*n^2 + n + 1` instead of `3n + 1` elements. This is also known as `MemoryError`.
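For reference, those counts follow from the CSC layout (`colptr` with `n + 1` entries, plus one `rowval` and one `nzval` entry per stored value); a throwaway helper (the name is mine) makes the comparison concrete:

```julia
# Storage for an n×n SparseMatrixCSC with k stored entries:
# (n + 1) colptr entries + k rowval entries + k nzval entries.
csc_storage(n, k) = (n + 1) + 2k

csc_storage(10_000, 10_000)     # diagonal only: 3n + 1 = 30_001
csc_storage(10_000, 10_000^2)   # every -0.0 stored: 2n^2 + n + 1 = 200_010_001
```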
The main question in this issue is to choose one of the two options below for sparse matrices (and probably all types with structural zeros), where `0` is a structural zero:

1. `x*0 == 0` for all `x`
2. `x*0 != 0` for some `x`, aka IEEE 754

I think that 1. is the best choice since I believe the symbolic/numerical decoupling is very important when working with sparse matrices, and I'm not sure if 2. is desirable because of the computational and memory costs, or if it is possible to achieve at all. I.e. one thing is that it would be a lot of work to handle the `NaN` and `Inf` cases in all of our sparse code, and it is unclear who would deliver the hours to do so, but what about sparse direct solvers? They optimize over the sparsity pattern without considering `NaN`s or `Inf`s and I'm not sure how we could handle that case. In theory, we could also choose 2. for some linear algebra functions and 1. for others, but I think that would be more confusing and error prone.
However, we'd need to figure out the implications of going with 1. Should we consider throwing more often instead of creating `NaN`s and `Inf`s during sparse calculations, e.g. division by zero? What about sparse broadcast, which now (I believe) follows IEEE 754? Since `broadcast` allows all kinds of operators, things are much more complicated there. If sparse broadcast behaves differently from sparse linear algebra, we'd need to be careful when using broadcast for convenience in the linear algebra code (but I think that would be pretty simple to handle).
Finally, after discussing this issue in a couple of PRs, I'm pretty confident that we won't be able to reach a unanimous conclusion, so when we think we understand the consequences, we'll probably need a vote. Otherwise, we'll keep discussing this every time a PR touches one of the affected methods.
Option 1 results in undesirable behavioral discrepancies between sparse and dense representations of the same underlying mathematical object. Sparse arrays should be strictly a performance optimization, and not have vastly different behavior than their dense equivalents. If we can do IEEE754 correctly in a dense version of an operation, then we should do simple checks in sparse method implementations to get as compatible as reasonably possible.
Other than broadcast, we haven't especially tried, and it's led to obviously wrong behavior several times. Checking for non-finite elements and falling back to a dense implementation can be more correct than what we have. It'll be expensive in corner cases that don't occur very often, but I think that's preferable to throwing or silently calculating a wrong answer.
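A minimal sketch of that check-and-branch idea (the helper name and the `full` fallback are mine, not a proposed Base API):

```julia
# Fast structural path when it is safe, dense IEEE-consistent path otherwise.
function checked_spmatvec(A::SparseMatrixCSC, x::AbstractVector)
    if all(isfinite, x)
        return A * x          # usual kernel that skips structural zeros
    else
        return full(A) * x    # rare slow path; every 0 * NaN/Inf is actually formed
    end
end
```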
I forgot to mention that I tested SuiteSparse, SciPy, and MATLAB. All three seem to follow 1. since they give the same result as us for the calculation `speye(2)*[NaN,1.0]`. I gave up on PETSc.
> Sparse arrays should be strictly a performance optimization, and not have vastly different behavior than their dense equivalents.
I don't agree, but that is what this issue eventually will decide.
Try `speye(2) * Inf`, a simpler case where I don't think you'll see unanimous behavior in other buggy libraries. Last I checked, Matlab does make an effort at IEEE754 consistency in some operations.
Given BLAS and LAPACK have bugs in some operations when inputs have non-finite data, I don't think it's unreasonable to be careful about the corner cases where handling them with a check and branch is simple, and to think more carefully about the costs where the implementation would be more complicated, like sparse-sparse multiplication or factorizations. Otherwise we're saying sparse-dense inconsistency is desirable because we're not willing to do a better job even when it wouldn't cost much in implementation complexity or runtime cost for the common cases?
+1 for documenting (perhaps as a warning admonition) that IEEE semantics are not fulfilled in sparse linear algebra, or alternatively are only a best effort, and that if someone cares strongly about it, they could implement it as long as performance doesn't suffer (too much).
Seeing how pretty much the whole sparse linear algebra stack used throughout the decades has made this choice, I don't think we should worry much about it.
> I don't agree but that is what this issue eventually will decide
What are they then, if not a representation of the same mathematical object? Should inserting a stored zero significantly change the behavior of a sparse array? What should `sparse .^ sparse`, `sparse .^ 0`, and `sparse ./ sparse` return? Are https://github.com/JuliaLang/julia/issues/21515#issuecomment-313868609 and #5824 bugs that should be fixed, or acceptable discrepancies? It's a bit dishonest to say the element type is really behaving as `Float64` if we're not following IEEE754 as best we can.
Negative zero is an interesting case where IEEE754 doesn't follow `(a1 == a2 && b1 == b2)` implies `(op(a1, b1) == op(a2, b2))`. I believe the only non-NaN violation of that is in the sign of an infinity result, but if anyone has any other examples please correct me there (edit: branch cuts of log, or angle, for various quadrants of complex zero would also differ). Consistency of `op(sparse(A), sparse(B)) ≈ op(Array(A), Array(B))` may actually be simpler to achieve for infinity and NaN values than for negative zeros.
I don't think being careful about infinity and NaN would be especially high cost in anything BLAS-2-like or simpler.
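A concrete instance of that non-NaN violation, for reference:

```julia
-0.0 == 0.0            # true: negative zero compares equal to positive zero
(1 / -0.0, 1 / 0.0)    # (-Inf, Inf): equal inputs, results differ in the sign of the infinity
```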
In IEEE754, `0.0` and `-0.0` double function as true zero and as a (signed) very small number (e.g. `1/(-0.0) == -Inf` follows the small signed number interpretation).

One can argue that structural zeros are not IEEE754 zeros and that the elements of a sparse matrix are either IEEE754 floating point numbers or structural zeros (so strictly speaking a union type).

From this perspective, `speye(2)*[Inf,1.0] == [Inf, 1.0]` would be a consequence of structural zeros being unsigned true zeros, following to some extent the laws of Julia's Boolean zeros (`== Any[1.0 false;false 1.0]*[Inf,1.0]`).
That seems like a bug in boolean multiplication, `Inf * false` should probably be `NaN`.
I doubt that `+ (i==j)*z` should produce `NaN` for `z = Inf` and `i != j` (e.g. `false` is not a "zero or very close to zero").
yes it is
```julia
julia> convert(Float64, false)
0.0

julia> *(promote(Inf, false)...)
NaN
```
But there is something to the idea that structural zeros are different from IEEE `0.0`, otherwise `-speye(10000, 10000)` would need to be completely full, storing all the `-0.0`s, which strikes me as rather impractical.
> That seems like a bug in boolean multiplication, `Inf * false` should probably be `NaN`.
No, this is intentional: `bool*x` does `bool ? x : zero(x)`. Whether it should or not is another question, but it was necessary to make `im = Complex(false, true)` work. I believe the other way to make `im` work is to have a pure `Imaginary` type.
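A small illustration of that definition and why it sidesteps the `NaN` (the `strongmul` name is mine, just to spell out the branch):

```julia
# Bool multiplication branches instead of promoting, acting as a "strong zero".
strongmul(b::Bool, x::Number) = b ? x : zero(x)

strongmul(false, Inf)       # 0.0 -- Inf is never multiplied, so no NaN
false * Inf                 # 0.0 -- Base behaves the same way
*(promote(Inf, false)...)   # NaN -- promoting false to 0.0 first gives IEEE 0.0 * Inf
```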
We should probably consider embracing the concept of a "strong zero" in the sense that `false` is right now, and declare structural zeros as such strong zeros. The problem is that `A[i,j]` doesn't tell you whether it acts as such a strong zero or not. Returning `false` for `getindex` of structural zeros would give a very coherent picture semantically, wouldn't it? Too bad this is likely a no-go performance-wise. But then again, blindly accessing elements of a sparse matrix without ensuring beforehand to omit structural zeros is sub-optimal anyway. So combined with a way to efficiently iterate over stored elements...
I'm increasingly inclined to just unconditionally store values assigned into a sparse matrix, regardless of whether they are zero or not. So `A[i,j] = ±0.0` would create a stored position even though the values are equal to zero. If you're assigning to enough positions in a sparse matrix that this matters, then you're already hosed anyway. Completely separating the structural computation from the numerical one is such a nice clear line in the sand. Of course, many operations do both, but if the structural part of an operation is completely independent of the computational part, it's much simpler to explain and understand what's happening. My suspicion is that all of the "optimizations" that don't store zeros that are created in a sparse matrix are useless anyway since the whole premise of a sparse structure is that you never actually touch most of its elements. By that premise you're not going to be assigning to most positions, so it's fine to store the ones you do. This viewpoint dovetails nicely with the concept of structural zeros as "strong zeros".
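Under that proposal, something like the following would hold (hypothetical behavior, sketched with the current API):

```julia
A = speye(2)     # 2 stored entries
A[1,2] = -0.0    # would insert a stored entry even though the value equals zero
nnz(A)           # would then be 3 under the proposal
```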
I agree with @tkelman that it would be better to follow IEEE semantics if that's possible without killing performance. Indeed, `getindex` does not return `false` or a special "strong zero" type when extracting non-stored zeros: it returns `0.0`. If Base methods for sparse arrays do not follow IEEE semantics, they are inconsistent with what user-written algorithms would "naively" compute when working with a sparse array just like any other `AbstractArray`: they would break the interface.

In practice, I suppose that's generally not an issue, in particular in existing languages/toolkits, which are much less powerful than Julia in terms of abstractions and of the ability to define new types. This inconsistency turns out to be annoying in some corner cases, and these `NaN`/`Inf` issues are subtle enough that we don't need to aggravate matters.
The downside with strictly following IEEE semantics, of course, is that `-A` would replace all structural zeros of `A` with stored `-0.0`s, which may be even more annoying than not following it.
Indeed, that would be terrible. Though that could be handled by allowing for a custom default value, which would also fix other issues (notably the `x/0` problem related to `NaN`).
> that could be handled by allowing for a custom default value

still `[A -A]` = 🔥
The `Union{Bool, Float64}` model describes how sparse matrix algebra behaves, but neither this nor IEEE forces any meaning onto the indexing operation. For

> The problem is that A[i,j] doesn't tell you whether it acts as such a strong zero or not.

it should be enough for practical purposes to add something like an `isstored(A, i, j)` function.
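Such an `isstored` could be as simple as the following sketch (again over the `SparseMatrixCSC` fields; a real implementation would presumably use a binary search on the row indices):

```julia
# True iff position (i, j) has a stored entry, regardless of the stored value.
function isstored(A::SparseMatrixCSC, i::Integer, j::Integer)
    for p in A.colptr[j]:(A.colptr[j+1] - 1)
        A.rowval[p] == i && return true
    end
    return false
end
```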