Julia: log(matrix) issue

Created on 13 Jun 2019  Β·  16Comments  Β·  Source: JuliaLang/julia

A similar issue was discussed in #21179. That issue has been fixed, as the example given calculates correctly

julia> B = [3 2; -5 -3]
2Γ—2 Array{Int64,2}:
  3   2
 -5  -3

julia> exp(log(B))
2Γ—2 Array{Complex{Float64},2}:
  3.0+3.9276e-16im    2.0+1.30473e-16im
 -5.0-1.26787e-15im  -3.0-4.20143e-16im

But if we let C=10 B then we encounter a problem

julia> C = [30 20; -50 -30]
2Γ—2 Array{Int64,2}:
  30   20
 -50  -30

julia> exp(log(C))
2Γ—2 Array{Complex{Float64},2}:
   -30.0-8.57143im  -14.2857-17.1429im
 44.2857-17.1429im      30.0+8.57143im

This approach gives the correct answer

julia> using LinearAlgebra

julia> function logmat(A)
       Ξ›, S = eigen(A)
       return S*(log.(Complex.(Ξ›)).*inv(S))
   end
logmat (generic function with 1 method)

julia> exp(logmat(C))
2Γ—2 Array{Complex{Float64},2}:
  30.0+0.0im   20.0+0.0im
 -50.0+0.0im  -30.0+0.0im
bug linear algebra

Most helpful comment

Higham's formulae seem to rely on arcane features of Matlab floating point operations, which differ from the ISO C99ff standards followed (faithfully, AFAICT) by Julia. I propose a different sign based on analysis of the ISO-style branch cut:

# this is the fragile thing we are trying to represent
function f_baseline(a,b)
    log(b) - log(a)
end
# this is Higham's version (bottom of p.279):
function f_higham1(a,b)
    z = (b-a)/(b+a)
    log((1.0 + z)/(1.0 - z)) + 2.0*im*pi*(unw(log(b)-log(a)))
end
# this is the form in 11.27
function f_higham2(a,b)
    z = (b-a)/(b+a)
    u = unw(log1p(z) - log1p(-z))
    2.0 * (atanh(z) + im*pi*(unw(log(b)-log(a)) + u))
end

# this is the form in 11.27, with changed sign
function f_proposed(a,b)
    z = (b-a)/(b+a)
    u = -unw(log1p(z) - log1p(-z))
    2.0 * (atanh(z) + im*pi*(unw(log(b)-log(a)) + u))
end
function atest(n=1024)
    a1 = 10.0 * (rand(n) .- 0.5) .+ 0im
    a2 = 10.0 * (rand(n) .- 0.5) .+ 0im
    y0 = f_baseline.(a1,a2)
    y1 = f_higham1.(a1,a2)
    y2 = f_higham2.(a1,a2)
    y3 = f_proposed.(a1,a2)
    println("Higham 1: ",norm(y1-y0))
    println("Higham 2: ",norm(y2-y0))
    println("Proposed: ",norm(y3-y0))
end

Typical results:

julia-1.1> atest()
Higham 1: 99.74247458479297
Higham 2: 201.06192982974676
Proposed: 7.368633527567163e-12

This is because Higham's formulae (in Julia) sometimes pick strange sheets of the Riemann surface:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           z β”‚      baseline β”‚       higham1  β”‚       higham2  β”‚     proposed  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  -4.19+0im β”‚  -0.486+3.14im β”‚  -0.486+3.14im β”‚  -0.486+3.14im β”‚  -0.486+3.14im β”‚
β”‚   -122-0im β”‚ -0.0163-3.14im β”‚ -0.0163-3.14im β”‚ -0.0163-15.7im β”‚ -0.0163-3.14im β”‚
β”‚  0.236-0im β”‚       0.48+0im β”‚       0.48-0im β”‚       0.48-0im β”‚       0.48+0im β”‚
β”‚  0.594-0im β”‚       1.37+0im β”‚       1.37-0im β”‚       1.37-0im β”‚       1.37+0im β”‚
β”‚   1.35-0im β”‚     1.9+3.14im β”‚     1.9-3.14im β”‚     1.9-9.42im β”‚     1.9+3.14im β”‚
β”‚  -3.28+0im β”‚  -0.629+3.14im β”‚  -0.629+3.14im β”‚  -0.629+3.14im β”‚  -0.629+3.14im β”‚
β”‚   2.25+0im β”‚   0.955-3.14im β”‚   0.955-9.42im β”‚   0.955-3.14im β”‚   0.955-3.14im β”‚
β”‚  -40.9-0im β”‚ -0.0489-3.14im β”‚ -0.0489-3.14im β”‚ -0.0489-15.7im β”‚ -0.0489-3.14im β”‚
β”‚   4.59+0im β”‚   0.443-3.14im β”‚   0.443-9.42im β”‚   0.443-3.14im β”‚   0.443-3.14im β”‚
β”‚  0.334+0im β”‚      0.694+0im β”‚      0.694+0im β”‚      0.694+0im β”‚      0.694+0im β”‚
β”‚ -0.885-0im β”‚       -2.8+0im β”‚       -2.8-0im β”‚       -2.8-0im β”‚       -2.8+0im β”‚
β”‚ -0.522-0im β”‚      -1.16+0im β”‚      -1.16-0im β”‚      -1.16-0im β”‚      -1.16+0im β”‚
β”‚   49.8+0im β”‚  0.0401-3.14im β”‚  0.0401-9.42im β”‚  0.0401-3.14im β”‚  0.0401-3.14im β”‚
β”‚   1.94-0im β”‚    1.14+3.14im β”‚    1.14-3.14im β”‚    1.14-9.42im β”‚    1.14+3.14im β”‚
β”‚   1.34+0im β”‚    1.94-3.14im β”‚    1.94-9.42im β”‚    1.94-3.14im β”‚    1.94-3.14im β”‚
β”‚     13+0im β”‚   0.154-3.14im β”‚   0.154-9.42im β”‚   0.154-3.14im β”‚   0.154-3.14im β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Arguments like the above with signed zeros are important for the matrix functions. Those extra multiples of 2 * pi * im put the terms out of the range of validity of the Pade forms, at least for the logarithm.

@stevengj I tried the formulation in your PR with the above sign change and see no trouble (for matrices which should be well-conditioned for log).

All 16 comments

This works:

mylog(X) = let S = schur(complex(X)); S.Z * log(UpperTriangular(S.T)) * S.Z'; end

so the problem seems to come from the combination of two Schur factorizations here?

It looks like the schur(complex(schur(C).T)).T that is computed here has a certain "unlucky" structure that is handled very inaccurately:

julia> U = UpperTriangular(schur(complex(schur(C).T)).T)
2Γ—2 UpperTriangular{Complex{Float64},Array{Complex{Float64},2}}:
 0.0+10.0im      -64.3428-18.9738im
     β‹…       -2.44249e-15-10.0im   

julia> exp(Matrix(log(U))) - U
2Γ—2 Array{Complex{Float64},2}:
 1.77636e-15-3.55271e-15im      128.686+37.9475im    
         0.0+0.0im          -1.9984e-15-1.77636e-15im

Here is a simpler example:

julia> log(UpperTriangular([10.0im 0; 0 -10.0im]))
2Γ—2 UpperTriangular{Complex{Float64},Array{Complex{Float64},2}}:
 2.30259+1.5708im      NaN+NaN*im  
         β‹…         2.30259-1.5708im

Not sure where these NaNs are coming from, but I'm guessing that's related…

I've verified that the original code at http://eprints.ma.man.ac.uk/1851/02/logm.zip does not seem to have this problem (the result for logm_new(C) seems accurate).

However, it does have some division-by-zero warnings:

octave:10> U = [10.0i 0; 0 -10.0i]
U =
    0 + 10i    0 +  0i
    0 +  0i   -0 - 10i

octave:11> logm_new(U)
warning: division by zero
warning: called from
    logm_new>powerm2by2 at line 179 column 6
    logm_new at line 87 column 20
warning: division by zero
warning: called from
    logm_new>logm2by2 at line 228 column 7
    logm_new at line 100 column 20
ans =
   2.30259 + 1.57080i   0.00000 + 0.00000i
   0.00000 + 0.00000i   2.30259 - 1.57080i

cc @iamnapo, since this code is from #21571

The NaN in my log(UpperTriangular([10.0im 0; 0 -10.0im])) example above seems to be coming from this line.

In particular, Akp1 == -10im and Ak == 10im, so (Akp1 - Ak) / (Akp1 + Ak) gives NaN + NaN*im

Is there a reason to do the first Schur factorization as real?

@ztultrebor, it's just a performance optimization, I think, since a real Schur factorization is faster than the complex one (but may only be quasi-upper-triangular, leading to the second Schur factorization for cases with complex eigenvalues). In any case, it looks like the underlying problem comes elsewhere, from the fact that log(::UpperTriangular) is inaccurate in some cases.

@stevengj There's definitely a problem (or two) with the log(::UpperTriangular) method.

While the improved mylog function works for a small random matrix:

julia> mylog(X) = let S = schur(complex(X)); S.Z * log(UpperTriangular(S.T)) * S.Z'; end
mylog (generic function with 1 method)

julia> D2 = randn(2,2)
2Γ—2 Array{Float64,2}:
 -0.0532145  1.41135
 -0.03215    1.4111

julia> exp(mylog(D2)) - D2
2Γ—2 Array{Complex{Float64},2}:
 -2.77556e-17+1.56125e-17im  -6.66134e-16+1.38778e-16im
 -6.93889e-18+6.07153e-18im  -2.22045e-16-1.04083e-17im

it still fails for a slightly larger random matrix:

julia> D6 = randn(6,6)
6Γ—6 Array{Float64,2}:
  0.90881    0.336947    0.843843   0.918692  -1.05808   -0.179687
  1.49618    0.375474   -0.437477   1.98801   -1.0208    -0.381434
  1.09977   -0.595175   -0.718849   0.597049  -1.90258    0.0568476
 -1.6915    -0.601828    0.52197   -0.692272  -0.473441   1.38306
 -1.27279    1.69424     0.24387    0.498871  -2.3394     0.286934
  0.457298  -0.0873094  -0.199842   0.851316  -0.371774  -0.050287

julia> exp(mylog(D6)) - D6
6Γ—6 Array{Complex{Float64},2}:
 -0.0373332-0.0413267im  0.00640153+0.00270801im  …  0.0674095-0.0659298im   0.0314105+0.0050294im
  0.0733027-0.171132im    -0.110669-0.229787im       -0.015317+0.265689im   -0.0314504+0.158948im
  0.0851588-0.915255im    -0.407165-0.90443im          0.31459+0.695568im    0.0492944+0.664101im
  -0.070744-0.283477im     -0.06765-0.17742im          0.22292-0.0141363im   0.0841001+0.146827im
 -0.0921727-0.97041im     -0.321872-0.765978im          0.5693+0.306179im     0.181584+0.593536im
  0.0366271-0.057668im   -0.0444716-0.0900447im   …   -0.02057+0.117721im   -0.0190502+0.0607899im

The Schur factorization is fine:

julia> D6schur = schur(D6);

julia> D6schur.Z*D6schur.T*D6schur.Z' - D6
6Γ—6 Array{Float64,2}:
  2.77556e-15  5.55112e-17   0.0           1.22125e-15  -1.33227e-15   6.93889e-16
  1.11022e-15  9.99201e-16  -6.10623e-16  -4.44089e-16   0.0           0.0
 -2.22045e-16  5.55112e-16  -1.44329e-15  -2.88658e-15  -3.55271e-15   4.02456e-16
 -1.77636e-15  3.33067e-16  -6.66134e-16   5.55112e-16  -3.4972e-15    6.66134e-16
 -2.44249e-15  6.66134e-16  -2.41474e-15  -7.21645e-16  -7.54952e-15  -1.11022e-16
 -9.4369e-16   4.57967e-16  -2.77556e-17  -1.11022e-15   1.66533e-16   4.51028e-16

Also, this is broken

julia> G = [1 1; 0 1]
2Γ—2 Array{Int64,2}:
 1  1
 0  1

julia> log(G)
ERROR: MethodError: no method matching log(::UpperTriangular{Complex{Int64},Array{Complex{Int64},2}})
Closest candidates are:
  log(::Float16) at math.jl:1018
  log(::Complex{Float16}) at math.jl:1019
  log(::Float64) at special/log.jl:254
  ...

The Compute accurate superdiagonal of T code that is giving NaN in my example is from Higham's Functions of Matrices, chapter 11:
image
image

In particular, note the comment about it depending on the definition of the atanh function β€” since our code was ported from from Matlab code, we have to be careful that the atanh definition is the same.

I found a typo in the code that seems to fix the problem. I also think we should update to use (11.27) from Higham's book since our atanh follows the IEEE 754 definition. Will have a PR soon, I hope.

I also get

julia> norm(exp(log(D6)) - D6) / norm(D6)
1.9415387506094363e-15

for your bigger example above.

(There has got to be a better way to compute log(x/y) / (x-y) than this crazy complicated formula in Higham's book.)

@ztultrebor, the issue will be closed when the bugfix PR is merged.

Higham's formulae seem to rely on arcane features of Matlab floating point operations, which differ from the ISO C99ff standards followed (faithfully, AFAICT) by Julia. I propose a different sign based on analysis of the ISO-style branch cut:

# this is the fragile thing we are trying to represent
function f_baseline(a,b)
    log(b) - log(a)
end
# this is Higham's version (bottom of p.279):
function f_higham1(a,b)
    z = (b-a)/(b+a)
    log((1.0 + z)/(1.0 - z)) + 2.0*im*pi*(unw(log(b)-log(a)))
end
# this is the form in 11.27
function f_higham2(a,b)
    z = (b-a)/(b+a)
    u = unw(log1p(z) - log1p(-z))
    2.0 * (atanh(z) + im*pi*(unw(log(b)-log(a)) + u))
end

# this is the form in 11.27, with changed sign
function f_proposed(a,b)
    z = (b-a)/(b+a)
    u = -unw(log1p(z) - log1p(-z))
    2.0 * (atanh(z) + im*pi*(unw(log(b)-log(a)) + u))
end
function atest(n=1024)
    a1 = 10.0 * (rand(n) .- 0.5) .+ 0im
    a2 = 10.0 * (rand(n) .- 0.5) .+ 0im
    y0 = f_baseline.(a1,a2)
    y1 = f_higham1.(a1,a2)
    y2 = f_higham2.(a1,a2)
    y3 = f_proposed.(a1,a2)
    println("Higham 1: ",norm(y1-y0))
    println("Higham 2: ",norm(y2-y0))
    println("Proposed: ",norm(y3-y0))
end

Typical results:

julia-1.1> atest()
Higham 1: 99.74247458479297
Higham 2: 201.06192982974676
Proposed: 7.368633527567163e-12

This is because Higham's formulae (in Julia) sometimes pick strange sheets of the Riemann surface:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚           z β”‚      baseline β”‚       higham1  β”‚       higham2  β”‚     proposed  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  -4.19+0im β”‚  -0.486+3.14im β”‚  -0.486+3.14im β”‚  -0.486+3.14im β”‚  -0.486+3.14im β”‚
β”‚   -122-0im β”‚ -0.0163-3.14im β”‚ -0.0163-3.14im β”‚ -0.0163-15.7im β”‚ -0.0163-3.14im β”‚
β”‚  0.236-0im β”‚       0.48+0im β”‚       0.48-0im β”‚       0.48-0im β”‚       0.48+0im β”‚
β”‚  0.594-0im β”‚       1.37+0im β”‚       1.37-0im β”‚       1.37-0im β”‚       1.37+0im β”‚
β”‚   1.35-0im β”‚     1.9+3.14im β”‚     1.9-3.14im β”‚     1.9-9.42im β”‚     1.9+3.14im β”‚
β”‚  -3.28+0im β”‚  -0.629+3.14im β”‚  -0.629+3.14im β”‚  -0.629+3.14im β”‚  -0.629+3.14im β”‚
β”‚   2.25+0im β”‚   0.955-3.14im β”‚   0.955-9.42im β”‚   0.955-3.14im β”‚   0.955-3.14im β”‚
β”‚  -40.9-0im β”‚ -0.0489-3.14im β”‚ -0.0489-3.14im β”‚ -0.0489-15.7im β”‚ -0.0489-3.14im β”‚
β”‚   4.59+0im β”‚   0.443-3.14im β”‚   0.443-9.42im β”‚   0.443-3.14im β”‚   0.443-3.14im β”‚
β”‚  0.334+0im β”‚      0.694+0im β”‚      0.694+0im β”‚      0.694+0im β”‚      0.694+0im β”‚
β”‚ -0.885-0im β”‚       -2.8+0im β”‚       -2.8-0im β”‚       -2.8-0im β”‚       -2.8+0im β”‚
β”‚ -0.522-0im β”‚      -1.16+0im β”‚      -1.16-0im β”‚      -1.16-0im β”‚      -1.16+0im β”‚
β”‚   49.8+0im β”‚  0.0401-3.14im β”‚  0.0401-9.42im β”‚  0.0401-3.14im β”‚  0.0401-3.14im β”‚
β”‚   1.94-0im β”‚    1.14+3.14im β”‚    1.14-3.14im β”‚    1.14-9.42im β”‚    1.14+3.14im β”‚
β”‚   1.34+0im β”‚    1.94-3.14im β”‚    1.94-9.42im β”‚    1.94-3.14im β”‚    1.94-3.14im β”‚
β”‚     13+0im β”‚   0.154-3.14im β”‚   0.154-9.42im β”‚   0.154-3.14im β”‚   0.154-3.14im β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Arguments like the above with signed zeros are important for the matrix functions. Those extra multiples of 2 * pi * im put the terms out of the range of validity of the Pade forms, at least for the logarithm.

@stevengj I tried the formulation in your PR with the above sign change and see no trouble (for matrices which should be well-conditioned for log).

This took me a second, but it's key to reading the table above that the "baseline" and "proposed" columns always match.

@RalphAS are there values of z for which f_baseline gives the wrong answer?

What about this?

function matrix_log_taylor(A, i=256)
    B0 = A - I
    R = copy(B0)
    B = copy(B0)
    for n=2:i
        B *= B0
        R += (-1)^(n+1)*B/n
    end
    return R
end

function logmat(A)
    Ξ›, S = eigen(A)
    if length(Set(Ξ›))==length(Ξ›) 
        # no repeated eigenvalues
        return S*(log.(complex(Ξ›)).*inv(S))
    else   
        # repeated eigenvalues
        n = 0
        while norm(A-I)>1
            A = sqrt(A)
            n += 1
        end
        return 2^n*matrix_log_taylor(A)
    end
end

See here p.33.
This approach seems to be more correct and faster than the default method.

Was this page helpful?
0 / 5 - 0 ratings