A similar issue was discussed in #21179. That issue has been fixed, as the example given calculates correctly
julia> B = [3 2; -5 -3]
2Γ2 Array{Int64,2}:
3 2
-5 -3
julia> exp(log(B))
2Γ2 Array{Complex{Float64},2}:
3.0+3.9276e-16im 2.0+1.30473e-16im
-5.0-1.26787e-15im -3.0-4.20143e-16im
But if we let C=10 B
then we encounter a problem
julia> C = [30 20; -50 -30]
2Γ2 Array{Int64,2}:
30 20
-50 -30
julia> exp(log(C))
2Γ2 Array{Complex{Float64},2}:
-30.0-8.57143im -14.2857-17.1429im
44.2857-17.1429im 30.0+8.57143im
This approach gives the correct answer
julia> using LinearAlgebra
julia> function logmat(A)
Ξ, S = eigen(A)
return S*(log.(Complex.(Ξ)).*inv(S))
end
logmat (generic function with 1 method)
julia> exp(logmat(C))
2Γ2 Array{Complex{Float64},2}:
30.0+0.0im 20.0+0.0im
-50.0+0.0im -30.0+0.0im
This works:
mylog(X) = let S = schur(complex(X)); S.Z * log(UpperTriangular(S.T)) * S.Z'; end
so the problem seems to come from the combination of two Schur factorizations here?
It looks like the schur(complex(schur(C).T)).T
that is computed here has a certain "unlucky" structure that is handled very inaccurately:
julia> U = UpperTriangular(schur(complex(schur(C).T)).T)
2Γ2 UpperTriangular{Complex{Float64},Array{Complex{Float64},2}}:
0.0+10.0im -64.3428-18.9738im
β
-2.44249e-15-10.0im
julia> exp(Matrix(log(U))) - U
2Γ2 Array{Complex{Float64},2}:
1.77636e-15-3.55271e-15im 128.686+37.9475im
0.0+0.0im -1.9984e-15-1.77636e-15im
Here is a simpler example:
julia> log(UpperTriangular([10.0im 0; 0 -10.0im]))
2Γ2 UpperTriangular{Complex{Float64},Array{Complex{Float64},2}}:
2.30259+1.5708im NaN+NaN*im
β
2.30259-1.5708im
Not sure where these NaNs are coming from, but I'm guessing that's relatedβ¦
I've verified that the original code at http://eprints.ma.man.ac.uk/1851/02/logm.zip does not seem to have this problem (the result for logm_new(C)
seems accurate).
However, it does have some division-by-zero warnings:
octave:10> U = [10.0i 0; 0 -10.0i]
U =
0 + 10i 0 + 0i
0 + 0i -0 - 10i
octave:11> logm_new(U)
warning: division by zero
warning: called from
logm_new>powerm2by2 at line 179 column 6
logm_new at line 87 column 20
warning: division by zero
warning: called from
logm_new>logm2by2 at line 228 column 7
logm_new at line 100 column 20
ans =
2.30259 + 1.57080i 0.00000 + 0.00000i
0.00000 + 0.00000i 2.30259 - 1.57080i
cc @iamnapo, since this code is from #21571
The NaN
in my log(UpperTriangular([10.0im 0; 0 -10.0im]))
example above seems to be coming from this line.
In particular, Akp1 == -10im
and Ak == 10im
, so (Akp1 - Ak) / (Akp1 + Ak)
gives NaN + NaN*im
Is there a reason to do the first Schur factorization as real?
@ztultrebor, it's just a performance optimization, I think, since a real Schur factorization is faster than the complex one (but may only be quasi-upper-triangular, leading to the second Schur factorization for cases with complex eigenvalues). In any case, it looks like the underlying problem comes elsewhere, from the fact that log(::UpperTriangular)
is inaccurate in some cases.
@stevengj There's definitely a problem (or two) with the log(::UpperTriangular)
method.
While the improved mylog
function works for a small random matrix:
julia> mylog(X) = let S = schur(complex(X)); S.Z * log(UpperTriangular(S.T)) * S.Z'; end
mylog (generic function with 1 method)
julia> D2 = randn(2,2)
2Γ2 Array{Float64,2}:
-0.0532145 1.41135
-0.03215 1.4111
julia> exp(mylog(D2)) - D2
2Γ2 Array{Complex{Float64},2}:
-2.77556e-17+1.56125e-17im -6.66134e-16+1.38778e-16im
-6.93889e-18+6.07153e-18im -2.22045e-16-1.04083e-17im
it still fails for a slightly larger random matrix:
julia> D6 = randn(6,6)
6Γ6 Array{Float64,2}:
0.90881 0.336947 0.843843 0.918692 -1.05808 -0.179687
1.49618 0.375474 -0.437477 1.98801 -1.0208 -0.381434
1.09977 -0.595175 -0.718849 0.597049 -1.90258 0.0568476
-1.6915 -0.601828 0.52197 -0.692272 -0.473441 1.38306
-1.27279 1.69424 0.24387 0.498871 -2.3394 0.286934
0.457298 -0.0873094 -0.199842 0.851316 -0.371774 -0.050287
julia> exp(mylog(D6)) - D6
6Γ6 Array{Complex{Float64},2}:
-0.0373332-0.0413267im 0.00640153+0.00270801im β¦ 0.0674095-0.0659298im 0.0314105+0.0050294im
0.0733027-0.171132im -0.110669-0.229787im -0.015317+0.265689im -0.0314504+0.158948im
0.0851588-0.915255im -0.407165-0.90443im 0.31459+0.695568im 0.0492944+0.664101im
-0.070744-0.283477im -0.06765-0.17742im 0.22292-0.0141363im 0.0841001+0.146827im
-0.0921727-0.97041im -0.321872-0.765978im 0.5693+0.306179im 0.181584+0.593536im
0.0366271-0.057668im -0.0444716-0.0900447im β¦ -0.02057+0.117721im -0.0190502+0.0607899im
The Schur factorization is fine:
julia> D6schur = schur(D6);
julia> D6schur.Z*D6schur.T*D6schur.Z' - D6
6Γ6 Array{Float64,2}:
2.77556e-15 5.55112e-17 0.0 1.22125e-15 -1.33227e-15 6.93889e-16
1.11022e-15 9.99201e-16 -6.10623e-16 -4.44089e-16 0.0 0.0
-2.22045e-16 5.55112e-16 -1.44329e-15 -2.88658e-15 -3.55271e-15 4.02456e-16
-1.77636e-15 3.33067e-16 -6.66134e-16 5.55112e-16 -3.4972e-15 6.66134e-16
-2.44249e-15 6.66134e-16 -2.41474e-15 -7.21645e-16 -7.54952e-15 -1.11022e-16
-9.4369e-16 4.57967e-16 -2.77556e-17 -1.11022e-15 1.66533e-16 4.51028e-16
Also, this is broken
julia> G = [1 1; 0 1]
2Γ2 Array{Int64,2}:
1 1
0 1
julia> log(G)
ERROR: MethodError: no method matching log(::UpperTriangular{Complex{Int64},Array{Complex{Int64},2}})
Closest candidates are:
log(::Float16) at math.jl:1018
log(::Complex{Float16}) at math.jl:1019
log(::Float64) at special/log.jl:254
...
The Compute accurate superdiagonal of T
code that is giving NaN in my example is from Higham's Functions of Matrices, chapter 11:
In particular, note the comment about it depending on the definition of the atanh
function β since our code was ported from from Matlab code, we have to be careful that the atanh
definition is the same.
I found a typo in the code that seems to fix the problem. I also think we should update to use (11.27) from Higham's book since our atanh
follows the IEEE 754 definition. Will have a PR soon, I hope.
I also get
julia> norm(exp(log(D6)) - D6) / norm(D6)
1.9415387506094363e-15
for your bigger example above.
(There has got to be a better way to compute log(x/y) / (x-y)
than this crazy complicated formula in Higham's book.)
@ztultrebor, the issue will be closed when the bugfix PR is merged.
Higham's formulae seem to rely on arcane features of Matlab floating point operations, which differ from the ISO C99ff standards followed (faithfully, AFAICT) by Julia. I propose a different sign based on analysis of the ISO-style branch cut:
# this is the fragile thing we are trying to represent
function f_baseline(a,b)
log(b) - log(a)
end
# this is Higham's version (bottom of p.279):
function f_higham1(a,b)
z = (b-a)/(b+a)
log((1.0 + z)/(1.0 - z)) + 2.0*im*pi*(unw(log(b)-log(a)))
end
# this is the form in 11.27
function f_higham2(a,b)
z = (b-a)/(b+a)
u = unw(log1p(z) - log1p(-z))
2.0 * (atanh(z) + im*pi*(unw(log(b)-log(a)) + u))
end
# this is the form in 11.27, with changed sign
function f_proposed(a,b)
z = (b-a)/(b+a)
u = -unw(log1p(z) - log1p(-z))
2.0 * (atanh(z) + im*pi*(unw(log(b)-log(a)) + u))
end
function atest(n=1024)
a1 = 10.0 * (rand(n) .- 0.5) .+ 0im
a2 = 10.0 * (rand(n) .- 0.5) .+ 0im
y0 = f_baseline.(a1,a2)
y1 = f_higham1.(a1,a2)
y2 = f_higham2.(a1,a2)
y3 = f_proposed.(a1,a2)
println("Higham 1: ",norm(y1-y0))
println("Higham 2: ",norm(y2-y0))
println("Proposed: ",norm(y3-y0))
end
Typical results:
julia-1.1> atest()
Higham 1: 99.74247458479297
Higham 2: 201.06192982974676
Proposed: 7.368633527567163e-12
This is because Higham's formulae (in Julia) sometimes pick strange sheets of the Riemann surface:
βββββββββββββββ¬ββββββββββββββββ¬βββββββββββββββββ¬βββββββββββββββββ¬ββββββββββββββββ
β z β baseline β higham1 β higham2 β proposed β
βββββββββββββββΌββββββββββββββββΌβββββββββββββββββΌβββββββββββββββββΌββββββββββββββββ€
β -4.19+0im β -0.486+3.14im β -0.486+3.14im β -0.486+3.14im β -0.486+3.14im β
β -122-0im β -0.0163-3.14im β -0.0163-3.14im β -0.0163-15.7im β -0.0163-3.14im β
β 0.236-0im β 0.48+0im β 0.48-0im β 0.48-0im β 0.48+0im β
β 0.594-0im β 1.37+0im β 1.37-0im β 1.37-0im β 1.37+0im β
β 1.35-0im β 1.9+3.14im β 1.9-3.14im β 1.9-9.42im β 1.9+3.14im β
β -3.28+0im β -0.629+3.14im β -0.629+3.14im β -0.629+3.14im β -0.629+3.14im β
β 2.25+0im β 0.955-3.14im β 0.955-9.42im β 0.955-3.14im β 0.955-3.14im β
β -40.9-0im β -0.0489-3.14im β -0.0489-3.14im β -0.0489-15.7im β -0.0489-3.14im β
β 4.59+0im β 0.443-3.14im β 0.443-9.42im β 0.443-3.14im β 0.443-3.14im β
β 0.334+0im β 0.694+0im β 0.694+0im β 0.694+0im β 0.694+0im β
β -0.885-0im β -2.8+0im β -2.8-0im β -2.8-0im β -2.8+0im β
β -0.522-0im β -1.16+0im β -1.16-0im β -1.16-0im β -1.16+0im β
β 49.8+0im β 0.0401-3.14im β 0.0401-9.42im β 0.0401-3.14im β 0.0401-3.14im β
β 1.94-0im β 1.14+3.14im β 1.14-3.14im β 1.14-9.42im β 1.14+3.14im β
β 1.34+0im β 1.94-3.14im β 1.94-9.42im β 1.94-3.14im β 1.94-3.14im β
β 13+0im β 0.154-3.14im β 0.154-9.42im β 0.154-3.14im β 0.154-3.14im β
ββββββββββββββ΄βββββββββββββββββ΄βββββββββββββββββ΄βββββββββββββββββ΄βββββββββββββββββ
Arguments like the above with signed zeros are important for the matrix functions. Those extra multiples of 2 * pi * im
put the terms out of the range of validity of the Pade forms, at least for the logarithm.
@stevengj I tried the formulation in your PR with the above sign change and see no trouble (for matrices which should be well-conditioned for log
).
This took me a second, but it's key to reading the table above that the "baseline" and "proposed" columns always match.
@RalphAS are there values of z
for which f_baseline
gives the wrong answer?
What about this?
function matrix_log_taylor(A, i=256)
B0 = A - I
R = copy(B0)
B = copy(B0)
for n=2:i
B *= B0
R += (-1)^(n+1)*B/n
end
return R
end
function logmat(A)
Ξ, S = eigen(A)
if length(Set(Ξ))==length(Ξ)
# no repeated eigenvalues
return S*(log.(complex(Ξ)).*inv(S))
else
# repeated eigenvalues
n = 0
while norm(A-I)>1
A = sqrt(A)
n += 1
end
return 2^n*matrix_log_taylor(A)
end
end
See here p.33.
This approach seems to be more correct and faster than the default method.
Most helpful comment
Higham's formulae seem to rely on arcane features of Matlab floating point operations, which differ from the ISO C99ff standards followed (faithfully, AFAICT) by Julia. I propose a different sign based on analysis of the ISO-style branch cut:
Typical results:
This is because Higham's formulae (in Julia) sometimes pick strange sheets of the Riemann surface:
Arguments like the above with signed zeros are important for the matrix functions. Those extra multiples of
2 * pi * im
put the terms out of the range of validity of the Pade forms, at least for the logarithm.@stevengj I tried the formulation in your PR with the above sign change and see no trouble (for matrices which should be well-conditioned for
log
).