Turing.jl: log: DomainError with -960.1841700397322

Created on 13 Aug 2020  Â·  22Comments  Â·  Source: TuringLang/Turing.jl

Running the code results in the following error

using Turing, ReverseDiff, Memoization
Turing.setadbackend(:reversediff)
Turing.setrdcache(true)

y1 = rand(10)
A1 = rand(10, 10)

function logsumexp(mat) 
    max_ = maximum(mat, dims=1)
    exp_mat = exp.(mat .- max_) .- (mat .== max_)
    sum_exp_ = sum(exp_mat, dims=1)
    res = log1p.(sum_exp_) .+ max_
end

@model function mwe(y, A, ::Type{T} = Vector{Float64}) where {T}
    n,m = size(A)
    scale = rand(10)
    vec_raw = T(undef, n)
    for i in eachindex(vec_raw)
        vec_raw[i] ~ truncated(Laplace(0, 1), ((10-30)./scale[i]), ((100-30)./scale[i]))
    end
    vec = vec_raw .* scale .+ 30
    μ = logsumexp(A .- vec_raw)[1,:]
    y .~ Normal.(μ, 1)
    return vec
end

model = mwe(y1, A1);
chain = sample(model, NUTS(.65), 300);

Error:

DomainError with -9.892202109140928:
log will only return a complex result if called with a complex argument. Try log(Complex(x)).

Full Stacktrace: https://pastebin.com/GzfY3Qfu

If I change

vec_raw[i] ~ truncated(Laplace(0, 1), ((10-30)./scale[i]), ((100-30)./scale[i]))

to

vec_raw[i] ~ truncated(Laplace(0, 1), 5, 6) # any number for bounds are fine

the model starts sampling but, this is not how I want to paremeterize the model.

The sampling starts if I make the change scale = ones(10) * any integer. The error resumes if I change to scale = rand(10:15, 10).

What's the problem here? Is there a way to fix this?

All 22 comments

As a rule, always start without caching then consider turning it on if you are sure that there are no branches in your code and double check that the results are the same for some toy runs.

Your implementation of logsumexp looks a bit strange to me. Can you check if the error persists with

function logsumexp(mat)
    maxima = maximum(mat, dims=1)
    exp_mat = exp.(mat .- maxima)
    sum_exp_mat = sum(exp_mat, dims=1)
    return maxima .+ log.(sum_exp_mat)
end

?
Maybe you could also just use StatsFuns.logsumexp (not sure how well it works with AD though).

@devmotion the error persists with the logsumexp you have provided. The implementation of logsumexp I'm using is even more numerically stable. Here it was implemented for a vector and I extended it to a matrix.

@mohamed82008 Thank you. Can you explain a bit more on what do branches in code mean? Would disabling caching slow down sampling?

Your implementation is different from the vector version in that it will yield incorrect results if the maxima are not uniquely attained.

@devmotion I didn't understand. Can you provide an example, I want to fix this function.

Edit:

I tested both function with an array with no unique maxima but the answers turned out to be equivalent.

function logsumexp(mat) 
    max_ = maximum(mat, dims=1)
    exp_mat = exp.(mat .- max_) .- (mat .== max_)
    sum_exp_ = sum(exp_mat, dims=1)
    res = log1p.(sum_exp_) .+ max_
end

function logsumexp1(mat)
    maxima = maximum(mat, dims=1)
    exp_mat = exp.(mat .- maxima)
    sum_exp_mat = sum(exp_mat, dims=1)
    return maxima .+ log.(sum_exp_mat)
end

b = ones(10, 10) .* 10
logsumexp(b) == logsumexp1(b) # true
logsumexp(b) ≈ logsumexp1(b) # true

Can you explain a bit more on what do branches in code mean? Would disabling caching slow down sampling?

Branches mean if statements and loops that run for a different number of iterations based on a condition. There could be no if statements in the model body but a function inside can be using one. So it's always best to check the results are identical when turning on caching and using the same seed. Disabling caching slows down sampling yes.

I tested both function with an array with no unique maxima but the answers turned out to be equivalent.

You chose an unfortunate scaling in your example. The following example illustrates that they are not equivalent and your implementation yields incorrect values in such a case:

julia> a = ones(10, 1);

julia> log.(sum(exp.(a); dims = 1))
1×1 Array{Float64,2}:
 3.3025850929940455

julia> logsumexp(a)
1×1 Array{Float64,2}:
 1.0

julia> logsumexp1(a)
1×1 Array{Float64,2}:
 3.302585092994046

BTW did you make sure to start a fresh Julia session when testing the alternative logsumexp implementation? I'm wondering if you accidentally used a cached version of the old implementation.

BTW did you make sure to start a fresh Julia session when testing the alternative logsumexp implementation? I'm wondering if you accidentally used a cached version of the old implementation.

I've used the cached version, my bad.


In your opinion, is the version of logsumexp I'm using really necessary to be used in Turing? Does it give more accuracy in calculations than the logsumexp1? Will it be a significant hit in accuracy if I used logsumexp1? I feel the cases covered by logsumexp are very corner.

I'm sorry, I just don't understand what you mean. I'm just pointing out that there are issues with your implementation of logsumexp - it is just plain wrong mathematically in certain cases, so instead of any potential (I doubt it matters in your example) increased numerical stability you'll get completely incorrect results.

Apart from that I think you should implement your model differently. I'm not sure what your mathematical model is - do you want scale to have a uniform distribution on the unit interval as prior distribution? There is also a mismatch between the number of scale parameters (10) and the number of indices i (n) - I assume it should be n as well? Moreover, it should be written as scale ~ filldist(Uniform(), n) instead of scale = rand(10) - otherwise Turing won't use the correct random number generators and not save the scale parameters correctly. In the definition of the scaled and truncated Laplace distributions you could just write

vec_raw[i] ~ truncated(Laplace(), -20 / scale[i], 70 / scale[i])

or replace the initialization of vec_raw and the for loop with

vec_raw ~ arraydist(@. truncated(Laplace(), -20 / scale, 70 / scale)

(then you can also remove T). Finally you could also write μ = vec(logsumexp(A .- vec_raw)) instead of creating a new array.

@mohamed82008 Is this related to issue in which the support of the distribution cannot be varied.

@devmotion

I'm sorry, I just don't understand what you mean

I completely agree with you on the mathematical incorretness of logsumexp. I should have said the logsumexp function for vectors described in the discourse thread I've posted instead of logsumexp. I wanted to know if I use accurately converted matrix version of that function, will I get any significant numerical accuracy. Does it matter? If it's not the case then, I can go back to using the version you have posted. I see that you said it might not be worth using it for my case.

do you want scale to have a uniform distribution on the unit interval as prior distribution? There is also a mismatch between the number of scale parameters (10) and the number of indices i (n) - I assume it should be n as well?

The model I'm using is more complex, but I narrowed down the cause of the error to the vec_raw[i] ~ .... line. So I posted whatever model definition I was using during that process as an mwe. Removing caching solved the issue for me.

vec_raw ~ arraydist(@. truncated(Laplace(), -20 / scale, 70 / scale)

Thank you for this suggestion. So if I use vec_raw = T(undef, n), will this create and destroy memory every time the model function is called? I'm guessing it would, I just want to know if it's really the case.

@mohamed82008 Is this related to issue in which the support of the distribution cannot be varied.

Yes! I totally missed this. @krishvishal check the discussion in https://github.com/TuringLang/Turing.jl/issues/1270. It should help you fix your model.

Thank you for this suggestion. So if I use vec_raw = T(undef, n), will this create and destroy memory every time the model function is called? I'm guessing it would, I just want to know if it's really the case.

Yes, it will but also the alternative will allocate memory for vec_raw and the array of truncated distributions. It just removes the need for preallocating vec_raw and the type parameter T. It could (untested) be a bit more efficient since the cache of random variables and thenlog probability are updated only once instead of in every step in the for loop.

Yes! I totally missed this. @krishvishal check the discussion in #1270. It should help you fix your model.

@mohamed82008 can you tell me how to pass scale into MyDistribution? Since it is needed to compute bounds for truncation and I also I need the values scale.

Edit:

I've changed your implementation the following way, but it doesn't work.

Error: MethodError: no method matching length(::MyDistribution)

using Distributions, Bijectors

struct MyDistribution <: ContinuousMultivariateDistribution
    n::Int
    scale::Array{Float64,1}
end

function Base.rand(d::MyDistribution, scale::AbstractVector)
    vec_raw = zeros(d.n)
    for i in 1:d.n
        vec_raw[i] = rand(truncated(Laplace(0,1), ((-20) ./ scale[i]), ((70) ./ scale[i])))
    end
    return vec_raw
end

function Distributions.logpdf(d::MyDistribution, scale::AbstractVector, vec_raw::AbstractVector)
    l = 0.0
    for i in 1:d.n
        l += logpdf(truncated(Laplace(0,1), ((20) ./ scale[i]), ((70) ./ scale[i])), vec_raw[i])
    end
    return l
end

struct MyBijector <: Bijectors.Bijector{1} 

end
function (vec_raw::MyBijector)(x::AbstractVector, scale::AbstractVector)
    y = similar(x)
    for i in 1:length(x)
        y[i] = bijector(truncated(Laplace(0,1), ((-20) ./ scale[i]), ((70) ./ scale[i])))(x[i])
    end
    return y
end

function (vec_raw::Inverse{<:MyBijector})(y::AbstractVector, scale::AbstractVector)
    x = similar(y)
    for i in 1:length(y)
        x[i] = inv(bijector(truncated(Laplace(0,1), ((-20) ./ scale[i]), ((70) ./ scale[i]))))(y[i])
    end
    return x
end

function Bijectors.logabsdetjac(vec_raw::MyBijector, x::AbstractVector, scale::AbstractVector)
    l = float(zero(eltype(x)))
    for i in 1:length(x)
        l += logabsdetjac(bijector(truncated(Laplace(0,1), ((-20) ./ scale[i]), ((70) ./ scale[i]))), x[i])
    end
    return l
end
Bijectors.bijector(d::MyDistribution) = MyBijector()
scale = ones(10) .*10
rand(MyDistribution(10, scale), 1)

Doesnt work.

@krishvishal you will need to sample both scale and vec_raw simultaneously. Try the following. I haven't tested it.

using Distributions, Bijectors

struct MyDistribution <: ContinuousMultivariateDistribution
    n::Int
end

function Base.rand(rng::Random.AbstractRNG, d::MyDistribution)
    scale = rand(rng, d.n)
    vec_raw = zeros(d.n)
    for i in 1:d.n
        vec_raw[i] = rand(rng, truncated(Laplace(0,1), ((-20) ./ scale[i]), ((70) ./ scale[i])))
    end
    return [scale; vec_raw]
end

function Distributions.logpdf(d::MyDistribution, x::AbstractVector)
    scale = x[1:d.n]
    l = sum.(logpdf(Normal(), scale))
    vec_raw = x[d.n+1:2*d.n]
    for i in 1:d.n
        l += logpdf(truncated(Laplace(0,1), ((20) ./ scale[i]), ((70) ./ scale[i])), vec_raw[i])
    end
    return l
end

struct MyBijector <: Bijectors.Bijector{1} 
    n::Int
end
function (b::MyBijector)(x::AbstractVector)
    scale = x[1:b.n]
    vec_raw = x[b.n+1:2*b.n]
    y = similar(vec_raw)
    for i in 1:b.n
        y[i] = bijector(truncated(Laplace(0,1), ((-20) ./ scale[i]), ((70) ./ scale[i])))(vec_raw[i])
    end
    return [scale; y]
end

function (ib::Inverse{<:MyBijector})(y::AbstractVector)
    scale = y[1:ib.orig.n]
    y_vec_raw = y[ib.orig.n+1:2*ib.orig.n]
    x = similar(y_vec_raw)
    for i in 1:ib.orig.n
        x[i] = inv(bijector(truncated(Laplace(0,1), ((-20) ./ scale[i]), ((70) ./ scale[i]))))(y_vec_raw[i])
    end
    return [scale; x]
end

function Bijectors.logabsdetjac(b::MyBijector, x::AbstractVector)
    l = float(zero(eltype(x)))
    scale = x[1:b.n]
    vec_raw = x[b.n+1:2*b.n]
    for i in 1:b.n
        l += logabsdetjac(bijector(truncated(Laplace(0,1), ((-20) ./ scale[i]), ((70) ./ scale[i]))), vec_raw[i])
    end
    return l
end
Bijectors.bijector(d::MyDistribution) = MyBijector(d.n)

rand(MyDistribution(5), 1)

Error:

Same error as before.

MethodError: no method matching length(::MyDistribution)
Closest candidates are:
  length(::Sampleable{Multivariate,S} where S<:ValueSupport) at /home/v/.julia/packages/Distributions/dTXqn/src/common.jl:39
  length(::Sampleable) at /home/v/.julia/packages/Distributions/dTXqn/src/common.jl:37
  length(!Matched::Core.SimpleVector) at essentials.jl:596
  ...

Stacktrace:
 [1] length(::MyDistribution) at /home/v/.julia/packages/Distributions/dTXqn/src/common.jl:39
 [2] rand(::Random._GLOBAL_RNG, ::MyDistribution, ::Int64) at /home/v/.julia/packages/Distributions/dTXqn/src/multivariates.jl:70
 [3] rand(::MyDistribution, ::Int64) at /home/v/.julia/packages/Distributions/dTXqn/src/multivariates.jl:69
 [4] top-level scope at In[3]:1

Seems you also have to define

Base.length(d::MyDistribution) = 2 * d.n

The logpdf implementation of MyDistribution seems to be incorrect, it seems you shouldn't compute the loglikelihood of scale based on Normal - or actually sample it from a Normal distribution.

I've made the change.

Seems you also have to define

Base.length(d::MyDistribution) = 2 * d.n

MethodError: no method matching _rand!(::Random._GLOBAL_RNG, ::MyDistribution, ::SubArray{Float64,1,Array{Float64,2},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},true})
Closest candidates are:
  _rand!(::AbstractRNG, !Matched::MvLogNormal, ::Union{AbstractArray{#s156,1}, AbstractArray{#s156,2}} where #s156<:Real) at /home/v/.julia/packages/Distributions/dTXqn/src/multivariate/mvlognormal.jl:233
  _rand!(::AbstractRNG, ::Sampleable{Multivariate,S} where S<:ValueSupport, !Matched::AbstractArray{T,2} where T) at /home/v/.julia/packages/Distributions/dTXqn/src/multivariates.jl:29
  _rand!(::AbstractRNG, !Matched::Union{Distributions.DirichletCanon, Dirichlet}, ::AbstractArray{#s62,1} where #s62<:Real) at /home/v/.julia/packages/Distributions/dTXqn/src/multivariate/dirichlet.jl:187
  ...

Stacktrace:
 [1] _rand!(::Random._GLOBAL_RNG, ::MyDistribution, ::Array{Float64,2}) at /home/v/.julia/packages/Distributions/dTXqn/src/multivariates.jl:33
 [2] rand at /home/v/.julia/packages/Distributions/dTXqn/src/multivariates.jl:70 [inlined]
 [3] rand(::MyDistribution, ::Int64) at /home/v/.julia/packages/Distributions/dTXqn/src/multivariates.jl:69
 [4] top-level scope at In[2]:1

scale in my model is filldist(Uniform(a, b), n).

The default implementation for rand(d::MultivariateDistribution, n::Int) in https://github.com/JuliaStats/Distributions.jl/blob/253c3a0813e0a08bf41fdf22f83ea540cc89a4c5/src/multivariates.jl#L70-L71 uses the in-place method Distributions._rand!. Since you didn't define it you get an error when you call rand(d::MyDistribution, n::Int). rand(MyDistribution(5)) should work fine though, and AFAICT sampling in your model should only require the latter form (and hence probably also doesn't require you to define length(::MyDistribution)).

Was this page helpful?
0 / 5 - 0 ratings