Take this seemingly fine piece of code:
using InteractiveUtils
using OrdinaryDiffEq, StaticArrays
using DiffEqBase: __init
function loop(u, p, t)
σ = p[1]; ρ = p[2]; β = p[3]
du1 = σ*(u[2]-u[1])
du2 = u[1]*(ρ-u[3]) - u[2]
du3 = u[1]*u[2] - β*u[3]
return SVector{3}(du1, du2, du3)
end
struct DS{IIP, F, U, P}
f::F
u0::U
p::P
t0::eltype(U)
end
DS{IIP}(f,u0,p) where {IIP} =
DS{IIP, typeof(f), typeof(u0), typeof(p)}(f, u0, p, 0.0)
const DEFAULT_SOLVER = Vern9()
const DEFAULT_DIFFEQ_KWARGS = (abstol = 1e-9,
reltol = 1e-9, maxiters = typemax(Int))
_get_solver(a) = haskey(a, :alg) ? a[:alg] : DEFAULT_SOLVER
function integrator(ds::DS{iip}, u0 = ds.u0;
tfinal = Inf, diffeq...) where {iip}
u = u0
prob = ODEProblem{iip}(ds.f, u, (ds.t0, typeof(ds.t0)(tfinal)), ds.p)
(haskey(diffeq, :saveat) && tfinal == Inf) && error("Infinite solving!")
solver = _get_solver(diffeq)
integ = __init(prob, solver; DEFAULT_DIFFEQ_KWARGS...,
save_everystep = false, diffeq...)
return integ
end
u0 = rand(SVector{3})
ds = DS{false}(loop, u0, rand(3))
@show @which OrdinaryDiffEq.Vern9ConstantCache(Float64,Float64)
@show tinteg = integrator(ds, dt = 0.1)
Run it on OrdinaryDiffEq.jl branch precomment: https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/tree/precomment . This branch comments right up to the point of the problem, and at the function line that is a problem it replaces it with @which
:
https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/precomment/src/solve.jl#L229
From there you get:
#= D:\OneDrive\Computer\Desktop\test.jl:76 =# @which(OrdinaryDiffEq.Vern9ConstantCache(Float64, Floa
t64)) = (::Type{OrdinaryDiffEq.Vern9ConstantCache})(::Type{T}, ::Type{T2}) where {T<:Union{Float32,
Float64}, T2<:Union{Float32, Float64}} in OrdinaryDiffEq at C:\Users\Chris\.julia\dev\OrdinaryDiffEq
\src\tableaus\verner_tableaus.jl:2400
(real(uBottomEltypeNoUnits), real(tTypeNoUnits)) = (Float64, Float64)
tinteg = integrator(ds, dt=0.1) = (::Type{OrdinaryDiffEq.Vern9ConstantCache})(::Type{T}, ::Type{T2})
where {T<:Union{Float32, Float64}, T2<:Union{Float32, Float64}} in OrdinaryDiffEq at C:\Users\Chris
\.julia\dev\OrdinaryDiffEq\src\tableaus\verner_tableaus.jl:2400
So aha I have isolated what is causing the issue and can cause it directly from the REPL? No.
OrdinaryDiffEq.Vern9ConstantCache(Float64,Float64)
That works from the REPL. So now go to postcomment which is one commit down from precomment. Here's the diff:
https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/commit/e9d6b3d5110b97437147403ddc35bd4c646adf31
So instead of calling @which
on that line, we just call it from inside of the package. Can't be too bad, right? Wrong: it never stops compiling. I added a flush
so that way it shows that for sure nothing gets printed and thus it is never ending compilation.
Now this may be related to the bigfloat precompilation PR since the only related commit is https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/commit/b13d4d5945e605cf2e51913dafe74125aa5c523f between OrdinaryDiffEq v4.7.0 and v4.6.0, yet it works on v4.6.0 and not on v4.7.0. But to make matters more confusing, the dispatch that it's actually calling is line 2400 which is https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/tableaus/verner_tableaus.jl#L2399 which was not a dispatch which was changed beyond the removal of Base.@pure
. I hit it with Base.@pure
again and it doesn't compile, so it seems that the code which is actually called is unchanged between these two versions of OrdinaryDiffEq.jl yet the version change causes a precompilation issue. Then on the branch postthenrevert I revert that bigfloat change commit which only effects unrelated dispatches (since now I've tested Base.@pure
with/without separately) and... reverting the commit fixes the issue.
https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/commit/24a254840b56a5505182096b76b7fb3a0907bb09
So my guess is that somehow the bigfloat precompilation in a separate dispatch is causing an infinite loop in the correct dispatch but I cannot pretend to understand why. I wonder if @JeffBezanson 's https://github.com/JuliaLang/julia/commit/e0cf3db3dfe9853f2f0d455856dbd4113c7b92b8 had something to do with this (it was supposed to fix bigfloat precompilation?).
In the meantime OrdinaryDiffEq.jl can just revert the tableau change, but that's gotta be the weirdest bug I've "isolated".
Oh, and another complication is that this only happens from a function. If you just call things from the REPL:
using OrdinaryDiffEq, StaticArrays
function f(u, p, t)
σ = p[1]; ρ = p[2]; β = p[3]
du1 = σ*(u[2]-u[1])
du2 = u[1]*(ρ-u[3]) - u[2]
du3 = u[1]*u[2] - β*u[3]
return SVector{3}(du1, du2, du3)
end
u0 = rand(SVector{3})
prob = ODEProblem{false}(f, u0, (0.0, Inf), rand(3))
const DEFAULT_DIFFEQ_KWARGS = (abstol = 1e-9, reltol = 1e-9, maxiters = typemax(Int))
solver = Vern9()
integ = DiffEqBase.__init(prob, solver; DEFAULT_DIFFEQ_KWARGS...,
save_everystep = false)
You're good. So that "MWE" seems convoluted, but the random stuff actually does make a difference and I don't know why. There's something nefarious going on...
e0cf3db only affects BigInts; BigFloats were fixed a while ago.
I'm having a lot of trouble understanding the sequence of steps to reproduce this. Is it just: run the block of code at the top of the OP, on the precomment
branch of OrdinaryDiffEq?
Run the block of code at the top of the OP on the postcomment
branch of OrdinaryDiffEq.jl:
https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/tree/postcomment
The call at the bottom of the diff is the part which when commented out allows it to not compile forever, pointing to that line being the issue: https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/commit/e9d6b3d5110b97437147403ddc35bd4c646adf31
I found another example and this one isn't bigfloat/bigint related at all. I think it's the same thing because it's all about compilation taking for ever, requires extra types involved, and requires a local scope.
The example is that this script works fine:
using DiffEqBiological, StochasticDiffEq
model = @reaction_network rn begin
(d,1000), X ↔ 0
end d
i = 1.0
prob = SDEProblem(model,[1000.0+i],(0.,200.),[i])
sol = solve(prob, EM(), dt = 0.01)
for i in [1., 2., 3., 4., 5.]
prob = SDEProblem(model,[1000.0+i],(0.,200.),[i])
sol = solve(prob, EM(), dt = 0.01)
end
However, this script compiles forever:
using DiffEqBiological, StochasticDiffEq
model = @reaction_network rn begin
(d,1000), X ↔ 0
end d
for i in [1., 2., 3., 4., 5.]
prob = SDEProblem(model,[1000.0+i],(0.,200.),[i])
sol = solve(prob, EM(), dt = 0.01)
end
Thus for some reason, if solve
's first compilation is in a local scope, then it will compile forever.
This is one of my favorite bugs ever. Vern9ConstantCache
takes 396 arguments, many of which are inferred to be Union{Float64,BigFloat}
. When we try to count the number of possible signatures the counter overflows so we think it's within the limit. It then tries to split around 2^100 signatures. Nothing like DiffEq code to thoroughly test the compiler :)
@ChrisRackauckas does this make DiffEq unusable? Need to assess whether merging the fix is 0.7-blocking.
If it's inferring to Union{Float64,BigFloat}
then there's something weird going on. It should only be hitting dispatches for Float64
, since there's separate dispatches for CompiledFloats = Union{Float32,Float64}
which were made to reduce the compile times by specifically avoiding any code with parse(BigFloat,...)
.
@ChrisRackauckas does this make DiffEq unusable? Need to assess whether merging the fix is 0.7-blocking.
This first version with OrdinaryDiffEq.jl? No, it's worked around just by reverting the tableaus and avoiding any big""
. I can live with that.
But examples like https://github.com/JuliaLang/julia/issues/28356#issuecomment-409297092 are more worrisome, since it means that if you try and use (some?) diffeq functionality inside of a for
loop, let
block, function
call, etc. before compiling it directly in the REPL you get compiler hang. I am not sure if it's the same phenomena though: I don't understand this bug at all.
If it's inferring to Union{Float64,BigFloat} then there's something weird going on
The compiler infers whatever it feels like.
Ok, the fix will be merged soon and will almost certainly make it into 1.0 but we don't need to hold up 0.7 for it.
Sounds good to me.
If someone can try this on Windows and see if it fixes JuliaLang/IJulia.jl#693 that would be much appreciated.
I'm was going to give it a try the next time the nightlies built. But given @JeffBezanson 's comments, it seems like the fix is for just the first example, and the second example ( https://github.com/JuliaLang/julia/issues/28356#issuecomment-409297092 ) might be a different behavior, and this second example is what I think is peculiarly similar to the IJulia problem (and I have found some other seemingly random examples of it).
If the second example isn't fixed yet, should this issue be left open, or another opened?
It's not fixed. I'll give it a separate issue to make things more clear. I guess it wasn't the same.
Most helpful comment
This is one of my favorite bugs ever.
Vern9ConstantCache
takes 396 arguments, many of which are inferred to beUnion{Float64,BigFloat}
. When we try to count the number of possible signatures the counter overflows so we think it's within the limit. It then tries to split around 2^100 signatures. Nothing like DiffEq code to thoroughly test the compiler :)