I've run into a very annoying problem whereby a certain package I'm writing would be fast on a fresh REPL, showing few allocations, and them on a different fresh REPL, it would be very very slow, with a huge number of allocations... without touching the code! This is an effort to isolate the problem into a MWE (not sure if it can be made even smaller). To reproduce the issue you will likely need to try several times. For me the behaviour is highly erratic.
So: the following module elsa
, when developed in ./julia/dev
, produces non-deterministic allocations when running test()
in current 0.7-beta (I'm not sure when this began)
module elsa
function test()
for i in 1:10_000
foo!([0.0], 1, 1, 1, 1, rand(), 1, 1, (1, 1))
end
end
@noinline function foo!(vec, x1, x2, x3, x4, x5, x6, x7, (i, j))
vec[i] = i
return nothing
end
end
First run in a fresh REPL
_ _ _(_)_ | A fresh approach to technical computing
(_) | (_) (_) | Documentation: https://docs.julialang.org
_ _ _| |_ __ _ | Type "?" for help, "]?" for Pkg help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 0.7.0-beta.7 (2018-06-25 01:11 UTC)
_/ |\__'_|_|_|\__'_| | Commit 0978251ac5 (0 days old master)
|__/ | x86_64-apple-darwin17.6.0
julia> using elsa; @time elsa.test(); @time elsa.test(); @time elsa.test();
[ Info: Recompiling stale cache file /Users/pablo/.julia/compiled/v0.7/elsa/Jexn.ji for module elsa
0.034960 seconds (122.36 k allocations: 5.917 MiB)
0.007761 seconds (40.00 k allocations: 1.679 MiB, 69.58% gc time)
0.001392 seconds (40.00 k allocations: 1.679 MiB)
Subsequent runs in a fresh REPL (no recompile)
julia> using elsa; @time elsa.test(); @time elsa.test(); @time elsa.test();
0.025599 seconds (107.65 k allocations: 5.603 MiB)
0.000618 seconds (20.00 k allocations: 1.068 MiB)
0.000663 seconds (20.00 k allocations: 1.068 MiB)
Note the change in execution time and allocations.
Sometimes, on a subsequent run, we again get 40k allocations
julia> using elsa; @time elsa.test(); @time elsa.test(); @time elsa.test();
0.029740 seconds (122.36 k allocations: 5.917 MiB)
0.001366 seconds (40.00 k allocations: 1.679 MiB)
0.001490 seconds (40.00 k allocations: 1.679 MiB)
If one modifies the code with e.g. some whitespace to force a recompile, I can trigger a change of allocations much more frequently.
Fresh REPL after adding whitespace to force recompile
julia> using elsa; @time elsa.test(); @time elsa.test(); @time elsa.test();
[ Info: Recompiling stale cache file /Users/pablo/.julia/compiled/v0.7/elsa/Jexn.ji for module elsa
0.025894 seconds (107.69 k allocations: 5.596 MiB)
0.000819 seconds (20.00 k allocations: 1.068 MiB)
0.005586 seconds (20.00 k allocations: 1.068 MiB, 86.66% gc time)
Second run in a fresh REPL (no recompile)
julia> using elsa; @time elsa.test(); @time elsa.test(); @time elsa.test();
0.062763 seconds (122.37 k allocations: 5.923 MiB, 53.05% gc time)
0.001380 seconds (40.00 k allocations: 1.679 MiB)
0.001364 seconds (40.00 k allocations: 1.679 MiB)
Third run
julia> using elsa; @time elsa.test(); @time elsa.test(); @time elsa.test();
0.021969 seconds (107.69 k allocations: 5.596 MiB)
0.000398 seconds (20.00 k allocations: 1.068 MiB)
0.000440 seconds (20.00 k allocations: 1.068 MiB)
You get the idea.
Incidentally, if one runs the module
straight at the REPL, I never see the problem, and allocations are just 10k (half of the minimum above)
julia> module elsa
function test()
for i in 1:10_000
foo!([0.0], 1, 1, 1, 1, rand(), 1, 1, (1, 1))
end
end
@noinline function foo!(vec, x1, x2, x3, x4, x5, x6, x7, (i, j))
vec[i] = i
return nothing
end
end
Main.elsa
julia> @time elsa.test(); @time elsa.test(); @time elsa.test();
0.023286 seconds (90.55 k allocations: 5.052 MiB)
0.000304 seconds (10.00 k allocations: 937.656 KiB)
0.000333 seconds (10.00 k allocations: 937.656 KiB)
X-ref: https://github.com/JuliaLang/julia/issues/25900 (I'm not sure it is related, though, as the using elsa
itself is deterministic in this example)
julia> versioninfo()
Julia Version 0.7.0-beta.7
Commit 0978251ac5 (2018-06-25 01:11 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin17.6.0)
CPU: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
Seems to be a type inference issue that has to do with precompilation.
10k allocations (no precompile):
20 โ invoke Elsa.foo!(%11::Array{Float64,1}, 1::Int64, 1::Int64, 1::Int64, 1::Int64, %42::Float64, 1::Int64, 1::Int64, (1, 1)::Tuple{Int64,Int64})
Every argument inferred
20k allocations (after first precompile)
20 โ invoke Elsa.foo!(%11::Any, 1::Any, 1::Int64, 1::Int64, 1::Int64, %42::Float64, 1::Int64, 1::Int64, (1, 1)::Tuple{Int64,Int64})
Some arguments not inferred
40k allocations (recompile):
20 โ invoke Elsa.foo!(%11::Array{Float64,1}, 1::Int64, 1::Int64, 1::Int64, 1::Any, %42::Float64, 1::Int64, 1::Any, (1, 1)::Any)
Some other arguments not inferred...
Very nice MWE
Even something even smaller like
test() = foo!([0.0], 1)
@noinline foo!(vec, i) = vec[i] = i
can show this:
julia> @code_warntype Elsa.test()
Body::Int64
6 1 โ %1 = invoke Base.vect(0.0::Float64)::Array{Float64,1} โ
โ %2 = invoke Elsa.foo!(%1::Array{Float64,1}, 1::Any)::Int64 โ
โโโ return %2
note, 1::Any
Nice! Thanks for looking into it @KristofferC. So it's got to do with inference, but do you know why it fails randomly? It does seem to be related to the tuple destructuring.
I don't think it has to do anything with the tuple, since I can get the ::Any
without any tuples (see my last example).
Also, cannot reproduce on 0.6.
Uhm, I do see how your ultraminimalistic example fails to be inferred, but it doesn't seem to allocate randomly for me. It always behaves the same in that regard.
julia> using elsa; @time elsa.test(); @time elsa.test(); @time elsa.test();
0.016478 seconds (62.47 k allocations: 3.216 MiB)
0.000004 seconds (5 allocations: 256 bytes)
0.000002 seconds (5 allocations: 256 bytes)
Do you see the inference failure with your MWE always, or randomly?
It's a bit random, sometimes the first argument fails to infer aswell
julia> @code_warntype Elsa.test()
Body::Int64
6 1 โ %1 = invoke Base.vect(0.0::Float64)::Array{Float64,1} โ
โ %2 = invoke Elsa.foo!(%1::Any, 1::Any)::Int64 โ
โโโ return %2
cc @vtjnash
May be fixed (at least the inference failure problems) by 4562ae2
Amazing, absolutely genius @vtjnash. Completely solved AFAICT! Not only the MWE, but the real world case too, it seems.
It may be too complicated to explain briefly, but did you understand the root cause (the non-deterministic behaviour in particular)?
Yeah, I saw some uninitialized memory when doing some unrelated work, and traced it back to here.
Most helpful comment
May be fixed (at least the inference failure problems) by 4562ae2