Julia: consistent 32bit CI error

Created on 4 Nov 2018  Β·  25Comments  Β·  Source: JuliaLang/julia

This seems to have started recently

Error During Test at /tmp/julia/share/julia/test/numbers.jl:2329
  Got exception outside of a @test
  return type Int128 does not match inferred return type Any
  Stacktrace:
   [1] error(::String) at ./error.jl:33
   [2] top-level scope at /tmp/julia/share/julia/test/numbers.jl:2372
   [3] top-level scope at /home/travis/build/JuliaLang/julia/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
   [4] top-level scope at /tmp/julia/share/julia/test/numbers.jl:2330
   [5] include at ./boot.jl:317 [inlined]
   [6] include_relative(::Module, ::String) at ./loading.jl:1038
   [7] include at ./sysimg.jl:29 [inlined]
   [8] include(::String) at /tmp/julia/share/julia/test/testdefs.jl:13
   [9] top-level scope at /tmp/julia/share/julia/test/testdefs.jl:22
   [10] top-level scope at /home/travis/build/JuliaLang/julia/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
   [11] top-level scope at /tmp/julia/share/julia/test/testdefs.jl:21
   [12] top-level scope at util.jl:289
   [13] top-level scope at /tmp/julia/share/julia/test/testdefs.jl:19
   [14] eval at ./boot.jl:319 [inlined]
   [15] #runtests#3(::UInt128, ::Function, ::String, ::String, ::Bool) at /tmp/julia/share/julia/test/testdefs.jl:25
   [16] #runtests at ./none:0 [inlined] (repeats 2 times)
   [17] (::getfield(Distributed, Symbol("##112#114")){Distributed.CallMsg{:call_fetch}})() at /home/travis/build/JuliaLang/julia/usr/share/julia/stdlib/v1.1/Distributed/src/process_messages.jl:269
   [18] run_work_thunk(::getfield(Distributed, Symbol("##112#114")){Distributed.CallMsg{:call_fetch}}, ::Bool) at /home/travis/build/JuliaLang/julia/usr/share/julia/stdlib/v1.1/Distributed/src/process_messages.jl:56
   [19] macro expansion at /home/travis/build/JuliaLang/julia/usr/share/julia/stdlib/v1.1/Distributed/src/process_messages.jl:269 [inlined]
   [20] (::getfield(Distributed, Symbol("##111#113"))

e.g. https://travis-ci.org/JuliaLang/julia/jobs/450035986

32-bit ci linux priority

Most helpful comment

That's a really great repro, and helped narrow this down quickly (spoiler: it wasn't where I expected). Usually it's enough to apply this diff to spot where inference diverges from my expectations:

diff --git a/base/compiler/abstractinterpretation.jl b/base/compiler/abstractinterpretation.jl
index e30b5496b0..65c1d6a2b0 100644
--- a/base/compiler/abstractinterpretation.jl
+++ b/base/compiler/abstractinterpretation.jl
@@ -300,6 +300,11 @@ function abstract_call_method(method::Method, @nospecialize(sig), sparams::Simpl
         newsig = limit_type_size(sig, comparison, sv.linfo.specTypes, sv.params.TUPLE_COMPLEXITY_LIMIT_DEPTH, spec_len)

         if newsig !== sig
+            println()
+            println(sig)
+            println(newsig)
+            print_callstack(sv)
+            println()
             # continue inference, but note that we've limited parameter complexity
             # on this call (to ensure convergence), so that we don't cache this result
             if call_result_unused(sv)

But I should perhaps have attempt to record (or repeat?) this debug session, since this didn't turn up anything, I had to go hunting further. Next I took a look at what was going into the cache:

diff --git a/base/compiler/typeinfer.jl b/base/compiler/typeinfer.jl
index 01fb88b024..b26860d341 100644
--- a/base/compiler/typeinfer.jl
+++ b/base/compiler/typeinfer.jl
@@ -28,6 +27,15 @@ function typeinf(frame::InferenceState)
     # empty!(frames)
     min_valid = frame.min_valid
     max_valid = frame.max_valid
+    if cached && frame.parent !== nothing
+        println()
+        for caller in results
+            println(caller.argtypes, " => ", caller.result)
+            # println(caller.src)
+        end
+        print_callstack(frame)
+        println()
+    end
     if cached || frame.parent !== nothing
         for caller in results
             opt = caller.src

This was suspiciously pointing the finger at something permitting results to go into the cache that were clearly bogus, and had the structure of a self-recursive call on a constant (these are marked [uncached] [limited])β€”which is supposed to get rejected by inferenceβ€”getting added to the cache.

If we sorted the list of specializations m

Yes. While we try to make inference fairly predictable and stable(*), sorting the precompile lists there by something stable (such as Method min_world when iterating over bindings) might help with faster isolation of similar problems.

(*) In particular, we want it to have the property that any particular call to inference (e.g. code_typed or Base.return_types) should not return a _wider_ result than expected due to cache effects. We're just fine with it giving unpredictably _narrower_ answers based on stuff that happened to be done in the past, and just we're just fine with it giving entirely unpredictable answers when running recursively, as part of converging inference (up to the limit that it shouldn't widen the final type given as the answer, but that does mean that the set of possible optimizations and inlining afterwards is expected to be unpredictable in general).

All 25 comments

I just got another of these.

This is reproducible for me on an i686 build from a 64 bit ubuntu 18.04 host. The bug reduces to:

julia> Base.return_types(%, (Int8, UInt128))
1-element Array{Any,1}:
 Any

Well, it makes absolutely no sense, but reverting e2f7b2ffd fixes this for me locally.

More precisely, reverting only the following line seems to fix the problem, but I don't understand why.

diff --git a/base/array.jl b/base/array.jl
index 9e8da6c384..74823710ab 100644
--- a/base/array.jl
+++ b/base/array.jl
@@ -272,7 +272,6 @@ function copyto!(dest::Array{T}, doffs::Integer, src::Array{T}, soffs::Integer,
         throw(BoundsError())
     end
     unsafe_copyto!(dest, doffs, src, soffs, n)
-    return dest
 end

 # Outlining this because otherwise a catastrophic inference slowdown

Edit: having woken my machine up from sleep state, this one line change no longer "fixes" the problem. Which makes sense if there's something nasty and intermittent going on here. Uninitialized memory read or some such thing, perhaps.

It looks like my machine has fallen out of whatever state it was in which allowed me to reproduce this.

What I did when reproducing was to build 099979931c with ARCH=i686 (on ubuntu 18.04), which gave the return_types failure as stated. Then to revert various recent commits and measure the bug with Revise until I found a change which "fixed" the problem.

This was yesterday, but the build of 099979931c no longer exhibits the problem today. I'll chalk that up to it being intermittent (I don't think it's just me loosing my mind :-) ).

Got another window of reproducibility; it looks like the cached inferred code for rem(::UInt128, ::UInt128) has a return type of Any, which may possibly be the underlying cause though I'm not sure.

Applying the following patch:

diff --git a/base/compiler/typeinfer.jl b/base/compiler/typeinfer.jl
index 01fb88b024..e1912b71da 100644
--- a/base/compiler/typeinfer.jl
+++ b/base/compiler/typeinfer.jl
@@ -594,6 +594,11 @@ function typeinf_type(method::Method, @nospecialize(atypes), sparams::SimpleVect
             inf = code.inferred
             if !isa(inf, CodeInfo) || (inf::CodeInfo).inferred
                 i == 2 && ccall(:jl_typeinf_end, Cvoid, ())
+                println("typeinf_type code = ", code)
+                println("typeinf_type ", method, " code.rettype = ", code.rettype)
+                println("typeinf_type ", code.inferred)
+                println("typeinf_type ", ccall(:jl_uncompress_ast, Any, (Any, Any), method,
+                                               code.inferred::Vector{UInt8}))
                 return code.rettype
             end
         end

I get the following output:

julia> using Revise; Revise.track(Base.Core.Compiler);

julia> Base.return_types(rem, (UInt128, UInt128))
typeinf_type code = rem(UInt128, UInt128)
typeinf_type Base.rem(...) code.rettype = Any
typeinf_type Array{UInt8, (105,)}[0x08, 0x03, 0x00, 0x00, 0x00, 0x23, 0x73, 0x65, 0x6c, 0x66, 0x23, 0x00, 0x78, 0x00, 0x79, 0x00, 0x16, 0x84, 0xb0, 0xea, 0x09, 0x03, 0x3c, 0x11, 0x02, 0x1d, 0x11, 0x00, 0x05, 0x02, 0x00, 0x09, 0x03, 0x3c, 0x11, 0x02, 0x1d, 0x11, 0x00, 0x05, 0x03, 0x00, 0x2d, 0x1d, 0x85, 0x03, 0x01, 0x03, 0x02, 0x2c, 0x1d, 0x1c, 0xf4, 0x03, 0x03, 0x09, 0x01, 0x3e, 0x03, 0x04, 0x39, 0x16, 0x84, 0xb0, 0xea, 0x3a, 0x3a, 0x3a, 0x3a, 0x3a, 0x16, 0x84, 0xac, 0x26, 0x00, 0x2e, 0x2e, 0x20, 0x85, 0x4d, 0x2b, 0x0a, 0x03, 0xab, 0x16, 0x01, 0xb0, 0xf6, 0x00, 0x00, 0x00, 0x00, 0x00, 0x16, 0x01, 0xae, 0xf6, 0x00, 0x00, 0x00, 0x01, 0x01, 0x01, 0x01, 0x01]
typeinf_type Core.CodeInfo(code=Array{Any, (5,)}[
  Expr(:invoke, (::Type{Base.GMP.BigInt})(UInt128), Base.BigInt, Core.SlotNumber(id=2)),
  Expr(:invoke, (::Type{Base.GMP.BigInt})(UInt128), Base.BigInt, Core.SlotNumber(id=3)),
  Expr(:call, Base.rem, SSAValue(1), SSAValue(2)),
  Expr(:call, Base.UInt128, SSAValue(3)),
  Expr(:return, SSAValue(4))], codelocs=Array{Int32, (5,)}[1, 1, 1, 1, 1], method_for_inference_limit_heuristics=nothing, ssavaluetypes=Array{Any, (5,)}[
  Any,
  Any,
  Any,
  Any,
  Any], linetable=Array{Core.LineInfoNode, (1,)}[Core.LineInfoNode(mod=Base, method=:rem, file=Symbol("int.jl"), line=778, inlined_at=0)], ssaflags=Array{UInt8, (5,)}[0x00, 0x00, 0x00, 0x00, 0x00], slotflags=Array{UInt8, (3,)}[0x00, 0x00, 0x00], slotnames=Array{Any, (3,)}[
  Symbol("#self#"),
  :x,
  :y], inferred=true, inlineable=false, propagate_inbounds=false, pure=false)
1-element Array{Any,1}:
 Any

Given these BigInt in here, perhaps this comes down to a weird interaction with the patch in 6fa3e4553

Given these BigInt in here, perhaps this comes down to a weird interaction with the patch in 6fa3e45

This would also explain why it's not seen on 64 bit linux, as we don't use GMP for supporting UInt128 in that case.

Here's another oddity - @code_typed and Base.return_types disagree about rem with UInt128:

julia> @code_typed rem(UInt128(1), UInt128(1))
CodeInfo(
778 1 ─ %1 = invoke Base.BigInt(_2::UInt128)::BigInt                  β”‚ 
    β”‚   %2 = invoke Base.BigInt(_3::UInt128)::BigInt                  β”‚ 
    β”‚   %3 = Base.GMP.MPZ.tdiv_r::typeof(tdiv_r)                      β”‚β•» rem
    β”‚   %4 = invoke %3(%1::BigInt, %2::BigInt)::BigInt                β”‚β”‚
    β”‚   %5 = invoke Base.UInt128(%4::BigInt)::UInt128                 β”‚ 
    └──      return %5                                                β”‚ 
) => UInt128

julia> Base.return_types(rem, (UInt128, UInt128))
1-element Array{Any,1}:
 Any

More fun:

$ ./julia

[...]

julia> Base.return_types(rem, (UInt128, UInt128))
1-element Array{Any,1}:
 Any

# Now replace `rem` with exactly the same definition, to cause recompilation: 
julia> Base.eval(:(
           function rem(x::UInt128, y::UInt128)
               return UInt128(rem(BigInt(x), BigInt(y)))
           end
       ))
rem (generic function with 136 methods)

julia> Base.return_types(rem, (UInt128, UInt128))
1-element Array{Any,1}:
 UInt128

So I guess the incorrect return type for rem is being read from the sysimg. Later on we get this incorrect version deserialized, but if we override it with the exact same function definition, the next inference run returns the correct result.

The following patch dumps inferred rettype for rem as it's cached in jl_set_method_inferred.

diff --git a/src/gf.c b/src/gf.c
index ae4a1413b7..414a52e369 100644
--- a/src/gf.c
+++ b/src/gf.c
@@ -432,6 +432,13 @@ JL_DLLEXPORT jl_method_instance_t* jl_set_method_inferred(
     li->functionObjectsDecls.functionObject = NULL;
     li->functionObjectsDecls.specFunctionObject = NULL;
     li->rettype = rettype;
+    if (li->def.method->name == jl_symbol("rem")) {
+        jl_printf(JL_STDOUT, "jl_set_method_inferred rem; rettype = ");
+        jl_static_show(JL_STDOUT, li->rettype);
+        jl_printf(JL_STDOUT, "   ");
+        jl_static_show(JL_STDOUT, li->specTypes);
+        jl_printf(JL_STDOUT, "\n");
+    }
     jl_gc_wb(li, rettype);
     li->inferred = inferred;
     jl_gc_wb(li, inferred);

It demonstrates that the 32 bit build sometimes infers the wrong type for rem(UInt128, UInt128). In particular, applying it and calling

make clean &> /dev/null && make &> build.log && grep 'jl_set_method_inferred.*rettype = .*UInt128, UInt128' build.log

sometimes produces

jl_set_method_inferred rem; rettype = Any   Tuple{typeof(Base.rem), UInt128, UInt128}
jl_set_method_inferred rem; rettype = Any   Tuple{typeof(Base.rem), UInt128, UInt128}

and other times produces

jl_set_method_inferred rem; rettype = Any   Tuple{typeof(Base.rem), UInt128, UInt128}
jl_set_method_inferred rem; rettype = UInt128   Tuple{typeof(Base.rem), UInt128, UInt128}

I'm not sure whether this depends on being inside the sysimg build or not.

Ok, to reproduce this, you need an i686 build, apply the patch above and to run the sysimg build step several times until you see the uninferred version of rem

jl_set_method_inferred rem; rettype = Any   Tuple{typeof(Base.rem), UInt128, UInt128}

twice in the log.

Another change which might have caused this to appear - #29795

Nice work tracking this down, @c42f! @Keno, @vtjnash, @JeffBezansonβ€”any idea what's going on?

Thanks, it's a head scratcher that's for sure. I caught the following stack trace from inside jl_set_method_inferred, just as it sets the bad Any as the rettype. This is still slightly after the damage is done, but almost at the right location. At a high level it's inside jl_compile_hint, so just before the sysimg build is completed.

jl_set_method_inferred rem; rettype = Any   Tuple{typeof(Base.rem), UInt128, UInt128}
rec_backtrace at /home/tcfoster/src/julia-i686/src/stackwalk.c:94
jlbacktrace2 at /home/tcfoster/src/julia-i686/src/stackwalk.c:536
jl_set_method_inferred at /home/tcfoster/src/julia-i686/src/gf.c:441
cache_result at ./compiler/typeinfer.jl:133
typeinf at ./compiler/typeinfer.jl:61
typeinf_edge at ./compiler/typeinfer.jl:499 [inlined]
abstract_call_method at ./compiler/abstractinterpretation.jl:363
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_apply at ./compiler/abstractinterpretation.jl:518
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:566
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1120
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method_with_const_args at ./compiler/abstractinterpretation.jl:198
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:107
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method_with_const_args at ./compiler/abstractinterpretation.jl:198
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:107
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1120
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_edge at ./compiler/typeinfer.jl:499
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_method at ./compiler/abstractinterpretation.jl:363
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:85
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_call at ./compiler/abstractinterpretation.jl:775
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval_call at ./compiler/abstractinterpretation.jl:804
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
abstract_eval at ./compiler/abstractinterpretation.jl:889
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_local at ./compiler/abstractinterpretation.jl:1134
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_nocycle at ./compiler/abstractinterpretation.jl:1190
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf at ./compiler/typeinfer.jl:15
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_ext at ./compiler/typeinfer.jl:574
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
typeinf_ext at ./compiler/typeinfer.jl:611
jl_apply_generic at /home/tcfoster/src/julia-i686/src/gf.c:2233
jl_apply at /home/tcfoster/src/julia-i686/src/julia.h:1571 [inlined]
jl_type_infer at /home/tcfoster/src/julia-i686/src/gf.c:277
jl_compile_hint at /home/tcfoster/src/julia-i686/src/gf.c:1964
jl_compile_specializations at /home/tcfoster/src/julia-i686/src/precompile.c:378
jl_precompile at /home/tcfoster/src/julia-i686/src/precompile.c:388 [inlined]
jl_write_compiler_output at /home/tcfoster/src/julia-i686/src/precompile.c:34
jl_atexit_hook at /home/tcfoster/src/julia-i686/src/init.c:222
main at /home/tcfoster/src/julia-i686/ui/repl.c:222
__libc_start_main at /lib/i386-linux-gnu/libc.so.6 (unknown line)
_start at /home/tcfoster/src/julia-i686/usr/bin/julia (unknown line)

This is no longer what I'd describe as "intermittent"β€”it seems to happen quite consistently now.

I knew I remembered something like this. Finally I've found it: #25804 exposed a similar bug. The "solution" was a simple workaround: c002de7447183deef5ac8860fa79c8ad1311b9d2. Maybe we can just do the same for rem?

I think I have a handle on this one: On 32bit systems, rem(::UInt128, ::Untt128) and div(::UInt128, ::Untt128) convert to BigInt, have the actual computation done, and convert back to UInt128. However, that leads to the following recursion:
BigInt(::UInt128) β†’ ndigits(::UInt128, base=2) β†’ ndigits0z(::UInt128, 2) β†’ ndigits0zpb(::UInt128, 2) β†’ div(::UInt128, 2) β†’ promotion β†’ div(::UInt128, ::UInt128) β†’ BigInt(::UInt128)
It should be noted that ndigits0zpb has a special case for b==2, so the call-chain at run-time is not recursive, but apparently constant-propagation doesn't resolve this at compile-time.

To verify, I've added

function BigInt(x::UInt128)
    x == 0 && return BigInt(Culong(0))
    nd = sizeof(x)<<3 - leading_zeros(x) # instead of ndigits(x, base=2)
    z = MPZ.realloc2(nd)
    s = sign(x)
    s == -1 && (x = -x)
    x = unsigned(x)
    size = 0
    limbnbits = sizeof(Limb) << 3
    while nd > 0
        size += 1
        unsafe_store!(z.d, x % Limb, size)
        x >>>= limbnbits
        nd -= limbnbits
    end
    z.size = s*size
    z
end

which is a copy of BigInt(x::Integer) except for replacing the call to ndigits. At least locally, that makes the test failure go away.

Not sure how to properly fix this, though. Options include:

  • Make sure constant-propagation does its duty here.
  • Implement div(::UInt128, ::UInt128) natively instead of calling out to its BigInt version.

@martinholters Nice! That sounds like part of the puzzle and means the problem started happening with the change at https://github.com/JuliaLang/julia/commit/6fa3e45531a68fae34f31214cbf7e008298995d5#diff-c39f3bbf0cb1c2381243b42647d9ff1dR287

Though it doesn't explain the inconsistency of sometimes getting Any and other times UInt128 inferred for rem(::UInt128, UInt128).

The symptoms are mitigated by https://github.com/JuliaLang/julia/pull/30032 but the underlying inference issue remains.

Thinking about this again, my analysis above doesn't explain anything. If the arguments during recursion don't change, inference is happy with it. Only if the _same_ method gets called with _different_ argument types, problems may appear. I.e. inference may bail out (infer as Any), but where that happens depends on where the recursion loop was entered, so results may vary. I still suspect this is the behavior we're seeing here, but I have no idea where the offending recursion might sneak in.

I ran valgrind against the offending sysimg compile step. In one run, the following log occurred

$ valgrind --tool=memcheck --leak-check=full /home/chris/dev/julia-i686/usr/bin/julia -g1 -O0 -C "pentium4" --output-ji /home/chris/dev/julia-i686/usr/lib/julia/sys.ji.tmp  --startup-file=no --warn-overwrite=yes --sysimage /home/chris/dev/julia-i686/usr/lib/julia/corecompiler.ji sysimg.jl
[...]
Dates  ────────── 52.722144 seconds
DelimitedFiles  ─  1.952113 seconds
==1125== Conditional jump or move depends on uninitialised value(s)
==1125==    at 0x4B51B62: ???
==1125==
Random  ───────── 14.559922 seconds
UUIDs  ──────────  0.259599 seconds
Future  ─────────  0.083137 seconds
LinearAlgebra  ──406.806756 seconds
[...]
==1125== Use --track-origins=yes to see where uninitialised values come from

Which is extremely suspicious on the face of it, but repeated reruns with --track-origins=yes failed to reproduce or give any further clues, so perhaps it's a false positive.

@martinholters I think I've got a repro of this outside of bootstrap. Try the following:

# Force recompilation
julia> Base.eval(:(rem(x::UInt128, y::UInt128) = UInt128(rem(BigInt(x), BigInt(y)))))
rem (generic function with 136 methods)
# Compile directly gives the right result
julia> Base.return_types(rem, (UInt128, UInt128))
1-element Array{Any,1}:
 UInt128
# Now force recompilation, entering inference via `parse`:
julia> Base.eval(:(rem(x::UInt128, y::UInt128) = UInt128(rem(BigInt(x), BigInt(y)))))
julia> parse(UInt128, "1")
julia> Base.return_types(rem, (UInt128, UInt128))
1-element Array{Any,1}:
 Any

I think the reason this is inconsistent from one sysimg compile to the next is that the compilation inside jl_compile_specializations is done in the order of the bindings in the modules. These bindings are stored in a ptrhash hash table, so the ordering of bindings isn't stable as it depends on the addresses which happen to be allocated to the bound objects.

In particular, in test runs I found that rem can be compiled due to at least any of parse, @uint128_str ndigits or Base._base. The first two of these result in the wrong inferred type of Any, while the second two result in UInt128 being inferred.

Any time there is inconsistency in the handling of cycles during inference we are likely to have this inconsistent behavior due to the inconsistent ordering coming from jl_foreach_reachable_mtable.

If we sorted the list of specializations m before traversing it at https://github.com/JuliaLang/julia/blob/9a7c79ce56874f58650810f22e94893554f67f88/src/precompile.c#L375 we might at least make the compilation reproducible.

That's a really great repro, and helped narrow this down quickly (spoiler: it wasn't where I expected). Usually it's enough to apply this diff to spot where inference diverges from my expectations:

diff --git a/base/compiler/abstractinterpretation.jl b/base/compiler/abstractinterpretation.jl
index e30b5496b0..65c1d6a2b0 100644
--- a/base/compiler/abstractinterpretation.jl
+++ b/base/compiler/abstractinterpretation.jl
@@ -300,6 +300,11 @@ function abstract_call_method(method::Method, @nospecialize(sig), sparams::Simpl
         newsig = limit_type_size(sig, comparison, sv.linfo.specTypes, sv.params.TUPLE_COMPLEXITY_LIMIT_DEPTH, spec_len)

         if newsig !== sig
+            println()
+            println(sig)
+            println(newsig)
+            print_callstack(sv)
+            println()
             # continue inference, but note that we've limited parameter complexity
             # on this call (to ensure convergence), so that we don't cache this result
             if call_result_unused(sv)

But I should perhaps have attempt to record (or repeat?) this debug session, since this didn't turn up anything, I had to go hunting further. Next I took a look at what was going into the cache:

diff --git a/base/compiler/typeinfer.jl b/base/compiler/typeinfer.jl
index 01fb88b024..b26860d341 100644
--- a/base/compiler/typeinfer.jl
+++ b/base/compiler/typeinfer.jl
@@ -28,6 +27,15 @@ function typeinf(frame::InferenceState)
     # empty!(frames)
     min_valid = frame.min_valid
     max_valid = frame.max_valid
+    if cached && frame.parent !== nothing
+        println()
+        for caller in results
+            println(caller.argtypes, " => ", caller.result)
+            # println(caller.src)
+        end
+        print_callstack(frame)
+        println()
+    end
     if cached || frame.parent !== nothing
         for caller in results
             opt = caller.src

This was suspiciously pointing the finger at something permitting results to go into the cache that were clearly bogus, and had the structure of a self-recursive call on a constant (these are marked [uncached] [limited])β€”which is supposed to get rejected by inferenceβ€”getting added to the cache.

If we sorted the list of specializations m

Yes. While we try to make inference fairly predictable and stable(*), sorting the precompile lists there by something stable (such as Method min_world when iterating over bindings) might help with faster isolation of similar problems.

(*) In particular, we want it to have the property that any particular call to inference (e.g. code_typed or Base.return_types) should not return a _wider_ result than expected due to cache effects. We're just fine with it giving unpredictably _narrower_ answers based on stuff that happened to be done in the past, and just we're just fine with it giving entirely unpredictable answers when running recursively, as part of converging inference (up to the limit that it shouldn't widen the final type given as the answer, but that does mean that the set of possible optimizations and inlining afterwards is expected to be unpredictable in general).

Great! I was hoping someone with knowledge of inference would step in and fix this. Thanks to Martin as well for giving just the hint I needed to repro it.

Yes. While we try to make inference fairly predictable and stable(*), sorting the precompile lists there by something stable (such as Method min_world when iterating over bindings) might help with faster isolation of similar problems.

How about sorting m using the same method as jl_resort_type_cache? Here's a PR for that: https://github.com/JuliaLang/julia/pull/30095

Was this page helpful?
0 / 5 - 0 ratings

Related issues

dpsanders picture dpsanders  Β·  3Comments

musm picture musm  Β·  3Comments

ararslan picture ararslan  Β·  3Comments

StefanKarpinski picture StefanKarpinski  Β·  3Comments

Keno picture Keno  Β·  3Comments