Julia: Type inference bug can segfault official release of 0.6.0

Created on 12 Aug 2017 · 16 Comments · Source: JuliaLang/julia

I've been running into a ReadOnlyMemoryError() in a large internal codebase. It is reproducible but fragile.

I managed to reduce it to this. With minor code changes, I can generate a ReadOnlyMemoryError() or cause Julia to segfault. The version provided here reproducibly segfaults Julia when it is executed in a Jupyter notebook, but not when Julia is run in a terminal (so fragile!).

I know this reduced code looks pretty funny, but any attempt to reduce it further made the problem go away (commenting any line or removing any unused package from the using statement makes the segfault/ReadOnlyMemoryError go away).

I modified Jupyter's kernel.json so I could automatically attach gdb and get a backtrace.

I have tested this on 0.6.0 (built on Ubuntu 16.04 with and without debug, as well as official Julia binaries) on multiple machines. I have not tested on Julia master because some of these packages don't yet work there.

Julia Version 0.6.0
Commit 9036443 (2017-06-19 13:05 UTC)
DEBUG build
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

It is unfortunate that this example needs to run in a Jupyter notebook to reproduce. I can provide private access to my Jupyter server if this would simplify matters.

Labels: bug, inference, types and dispatch

All 16 comments

I haven't done a full bisection yet, but here's what I have so far:
0.6.0-rc1: no segfault
0.6.0-rc2: no segfault
0.6.0-rc3: segfault
0.6.0: segfault
That suggests the breakage occurred between May 17 and June 7.

Ooh this looks promising. Here's the result of bisection: (@vtjnash)

456198f87d6bf8ab271d3c55744b41da3f0e8f51 is the first bad commit
commit 456198f87d6bf8ab271d3c55744b41da3f0e8f51
Author: Jameson Nash
Date:   Mon May 22 16:06:48 2017 -0400

    improve inference of methods with Type{leaftype} parameters

    (cherry picked from commit 15e736977bc94a8a8c0cb02465a00ede2fb71254)
    ref #21771

    remove many unneeded pure annotations

    Removing actually may enable inference to get a sharper result,
    since it is no longer being directed to ignore backedges and correctness assumptions

    Replaces pure annotations in promotion with inline

    (cherry picked from commit 76a30fb4968dccab2adcdc3bb5eee00184df0eba)

    handling Base printing of Expr.typ fields not containing a Type

    this allows printing of the Exprs flowing through inference (which might instead have the field set to something like a Const object)

    (cherry picked from commit f97d9e85822b353b7e4b7a665514fab1f556f957)

    apply_type on Type{T} is valid whenever T is valid

    removes performance bug added by 0292c42c

    (cherry picked from commit 8ee2e062cabcfb5956386a25cc7c949c0efca1f4)

:040000 040000 1bd072d73ddfd4b24c273870d875b5019d989631 707964c1b0e94865a224ff85f2802816f68e87d2 M  base
:040000 040000 2ed9b15599ca1fb7a7f37ef3c6d31cede322a911 9f8a204a7f2f31dd5ff931699e6a569a04b77101 M  test

Incidentally, I think there may be a typo in that commit. I think

atype = Const(atyp.parameters[1])

...should have been...

atyp = Const(atyp.parameters[1])

Doesn't resolve the segfault though!
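To illustrate why a one-letter slip like this is easy to miss, here is a minimal sketch in plain Julia. It is not the actual inference code: `Wrapped` is a hypothetical stand-in for `Const`, and both functions are made up for the example. Assigning to the misspelled name silently creates a new, unused local, so the intended variable is never updated and nothing errors.

```julia
# Hypothetical stand-in for inference's `Const` wrapper (illustration only).
struct Wrapped
    val
end

# Mirrors the suspected typo: the result is bound to a new local `atype`,
# so `atyp` is returned unchanged and no error is raised.
function with_typo(atyp)
    if atyp isa Vector
        atype = Wrapped(atyp[1])
    end
    return atyp
end

# Mirrors the suggested correction: `atyp` itself is rebound.
function without_typo(atyp)
    if atyp isa Vector
        atyp = Wrapped(atyp[1])
    end
    return atyp
end

println(with_typo([1, 2]))      # still the original vector
println(without_typo([1, 2]))   # the wrapped first element
```

In the sketch the typo only turns the assignment into dead code, which fits the observation above that correcting it does not change the crash.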

The breaking change was this:

 -    if istopfunction(tm, f, :promote_type) || istopfunction(tm, f, :typejoin)
 -        return Type
 +    if istopfunction(tm, f, :typejoin) || f === return_type
 +        return Type # don't try to infer these function edges directly -- it won't actually come up with anything useful

(this is part of what was cherry picked from commit 76a30fb4968dccab2adcdc3bb5eee00184df0eba)

Applying just this change to the last good commit, I get the segfault. Conversely, I can revert just this change on v0.6.0 final and eliminate the segfault.
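For anyone who wants to see what inference reports for a `promote_type` call on their own build, here is a minimal sketch using `Base.return_types`. That tool is my choice and was not used in this thread, and the argument types `Int` and `Float64` are arbitrary examples. The inferred result will likely differ between builds with and without the change, so no expected output is shown.

```julia
# Ask inference what it concludes for promote_type(Int, Float64).
# The exact answer (a precise type or something wider) depends on the Julia version.
inferred = Base.return_types(promote_type, Tuple{Type{Int}, Type{Float64}})

# The value actually produced at run time.
actual = promote_type(Int, Float64)

println("inferred: ", inferred)
println("actual:   ", actual)

# A sound inference result must contain the run-time value.
sound = all(T -> actual isa T, inferred)
println("consistent: ", sound)
```

This only checks that inference and run time agree for one call. It does not reproduce the segfault.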

This is probably as far as I can go with this. I will defer to the experts about what it means and how it should be fixed. Thanks!

Actually, this is easier to reproduce than I made it sound. The reduced case segfaults on JuliaBox:

(screenshot)

Perhaps this might entice one of the developers to have a look?

Based on the bisection, this appears to be a type inference bug that can segfault the official release of 0.6.0, so it would be good to fix it! My lab recently published some of the "large internal codebase" as part of a paper, so I'm a bit worried that others will run into this as well.

I'm hesitant to spam the core developers, but perhaps @StefanKarpinski can decide what's best? Thanks!

This is really impressive debugging work! Thanks. I'll try to pester some of the folks that would know better what to do here.

Thanks for pursuing this so thoroughly, Drew. It should be fairly easy to fix with this information.

I've tried backporting various type system and inference fixes to 0.6, but none of them fixed this. @vtjnash How do you feel about just applying this patch to 0.6 to fix the crash:

--- a/base/inference.jl
+++ b/base/inference.jl
@@ -1858,7 +1858,7 @@ function abstract_call(f::ANY, fargs::Union{Tuple{},Vector{Any}}, argtypes::Vect
     t = pure_eval_call(f, argtypes, atype, sv)
     t !== false && return t

-    if istopfunction(tm, f, :typejoin) || f === return_type
+    if istopfunction(tm, f, :typejoin) || istopfunction(tm, f, :promote_type) || f === return_type
         return Type # don't try to infer these function edges directly -- it won't actually come up with anything useful
     elseif length(argtypes) == 2 && istopfunction(tm, f, :typename)

Adding a 0.6.x tag to this so it doesn't get forgotten.

@drewrobson has 0.6.x fixed this issue for you? If not, it'd be nice if this were fixed in 0.6.4. Gadfly users have been experiencing similar, possibly related issues.

No, this is not fixed on 0.6.x yet. The original internal codebase fails on every point release of 0.6.x. I just confirmed this on the recently tagged 0.6.4 as well.
The reduced case I provided above does not reliably reproduce the issue across point releases (e.g. it's failing on an earlier line on one of my 0.6.3 installations but not failing at all on another). I can try to reduce the original codebase again on 0.6.4.

Does your reduced case fail on 0.7?

Packages are in good enough shape that I can _almost_ test on 0.7 now. I'm just waiting on JuliaMath/Interpolations.jl#215 and then I'll test the original codebase on 0.7.

Ping: let us know if it works now, or if we have a smaller repro.

I'm happy to say that we haven't seen this ReadOnlyMemoryError or segfault on the 1.x series at all!

The repro was large and fragile, and there have been enough changes in the compiler and packages that it's tough to know if we're creating the precise conditions that caused this. That said, none of our usage patterns over an extended time have triggered this, so I'm increasingly convinced that this is resolved and will close now. Thanks!
