Julia: Regression from v1.0.2, causes crash on linux with v1.0.3

Created on 5 Jan 2019  ·  17 Comments  ·  Source: JuliaLang/julia

My package now crashes on Julia v1.0.3, when it didn't on v1.0.2. Any suggestions? I don't know how to do a bisection to identify what happened...

Working on 1.0.2:

julia> versioninfo()
Julia Version 1.0.2
Commit d789231e99 (2018-11-08 20:11 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2687W v2 @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, ivybridge)

(v1.0) pkg> add Phylo

(v1.0) pkg> status
    Status `~/.julia/environments/v1.0/Project.toml`
  [aea672f4] Phylo v0.3.2

julia> using Phylo

julia> parsenewick("((,),(,));")
BinaryTree{DataFrames.DataFrame,Dict{String,Any}} with 4 tips, 7 nodes and 6 branches.
Leaf names are Node 1, Node 2, Node 4 and Node 5

Crashing on v1.0.3:

julia> versioninfo()
Julia Version 1.0.3
Commit 099e826241 (2018-12-18 01:34 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2687W v2 @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, ivybridge)

(v1.0) pkg> add Phylo
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Resolving package versions...
status
  Updating `~/.julia/environments/v1.0/Project.toml`
 [no changes]
  Updating `~/.julia/environments/v1.0/Manifest.toml`
 [no changes]

(v1.0) pkg> status
    Status `~/.julia/environments/v1.0/Project.toml`
  [aea672f4] Phylo v0.3.2

julia> using Phylo

julia> parsenewick("((,),(,));")
signal (11): Segmentation fault
in expression starting at no file:0
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1191
emit_invoke at /buildworker/worker/package_linux64/build/src/codegen.cpp:3094
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:3893
emit_ssaval_assign at /buildworker/worker/package_linux64/build/src/codegen.cpp:3615
emit_stmtpos at /buildworker/worker/package_linux64/build/src/codegen.cpp:3801 [inlined]
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6262
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1159
emit_invoke at /buildworker/worker/package_linux64/build/src/codegen.cpp:3094
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:3893
emit_ssaval_assign at /buildworker/worker/package_linux64/build/src/codegen.cpp:3615
emit_stmtpos at /buildworker/worker/package_linux64/build/src/codegen.cpp:3801 [inlined]
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6262
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1159
emit_invoke at /buildworker/worker/package_linux64/build/src/codegen.cpp:3094
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:3893
emit_ssaval_assign at /buildworker/worker/package_linux64/build/src/codegen.cpp:3615
emit_stmtpos at /buildworker/worker/package_linux64/build/src/codegen.cpp:3801 [inlined]
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6262
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1159
emit_invoke at /buildworker/worker/package_linux64/build/src/codegen.cpp:3094
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:3893
emit_ssaval_assign at /buildworker/worker/package_linux64/build/src/codegen.cpp:3615
emit_stmtpos at /buildworker/worker/package_linux64/build/src/codegen.cpp:3801 [inlined]
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6262
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1159
emit_invoke at /buildworker/worker/package_linux64/build/src/codegen.cpp:3094
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:3893
emit_ssaval_assign at /buildworker/worker/package_linux64/build/src/codegen.cpp:3615
emit_stmtpos at /buildworker/worker/package_linux64/build/src/codegen.cpp:3801 [inlined]
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6262
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1159
emit_invoke at /buildworker/worker/package_linux64/build/src/codegen.cpp:3094
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:3893
emit_ssaval_assign at /buildworker/worker/package_linux64/build/src/codegen.cpp:3615
emit_stmtpos at /buildworker/worker/package_linux64/build/src/codegen.cpp:3801 [inlined]
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6262
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1159
emit_invoke at /buildworker/worker/package_linux64/build/src/codegen.cpp:3094
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:3893
emit_ssaval_assign at /buildworker/worker/package_linux64/build/src/codegen.cpp:3615
emit_stmtpos at /buildworker/worker/package_linux64/build/src/codegen.cpp:3801 [inlined]
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6262
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1159
emit_invoke at /buildworker/worker/package_linux64/build/src/codegen.cpp:3094
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:3893
emit_ssaval_assign at /buildworker/worker/package_linux64/build/src/codegen.cpp:3615
emit_stmtpos at /buildworker/worker/package_linux64/build/src/codegen.cpp:3801 [inlined]
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6262
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1159
emit_invoke at /buildworker/worker/package_linux64/build/src/codegen.cpp:3094
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:3893
emit_ssaval_assign at /buildworker/worker/package_linux64/build/src/codegen.cpp:3615
emit_stmtpos at /buildworker/worker/package_linux64/build/src/codegen.cpp:3801 [inlined]
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6262
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1159
emit_invoke at /buildworker/worker/package_linux64/build/src/codegen.cpp:3094
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:3893
emit_ssaval_assign at /buildworker/worker/package_linux64/build/src/codegen.cpp:3615
emit_stmtpos at /buildworker/worker/package_linux64/build/src/codegen.cpp:3801 [inlined]
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6262
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1159
emit_invoke at /buildworker/worker/package_linux64/build/src/codegen.cpp:3094
emit_expr at /buildworker/worker/package_linux64/build/src/codegen.cpp:3893
emit_ssaval_assign at /buildworker/worker/package_linux64/build/src/codegen.cpp:3615
emit_stmtpos at /buildworker/worker/package_linux64/build/src/codegen.cpp:3801 [inlined]
emit_function at /buildworker/worker/package_linux64/build/src/codegen.cpp:6262
jl_compile_linfo at /buildworker/worker/package_linux64/build/src/codegen.cpp:1159
jl_fptr_trampoline at /buildworker/worker/package_linux64/build/src/gf.c:1796
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2184
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:324
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:430
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:363 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:682
jl_interpret_toplevel_thunk_callback at /buildworker/worker/package_linux64/build/src/interpreter.c:806
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x7f6949c40fff)
unknown function (ip: 0xffffffffffffffff)
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:815
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:805
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/builtins.c:622
eval at ./boot.jl:319
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2184
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:85
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:117 [inlined]
#28 at ./task.jl:259
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2184
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1537 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:268
unknown function (ip: 0xffffffffffffffff)
Allocations: 38676146 (Pool: 38667798; Big: 8348); GC: 83
Segmentation fault (core dumped)
Labels: backport 1.0, regression

All 17 comments

The code works on v1.0.3 on MacOS and on Windows, and on Julia v1.0.2, v0.7 and v0.6 on all platforms.

Just seen there's a v1.1.0-rc1 too. It's now fixed, sorry... is there going to be another point release before v1.1.0 to fix this?

julia> versioninfo()
Julia Version 1.1.0-rc1.0
Commit ba87aa3962 (2018-12-31 23:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2687W v2 @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, ivybridge)

(v1.1) pkg> add Phylo


(v1.1) pkg> status
    Status `~/.julia/environments/v1.1/Project.toml`
  [aea672f4] Phylo v0.3.2

julia> using Phylo

julia> parsenewick("((,),(,));")
BinaryTree{DataFrames.DataFrame,Dict{String,Any}} with 4 tips, 7 nodes and 6 branches.
Leaf names are Node 1, Node 2, Node 4 and Node 5

I ran git bisect between v1.0.2 and v1.0.3 using the success condition as @assert parsenewick("((,),(,));") isa Phylo.BinaryTree. It identified #30113, but the chosen commit was one of the broken interim commits and the PR was not squashed on merge, so it's unclear whether that PR is actually to blame. I've restarted the bisect with git bisect skip for that commit, so we'll see.
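For anyone wanting to repeat this kind of bisection, here is a minimal sketch of a helper for `git bisect run`. It assumes it is sourced from the root of a julia checkout with Phylo already installed for that build; the file name `bisect_helper.sh` and the function names are made up for illustration. It relies on git's documented exit-code convention: 0 means good, 125 means skip, and other codes up to 127 mean bad, so a segfault (exit code 139) has to be mapped down to 1 or the bisect aborts.

```shell
# Hypothetical helper: turn build/test results into a git-bisect exit code.
bisect_status() {
  local build_rc=$1 test_rc=$2
  if [ "$build_rc" -ne 0 ]; then
    echo 125   # commit does not build (e.g. broken interim commit): skip it
  elif [ "$test_rc" -eq 0 ]; then
    echo 0     # assertion passed: good commit
  else
    echo 1     # assertion failed or julia crashed: bad commit
  fi
}

# Build the checked-out commit and run the success condition from this thread.
check_commit() {
  local build_rc=0 test_rc=0
  make -j"$(nproc)" >/dev/null 2>&1 || build_rc=$?
  if [ "$build_rc" -eq 0 ]; then
    ./julia -e 'using Phylo; @assert parsenewick("((,),(,));") isa Phylo.BinaryTree' || test_rc=$?
  fi
  return "$(bisect_status "$build_rc" "$test_rc")"
}

# Usage (not executed here):
#   git bisect start v1.0.3 v1.0.2
#   git bisect run bash -c '. ./bisect_helper.sh; check_commit'
```

Returning 125 for commits that fail to build has the same effect as marking them by hand with git bisect skip.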

So, interesting result: I cannot reproduce the failure at all with a source build on this machine (git bisect picked the git bisect bad commit as the failure but it's unrelated). However, I _can_ reproduce with the official binaries. I'm not sure what to make of that.

@staticfloat, ideas?

Most likely it's sensitive to local compiler flags; good things to check are if your local source build is using the same sysimg multiversioning flags; whether you're setting the same MARCH, etc....
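One way to pin those flags for a local source build is a Make.user file in the julia checkout. The sketch below only writes the file; `write_make_user` is a made-up helper name, and the values in the usage comment are illustrative assumptions, not the confirmed settings of the 1.0.3 buildbots.

```shell
# Hypothetical helper: write MARCH and JULIA_CPU_TARGET into a Make.user file.
# Arguments: <march> <cpu_target> [output file, default Make.user]
write_make_user() {
  local march=$1 cpu_target=$2 out=${3:-Make.user}
  {
    printf 'MARCH=%s\n' "$march"
    printf 'JULIA_CPU_TARGET=%s\n' "$cpu_target"
  } > "$out"
}

# Usage (illustrative values only, not the confirmed release configuration):
#   write_make_user x86-64 'generic;sandybridge,clone_all'
#   make cleanall && make
```

Comparing the resulting build against the official binary would then show whether the flags alone account for the difference.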

It was just a plain make build on Anubis. I can try again later with particular options set to mirror those on the buildbots, but I didn't think we were setting anything too special there.

Just wondering if there are any thoughts about this? It would be nice if it was resolved, or at the very least that I was certain that the next binary release was guaranteed not to break again...

Do the 1.1.0-rc2 binaries work for you?

Yes they do... but so did 1.1.0-rc1 - I guess my concern is whether this was compiled using the same compiler flags as the official 1.0.3 release, since it was only that release (and not the source itself) that seemed to be broken.

Okay, the 1.1.0 release works, so does that mean we should close this and hope that whatever went wrong with the 1.0.3 release won't happen again :)?

We'll still be making 1.0.x releases, so this is still a problem. I don't have time to investigate further at the moment though, unfortunately. Hopefully this can be resolved for 1.0.4 if the issue can be identified.

Fair enough... I'll keep my fingers crossed someone has time. I'm working mostly on MacOS at the moment, so checking things on linux is inconvenient, but I'll try to keep looking into it.

if the issue can be identified

I'm running a reduce job but there's a lot of code involved so this might take a while.
EDIT: after reducing 40 out of 70KLOC, the segfault has become nondeterministic so I'm not sure this will end anywhere :slightly_frowning_face:

I've managed to reduce this example, but I'm not sure I like the result...
This is what's left of the entire codebase with all its dependencies:

# depot/packages/DataFrames/lyCjP/src/abstractdataframe/io.jl
using WeakRefStrings # depot/packages/DataFrames/lyCjP/src/DataFrames.jl
module DataFrames
if VERSION >= v"1.1.0DEV.792" end
include("abstractdataframe/io.jl")
end
# depot/packages/DataValues/cAl6R/src/DataValues.jl
module DataValues
include("scalar/core.jl")
end
# depot/packages/DataValues/cAl6R/src/scalar/core.jl
for b in (:!, )
  @eval begin
      import .$b
      $b(a) = c
  end
end
# depot/packages/IterableTables/xvpnQ/src/IterableTables.jl
module IterableTables
using Requires, TableTraitsUtils
end
# depot/packages/Phylo/g085o/src/newick.jl
using Tokenize
function parsenewick(::Tokenize.Lexers.Lexer, ::c) where c
  "Unexpected $token.kind token '$(untokenize(token))' "
end
parsenewick(::String, ::Type{c}) where c = parsenewick(a, c)
parsenewick(b) = parsenewick(b, NamedTree)
# depot/packages/Phylo/g085o/src/Phylo.jl
module Phylo
include("Tree.jl")
include("newick.jl")
export parsenewick
include("trim.jl")
if VERSION < v"0.7.0" end
end
# depot/packages/Phylo/g085o/src/Tree.jl
using DataFrames
struct BinaryTree end
NamedTree = BinaryTree
# depot/packages/Phylo/g085o/src/trim.jl
using IterableTables
# depot/packages/Requires/9Jse8/src/require.jl

# depot/packages/Requires/9Jse8/src/Requires.jl
module Requires
include("require.jl")
end
# depot/packages/TableTraitsUtils/p4RrX/src/TableTraitsUtils.jl
module TableTraitsUtils
using DataValues
end
# depot/packages/Tokenize/P2B32/src/lexer.jl
module Lexers
struct Lexer end
end
# depot/packages/Tokenize/P2B32/src/Tokenize.jl
module Tokenize
include("token.jl")
include("lexer.jl")
import .Tokens: untokenize
export untokenize
end
# depot/packages/Tokenize/P2B32/src/token.jl
module Tokens
include("token_kinds.jl")
function a()
  for b in instances(Kind)
    if string(b) end
  end
end
a()
struct c e::Kind end
function untokenize(d::c)
  if string(d.e) end
end
end
# depot/packages/Tokenize/P2B32/src/token_kinds.jl
@enum(Kind, end_keywords)
# depot/packages/WeakRefStrings/RmyGQ/src/WeakRefStrings.jl
module WeakRefStrings
struct a <: AbstractString end
Base.thisind(::a, c) = b
end
# main.jl
using Phylo
parsenewick("")

Nothing particularly exciting, really, but creduce doesn't manage to reduce this any further. This includes, e.g., the empty require.jl -- removing it and the include from Requires.jl breaks the repro.

Now for what makes this repro annoying: the segfault is nondeterministic, and requires a couple of runs before triggering. Worse, the segfault only happens when piping the output of julia to a process, even if it's just tee (you can say I "selected" such a repro by testing against julia ... | grep Segfault).
The issue is also precompile-related, and only reproduces when starting with an empty cache (i.e., removing .julia/compiled). And to top it all off, it doesn't reproduce when disabling ASLR.

I tried running against 1.0.3 + ASAN, but it really only reproduces with the binaries. To try it yourself:

$ git clone https://github.com/maleadt/creduce_julia -b julia/30612 .
$ while true;
  do
    echo try;
    rm -rf depot/compiled/v1.0;
    PATH=/path/to/julia-1.0.3/bin:$PATH JULIA_DEPOT_PATH=$(pwd)/depot julia main.jl |& grep Segmentation;
  done
signal (11): Segmentation fault

Verified on cyclops. All code up at https://github.com/maleadt/creduce_julia/tree/julia/30612
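To put numbers on the nondeterminism and on the ASLR observation, something like the following could count how often the crash shows up over a fixed number of runs. This is a rough sketch, not part of the original repro: `count_segfaults` and `run_once` are made-up names, julia 1.0.3 is assumed to be on PATH, and `setarch -R` (from util-linux) is used to disable address-space randomization for a single process.

```shell
# Run a command N times and count the runs whose output mentions a segfault.
# Output is always piped, since the crash only showed up with piped output.
count_segfaults() {
  local tries=$1 hits=0 i=0
  shift
  while [ "$i" -lt "$tries" ]; do
    if "$@" 2>&1 | grep -q 'Segmentation fault'; then
      hits=$((hits + 1))
    fi
    i=$((i + 1))
  done
  echo "$hits"
}

# One repro attempt: empty the precompile cache, then run main.jl with the
# given launcher command (assumes the creduce_julia checkout is the cwd).
run_once() {
  rm -rf depot/compiled/v1.0
  JULIA_DEPOT_PATH="$PWD/depot" "$@" main.jl
}

# Usage (not executed here):
#   count_segfaults 20 run_once julia                             # ASLR on
#   count_segfaults 20 run_once setarch "$(uname -m)" -R julia    # ASLR off
```

If the second count stays at zero while the first does not, that matches the report that the crash does not reproduce with ASLR disabled.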

Thanks so much for looking into this. I'm feeling a bit dispirited that it seems to be so complicated, and a bit like the only option I have at the moment is prayer that the 1.0.4 binary release won't be afflicted and the problem will silently disappear... what I don't understand is what the difference is between the (presumably?) compiler flags for the binary official releases and the nightlies that might have made this show up (or just the platform they are compiled on?) - presumably there's a script somewhere that does both of these that could be compared?

@maleadt Since this was only an issue in v1.0.x (which isn't likely to see more releases), and not on master (which should soon have another LTS release and has likely already advanced significantly in the various areas where this may have failed), did your analysis indicate whether we can close it?

I did not investigate this again with 1.0.4 or 1.0.5, but with another LTS coming up and this issue being fixed on 1.1+, I think we can close this. Maybe @richardreeve can elaborate on whether this is still an actual issue with Phylo.jl on any version of Julia.
