Rust: LLVM generates incorrect code with -Zprofile

Created on 19 Mar 2020 · 22 Comments · Source: rust-lang/rust

Instructions to reproduce:

  • git clone git@github.com:servo/html5ever.git
  • cd html5ever/markup5ever
  • CARGO_INCREMENTAL=0 RUSTFLAGS="-Zprofile -Ccodegen-units=1 -Cinline-threshold=0" cargo check
  • The build script of markup5ever segfaults.

The actual function with the bug is an instance of std::panicking::try in the proc_macro2 crate.

The cause seems to be an LLVM bug.

full gist

This LLVM IR is generated:

catch.i:                                          ; preds = %.noexc
  %120 = phi i64* [ getelementptr inbounds ([24 x i64], [24 x i64]* @__llvm_gcov_ctr.27, i64 0, i64 11), %.noexc ], !dbg !2861
  %121 = landingpad { i8*, i32 }
          catch i8* null, !dbg !2861
  %122 = load i64, i64* %120, !dbg !2861
  %123 = add i64 %122, 1, !dbg !2861
  store i64 %123, i64* %120, !dbg !2861

Notice how the phi is inserted before the landingpad instruction. This causes the following asm to be generated:

.LBB27_14: // This is never executed
    .loc    27 0 15 is_stmt 0
    movq    160(%rsp), %rcx
    movl    $1, %esi
.Ltmp379: // Landing pad points to here!!!
    leaq    __llvm_gcov_ctr.27(%rip), %rdi
    addq    $120, %rdi
    .loc    27 274 15
    movq    (%rcx), %r8
    addq    $1, %r8
    movq    %r8, (%rcx)

In short, the incorrect landing pad skips the initialization of %rcx, which in turn causes the crash.

Edit by @Amanieu, original bug report follows.


Just updated nightly on my CI machine

nightly-aarch64-unknown-linux-gnu updated - rustc 1.44.0-nightly (f509b26a7 2020-03-18) (from rustc 1.43.0-nightly (c20d7eecb 2020-03-11))

and found that a few dependencies, such as cssparser, string_cache, and html5ever, stopped compiling in tests.

It probably happens because of my RUSTFLAGS

CARGO_INCREMENTAL=0;
RUSTFLAGS="-Zprofile -Ccodegen-units=1 -Cinline-threshold=0 -Coverflow-checks=off -Zno-landing-pads";

I created repository with reproduction
https://github.com/Lesiuk/rust-nightly-issue-reproduction

Bisection found that this PR introduced the issue: https://github.com/rust-lang/rust/pull/67502

searched toolchains c20d7eecbc0928b57da8fe30b2ef8528e2bdd5be through 3c6f982cc908aacc39c3ac97f31c989f81cc213c
regression in be055d96c4c223a5ad49a0181f0b43bc46781708

Log from test run

Compiling html5ever v0.25.1
error: failed to run custom build command for html5ever v0.25.1

Caused by:
process didn't exit successfully: /Users/XXXXXX/CLionProjects/issue/target/debug/build/html5ever-1a979961379450d7/build-script-build (signal: 6, SIGABRT: process abort signal)
--- stdout
cargo:rerun-if-changed=/Users/XXXXXX/.cargo/registry/src/github.com-1ecc6299db9ec823/html5ever-0.25.1/src/tree_builder/rules.rs

--- stderr
fatal runtime error: failed to initiate panic, error 5

warning: build failed, waiting for other jobs to finish...
error: build failed

A-LLVM A-runtime C-bug E-needs-mcve I-ICE T-compiler requires-nightly

All 22 comments

cc @Amanieu, perhaps related to #67502

Is there a reason you're passing -Zno-landing-pads? I believe in ~all cases you should be using -Cpanic=abort today; no-landing-pads is likely to lead to problems... but it would be odd for that to cause trouble in the compiler, I think.

Yes. I just updated my description of the issue. Bisection found this PR.

Those flags are recommended by mozilla's grcov tool:
https://github.com/mozilla/grcov

Does this problem still happen with -C panic=abort instead of -Z no-landing-pads?

-Z no-landing-pads will cause catch_unwind to be optimized away with the new implementation. This is why the runtime can't initiate a panic: it can't find a catch for the exception.

I would recommend either using -C panic=abort or just removing -Z no-landing-pads.
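Following that recommendation, the two alternative flag sets would look roughly like this (a sketch assembled from the flags quoted in this thread, not an authoritative configuration):

```shell
# Option 1: abort on panic, so no landing pads are needed at all.
export CARGO_INCREMENTAL=0
export RUSTFLAGS="-Zprofile -Ccodegen-units=1 -Cinline-threshold=0 -Coverflow-checks=off -Cpanic=abort"

# Option 2: keep unwinding and simply drop -Zno-landing-pads.
export RUSTFLAGS="-Zprofile -Ccodegen-units=1 -Cinline-threshold=0 -Coverflow-checks=off"
```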

Changed my RUSTFLAGS to

-Zprofile -Ccodegen-units=1 -Cinline-threshold=0 -Coverflow-checks=off -C panic=abort

and it still crashes.

Minimal reproduction:

  • Clone rust-lang/libc
  • RUSTFLAGS=-Zprofile CARGO_INCREMENTAL=0 cargo check

Output:

   Compiling libc v0.2.68 (/home/amanieu/code/rust-libc)
error: failed to run custom build command for `libc v0.2.68 (/home/amanieu/code/rust-libc)`

Caused by:
  process didn't exit successfully: `/home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build-script-build` (signal: 11, SIGSEGV: invalid memory reference)
--- stdout
cargo:rustc-cfg=freebsd11
cargo:rustc-cfg=libc_priv_mod_use
cargo:rustc-cfg=libc_union
cargo:rustc-cfg=libc_const_size_of
cargo:rustc-cfg=libc_align
cargo:rustc-cfg=libc_core_cvoid
cargo:rustc-cfg=libc_packedN

--- stderr
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x73693724)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x646e6148)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x34616639)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x6124544c)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous run count: corrupt object tag (0x636f6c6c)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x735f646c)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0xf7ebb268)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000001)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x65727573)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)
profiling: /home/amanieu/code/rust-libc/target/debug/build/libc-5d0f6fa6d686682d/build_script_build-5d0f6fa6d686682d.gcda: cannot merge previous GCDA file: corrupt arc tag (0x00000000)

cargo-bisect-rustc points to e2223c94bf433fc38234d1303e88cbaf14755863 as the source of the regression, which seems a bit surprising.

I'm not familiar with -Zprofile -- but I agree that seems like a surprising trace. It's not implausible that there's some bug introduced by that, though, given the heavy use of unsafe code in BTree...

Maybe we could get a stack trace on the SIGSEGV?

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7edd328 in __memmove_avx_unaligned_erms () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff7edd328 in __memmove_avx_unaligned_erms () from /usr/lib/libc.so.6
#1  0x000055555558856b in write_bytes (s=0x5555555b7296 "_ZN5alloc11collections5btree4node25Handle$LT$Node$C$Type$GT$9into_node17hf4012ff21ae14a7aE", len=90) at /rustc/f509b26a7730d721ef87423a72b3fdf8724b4afa/src/llvm-project/compiler-rt/lib/profile/GCDAProfiling.c:178
#2  write_string (s=0x5555555b7296 "_ZN5alloc11collections5btree4node25Handle$LT$Node$C$Type$GT$9into_node17hf4012ff21ae14a7aE") at /rustc/f509b26a7730d721ef87423a72b3fdf8724b4afa/src/llvm-project/compiler-rt/lib/profile/GCDAProfiling.c:202
#3  llvm_gcda_emit_function (ident=<optimized out>, function_name=0x5555555b7296 "_ZN5alloc11collections5btree4node25Handle$LT$Node$C$Type$GT$9into_node17hf4012ff21ae14a7aE", func_checksum=<optimized out>, use_extra_checksum=0 '\000', cfg_checksum=<optimized out>)
    at /rustc/f509b26a7730d721ef87423a72b3fdf8724b4afa/src/llvm-project/compiler-rt/lib/profile/GCDAProfiling.c:455
#4  0x0000555555568670 in __llvm_gcov_writeout ()
#5  0x00005555555894b3 in llvm_writeout_files () at /rustc/f509b26a7730d721ef87423a72b3fdf8724b4afa/src/llvm-project/compiler-rt/lib/profile/GCDAProfiling.c:606
#6  0x00007ffff7db7537 in __run_exit_handlers () from /usr/lib/libc.so.6
#7  0x00007ffff7db76ee in exit () from /usr/lib/libc.so.6
#8  0x00007ffff7da002a in __libc_start_main () from /usr/lib/libc.so.6
#9  0x000055555555f24e in _start ()

It does look related to the BTree stuff.

cc @ssomers

#69776 introduced extra calls to Handle::into_node (the function the profiler is complaining about) on the panic path, which would come into play in should_panic test cases (those are excluded by Miri, and don't count on me knowing exactly what happens there). So you might want to try excluding the should_panic test cases in liballoc/tests/btree/map.rs.

PS: That doesn't add up; the problem that was fixed was pointed out by Miri, so Miri must be able to run should_panic tests now. Or something.

Never mind, that's a false positive. It seems that -Zprofile requires -Ccodegen-units=1 to work properly. The issue is unrelated to the btree code.

OK I think I've narrowed it down to an LLVM bug.

full gist

This LLVM IR is generated:

catch.i:                                          ; preds = %.noexc
  %120 = phi i64* [ getelementptr inbounds ([24 x i64], [24 x i64]* @__llvm_gcov_ctr.27, i64 0, i64 11), %.noexc ], !dbg !2861
  %121 = landingpad { i8*, i32 }
          catch i8* null, !dbg !2861
  %122 = load i64, i64* %120, !dbg !2861
  %123 = add i64 %122, 1, !dbg !2861
  store i64 %123, i64* %120, !dbg !2861

Notice how the phi is inserted before the landingpad instruction. This causes the following asm to be generated:

.LBB27_14: // This is never executed
    .loc    27 0 15 is_stmt 0
    movq    160(%rsp), %rcx
    movl    $1, %esi
.Ltmp379: // Landing pad points to here!!!
    leaq    __llvm_gcov_ctr.27(%rip), %rdi
    addq    $120, %rdi
    .loc    27 274 15
    movq    (%rcx), %r8
    addq    $1, %r8
    movq    %r8, (%rcx)

In short, the incorrect landing pad skips the initialization of %rcx, which in turn causes the crash.

Unnominating since this requires an unstable compiler flag to trigger

Instructions to reproduce:

  • git clone git@github.com:servo/html5ever.git
  • cd html5ever/markup5ever
  • CARGO_INCREMENTAL=0 RUSTFLAGS="-Zprofile -Ccodegen-units=1 -Cinline-threshold=0" cargo check (NOTE: -Cdebuginfo=0 will cause the bug to no longer trigger)

The actual function with the bug is an instance of std::panicking::try in the proc_macro2 crate.

-Cdebuginfo=0 effectively disables -Zprofile because no .gcda files are generated. This miscompilation is unfortunate because Mozilla's grcov is the most reliable coverage tool for Rust, and if you want your Rust application well tested, coverage matters.
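For reference, a typical grcov workflow with these flags looks roughly like this (flag names taken from grcov's README of this era; treat the exact options as assumptions and check grcov --help):

```shell
# Build and run tests with gcov-style instrumentation (nightly only).
export CARGO_INCREMENTAL=0
export RUSTFLAGS="-Zprofile -Ccodegen-units=1 -Cinline-threshold=0 -Coverflow-checks=off"
cargo +nightly test

# Collect the emitted .gcno/.gcda files into an HTML report.
grcov ./target/debug -s . -t html --llvm --branch --ignore-not-existing -o ./coverage/
```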

I will probably stick to nightly-2020-03-11 until this is fixed, but I can't use it indefinitely.

that would come into play in should_panic test cases (that are excluded by Miri

Miri has supported panics for a while now. If there are still panic tests excluded by Miri, that's a bug. Do you have an example of such a test?

Never mind, I excluded should_panic tests when testing myself, since I don't know how to run them successfully.
PS: and yes, with today's nightly Miri these tests succeed just like any others. I thought they didn't a month or so ago (memory leaks).

Unnominating since this requires an unstable compiler flag to trigger

AFAICT using these flags is the only way to get code coverage reporting working. A lot of projects (all that I've been associated with) build everything with stable Rust but then do code coverage using nightly Rust plus these unstable flags. As a workaround I've temporarily reverted to the 2020-03-11 nightly, but that workaround will effectively stop working once stable Rust is no longer a subset of the nightly-2020-03-11 feature set.
