Rust: Segfaults/corruption when reading an enum in release mode

Created on 30 Sep 2020  Â·  23Comments  Â·  Source: rust-lang/rust

Reproduction here: https://github.com/cormacrelf/minimal-sigsegv-rust

There's more info in the readme on that repo. Essentially, I have some code that segfaults while debug-printing an enum. The enum has one Output(String) variant, followed by 11 empty variants. The repro only creates one of the empty ones, but it appears the debug routine attempts to read the string variant. It occurs in the context of some mutually recursive functions building a fairly simple datastructure by walking another. I have tried hard to minimise the repro but it could be better, but it's a bit fragile so I don't know what else I can remove. With a quick glance at the commit that bisect-rustc found, it looks like it could be related to optimising match arms, so that could be a start.

Edit: for some linkage, the commit below merges #76308, which 're-enables SimplifyArmIdentity'.

cargo bisect-rustc

searched nightlies: from nightly-2020-05-08 to nightly-2020-09-23
regressed nightly: nightly-2020-09-09
searched commits: from https://github.com/rust-lang/rust/commit/0e2c1281e909ca38479b97962fc9248f75d66412 to https://github.com/rust-lang/rust/commit/5099914a16a215794ad243df0cc7a05d91d168e0
regressed commit: https://github.com/rust-lang/rust/commit/5a6b426e3471382c0385c11b09c2d6b37f70ac49


bisected with cargo-bisect-rustc v0.5.2

Host triple: x86_64-apple-darwin
Reproduce with:

cargo bisect-rustc --start 2020-05-08 --end 2020-09-23 --with-cargo --access github -- run --release

Note that the ref_sequence function is the one in which the debug segfaults, and it is debugging a larger structure. I have also seen it segfault when cloning just the EdgeData enum, which is the String+11x one mentioned above.

Backtrace from rust-lldb

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x0000000100002588 minimal`_$LT$$RF$T$u20$as$u20$core..fmt..Debug$GT$::fmt::h797ace08a5294f9f + 24
    frame #1: 0x000000010002e163 minimal`core::fmt::builders::DebugTuple::field::h5368cface0fdb245 [inlined] core::fmt::builders::DebugTuple::field::_$u7b$$u7b$closure$u7d$$u7d$::hb6c824bec58214a9 at builders.rs:347:17 [opt]
    frame #2: 0x000000010002e115 minimal`core::fmt::builders::DebugTuple::field::h5368cface0fdb245 [inlined] core::result::Result$LT$T$C$E$GT$::and_then::h2b3c0ffee0d30aef at result.rs:708 [opt]
    frame #3: 0x000000010002e109 minimal`core::fmt::builders::DebugTuple::field::h5368cface0fdb245 at builders.rs:334 [opt]
    frame #4: 0x000000010000459f minimal`_$LT$$RF$T$u20$as$u20$core..fmt..Debug$GT$::fmt::h2b06d94161eb79e0 + 95
    frame #5: 0x000000010002e163 minimal`core::fmt::builders::DebugTuple::field::h5368cface0fdb245 [inlined] core::fmt::builders::DebugTuple::field::_$u7b$$u7b$closure$u7d$$u7d$::hb6c824bec58214a9 at builders.rs:347:17 [opt]
    frame #6: 0x000000010002e115 minimal`core::fmt::builders::DebugTuple::field::h5368cface0fdb245 [inlined] core::result::Result$LT$T$C$E$GT$::and_then::h2b3c0ffee0d30aef at result.rs:708 [opt]
    frame #7: 0x000000010002e109 minimal`core::fmt::builders::DebugTuple::field::h5368cface0fdb245 at builders.rs:334 [opt]
    frame #8: 0x00000001000037e8 minimal`_$LT$minimal..ref_ir..RefIR$u20$as$u20$core..fmt..Debug$GT$::fmt::hfde74232d0e0d5d0 + 104
    frame #9: 0x000000010002e910 minimal`core::fmt::write::h0ce880d33cd2a300 at mod.rs:1080:17 [opt]
    frame #10: 0x0000000100015034 minimal`_$LT$$RF$std..io..stdio..Stderr$u20$as$u20$std..io..Write$GT$::write_fmt::h2508a17060d785a3 [inlined] std::io::Write::write_fmt::hdeaf6be33655ecfd at mod.rs:1517:15 [opt]
    frame #11: 0x0000000100014fe9 minimal`_$LT$$RF$std..io..stdio..Stderr$u20$as$u20$std..io..Write$GT$::write_fmt::h2508a17060d785a3 at stdio.rs:838 [opt]
    frame #12: 0x0000000100015596 minimal`std::io::stdio::_eprint::h8f478498c62451a2 [inlined] _$LT$std..io..stdio..Stderr$u20$as$u20$std..io..Write$GT$::write_fmt::h4bdcbad34c0de397 at stdio.rs:812:9 [opt]
    frame #13: 0x0000000100015539 minimal`std::io::stdio::_eprint::h8f478498c62451a2 [inlined] std::io::stdio::print_to::_$u7b$$u7b$closure$u7d$$u7d$::h805d34d32e3d72cc at stdio.rs:947 [opt]
    frame #14: 0x0000000100015404 minimal`std::io::stdio::_eprint::h8f478498c62451a2 [inlined] std::thread::local::LocalKey$LT$T$GT$::try_with::h81d8f4dd4c7ef03e at local.rs:271 [opt]
    frame #15: 0x00000001000153dc minimal`std::io::stdio::_eprint::h8f478498c62451a2 [inlined] std::io::stdio::print_to::h3e10e6f958040f4c at stdio.rs:936 [opt]
    frame #16: 0x00000001000153dc minimal`std::io::stdio::_eprint::h8f478498c62451a2 at stdio.rs:975 [opt]
    frame #17: 0x0000000100002c3b minimal`minimal::disamb::element_ref_ir_impl::h81621c2e3d855e9c + 1035
    frame #18: 0x000000010000423f minimal`minimal::main::h79d5d72421065e13 + 1375
    frame #19: 0x00000001000022fa minimal`std::sys_common::backtrace::__rust_begin_short_backtrace::h1ea652726f9417f0 + 10
    frame #20: 0x000000010000235c minimal`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h254a4aa9b725da66 (.llvm.9306634996709519214) + 12
    frame #21: 0x000000010001a290 minimal`std::rt::lang_start_internal::h5e1feb19f4099625 [inlined] core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$F$GT$::call_once::h31ef09f9bee131c1 at function.rs:259:13 [opt]
    frame #22: 0x000000010001a286 minimal`std::rt::lang_start_internal::h5e1feb19f4099625 [inlined] std::panicking::try::do_call::h693e4d9f7366f6d8 at panicking.rs:381 [opt]
    frame #23: 0x000000010001a286 minimal`std::rt::lang_start_internal::h5e1feb19f4099625 [inlined] std::panicking::try::h58e51096fb939dc8 at panicking.rs:345 [opt]
    frame #24: 0x000000010001a286 minimal`std::rt::lang_start_internal::h5e1feb19f4099625 [inlined] std::panic::catch_unwind::h4406f92fae32aea8 at panic.rs:382 [opt]
    frame #25: 0x000000010001a286 minimal`std::rt::lang_start_internal::h5e1feb19f4099625 at rt.rs:51 [opt]
    frame #26: 0x0000000100004539 minimal`main + 41
    frame #27: 0x00007fff7f3603d5 libdyld.dylib`start + 1
    frame #28: 0x00007fff7f3603d5 libdyld.dylib`start + 1

A-mir-opt C-bug I-unsound đź’Ą T-compiler requires-nightly

Most helpful comment

A slight modification to @tmiasko's repro and it will segfault in release mode:

fn main() {
    println!("{:?}", f(Some(A::Y)));
}

#[derive(Eq, Debug, PartialEq)]
pub enum A {
    X(String),
    Y,
}

#[derive(Eq, Debug, PartialEq)]
pub enum B {
    B(Option<A>),
}

pub fn f(a: Option<A>) -> B {
    if let Some(x) = a {
        let y = x;
        return B::B(Some(y));
    }
    B::B(None)
}

I can open a PR disabling this optimization so this doesn't hit beta next week.

All 23 comments

Can't reproduce with rustc 1.48.0-nightly (381b445ff 2020-09-29).

I don’t have MacOS, so I can’t test this myself easily. Here’s some tags though:
@rustbot modify labels: O-macos, T-compiler, E-needs-mcve. And someone might wanna add I-unsound.

_Edit:_ To clarify, I’m not seeing any segfault on Windows or Linux (both 7f7a1cbfd 2020-09-27). Oddly enough, I’m getting Edge(Some(NotUsed)) on Windows, and Edge(Some(Locator)) on Linux. And I get Edge(Some(LocatorLabel)) without --release on both (also with --release on stable or beta or in miri, for that matter). Is this program supposed to be deterministic?

@cormacrelf Perhaps you could test your less minimized version on Linux or Windows if you have access to those (since you said “the segfault started to change a bit”, so maybe the original version also segfaults on e.g. Linux).

Yes, the program is deterministic. I have seen similar wrong variants as well in the original code on macOS, so I'll go out in a limb and say that's the same bug on Linux. I can run a docker image if you like but I think you've already found it. The fragility I was referring to was in terms of "add a debug statement or delete some line, the issue goes away or pops up on a different input instead". But once you have it it produces the same segfault again and again. Given the complexity of the repro I'm not surprised that platform differences including calling conventions etc would affect reproducibility.

I did run in Miri and it didn't find anything amiss, and produced correct output, ie the full Edge(Some(LocatorLabel)).

@bjorn3 Yeah I should have made it more obvious about macOS. I can still reproduce with:

rustc 1.48.0-nightly (7f7a1cbfd 2020-09-27)
binary: rustc
commit-hash: 7f7a1cbfd3b55daee191247770627afab09eece2
commit-date: 2020-09-27
host: x86_64-apple-darwin
release: 1.48.0-nightly
LLVM version: 11.0

Okay, I guess if it is supposed to be deterministic, this means I can then confirm the problem for linux and windows. I’m currently trying to reduce the example further, keeping the different behavior between --release and debug on linux. @rustbot modify labels: -O-macos

Just got a segfault on Linux with a reduced example.

SimplifyArmIdentity moves an assignment to a local before StorageLive:

fn main() {
    f(None);
}

pub enum A {
    X,
    Y,
}

pub enum B {
    B(Option<A>),
}

pub fn f(a: Option<A>) -> B {
    if let Some(x) = a {
        let y = x;
        return B::B(Some(y));
    }
    B::B(None)
}
--- main.f.005-007.SimplifyArmIdentity.before.mir
+++ main.f.005-007.SimplifyArmIdentity.after.mir
@@ -1,2 +1,2 @@
-// MIR for `f` before SimplifyArmIdentity
+// MIR for `f` after SimplifyArmIdentity

@@ -13,6 +13,6 @@
     scope 1 {
-        debug x => _4;                   // in scope 1 at src/main.rs:15:17: 15:18
+        debug x => ((_7 as Some).0: A);  // in scope 1 at src/main.rs:15:17: 15:18
         let _6: A;                       // in scope 1 at src/main.rs:16:13: 16:14
         scope 2 {
-            debug y => _6;               // in scope 2 at src/main.rs:16:13: 16:14
+            debug y => ((_7 as Some).0: A); // in scope 2 at src/main.rs:16:13: 16:14
         }
@@ -47,9 +47,4 @@
         StorageLive(_6);                 // scope 1 at src/main.rs:16:13: 16:14
-        _6 = move _4;                    // scope 1 at src/main.rs:16:17: 16:18
+        _7 = move _1;                    // scope 2 at src/main.rs:17:21: 17:28
         StorageLive(_7);                 // scope 2 at src/main.rs:17:21: 17:28
-        StorageLive(_8);                 // scope 2 at src/main.rs:17:26: 17:27
-        _8 = move _6;                    // scope 2 at src/main.rs:17:26: 17:27
-        ((_7 as Some).0: A) = move _8;   // scope 2 at src/main.rs:17:21: 17:28
-        discriminant(_7) = 1;            // scope 2 at src/main.rs:17:21: 17:28
-        StorageDead(_8);                 // scope 2 at src/main.rs:17:27: 17:28
         ((_0 as B).0: std::option::Option<A>) = move _7; // scope 2 at src/main.rs:17:16: 17:29

cc @wesleywiser

The label regression-from-stable-to-nightly might also be appropriate.

A slight modification to @tmiasko's repro and it will segfault in release mode:

fn main() {
    println!("{:?}", f(Some(A::Y)));
}

#[derive(Eq, Debug, PartialEq)]
pub enum A {
    X(String),
    Y,
}

#[derive(Eq, Debug, PartialEq)]
pub enum B {
    B(Option<A>),
}

pub fn f(a: Option<A>) -> B {
    if let Some(x) = a {
        let y = x;
        return B::B(Some(y));
    }
    B::B(None)
}

I can open a PR disabling this optimization so this doesn't hit beta next week.

Marking this as P-critical based on the wg-prioritization discussion

We can probably turn bugs like this into ICEs by extending the MIR validator (unless another optimization interferes before this hits codegen)

Because I didn’t feel like aborting my own minimization attempt, here’s the (heavily) reduced version of OPs example that I got:

#[derive(Debug)]
enum MyEnum {
    Variant1(Vec<u8>),
    Variant2,
    Variant3,
    Variant4,
}

fn f(arg1: &bool, arg2: &bool, arg3: bool) -> MyStruct {
    if *arg1 {
        println!("{:?}", f(&arg2, arg2, arg3));
        MyStruct(None)
    } else {
        match if arg3 { Some(MyEnum::Variant3) } else { None } {
            Some(t) => {
                let ah = t;
                return MyStruct(Some(ah));
            }
            _ => MyStruct(None)
        }
    }
}

#[derive(Debug)]
struct MyStruct(Option<MyEnum>);

fn main() {
    let arg1 = true;
    let arg2 = false;
    f(&arg1, &arg2, true);
}

Causing a segfault on linux with nightly.

@wesleywiser Your modified example does not consistently segfault. It only does so for nightlies newer than 2020-09-27 or so. The difference between Some and None being printed _does_ however start from the same commit that OPs bisection identified. (Actually it does not “segfault” at all but gives me an “invalid pointer” error.)

@steffahn

Your modified example does not consistently segfault. It only does so for nightlies newer than 2020-09-27 or so

That sounds about right to me since #76844 should have been in nightly 2020-09-26. Prior to that PR, the mis-optimization was disabled in #76837 which should have been in nightly-2020-09-19 or so. Between ~2020-09-19 and ~2020-09-26 the issue should not appear.

Does your repro work on the playground? When I try it with the nightly available, it works correctly.

We can probably turn bugs like this into ICEs by extending the MIR validator (unless another optimization interferes before this hits codegen)

Opened https://github.com/rust-lang/rust/pull/77369 with an implementation of this. It does indeed catch this miscompilation.

Does your repro work on the playground?

It doesn’t seem to work in the playground. Even the (mostly) untouched original code that OP provided doesn’t fail in the playground. (I.e. it correctly displays Some(LocatorLabel).)

To demonstrate some online reproduction: godbolt.org does show the segfault (exit code 139).

I can reproduce with this playground based on one of the MCVEs:

stderr

   Compiling playground v0.0.1 (/playground)
    Finished release [optimized] target(s) in 0.85s
     Running `target/release/playground`
munmap_chunk(): invalid pointer
timeout: the monitored command dumped core
/playground/tools/entrypoint.sh: line 11:     7 Aborted                 timeout --signal=KILL ${timeout} "$@"

stdout

B(None)

Perhaps change the title of the issue to "... in release mode"?

Is this still open so that the underlying issue is fixed? (The PR to temporarily disable the optimization was merged.)

We should leave this issue open to track fixing the optimization but we can remove some of the tags. Once a nightly goes out with the fix, we should also confirm it's no longer repros on the nightly channel. (I was going to wait for that confirmation before changing any labels)

Both repros work correctly on the latest nightly (2020-10-02), so I'm adjusting the tags accordingly. Thanks @cormacrelf for reporting this issue!

@wesleywiser Did you mean to remove P-critical?

@camelid Yeah

Was this page helpful?
0 / 5 - 0 ratings

Related issues

behnam picture behnam  Â·  3Comments

mcarton picture mcarton  Â·  3Comments

pedrohjordao picture pedrohjordao  Â·  3Comments

jmegaffin picture jmegaffin  Â·  3Comments

drewcrawford picture drewcrawford  Â·  3Comments