(Moving this to a separate issue because PR #50882 is already full of noise)
Reduced STR:
diff --git a/src/liballoc/alloc.rs b/src/liballoc/alloc.rs
index 84bd275..4df7730 100644
--- a/src/liballoc/alloc.rs
+++ b/src/liballoc/alloc.rs
@@ -172,7 +172,7 @@ pub(crate) unsafe fn box_free<T: ?Sized>(ptr: Unique<T>) {
// We do not allocate for Box<T> when T is ZST, so deallocation is also not necessary.
if size != 0 {
let layout = Layout::from_size_align_unchecked(size, align);
- dealloc(ptr as *mut u8, layout);
+ Global.dealloc(NonNull::new_unchecked(ptr).cast(), layout);
}
}
x.py test src/libstd --stage 1)What happens then is that sync::once::tests::wait_for_force_to_finish fails with:
thread '<unnamed>' panicked at 'assertion failed: t1.join().is_ok()', libstd/sync/once.rs:582:9
The disassembly for wait_for_force_to_finish only contains one call to std::thread::JoinHandle::join, instead of the expected two when the code is not miscompiled. That call is followed by a test that jumps to a panic when the result of JoinHandle::join is ... Ok(()):
93371: e8 aa fd 07 00 callq 113120 <_ZN41_$LT$std..thread..JoinHandle$LT$T$GT$$GT
$4join17h0fb3b129ec2e38aeE>
93376: 48 89 c3 mov %rax,%rbx
93379: 49 89 d7 mov %rdx,%r15
9337c: 48 85 db test %rbx,%rbx
9337f: 74 1b je 9339c <_ZN3std4sync4once5tests24wait_for_force_to_fin
ish17h6199051fcaa3ff6aE+0x21c>
That JoinHandle::join returns a Result<(), Box<Any>>, and Ok(()) is represented as (0, whatever). The destination of that jump is the panic code.
At some point, I'm not entirely sure with what state of the tree, it was even better. The error was:
thread '<unnamed>' panicked at 'assertion failed: t2.join().is_ok()', libstd/sync/once.rs:583:9
And there were two calls to std::thread::JoinHandle::join, as expected, but they didn't have the same result handling:
93c79: e8 32 c5 07 00 callq 1101b0 <_ZN41_$LT$std..thread..JoinHandle$LT$T$GT$$GT$4join17hc6e7f9bb7d72483aE>
93c7e: 48 89 c3 mov %rax,%rbx
93c81: 49 89 d7 mov %rdx,%r15
93c84: 48 85 db test %rbx,%rbx
93c87: 75 5e jne 93ce7 <_ZN3std4sync4once5tests24wait_for_force_to_finish17h13c0ef8dd5eb6a3aE+0x267>
(snip)
93c9f: e8 0c c5 07 00 callq 1101b0 <_ZN41_$LT$std..thread..JoinHandle$LT$T$GT$$GT$4join17hc6e7f9bb7d72483aE>
93ca4: 48 89 c3 mov %rax,%rbx
93ca7: 49 89 d7 mov %rdx,%r15
93caa: 48 85 db test %rbx,%rbx
93cad: 74 1b je 93cca <_ZN3std4sync4once5tests24wait_for_force_to_finish17h13c0ef8dd5eb6a3aE+0x24a>
The destination of both jumps is panic code. The first jump, corresponding to t1.join().is_ok() is correct, and the second, corresponding to t2.join().is_ok() is broken, thus the test failure.
Even better: this doesn't happen when compiling with the bundled llvm. It also doesn't happen when extracting the test from libstd and compiling with a faulty compiler. It seems the fact that it's part of libstd, and that most of libstd is compiled along the test, plays a role.
I'm trying to get the corresponding mir and llvm-ir.
There are, interestingly, tons of differences between the mir between a build that works, and a build that doesn't. Both with rust master + the patch quoted above, the only difference being one was built with system llvm 5, and the other with the bundled llvm. I'm puzzled.
Some more data:
So this could very well be a case of llvm versions being broken (or compiler integration of said versions).
Even more data:
- With static system llvm trunk built manually: not broken
- With static system llvm 6.0 built manually: not broken
- With shared system llvm 6.0 built manually: not broken
- With shared system llvm 5.0 built manually: not broken
All the above built with clang 5.0. So... tried again with GCC 5.4:
- With shared system llvm 5.0 built manually: not broken
So... this is only happening with Ubuntu builds of llvm (and presumably Debian)?
@sylvestre do you know what's special about them?
Ok, It's too late and I'm tired, all those last attempts were without the patch applied, so they don't count. I'll try again tomorrow.
Not sure. Some ideas:
This reminds me that it would nice to be able to duplicate/"virtualize" lang items for testing purposes, so the old and new standard library (fragments) are completely separated. I should open an issue door that.
So, restarting afresh:
So at least this is consistent with the initial results with ubuntu system llvm and bundled llvm. Time for a bisect.
Bisection identified https://github.com/llvm-mirror/llvm/commit/a57b9dfaee5a3485318bf0d1785e0a2e81310f01 as having fixed this. Which leads me to more questions:
The LLVM patch needs to be backported officially to new 5.* and 6.* patch versions if it fixes miscopilations IMO.
I don't think a new dot release of 5 will happen, 6 is also unlikely.
However, I don't mind taking the patch to debian & ubuntu packages.
Having an upstream bug would help also.
Some more notes:
Needless to say, #50882 is blocked until the latter is addressed in some way.
travis provides llvm 6 too: https://github.com/travis-ci/apt-source-whitelist/pull/373/files
It's not really a matter of what travis provides, it's a matter of what minimal version of system llvm rustc supports, and as of #51899, it's llvm 5.
Interestingly, if the minimal supported version was still 3.9, this would have gone undetected.
Guess why beta is not affected? Because https://github.com/rust-lang/llvm/pull/106 applied https://github.com/llvm-mirror/llvm/commit/a57b9dfaee5a3485318bf0d1785e0a2e81310f01
Backported on the llvm packages https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/commit/1b9a00759063d68cb8b646df91807043643c33a7
Will update this patch when it is uploaded
It appears the miscompilation doesn't happen after #55426.
Most helpful comment
It appears the miscompilation doesn't happen after #55426.