Rust: Regression in rust-nightly 2018-12-02

Created on 8 Dec 2018  ยท  22Comments  ยท  Source: rust-lang/rust

Summary of the issue reported in tiberius at https://github.com/steffengy/tiberius/issues/87.

Issue

When building tiberius with the latest nightlies,
an UDP send using tokio's UdpSockets mysteriously fails on windows:

MIOW CompletionPort::get_many returns an Error (only for release builds)

Io(Os { code: 10047, kind: Other, message: "An address incompatible with the requested protocol was used." })

Reproduction

Checkout tiberius (or built against the version on crates.io) with the following file.
This might only reproduce on windows (in release builds!).
I have not yet confirmed on linux, the code and deps are different though so a linux build might not trigger it - the part returning the error is miow.

main.rs

use tiberius::SqlConnection;

fn main() -> Result<(), Box<std::error::Error>> {
    let future = SqlConnection::connect("server=tcp:127.0.0.1\\MSSQLSERVER;database=tibtest;integratedsecurity=sspi;trustservercertificate=true");
    tokio::executor::current_thread::block_on_all(future).unwrap(); // <- panic here

    println!("Done.");
    Ok(())
}

Bisect

Bisected to commit d3ed34824c31f303b98855fee6fc9213dff1f295
@RalfJung since he might have an idea how this changeset could affect things.

Confirmed it with

  • Command cargo-bisect-rustc --start d09466ceb1374bd0ff1c490bfd50133b8ca67558 --end 4a45578bc58ff262864f72680cc02e83f5d2f5b3 --script=test.bat -v --prompt
searched toolchains d09466ceb1374bd0ff1c490bfd50133b8ca67558 through 4a45578bc58ff262864f72680cc02e83f5d2f5b3
regression in d3ed34824c31f303b98855fee6fc9213dff1f295
  • test.bat
    cargo run --release main.rs

Observations

I could not produce a minimal testcase that reproduces this, since removing code or abstractions always led to removing the error no matter how unrelated it seemed. My first guess therefore was it's somehow optimization related - since the error only seems to occur in release builds - but for me that doesn't quite align with the bisected commit.

A-LLVM I-unsound ๐Ÿ’ฅ P-high T-compiler regression-from-stable-to-beta

Most helpful comment

Submitted https://reviews.llvm.org/D55485 for review.

All 22 comments

Now that is odd. Unfortunately I do not have a good idea, nor do I have a Windows machine to even try and reproduce this. EDIT: On Linux, this code just starts and then blocks (which is expected, I have no MSSQL server running).

@eddyb @nikomatsakis @alexcrichton any idea what might be going on here?

(which is expected, I have no MSSQL server running)

It reproduces without a SQL server running on windows.
Basically it uses UDP to do SQL server instance discovery and the error is returned during
the sending of this discovery packet (before even knowing where to connect to).

But it still makes sense that it doesn't reproduce on linux, since the dependencies
around tokio are vastly different.

Assuming the mentioned commit isn't intrinsically buggy, a possible cause here could be that the use of &mut causes us to emit dereferenceable attributes, leading to different optimization behavior.

@nikic How would you track that down?
Comparing LLVM bitcode/IR of stable (working) with nightly or the generated ASM itself would be my first idea, I might look into that soon.

@steffengy My general recommendation for tracking down optimization bugs (assuming it even is one here) would be a) check whether it still reproduces with -C codegen-units=1, if so use that, otherwise specify -Z no-parallel-llvm and b) use -C llvm-args=-opt-bisect-limit=N to find the lowest value of N for which the issue occurs via bisection. The printed log will then show you which optimization for which function is responsible (though not necessarily the root cause). It's then possible to use -C llvm-args=-filter-print-funcs=mangled_function_name in conjunction with -C llvm-args=-print-before=optname and -C llvm-args=-print-after=optname to view the IR transformation.

If you already know which function is the problematic one, then instead of bisecting it may be easier to directly look at the IR transformation log -C llvm-args=-filter-print-funcs=mangled_function_name -C llvm-args=-print-after-all and see which one introduces the fault transform.

a) It reproduces with either option.
b) from my understanding -C llvm-args=-opt-bisect-limit=0 should disable all optimizations, which would align with the log outptut. It still reproduces unless I omit --release.
(cargo rustc --release -- -C codegen-units=1 -C llvm-args=-opt-bisect-limit=0)

Also f I set codegen-units=1 in Cargo.toml (not via the command line) it surprisingly does not reproduce.

[profile.release]
codegen-units = 1

Also if i set opt-level=2 in Cargo.toml, it does not reproduce.

@steffengy I believe cargo rustc applies the passed flags on the the last compilation, but all your dependencies are still going to be compiled with default flags. In this case the issue is likely introduced in one of the dependencies, so setting flags only on the final compilation doesn't make much of a difference.

Yeah, using RUSTFLAGS=-C codegen-units = 1 I get the same behavior as with the toml.
It does reproduce with RUSTFLAGS=-Z no-parallel-llvm now.
Adding -C llvm-args=-opt-bisect-limit=0 makes it work, so it aleast seems optimization related.

I think multiple compiler invocations (for each dep) make it pretty hard to narrow this down, since the bisectting limit is applied to each invocation for each dependency. If I would figure out the minimum pass number X, I still have to check which invocation actually causes the damage?
Also the log for limit=0 is 2GB, so quite a bit to bisect...

The rustc invocation for the tiberius crate yields the errorneous behavior.
I tracked it down using a rustc wrapper that passes -C llvm-args=-opt-bisect-limit=208995 to only the rustc invocation compiling the tiberius crate. This reproduces the issue. Passing 208994 makes the isuse disappear.
All other dependencies are compiled with full optimizations and no additional flags.

This means I tracked it down to the following pass of the tiberius-compilation:

BISECT: running pass (208995) X86 Avoid Store Forwarding Blocks on function 
(_ZN126_$LT$tiberius..SqlConnection$LT$alloc..boxed..Box$LT$$LP$dyn
$u20$tiberius..BoxableIo$u20$$u2b$$u20$$u27$static$RP$$GT$$GT$$GT$7
connect17h74d4893aded596c4E)

IR dumped before/after this pass (thanks to @nikic):
https://gist.github.com/steffengy/ed6d926798eb2ff7d1f9991dc1a37663


IR-Diff

--- ir_before.txt       2018-12-08 21:02:00.903516800 +0100
+++ ir_after.txt        2018-12-08 21:00:36.770815800 +0100
@@ -1,4 +1,4 @@
-# *** IR Dump Before X86 Avoid Store Forwarding Blocks ***:
+# *** IR Dump After X86 Avoid Store Forwarding Blocks ***:
 # Machine code for function _ZN126_$LT$tiberius..SqlConnection$LT$alloc..boxed..Box$LT$$LP$dyn$u20$tiberius..BoxableIo$u20$$u2b$$u20$$u27$static$RP$$GT$$GT$$GT$7connect17h74d4893aded596c4E: IsSSA, TracksLiveness
 Frame Objects:
   fi#0: size=1, align=1, at location [SP+8]
@@ -2054,10 +2054,20 @@
 ; predecessors: %bb.146
   successors: %bb.148(0x7ffff800), %bb.336(0x00000800); %bb.148(100.00%), %bb.336(0.00%)

-  %961:vr128 = MOVUPSrm %4:gr64, 1, $noreg, -4, $noreg :: (dereferenceable load 16 from %ir.1165, align 4)
+  %1272:gr16 = MOV16rm %4:gr64, 1, $noreg, -4, $noreg :: (dereferenceable load 2 from %ir.1165, align 4)
+  %1273:gr8 = MOV8rm %4:gr64, 1, $noreg, -2, $noreg :: (dereferenceable load 1 from %ir.1165 + 2, align 4)
+  %1274:gr32 = MOV32rm %4:gr64, 1, $noreg, 1, $noreg :: (dereferenceable load 4 from %ir.1165 + 3)
+  %1275:gr16 = MOV16rm %4:gr64, 1, $noreg, 5, $noreg :: (dereferenceable load 2 from %ir.1165 + 7, align 4)
+  %1276:gr32 = MOV32rm %4:gr64, 1, $noreg, 7, $noreg :: (dereferenceable load 4 from %ir.1165 + 9)
+  %1277:gr8 = MOV8rm %4:gr64, 1, $noreg, 11, $noreg :: (dereferenceable load 1 from %ir.1165 + 13, align 4)
   %962:vr128 = MOVUPSrm %4:gr64, 1, $noreg, 12, $noreg :: (dereferenceable load 16 from %ir.1165 + 16, align 4)
   MOVAPSmr %stack.22, 1, $noreg, 16, $noreg, killed %962:vr128 :: (store 16 into %ir.1166 + 16)
-  MOVAPSmr %stack.22, 1, $noreg, 0, $noreg, killed %961:vr128 :: (store 16 into %ir.1166)
+  MOV16mr %stack.22, 1, $noreg, 0, $noreg, killed %1272:gr16 :: (store 2 into %ir.1166, align 16)
+  MOV8mr %stack.22, 1, $noreg, 2, $noreg, killed %1273:gr8 :: (store 1 into %ir.1166 + 2, align 16)
+  MOV32mr %stack.22, 1, $noreg, 5, $noreg, killed %1274:gr32 :: (store 4 into %ir.1166 + 3, align 16)
+  MOV16mr %stack.22, 1, $noreg, 9, $noreg, killed %1275:gr16 :: (store 2 into %ir.1166 + 7, align 16)
+  MOV32mr %stack.22, 1, $noreg, 11, $noreg, killed %1276:gr32 :: (store 4 into %ir.1166 + 9, align 16)
+  MOV8mr %stack.22, 1, $noreg, 15, $noreg, killed %1277:gr8 :: (store 1 into %ir.1166 + 13, align 16)
   EH_LABEL <mcsymbol >
   ADJCALLSTACKDOWN64 32, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
   $rcx = COPY %673:gr64
@@ -2406,10 +2416,20 @@
 ; predecessors: %bb.171
   successors: %bb.173(0x7ffff800), %bb.337(0x00000800); %bb.173(100.00%), %bb.337(0.00%)

-  %882:vr128 = MOVUPSrm %4:gr64, 1, $noreg, -4, $noreg :: (dereferenceable load 16 from %ir.1363, align 4)
+  %1278:gr16 = MOV16rm %4:gr64, 1, $noreg, -4, $noreg :: (dereferenceable load 2 from %ir.1363, align 4)
+  %1279:gr8 = MOV8rm %4:gr64, 1, $noreg, -2, $noreg :: (dereferenceable load 1 from %ir.1363 + 2, align 4)
+  %1280:gr32 = MOV32rm %4:gr64, 1, $noreg, 1, $noreg :: (dereferenceable load 4 from %ir.1363 + 3)
+  %1281:gr16 = MOV16rm %4:gr64, 1, $noreg, 5, $noreg :: (dereferenceable load 2 from %ir.1363 + 7, align 4)
+  %1282:gr32 = MOV32rm %4:gr64, 1, $noreg, 7, $noreg :: (dereferenceable load 4 from %ir.1363 + 9)
+  %1283:gr8 = MOV8rm %4:gr64, 1, $noreg, 11, $noreg :: (dereferenceable load 1 from %ir.1363 + 13, align 4)
   %883:vr128 = MOVUPSrm %4:gr64, 1, $noreg, 12, $noreg :: (dereferenceable load 16 from %ir.1363 + 16, align 4)
   MOVAPSmr %stack.22, 1, $noreg, 16, $noreg, killed %883:vr128 :: (store 16 into %ir.1364 + 16)
-  MOVAPSmr %stack.22, 1, $noreg, 0, $noreg, killed %882:vr128 :: (store 16 into %ir.1364)
+  MOV16mr %stack.22, 1, $noreg, 0, $noreg, killed %1278:gr16 :: (store 2 into %ir.1364, align 16)
+  MOV8mr %stack.22, 1, $noreg, 2, $noreg, killed %1279:gr8 :: (store 1 into %ir.1364 + 2, align 16)
+  MOV32mr %stack.22, 1, $noreg, 5, $noreg, killed %1280:gr32 :: (store 4 into %ir.1364 + 3, align 16)
+  MOV16mr %stack.22, 1, $noreg, 9, $noreg, killed %1281:gr16 :: (store 2 into %ir.1364 + 7, align 16)
+  MOV32mr %stack.22, 1, $noreg, 11, $noreg, killed %1282:gr32 :: (store 4 into %ir.1364 + 9, align 16)
+  MOV8mr %stack.22, 1, $noreg, 15, $noreg, killed %1283:gr8 :: (store 1 into %ir.1364 + 13, align 16)
   EH_LABEL <mcsymbol >
   ADJCALLSTACKDOWN64 32, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
   $rcx = COPY %673:gr64
@@ -3832,11 +3852,13 @@
   %1093:gr32 = MOV32rm %stack.63, 1, $noreg, 0, $noreg :: (dereferenceable load 4 from %ir.2004)
   MOV32mr %stack.83, 1, $noreg, 0, $noreg, killed %1093:gr32 :: (store 4 into %ir.2006)
   %1094:vr128 = MOVAPSrm %stack.30, 1, $noreg, 0, $noreg :: (dereferenceable load 16 from %ir.2003)
-  %1095:vr128 = MOVAPSrm %stack.30, 1, $noreg, 16, $noreg :: (dereferenceable load 16 from %ir.2003 + 16)
+  %1284:gr64 = MOV64rm %stack.30, 1, $noreg, 16, $noreg :: (dereferenceable load 8 from %ir.2003 + 16, align 16)
+  %1285:gr64 = MOV64rm %stack.30, 1, $noreg, 24, $noreg :: (dereferenceable load 8 from %ir.2003 + 24, align 16)
   %1096:vr128 = MOVAPSrm %stack.30, 1, $noreg, 32, $noreg :: (dereferenceable load 16 from %ir.2003 + 32)
   %1097:vr128 = MOVAPSrm %stack.30, 1, $noreg, 48, $noreg :: (dereferenceable load 16 from %ir.2003 + 48)
   MOVAPSmr %stack.22, 1, $noreg, 0, $noreg, killed %1094:vr128 :: (store 16 into %ir.2005)
-  MOVAPSmr %stack.22, 1, $noreg, 16, $noreg, killed %1095:vr128 :: (store 16 into %ir.2005 + 16)
+  MOV64mr %stack.22, 1, $noreg, 16, $noreg, killed %1284:gr64 :: (store 8 into %ir.2005 + 16, align 16)
+  MOV64mr %stack.22, 1, $noreg, 24, $noreg, killed %1285:gr64 :: (store 8 into %ir.2005 + 24, align 16)
   MOVAPSmr %stack.22, 1, $noreg, 32, $noreg, killed %1096:vr128 :: (store 16 into %ir.2005 + 32)
   MOVAPSmr %stack.22, 1, $noreg, 48, $noreg, killed %1097:vr128 :: (store 16 into %ir.2005 + 48)
   %1098:vr128 = MOVAPSrm %stack.30, 1, $noreg, 64, $noreg :: (dereferenceable load 16 from %ir.2003 + 64)

Wow, nice work! This transformation does look rather suspect to me. The first two 16-byte load/store pairs are broken up into load/store sequences which only end up copying 14-bytes, the final two bytes seem to be disappearing into thin air. Unless there is some clever reason for that (e.g. that copy is already present somewhere in the surrounding IR) that would seem like a bug.

Before looking closer into what is happening there, it might be worthwhile to check whether this is the same issue as https://bugs.llvm.org/show_bug.cgi?id=38743, which is the one known issue in this pass that I'm aware of.

Assuming that this tooling works on Windows, it should be possible to check for this using:

cargo install rustup-toolchain-install-master
rustup-toolchain-install-master -a d3ed34824c31f303b98855fee6fc9213dff1f295
cargo +d3ed34824c31f303b98855fee6fc9213dff1f295-alt build --release

If this triggers a "Wrong size division" assertion in X86AvoidSFBPass::buildCopies(), it's probably the same issue.

No that doesn't result in any assertions. cargo-bisect-rustc also uses CI-builds,
so we'd have seen it there already - I still tried to make sure.

My current idea would be to try to identify that transformation in e.g. --emit=llvm-ir
and then verify with lli if changing that has any effect, but not sure how feasible that is yet.

No that doesn't result in any assertions. cargo-bisect-rustc also uses CI-builds,
so we'd have seen it there already - I still tried to make sure.

Thanks! I believe that by default cargo-bisect-rustc uses the normal dist CI builds without LLVM assertions, which is why it's still worthwhile to try with an assertion-enabled alt toolchain.

As you suggest, it would be helpful to get the LLVM IR for this function from --emit=llvm-ir. If the miscompilation still exists after explicitly running it through llc, then it should be possible to construct a reduced test case using bugpoint (though it will need a bit of manual setup, I'm thinking something along the lines of using FileCheck to match for one of those load sequences).

Another note on the MIR diff above, something I hadn't noticed before: The MOV displacements and MMO offsets don't line up. the MOV displacement jumps from -2 to 1 (diff 3), while the MMO offset goes from 2 to 3 (diff 1). Both of these can't be correct at the same time.

I think I have a reduced test case:

define i8 @test_offset(i8* %base) #0 {
entry:
  %z = alloca [128 x i8], align 16
  %gep0 = getelementptr inbounds i8, i8* %base, i64 7
  store volatile i8 0, i8* %gep0
  %gep1 = getelementptr inbounds i8, i8* %base, i64 5
  %bc1 = bitcast i8* %gep1 to i16*
  store volatile i16 0, i16* %bc1
  %gep2 = getelementptr inbounds i8, i8* %base, i64 1
  %bc2 = bitcast i8* %gep2 to i32*
  store volatile i32 0, i32* %bc2

  %y1 = getelementptr inbounds i8, i8* %base, i64 -4
  %y2 = bitcast [128 x i8]* %z to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %y2, i8* %y1, i64 16, i1 false)

  %gep4 = getelementptr inbounds [128 x i8], [128 x i8]* %z, i64 0, i64 4
  %ret = load i8, i8* %gep4
  ret i8 %ret
}

; Function Attrs: argmemonly nounwind
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i1) #1

attributes #0 = { "target-cpu"="core-avx2" }

Running this through llc gives:

    pushq   %rax
    .cfi_def_cfa_offset 16
    movb    $0, 7(%rdi)
    movw    $0, 5(%rdi)
    movl    $0, 1(%rdi)
    movzwl  -4(%rdi), %eax
    movw    %ax, -128(%rsp)
    movb    -2(%rdi), %al
    movb    %al, -126(%rsp)
    movl    1(%rdi), %eax
    movl    %eax, -123(%rsp)
    movzwl  5(%rdi), %eax
    movw    %ax, -119(%rsp)
    movb    7(%rdi), %al
    movb    %al, -117(%rsp)
    movl    8(%rdi), %eax
    movl    %eax, -116(%rsp)
    movb    -124(%rsp), %al
    popq    %rcx
    .cfi_def_cfa_offset 8
    retq

Which does not copy the -1..1 range, if I'm reading this correctly.

Submitted https://reviews.llvm.org/D55485 for review.

@nikic
Just built a nightly-stage2 without changes where it expectedly crashes.
Applying your patch to LLVM and rebuilding stage2 fixes it.
So the submitted LLVM patch indeed fixes it, great work!

@nikic That's awesome :D

Patch has landed upstream, so we can include this in the next batch of LLVM cherry-picks.

@nikic I don't want to put pressure here, just to thank you both @steffengy and @nikic for your hard work on this. BTW, when can I expect this LLVM patch to be included in the nightly compiler?

Based on the timing, I'm assuming this regression also made it into beta, which was cut around 4th/5th Dec.

T-compiler triage. P-high. Assigning to @nikic (...if I can)

Was this page helpful?
0 / 5 - 0 ratings