This issue tracks the undoing of the -Zmutable-alias=no default introduced in https://github.com/rust-lang/rust/pull/54639 on account of a bug in LLVM. cc @nagisa
(Deja vu?)
I’m still working on figuring out the underlying issue. The interesting ticket is https://github.com/rust-lang/rust/issues/54462.
Using @nagisa's minimal reproduction:
Minimised test case with no unsafe code (make sure to compile with 1 codegen unit!):
fn linidx(row: usize, col: usize) -> usize {
    row * 1 + col * 3
}
fn swappy() -> [f32; 12] {
    let mut mat = [1.0f32, 5.0, 9.0, 2.0, 6.0, 10.0, 3.0, 7.0, 11.0, 4.0, 8.0, 12.0];
    for i in 0..2 {
        for j in i+1..3 {
            if mat[linidx(j, 3)] > mat[linidx(i, 3)] {
                    for k in 0..4 {
                            let (x, rest) = mat.split_at_mut(linidx(i, k) + 1);
                            let a = x.last_mut().unwrap();
                            let b = rest.get_mut(linidx(j, k) - linidx(i, k) - 1).unwrap();
                            ::std::mem::swap(a, b);
                    }
            }
        }
    }
    mat
}
fn main() {
    let mat = swappy();
    assert_eq!([9.0, 5.0, 1.0, 10.0, 6.0, 2.0, 11.0, 7.0, 3.0, 12.0, 8.0, 4.0], mat);
}
I was able to bisect LLVM's optimization passes to find the one causing the error.
Running this command results in a working executeable (replace bug.rs with the name of the file you saved the reproduction in).
rustc -Z no-parallel-llvm -C codegen-units=1 -O -Z mutable-noalias=yes -C llvm-args=-opt-bisect-limit=2260 bug.rs
While running this command results in a broken executable (the `assert_eq`` fails):
rustc -Z no-parallel-llvm -C codegen-units=1 -O -Z mutable-noalias=yes -C llvm-args=-opt-bisect-limit=2261 bug.rs
For this file, optimization 2261 corresponds to Global Value Numbering on function (_ZN3bug6swappy17hdcc51d0e284ea38bE)
Bisecting LLVM revisions (using llvmlab bisect) narrows it down to r305936-r305938, presumably r305938:
[BasicAA] Use MayAlias instead of PartialAlias for fallback.
Note that this is a pretty old change, from June 2017.
Edit: Looking at the commit description, it seems likely that the bug existed prior to that, but was masked by BasicAA preventing later alias passes from running, which is what the commit fixed.  The case involves checking aliasing between a pair of getelementptr instructions where the compiler knows they have the same base address but doesn't know the offsets.
Edit2: Also, passing -enable-scoped-noalias=false as an LLVM option prevents the miscompilation.  (This is not surprising since that disables noalias handling altogether, but just in case it helps…)
From a look at the pre-GVN IR, I feel like the root cause here might be in loop unrolling, depending on whether my understanding of how LLVM aliasing annotations work is correct.
Consider a code like
int *a, *b;
for (int i = 0; i < 4; i++) {
    a[i & 1] = b[i & 1];
}
where a[i & 1] and b[i & 1] do not alias within a single iteration, but a and b in general may alias.
In LLVM IR this would go something like:
define void @test(i32* %addr1, i32* %addr2) {
start:
    br label %body
body:
    %i = phi i32 [ 0, %start ], [ %i2, %body ]
    %j = and i32 %i, 1
    %addr1i = getelementptr inbounds i32, i32* %addr1, i32 %j
    %addr2i = getelementptr inbounds i32, i32* %addr2, i32 %j
    %x = load i32, i32* %addr1i, !alias.scope !2
    store i32 %x, i32* %addr2i, !noalias !2
    %i2 = add i32 %i, 1
    %cmp = icmp slt i32 %i2, 4
    br i1 %cmp, label %body, label %end
end:
    ret void
}
!0 = !{!0}
!1 = !{!1, !0}
!2 = !{!1}
If we run this through -loop-unroll we get:
define void @test(i32* %addr1, i32* %addr2) {
start:
  br label %body
body:                                             ; preds = %start
  %x = load i32, i32* %addr1, !alias.scope !0
  store i32 %x, i32* %addr2, !noalias !0
  %addr1i.1 = getelementptr inbounds i32, i32* %addr1, i32 1
  %addr2i.1 = getelementptr inbounds i32, i32* %addr2, i32 1
  %x.1 = load i32, i32* %addr1i.1, !alias.scope !0
  store i32 %x.1, i32* %addr2i.1, !noalias !0
  %x.2 = load i32, i32* %addr1, !alias.scope !0
  store i32 %x.2, i32* %addr2, !noalias !0
  %addr1i.3 = getelementptr inbounds i32, i32* %addr1, i32 1
  %addr2i.3 = getelementptr inbounds i32, i32* %addr2, i32 1
  %x.3 = load i32, i32* %addr1i.3, !alias.scope !0
  store i32 %x.3, i32* %addr2i.3, !noalias !0
  ret void
}
!0 = !{!1}
!1 = distinct !{!1, !2}
!2 = distinct !{!2}
Note how all four copies of the loop use aliasing metadata over the same aliasing domain. Instead of being noalias within a single iteration, it's noalias across the whole function.
Finally, -scoped-noalias -gvn gives us:
define void @test(i32* %addr1, i32* %addr2) {
start:
  %x = load i32, i32* %addr1, !alias.scope !0
  store i32 %x, i32* %addr2, !noalias !0
  %addr1i.1 = getelementptr inbounds i32, i32* %addr1, i32 1
  %addr2i.1 = getelementptr inbounds i32, i32* %addr2, i32 1
  %x.1 = load i32, i32* %addr1i.1, !alias.scope !0
  store i32 %x.1, i32* %addr2i.1, !noalias !0
  store i32 %x, i32* %addr2, !noalias !0
  store i32 %x.1, i32* %addr2i.1, !noalias !0
  ret void
}
!0 = !{!1}
!1 = distinct !{!1, !2}
!2 = distinct !{!2}
And this will result in incorrect results if a = b + 1.
It's possible to reproduce this issue from C with the following code:
#include "stdio.h"
void copy(int * restrict to, int * restrict from) {
    *to = *from;
}
void test(int *a, int *b) {
    for (int i = 0; i < 4; i++) {
        copy(&b[i & 1], &a[i & 1]);
    }
}
int main() {
    int ary[] = {0, 1, 2};
    test(&ary[1], &ary[0]);
    printf("%d %d %d\n", ary[0], ary[1], ary[2]);
    return 1;
}
With Clang 6.0 this prints 2 2 2 at -O0 and 1 2 2 at -O3. I'm not sure if this code is legal under restrict semantics in C, but I think it should be legal under the stricter noalias semantics of LLVM.
I reduced it to a simple C test case (compile at -O3 and -O0 and compare output):
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
__attribute__((always_inline))
static inline void copy(int *restrict a, int *restrict b) {
    assert(a != b);
    *b = *a;
    *a = 7;
}
__attribute__((noinline))
void floppy(int mat[static 2], size_t idxs[static 3]) {
    for (int i = 0; i < 3; i++) {
        copy(&mat[i%2], &mat[idxs[i]]);
    }
}
int main() {
    int mat[3] = {10, 20};
    size_t idxs[3] = {1, 0, 1};
    floppy(mat, idxs);
    printf("%d %d\n", mat[0], mat[1]);
}
Note that if you remove restrict, the C equivalent of noalias, behavior is correct.  Yet even then, the assert(a != b) passes, proving that no UB can occur due to calling it with restrict.
What's happening is:
for (int i = 0; i < 3; i++) {
    mat[idxs[i]] = mat[i%2]; mat[i%2] = 7;
}
mat[idxs[0]] = mat[0]; mat[0] = 7; /* from copy(&mat[0%2], &mat[idxs[0]]) */
mat[idxs[1]] = mat[1]; mat[1] = 7; /* from copy(&mat[1%2], &mat[idxs[1]]) */
mat[idxs[2]] = mat[0]; mat[0] = 7; /* from copy(&mat[2%2], &mat[idxs[2]]) */
mat[0] cannot alias with mat[idxs[1]] or mat[1], ergo it cannot have been changed between mat[0] = 7; and mat[idxs[2]] = mat[0];, ergo it's safe for global value numbering to optimize the latter to mat[idxs[2]] = 7;.  But mat[0] does alias with mat[idxs[1]], because idxs[1] == 0.  And we did not promise it wouldn't, because on the second iteration when &mat[idxs[1]] is passed to copy, the other argument is &mat[1].  So why does LLVM think it can't?
Well, it has to do with the way copy is inlined.  The noalias function attribute is turned into !alias.scope and !noalias metadata on the load and store instructions, like:
  %8 = load i32, i32* %0, align 4, !tbaa !8, !alias.scope !10, !noalias !13
  store i32 %8, i32* %7, align 4, !tbaa !8, !alias.scope !13, !noalias !10
  store i32 7, i32* %0, align 4, !tbaa !8, !alias.scope !10, !noalias !13
Normally, if a function is inlined multiple times, each copy gets its own unique IDs for alias.scope and noalias, indicating that each call represents its own 'inequality' relationship* between the pair of arguments marked noalias (restrict at C level), which may have different values for each call.
However, in this case, first the function is inlined into the loop, then the inlined code is duplicated when the loop is unrolled – and this duplication does not change the IDs.  Because of this, LLVM thinks none of the a's can alias with any of the b's, which is false, because a from the first and third calls aliases with b from the second call (all pointing to &mat[0]).
Amazingly, GCC also miscompiles this, with different output.  (clang and GCC at -O0 both output 7 10; clang at -O3 outputs 7 7; GCC at -O3 outputs 10 7.)  Uh, I really hope I didn't screw something up and add UB after all, but I don't see how...
* It's a bit more complicated than that, but in this case, since copy does not use any pointer arithmetic and writes to both pointers, the inequality a != b is necessary and sufficient for a call not to be UB.
Heh, looks like I raced with @nikic to find the same explanation. Their test case is slightly nicer :)
That's some really great timing ^^ We reached the same conclusion with nearly the same reduced test case at the same time :)
To fix this, probably something along the lines of https://github.com/llvm-mirror/llvm/blob/54d4881c352796b18bfe7314662a294754e3a752/lib/Transforms/Utils/InlineFunction.cpp#L801 needs to be also be done in LoopUnrollPass.
I've submitted an LLVM bug report for this issue at https://bugs.llvm.org/show_bug.cgi?id=39282.
And – just mentioning this for completeness – I submitted a bug report to GCC since it also miscompiled my C test case: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87609
Triage: If I'm reading this correctly, the LLVM fix was accepted (https://reviews.llvm.org/D9375). I'm unsure what that means for actually merging into LLVM, or when that happened; someone more knowledgeable about LLVM's revision process should check if the issue is fixed now (and for what versions).
It's not merged, the review process was a bit odd and the person who OK'ed it don't review patches anymore.
There's been a call for testing with the "full restrict" patch set. It would probably be valuable if someone tried out re-enabling noalias in Rust on top of an llvm that has this patch set applied.
I'm willing to try to do that.
I have access to moderately powerful servers (48 HT cores, 128G of ram) and I think I can manage to build everything correctly with the patch.
Once I have a working toolchain, which crates would you recommend to try?
I finished merging  all rust-specific commits on llvm's upstream master and then applied the patch.
Here is the resulting branch: https://github.com/PaulGrandperrin/llvm-project/tree/llvm-master-with-rustlang-patches-and-D69542
I'll now try compiling the toolchain and try it
First make sure you revert: https://github.com/rust-lang/rust/pull/54639 so that Rust is actually emitting noalias.
A good first thing to try is something like:
pub fn adds(a: &mut i32, b: &mut i32) {
    *a += *b;
    *a += *b;
}
and confirm that it compiles to something like:
example::adds:
        mov     eax, dword ptr [rsi]
        add     eax, eax
        add     dword ptr [rdi], eax
        ret
and not
example::adds:
        mov     eax, dword ptr [rdi]
        add     eax, dword ptr [rsi]
        mov     dword ptr [rdi], eax
        add     eax, dword ptr [rsi]
        mov     dword ptr [rdi], eax
        ret
Next, ensure that the code from https://github.com/rust-lang/rust/issues/54462#issue-362850708 no longer miscompiles.
@jrmuizel Note that #54639 does include @nagisa's minimal reproducer for #54462 as a new compiler test. Maybe a full revert is not in order?
I don't think it is important to include it or not because AFAIK it is only about changing some default flag (which can be overridden with -Zmutable-noalias=yes) and I can manually compile the test file mentioned.
Just so you know, I'm still trying to build LLVM but as of now I get a compilation error with or without the rust specific patches applied (ie: just upstream's llvm master + the patch fails too):
In file included from /usr/include/c++/8/cmath:45,
                 from /opt/rust/src/llvm-project/llvm/include/llvm-c/DataTypes.h:28,
                 from /opt/rust/src/llvm-project/llvm/include/llvm/Support/DataTypes.h:16,
                 from /opt/rust/src/llvm-project/llvm/include/llvm/ADT/Hashing.h:47,
                 from /opt/rust/src/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:12,
                 from /opt/rust/src/llvm-project/llvm/include/llvm/Transforms/Utils/NoAliasUtils.h:16,
                 from /opt/rust/src/llvm-project/llvm/lib/Transforms/Utils/NoAliasUtils.cpp:13:
/opt/rust/src/llvm-project/llvm/lib/Transforms/Utils/NoAliasUtils.cpp: In function ‘void llvm::cloneNoAliasScopes(llvm::ArrayRef<llvm::MetadataAsValue*>, llvm::DenseMap<llvm::MDN
ode*, llvm::MDNode*>&, llvm::DenseMap<llvm::MetadataAsValue*, llvm::MetadataAsValue*>&, llvm::StringRef, llvm::LLVMContext&)’:
/opt/rust/src/llvm-project/llvm/lib/Transforms/Utils/NoAliasUtils.cpp:174:30: error: no matching function for call to ‘llvm::AliasScopeNode::AliasScopeNode(double)’
         llvm::AliasScopeNode SNAN(MD);
                              ^~~~
In file included from /opt/rust/src/llvm-project/llvm/include/llvm/IR/TrackingMDRef.h:16,
                 from /opt/rust/src/llvm-project/llvm/include/llvm/IR/DebugLoc.h:17,
                 from /opt/rust/src/llvm-project/llvm/include/llvm/IR/Instruction.h:21,
                 from /opt/rust/src/llvm-project/llvm/include/llvm/IR/BasicBlock.h:22,
                 from /opt/rust/src/llvm-project/llvm/include/llvm/IR/Instructions.h:27,
                 from /opt/rust/src/llvm-project/llvm/include/llvm/Transforms/Utils/NoAliasUtils.h:22,
                 from /opt/rust/src/llvm-project/llvm/lib/Transforms/Utils/NoAliasUtils.cpp:13:
/opt/rust/src/llvm-project/llvm/include/llvm/IR/Metadata.h:1446:12: note: candidate: ‘llvm::AliasScopeNode::AliasScopeNode(const llvm::MDNode*)’
   explicit AliasScopeNode(const MDNode *N) : Node(N) {}
            ^~~~~~~~~~~~~~
/opt/rust/src/llvm-project/llvm/include/llvm/IR/Metadata.h:1446:12: note:   no known conversion for argument 1 from ‘double’ to ‘const llvm::MDNode*’
/opt/rust/src/llvm-project/llvm/include/llvm/IR/Metadata.h:1445:3: note: candidate: ‘constexpr llvm::AliasScopeNode::AliasScopeNode()’
   AliasScopeNode() = default;
   ^~~~~~~~~~~~~~
/opt/rust/src/llvm-project/llvm/include/llvm/IR/Metadata.h:1445:3: note:   candidate expects 0 arguments, 1 provided
/opt/rust/src/llvm-project/llvm/include/llvm/IR/Metadata.h:1441:7: note: candidate: ‘constexpr llvm::AliasScopeNode::AliasScopeNode(const llvm::AliasScopeNode&)’
 class AliasScopeNode {
       ^~~~~~~~~~~~~~
/opt/rust/src/llvm-project/llvm/include/llvm/IR/Metadata.h:1441:7: note:   no known conversion for argument 1 from ‘double’ to ‘const llvm::AliasScopeNode&’
/opt/rust/src/llvm-project/llvm/include/llvm/IR/Metadata.h:1441:7: note: candidate: ‘constexpr llvm::AliasScopeNode::AliasScopeNode(llvm::AliasScopeNode&&)’
/opt/rust/src/llvm-project/llvm/include/llvm/IR/Metadata.h:1441:7: note:   no known conversion for argument 1 from ‘double’ to ‘llvm::AliasScopeNode&&’
/opt/rust/src/llvm-project/llvm/lib/Transforms/Utils/NoAliasUtils.cpp:177:31: error: request for member ‘getName’ in ‘__builtin_nans(((const char*)""))’, which is of non-class ty
pe ‘double’
         auto ScopeName = SNAN.getName();
                               ^~~~~~~
/opt/rust/src/llvm-project/llvm/lib/Transforms/Utils/NoAliasUtils.cpp:187:39: error: request for member ‘getDomain’ in ‘__builtin_nans(((const char*)""))’, which is of non-class
type ‘double’
             const_cast<MDNode *>(SNAN.getDomain()), Name);
                                       ^~~~~~~~~
[ 75%] Building CXX object lib/Target/Hexagon/CMakeFiles/LLVMHexagonCodeGen.dir/RDFCopy.cpp.o
make[2]: *** [lib/Transforms/Utils/CMakeFiles/LLVMTransformUtils.dir/build.make:635: lib/Transforms/Utils/CMakeFiles/LLVMTransformUtils.dir/NoAliasUtils.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
Maybe the LLVM master already went out of sync with the patch (which could happen considering how big the patch is and how fast the LLVM master is moving) and you need to build against an older revision of the LLVM master? It would definitely be worthwhile to ping the patch author about this.
That's exactly what's I'm doing, trying to find an older revision that builds with the patch :-)
I'm now sure the problem is not with llvm/master but is from the patch.
The oldest commit from llvm/master that is still compatible with the patch is https://github.com/llvm/llvm-project/commit/5b99c189b3bfc0faa157f7ca39652c0bb8c315a7 but even that far back the patch fails to compile.
I'm too tired and lazy to try to understand the C++ now, I'll try again tomorrow.
In the meantime, can someone contact the author of the patch to ask for help?
I don't think you will be able to easily use master LLVM with Rust (after you actually build it) without patching rustllvm. AFAIK it supports only releases 6-9 right now.
@mati865 I tried first to apply the patch on rust's llvm-9 fork but it was very much not trivial too...
The patch is apparently rebased on top of llvm/llvm-project@82d3ba87d06f9e2abc6e27d8799587d433c56630. Does it build for you if you apply on top of that?
@jrmuizel thanks I'll try that!
In the meantime, I've been able to successfully adapt rustllvm to build with llvm master.
Ping @PaulGrandperrin , any updates?
Realistically speaking, what is the estimated timeline and the overall chances of this patch ever getting merged, considering its size?
@MSxDOS You'd want to ask the LLVM developers. In general I would expect that the size of a patch matters less than the owners' desire to see it merged, so the question to ask is how much LLVM wants to see it land.
Here's the latest status that I've seen: https://reviews.llvm.org/D69542#1836439
i assume at some point it will cease to be relevant if https://github.com/bytecodealliance/cranelift~~ https://github.com/bjorn3/rustc_codegen_cranelift works out?
@leeoniya, there's no short term plan to use cranelift for opt builds. cranelift doesn't have a bunch of the optimization work that would be needed to make that possible.
I was surprised to find out just how conservative the compiler is with respect to assuming there could be aliasing without this option. For example:
fn baz(s: &mut S) {
    if s.y < 10 {
        s.x = foo();
    }
    if s.y < 5 {
        s.x = foo();
    }
}
Because it accesses the struct members through a &mut, it assumes that s.x and s.y can alias, so requires two memory accesses instead of one for s.y. That is really unfortunate, when you consider how many times member reads/writes via &mut must be interleaved in a typical program.
Edit: based on some testing, this doesn't seem to affect all such reads/writes, which isn't surprising, because it would probably kill performance if it did. Still, using -Z mutable-noalias fixes the double memory access in the above example, so some cases may be busted.
@PaulGrandperrin there's a new version of this patch up at https://reviews.llvm.org/D69542 based off of llvm@9fb46a452d4e5666828c95610ceac8dcd9e4ce16. Are you willing to try getting it running again?
Most helpful comment
I reduced it to a simple C test case (compile at -O3 and -O0 and compare output):
Note that if you remove
restrict, the C equivalent ofnoalias, behavior is correct. Yet even then, theassert(a != b)passes, proving that no UB can occur due to calling it withrestrict.What's happening is:
mat[0]cannot alias withmat[idxs[1]]ormat[1], ergo it cannot have been changed betweenmat[0] = 7;andmat[idxs[2]] = mat[0];, ergo it's safe for global value numbering to optimize the latter tomat[idxs[2]] = 7;.But
mat[0]does alias withmat[idxs[1]], becauseidxs[1] == 0. And we did not promise it wouldn't, because on the second iteration when&mat[idxs[1]]is passed tocopy, the other argument is&mat[1]. So why does LLVM think it can't?Well, it has to do with the way
copyis inlined. Thenoaliasfunction attribute is turned into!alias.scopeand!noaliasmetadata on the load and store instructions, like:Normally, if a function is inlined multiple times, each copy gets its own unique IDs for alias.scope and noalias, indicating that each call represents its own 'inequality' relationship* between the pair of arguments marked
noalias(restrictat C level), which may have different values for each call.However, in this case, first the function is inlined into the loop, then the inlined code is duplicated when the loop is unrolled – and this duplication does not change the IDs. Because of this, LLVM thinks none of the
a's can alias with any of theb's, which is false, becauseafrom the first and third calls aliases withbfrom the second call (all pointing to&mat[0]).Amazingly, GCC also miscompiles this, with different output. (clang and GCC at -O0 both output
7 10; clang at -O3 outputs7 7; GCC at -O3 outputs10 7.) Uh, I really hope I didn't screw something up and add UB after all, but I don't see how...* It's a bit more complicated than that, but in this case, since
copydoes not use any pointer arithmetic and writes to both pointers, the inequalitya != bis necessary and sufficient for a call not to be UB.