A user reported a code example that shows that in Go 1.7 there is bounds check elimination and in rust not.
https://users.rust-lang.org/t/bounds-check-elimination-in-go-1-7/7008
A reduced test case would help here.
The thread has a simple test case. Is this not sufficient? Maybe the author of the thread could write something here. I linked this bug report in the thread.
It's not clear from the thread exactly which bounds checks are not being eliminated that are eliminated by Go. A reduced test case would ideally point out a single array access that has a bounds check in Rust but not in Go.
There are many examples in the linked thread, and a quick read doesn't make it obvious where there are bounds checks to eliminate. Clearly there are optimization opportunities here, but this thread needs reduced examples with a clear description of which bounds checks are not being removed.
Disclaimer: I know very little Rust and even less Go. I'm somewhat familiar with LLVM, though, so I took a look, comparing the original Go results at http://www.tapirgames.com/blog/golang-1.7-bce to the Rust as given in the forum post.
I'm using rustc
1.11.0 and LLVM 3.8.1.
Note that the Rust translations do not seem to be quite equivalent: the Go code seems to perform no-op accesses on array slices whereas the Rust versions increment the array values. This complicates the generated code heavily, so to simplify the analysis I replaced increment ops like s[i] += 1
with no-op accesses like s[i]
. This did not seem to have an effect on the amount of bounds checks in the generated code but made my life easier.
I'll go through the functions in order.
rustc -C opt-level=3
eliminates the same bounds checks as Go does, as is to be expected for these simple cases.
Here's the discrepancy. The Go code:
func f5(s []int) {
for i := range s {
_ = s[i]
_ = s[i:len(s)]
_ = s[:i+1]
}
}
The Rust translation from the forum post, with increment operations removed and usize
changed to isize
to be pedantic:
fn f5(s: &mut [isize]) {
for i in 0 .. s.len() {
s[i];
for j in i .. s.len() { s[j]; }
for j in 0 .. i + 1 { s[j]; }
}
}
Looking at the output of rustc -C opt-level=3 --emit llvm-ir
, the first access and the first inner loop are completely optimized out here, but the second inner loop remains, and there is a bounds check on the s[j]
access therein.
Constructing what I believe to be equivalent C we can see whether this is a rustc
or LLVM problem:
#include <stdlib.h>
void f5(long* s, size_t len) {
for (size_t i = 0; i < len; i++) {
for (size_t j = 0; j <= i + 1; j++) {
if (j >= len) {
abort();
}
}
}
}
The LLVM IR output by clang -O3 -S -emit-llvm
is practically identical, so we can't blame rustc
here: LLVM doesn't realize that j >= len
will never be true here.
Going back to the original Go, we see that it's constructing array slices instead of doing individual element access like the Rust translation. So we could write a Rust version of the function that loops over slices instead:
fn f5(s: &mut [isize]) {
for i in 0 .. s.len() {
s[i];
let len = s.len();
for x in s[i .. len].iter_mut() { *x; }
for x in s[0 .. i + 1].iter_mut() { *x; }
}
}
No bounds checks are emitted for the above code, but there's still a no-op loop that is not eliminated. I'm not sure why that happens but I'm sure it's tied with the bounds check issues.
Going even closer to the original Go:
fn f5(s: &mut [isize]) {
for i in 0 .. s.len() {
s[i];
let len = s.len();
&s[i .. len];
&s[0 .. i + 1];
}
}
The bounds check is back: LLVM doesn't realize that i + 1 > len
cannot be true.
Codegen is practically identical to f5
, no further insights available here.
Same results as Go, no problems here.
I made some incorrect conclusions here, please see my comment below for better ones. LLVM bug 810 is not strongly related to this and my examples aren't entirely relevant.
In the end I think this boils down to the now over ten years old LLVM bug 810. (There may be more specific bugs for these kinds of use cases related to array bounds checks but I didn't find any.) LLVM simply doesn't have the smarts required to perform reasoning like "if i < n
, then we know that !(i + 1 > n)
". The following C function, which LLVM 3.8.1 can't optimize to a no-op, demonstrates this fairly succintly (doing the subtraction with a signed int is required so that LLVM can assume that it does not overflow — I believe that's how Rust's integers work as well?):
#include <stdlib.h>
void check(unsigned int i, unsigned int len) {
if (i < len) {
if (i > (int)len - 1) {
abort();
}
}
}
Note that if the inner condition is changed to the equivalent i >= len
, LLVM _does_ know that it's false, presumably because it recognizes it as the negation of i < len
. This suggests that if LLVM were to canonicalize i > x - 1
and i + 1 > x
as i >= x
this particular case would be solved. But this more obtuse example would still have a problem:
fn obtuse(s: &mut [isize]) {
if s.len() >= 2 {
for i in 0 .. s.len() {
if i < s.len() - 2 {
&s[0 .. i + 2];
}
}
}
}
Just like this C code does:
#include <stdlib.h>
void check(unsigned int i, unsigned int len) {
if (len >= 2) {
if (i < len - 2) {
if ((int)i + 2 > len) {
abort();
}
}
}
}
For what it's worth, GCC 6.2.1 can't eliminate the checks in any of the above C snippets either, so this doesn't seem like a major deficiency in LLVM. In the end, kudos to the Go folks for beating industry standard C compilers, but this seems like LLVM's problem, not Rust's.
So, does it make any sense to extend the old LLVM bug report?
@Deewiant Thank you, very much!!!
No, I don't think it does. I thought about this some more and did my research more thoroughly and came to the conclusion that while a fix for the old bug 810 might be helpful it's not really pertinent to this case. I'll edit my comment.
Instead, this is a combo of two things:
i + 1 > x
as i >= x
(under the assumption that i + 1
does not overflow), but only for signed comparisons! I submitted a patch for the unsigned case, so hopefully this will be taken care of soon: https://reviews.llvm.org/D24700. In terms of the Rust examples this would allow recognizing that i + 1 > s.len()
is equivalent to i >= s.len()
, which is then recognized as the negation of i < s.len()
and so can be assumed to be false inside the loop.i + 1
does not overflow even though we know that i < s.len()
. Previously I assumed that Rust integers had undefined behaviour on overflow, but I noticed that the emitted LLVM didn't match that assumption and Myths and Legends about Integer Overflow in Rust confirmed that this is not the case. Therefore we need LLVM to infer that overflow is not possible here, so that it can proceed with the canonicalization of i + 1 > x
to i >= x
. I filed LLVM#30428 for this because I'm not sure how it should be implemented.For the general case such as my more obtuse example where I used len - 2
and i + 2
, I submitted LLVM#30429, but that one's not important for this issue.
I appreciate your efforts! – Thank you, very much!
@eddyb I wonder if the root cause of this issue is related to the one of #13018.
Real world code sometimes cannot just use the slice iterator etc and must instead rely on using ranges of indices, it would be nice to be able to fix this, though I suspect this is a pretty difficult task.
Godbolt playground with f5
and obtuse
, where we can see the bound checks still exist in current nightly: https://godbolt.org/g/e4NpSz
Cc @rust-lang/wg-codegen
Perhaps miri could help here to more generally find where bounds checks can be removed? I don't know much about how, but perhaps if it could figure out the way to push up the index as much as it can then it can interpret it and figure out whether it's still out of bounds in that situation?
Most helpful comment
Disclaimer: I know very little Rust and even less Go. I'm somewhat familiar with LLVM, though, so I took a look, comparing the original Go results at http://www.tapirgames.com/blog/golang-1.7-bce to the Rust as given in the forum post.
I'm using
rustc
1.11.0 and LLVM 3.8.1.Note that the Rust translations do not seem to be quite equivalent: the Go code seems to perform no-op accesses on array slices whereas the Rust versions increment the array values. This complicates the generated code heavily, so to simplify the analysis I replaced increment ops like
s[i] += 1
with no-op accesses likes[i]
. This did not seem to have an effect on the amount of bounds checks in the generated code but made my life easier.I'll go through the functions in order.
f1, f2, f3, f4
rustc -C opt-level=3
eliminates the same bounds checks as Go does, as is to be expected for these simple cases.f5
Here's the discrepancy. The Go code:
The Rust translation from the forum post, with increment operations removed and
usize
changed toisize
to be pedantic:Looking at the output of
rustc -C opt-level=3 --emit llvm-ir
, the first access and the first inner loop are completely optimized out here, but the second inner loop remains, and there is a bounds check on thes[j]
access therein.Constructing what I believe to be equivalent C we can see whether this is a
rustc
or LLVM problem:The LLVM IR output by
clang -O3 -S -emit-llvm
is practically identical, so we can't blamerustc
here: LLVM doesn't realize thatj >= len
will never be true here.Going back to the original Go, we see that it's constructing array slices instead of doing individual element access like the Rust translation. So we could write a Rust version of the function that loops over slices instead:
No bounds checks are emitted for the above code, but there's still a no-op loop that is not eliminated. I'm not sure why that happens but I'm sure it's tied with the bounds check issues.
Going even closer to the original Go:
The bounds check is back: LLVM doesn't realize that
i + 1 > len
cannot be true.f6
Codegen is practically identical to
f5
, no further insights available here.f7, f8, f9
Same results as Go, no problems here.
Conclusions
I made some incorrect conclusions here, please see my comment below for better ones. LLVM bug 810 is not strongly related to this and my examples aren't entirely relevant.
In the end I think this boils down to the now over ten years old LLVM bug 810. (There may be more specific bugs for these kinds of use cases related to array bounds checks but I didn't find any.) LLVM simply doesn't have the smarts required to perform reasoning like "if
i < n
, then we know that!(i + 1 > n)
". The following C function, which LLVM 3.8.1 can't optimize to a no-op, demonstrates this fairly succintly (doing the subtraction with a signed int is required so that LLVM can assume that it does not overflow — I believe that's how Rust's integers work as well?):Note that if the inner condition is changed to the equivalent
i >= len
, LLVM _does_ know that it's false, presumably because it recognizes it as the negation ofi < len
. This suggests that if LLVM were to canonicalizei > x - 1
andi + 1 > x
asi >= x
this particular case would be solved. But this more obtuse example would still have a problem:Just like this C code does:
For what it's worth, GCC 6.2.1 can't eliminate the checks in any of the above C snippets either, so this doesn't seem like a major deficiency in LLVM. In the end, kudos to the Go folks for beating industry standard C compilers, but this seems like LLVM's problem, not Rust's.