Godbolt: https://rust.godbolt.org/z/PXY1lr
Basically the issue is that you compute s.position + n
, check that it doesn't overflow, and then you verify that s.position + n
is greater than s.position
, which it of course always is, since you checked for an overflow! This is probably an LLVM optimization bug, but I'm starting here :-)
Command line arguments: -C opt-level=3 -C debug-assertions
Source:
pub struct S1<'a> {
data: &'a [u8],
position: usize,
}
pub fn f1<'a>(s: &'a mut S1, n: usize) -> &'a [u8] {
let d = &s.data[s.position..s.position+n];
s.position += n;
return d;
}
ASM:
example::f1:
push rax
mov rax, qword ptr [rdi + 16]
add rsi, rax
jb .LBB0_4
mov rdx, rsi
sub rdx, rax
jb .LBB0_5
mov rcx, qword ptr [rdi + 8]
cmp rcx, rsi
jb .LBB0_6
add rax, qword ptr [rdi]
mov qword ptr [rdi + 16], rsi
pop rcx
ret
.LBB0_4:
lea rdi, [rip + .L__unnamed_1]
call qword ptr [rip + _ZN4core9panicking5panic17h8f422ca74d186618E@GOTPCREL]
ud2
.LBB0_5:
mov rdi, rax
call qword ptr [rip + _ZN4core5slice22slice_index_order_fail17h7ab7c9c6113ae4b8E@GOTPCREL]
ud2
.LBB0_6:
mov rdi, rsi
mov rsi, rcx
call qword ptr [rip + _ZN4core5slice20slice_index_len_fail17h3a0f2c4c66447e54E@GOTPCREL]
ud2
str.0:
.ascii "/tmp/compiler-explorer-compiler119124-62-359u5j.hrbs4/example.rs"
str.1:
.ascii "attempt to add with overflow"
.L__unnamed_1:
.quad str.1
.quad 28
.quad str.0
.quad 64
.long 8
.long 33
Slightly reduced testcase:
define i64 @test(i64* %p, i64 %n) nounwind {
%x = load i64, i64* %p
%y = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %x, i64 %n)
%z = extractvalue { i64, i1 } %y, 1
br i1 %z, label %bb3, label %bb1
bb1:
%a = extractvalue { i64, i1 } %y, 0
%b = icmp ult i64 %a, %x
br i1 %b, label %bb3, label %bb2
bb2:
store i64 %a, i64* %p
ret i64 0
bb3:
ret i64 1
}
declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64)
LLVM does not realize that %z
and %b
are the same here.
Ok, looks like the thing to do is file an LLVM bug -- do you know off hand if they're ok with Rust reproducers, or do they prefer C ones?
Ok, ended up converting to C because that seemed easy: https://bugs.llvm.org/show_bug.cgi?id=40845
Submitted https://bugs.llvm.org/show_bug.cgi?id=40846 additionally for the IR pattern above (which is an issue separate from the missing add.with.overflow+sub.with.overflow combining). Just verified that fixing this does indeed eliminate the slice_index_order_fail
check in this example.
@nikic am I correct in reading that this has been waiting on a review of the LLVM patch?
@alex Needs a rebase. More generally though, I think this is fighting a losing battle. uadd.with.overflow and usub.with.overflow are not considered canonical IR for a reason: They optimize badly in the middle end.
I think we'd be better off with generating the canonical variants instead (which are a + b; (a + b) < a
and a - b; a < b
). LLVM will still form uadd.with.overflow/usub.with.overflow during CGP, but this should optimize better in the middle end (including avoiding the issue here).
As we already have a good benchmark for the overhead of overflow checks in #58475, I think it might be worth trying to rerun that without using the unsigned overflow intrinsics.
From the second LLVM bug above:
Fixed in https://github.com/llvm/llvm-project/commit/8db5143b1a1521915c842ebef23cb9fe8fe607ce.
I'm keeping this issue open, because the negated variation of this is not handled yet.
馃帀
馃帀 once this makes it's way into Rust's LLVM (is there a good way to track this?) I'll verify that the assembly in this bug was improved, and if so I'll rebase the rustc patch and we can get some new numbers!
Ok, confirmed that the recent LLVM10 update removes this branch: https://rust.godbolt.org/z/B38vDX
Shall this be closed?
Actually, that was probably premature. We should add a codegen test.
@Mark-Simulacrum is there a good example of what a test looks like? I can spin one up pretty easily if someone can point me at how to add one.
@alex Codegen tests are located in src/test/codegen. One test that is thematically related: slice-position-bounds-check.rs. They use LLVM FileCheck to verify generated LLVM IR.
Codegen tests can be run with:
./x.py test --stage 1 src/test/codegen
Thanks much, that's exactly what I needed! PR is up now.
Most helpful comment
Submitted https://bugs.llvm.org/show_bug.cgi?id=40846 additionally for the IR pattern above (which is an issue separate from the missing add.with.overflow+sub.with.overflow combining). Just verified that fixing this does indeed eliminate the
slice_index_order_fail
check in this example.