I tried this code:
Cargo.toml:
[package]
name = "bmp-minimal"
version = "0.1.0"
authors = ["Malloc Voidstar <[email protected]>"]
edition = "2018"
[dependencies]
[profile.release]
lto = true
codegen-units = 1
main.rs:
use std::io::Write;
fn main() {
let mut out = Vec::with_capacity(51200);
for val in 0u8..=255 {
out.write_all(&[val, val, val, 0]).unwrap();
}
for _ in (0..1).rev() {
for _ in 0..1 {
out.write_all(&[0]).unwrap();
}
}
println!("{} bytes written", out.len());
}
(extremely minimized form of grayscale BMP encoding from image crate)
Command: cargo run --release
I expected to see this happen: 1025 bytes written
Instead, this happened:
memory allocation of 26843545600 bytes failederror: process didn't exit successfully: `target\release\bmp-minimal.exe` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)
rustc --version --verbose:
rustc 1.45.0 (5c1f21c3b 2020-07-13)
binary: rustc
commit-hash: 5c1f21c3b82297671ad3ae1e8c942d2ca92e84f2
commit-date: 2020-07-13
host: x86_64-pc-windows-msvc
release: 1.45.0
LLVM version: 10.0
It appears to work fine on ARM Linux in my limited testing.
This minimized form works/doesn't work on the following versions:
|Rust version|Works?|
|-|-|
|1.45 (LLVM 10.0)|No|
|1.44.1|No|
|1.43.1|Yes|
|1.42.0|No|
|1.41.1|No|
|1.40.0|No|
|1.39.0|No|
|1.38.0 (LLVM 9.0)|No|
|1.37.0|Yes|
|1.36.0|Yes|
|1.35.0|Yes|
|1.34.0|Yes|
If the LTO and codegen-units settings are removed it works on all above versions.
I originally noticed this when a project that worked on 1.44.1 suddenly started failing with huge memory allocations on 1.45, so this table specifically applies to this repro, not whatever the underlying bug is.
@rustbot ping llvm
Hey LLVM ICE-breakers! This bug has been identified as a good
"LLVM ICE-breaking candidate". In case it's useful, here are some
[instructions] for tackling these sorts of bugs. Maybe take a look?
Thanks! <3
cc @camelid @comex @cuviper @DutchGhost @hanna-kruppe @hdhoang @heyrutvik @JOE1994 @jryans @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @spastorino @vertexclique @vgxbj
It appears to work fine on ARM Linux in my limited testing.
@rustbot ping windows
Hey Windows Group! This bug has been identified as a good "Windows candidate".
In case it's useful, here are some [instructions] for tackling these sorts of
bugs. Maybe take a look?
Thanks! <3
cc @arlosi @danielframpton @gdr-at-ms @kennykerr @luqmana @lzybkr @retep998 @rylev @sivadeilra
Works fine on Windows-gnu:
$ cargo run --release
Compiling bmp-minimal v0.1.0 (C:\Users\mateusz\AppData\Local\Temp\tmp)
Finished release [optimized] target(s) in 3.71s
Running `target\release\bmp-minimal.exe`
1025 bytes written
With MSVC it endlessly allocates memory.
@rustbot modify labels: -O-windows +O-windows-msvc
A bit smaller,
use std::io::Write;
fn main() {
let mut out = Vec::new();
for val in 0u8..=255 {
out.write_all(&[val, val]).unwrap();
}
out.write_all(&[0]).unwrap();
}
[profile.release]
lto = true
codegen-units = 1
Another datapoint: adding -C llvm-args=-cvp-dont-add-nowrap-flags makes it work.
That flag changed from default true to false about a year ago which kinda lines up with the repro chart above (minus 1.43.1). There was an LLVM bug about that causing some miscompilations last I checked.
cc @nikic who I believe worked on that upstream.
@luqmana Nowrap handling in LFTR post-inc canonicalization was fixed when the CVP flag was enabled, so this is most likely a new issue. Can you share how the optimized IR looks like on Windows? In particular it would be interesting to see whether it already infinitely loops there, or whether this goes wrong in the backend (e.g. in LSR).
> $Env:RUSTFLAGS="-C llvm-args=-cvp-dont-add-nowrap-flags"
> cargo run --release
Compiling bmp-minimal v0.1.0 (C:\X\bmp-minimal)
Finished release [optimized] target(s) in 2.39s
Running `target\release\bmp-minimal.exe`
memory allocation of 13421772800 bytes failederror: process didn't exit successfully: `target\release\bmp-minimal.exe` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)
> rustc --version
rustc 1.45.0 (5c1f21c3b 2020-07-13)
Still fails on my end, unless I'm doing something wrong.
Assigning P-critical as discussed as part of the Prioritization Working Group procedure and removing I-prioritize.
Reproduced on LLVM master and simplified: Run opt -loop-reduce -S on this:
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"
declare void @dummy(i8 zeroext)
define void @test() {
br label %1
1:
%2 = phi i8 [ 0, %0 ], [ %4, %1 ]
call void @dummy(i8 %2)
%3 = icmp eq i8 %2, -1
%4 = add nuw i8 %2, 1
br i1 %3, label %5, label %1
5:
ret void
}
The icmp gets changed to comparing the output of the 8-bit addition to 0, even though it will be poison if the input was 255 (because the add is marked as nuw).
Heh, so this is indeed the same as the LFTR bug, just for LSR. Surprised it took this long to surface.
Thanks @comex, that's a lot simpler than the repro I ended up getting down to and it also demonstrates it's not windows specific.
@AlyoshaVasilieva commented:
> $Env:RUSTFLAGS="-C llvm-args=-cvp-dont-add-nowrap-flags" > cargo run --release Compiling bmp-minimal v0.1.0 (C:\X\bmp-minimal) Finished release [optimized] target(s) in 2.39s Running `target\release\bmp-minimal.exe` memory allocation of 13421772800 bytes failederror: process didn't exit successfully: `target\release\bmp-minimal.exe` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN) > rustc --version rustc 1.45.0 (5c1f21c3b 2020-07-13)Still fails on my end, unless I'm doing something wrong.
I'm no windows expert, but you might be doing something wrong...
Or at least, here is my local transcript:
c:\Users\pfxbo\issue_74498>set RUSTFLAGS=
set RUSTFLAGS=
c:\Users\pfxbo\issue_74498>echo %RUSTFLAGS%
echo %RUSTFLAGS%
%RUSTFLAGS%
c:\Users\pfxbo\issue_74498>cargo run --release
cargo run --release
Compiling issue_74498 v0.1.0 (C:\Users\pfxbo\issue_74498)
Finished release [optimized] target(s) in 2.25s
Running `target\release\issue_74498.exe`
memory allocation of 53687091200 bytes failederror: process didn't exit successfully: `target\release\issue_74498.exe` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)
c:\Users\pfxbo\issue_74498>set RUSTFLAGS=-Cllvm-args=-cvp-dont-add-nowrap-flags
set RUSTFLAGS=-Cllvm-args=-cvp-dont-add-nowrap-flags
c:\Users\pfxbo\issue_74498>echo %RUSTFLAGS%
echo %RUSTFLAGS%
-Cllvm-args=-cvp-dont-add-nowrap-flags
c:\Users\pfxbo\issue_74498>cargo run --release
cargo run --release
Compiling issue_74498 v0.1.0 (C:\Users\pfxbo\issue_74498)
Finished release [optimized] target(s) in 2.18s
Running `target\release\issue_74498.exe`
1025 bytes written
c:\Users\pfxbo\issue_74498>cargo --version
cargo --version
cargo 1.44.1 (88ba85757 2020-06-11)
c:\Users\pfxbo\issue_74498>rustc --version
rustc --version
rustc 1.44.1 (c7087fe00 2020-06-17)
c:\Users\pfxbo\issue_74498>
Having said that, I did see some evidence in some other runs that maybe setting this flag doesn't resolve the issue in all cases. I haven't gotten to the bottom of those observations yet; I just wanted to note that I had observed some sensitivity to the flag.
@pnkfelix I believe the arg is being set correctly, because:
> $Env:RUSTFLAGS="-C llvm-args=-cvp-dont-add-nowrap-flags-GARBAGE-GARBAGE"
> cargo run --release
error: failed to run `rustc` to learn about target-specific information
Caused by:
process didn't exit successfully: `rustc - --crate-name ___ --print=file-names -C llvm-args=-cvp-dont-add-nowrap-flags-GARBAGE-GARBAGE --crate-type bin --crate-type rlib --crate-type dylib --crate-type cdylib --crate-type staticlib --crate-type proc-macro --print=sysroot --print=cfg` (exit code: 1)
--- stderr
rustc: Unknown command line argument '-cvp-dont-add-nowrap-flags-GARBAGE-GARBAGE'. Try: 'rustc --help'
rustc: Did you mean '--cvp-dont-add-nowrap-flags'?
> $Env:RUSTFLAGS="-C llvm-args=-cvp-dont-add-nowrap-flags"
> cargo run --release
Finished release [optimized] target(s) in 0.07s
Running `target\release\bmp-minimal.exe`
memory allocation of 13421772800 bytes failederror: process didn't exit successfully: `target\release\bmp-minimal.exe` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)
If RUSTFLAGS wasn't being set properly, then the garbage arg wouldn't cause an issue.
Above is using Powershell; I also tried using cmd using your commands:
>set RUSTFLAGS=-Cllvm-args=-cvp-dont-add-nowrap-flags
>echo %RUSTFLAGS%
-Cllvm-args=-cvp-dont-add-nowrap-flags
>cargo run --release
Compiling bmp-minimal v0.1.0 (C:\X\bmp-minimal)
Finished release [optimized] target(s) in 4.46s
Running `target\release\bmp-minimal.exe`
memory allocation of 13421772800 bytes failederror: process didn't exit successfully: `target\release\bmp-minimal.exe` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)
and received the same failure.
I see you are using Rust 1.44.1 in your example. I tried using that arg on Rust 1.44.1 and it does indeed cause a successful run. On 1.45, however, it fails with or without the arg. (of course, there may be some other thing causing the run to succeed/fail; this is way outside my knowledge base)
Here's the reduced IR of the minimal rust code here: https://github.com/rust-lang/rust/issues/74498#issuecomment-660722030, a bit simplified and with some renaming
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc19.26.28806"
define i32 @main() unnamed_addr {
entry:
%proc_heap = tail call i8* @GetProcessHeap()
%vec_ptr0 = tail call i8* @HeapAlloc(i8* %proc_heap, i32 0, i64 8)
br label %loop_hdr
loop_hdr: ; preds = %loop_write, %entry
%vec_ptr1 = phi i8* [ %vec_ptr0, %entry ], [ %vec_ptr2, %loop_write ]
%vec_cap1 = phi i64 [ 8, %entry ], [ %new_vec_cap1, %loop_write ]
%vec_len1 = phi i64 [ 0, %entry ], [ %new_vec_len1, %loop_write ]
%val = phi i8 [ 0, %entry ], [ %1, %loop_write ]
%0 = icmp eq i8 %val, 255
%1 = add nuw i8 %val, 1
%2 = sub i64 %vec_cap1, %vec_len1
%3 = icmp ult i64 %2, 2
br i1 %3, label %check_realloc, label %pre_loop_write
check_realloc: ; preds = %loop_hdr
%len_plus_two_checked = tail call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %vec_len1, i64 2)
%checked_res = extractvalue { i64, i1 } %len_plus_two_checked, 1
br i1 %checked_res, label %exit, label %realloc
realloc: ; preds = %check_realloc
%len_plus_two = extractvalue { i64, i1 } %len_plus_two_checked, 0
%cap_times_two = shl i64 %vec_cap1, 1
%4 = icmp ugt i64 %cap_times_two, %len_plus_two
%new_cap = select i1 %4, i64 %cap_times_two, i64 %len_plus_two
%new_vec_ptr = tail call i8* @HeapReAlloc(i8* %proc_heap, i32 0, i8* %vec_ptr1, i64 %new_cap)
br label %loop_write
pre_loop_write: ; preds = %loop_hdr
%new_vec_len2 = add i64 %vec_len1, 2
br label %loop_write
loop_write: ; preds = %realloc, %pre_loop_write
%new_vec_len1 = phi i64 [ %new_vec_len2, %pre_loop_write ], [ %len_plus_two, %realloc ]
%vec_ptr2 = phi i8* [ %vec_ptr1, %pre_loop_write ], [ %new_vec_ptr, %realloc ]
%new_vec_cap1 = phi i64 [ %vec_cap1, %pre_loop_write ], [ %new_cap, %realloc ]
%vec_store_slot1 = getelementptr inbounds i8, i8* %vec_ptr2, i64 %vec_len1
store i8 %val, i8* %vec_store_slot1, align 1
%vec_store_slot2 = getelementptr inbounds i8, i8* %vec_store_slot1, i64 1
store i8 %val, i8* %vec_store_slot2, align 1
br i1 %0, label %exit, label %loop_hdr
exit: ; preds = %loop_write, %check_realloc
%exit_code = phi i32 [ 0, %loop_write ], [ 1, %check_realloc ]
ret i32 %exit_code
}
declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64)
declare i8* @GetProcessHeap() unnamed_addr
declare i8* @HeapAlloc(i8*, i32, i64) unnamed_addr
declare i8* @HeapReAlloc(i8*, i32, i8*, i64) unnamed_addr
Running this through clang -O (I tried v 10.0.0), the resulting binary will continue to allocate memory in an infinite loop.
Running it through opt -loop-reduce will return a slightly different IR:
loop_hdr:
[...]
%0 = add nuw i8 %val, 1
[...]
loop_write:
[...]
%4 = icmp eq i8 %0, 0
br i1 %4, label %exit, label %loop_hdr
This is the same issue @comex mentioned above. LLVM should be clearing the no wrap flag on the add there after transforming to a post-increment check but it doesn't. Then later it just gets rid of the exit condition altogether since it believes that the add will never wrap, it can't possibly be equal to 0.
But that doesn't explain why the flag doesn't seem to work post 1.44, looking at the IR (with -cvp-dont-add-nowrap-flags=true):
1.44.1:
%_5.0.i.i.i = add i8 %iter.sroa.0.060, 1
1.45+:
%7 = add nuw i8 %iter.sroa.0.063, 1
So in 1.45+ we're not relying on LLVM's CVP to determine that the add has no unsigned wrap but some other mechanism. But after looking through print-after-all for a bit I'm pretty sure this new line in Range::next is the reason:
https://github.com/rust-lang/rust/blob/98efae87607aabd7c30b879befe61bf9c29eb978/library/core/src/iter/range.rs#L507
That boils down to an unchecked_add which in LLVM becomes LLVMBuildNUWAdd in this case.
Since that code was introduced in https://github.com/rust-lang/rust/pull/69659 that lines up with the behaviour we see in 1.45+.
Either way, still an LLVM bug that's just triggered more now.
thank you for that excellent analysis @luqmana
Filed https://bugs.llvm.org/show_bug.cgi?id=46943 with LLVM upstream.
The only remaining question on our end is if we should, for the short term, change that range.rs code to not used an unchecked_add, in order to (hopefully) bypass the misoptimization here.
(That, or try to figure out a patch to LLVM to fix the bug on their end.)
To be clear: my thinking is that we could change the range.rs and/or Step code to use wrapping_add rather than unchecked_add.
That way, we shouldn't generate any overflow checking code, so I wouldn't expect any significant performance hit.
At the same time, I'm not really a big fan of putting in such narrowly targetted hacks to work around LLVM bugs, especially ones that have been confirmed to be bugs on LLVM's end.
At the same time, I'm not really a big fan of putting in such narrowly targetted hacks to work around LLVM bugs, especially ones that have been confirmed to be bugs on LLVM's end.
(having said that, this bug does seem to serve as evidence that our code is probably executing an overflowing addition via unchecked_add. Even though that "merely" generates poison under this particular codegen path (that should be unused and thus won't cause UB), strictly speaking, it is a violation of the contract written for Rust's unchecked_add:
Returns the result of an unchecked addition, resulting in undefined behavior when
x + y > T::MAXorx + y < T::MIN.
I'm going to go in and confirm that this add may indeed be overflowing, and if it is, I probably will push for changing to wrapping_add here instead.
Update: okay, its hard to see how this could possibly be causing an overflowing addition under Rust's machine model:
if self.start < self.end {
// SAFETY: just checked precondition
let n = unsafe { Step::forward_unchecked(self.start.clone(), 1) };
...
}
Downgrading priority to P-high as discussed on the compiler team meeting.
I have a patch that fixes (what I believe to be) the underlying LLVM bug; see https://bugs.llvm.org/show_bug.cgi?id=46943 ; I'm working on making a LLVM unit test and then I'll put up a patch for the llvm-devs to review.
Most helpful comment
A bit smaller,