The following code compiles fine on Mac debug build, but fails on release build. The same happens on rust playground. This is just part of some code I use for debugging.
The remarkable thing is that in the larger example I have I use the same type of inline assembly and it compiles fine on --release, so I don't know what to conclude from this:
I tried this code (minimal example):
#![feature(asm)]
const SSIZE: usize = 64;
fn main() {
let mut ctx = ThreadContext::default();
let mut n = ThreadContext::default();
let mut stack = vec![0_u8; SSIZE];
let stack_ptr = stack.as_mut_ptr();
unsafe {
std::ptr::write(
stack_ptr.offset((SSIZE - 8) as isize) as *mut u64,
test as u64,
);
ctx.rsp = stack_ptr.offset((SSIZE - 8) as isize) as u64;
}
switch(&mut ctx, &mut n);
}
fn test() -> ! {
println!("WORKS!");
loop {}
}
fn switch(old: *mut ThreadContext, new: *mut ThreadContext) {
unsafe {
asm!("
mov %rsp, 0x00($0)
"
: "=*m"(old)
: "rm"(new)
: "rsp", "r15", "r14", "r13", "r12", "rbx", "rbp", "memory"
);
dbg!(&*old);
};
}
#[derive(Debug, Default)]
#[repr(C)]
struct ThreadContext {
rsp: u64,
r15: u64,
r14: u64,
r13: u64,
r12: u64,
rbx: u64,
rbp: u64,
}
I expected it to compile on both release and debug build. Instead, i got this error when compiling for release:
error: <inline asm>:2:22: error: expected register here
mov 0x00(-48(%rbp)), %rsp
^
--> src/main.rs:9:9
|
9 | / asm!("
10 | | mov 0x00($0), %rsp
11 | | ret
12 | | "
... |
15 | | : "rsp", "r15", "r14", "r13", "r12", "rbx", "rbp", "memory"
16 | | );
| |__________^
error: aborting due to previous error
error: Could not compile `playground`.
Easiest steps to reproduce. See this playground link:
https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=23908e070df55393682c9a922c80085b
You will need to choose "run" or "show assembly", somehow if you choose build it builds without errors in the playground.
rustc 1.37.0-nightly (37d001e4d 2019-05-29)
binary: rustc
commit-hash: 37d001e4deb206ed954fde5d91690221e8306fc3
commit-date: 2019-05-29
host: x86_64-apple-darwin
release: 1.37.0-nightly
LLVM version: 8.0
I just tested this on Windows as well and the same problem there so this does not seem to be platform specific.
rustc 1.37.0-nightly (627486af1 2019-06-02)
binary: rustc
commit-hash: 627486af15d222bcba336b12ea92a05237cc9ab1
commit-date: 2019-06-02
host: x86_64-pc-windows-msvc
release: 1.37.0-nightly
LLVM version: 8.0
I also narrowed down the minimal example to reproduce:
#![feature(asm)]
#[derive(Default)]
struct ThreadContext;
fn gt_switch(new: *const ThreadContext) {
unsafe {
asm!("
mov 0x00($0), %rsp
ret
"
:
: "m"(new)
);
}
}
fn main() {
let ctx = ThreadContext::default();
gt_switch(&ctx);
}
The code will segfault if run of course but since this is a compile time error the interesting part is the compilation. It seems to be related to the "m" constraint (which is not needed in the example but is needed in the original codebase).
Not actually a bug, I think. Or at least a complicated one. Can fix this code by getting rid of the unneeded 0x0 offset:
mov $0, %rsp
or changing the constraint to "r". The problem seems to be that new is on the stack, and there's no way to reference that location without offset addressing.
The real bug here is that the original code compiles on debug. On my box I get
movq (%rsp), %rsp
for the offending line in debug mode, which has somehow both removed the offending offset and left new in a register. I don't know how that works. I don't know how any of it works.
Well, I'm not 100% sure either if it working in debug is a bug or not working in release is a bug. What I do know is that the code works fine on debug doing what I would expect it to do. And the above is a contrived example, and the smallest I could make. The real code looks like this:
unsafe fn switch(old: *mut ThreadContext, new: *const ThreadContext) {
asm!("
mov %rsp, 0x00($0)
mov %r15, 0x08($0)
mov %r14, 0x10($0)
mov %r13, 0x18($0)
mov %r12, 0x20($0)
mov %rbx, 0x28($0)
mov %rbp, 0x30($0)
mov 0x00($1), %rsp
mov 0x08($1), %r15
mov 0x10($1), %r14
mov 0x18($1), %r13
mov 0x20($1), %r12
mov 0x28($1), %rbx
mov 0x30($1), %rbp
ret
"
: "=*m"(old)
: "r"(new)
:
: "alignstack" // needed to work on windows
);
}
Which according to the LLVM Reference should work (I think): https://llvm.org/docs/LangRef.html#indirect-inputs-and-outputs
Indirect output or input constraints can be specified by the “*” modifier (which goes after the “=” in case of an output). This indicates that the asm will write to or read from the contents of an address provided as an input argument. (Note that in this way, indirect outputs act more like an input than an output: just like an input, they consume an argument of the call expression, rather than producing a return value. An indirect output constraint is an “output” only in that the asm is expected to write to the contents of the input memory location, instead of just read from it).
This is most typically used for memory constraint, e.g. “=*m”, to pass the address of a variable as a value.
What seems to happen is that the compiler inserts an additional "offset" in release mode, that somehow compiles in debug mode, but its pretty hard to debug...
If you want to play around with a full working example that actually runs something useful, I have one here:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=be07fd6bf995a59eaf45d76fb21a48a1
So I translated the original example into C and tried it with Clang and GCC. Neither one would take it with an "m" or "m" constraint: needs to be "r" or "r". By the way probably want to clobber "rsp" in that asm: doesn't matter here though I think.
OK, so I decided to try out something else. Since there really isn't any output from this function it shouldn't need to be an output parameter so if I set both as an input parameter and set a volatile flag (which I think is done anyways if there are no output parameters passed in). Lastly, to avoid variations between release and debug I specified what registers to use like this:
#[naked]
unsafe fn switch(old: *mut ThreadContext, new: *const ThreadContext) {
asm!("
mov %rsp, 0x00(%rdi)
mov %r15, 0x08(%rdi)
mov %r14, 0x10(%rdi)
mov %r13, 0x18(%rdi)
mov %r12, 0x20(%rdi)
mov %rbx, 0x28(%rdi)
mov %rbp, 0x30(%rdi)
mov 0x00(%rsi), %rsp
mov 0x08(%rsi), %r15
mov 0x10(%rsi), %r14
mov 0x18(%rsi), %r13
mov 0x20(%rsi), %r12
mov 0x28(%rsi), %rbx
mov 0x30(%rsi), %rbp
ret
"
:
:"{rdi}"(old), "{rsi}"(new)
:
: "volatile", "alignstack"
);
}
Now this works as expected when running debug, but again fails in release build. You can easily try this out yourself in the playground.. Now there are no compilation error this time but a runtime error so I decided to output the asm and have a look at it (this is done on a windows machine):
Debug
movq 8(%rsp), %rdi
movq 16(%rsp), %rsi
.cv_loc 9 1 155 0
#APP
movq %rsp, (%rdi)
movq %r15, 8(%rdi)
movq %r14, 16(%rdi)
movq %r13, 24(%rdi)
movq %r12, 32(%rdi)
movq %rbx, 40(%rdi)
movq %rbp, 48(%rdi)
movq (%rsi), %rsp
movq 8(%rsi), %r15
movq 16(%rsi), %r14
movq 24(%rsi), %r13
movq 32(%rsi), %r12
movq 40(%rsi), %rbx
movq 48(%rsi), %rbp
retq
Release
cmpq %rdx, %r8
jbe .LBB8_7
movq (%rcx), %rax
movb $1, 88(%rax,%rsi)
movq 16(%rcx), %r8
movq 24(%rcx), %rax
movq %rdx, 24(%rcx)
cmpq %rax, %r8
jbe .LBB8_15
cmpq %rdx, %r8
jbe .LBB8_13
movq (%rcx), %rcx
leaq (%rax,%rax,2), %rax
shlq $5, %rax
leaq (%rcx,%rax), %rdi
addq $32, %rdi
addq %rcx, %rsi
addq $32, %rsi
#APP
movq %rsp, (%rdi)
movq %r15, 8(%rdi)
movq %r14, 16(%rdi)
movq %r13, 24(%rdi)
movq %r12, 32(%rdi)
movq %rbx, 40(%rdi)
movq %rbp, 48(%rdi)
movq (%rsi), %rsp
movq 8(%rsi), %r15
movq 16(%rsi), %r14
movq 24(%rsi), %r13
movq 32(%rsi), %r12
movq 40(%rsi), %rbx
movq 48(%rsi), %rbp
retq
Now I know the offset in the first part of the debug asm is due to setting the alignstack option in the asm macro.
movq 8(%rsp), %rdi
movq 16(%rsp), %rsi
I figured this out by diffing the files (sorry for the picture), but left is with alignstack option set and the right is without.

Now I can't seem to wrap my head around how the release asm output does alignment, maybe someone better in reading asm instructions than me can figure that out. But one notable thing I did figure out is that the __alignstack option had absolutely no impact on the assembly output in relase-mode__.
When I diffed them the same way as i diffed the debug asm output there is absolutely no difference in the output at all which I find strange.
I don't even know if this is related in any way and if the way release builds does alignment differently from debug builds has anything to do with this at all or if it's related to the original issue either. The only common denominator is the use of immidiate values to do offsets, and presumably set the right alignment, in release builds and which is not done in debug builds.
It seems that this is more a LLVM issue than a Rust issue to be honest.
So this is the status after quite a bit of debugging on my part:
=*m constraintm constraint is used as an outputdebug or it's a bug that it doesn't in release is still a question (it might even be intended but that sounds strange)alingstack in release build above is most likely due to inlining on the release build, everything works as expected in the last example if I use the attribute #[inline(never)]asm! macro seem to pass everything on to LLVM (even the validation of the assembly template) so any issue here is _most likely_ an issue in LLVM and not in Rust.Interesting. Consider this C code:
struct ThreadContext {};
void gt_switch(struct ThreadContext *new) {
asm(
"mov 0x0(%0), %%rsp\n\t"
"ret\n"
:
: "*m"(new)
: "rsp"
);
}
int main() {
struct ThreadContext ctx;
gt_switch(&ctx);
return 0;
}
This produces bogus assembly on both GCC 8.3.0 and Clang 7.0.1.
Honestly, it should. The only way that the requested load can be written is as fetch from the struct: the struct is sitting on the stack at an offset from %rbp.
The way Rust is getting away with this in the debug code is by pre-moving the struct address into a register before expanding the asm: this allows the thing to succeed. This is arguably a bug in rustc: if the asm asks for an indirect fetch from memory, rustc should do this fetch.
movq -8(%rbp), %rdi
.loc 6 8 8
#APP
movq (%rdi), %rsp
I think a cleaner and simpler way to write this example is as follows. It prevents a bunch of indexing errors, and makes it clear to LLVM what is being done with the registers. (struct cut down for clarity)
#![feature(asm)]
struct ThreadContext {
rsp: u64,
r15: u64,
}
fn gt_switch(new: *const ThreadContext) -> ! {
unsafe {
asm!("mov $1, $0" : "+r"("r15") : "*m"(&(*new).r15));
asm!("mov $1, $0" : "+r"("rsp") : "*m"(&(*new).rsp));
asm!("ret");
std::hint::unreachable_unchecked()
}
}
fn main() {
let ctx = ThreadContext{rsp: 0x80, r15: 0x88};
gt_switch(&ctx);
}
Unfortunately, compiling this (release or debug) results in this output:
error: internal compiler error: src/librustc_codegen_ssa/mir/operand.rs:128: not immediate: OperandRef(Pair(([0 x i8]*: %10 = load [0 x i8]*, [0 x i8]** %9, align 8, !dbg !32, !nonnull !4), (i64: %12 = load i64, i64* %11, align 8, !dbg !32)) @ TyLayout { ty: &str, details: LayoutDetails { variants: Single { index: 0 }, fields: Arbitrary { offsets: [Size { raw: 0 }, Size { raw: 8 }], memory_index: [0, 1] }, abi: ScalarPair(Scalar { value: Pointer, valid_range: 1..=18446744073709551615 }, Scalar { value: Int(I64, false), valid_range: 0..=18446744073709551615 }), align: AbiAndPrefAlign { abi: Align { pow2: 3 }, pref: Align { pow2: 3 } }, size: Size { raw: 16 } } })
thread 'rustc' panicked at 'Box<Any>', src/librustc_errors/lib.rs:637:9
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
error: aborting due to previous error
note: the compiler unexpectedly panicked. this is a bug.
note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
note: rustc 1.36.0-nightly (e70d5386d 2019-05-27) running on x86_64-unknown-linux-gnu
Bumping to tonight's nightly gives the same result. Time to file another bug, I guess.
Sigh. Someday I'll learn to write inline assembly. This looks like it outputs the right thing. At least it doesn't crash the compiler.
#![feature(asm)]
struct ThreadContext {
rsp: u64,
r15: u64,
}
fn gt_switch(new: *const ThreadContext) -> ! {
unsafe {
asm!("mov $0, %r15" : : "*m"(&(*new).r15) : "r15");
asm!("mov $0, %rsp" : : "*m"(&(*new).rsp) : "rsp");
asm!("ret");
std::hint::unreachable_unchecked()
}
}
fn main() {
let ctx = ThreadContext{rsp: 0x80, r15: 0x88};
gt_switch(&ctx);
}
Actually, those clobbers above appear to be a bad idea: they cause the compiler to save the clobbered registers gratuitously, even though it can tell that they will never be restored. Delete them.
This issue does not apply to the new asm! (RFC 2850) which doesn't support memory constraints.
The legacy llvm_asm! is deprecated and is no longer maintained.