We currently abuse LLVM segmented stack support to check for stack overflows. It would be more efficient to use guard pages to detect these. We already have guard pages on all the stacks. However, to ensure that the code doesn't skip the guard pages, we need to insert stack probes. LLVM can already generate code for that on x86 and ARM for Windows. We'd just need to expose that as an option on other platforms.
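To make "skipping the guard page" concrete, here is a minimal address-level sketch. It does not touch real memory, and it assumes a single 4 KiB guard page directly below the lowest usable stack address; the names `Touch`, `classify` and `first_touch_without_probes` are hypothetical.

```rust
/// Hypothetical model of a downward-growing stack with one guard page.
/// Addresses are plain integers; nothing here touches real memory.
const PAGE_SIZE: usize = 4096;

#[derive(Debug, PartialEq)]
enum Touch {
    /// Inside the thread's mapped stack: no fault.
    Stack,
    /// Inside the guard page: the OS raises a fault we can detect.
    GuardPage,
    /// Below the guard page: whatever lives there is silently overwritten.
    BeyondGuard,
}

/// `stack_limit` is the lowest valid stack address; the guard page is
/// `[stack_limit - PAGE_SIZE, stack_limit)`.
fn classify(addr: usize, stack_limit: usize) -> Touch {
    if addr >= stack_limit {
        Touch::Stack
    } else if addr >= stack_limit.saturating_sub(PAGE_SIZE) {
        Touch::GuardPage
    } else {
        Touch::BeyondGuard
    }
}

/// Without probes, a function may first touch the bottom of its new frame.
fn first_touch_without_probes(sp: usize, frame_size: usize, stack_limit: usize) -> Touch {
    classify(sp.saturating_sub(frame_size), stack_limit)
}

fn main() {
    let stack_limit = 0x10_0000;
    let sp = stack_limit + 512; // only 512 bytes of stack left
    // Overflowing by less than a page lands in the guard page.
    println!("{:?}", first_touch_without_probes(sp, 1024, stack_limit));
    // A frame much larger than the guard page jumps right over it.
    println!("{:?}", first_touch_without_probes(sp, 64 * 1024, stack_limit));
}
```

A frame smaller than the guard page cannot get past it, which is why probes are only needed for large frames.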
It would be nice if support for stack probes could be added to MIPS in LLVM too, so we can get rid of the runtime support for the stack overflow checking.
Using stack probes is also easy and desirable to support in freestanding mode.
Here's a comment in the past about this. Sadly I don't think "just use a guard page will cut it" for the reasons outlined in that comment. That being said, I'd love to stop using segmented stacks!
I do think printing out an error message is a bad thing. On Windows it means interfering with the dialog which informs the user about an error and allows developers to debug the application. On Linux it interferes with debugging. If such a message is desired for some reason, we can support it, but the POSIX solution probably isn't very pretty.
I was wondering what the impact of this would be, so I commented out the code generating the split stack attributes and did some simple benchmarks:
2.35% reduced size of librustc
4.11% less time compiling libcore
19.3% less time running shootout-fibo
Printing an error message on stack overflow would be trivial if there weren't green threads. It's as simple as installing a signal handler and printing out an error if the faulting address is located in the guard page range. It's yet another case where green threads make the language significantly worse than C++. I don't think there's a sane way to do it without horrific spin locks in today's Rust unless printing a special error message is not a requirement.
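The address check itself is the easy part; a minimal sketch of just that decision follows. Installing the handler (e.g. `sigaction` with `SA_SIGINFO`, running on an alternate stack set up with `sigaltstack`, since the faulting stack is exhausted) is assumed and omitted. The `Guard` table and `describe_fault` are hypothetical; with green threads the table would have to track whichever stack is active, which is exactly the difficulty described above.

```rust
/// Hypothetical record of one thread's guard page, as the half-open range [start, end).
#[derive(Clone, Copy)]
struct Guard {
    start: usize,
    end: usize,
}

/// The decision a SIGSEGV handler would make from the faulting address
/// (`si_addr`): only a fault inside a known guard page is reported as an overflow.
fn describe_fault(fault_addr: usize, guards: &[Guard]) -> &'static str {
    if guards.iter().any(|g| (g.start..g.end).contains(&fault_addr)) {
        "stack overflow"
    } else {
        "segmentation fault"
    }
}

fn main() {
    // Two threads, each with a 4 KiB guard page below its stack.
    let guards = [
        Guard { start: 0x7000_0000, end: 0x7000_1000 },
        Guard { start: 0x7100_0000, end: 0x7100_1000 },
    ];
    println!("{}", describe_fault(0x7000_0ff8, &guards)); // inside the first guard page
    println!("{}", describe_fault(0x10, &guards)); // e.g. a null pointer dereference
}
```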
Anyway... the segmented stack support only catches overflow when it happens to occur on the Rust side rather than in C code that's being called. It doesn't actually work. For example, an infinitely recursive function may be allocating memory and there's a good chance it will be jemalloc triggering the overflow.
We can give error messages for libgreen by having it inform libnative about the active guard page. Foreign code might skip guard pages on non-Windows platforms, so we won't catch all stack overflows.
It's the platform's problem if it doesn't build with -fstack-check. Stack frames that large aren't common in C code anyway.
@thestinger Knock off the general comments about how Rust is a better or worse language than C++ please.
Anyway, I totally agree with the thrust of using guard pages here, and would like us to move away from segmented stacks as soon as possible. I personally don't think good error messages in libgreen should block us. The fibo benchmark is particularly compelling.
I don't see why libgreen is relevant anyway; if there's a guard page either way, just have libgreen handle it in the same way as libnative. It should be easy to run enough of Rust from the signal handler to print out an error message before dying.
There was some discussion on the LLVM mailing list about this:
That may have been me.
I do have an implementation of the LLVM part for x86 and perhaps ARM (not sure how comprehensive that support is).
https://github.com/Zoxc/llvm/compare/llvm-mirror:master...stprobe
I don't know enough about MIPS to implement that in LLVM or enough about MIPS and ARM to implement the __probestack support function.
That fibonacci implementation is tail recursive, and optimises to a loop:
```llvm
else-block.i.i.i:                                 ; preds = %else-block.i.i.i.preheader, %else-block.i.i.i
  %.tr710.i.i.i = phi i64 [ %107, %else-block.i.i.i ], [ %104, %else-block.i.i.i.preheader ]
  %.tr69.i.i.i = phi i64 [ %106, %else-block.i.i.i ], [ 1, %else-block.i.i.i.preheader ]
  %.tr8.i.i.i = phi i64 [ %.tr69.i.i.i, %else-block.i.i.i ], [ 0, %else-block.i.i.i.preheader ]
  %106 = add i64 %.tr8.i.i.i, %.tr69.i.i.i
  %107 = add i64 %.tr710.i.i.i, -1
  %108 = icmp eq i64 %107, 0
  br i1 %108, label %_ZN3fib20hcf5cee5c8487747eOaaE.exit.i.loopexit, label %else-block.i.i.i
```
@huonw: It's probably just the difference in compilers then, never mind. Rust is actually a bit faster than the C code on x86_64 when using 32-bit integers.
On x86-64 it actually seems to be using a 64-bit int in Rust vs. the 32-bit one in C. Changing to i32 casts Rust in a much better light:
```text
LANGUAGE Rust 2167
75699
LANGUAGE C 2220
75699
LANGUAGE C 2271
```
(Anyway, this is off-topic for this bug, although @pcwalton may be interested in it.)
Would not having safe stacks on MIPS block landing this?
Do we definitely want a signal handler/exception handler to print out that a stack overflow happened?
I'm trying to get a sense of where this is / how to bring this forward, and I'm confused about a few things:
- -fstack-check=generic works by just inserting an inline loop to probe the stack a page at a time. Is having a separate __probestack useful? It seems like avoiding the addition to compiler-rt entirely would help this patch get landed and would also simplify the necessary change to Rust itself. (On Windows, where __chkstk already exists, LLVM can continue to call into that.)
- There's a mention on the LLVM mailing list about people who might want __probestack not to use signals, but it doesn't, itself, right? It just induces a SIGSEGV. Isn't the answer to that use case to abuse LLVM segmented stack support / __morestack as Rust currently does?

My personal opinion is that if this can be turned on for the common platforms (at least Linux on x86-32, x86-64, maybe ARM), I'd be a lot more comfortable with #27388. But if there's no Linux support at all right now, then dropping stack checks is a bit more worrisome.
What's the story here with MIPS? Am I reading correctly that LLVM can already do this for most Linux architectures but not MIPS? (What makes MIPS different from any other CPU with a stack pointer?)
As far as MIPS goes, @Zoxc couldn’t find anybody who knows MIPS assembly to implement probing, hence the question.
AFAIR no, it hasn’t landed yet.
What runtime support is needed?
Absolutely none! Well, it depends on the implementation: reading the LLVM patch, apparently a __probestack function will have to be defined.
There's a mention on the LLVM mailing list about people who might want __probestack not to use signals, but it doesn't, itself, right? It just induces a SIGSEGV. Isn't the answer to that use case to abuse LLVM segmented stack support / __morestack as Rust currently does?
__morestack and probes are fundamentally, functionally different; different enough that it makes little sense to try to emulate probing with __morestack.
Where should I look for your current LLVM and compiler-rt (if any) patches?
Here and here. Not sure whether this is the most recent patch-set, though.
I may be going out on a limb here, but I'm tagging this with I-unsound since stack overflow is still a theoretical attack vector (see https://github.com/rust-lang/rust/pull/27338 for more discussion on this).
I think we need to refocus this discussion since a lot has changed since the bug was first opened (I'm almost tempted to open an entirely new bug). Specifically, where are we now with supporting stack probes on all platforms (not just first-tier platforms), what needs to be done to accomplish this, and who has the expertise needed to implement it?
http://reviews.llvm.org/D12483 seems to be the most recent patch against llvm.
@nagisa That's still in review, yes? It also seems to be a bit tentative.
Are you implying that the next step to closing this issue is "wait for LLVM to support what we need"? If so, then what would need to be done on our end once that support appears? How involved a change would it be?
Are you implying that the next step to closing this issue is "wait for LLVM to support what we need"?
The alternative would be implementing and testing similar support in our fork of LLVM. If we _really_ want this to get fixed faster, then this is certainly an option. I believe we do not quite support external LLVM at the moment anyway.
If so, then what would need to be done on our end once that support appears? How involved a change would it be?
It then just comes down to annotating every function with the probe-stack attribute by default. LLVM would then add a probe to the functions that do in fact need it. I believe we already enable some attributes by default, so enabling one more shouldn’t be too involved.
Nominating as this is a soundness bug that has yet to have a priority assigned.
triage: P-medium
It'd be good to get some clarification from @alexcrichton (or someone) as to the current state of guard pages etc and what the precise risk is here. I tried following up on the links but there was a lot to read!
Our implementation of guard pages is good (I don’t remember whether there’s an implementation for all non-tier-1 platforms) and works well/correctly both for the main thread (implicitly created by the OS) and for threads created using the standard APIs (where we create the guard page ourselves).
There’s a risk to read/(over-)write data that does not belong to us/isn’t on the stack page (i.e. is outside the stack) only in very specific circumstances:
So, while technically this is a soundness bug, it is hard to imagine ever seeing this being abused in any way. Exposing variable-length stack-allocated arrays would make this easier to abuse, but that’s not happening to my knowledge.
¹: Or however many pages guard pages use on a given OS.
Ah, and Windows is not affected since it already has stack probes.
Yes I believe @nagisa is correct on all accounts.
If and when LLVM has support for stack probes on all platforms, seems like we should enable!
stack memory is initialized in a way that’s guaranteed (to my knowledge) to hit the guard page first if there’s at least some part of the guard page that’s not shadowed by uninitialized stack memory
It seems like this isn't always the case. For example:
```rust
fn stack_overflow() {
    let x = [0u8; 999999999];
}
```
Playpen: http://is.gd/g8ZDkD
From the assembly output it seems like the array initialization starts at the bottom:
```asm
...
.Ltmp8:
    subq    $999999888, %rsp          # subtract 999999888 from the stack pointer
    xorl    %eax, %eax
    movl    %eax, %ecx
    leaq    -1000000000(%rbp), %rdx
    movb    $61, -1(%rbp)
    movq    %rdx, -1000000008(%rbp)
    movq    %rcx, -1000000016(%rbp)
.LBB1_1:
    movq    -1000000016(%rbp), %rax
    movq    -1000000008(%rbp), %rcx
    movb    $0, (%rcx,%rax)           # write a 0 byte to memory at rcx + rax
    addq    $1, %rax                  # increase rax by 1
    cmpq    $999999999, %rax
    movq    %rax, -1000000016(%rbp)
    jb      .LBB1_1
    .loc    1 9 0 prologue_end
...
```
I ran into this when I added guard pages to the kernel stack in my toy OS. If the array size was big enough, the code would miss the guard page and mess up page tables.
@phil-opp great observation! We want to initialize the array from beginning to end¹ (as we do here), but the array is laid out on the stack in reverse (i.e. the first element is at the head of the stack and the last element is closer to the beginning of the stack).
So… this is way easier to abuse than I initially claimed.
¹: And since this code optimises down to memset@PLT, we can’t really tell in which direction initialisation really happens anyway.
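The effect of initialization order can be sketched at the address level. This is a minimal model, assuming one 4 KiB guard page and byte-by-byte stores (a real memset may work in either direction, as noted above); `Order`, `first_out_of_bounds_store` and `in_guard_page` are hypothetical names.

```rust
const PAGE_SIZE: usize = 4096;

#[derive(Clone, Copy)]
enum Order {
    LowToHigh, // x[0] first, as in the loop above
    HighToLow, // last element first
}

/// Models `let x = [0u8; len];` on a downward-growing stack: the array occupies
/// `[sp_before - len, sp_before)` with x[0] at the lowest address. Returns the
/// first address written below `stack_limit`, or None if the array fits.
fn first_out_of_bounds_store(sp_before: usize, len: usize, stack_limit: usize, order: Order) -> Option<usize> {
    let base = sp_before - len;
    if base >= stack_limit {
        return None;
    }
    Some(match order {
        // The very first store is the one furthest from the valid stack.
        Order::LowToHigh => base,
        // Stores walk down byte by byte, so they cross into the guard page first.
        Order::HighToLow => stack_limit.min(sp_before) - 1,
    })
}

fn in_guard_page(addr: usize, stack_limit: usize) -> bool {
    addr < stack_limit && addr >= stack_limit - PAGE_SIZE
}

fn main() {
    let stack_limit: usize = 2_000_000_000;
    let sp = stack_limit + 4096; // one page of stack left
    for order in [Order::LowToHigh, Order::HighToLow] {
        let addr = first_out_of_bounds_store(sp, 999_999_999, stack_limit, order).unwrap();
        println!("first bad store at {addr:#x}, caught by guard page: {}", in_guard_page(addr, stack_limit));
    }
}
```

With low-to-high stores, the first out-of-bounds write lands roughly a gigabyte below the guard page, which matches the behaviour seen in the assembly above.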
What happens on platforms that don't have stack probes, i.e. anything MMU/MPU-less? Right now it seems there wouldn't be any way to re-enable stack overflow checking. While it's possible to write a custom LLVM pass and enable it via rustc -C llvm-args=-load=liboverflowcheck.so, requiring users of something like http://zinc.rs to check out rustc's LLVM, build it, take care to keep it in sync with upstream and finally build a pass seems extremely hostile.
The [better] solution for MxU-less bare-metal systems is to grow the stack out of the RAM instead of into the heap (i.e. have the stack lower than the heap). It is the sanest solution, but a little trickier to do with ld (you need to specify the stack size instead of letting it be the space remaining after the heap).
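A minimal address-level sketch of why this layout helps: with the stack at the bottom of RAM, an overflowing store leaves RAM entirely instead of landing in the heap. All addresses and sizes are illustrative assumptions (0x2000_0000 is where SRAM starts on many Cortex-M parts), and whether an access below RAM actually faults depends on the part's memory map. `Layout`, `Region` and `overflow_lands_in` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Region {
    Stack,
    Heap,
    OutsideRam,
}

/// Hypothetical single-stack layout of RAM `[ram_start, ram_end)`.
struct Layout {
    ram_start: u32,
    ram_end: u32,
    stack_size: u32,
    /// true: stack at the bottom of RAM, heap above it.
    /// false: the conventional layout, stack at the top of RAM, heap below it.
    stack_at_bottom: bool,
}

impl Layout {
    /// Half-open `[lo, hi)` range of the stack; it grows down from `hi` toward `lo`.
    fn stack_range(&self) -> (u32, u32) {
        if self.stack_at_bottom {
            (self.ram_start, self.ram_start + self.stack_size)
        } else {
            (self.ram_end - self.stack_size, self.ram_end)
        }
    }

    fn region(&self, addr: u32) -> Region {
        let (lo, hi) = self.stack_range();
        if addr < self.ram_start || addr >= self.ram_end {
            Region::OutsideRam
        } else if addr >= lo && addr < hi {
            Region::Stack
        } else {
            Region::Heap
        }
    }

    /// Where a store lands when the stack overflows by `excess` bytes.
    fn overflow_lands_in(&self, excess: u32) -> Region {
        let (lo, _) = self.stack_range();
        self.region(lo.wrapping_sub(excess))
    }
}

fn main() {
    // 20 KiB of SRAM at 0x2000_0000 with a 4 KiB stack.
    for stack_at_bottom in [true, false] {
        let layout = Layout { ram_start: 0x2000_0000, ram_end: 0x2000_5000, stack_size: 0x1000, stack_at_bottom };
        println!("stack_at_bottom={stack_at_bottom}: overflow lands in {:?}", layout.overflow_lands_in(4));
    }
}
```

Only one stack can sit against the bottom of RAM, so the trick does not extend to several stacks.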
@bharrisau Sure, but that does not work if you have more than one stack.
@whitequark I would suspect that any "flavorful" platforms would just have stack probes disabled (e.g. it'd be a custom-target-spec option).
@alexcrichton are you suggesting that Rust will be inherently memory-unsafe, even in safe code, on every MPU-less platform? That's quite crippling, _especially_ because there is no MPU.
This is not a theoretical concern. On targets with little RAM, the memory layout is quite packed, and small stacks directly translate to reduced device cost. Stack overflow checking is a desirable feature; e.g. FreeRTOS has its own implementation. Of course, that implementation isn't actually guaranteed to catch all stack overflows; Rust is capable of doing that and there is no excuse not to.
An ideal solution would be an ability to specify a symbol (LLVM global) holding the current stack limit, with the symbol name being configurable. An RTOS then would update it every time it transitions to a new stack.
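A minimal model of this proposal, written as ordinary Rust rather than compiler-inserted code: a global holding the current stack limit, updated by the scheduler on every stack switch, and the comparison a prologue would make. `SP_LIMIT`, `switch_to_stack` and `prologue_check` are hypothetical names, and `Relaxed` ordering assumes a single core where the limit is only changed during context switches.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical configurable stack-limit symbol: the lowest usable stack
/// address of whichever thread is currently running.
static SP_LIMIT: AtomicUsize = AtomicUsize::new(0);

/// What a (hypothetical) RTOS would do on every switch to a new stack.
fn switch_to_stack(stack_limit: usize) {
    SP_LIMIT.store(stack_limit, Ordering::Relaxed);
}

#[derive(Debug, PartialEq)]
struct StackOverflow;

/// The check a compiler-inserted prologue would run before allocating
/// `frame_size` bytes. `red_zone` bytes above the limit stay free so that
/// small leaf functions can skip the check entirely.
fn prologue_check(sp: usize, frame_size: usize, red_zone: usize) -> Result<(), StackOverflow> {
    let needed = SP_LIMIT
        .load(Ordering::Relaxed)
        .saturating_add(frame_size)
        .saturating_add(red_zone);
    if sp <= needed {
        Err(StackOverflow)
    } else {
        Ok(())
    }
}

fn main() {
    switch_to_stack(0x2000_0000);
    println!("{:?}", prologue_check(0x2000_0400, 256, 64)); // plenty of room
    switch_to_stack(0x2000_0380); // another thread with a smaller stack
    println!("{:?}", prologue_check(0x2000_0400, 256, 64)); // same sp, now too deep
}
```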
@whitequark in a multi-threading environment the stack (and possibly also the stack limit) should be per-thread, hence the updates and checks you are mentioning should be on a thread-local variable. What you are proposing looks to me like software MMU emulation or explicit allocation (each call tries to allocate a stack frame from the fixed-size stack vector).
I am afraid this pattern would make even simple functions significantly more complex, possibly preventing many basic optimisations. Moreover, outside of LLVM the knowledge about how much stack is used is not complete (how many registers are going to be spilled? are allocas optimised away?). I believe that the feature you are requesting should be implemented in LLVM rather than rustc.
@ranma42 No, I am proposing exactly per-thread checks. It's just that LLVM's lowering of thread-locals is not useful for non-hosted targets, and anyway, nearly always the only reasonable way to implement those is using a regular global variable. In case your LLVM's lowering for your target does support useful thread-locals, you can simply supply one.
Yes. This feature can only be implemented as an LLVM pass; probably no changes to rustc itself are wanted or necessary. However, LLVM is an implementation detail. Rust claims to provide memory safety; it is the duty of its compilers to ensure that memory safety can in fact be provided on all platforms. I think that rustc should include such a pass in its fork of LLVM, since it is going to use its own fork for foreseeable future anyway, specifically due to Rust-specific passes.
The old hack abusing the split-stack machinery was _almost_ what I suggest here; it, however, embedded platform-specific knowledge to generate code extracting stack limit from a thread-local, and LLVM asserted out on any uncommon platform. If it was extended to read the stack limit from a (plain global or thread-local) variable given as an option rather than hardcoding the offsets for Linux, Windows, etc in the backend, it would work perfectly.
I agree that it is our responsibility to do stack checking, however we achieve it -- but it also seems clear that we want to configure this per target (iow, we do not want to add read/writes of a stack limit variable to every fn when we can use a guard page, etc).
@nikomatsakis I agree of course, guard page is the best method when it is available. I am only saying that it shouldn't be the only one.
Speaking of the cost of these checks: it is actually fairly low. The prologue code should look something like this (assuming a Cortex-M3):
```asm
    .syntax unified
    .text
prologue:                             @ 0
    movw r0, #:lower16:sp_limit       @ +1 1
    movt r0, #:upper16:sp_limit       @ +1 2
    ldr  r0, [r0]                     @ +2 4
    add  r0, r0, #n                   @ +1 5 (n = stack frame size + red zone size)
    cmp  sp, r0                       @ +1 6
    ble  __overflow                   @ +1 7 (common case)
    .data
sp_limit:
    .long 0
```
It will take 7 cycles (~100ns on a 72MHz core), assuming sp_limit lives in SRAM, and will inflate non-leaf functions by 20 bytes. (I used movw/movt to avoid wait states for the case where flash runs at a lower frequency than the core; if a load from a constant island were used instead, the size penalty would be 14 bytes plus four bytes per 4k of code.) There are no caches, so the delay inflicted by memory access is isolated and predictable.
Since a red zone is used, leaf functions with small stack frames do not have to pay at all.
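The ~100ns figure is simple arithmetic on the cycle column of the listing above. A small sketch reproducing it, with the per-instruction counts copied from the listing's comments (which assume sp_limit in zero-wait-state SRAM and the branch not taken):

```rust
/// Per-instruction cycle counts from the Cortex-M3 prologue listing
/// (the branch is counted as not taken, the common case).
const PROLOGUE_CYCLES: [(&str, u32); 6] = [
    ("movw r0, #:lower16:sp_limit", 1),
    ("movt r0, #:upper16:sp_limit", 1),
    ("ldr r0, [r0]", 2),
    ("add r0, r0, #n", 1),
    ("cmp sp, r0", 1),
    ("ble __overflow", 1),
];

fn total_cycles() -> u32 {
    PROLOGUE_CYCLES.iter().map(|&(_, cycles)| cycles).sum()
}

/// Wall-clock time for `cycles` at `clock_hz`, in nanoseconds.
fn cycles_to_ns(cycles: u32, clock_hz: u32) -> f64 {
    f64::from(cycles) * 1e9 / f64::from(clock_hz)
}

fn main() {
    let cycles = total_cycles();
    let ns = cycles_to_ns(cycles, 72_000_000);
    // prints "7 cycles = 97.2 ns per checked call at 72 MHz"
    println!("{cycles} cycles = {ns:.1} ns per checked call at 72 MHz");
}
```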
@whitequark
It seems pretty reasonable to me that flavorful targets could use the morestack-like stack checking instead of guard pages to ensure that we can have stack checking everywhere. It'd likely require some LLVM modifications to be amenable, but shouldn't necessarily be a showstopper either way?
@alexcrichton Hm, yes. I've looked at these commits again, and removing the __morestack/stack_overflow lang item won't really complicate introducing such stack checking, so now I don't think I have anything in particular to say about this specific PR.
For those looking for background info on this bug, and considering that this is the top Google hit for "stack probe", here's an actual definition of stack probes from the documentation of MSVC's /Gs[size] option (which AFAICT is missing from this thread):
A stack probe is a sequence of code that the compiler inserts into every function call. When initiated, a stack probe reaches benignly into memory by the amount of space that is required to store the function's local variables.
If a function requires more than size bytes of stack space for local variables, its stack probe is initiated. By default, the compiler generates code that initiates a stack probe when a function requires more than one page of stack space.
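The page-at-a-time idea can be sketched as an address model. This assumes 4 KiB pages, a one-page guard, and that memory right at the current stack pointer has already been touched by the caller; `probe_addresses` and `overflow_caught_by_guard` are hypothetical names, not the MSVC, LLVM or __rust_probestack implementation. Because consecutive touches are never more than a page apart, the first touch below the stack limit must land in the guard page.

```rust
const PAGE_SIZE: usize = 4096;

/// Addresses a page-at-a-time probe loop touches, in order, before a function
/// moves the stack pointer down by `frame_size` bytes. Frames of at most one
/// page get no probes, matching the default described above.
fn probe_addresses(sp: usize, frame_size: usize) -> Vec<usize> {
    let mut probes = Vec::new();
    if frame_size <= PAGE_SIZE {
        return probes;
    }
    let mut offset = PAGE_SIZE;
    while offset < frame_size {
        probes.push(sp - offset); // one touch per page, top to bottom
        offset += PAGE_SIZE;
    }
    probes.push(sp - frame_size); // last touch: the bottom of the new frame
    probes
}

/// True if the first probe below `stack_limit` lands inside the guard page
/// `[stack_limit - PAGE_SIZE, stack_limit)`. If no probe goes below the limit,
/// either the frame fits or it is small enough that its own accesses stay
/// within one page of `sp` (assumed to be at or above the limit).
fn overflow_caught_by_guard(sp: usize, frame_size: usize, stack_limit: usize) -> bool {
    probe_addresses(sp, frame_size)
        .into_iter()
        .find(|&a| a < stack_limit)
        .map_or(true, |a| a >= stack_limit - PAGE_SIZE)
}

fn main() {
    let sp: usize = 2_000_000_000;
    let stack_limit = sp - 512; // only 512 bytes of stack left
    let probes = probe_addresses(sp, 999_999_999);
    println!("{} probes, first at sp - {}", probes.len(), sp - probes[0]);
    println!("overflow caught by guard page: {}", overflow_caught_by_guard(sp, 999_999_999, stack_limit));
}
```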
Last time there was a thread on this issue somebody said they were working on it. Does anybody recall who that was, whether there is an updated LLVM fork, or anything on the status of this? This issue comes up far more than activity on this thread would indicate...
I heard @whitequark was doing something, but I don't know what. This is my latest LLVM fork
Finishing this is still on my TODO list.
Given that Rust is now used in production, can we merge changes to support this into our LLVM fork?
Ping @nagisa @alexcrichton
@Zoxc Please simply address the upstream concerns.
@whitequark @Zoxc Where are the upstream concerns articulated? Is there a mailing list post that is relevant?
@pcwalton See https://reviews.llvm.org/D9653#206892. The only remaining objection the upstream had to the overall design is the inability to configure the name of the stack probe function. I don't know why @Zoxc repeatedly refused to do this trivial change.
@whitequark Sounds like one of us needs to do it. Should you or should I?
@pcwalton Please do it, I was going to do it for months but I'm really overloaded.
On it.
Thanks @pcwalton.
Updated the patches. cc @whitequark
https://reviews.llvm.org/D34386
https://reviews.llvm.org/D34387
https://reviews.llvm.org/D34386 has been approved.
Still waiting on the second one. @whitequark If the upstream review continues to go slowly, can we pull this into our fork?
@pcwalton It's pretty quick so far... There's nothing inherently troublesome with having this in our own fork, but I'm very worried about this major feature never reaching upstream. For one, this makes it significantly harder to maintain rustc support for out-of-tree LLVM backends. Another reason is that this is something that really ought to benefit the entire LLVM community, not just rustc but at least also clang (-fstack-check currently does nothing).
FWIW I am watching the progress of those two patches closely. In fact let me commit D34386 for you.
@pcwalton Looks like the only remaining upstream concern with D34387 is the spilling of R11, and I am unfortunately as lost on this ABI issue as you. I can try looking into it...
@pcwalton Commented on D34387; I don't think R11 should be spilled.
@whitequark Note that compiler-rt has to be updated in order for clang to be able to make use of stack probes. Someone can do that, but that person won't be me. Working on getting through the LLVM review process has already taken up more of my time than I can afford to spare at present.
Once the patches are upstream I don't mind doing the legwork to integrate it into the compiler; I'm already experimenting locally with the current state of the patches. (I'll take care of compiler-rt and whatnot.)
Note that compiler-rt has to be updated in order for clang to be able to make use of stack probes. Someone can do that, but that person won't be me.
That's fine; it's far easier, required skills wise, to get something into compiler-rt than LLVM itself.
And, thank you for your time!
I've opened https://github.com/rust-lang/rust/pull/42816 for the integration into the compiler.
@whitequark looks like D34387 has been approved, mind landing that upstream as well?
@alexcrichton Yeah, did it as soon as I woke up and read mail.
Great work! Could somebody give a summary of what would be needed to also do stack probes (or other stack overflow checking) on ARM & AArch64, and then also what it would take for MIPS and other platforms?
Implement in LLVM, cherry-pick the commits into Rust's LLVM, reimplement #42816 for the other platforms.
Looks like the calling convention used in LLVM master doesn't match these, but only on Windows where we don't use them. This seems to be because, unlike my patches, the probe-stack attribute takes precedence over the Windows functions.
probestack.rs also lacks unwinding information. My compiler-rt patch does have this.
This seems to be because unlike my patches, the probe-stack attribute takes preference over the Windows functions.
Only on x86_32 Windows, right? The current semantics isn't an accident, I considered it cleaner.
probestack.rs also lacks unwinding information.
It is not needed. __rust_probestack is a leaf function that never unwinds, and the debugging information is generated by rustc, since the assembly is wrapped in a naked function.
Only on x86_32 Windows, right? The current semantics isn't an accident, I considered it cleaner.
So if you use "probe-stack"="__probe_stack" you need 2 different functions on x86_32 depending on the platform, how is that cleaner?
It is not needed. __rust_probestack is a leaf function that never unwinds, and the debugging information is generated by rustc, since the assembly is wrapped in a naked function.
I wouldn't rely on LLVM generating correct debugging information for inline assembly / naked functions nor debuggers having suitable heuristics to be able to give correct stack traces.
So if you use "probe-stack"="__probe_stack" you need 2 different functions on x86_32 depending on the platform, how is that cleaner?
Omitting "probe-stack" results in the unmodified platform ABI being used; including "probe-stack" results in LLVM's probe function ABI being used.
I wouldn't rely on LLVM generating correct debugging information for inline assembly / naked functions nor debuggers having suitable heuristics to be able to give correct stack traces.
That's a good point.
Now that all the foundational work has been done and this has been implemented for all tier-1 platforms, I'm tempted to close this bug and open individual bugs for further platforms. Though given that we do want ARM to become tier-1 soon-ish I'm fine with leaving it open until that work is done, but overall I think this needs to be part of a broader discussion of how we judge severity of a security issue when that issue hinges on LLVM support for a less-supported platform.
Taking the initiative and closing this in favor of https://github.com/rust-lang/rust/issues/43241 .