Runtime: Proposal: TryForSufficientStack method to support stackalloc usage

Created on 8 Feb 2018 · 27 comments · Source: dotnet/runtime

_From @kkokosa on February 8, 2018 12:8_

Due to changes in C# 7.2 and Span, more and more stackalloc usages may become popular, such as:

```c#
Span<byte> span = stackalloc byte[1000];
```

However, this call will end up with an unrecoverable StackOverflowException if there is not enough stack space left. We can do nothing in such a situation, which makes this approach not useful at some point. Right now it is pure guesswork which stackalloc size may end in such a catastrophe.

@jkotas pointed out in dotnet/corefx#14675 that RuntimeHelpers.EnsureSufficientExecutionStack is a reliable solution for handling stack overflow in general, but as MSDN says, this method "_ensures that the remaining stack space is large enough to execute the average .NET Framework function_". However, probing for the _average_ .NET Framework function is not very helpful, as stackalloc makes it decidedly not average.
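
For context, the existing API is intended to be used roughly like this: probe before each level of recursion so that a catchable InsufficientExecutionStackException is raised instead of the process dying with a StackOverflowException (Visit and Node are purely illustrative):

```c#
using System.Runtime.CompilerServices;

void Visit(Node node)
{
    // Throws InsufficientExecutionStackException if the remaining stack is below the
    // runtime's fixed "average function" threshold.
    RuntimeHelpers.EnsureSufficientExecutionStack();

    foreach (Node child in node.Children)
        Visit(child);
}
```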

I propose to add a new helper method which gives at least some clue whether our stackalloc may end up with a StackOverflowException:

```c#
public static bool RuntimeHelpers.TryForSufficientStack(long size)
```

I believe returning bool instead of throwing an exception (like the InsufficientExecutionStackException thrown by the method above) is better, because stackalloc is most probably used in hot paths already and adding exception handling there is rather undesirable.
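
To make the intent concrete, a hypothetical call site could look like this (TryForSufficientStack is the proposed method, and ProcessBuffer is an illustrative helper taking a Span<long>):

```c#
int byteCount = count * sizeof(long);
if (RuntimeHelpers.TryForSufficientStack(byteCount))
{
    Span<long> buffer = stackalloc long[count];
    ProcessBuffer(buffer);
}
else
{
    // Heap fallback when the probe says the stack is too tight.
    ProcessBuffer(new long[count]);
}
```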

As far as I understand, this method seems to be quite simple in terms of implementation, as all the necessary data is already there. My naive implementation proposal:

```cpp
FCIMPL1(FC_BOOL_RET, ReflectionInvocation::TryForSufficientStack, INT64 size)
{
    FCALL_CONTRACT;

    Thread *pThread = GetThread();

    // The address of a local approximates the current stack pointer.
    UINT_PTR current = reinterpret_cast<UINT_PTR>(&pThread);
    UINT_PTR limit = reinterpret_cast<UINT_PTR>(pThread->GetCachedStackLimit());

    // Succeed only if at least `size` bytes remain above the cached stack limit.
    FC_RETURN_BOOL(current >= (limit + size));
}
FCIMPLEND
```

P.S. I am not sure whether the stack guard size should be taken into consideration here or not...

_Copied from original issue: dotnet/coreclr#16277_

Labels: Design Discussion, api-suggestion, area-System.Runtime.CompilerServices

All 27 comments

  • I think the name of this method should be close to EnsureSufficientExecutionStack, to indicate that the two methods do very similar things. Maybe CheckSufficientExecutionStack or HasSufficientExecutionStack?

    Also, I think Try makes sense when it's combined with a verb, so if you want to use it here, the name would be TryEnsureSufficientExecutionStack.

  • Should there be a way to use the new method to check for stack space for that "average function" without throwing? That could be achieved by adding a default value to the size parameter (probably 0 or -1), or by adding a parameterless overload.

Should there be a way to use the new method to check for stack space for that "average function" without throwing?

FYI, there already is a public RuntimeHelpers.TryEnsureSufficientExecutionStack() exposed in .NET Core.

there already is a public RuntimeHelpers.TryEnsureSufficientExecutionStack() exposed in .NET Core.

Then I think the method proposed here should be added as its overload.

there already is a public RuntimeHelpers.TryEnsureSufficientExecutionStack() exposed in .NET Core.

Then I think the method proposed here should be added as its overload.

Then probably a complementary EnsureSufficientExecutionStack overload (throwing InsufficientExecutionStackException) should be added as well?

If the proposed EnsureSufficientExecutionStack overload is added, what is the typical usage going to look like?

Note that stackalloc should only be used for relatively small buffers (say, up to 1 kB). The array pool is more appropriate above this limit. The framework uses stackalloc in a number of places, and the typical pattern looks like this:

https://github.com/dotnet/coreclr/blob/efebb38f3c18425c57f94ff910a50e038d13c848/src/mscorlib/shared/System/Text/ValueStringBuilder.cs
https://github.com/dotnet/coreclr/blob/efebb38f3c18425c57f94ff910a50e038d13c848/src/mscorlib/shared/System/Number.Formatting.cs#L297
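
Condensing that pattern into a short sketch (a simplified stand-in for illustration, not the actual ValueStringBuilder type): the caller hands in a small, fixed-size stackalloc'ed scratch buffer, and the helper transparently switches to ArrayPool<char> if the content outgrows it.

```c#
using System;
using System.Buffers;

public ref struct ValueStringBuilderSketch
{
    private char[] _arrayFromPool; // null while the caller-provided stack buffer suffices
    private Span<char> _chars;
    private int _pos;

    public ValueStringBuilderSketch(Span<char> initialBuffer)
    {
        _arrayFromPool = null;
        _chars = initialBuffer;
        _pos = 0;
    }

    public void Append(char c)
    {
        if (_pos >= _chars.Length)
            Grow();
        _chars[_pos++] = c;
    }

    private void Grow()
    {
        // Move to a pooled array; the original stack buffer is simply abandoned.
        char[] newArray = ArrayPool<char>.Shared.Rent(Math.Max(_chars.Length * 2, 16));
        _chars.Slice(0, _pos).CopyTo(newArray);
        char[] toReturn = _arrayFromPool;
        _chars = _arrayFromPool = newArray;
        if (toReturn != null)
            ArrayPool<char>.Shared.Return(toReturn);
    }

    public override string ToString()
    {
        string result = _chars.Slice(0, _pos).ToString();
        Dispose();
        return result;
    }

    public void Dispose()
    {
        char[] toReturn = _arrayFromPool;
        this = default; // prevent accidental reuse after the pooled array is returned
        if (toReturn != null)
            ArrayPool<char>.Shared.Return(toReturn);
    }
}
```

A typical call site passes a small constant-size buffer, so the stack cost never depends on the input:

```c#
Span<char> scratch = stackalloc char[128];
var sb = new ValueStringBuilderSketch(scratch);
```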

Would it be better to work on making this pattern easier instead?

For example, I was thinking about various scenarios where some temporary data collections have to be calculated. Instead of using an array or even ArrayPool, we could utilize the stack with almost no time overhead:

```c#
public double Process(List<BigData> list)
{
    Span<DataStruct> preprocessedList = stackalloc DataStruct[list.Count];
    for (int i = 0; i < list.Count; ++i)
    {
        // Calculate preprocessedList[i] fields from list items
    }
    return ProcessData(preprocessedList); // implicit Span -> ReadOnlySpan conversion
}
```

But currently every usage of stackalloc above MinExecutionStackSize (from the Thread::SetStackLimits method) - which is 128 kB (or 64 kB on 32-bit) - may end up killing the entire app with a StackOverflowException.

I mean, I know it is quite a big size guaranteed already by the EnsureSufficientExecutionStack call but... would it not be nice to have a more size-aware way of checking this? Currently, people using stackalloc may not even be aware of those deeply hidden, implementation-related limitations. In other words, deciding whether we are using "relatively small buffers" may be too subjective when writing a general-purpose library, and there is currently no way to ensure that stackalloc won't fail.

So what would your code example look like if you had EnsureSufficientExecutionStack API?

Instead of using an array or even ArrayPool, we could utilize the stack with almost no time overhead:

Large stack buffers do have overhead: The stack memory has to be faulted in (one page at a time typically), and then it tends to be stuck in the working set.

So what would your code example look like if you had EnsureSufficientExecutionStack API?

Something like:

```c#
public double Process(List<BigData> list)
{
    if (RuntimeHelpers.TryEnsureSufficientExecutionStack(list.Count * PUT_HERE_YOUR_STRUCT_SIZE))
    {
        Span<DataStruct> preprocessedList = stackalloc DataStruct[list.Count];
        for (int i = 0; i < list.Count; ++i)
        {
            // Calculate preprocessedList[i] fields from list items
        }
        return ProcessData(preprocessedList); // implicit Span -> ReadOnlySpan conversion
    }
    else
    {
        // fallback to ArrayPool or other way of creating preprocessed list
    }
}
```

Large stack buffers do have overhead: The stack memory has to be faulted in (one page at a time typically), and then it tends to be stuck in the working set.

Isn't it the same for faulting in memory from ArrayPool? I mean, if Process is on a hot path called multiple times, being stuck in the working set is an overhead we should be aware of, but maybe it is worth considering? If Process were a one-shot call, probably nobody would care about optimizing it in the first place.

Additionally, the point is that even with small buffers of 1 kB, TryEnsureSufficientExecutionStack says nothing about whether the remaining stack is sufficient for that 1 kB or for some other value.

// fallback to ArrayPool or other way of creating preprocessed list

I do not think I would want to use this pattern because it makes me duplicate the code.

Isn't it the same for faulting in memory from ArrayPool?

No - if the buffer is reused on multiple threads, or different callers on the same thread.

There needs to be a best practice for allocating large buffers. If half of the folks are going to do it via ArrayPool and the other half with large stack allocs, everybody loses in the end.

I do not think I would want to use this pattern because it makes me duplicate the code.

Ok, this was a poorly prepared example. I was thinking more about something like a 'generic temporary collection factory' without repeating any processing logic:

```c#
public double Process(List<BigData> list)
{
    Span<SomeStruct> span;
    if (RuntimeHelpers.TryEnsureSufficientExecutionStack(list.Count * PUT_HERE_YOUR_STRUCT_SIZE))
    {
        span = stackalloc SomeStruct[list.Count]; // <- the assignment the compiler rejects (see below)
    }
    else
    {
        span = ArrayPool<SomeStruct>.Shared.Rent(list.Count);
    }

    for (int i = 0; i < list.Count; ++i)
    {
        // Calculate span[i] fields from list items
    }
    return ProcessData(span); // implicit Span -> ReadOnlySpan conversion
}
```

However, this currently does not compile with _"A result of a stackalloc expression of type 'Span' cannot be used in this context because it may be exposed outside of the containing method"_ error. Is it supposed to be ok? Because https://github.com/dotnet/csharplang/blob/master/proposals/csharp-7.2/span-safety.md says that _"A stackalloc expression is an rvalue that is safe-to-escape to the top-level scope of the method (but not from the entire method itself)"_.

There needs to be a best practice for allocating large buffers. If half of the folks are going to do it via ArrayPool and the other half with large stack allocs, everybody loses in the end.

Here I agree fully and I see your point. What still bothers me is the fact that stackalloc usage may kill any app with stack overflow and the defensive programmer has no way to proactively prevent this.

Right, it does not compile. Also, missing in your example is code to return the buffer to the pool once you are done. The pattern that works today looks like this:

https://github.com/dotnet/coreclr/blob/efebb38f3c18425c57f94ff910a50e038d13c848/src/mscorlib/shared/System/Text/ValueStringBuilder.cs
https://github.com/dotnet/coreclr/blob/efebb38f3c18425c57f94ff910a50e038d13c848/src/mscorlib/shared/System/Number.Formatting.cs#L297

Or another example:

https://github.com/dotnet/corefx/pull/26942/files#diff-a07e741d448a2771beaea007ab99a266
https://github.com/dotnet/corefx/pull/26942/files#diff-00816d3a470b2701102e6be001c6e4d3R23
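
The shape of those changes can be condensed into something like the sketch below (the names, the threshold constant, and the body are illustrative): a constant-size stackalloc for the common case, an ArrayPool<T> rental for larger inputs, and a guaranteed return of the rented array.

```c#
using System;
using System.Buffers;

static void ProcessBytes(int length)
{
    const int StackAllocThreshold = 256; // illustrative; picked per call site in the real code

    byte[] rented = null;

    // The stackalloc size is a compile-time constant, so the stack cost is bounded no
    // matter how large the input is (the small stack buffer simply goes unused when the
    // input doesn't fit in it).
    Span<byte> buffer = stackalloc byte[StackAllocThreshold];
    if (length > StackAllocThreshold)
    {
        rented = ArrayPool<byte>.Shared.Rent(length);
        buffer = rented; // implicit byte[] -> Span<byte> conversion
    }

    try
    {
        Span<byte> slice = buffer.Slice(0, length);
        // ... fill and consume `slice` ...
    }
    finally
    {
        if (rented != null)
            ArrayPool<byte>.Shared.Return(rented);
    }
}
```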

Though it's likely not sensible to query a different thread than the one you are running on (unless it is in a Wait?); how about some utility APIs for querying stack size instead?

So currently there is a maxStackSize parameter for creating a thread:

```c#
public Thread(ParameterizedThreadStart start, int maxStackSize);
```

But you can't query it back to find out what it was set to; so maybe something like the following would be better?

```c#
public partial class Thread
{
    public int TotalStackSize { get; }
    public int AvailableStackSize { get; }
    public int UsedStackSize { get; }
}
```

It's also less ambiguous about guarantees than a name like xSufficientStack would suggest.

Or perhaps in System.Diagnostics?

```c#
namespace System.Diagnostics.ThreadInfo
{
    public static class ThreadInfoExtensions
    {
        public static int TotalStackSize(this Thread thread);
        public static int AvailableStackSize(this Thread thread);
        public static int UsedStackSize(this Thread thread);
    }
}
```

Or perhaps in System.Diagnostics?

Moving it away from "mainstream" APIs like Thread seems to be a good idea. And it seems to be a better place than the rather exotic System.Runtime.CompilerServices.

It does not make sense for these methods to take Thread, because they are applicable to the current thread only. For historic reasons, there are a number of methods like that on Thread that throw when called on anything but the current thread. We do not want to grow this set.

Would it be more convenient if the user didn't have to compute the required size in bytes, and instead just specified the type and the count of items of that type to allocate? So instead of:

```c#
RuntimeHelpers.TryEnsureSufficientExecutionStack(list.Count * PUT_HERE_YOUR_STRUCT_SIZE)
```

You would have:

```c#
RuntimeHelpers.TryEnsureSufficientExecutionStack<SomeStruct>(list.Count)
```

Though being able to specify the size in bytes is probably still useful, so this could be just a convenience overload (possibly added later than the one proposed here).
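
For what it's worth, such a convenience overload could presumably just forward to the byte-sized one, e.g. via Unsafe.SizeOf<T> (a sketch only — both overloads are proposals here and neither exists today):

```c#
// Hypothetical convenience overload forwarding to the (also hypothetical) byte-sized one.
public static bool TryEnsureSufficientExecutionStack<T>(int count) where T : unmanaged
    => TryEnsureSufficientExecutionStack((long)count * Unsafe.SizeOf<T>());
```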

I like this idea.

I do appreciate this idea, but my intuition suggests that it's a lot less useful than it seems on its face.

Thinking about when you'd need a really big buffer (like, bigger than a 4K OS page) that could legally fit on the stack, allocating that buffer on the stack feels like it could very often cause a page fault when trying to return to the caller. Since we're probably talking about situations where these scratch buffers are of input-dependent lengths, then it seems to follow that larger buffers will correlate with more time spent in the method, which means that I see a potential for a bit of a "perfect storm":

  • memory-constrained system that aggressively swaps out memory pages that aren't used
  • method that hyper-aggressively allocates whatever it possibly can on the stack
  • caller that invokes the method multiple times, perhaps in a loop, in such a way that the caller directly or indirectly pushes the stack off onto another page and works exclusively with those addresses.

I'd imagine that someone could produce a contrived microbenchmark that shows this as being even worse than unconditionally allocating on the heap; more likely, each method will have its own threshold where it starts making sense to use ArrayPool<T>, and I imagine that this point will usually be well below "however much stack space there currently is".

Furthermore, it feels like as soon as we write a TryEnsureSufficientExecutionStack-guarded stackalloc, everything after that becomes "serious business" until the stack-allocated buffer goes out of scope:

  • If buffers are sometimes going to fit on the stack and sometimes not, then it follows that there are going to be situations where the buffer only barely fits on the stack.
  • This means that the code in question needs to consider not just the size of their desired scratch buffer, but also the size of everything else that will be pushed onto the stack after it (including other methods being called, and their own calls, etc.) and encode all of that knowledge into a number that goes into the TryEnsureSufficientExecutionStack call.
  • This feels fragile: that number can depend on code that's "far away" and can change without warning.

I only bring up the above because several examples in the thread use TryEnsureSufficientExecutionStack as the only thing that determines stack vs. heap.

Finally, what are some actual situations that we're worried about here?

  • It seems unlikely that a non-recursive leaf method would fear allocating a 1 KiB buffer on the stack, at least at a 1 MiB stack size (the default on Windows - other platforms seem to have not-smaller defaults).

    • If we're worried about methods running on a thread with a lower stack size, then it feels like a better option would be to give them the ability to encode a heuristic like "stackalloc if <1 KiB and also <0.1% of max stack, otherwise grab from ArrayPool<T>" (a rough sketch follows this list).

  • I guess a possibility is that some method stack-allocates a buffer and passes its Span<T> down into another method that does the same, and it goes down like that, but I don't think the proposed guard really helps much for that situation (it has all the concerns from earlier in my comment, magnified a ton).
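
A rough sketch of that heuristic (MaxStackSizeBytes is hypothetical, since no public API exposes the current thread's maximum stack size today; the conditional stackalloc needs C# 8+ syntax):

```c#
int byteCount = count * sizeof(int);
bool useStack = byteCount <= 1024                        // absolute cap: ~1 KiB
             && byteCount <= MaxStackSizeBytes / 1000;   // relative cap: ~0.1% of max stack

Span<int> buffer = useStack ? stackalloc int[count] : new int[count];
```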

Again, though, I do appreciate the idea, and the sample in the original issue description seems to suggest that it's easy to implement. I just wanted to make sure to share my thoughts on it (it's been on my mind a ton since @svick linked it from dotnet/roslyn#25118), even if all it does is inspire a "Remarks" section in the docs.

Completely agree with @airbreather. Similar discussion at https://github.com/dotnet/csharplang/issues/1344

I really like the direction of this discussion. My primary reasons for this proposal were twofold:

  • with the expected growth in popularity of constructs like Span<byte> span = stackalloc byte[1000] in safe code, the actual size of the array that is safe to allocate is now quite a bit of internal magic. More and more people may wonder what size is "safe"
  • stackalloc is special because it may throw an uncatchable exception, killing everything. Thus, it probably deserves special treatment

At the moment, I fully agree with the arguments presented by you. However, the fact that such stackalloc usage is most probably, in most cases, not optimal does not mean that people will not do it. Maybe a good remark in the documentation stating clearly what this magic number is would be enough. Currently, I would be a little afraid to advise the client to use stackalloc and when asked what size does not kill his application, the only answer is "well... probably something around 1 KiB".

I like @jkotas's idea of LocalBuffer<T> a lot as a way of addressing this problem. It would leave the people who stackalloc on their own, but at least they would have a safer alternative.

@airbreather

This means that the code in question needs to consider not just the size of their desired scratch buffer, but also the size of everything else that will be pushed onto the stack after it (including other methods being called, and their own calls, etc.) and encode all of that knowledge into a number that goes into the TryEnsureSufficientExecutionStack call.

EnsureSufficientExecutionStack already has an opinion about how much stack is required to execute "average .NET Framework function". Maybe you could tell TryEnsureSufficientExecutionStack how many functions you need to fit on the stack after this allocation? That way, you wouldn't be encoding it yourself and any magic numbers would be hidden from you. The code could look like:

```c#
RuntimeHelpers.TryEnsureSufficientExecutionStack(size: list.Count * STRUCT_SIZE, calledMethods: 10)
```

There are a couple of issues that feed into this. One is that the stack size varies widely across platforms now (for instance 2 MB on Windows and a default of 8 MB on Ubuntu). Also, I am not sure about allocating large chunks of stack until we get the "don't zero the stack" change. Without that, it's faster to pull from the array pool, because zeroing a large stack allocation will cost you more and, as said, it will need to be faulted in anyway.

EnsureSufficientExecutionStack already has an opinion about how much stack is required to execute "average .NET Framework function". Maybe you could tell TryEnsureSufficientExecutionStack how many functions you need to fit on the stack after this allocation? That way, you wouldn't be encoding it yourself and any magic numbers would be hidden from you.

@svick looking through the code, the rule-of-thumb used by EnsureSufficientExecutionStack seems like it's already accounting for methods calling other methods at a "typical" rate (plus the overhead of CLR services), so I'm guessing that the implementation would be more like "return true iff the stack budget burned so far, plus the directly desired size, plus that rule-of-thumb value for the current thread, would be less than the current thread's max stack size".

On the one hand, that certainly feels like it would be safe enough, but on the other hand, is it maybe a bit too safe? If I'm a leaf method that's 50KiB away from overflowing the stack (which might actually start happening because of callers over-eagerly stack-allocating because of this guard), and this guard stops me from allocating 400B, then I'm not happy.

Currently, I would be a little afraid to advise the client to use stackalloc and when asked what size does not kill his application, the only answer is "well... probably something around 1 KiB".

@kkokosa Without further context, if all you're given is a question phrased like "how much can I stackalloc before my application explodes?", then the perfect answer seems to depend both on how much stack is already burned by the time your method is called and how much stack the method will burn after the stackalloc (including any of the CLR services that might get invoked automatically like the GC).

Anyway, without that context, isn't the best answer still going to be something like that? "Probably something around 1 KiB"? The helper utility would only help to remove the "stack burned so far" part, not the "stack burned after the stackalloc" part.

It's almost like the more useful helper would be to just give a utility method that answers "how many more bytes can I push onto the stack before I hit the guard page?" (edit: or was it the page that comes immediately "before" the true guard page?) and just leave it up to the caller to interpret that according to their specific knowledge of what they intend to do in the future.

In the PowerShell repo we got our first experience of using Span and stackalloc. We found that we use one pattern:
```c#
const int StackallocThreshold = 120;

Span<int> array;

int size = GetArraySize();
if (size < StackallocThreshold)
{
    array = stackalloc int[size];
}
else
{
    array = new int[size];
}
```

Based on the discussion we would have to implement:
```c#
const int RedLineStackallocThreshold = 1024; // 1 kB -- ???
const int StackallocThreshold = 120;

Span<int> array;

int size = GetArraySize();
if (size < StackallocThreshold && AvailableStackSize() > RedLineStackallocThreshold)
{
    array = stackalloc int[size];
}
else
{
    array = new int[size];
}
```

RedLineStackallocThreshold protects from stack overflow.
StackallocThreshold is important: it guarantees optimal use of the stack, because we use stackalloc in nested functions. This value can only be selected based on the properties of the application itself. In the PowerShell repo we use 120 - this is the default maximum width of the screen, and we are sure that in most scenarios the application will use the stack, but if the user sets extremely large parameters, the application will continue to work using the heap.

Fallback to the heap could be an interesting proposal for the discussion.

Maybe the method should return the number of bytes left. That way the caller can apply any logic they like.

This would simplify the API as well. No need for a size parameter and comparison.
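
Caller-side logic with a "bytes left" API might then look something like this (GetRemainingStackBytes is a hypothetical name — nothing like it exists on RuntimeHelpers today — and the margin and threshold constants are arbitrary; the conditional stackalloc needs C# 8+ syntax):

```c#
const int StackAllocThreshold = 512;   // per-call-site cap on element count
const long SafetyMargin = 64 * 1024;   // stack to leave for whatever runs afterwards

// Hypothetical API: how many bytes of stack remain on the current thread.
long remaining = RuntimeHelpers.GetRemainingStackBytes();

bool useStack = size <= StackAllocThreshold
             && (long)size * sizeof(int) + SafetyMargin <= remaining;

Span<int> array = useStack ? stackalloc int[size] : new int[size];
```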

https://github.com/dotnet/runtime/issues/25423 is a better proposal solving the same problem.
