Runtime: Thread.Stop

Created on 12 Jul 2020 · 19 Comments · Source: dotnet/runtime

Background and Motivation

Thread.Abort is gone [sic].

People need a way to stop threads, since every thread must technically stop at some point (i.e. before system shutdown) or face outright termination, which is worse.

ThreadState already supports Stopped and StopRequested.

Proposed API

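namespace System.Threading
{
    public class Thread
    {
+        // CurrentState is the proposal's hypothetical writable view of ThreadState.
+        public bool TryStop()
+        {
+            // Fail if a stop has already been requested or has completed.
+            if ((CurrentState & (ThreadState.Stopped | ThreadState.StopRequested)) != 0) return false;
+            CurrentState |= ThreadState.StopRequested;
+            return true;
+        }
+        public bool StopRequested => CurrentState.HasFlag(ThreadState.StopRequested);
+        public bool Stopped => CurrentState.HasFlag(ThreadState.Stopped);
+        public bool WillOrHasStopped => Stopped || StopRequested;
    }
}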

If you allow resuming internally (as I am sure you already do) then maybe allowing that publicly would be beneficial to some but the costs there might outweigh the gains.

Other considerations

Possibly as extension methods

Risks

Low, better than people getting it wrong.

Alternative design

A Thread.Interrupt with a state object, similar to the Monitor.Pulse APIs.

or as @stephentoub "suggested"

    public class Thread {
+        public CancellationTokenSource CancellationTokenSource { get; }
    }
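
For illustration only, the TryStop semantics from the proposal can be approximated in user code by pairing a thread with its own CancellationTokenSource. The sketch below is an assumption about how such a wrapper might look: `StoppableThread` and all of its members are hypothetical names, not an existing or proposed runtime API. The stop is purely cooperative, so the worker delegate has to poll the token it is given.

```csharp
using System;
using System.Threading;

// Hypothetical user-land wrapper; not part of System.Threading.
public sealed class StoppableThread
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private readonly Thread _thread;
    private int _stopRequested; // 0 = not requested, 1 = requested

    public StoppableThread(Action<CancellationToken> body)
    {
        // The worker receives the token and is expected to poll it.
        _thread = new Thread(() => body(_cts.Token)) { IsBackground = true };
    }

    public bool StopRequested => Volatile.Read(ref _stopRequested) == 1;

    // Reads the runtime's own flag; this wrapper never writes ThreadState.
    public bool Stopped => (_thread.ThreadState & ThreadState.Stopped) != 0;

    public void Start() => _thread.Start();

    // Returns true only for the first caller that requests the stop.
    public bool TryStop()
    {
        if (Stopped) return false;
        if (Interlocked.Exchange(ref _stopRequested, 1) == 1) return false;
        _cts.Cancel();
        return true;
    }

    public bool Join(TimeSpan timeout) => _thread.Join(timeout);
}
```

A caller would construct it with a loop such as `token => { while (!token.IsCancellationRequested) DoWork(); }` (where `DoWork` is a placeholder), call `Start()`, and later call `TryStop()` followed by `Join(...)`. Nothing here can force a worker that ignores the token to exit.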

Prior Art

https://github.com/juliusfriedman/net7mma_core/blob/master/Concepts/Classes/Threading/Threading.cs#L56

We may want to consider other methods for other flags as well.

api-suggestion area-System.Threading untriaged

Source

juliusfriedman

Most helpful comment

This was mentioned earlier, but I want to draw extra attention to it. There is a difference between _unsafe_ code ([SkipLocalsInit] / Unsafe.As) and _unreliable_ code (Thread.Abort). You're right in that unsafe code is inherently dangerous. But it is definitely possible to use such code correctly. With appropriate discipline and code review you can use these features in production apps and not worry about latent bugs. Such usage will not introduce reliability issues into the application.

With _unreliable_ code, there's no good way to use it correctly. Framework / runtime code is generally not resilient against rude thread aborts. If you abort a thread, you need to know for certain that every single frame on that thread can properly handle being rudely aborted. .NET Core does not provide any way of making this guarantee. The end result is that if you call a rude abort API, you inherently risk corrupting the state of the process. There's no reasonable way to write the call site such that it only aborts the thread when it is known safe to do so. These APIs present an unacceptable pit of failure for developers, which is why we tend not to be keen on reintroducing them.

GrabYourPitchforks on 12 Jul 2020

👍3

All 19 comments

cc (in no particular order) @stephentoub @GrabYourPitchforks @terrajobst @jkotas

juliusfriedman on 12 Jul 2020

Is the idea behind this graceful cancellation? That is, we expect the thread to occasionally poll this property and unwind if the flag is set? It's an interesting idea but I wonder how it'll work as the ecosystem moves to a Task-based abstraction rather than a Thread-based abstraction.

GrabYourPitchforks on 12 Jul 2020

❤1

Yes at heart not having to set your own stop flag because one already exists.

If people are using Threads how likely is it they will move to Tasks?

juliusfriedman on 12 Jul 2020

Such a mechanism already exists in the form of CancellationToken{Source}. How is this substantially different?

stephentoub on 12 Jul 2020

❤1

How do I use a plain thread with a CancellationToken? It is a construct commonly associated with async, which a runtime may not implement (albeit it is not required to be async at all).

This allows the runtime threads to be stopped (and if you deem to resume) from user code without inter-op / custom host. (All native threads can be stopped)

It allows an outside user a way to Stop a thread they have access to which is a normal thing to do in almost all cases. (Save the lone watchdog and yet even then)

I don't have to check for my own Stopped or even have a CancellationToken or be aware of the ThreadPool.

Some runtime might not implement async or even care about Tasks but you need threads for the CLR and Tasks are built on CLR threads which might not even be a real thread.

WASM / arm, IOT would benefit from the ability to Stop threads to save power (potentially better with suspend but more overhead)

We already have the flag and no way for the user to set it officially (safely) although they can observe it (bad joo joo)

It's not different in the sense that you probably already use this from CTS; it only makes sense for others to be able to use it as well.

Unless you are proposing that all threads have a CancellationTokenSource to which the user can interact I would pose the question how does CancellationToken really matter here?

P.s. Thank you for the constructive criticism.

juliusfriedman on 12 Jul 2020

How do I use a plain thread with a CancellationToken?

Exactly the same way you'd use these, e.g. poll to see whether cancellation has been requested.

This allows the runtime threads to be stopped (and if you deem to resume) from user code without inter-op / custom host. (All native threads can be stopped). It allows an outside user a way to stop a thread they have access to which is a normal thing to do in almost all cases. (Save the lone watchdog and yet even then)

This is dangerous and unreliable. Thread aborts were removed because of the inherent dangers of such interactions, and this could be even worse due to cleanup not running, locks not exiting, etc.

stephentoub on 12 Jul 2020

😕1
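
As an illustration of the polling approach described in the reply above, here is a minimal sketch of a plain Thread (no Task, no async) that checks a CancellationToken between units of work. `PollingWorker`, `PollingDemo`, and the `gate` object are hypothetical names chosen for this example. Because the worker unwinds normally, the `lock` is released before it exits, which is the property an uncooperative stop cannot offer.

```csharp
using System;
using System.Threading;

public static class PollingWorker
{
    // Runs until cancellation is requested, then returns normally so the
    // lock (and any finally blocks) are released on the way out.
    public static int Run(CancellationToken token, object gate)
    {
        int iterations = 0;
        while (!token.IsCancellationRequested)
        {
            lock (gate)
            {
                iterations++; // stand-in for real work done under the lock
            }
            Thread.Sleep(1);
        }
        return iterations;
    }
}

public static class PollingDemo
{
    // Starts a plain thread, cancels it, and reports whether it exited
    // and left the shared lock free.
    public static bool StartAndCancel()
    {
        var gate = new object();
        using (var cts = new CancellationTokenSource())
        {
            var worker = new Thread(() => PollingWorker.Run(cts.Token, gate));
            worker.Start();
            Thread.Sleep(20);
            cts.Cancel();
            bool exited = worker.Join(TimeSpan.FromSeconds(5));

            bool lockFree = Monitor.TryEnter(gate);
            if (lockFree) Monitor.Exit(gate);
            return exited && lockFree;
        }
    }
}
```

How quickly the thread stops depends entirely on how often the worker polls; a worker blocked in a long call that does not accept the token will not notice the request until that call returns.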

You may as well remove ThreadState from public consumption just as well as Running etc from Threads then otherwise it seems to me we have much bigger issues....

We already have those flags and make use of them so I don't buy that. (See Socket.Connected) for the same issue on a commonly used object (under the TCP protocol is that property documented as dangerous?)

Abort was different because of the exception handling at the very least.

This involves no such exceptions and doesn't provide the user with anything they don't already have other than a way to set that flag reliably, just as it is set from the CLR.

If the CLR can set it then why can't its users?

Why should I have to resort to unsafe code or interop to send a stop to something which can be very, very reliable in that regard? Safety is typically not the concern.

E.g. I know a thread is accessing memory it's not supposed to, why do I need to talk to the kernel to talk back to the CLR when the CLR can handle its own business? These threads aren't (may not be) under the kernel's control directly, so why even make it seem like it's not safe or something serious when in fact the CLR reliably uses and sets these flags all the time?

I could argue that covariant returns are not safe just as I could argue about a lot of other things but the point is moot as there are safe ways to use them.

And again thank you for your time.

juliusfriedman on 12 Jul 2020

I guess the confusion stems from that we're not quite sure what's motivating this. Is the issue that you want to be notified before process termination so that you can perform some cleanup work? If you could explain the scenario a bit more there might even be an existing API we could point you to.

GrabYourPitchforks on 12 Jul 2020

Sorry for the confusion.

I just want to be able to tell the thread to stop working ASAP. (If it does not then the CLR should be able to terminate it just like the OS) and I feel like there should be API's for that just like there are for GC Notifications.

If that is dangerous I don't see why.

This is me just being able to read and write the same flags that I can observe on the Thread from its ThreadState.

E.g. I can see it's stopping but I can't tell it to stop, which seems like a pit of failure.

Since these might not even be real threads anyway, I feel like having a way to control them is better than having to resort to unsafe code / custom host just to have that level of control.

We offer API's for SkipLocalsInit which is just as dangerous...

We have Unsafe.As now which is just about as dangerous if not more.

Yes, I know I could just make a class/struct and in that type I have my thread and my state which can be observed; but then the thread can do whatever it wants (within reason) because I cannot forcefully SHUT it down. I cannot even ask it to stop. All I can do is set my state or call Cancel on my CTS and hope for the best. (Which is what Tasks are kinda doing under the hood anyway, right? Only they have slightly more help...)

So if I now know that I asked my type to stop and it will not stop for some reason what is the API for me to look at?

Thread.Suspend I would assume? Collect some stack and ensure my state etc. That is not friendly either but well supported...

I guess it depends on how the program is structured because if my (OS) started my process that started that (CLR) thread and I need to kill something all I have is my process id so how can I terminate that CLR thread without a custom host / inter-op code?

This implies to isolate the program I need to move it into a separate process and then I need to ensure access to that process from the other etc which can get quite complex. MSDN

I guess if I just had the ability to set the STOP bit on a CLR thread and have it be enforced just as it always does I would be happy and feel a lot more secure about certain scenarios (Networking, IPC)

How crazy is it that I can access CPUID and call intrinsics safely but I can't stop a thread without killing an entire process...

juliusfriedman on 12 Jul 2020

WASM / arm, IOT would benefit from the ability to Stop threads to save power (potentially better with suspend but more overhead)

Not sure exactly how? To me this sounds like "you get almost infinite battery life if you keep your device turned off". Technically you can, but you don't want to because you want to run tasks on it.

Some runtime might not implement async or even care about Tasks but you need threads for the CLR and Tasks are built on CLR threads which might not even be a real thread.

I'm not sure async-await / Task is a concept that's deeply embedded into the runtime, and even if it is, I'm not sure whether a runtime can get away without it and still claim proper ECMA-335 conformance. Also, Task != Thread: https://blog.stephencleary.com/2013/11/there-is-no-thread.html

I just want to be able to tell the thread to stop working ASAP. (If it does not then the CLR should be able to terminate it just like the OS) and I feel like there should be API's for that just like there are for GC Notifications.

So this is basically Thread.Abort, except this isn't notified to the code running in the thread (like ThreadAbortException), and it is guaranteed to stop the thread unlike Thread.Abort which may not stop the thread (e.g. ResetAbort)?

If that is dangerous I don't see why.

Assuming that I have understood this correctly, IMO this is even worse than Thread.Abort. With Thread.Abort you could do certain cleanups to guard against sudden aborts by putting try-catch blocks everywhere (except people don't do that, and it's incredibly hard to write code under the assumption that the thread may be terminated at any given time). With this you don't even get the option to perform cleanups, which means Abort and this Stop method share the common problem that they may corrupt state, break invariants, and lead to much weirder failures somewhere further down the line that are impossible to diagnose. @jkotas makes several great points against Thread.Abort in #11369, and I think most of the points apply to this proposed API as well.

We already have the flag and no way for the user to set it officially (safely) although they can observe it (bad joo joo)

If the CLR can set it then why can't it's users?

I'm not sure if these flags are still used, documents for ThreadState clearly state that StopRequested is for internal use only and it might not even be used anymore; at least looking at the C# sources I cannot find any references to that member.
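
To make the "observe but not set" point concrete, here is a small sketch that only reads `Thread.ThreadState` before a thread starts and after it has finished. `ThreadStateObserver` is a hypothetical helper name. The property has no setter, so user code can watch these flags but cannot write them.

```csharp
using System;
using System.Threading;

public static class ThreadStateObserver
{
    // Captures the state of a trivial thread before Start and after Join.
    public static (ThreadState Before, ThreadState After) Observe()
    {
        var t = new Thread(() => { });
        ThreadState before = t.ThreadState; // Unstarted: Start has not been called
        t.Start();
        t.Join();                           // wait for the thread to terminate
        ThreadState after = t.ThreadState;  // Stopped: the thread has exited
        return (before, after);
    }
}
```

Any state read while the thread is still running can be stale by the time it is inspected, which is one reason these values are usually treated as diagnostic information rather than a synchronization mechanism.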

SkipLocalsInit, Unsafe.As, CPUID, intrinsics

I don't see how these are relevant to this. Those could be equally dangerous if not used correctly, but you can determine when you can use them safely, and in return you get great benefits (i.e. performance). However, unlike those, with this proposed API you cannot determine when it is safe to use, and it would likely be very hard to get right even if we added more things to accommodate it.

Thread.Suspend I would assume? Collect some stack and ensure my state etc. That is not friendly either but well supported...

Thread.Suspend also is not supported on .NET Core like Thread.Abort, and it will result in PlatformNotSupportedException being thrown when called.

Why should I have to resort to unsafe code or interop to send a stop to something which can be very very reliable in that regard, safety is typically not the concern.
E.g. I know a thread is accessing memory it's not supposed to, why do I need to talk to the kernel to talk back to the CLR when the CLR can handle it's own business? These threads aren't (may not) be under the kernels control directly so why even make it seem like its not safe or something serious when in fact the CLR reliably uses these and sets these flags all the time ?

So if I now know that I asked my type to stop and it will not stop for some reason what is the API for me to look at?

IMO, those are exceptional (bug) cases and the app should terminate ASAP (e.g. Environment.FailFast / Environment.Exit) instead of trying to recover and continue executing. Alternatively, you could do what you described and run the task in a separate isolated process that's a lot less likely to affect your program.

Gnbrkm41 on 12 Jul 2020

Stopping threads uncooperatively is dangerous and was intentionally removed. I think the discussion above provides sufficient information about why.

Aside from its merits, as a general suggestion @juliusfriedman for API proposals, especially non trivial ones like this, it's very helpful to have more detailed motivation based on examples of real world use - something like "People need a way to stop threads as all threads must technically stop at some point, " isn't enough - as you can see you end up with all kinds of clarifying questions. Thanks for the proposal.

danmosemsft on 12 Jul 2020

@Gnbrkm41 trust me I believe you and @jkotas , I know how dangerous it can be... just ask Oracle why Thread.stop was deprecated there....

My points are as follows:

C# is not java you have access to unsafe and basically machine level code (moot any programming language does, see row hammer 40k js for some laughs)
you can kill any process at anytime (kill -9) (taskkill) from the os and the os can kill any running call fiber / co routine it starts on a core.
What if my battery is dying and I want to kill threads I know can be killed because I have their state on disk or otherwise?
In the end a user should be able to attach, detach terminate or abort just as the debugger can, the debugger is just software which should be able to be run by other software.
The user can't control bugs in the framework / os etc and at that time if they need to kill a thread rather than a process they should have the ability to detach as well as they have to join and I would imagine the debugger would be able to re-attach as required when required
the debugger is just about as dangerous in some cases then

Given all those things are true I don't see the big deal. This is kinda just a wrapper over detach, really, with some state on the managed side to be aware that I just moved my CLR thread into native territory and it's no longer under my direct authority.

The only other option is manually creating threads at the OS level or using processes....

Thank you all for your time!

juliusfriedman on 12 Jul 2020

If you don't care about doing anything gracefully, you can already pinvoke to (on Windows) TerminateThread if you want. It will very likely not do what you want. As the documentation says, "TerminateThread is a dangerous function that should only be used in the most extreme cases. You should call TerminateThread only if you know exactly what the target thread is doing, and you control all of the code that the target thread could possibly be running at the time of the termination.". You certainly would have to control all the code involved, and you do not control the runtime.

you can kill any process at anytime

Killing a thread is not at all like killing a process, because processes by default do not have access to each other's resources - handles, etc.

danmosemsft on 12 Jul 2020

👍1

I can interrupt at anytime though right? The hardware has to handle that and only then I get control back if everything goes smoothly.

Definitely appreciate you trying to point me to an Api but I was more after the provocation of the thought rather than exactly how to do it, I could just patch the IP to HLT or something 📦

Thank you all again for your time!

In the future I would hope as we get even closer to the metal that we can mimic everything offered in C++ rather than get so close to the metal we lose the ability to innovate.

C++ has this feature and C# should also.

juliusfriedman on 12 Jul 2020

C++ does not have this feature.

jkotas on 12 Jul 2020

👀1

GrabYourPitchforks on 12 Jul 2020

👍3

C++ does not have this feature.

I don't mean to argue, but what feature? It seems to have them all except Stop (which is terminate) afaik.

https://en.cppreference.com/w/cpp/utility/program/abort
https://en.cppreference.com/w/cpp/error/terminate
https://en.cppreference.com/w/cpp/thread/thread/detach

They also have

https://en.cppreference.com/w/cpp/utility/program/signal to which I think @tmds proposed an API in #15178 perhaps I will take my gripes there and ask for a cross platform counterpart?

I see no reason that .NET threads can't support signals internally as a concept that is abstracted across the various platforms...

juliusfriedman on 12 Jul 2020

This was mentioned earlier, but I want to draw extra attention to it. There is a difference between _unsafe_ code ([SkipLocalsInit] / Unsafe.As) and _unreliable_ code (Thread.Abort). You're right in that unsafe code is inherently dangerous. But it is definitely possible to use such code correctly. With appropriate discipline and code review you can use these features in production apps and not worry about latent bugs. Such usage will not introduce reliability issues into the application.

With _unreliable_ code, there's no good way to use it correctly. Framework / runtime code is generally not resilient against rude thread aborts. If you abort a thread, you need to know for certain that every single frame on that thread can properly handle being rudely aborted. .NET Core does not provide any way of making this guarantee. The end result is that if you call a rude abort API, you inherently risk corrupting the state of the process. There's no reasonable way to write the call site such that it only aborts the thread when it is known safe to do so. These APIs present an unacceptable pit of failure for developers, which is why we tend not to be keen on reintroducing them.

I never had a problem with Thread.Abort and yet I agree with everything you and the rest of the team are saying.
All I am saying is we give all these other foot guns with explicit warnings etc.

IMHO this is no more dangerous than attaching a debugger (just about the same amount of work has to be done). The debugger also must handle the native process loop, so it would therefore likely already have some of this logic inside of it.

The easy example here is: say I launch the debugger, it attaches to my program and then crashes. It will likely take my program down with it if the code injected into it can no longer be handled by it.

This is no more or less dangerous IMHO

juliusfriedman on 12 Jul 2020

Btw, we experienced a crash today from the profiler running under the debugger with a release application... it crashed the whole app. Talk about double dipping and double danger. (OOM in Telemetry..)

juliusfriedman on 14 Jul 2020
