So Thread.Abort / Thread.Suspend are not just deprecated; they're effectively removed from .NET Core (calling them throws PlatformNotSupportedException), which I think is unnecessary.
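For reference, here is roughly what that looks like in practice: the methods still exist on Thread (marked [Obsolete]), but on .NET Core they throw instead of acting on the target thread. This is a minimal sketch assuming a .NET Core 3.x or .NET 5+ target; TryAbort and TrySuspend are hypothetical helper names, not part of any API.

```csharp
using System;
using System.Threading;

// A background thread standing in for misbehaving extension code.
var worker = new Thread(() => Thread.Sleep(Timeout.Infinite)) { IsBackground = true };
worker.Start();

bool abortSupported = TryAbort(worker);
bool suspendSupported = TrySuspend(worker);
Console.WriteLine($"Abort supported: {abortSupported}, Suspend supported: {suspendSupported}");

// Hypothetical helper: returns false when the runtime refuses the call,
// which is what .NET Core does.
static bool TryAbort(Thread t)
{
#pragma warning disable SYSLIB0006 // Thread.Abort is marked obsolete on .NET 5+
    try { t.Abort(); return true; }
    catch (PlatformNotSupportedException) { return false; }
#pragma warning restore SYSLIB0006
}

// Hypothetical helper: same idea for Thread.Suspend.
static bool TrySuspend(Thread t)
{
#pragma warning disable CS0618 // Thread.Suspend is marked [Obsolete]
    try { t.Suspend(); return true; }
    catch (PlatformNotSupportedException) { return false; }
#pragma warning restore CS0618
}
```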
.NET Core is built to power performance-critical online services, and for a service, being "alive" is everything. A lot of effort can go into making the service resilient against unexpected events.
.NET Core is also a very good embedding engine for plug-in-like systems (the service stays native, and extension code runs in .NET Core embedded in the native host). In that setting, it's not always realistic to expect all code running on top of .NET Core to cooperatively check cancellation tokens, nor would that completely eliminate the risk of misbehaving "3rd party" extension code.
Therefore, as a last resort, a service often has to do whatever is in its power to stay live and keep serving requests, even at the risk of partially corrupted and/or partially inconsistent state, for two reasons:
That is, even if Thread.Abort can lead to inconsistent or corrupted state, that shouldn't be the reason to remove it from our toolbox altogether.
We should trust service engineers' ability to weigh the risks against the benefits and make decisions for their particular case, instead of worrying that they will somehow shoot themselves in the foot.
I think the extreme insistence on a "no corrupted state" program (hence the mindset "just crash if state is corrupted") doesn't apply well to today's service-oriented world.
Bottom line: without these two APIs, I have no way to stop threads suspected of looping infinitely from eating up my cores and bringing down my service's capacity, which is a big deal for services.
I think the case you make is very reasonable. It is also my opinion that the risk trade-off for aborts can be worth it. ASP.NET, for example, implements request timeouts with Thread.Abort. But this approach can absolutely destroy the entire app domain (for example, by crashing a static constructor). I usually disable the request timeout entirely for that reason.
I personally have never seen a good case for Thread.Abort, but I'm sure one exists. You seem to be in a place where you sometimes need to abort 3rd party code as a mitigation of last resort. It certainly is a radical stance to remove Thread.Abort from the platform and prevent these good use cases.
It would be incorrect to say that the corruption risk of Thread.Abort is never worth it. Real-world applications sometimes require nasty trade-offs.
I'm personally very happy to see the scourge that is Thread.Abort purged from the platform. So much evil has been committed using it. It's a net benefit to prevent it from being used, even considering that valid use cases are prevented as well.
@GSPP thanks for the detailed and considerate response. I have actually encountered a case where I would like to be able to abort a thread: the thread executes extension code not owned by the service owner, and if it ends up busy-looping, it will eat up my cores, bring down service capacity, or even kill the service.
Restarting the whole service would be too aggressive a move and is subject to false alarms / overreaction, and it probably wouldn't help anyway, since new requests can make the situation recur.
The only viable damage control is to stop further execution of such misbehaving code, but what's already damaged cannot be reversed; i.e. we can stop more cores from being eaten up, but the ones already eaten up (we of course set a maximum limit N) cannot be brought back unless we can suspend or abort the bad thread.
P.S.
[1] I don't think the static constructor situation is always a big concern. Sure, it can bring down the whole appdomain, but first of all, this is only a disaster for managed services, since the whole service may rely on the appdomain to function; for hybrid services (becoming more & more common nowadays), the dying of an appdomain is simply one fault that can be tolerated or even partially recovered from (say, by recreating the appdomain, or recreating the .NET Core CLR). Second, aren't there ways to make sure a certain area of code (e.g. extension code) never uses static constructors, through Roslyn code analysis?
[2] While researching the whole Thread.Abort/Thread.Suspend situation, I found a huge discrepancy between what experts advise and what the industry demands. Almost all experts in the field strongly discourage / frown upon / outright forbid the use of Thread.Abort/Suspend, which I totally agree is a very valid recommendation, for very good reasons; I admit I probably would've done the same if I were giving general advice. However, I also see a whole lot of requests from engineers crying out for a way to abort misbehaving code, and no, cooperative cancellation is often not an option, despite being repeatedly suggested. The situations they face are often similar or identical to the one I described above. And I can only imagine that with the ever-growing pervasiveness of cloud services and hybrid architectures (using .NET Core as intra-process containers akin to V8/Node.js), this demand is only going to get louder. The reason we don't see a lot of yelling is not necessarily that it's unimportant, but human nature: people work with what they've been given, i.e. they work around or tolerate the problem with what's in their toolbox, which is not ideal.
but for hybrid services (becoming more & more common nowadays), the dying of an appdomain is simply one fault that can be tolerated
CoreCLR doesn't have appdomains; it will crash the process.
@benaadams I know. But not if CoreCLR is hosted by a native process.
Could use multi-process and terminate the process instead? Porting to .NET Core - AppDomains
For code isolation, we recommend separate processes or using containers as an alternative.
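A minimal sketch of what that recommendation looks like from the host's side: the work runs in a separate worker process, and the host kills the whole process if it overruns its budget. RunIsolated is a hypothetical helper, and the usage line assumes a POSIX `sleep` executable on PATH as a stand-in worker; Process.Kill(entireProcessTree: true) needs .NET Core 3.0 or later.

```csharp
using System;
using System.Diagnostics;

// Usage (assumes a POSIX `sleep` executable on PATH as a stand-in worker).
Console.WriteLine(RunIsolated("sleep", "0", TimeSpan.FromSeconds(5)));

// Hypothetical helper: run work in a separate worker process and kill the process
// if it overruns its budget. Returns true if the worker exited on its own.
static bool RunIsolated(string fileName, string arguments, TimeSpan budget)
{
    using Process worker = Process.Start(new ProcessStartInfo(fileName, arguments)
    {
        UseShellExecute = false,
    }) ?? throw new InvalidOperationException($"Could not start {fileName}");

    if (worker.WaitForExit((int)budget.TotalMilliseconds))
        return true;

    // Killing a process cannot corrupt the host's in-memory state the way an
    // aborted thread can: the worker's entire heap disappears with it.
    worker.Kill(entireProcessTree: true); // .NET Core 3.0+
    worker.WaitForExit();
    return false;
}
```

The price is the one raised later in this thread: every request and response has to cross a process boundary, which costs latency and engineering effort.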
ASP.NET, for example, implements request timeouts with Thread.Abort. But this approach can absolutely destroy the entire app domain (for example, by crashing a static constructor). I usually disable the request timeout entirely for that reason.
I completely agree that Thread.Abort should not be used as a timeout mechanism, because timeout is a very frequent event, and repeatedly Thread.Aborting is not a pretty scene to imagine.
In my mind, Thread.Abort is _never_ supposed to be used for timeout.
It's instead used for "this-thread-is-definitely-doing-evil-stuff-like-infinitely-looping-or-blocking", which is a whole different category from normal everyday timeouts. Admittedly, it's theoretically impossible to determine whether a thread is really in this kind of state, but in practice it's often possible to do so through domain knowledge specific to one's service and application. A very simple approach that often works is a very relaxed timer: if that timer fires, things are definitely going south. This timer is of course conceptually different from the ordinary timeout timer (it's usually much longer).
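A minimal sketch of that two-tier idea, assuming the work is exposed as a Func<CancellationToken, Task>; RunWithWatchdog and the returned labels are hypothetical. Note that on .NET Core the watchdog can only detect and report a runaway; without Thread.Abort it has no way to stop it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

Console.WriteLine(await RunWithWatchdog(ct => Task.Delay(10, ct),
    timeout: TimeSpan.FromSeconds(5), watchdog: TimeSpan.FromSeconds(30)));

// Hypothetical two-tier policy: the ordinary timeout asks the work to stop via its token;
// the much longer watchdog only decides that things are "definitely going south".
static async Task<string> RunWithWatchdog(
    Func<CancellationToken, Task> work, TimeSpan timeout, TimeSpan watchdog)
{
    var cts = new CancellationTokenSource(timeout);
    // Task.Run so that even a synchronous busy loop does not block the caller.
    Task task = Task.Run(() => work(cts.Token));

    if (await Task.WhenAny(task, Task.Delay(watchdog)) != task)
    {
        // The work ignored its token. All the host can do on .NET Core is report it
        // (e.g. stop routing requests to that plug-in); the thread keeps running,
        // so cts is deliberately left undisposed while the work still holds its token.
        return "runaway";
    }

    using (cts)
    {
        try
        {
            await task;
            return "completed";
        }
        catch (OperationCanceledException)
        {
            return "timed out"; // the ordinary timeout fired and the work honored it
        }
    }
}
```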
Could use multi-process and terminate the process instead? Porting to .NET Core - AppDomains
For code isolation, we recommend separate processes or using containers as an alternative.
Yes, that can work. But it kind of destroys the whole essence of .NET Core's main battleground: extreme performance. Not to mention that it's often a huge amount of dev work to make process-level isolation work nicely, unless the communication protocol between the processes is "trivial", such as a blob of data tossed over the border (through shared memory, of course) and a response blob received once the other process has done its job.
isn't there ways to make sure a certain area of code (e.g. extension code) never uses static constructors through roslyn code analysis?
Even System.String has a static constructor. All kinds of types initialize caches using them. That said, all of these would typically run in the first few moments of an application's execution. It does not seem to be a huge problem in practice, but it has the potential to permanently bring down the appdomain.
The documentation for Thread.Abort is very specific about what the method can and cannot do: "It might terminate the thread". That's not a great API to support in .NET Core.
Unless Windows starts offering something with better guarantees, I fully support the decision NOT to have these mechanisms in .NET Core. There is also the concern that whatever mechanism .NET Core introduces should work across operating systems.
You can start multiple processes and use them to provide reliability. It's not as convenient as handling it internally in a single process, but you can do it without using APIs that are documented "to maybe work".
@mnmr this has little to do with Windows offering something, or with cross-platform support. Calling Abort throws an asynchronous exception (asynchronous because it is not caused by anything the thread is doing itself) into the target thread. That can only happen in managed code (since unmanaged code doesn't support .NET exceptions) and not inside finally blocks (because those have the well-defined behavior of always executing if at all possible), and the thread will only abort if it doesn't cancel the abort by catching the exception and calling ResetAbort. Those are well-defined behaviors. Similarly, one could argue that starting another process is not a great API to support in .NET Core either, because it "only maybe works": you can run out of the system resources needed to start a new process. And calling CancellationTokenSource.Cancel() has even less chance of actually cancelling work, as that relies on the code actually checking the CancellationToken, which it does only irregularly, and it might be stuck somewhere where it doesn't check.
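To make that last point concrete: cancellation is purely cooperative, so Cancel()/CancelAfter() only take effect where the running code polls the token. A minimal sketch; SpinUntilCancelled is a hypothetical stand-in for CPU-bound extension code, and the polling interval is an arbitrary choice.

```csharp
using System;
using System.Threading;

using var cts = new CancellationTokenSource();
cts.CancelAfter(TimeSpan.FromMilliseconds(100));

bool cancelled = false;
try
{
    SpinUntilCancelled(cts.Token);
}
catch (OperationCanceledException)
{
    cancelled = true;
}
Console.WriteLine($"Cancelled: {cancelled}");

// Hypothetical CPU-bound "extension" code. The request to stop is only observed
// at the check below; delete that check and the loop runs forever no matter
// what the caller does with the CancellationTokenSource.
static void SpinUntilCancelled(CancellationToken token)
{
    for (long i = 0; ; i++)
    {
        if ((i & 0xFFFF) == 0)          // poll every 65,536 iterations
            token.ThrowIfCancellationRequested();
    }
}
```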
@mnmr Multi-process is often a costly option that goes against the latency requirements.
And whether an API has guaranteed success or not shouldn't be the reason for banning it outright. As @Suchiman said, they're orthogonal matters. We have a whole lot of APIs that "may or may not" succeed.
I would argue that it's a philosophical question eventually, i.e. whether to trust engineers' ability to make judgments on their own, or babysit them by taking away the sharp objects from the playpen.
In the past, when .NET's major battleground was not extremely performance-sensitive areas, I would say the babysitting approach was fine. But nowadays we see .NET Core extending deep into so many performance-critical areas where not only are riskier APIs required, but the service engineers working there are also capable of making risk-aware choices for themselves.
The bottom line is: I don't really see the downside of including the API, since with the current state of the C# development environment, we can state the danger very clearly (when and how, plus the alternatives, if they apply) in in-code documentation such as the [Obsolete] attribute, and enforce it through warnings, code analysis suggestions, etc. With these on-the-spot educational tips while the code is being written, I really don't understand the excessive worry about this API; after all, its counterparts have been in the Windows API since, like, forever.
I would argue that it's a philosophical question eventually, i.e. whether to trust engineers' ability to make judgments on their own, or babysit them by taking away the sharp objects from the playpen.
This is something particularly observable in .NET Core; prior examples include removing StackTrace because some developers abused it in an ORM to check for transaction nesting, which is really bad but no reason to punish everyone.
@Suchiman yup, and I worry about the philosophy behind this kind of decision-making, particularly because this is .NET Core. It is in and of itself the sharp sword we're handing over to service engineers who would go to great lengths to squeeze out the last drop of performance, and now we're worried about them cutting themselves because we didn't remove some rough edges from the sword's handle. This is paradoxical. Why don't we ban "unsafe" altogether, then?
And trust me, I want .NET Core to be successful as much as anyone. I want to migrate my project over to .NET Core, but without Thread.Abort / Thread.Suspend I can't achieve the same level of robustness / resilience that I can with .NET Framework 4.7, and I don't think downgrading a service's robustness is a very good option (process isolation is not an option either). The case is particularly bad because I can't even fashion my own alternative through P/Invoke to Win32 TerminateThread, since they're totally different animals.
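And I even want a Span<T> that directly holds pointers to the native heap (to reduce interop cost to essentially 0; as of now, to my understanding of Span<T>, if I need to expose a native std::string's guts to managed code, one memory copy is still needed).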
@pongba
I worry about the philosophy behind this kind of decision-making, particularly because this is .NET Core. It is in and of itself the sharp sword we're handing over to service engineers who would go to great lengths to squeeze out the last drop of performance, and now we're worried about them cutting themselves because we didn't remove some rough edges from the sword's handle. This is paradoxical. Why don't we ban "unsafe" altogether, then?
Experienced developers who care a lot about performance are not the only target audience of .NET Core. It's also meant to be used by beginners, and everyone in between the two extremes, so I don't see the paradox. Performance is important, but it's not the only important thing.
And the difference between unsafe and Thread.Abort is that unsafe is hard to use safely, while it's generally considered impossible to use Thread.Abort safely.
And I even want a Span<T> that directly holds pointers to the native heap (to reduce interop cost to essentially 0; as of now, to my understanding of Span<T>, if I need to expose a native std::string's guts to managed code, one memory copy is still needed).
Span<T> has a constructor that takes a pointer and a length, so I don't see why accessing std::string::data through Span<byte> would require a copy. Though I think that discussion is off-topic for this issue.
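A minimal sketch of that zero-copy view, using the Span<T>(void* pointer, int length) constructor. Marshal.AllocHGlobal stands in for a buffer owned by native code (such as the one behind std::string::data()); in a real host the pointer and length would come from the native side. It assumes unsafe code is enabled (<AllowUnsafeBlocks>true</AllowUnsafeBlocks>).

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

const int Length = 5;
IntPtr native = Marshal.AllocHGlobal(Length); // stand-in for a buffer owned by native code
byte firstByte;
try
{
    unsafe
    {
        // Span<byte>(void* pointer, int length) views the native buffer in place; no copy is made.
        var view = new Span<byte>((void*)native, Length);
        Encoding.ASCII.GetBytes("hello", view); // writes go straight into native memory
    }
    firstByte = Marshal.ReadByte(native, 0);   // read back through the raw pointer
}
finally
{
    Marshal.FreeHGlobal(native); // any span over this memory must not be used after this point
}
Console.WriteLine((char)firstByte);
```

The catch is lifetime: nothing stops managed code from touching the span after the native side frees or reallocates the buffer, so the host has to guarantee the buffer outlives every span over it.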
Span<T> has a constructor that takes a pointer and a length, so I don't see why accessing std::string::data through Span<byte> would require a copy. Though I think that discussion is off-topic for this issue.
@svick Thanks a lot, it appears that my readings were off. This is great to know. I was (incorrectly) under the assumption that Span has to hold interior pointers to GC managed objects.
And the difference between unsafe and Thread.Abort is that unsafe is hard to use safely, while it's generally considered impossible to use Thread.Abort safely.
@svick then put a huge warning sign on the API that basically says "you think you get this, but you don't; read the full documentation and you'll see why", stuff like that.
I also noted that you agree that it's not impossible to use correctly (or more precisely, used in the correct scenario). When it's used in the correct scenario it can be very useful, and removing a very useful tool from the toolbox is not good design, especially considering that modern development environments have so many ways to discourage or forbid certain APIs in a user-specific way (it should be very easy to have a project-global suppression of certain dangerous APIs if project leads consider them taboo). This could even be configured in the default C# build targets so that any use of it results in an error (but can be explicitly enabled through a global setting).
And also, I still don't get the static constructor argument. Isn't it possible to design and/or implement the API so that if an abort is requested during a static constructor, the abort simply returns failure, or blocks until the static constructor completes, etc.? (Of course the second option has the potential to deadlock, but hey, deadlocks do occur in everyday programs regardless, right? I kind of prefer immediate failure if the safe-to-abort (safe point) conditions cannot be met, so that the aborter can retry with some kind of strategy.) cc @GSPP @RussKeldorph
@pongba
I also noted that you agree that it's not impossible to use correctly (or more precisely, used in the correct scenario).
I think that either I misunderstand what you're saying or you misunderstood what I said. I do think it is impossible. In your original post, you said:
for a well-designed plug-in system, it can be designed so that any state the plug-in code can access & manipulate is per-request, so corruption to the state doesn't have global effects.
How exactly do you prevent a plug-in from using global state like Reflection?
If you don't prevent it, then there is no such thing as "partially corrupted state", the whole appdomain is corrupted.
If you do prevent it, then you're almost certainly using something like Roslyn or Cecil to analyze the plug-in code. And if you're doing that, couldn't you also ensure that the plug-in either correctly reacts to cooperative cancellation, or even rewrite it to react appropriately? (For inspiration, here is the code used by SharpLab for this purpose.)
@svick I probably misread you, since I thought you meant "generally considered impossible but can be possible in specific cases".
As to static ctors: right, there's no way to prevent implicit usage of static constructors, but
The key problem with thread abort is that it affects the reliability of the whole stack. If you are using thread abort, all code (i.e. all libraries) running in the process has to be robust against being killed by thread abort at any point. It is extremely expensive to audit and write libraries with this constraint.
Static constructors are actually easy to solve. The runtime can tell when the thread is running a static constructor, and suppress the thread abort during that time. It is all the other code that is the problem.
Try to review any code that is doing a more complex managed/unmanaged interop (e.g. sockets in CoreFX) and try to find places where inserting a thread abort exception would cause bad things to happen. I am sure that you are going to find many places where inserting thread abort would lead to hangs, crashes or data corruptions. And there will be a lot more that the tests would discover. People just do not naturally write code that is able to recover from being aborted at any point.
@jkotas, perhaps outlining how BeginCriticalRegion (or something else) differs would be helpful, for comparison's sake, to readers who are not familiar with the technical semantics involved...
Nonetheless, I feel this feature could have been provided with enough care, and it would also have allowed other conversations, e.g. how one would be able to customize it and address compatibility.
Things aside, there's nothing stopping this from being stubbed into most projects via an extension method, and those who can't do so due to complexity or other concerns must have been doing it wrong.
@jkotas I absolutely agree with you that it's virtually impossible to write normal code that is robust against asynchronous exceptions such as ThreadAbort. However, my main point is that this should not be the reason to remove the API from the toolbox, because:
1) It's very dangerous, for sure, and extremely easy to misuse (engineers are prone to mistakenly assuming Thread.Abort is safe and often incorrectly use it as a timeout mechanism, which they shouldn't). But this danger can (or at least should) be easily controlled by project-wide or even repo-wide API restriction policies set by project leads.
2) In circumstances where the services don't need normal code to be resilient (i.e. when the service aborts, it has a very good idea of where the thread is running), the Abort / Suspend API can save precious cores by cancelling busy-yet-doing-nothing threads, thus avoiding disastrous online incidents.
A good example is this: in at least two different circumstances, I have heard of this issue of a race condition in concurrent code causing a managed Dictionary's internal linked list to form a loop, thereby causing infinite loops that eat up a service machine's cores one by one, eventually leading to big service health issues. Note that currently we have no way of auto-mitigating this issue once it happens in an online system, unless Thread.Abort is supported.
So to sum up my argument: 1) the downside can be quite easily controlled on a per-project (repo) basis; 2) the upside is vital and irreplaceable in some important scenarios.
Last but not least, speaking as a 1st-party online service engineer who's trying to put together a service that allows extension code from 3rd parties: the fact that I cannot have a 100% guaranteed way of stopping 3rd-party code keeps me up at night, and I imagine a lot of other engineers share the same sentiment. (Also note that, for comparison, the JavaScript V8 engine offers guaranteed termination through the v8::Isolate::TerminateExecution API, which gives the platform absolute control over extension code in the form of .js.)
All in all, what would you do in such scenario?
P.S. The only reason I can resonate with for removing Abort/Suspend from the platform is that, in order to maintain this API, all future CoreCLR development may need to carefully take into consideration the possibility of an async exception, i.e. ThreadAbortException, which adds burden to development indefinitely.
But even regarding that, I imagine it's not too hard to modify the semantics of Thread.Abort so that if it's invoked at an unsafe point (from the CLR runtime's point of view only, since the API has no domain knowledge to judge whether it's at a safe point from the user's point of view, nor should it care), it simply returns false immediately, signaling a "failure to abort".
In other words, Thread.Abort would only be responsible for guaranteeing the consistent state of the runtime, not the consistent state of user code; the latter should be the user's responsibility and judgment call. This dichotomy is similar to the "Basic Guarantee" and "Strong Guarantee" of C++ exception safety.
I have heard of this issue of a race condition in concurrent code causing a managed Dictionary's internal linked list to form a loop, thereby causing infinite loops that eat up a service machine's cores one by one
This specific issue has been fixed in .NET Core. The Dictionary in .NET Core will throw an exception when it detects corrupted state due to a race condition, instead of ending up in an infinite loop. If there are other bugs that you need Thread.Abort to work around, please do let us know. We would rather fix them.
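On the application side, the root cause of that loop is unsynchronized writes to a Dictionary shared across threads. A minimal sketch of the usual fix, assuming a cache shared by request threads: use ConcurrentDictionary (or a lock around every access to a plain Dictionary). The key and value shapes are arbitrary placeholders.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// A cache shared by many request threads at once.
var cache = new ConcurrentDictionary<int, string>();

Parallel.For(0, 10_000, i =>
{
    // GetOrAdd is safe to call concurrently. A plain Dictionary<TKey, TValue> written
    // like this without a lock is exactly the kind of unsynchronized access that could
    // corrupt its internal chains; it would need a lock around every read and write.
    cache.GetOrAdd(i % 1_000, key => $"value-{key}");
});

Console.WriteLine(cache.Count);
```

One design note: under contention, GetOrAdd may invoke the value factory more than once for the same key, but only one value is stored, so the factory should be cheap and side-effect free.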
JavaScript V8 engine offers guaranteed termination
The JavaScript language and libraries are designed to allow in-process sandboxing - running arbitrary 3rd party code and abort it reliably without killing the process. We have given up on sandboxing with .NET a long time ago.
@jkotas good to know it's fixed in .NET Core. I still wonder if the fix adds perf penalties or not.
And besides that, more importantly, in the long run, data structures designed for single-threaded usage shouldn't have to worry about their internal invariants getting broken because their public methods are called in a non-thread-safe way. If every data structure had to worry about this, it would be a poor separation of responsibilities, I think, since it should be the user code's responsibility to respect the precondition (single-threaded access). Here we have linked lists, and then trees, and then graphs, DAGs... do we need to start detecting loops all of a sudden because they should now provide a "no infinite loop" guarantee? And what about custom data structures that make use of some form of linked list? I don't see this as being very sustainable.
As for why the .NET sandboxing effort was given up, can you share more on that? I'm very interested in knowing why. As far as I'm concerned, a reliable stopping mechanism is about the only piece left in the puzzle (besides language construct & library usage restrictions, StyleCop rules, etc.). Of course there's the stack overflow issue, but we can live with that (there are also mitigations and prevention mechanisms for it, unlike the situations discussed above). And don't get me wrong: I do know that .NET probably can never be 100% sandboxed; there will always be some loopholes due to the power of .NET. But again, a lot of the time we're not trying to prevent Machiavelli, but Murphy, from breaking laws.
why .NET sandboxing effort has been given up
It is hard to tell where it is "safe" to insert the thread aborts. Libraries that need to be robust in presence of Thread.Abort need to be annotated for it, coded in a special way and stress tested.
We have tried to do this in .NET Framework: it was a full-time job for several people to run a stress harness that inserted thread aborts at random points in .NET Framework, and to file and fix bugs on the crashes, hangs and data corruptions that it hit. This was done only for the subset of .NET Framework that was usable in SQLCLR, and still it was a never-ending stream of issues.
Even with this effort, we often got support escalations (from paid support) where people hit problems with Thread.Abort in production. Some of these issues required very ugly hacks to work around, because there was just no right fix for them. We had to resort to hacks like decoding assembly instructions and suppressing or adjusting thread abort behavior for particular instruction patterns that were known to hit the problem.
I am not even talking about larger .NET ecosystem - if you take a random NuGet package from nuget.org, it is almost guaranteed that it has reliability bugs in the presence of thread abort.
good to know it's fixed in .NET Core. I still wonder if the fix adds perf penalties or not.
It doesn't, as it (https://github.com/dotnet/coreclr/pull/16991) piggybacks on the existing hash DoS protection; you will also find that the Dictionary in .NET Core has had many performance improvements made to it and is much faster than in .NET Framework, even with this additional protection.
data structures designed for single-threaded usage shouldn't have to worry about their internal invariants getting broken because their public methods are called in a non-thread-safe way
I agree with this in general. However, if we find that a particular data structure tends to get used incorrectly, it is worth making this incorrect use diagnosable if it is cheap to do so.
why .NET sandboxing effort has been given up
It is hard to tell where it is "safe" to insert the thread aborts. Libraries that need to be robust in presence of Thread.Abort need to be annotated for it, coded in a special way and stress tested.
We have tried to do this in .NET Framework: it was a full-time job for several people to run a stress harness that inserted thread aborts at random points in .NET Framework, and to file and fix bugs on the crashes, hangs and data corruptions that it hit. This was done only for the subset of .NET Framework that was usable in SQLCLR, and still it was a never-ending stream of issues.
@jkotas I really appreciate the detailed explanation; now I see the scope of effort needed to support the API much more clearly. It also appears to me that, as much as I would like API restrictions to be an effective way of limiting misuse, it's probably not going to be so easy in the real world: people will still somehow find ways around them or simply ignore the warning signs.
Core printing API has some serious issues (including outright hangs on PrintVisual/PrintDocument calls); how else should I be able to stop the frozen thread?
Well, very good points all over the place here! here are my 2 cents:
- An insecure, error-prone, "deadly" API is better than no API at all.
Right now I'm looking at a Rancher monitor that shows me pods with ever-growing CPU consumption. When the CPU baseline/idle is around 50%, we kill the pods and let Kubernetes recreate them. "Oh, aborting the thread corrupted the app domain and crashed the app?" Well, it only anticipated the inevitable, and now I don't have to keep 50% of my pod doing garbage work or waiting for answers that will never come.
Kestrel is a very good and performant web server, but its inability to time out a request (whether by aborting a thread or killing a task) makes it a ticking bomb in the hands of lower-tier developers...
@jkotas Just to say that F# and other .NET REPL coding environments had happily been using Thread.Abort for years as a way of supporting Ctrl-C/Interrupt to stop evaluation and continue. It worked reliably enough of the time to make it highly useful.
It's a real shame to see this crucial functionality ripped out without this scenario being analysed. It really doesn't require the mechanism being 100% reliable - it requires ~99.x% reliability, which you'd achieved in .NET Framework.
This also means that it's impossible to have an "Interrupt" button in any .NET execution environment executing arbitrary user code, ever. For example,
.NET Interactive Jupyter notebooks can't support the Jupyter "Interrupt" button to stop user code execution
A .NET teaching environment can't ever have an "Interrupt" button
REPLs can't support Ctrl-C, one of the most basic functions of a REPL.
These scenarios matter for the long term future of .NET.
There's been a bit of a category mistake here I think based on incorrect analysis that reliability = sandboxing = perfection = usefulness. It's just not the case - there are valid scenarios where thread abort just has to work most of the time.
We still have the "good enough" thread abort functionality available for debuggers via debugging APIs. If the other debugger-like environments cannot act as real debuggers, I would not be opposed to exploring ways how to make the thread abort available to them.
We still have the "good enough" thread abort functionality available for debuggers via debugging APIs. If the other debugger-like environments cannot act as real debuggers, I would not be opposed to exploring ways how to make the thread abort available to them.
Ah interesting. cc @eiriktsarpalis @KevinRansom
Could you link me any docs on this? Thanks
@jkotas Just to check, can processes open debug connections to themselves, or do they need an external process to be the debug process?
Also cc @jonsequitur since it's related to Interrupt in .NET Interactive and Jupyter notebook.
Debugger needs to be external process.
Ah I see, so all evaluation must be submitted via debug API. Hmmm... Well, it's something to think about.... though very different to what we are doing today for F# Interactive etc.
It wouldn't be totally impossible to rearchitect the F# REPL to use a supervised process but it would be a large amount of work. And that work would likely be needed again for .NET Interactive and C# REPL.
If it were possible to put on the agenda a System.Runtime.Helpers.UnsafeThreadAbortOnlyForUseByInteractiveExecutionEnvironmentsActingAsTheirOwnSupervisor(), that would be grand....
If the other debugger-like environments cannot act as real debuggers, I would not be opposed to exploring ways how to make the thread abort available to them.
Could we reopen this issue to keep this specific possibility tracked?
The case that started this issue was production scenarios. The case for debugger-like scenarios should be a new issue.
Ah ok. I'll start an issue
Issue is here: https://github.com/dotnet/runtime/issues/41291
I would like to comment on this issue as a client: we (keep it confidential for now) have started using .NET Core for our new software projects. The platform is great; in general, most things can be achieved rather efficiently. But quirks such as this one are quite scary. For example, right now I am working on an application that requires an operation to run in a new thread (a pretty much isolated operation), but after some time the operation should be aborted. Because the operation consists of invoking WCF, REST and HTTP services, many of whose clients are generated, it is practically impossible to apply a timeout to all of them (30 distinct clients), especially because not all possible timeouts are accessible (open request, read request, write, etc.). So I tried to use Abort, which is surprisingly deprecated, and the only remaining option is to simply run the operation in a new task or thread and let it time out via the default settings, hopefully implemented by developers aware of enterprise systems. This would normally not be OK, because the operation would modify data in connected systems, but in this health check application it is acceptable.
But for our other applications this will definitively be a major problem. What options are available to us for resolving this problem (paid support, service ticket, manager contact, ...)?
Thread.Abort for production scenarios is not going to come back to .NET Core. I am sorry.
The best solution is to make the services that you are using to be exposed via cancellable async operations.
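A minimal sketch of the difference between a cancellable async operation and a call that can only be abandoned. CallServiceAsync and LegacyCallAsync are hypothetical stand-ins (Task.Delay simulates the I/O), and Task.WaitAsync requires .NET 6 or later.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical service call exposed as a cancellable async operation:
// the token flows all the way down, so the caller's deadline actually stops the work.
static async Task<string> CallServiceAsync(CancellationToken token)
{
    await Task.Delay(TimeSpan.FromSeconds(30), token); // stands in for I/O that honors the token
    return "response";
}

// Hypothetical legacy call with no CancellationToken parameter.
static async Task<string> LegacyCallAsync()
{
    await Task.Delay(TimeSpan.FromSeconds(30));
    return "response";
}

using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(200));
string cancellable;
try { cancellable = await CallServiceAsync(cts.Token); }
catch (OperationCanceledException) { cancellable = "cancelled"; }   // the work itself stopped

string legacy;
try { legacy = await LegacyCallAsync().WaitAsync(TimeSpan.FromMilliseconds(200)); } // .NET 6+
catch (TimeoutException) { legacy = "gave up waiting"; }           // the call is still running

Console.WriteLine($"{cancellable}, {legacy}");
```

Only the first call actually stops doing work. WaitAsync merely lets the caller stop waiting, so the abandoned call keeps consuming whatever it was consuming, which is the same limitation discussed throughout this thread.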
Thanks for the suggestion.
I must add that the operation in a new thread does not invoke all 30 of the clients; only one client per thread is invoked, in series or in parallel, as configured (that was the intention). So typically this would be a short operation, but if the service is slow, the thread may take longer than the health check threshold. Because I am simply invoking a service client, it is not possible to poll a cancellation token. But it does seem to be possible to configure timeouts for most of our requirements, so perhaps the issue will not be so problematic. Below is a short list in case anyone hits this page searching for timeouts, .NET Core and service clients:
It is possible to set binding timeouts the same way as in .NET Framework for WCF .NET Core clients:
https://docs.microsoft.com/en-us/dotnet/framework/wcf/feature-details/configuring-timeout-values-on-a-binding
Then there are REST clients which typically use HttpClient. It has a Timeout property which may be useful as well: https://docs.microsoft.com/en-us/dotnet/api/system.net.http.httpclient.timeout?view=netcore-3.1
The above represents most of the service clients and the remaining ones will just have to be dealt with one at a time. If anyone hits the same situation, don't forget to tell the manager that patience is a virtue ;)
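For the HttpClient case, a minimal sketch combining the client-wide Timeout property with a tighter per-call budget passed as a CancellationToken; GetWithBudgetAsync is a hypothetical helper name, and the 10-second value is an arbitrary example.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// One shared client; Timeout caps every request sent through it (the default is 100 seconds).
var client = new HttpClient { Timeout = TimeSpan.FromSeconds(10) };
Console.WriteLine(client.Timeout);

// Hypothetical helper: a per-call budget tighter than the client-wide Timeout.
// With the default HttpCompletionOption.ResponseContentRead, GetAsync also buffers
// the body, so the token covers the whole download.
static async Task<string?> GetWithBudgetAsync(HttpClient client, string url, TimeSpan budget)
{
    using var cts = new CancellationTokenSource(budget);
    try
    {
        using HttpResponseMessage response = await client.GetAsync(url, cts.Token);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
    catch (OperationCanceledException)
    {
        // HttpClient surfaces both an elapsed budget and an elapsed client.Timeout
        // as TaskCanceledException, a subclass of OperationCanceledException.
        return null;
    }
}
```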
(e.g. sockets in CoreFX)
It is about almost any other code.
Many developers don't understand their program can be stopped at any moment.