Runtime: BackgroundService handles Exceptions getting thrown differently when tasks are not faulted vs faulted

Created on 29 Nov 2018  ·  20Comments  ·  Source: dotnet/runtime

I think how exceptions are handled with BackgroundService class is a bit confusing and may require some discussion.

  • If an exception is thrown during startup (e.g. in Main) and before ExecuteAsync is handled, the app will crash.
  • If an exception is thrown during ExecuteAsync _but before_ the first await is called, the app will crash.
  • If an exception is thrown during ExecuteAsync _but during or after_ the first await is called, the app will ... semi-hang? It stop processing ... but ... Ctrl-C does successfully start the app gracefully terminating.

Now - I _hope_ i have the above steps/scenario's right -- I might have fudged something (async/await/ task contexts, etc are my weak points).

So the problem here is about the inconsistency about what an exception should be doing in a BackgroundService. Should it crash the host (assuming no top level try/catch in Main)? or get swallowed? Maybe the issue is really about documentation (as the code is totally fine).

This issue arose when I started a conversation in the "ASP.NET slack" group when I was trying to handle an exception getting thrown in side the ExecuteAsync method and bubbling that upwards.

To quote some of the summaries of this convo:

@davidfowl [7:06 PM]
great discussion
heh
it could suppress both and if you wanted to get exceptions, then you override StartAsync and put logic there

--

@chrisvanderpennen [7:13 PM]
theres imo three scenarios that need covering:
i don't want any exceptions in this to stop my host
i want an exception during init to stop the host startup, but runtime exceptions are ok
i want a runtime exception to stop my host

@davidfowl [7:30 PM]
somebody file a bug

so CC'ing @poke @gulbanana @chrisvanderpennen and @davidfowl who were in the convo.

  • I think the issue subject might need to be refactored.
  • happy to edit the main body of this issue, where i got my facts wrong.
  • This issue needs the famous tag 'fowler' (fowler has issues). :)

edit: this was a sample gist I tried to make to also test/repo this problem.

area-Extensions-Hosting enhancement

Most helpful comment

Let me try to paint the worst case scenario for each "mode".
Scenario 1. You have a non-important Service and the default is to crash on unhandled errors
This brings your application down. You go look at the logs, and you see that it's your NotImportantService that brought the house down. You curse under your breath, and wrap it in a dummy catch or a Polly retry or whatever. Worst case is you get some downtime once.

Scenario 2. You have an important service, and the default is to just let it die
Your service runs some stuff every hour. At some point data stops coming through. You look at the server - it's running fine (and you don't know that a crashing BackgroundService doesn't crash the server). You look at the logs. Finding anything is hard (impossible without this new PR), because the service could have crashed a long time ago. You mutter under your breath, and figure out how to properly close the application via IApplicationLifetime or similar.
I'm not exaggerating when I've seen people spend weeks re-writing stuff because they were unable to debug a silently crashing BackgroundService, and thought the process was gettign OOMKilled or similar.

All I'm saying is that the failure mode for Scenario 2 is much worse, which is why I think that crashing should be the default. Admittedly that calculation was more heavily skewed back when we didn't even have any logs.

But perhaps the right solution is for there to be no default - but forcing the user to explicitly specify "Should this backgroundservice crash the server on an uncaught error" - because I agree that there are both types of services.

All 20 comments

imo the bug is that BackgroundService deliberately defers exceptions which are stashed in the Task, but can still be brought down by a Task-returning function which throws before or instead of awaiting. it’d be more consistent if it also used try-catch to intercept and store exceptions from that first section of the state machine.

the “semi-hang” might just be the intended behaviour of ConsoleLifetime though.

here’s one way that the exceptions could be given a consistent treatment, using a handler function which relies on AsyncTaskMethodBuilder to queue continuations onto the threadpool if they do occur:

https://github.com/gulbanana/local-async-example/blob/master/Program.cs

@davidfowl what's the work in 3.0 here?

cc @Tratcher

A few targeted changes to BackgroundService.

  • We Task.Run the call to ExecuteAsync because user code cannot be trusted
  • Exceptions thrown in the background service are observed in StopAsync but never in StartAsync.

@davidfowl are you going to do this?

Yes, I have a branch, I've just been lazy finishing it. I'll do it next week.

Curious if anyone has implemented a resilient pattern in their BackgroundService so that it will continue running even if an exception is thrown during ExecutionAsync

Sadness -> this is now in Backlog and will not ship with v3?

Correct. Unfortunately there was higher priority work that we had to bump this out for.

I would have expected worker services to have a built-in notion of global exception handling, but I guess not (yet)?

Anyway, here's what I do for now:

protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
    while (!stoppingToken.IsCancellationRequested)
    {
        try
        {
            _logger.LogInformation("Worker running at: {time}", DateTimeOffset.Now);

            // actual stuff happens here
        }
        catch (Exception ex)
        {
            // waiting on https://github.com/aspnet/Extensions/issues/813
            _Logger.LogError(ex, "Global exception occurred. Will resume in a moment.");
        }

        await Task.Delay(1000, stoppingToken);
    }
}

I couldn't figure out the best area label to add to this issue. Please help me learn by adding exactly one area label.

PR https://github.com/dotnet/runtime/pull/42981, submitted for 6.0, changes this scenario a bit.

  • If an exception is thrown during startup (e.g. in Main) and before ExecuteAsync is handled, the app will crash.

This behavior has not changed.

  • If an exception is thrown during ExecuteAsync _but before_ the first await is called, the app will crash.

This behavior has not changed.

But for the below scenario:

If an exception is thrown during ExecuteAsync but during or after the first await is called, the app will ... semi-hang? It stop processing ... but ... Ctrl-C does successfully start the app gracefully terminating.

(directly related to: https://github.com/dotnet/runtime/issues/36017)
when this happens we log the exception

summary

The main discussion around this issue is around the fact that if we throw an exception before await vs after an await the app behaves differently.

Thanks team for not forgetting or closing this issue.

@maryamariyan that's great progress, but I think you should consider not just logging the exception but actually shutting down the application, to mirror the synchronous behaviour.

That doesn't really make sense as it can fail at an arbitrary point in time. If you want the thing to fail then override start async and have it fail there

Why doesn't that make sense? I'd certainly expect if I have a background service running that fails at some point and I haven't caught the error, that it would bring the application down.

anurse was very adamant that a BackgroundService throwing an exception shouldn't bring down the application. See the thread here:

https://github.com/dotnet/runtime/issues/36017#issuecomment-562710058

In my opinion, this issue is to make these two scenarios consistent:

  • If an exception is thrown during ExecuteAsync but before the first await is called
  • If an exception is thrown during ExecuteAsync but during or after the first await is called

Since the 2nd case shouldn't bring down the application, neither should the first case.

He does seem to be rather adamant about that. I personally think that's the wrong decision (particularly for worker services you can end up in a situation where the entire process just does.. nothing and you don't know unless you have alerts on your logs) - but hey ¯\_(ツ)_/¯

Right, I think it's fine to suggest an option on HostOptions that would fail if any background service task throws but it wouldn't be the default.

Let me try to paint the worst case scenario for each "mode".
Scenario 1. You have a non-important Service and the default is to crash on unhandled errors
This brings your application down. You go look at the logs, and you see that it's your NotImportantService that brought the house down. You curse under your breath, and wrap it in a dummy catch or a Polly retry or whatever. Worst case is you get some downtime once.

Scenario 2. You have an important service, and the default is to just let it die
Your service runs some stuff every hour. At some point data stops coming through. You look at the server - it's running fine (and you don't know that a crashing BackgroundService doesn't crash the server). You look at the logs. Finding anything is hard (impossible without this new PR), because the service could have crashed a long time ago. You mutter under your breath, and figure out how to properly close the application via IApplicationLifetime or similar.
I'm not exaggerating when I've seen people spend weeks re-writing stuff because they were unable to debug a silently crashing BackgroundService, and thought the process was gettign OOMKilled or similar.

All I'm saying is that the failure mode for Scenario 2 is much worse, which is why I think that crashing should be the default. Admittedly that calculation was more heavily skewed back when we didn't even have any logs.

But perhaps the right solution is for there to be no default - but forcing the user to explicitly specify "Should this backgroundservice crash the server on an uncaught error" - because I agree that there are both types of services.

Was this page helpful?
0 / 5 - 0 ratings