Runtime: Dataflow TransformBlock silently fails if TaskCanceledException is thrown

Created on 21 May 2019  路  12Comments  路  Source: dotnet/runtime

If an operation executing in the context of a TransformBlock throws a TaskCanceledException - specifically that exception - it does NOT fault the dataflow pipeline and silently continues execution.

In my case, I was making some HTTP calls and an overloaded server caused HttpClient to throw that exception.

.NET Core SDK (reflecting any global.json):
Version: 3.0.100-preview3-010431
Commit: d72abce213
System.Threading.Tasks.Dataflow, Version=4.6.4.0

Sample program that will reproduce the issue (it should throw an exception, but instead it just prints 'Done'):

```c#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

namespace ConsoleApp1
{
class Program
{
static async Task Main(string[] args)
{
var producer = new BufferBlock();

        var processor = new TransformBlock<int, int>(async input =>
        {
            throw new TaskCanceledException("Silent failure");
        });

        producer.LinkTo(processor, new DataflowLinkOptions { PropagateCompletion = true });

        var results = new List<int>();

        var consumer = new ActionBlock<int>(result =>
        { 
            results.Add(result);
        });

        processor.LinkTo(consumer, new DataflowLinkOptions { PropagateCompletion = true });

        // add each spec to the pipeline
        foreach (var spec in Enumerable.Range(0, 10))
        {
            producer.Post(spec);
        }

        // done adding to queue
        producer.Complete();

        await Task.WhenAll(producer.Completion, processor.Completion, consumer.Completion).ConfigureAwait(false);

        foreach (var result in results)
        {
            Console.WriteLine(result);
        }

        Console.WriteLine("Done!");

        return 0;
    }


}

}
```

Changing the thrown exception to say InvalidOperationException fixes the problem and an exception is thrown when we await the dataflow completion.

area-System.Threading.Tasks question

Most helpful comment

Makes total sense from a normal async perspective, but in this particular case it was a pretty big surprise that HttpClient deciding to cancel a request due to network issues would result in the silent swallowing of said network error, as well as silently skipping the processing of part of my data batch.

Should HttpClient be throwing TaskCanceledExceptions from within its inner workings, if cancellation is being used as async flow control like this? I did pass it my own cancellation token in this particular case, but my cancellation token was NOT marked as canceled which was strange.

All 12 comments

If an operation executing in the context of a TransformBlock throws a TaskCanceledException - specifically that exception - it does NOT fault the dataflow pipeline and silently continues execution.

This is by design.

It's not just TaskCanceledException, but rather any OperationCanceledException, as that's how async methods work. The lambda you're passing to the block is async. Exceptions that go unhandled within an async method get captured by the async method and stored into the resulting Task, but OperationCanceledException and any derived exception type (like TaskCanceledException) transition the Task to be Canceled rather than Faulted. As with any other canceled Task returned to the TransformBlock, it then ignores the canceled item and continues processing:
https://github.com/dotnet/corefx/blob/940ecee40b64ab93c92f87a7e8909771c013f1a5/src/System.Threading.Tasks.Dataflow/src/Blocks/TransformBlock.cs#L295

Makes total sense from a normal async perspective, but in this particular case it was a pretty big surprise that HttpClient deciding to cancel a request due to network issues would result in the silent swallowing of said network error, as well as silently skipping the processing of part of my data batch.

Should HttpClient be throwing TaskCanceledExceptions from within its inner workings, if cancellation is being used as async flow control like this? I did pass it my own cancellation token in this particular case, but my cancellation token was NOT marked as canceled which was strange.

Is this behavior properly documented? I couldn't find it by looking for mentions of OperationCanceledException in the documentation. (OCE is only mentioned in the context of cancelling the whole block.) If not, should it be? (Though that would be an issue for dotnet/docs, not this repo.)

should HttpClient be throwing TaskCancelledExceptions?

Yes.

OperationCanceledException (or things derived from it) is what's thrown when something is canceled. There's no special meaning to TaskCanceledException, other than it is an OperationCanceledException. HttpClient throwing OperationCanceledException when something is canceled is exactly what it should be doing, and is no different from anything else throwing OperationCanceledException when something is canceled.

This isn't about HttpClient doing something special: it's about async methods treating OperationCanceledException specially. This is all just dominos. Tasks have three final states: RanToCompletion, Faulted, and Canceled. So when async methods were designed, we collectively decided that OperationCanceledExceptions should translate into a Task entering the Canceled rather than Faulted state. If someone just awaits such a Canceled task, the difference isn't observable: the same exception is propagated just as if the Task had been Faulted. But you can manually look at the Task and see its Status as Canceled and choose to behave differently, e.g. Canceled Tasks don't cause TaskScheduler.UnobservedTaskException to be raised.

If you don't like TransformBlock's design of special-casing Canceled Tasks, that's fine; I've zero doubt in my mind that had we made it treat Canceled as if it were Faulted, others would be concerned with that. Regardless, the design is what it is, and I don't see a lot of benefit in trying to change it now. As @kamsar suggested, if you don't want OperationCanceledExceptions to be ignored, you can catch them and throw something else, or otherwise catch and process them however you like.

Is this behavior properly documented?

I'm sure there are opportunities for improvement. PRs to improve the docs in the docs repo are very welcome. Thanks!

Thanks @stephentoub for the description and the thinking around the design. I think what is confusing to @kamsar and myself is: when should a library (e.g. HttpClient) throw an OperationCanceledException given that consumers (e.g. DataFlow) treat those exceptions differently.

My intuition would be that a callee should only throw a canceled exception iff I mark a cancelation token I provided it as cancelled. That, is a task ought not cancel itself.

What are your thoughts for best practices for library authors like myself?

What are your thoughts for best practices for library authors like myself?

Throw OperationCanceledException (or a derived type) to represent cancellation.

What a consumer chooses to then do with that exception is a separate concern.

If you start instead throwing XyzException to represent cancellation, a consuming library that cares to special-case cancellation either a) will misinterpret your exception and break, or b) will see "hmm, Xyz now also means cancellation, I better special-case that as well", in which case you haven't achieved anything other than introduce yet another exception to mean the same thing.

I'm not suggesting coming up with my own cancellation types 馃榿 I'm trying to understand what is the guidance for what events/circumstances should "represent cancellation".

Take a hypothetical API:
'''
Task '''
Which of these, if any, should be considered "cancellation":

  1. The source behind the token is cancelled.
  2. The path is on the network and the remote server loses power.
  3. The path is on the network and there is a network time out.
  4. The path is on the network and the local admin kills the network session.
  5. The read is to a local disk and the local disk falls catastrophically.
  6. The read is to a local disk, but the local admin dismounts the volume.
  7. A filter driver blocks the action because the file is malware.
  8. A filter driver blocks the action because the path is to an application that is not approved for use on this locked-down machine.
  9. ?
  1. Yes
  2. No
  3. It depends what you mean by timeout, e.g. if the timeout was specified by the user as a way to initiate cancellation after some period of time.
  4. No
  5. No
  6. No
  7. No
  8. No

That makes total sense @stephentoub, and it's how I understood cancellation. I looked into the HttpClient source a bit and this is what I think specifically happened to me:

  • I overloaded the server with parallel requests, which caused the client to timeout (I was not specifying a timeout, but the default is 100 seconds)
  • Internally the timeout is handled by creating a CancellationTokenSource _inside_ the HTTP client, which can cause cancellation independently from any CancellationToken that is passed into the HttpClient request - API-initiated cancellation, instead of user-initiated cancellation.
  • When the timeout throws its exception, that caused the I/O exception that I saw via IntelliTrace when I was debugging my issues. That exception is caught here and explicitly rethrown as a cancellation

If my analysis is correct:

  • Per 3 above, a user-specified timeout can be a cancellation. In this case there is a _default_ timeout in HttpClient that can cause cancellation. Which makes sense from a "don't infinitely hang your app" perspective, but it's a bit of a footgun here.
  • Because of this design, it is specifically important when combining Dataflow with HttpClient - or anything else using timed cancellation tokens - to catch cancellations from your HTTP requests and rethrow them as some other exception type if the dataflow should abort on task timeout.

It does seem to me that there's value in adding the option to dataflow to treat cancellation as an abort, instead of an ignore signal. But there is a workaround, anyway.

Which makes sense from a "don't infinitely hang your app" perspective, but it's a bit of a footgun here

Yup. https://github.com/dotnet/corefx/issues/20296 is tracking concerns around that design and whether there's anything we can do to improve it.

Makes sense to me, the underlying issue is more with HttpClient than Dataflow. I think we've explored this issue sufficiently to close it :)

Was this page helpful?
0 / 5 - 0 ratings

Related issues

noahfalk picture noahfalk  路  3Comments

btecu picture btecu  路  3Comments

Timovzl picture Timovzl  路  3Comments

EgorBo picture EgorBo  路  3Comments

jzabroski picture jzabroski  路  3Comments