Roslyn: Proposal: language support for async sequences

Created on 5 Feb 2015 · 279 Comments · Source: dotnet/roslyn

Both C# and VB have support for iterator methods and for async methods, but no support for a method that is both an iterator and async. Its return type would presumably be something like IObservable<T> or IAsyncEnumerable<T> (which would be like IEnumerable but with a MoveNext returning Task<bool>).

This issue is a placeholder for a feature that needs design work.

2 - Ready Area-Language Design Feature Request Language-C# Language-VB

gafter

👍36

Most helpful comment

interface IAsyncEnumerator<T>

Changing topics to another part of the proposal...

The main approach we've all been assuming for the interface is:

``` C#
public interface IAsyncEnumerator<T>
{
    Task<bool> MoveNextAsync();
    T Current { get; }
}
```

With that, a loop like:

``` C#
foreach (await T item in enumerable)
{
    …
}
```

would compile down to something like:

``` C#
var e = enumerable.GetEnumerator();
while (await e.MoveNextAsync())
{
    T item = e.Current;
    …
}
```

I think we should consider (at least discuss) an alternative.

The existing `IEnumerator<T>` has a well-known design concern: it requires two interface calls per element, one for MoveNext and one for Current.  There are various ways to address that for an `IAsyncEnumerator<T>`, one of which would be along the lines of this alternative:

``` C#
public interface IAsyncEnumerator<T>
{
    Task<bool> WaitForNextAsync();
    bool TryGetNext(out T item);
}
```

which with the aforementioned foreach loop would compile down to something like:

``` C#
var e = enumerable.GetEnumerator();
while (await e.WaitForNextAsync())
{
    while (e.TryGetNext(out T item))
    {
        …
    }
}
```

The idea is that WaitForNextAsync would return a synchronously completed task if data was already available (or if it knew there would never be more data) and otherwise would perform whatever operation was necessary to bring down the next piece of data, but it wouldn’t actually take that data from the enumerator; that would be done by TryGetNext, which would return data available if there is any, otherwise returning false.

One obvious advantage of this is that it addresses the two-interface-calls-per-iteration issue. Worst case, there are two interface calls per iteration, if each call to WaitForNextAsync only makes a single item available for TryGetNext, but best case, there’s only one interface call for each element.
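To make the shape of this alternative concrete, here is a minimal sketch under stated assumptions. The names `IBatchedAsyncEnumerator<T>`, `PagedSource<T>` and `BatchedDemo` are hypothetical and not part of any proposal or library; `Task.Yield()` merely stands in for real I/O. It assumes C# 7 or later on .NET Core.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical name, chosen so it cannot clash with any real IAsyncEnumerator<T>.
public interface IBatchedAsyncEnumerator<T>
{
    Task<bool> WaitForNextAsync();   // true: data may be available; false: no more data, ever
    bool TryGetNext(out T item);     // synchronous; false when nothing is buffered right now
}

// Illustrative source that "fetches" one page of items at a time.
public sealed class PagedSource<T> : IBatchedAsyncEnumerator<T>
{
    private readonly Queue<T[]> _pages;
    private readonly Queue<T> _buffer = new Queue<T>();

    public PagedSource(IEnumerable<T[]> pages) => _pages = new Queue<T[]>(pages);

    public async Task<bool> WaitForNextAsync()
    {
        // If items are already buffered, no await runs and the task completes synchronously.
        while (_buffer.Count == 0)
        {
            if (_pages.Count == 0) return false;   // nothing buffered and nothing left to fetch
            await Task.Yield();                    // stand-in for the real asynchronous fetch
            foreach (T item in _pages.Dequeue()) _buffer.Enqueue(item);
        }
        return true;
    }

    // Hands out buffered items without any asynchronous work.
    public bool TryGetNext(out T item) => _buffer.TryDequeue(out item);
}

public static class BatchedDemo
{
    // The nested loop from the lowering above: one awaited call per page,
    // then synchronous calls for each item inside that page.
    public static async Task<(int Sum, int Waits)> SumAsync(IBatchedAsyncEnumerator<int> e)
    {
        int sum = 0, waits = 0;
        while (await e.WaitForNextAsync())
        {
            waits++;
            while (e.TryGetNext(out int item)) sum += item;
        }
        return (sum, waits);
    }
}
```

With pages `{1, 2, 3}`, `{4, 5}` and `{6}`, `SumAsync` awaits once per page rather than once per element.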

However, the `IEnumerator<T>`-style two-call design has another downside: it’s not possible to create a thread-safe implementation.  Without locking external to the interface implementation, multiple consumers can’t consume the same enumerator, as the MoveNextAsync/Current pair of calls can’t be made atomic.  That’s not the case with the alternate model.  You can implement a thread-safe WaitForNextAsync/TryGetNext pair, as TryGetNext itself can use whatever synchronization it needs internally to both get the next item and tell you whether it was successful.  Worst case, if the caller loses a race with another consumer, TryGetNext can return false and you can loop back around to try again.

Now, today an `IEnumerator<T>` isn’t thread-safe, so no one can use it that way, but I think there are more use cases for supporting thread-safety in the async world.  These async enumerators are likely to be used in some cases for producer/consumer models, and once you get there, it’s not far before you want to allow multiple consumers off of the same data stream, parallelized processing of results, etc.  It would be possible to enable such multi-consumer scenarios off of an enumerable, where the enumerable coordinates safety over a single underlying stream, handing out single-consumer enumerators that all coordinate with each other.  But that violates the notion of how we’ve been talking about enumerables, as something where another call to GetEnumerator essentially restarts the operation.

There's also another use case for this model. Imagine you wanted a construct like:

``` C#
IAsyncEnumerator<Student> students = …;
IAsyncEnumerator<Teacher> teachers = …;
select
{
    case students as Student s:
        HandleStudent(s);
        break;
    case teachers as Teacher t:
        HandleTeacher(t);
        break;
}
```

The idea here is that we have two async iterators, and we take one item from one of them and process it, whichever arrives first, kind of like a WhenAny. This can’t be done correctly on top of the MoveNextAsync/Current pattern: you would have to call MoveNextAsync on both iterators, at which point you’ve already moved the other one forward (and it may still be in flight when you exit the select). In contrast, because the act of saying “try to ensure more data is available synchronously” is separate from the act of saying “get me synchronously available data”, this can be done with the alternate model (though, full disclosure, it would likely require the WaitForNextAsync implementation to be thread-safe to be used in this manner, or at least tolerate another call to WaitForNextAsync while a previous one is still in flight).

(These reasons are part of why the channels prototype I did on corefxlab essentially has these two methods as the main part of the readable channel interface:
https://github.com/dotnet/corefxlab/blob/master/src/System.Threading.Tasks.Channels/System/Threading/Tasks/Channels/IChannel.cs#L32-L45)
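The "wait, then try to take" split can be tried out with `System.Threading.Channels`, whose `ChannelReader<T>` exposes `WaitToReadAsync` and `TryRead`. The following is only a sketch of the select idea, assuming .NET Core 3.0 or later; `SelectDemo` and the string payloads are illustrative, and ties are resolved in favour of the first reader purely to keep the example short.

```csharp
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class SelectDemo
{
    // Handles one item at a time from whichever reader has data,
    // without ever consuming an item from the reader that was not chosen.
    public static async Task<List<string>> DrainBothAsync(
        ChannelReader<string> students, ChannelReader<string> teachers)
    {
        var log = new List<string>();

        // One outstanding "is data available?" request per source.
        Task<bool> studentWait = students.WaitToReadAsync().AsTask();
        Task<bool> teacherWait = teachers.WaitToReadAsync().AsTask();
        bool studentsOpen = true, teachersOpen = true;

        while (studentsOpen || teachersOpen)
        {
            // Waiting does not take any data, so the other wait can simply stay in flight.
            if (studentsOpen && teachersOpen) await Task.WhenAny(studentWait, teacherWait);
            else await (studentsOpen ? studentWait : teacherWait);

            if (studentsOpen && studentWait.IsCompleted)
            {
                if (!await studentWait) { studentsOpen = false; continue; }   // source finished
                if (students.TryRead(out var s)) log.Add("student: " + s);
                studentWait = students.WaitToReadAsync().AsTask();
            }
            else if (teachersOpen && teacherWait.IsCompleted)
            {
                if (!await teacherWait) { teachersOpen = false; continue; }   // source finished
                if (teachers.TryRead(out var t)) log.Add("teacher: " + t);
                teacherWait = teachers.WaitToReadAsync().AsTask();
            }
        }
        return log;
    }
}
```

The completed wait tasks are checked with `IsCompleted` rather than compared by reference, because two already-completed waits are not guaranteed to be distinct task objects.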

Anyway, food for thought.

stephentoub on 8 Sep 2016

👍11 ❤1

All 279 comments

I personally would love to see support for both IObservable<T> and IAsyncEnumerable<T>. The former interface at least already exists in the BCL and there is fantastic support for it in Rx. The latter interface has already been defined in the sister-project Ix (and under System.Collections.Generic no less) so would this language feature involve taking a dependency on that project or duplicating that interface in the BCL?

Ultimately switching between the two would be pretty easy (and already supported in Rx/Ix), but from a sequence producer point of view they would behave a little differently in that yield return for IObservable<T> would likely continue executing immediately whereas for IAsyncEnumerable<T> it would wait until the next invocation of MoveNext().

Also, if considering support for IObservable<T> you might want to consider requiring that the generator method accept a CancellationToken which would indicate when the subscriber has unsubscribed.

From the consumer point of view they should probably behave the same way. Observable.ForEach allows the action to execute concurrently and I think that it would probably be pretty unintuitive to allow the foreach body to have multiple concurrent threads (assuming that they're not being dispatched through a SynchronizationContext). If the implementation is similar to how await works whatever intermediary (SequenceAwaiter, etc.) could handle the details of buffering the results from an IObservable<T> or an extension method could just turn it into an IAsyncEnumerable<T>.

HaloFour on 5 Feb 2015

@HaloFour Observable.Create already provides an optimal implementation of this that language extensions wouldn't add any value to.

IAsyncEnumerable, however, has no optimal way to generate a sequence other than implementing the interface manually. It's fairly easy to make something that emulates yield return but it is super inefficient so this is badly needed.

scalablecory on 5 Feb 2015

I don't disagree. Rx is awesome like that. I advocate for it mostly to bring Rx closer to the BCL so that people are more aware that it exists, and also because those core interfaces are at least a part of the BCL whereas IAsyncEnumerable<T> is brand new to the BCL (and duplicates Ix).

HaloFour on 5 Feb 2015

👍2

I'm not familiar with Ix, so I can't comment on any existing IAsyncEnumerable, but I would rather the team start fresh when thinking about async enumerables rather than try to build off IObservable. Rx was an interesting project, but it was designed mostly before async existed and then later tried to bolt the two concepts together with varying success. Present-day Rx also has a very cluttered API surface area with poor documentation all around.

async/await enables code that looks almost identical to synchronous code. I'd like to be able to work with asynchronous sequences as effortlessly as you can work with IEnumerable today. I've definitely wanted to mix yield return and async/await before so this is a feature that would be very helpful.

MgSam on 5 Feb 2015

👍4

Indeed, there is a lot of duplication between the two because they were developed independently and Rx never had the resources that BCL/PFX had. I also don't think that Rx/Ix could be merged into the BCL as is.

The Ix IAsyncEnumerable<T> interface is exactly as described here, basically identical to IEnumerable<T> except that MoveNext() returns Task<bool>. As mentioned the big difference between something like IObservable<T> and IAsyncEnumerable<T> is that the latter is still a pull-model as the generator really couldn't continue until the consumer called MoveNext() again. In my opinion this would make it less suitable for certain concurrent processing scenarios since the producer code isn't running between each iteration. An IObservable<T> async iterator could continue executing immediately after yielding a value.

In my opinion supporting both would be worthwhile. The compiler could generate different state machines depending on the return type of the async iterator.

HaloFour on 5 Feb 2015

I've been wishing for this feature ever since C# 5 came out. Being able to write something like yield return await FooAsync() would be very useful; currently when I have an async method that returns a collection, I just return a Task<IReadOnlyCollection<T>>, because implementing laziness has too much overhead.

I noticed that Roslyn already has an IAsyncEnumerable<T> interface here. That's pretty much the design I had in mind, although I had forgotten about cancellation. To make it really useful, we would also need an async version of foreach (including a way to pass a CancellationToken to MoveNextAsync).
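For readers coming to this thread later: the `yield return await` shape described above can be written with the async-iterator syntax that shipped in C# 8.0, which postdates this comment. The sketch below assumes C# 8 and .NET Core 3.0 or later; `FetchPageAsync` is a hypothetical stand-in for a real remote call.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LazyPages
{
    // Hypothetical stand-in for a remote call that returns one page of results.
    private static async Task<int[]> FetchPageAsync(int page)
    {
        await Task.Delay(10);
        return new[] { page * 10, page * 10 + 1 };
    }

    // Lazy: a page is only fetched once the consumer has asked for more items.
    public static async IAsyncEnumerable<int> GetItemsAsync(int pageCount)
    {
        for (int page = 0; page < pageCount; page++)
        {
            foreach (int item in await FetchPageAsync(page))
                yield return item;
        }
    }

    // The async counterpart of foreach.
    public static async Task<List<int>> CollectAsync(int pageCount)
    {
        var items = new List<int>();
        await foreach (int item in GetItemsAsync(pageCount))
            items.Add(item);
        return items;
    }
}
```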

thomaslevesque on 7 Feb 2015

@thomaslevesque, the Roslyn link is 404.

paulomorgado on 9 Mar 2015

> @thomaslevesque, the Roslyn link is 404.

Uh... looks like it's no longer there. A search for IAsyncEnumerable returns nothing (the name only appears in a comment). Perhaps it was moved and renamed to something else, or it was just removed.

thomaslevesque on 9 Mar 2015

Entity Framework uses the IAsyncEnumerable pattern to enable async database queries. In EF6 we had our own version of the interface, but in EF7 we have taken a dependency on IX-Async.

anpete on 21 Apr 2015

@anpete Seems to me that if async streams depend specifically on a new BCL IAsyncEnumerable<T> interface, then not only will it not be a very usable feature until more projects move to the newer frameworks, but there will also be a lot of confusion between the different-yet-identical interfaces that already exist.

Perhaps the compiler could support the different interfaces by convention, or have an easy way to unify them through a common utility extension method. But if, for whatever reason, they need to be converted back to their proper specific interface that would still pose problems.

I believe quite strongly that not at least trying to integrate the BCL and the Roslyn languages with Rx/Ix is a massive wasted opportunity.

HaloFour on 21 Apr 2015

Just to provide some additional background, this can already be done in F# (because F# "computation expressions", which is a mechanism behind both iterators and asyncs is flexible enough). So, the C# design might learn something useful from the F# approach to this. See:

Programming with F# asynchronous sequences - a short overview article
The F# Computation Expression Zoo - paper that discusses the syntax and underlying computation expressions from more theoretical perspective
F# Async: FSharp.Control.AsyncSeq - library with F# async sequences

Probably the most interesting consideration here is what is the programming model:

F# async sequences are pull-based. You ask for a next item and then a callback is called (async) when the value is available. Then you ask for a next item, until the end.
Rx is push-based. You ask it to start, and then it keeps throwing values at you.

You can convert between the two, but going from Rx to AsyncSeq is tricky (you can either drop values when the caller is not accepting them, or cache values and produce them later).

The thing that makes AsyncSeq nicer from sequential programming perspective (i.e. when you write statement-based method) is that it works well with things like for loops. Consider:

    asyncSeq {
      for x in someAsyncSeqSource do
        do! Async.Sleep(1000)
        processValue x }

Here, we wait 1 second before consuming the next value from someAsyncSeqSource. This works nicely with the pull-mode (we just ask for the next value after 1 second waiting), but it would be really odd to do this based on Rx (are you going to start the loop body multiple times in parallel? or cache? or drop values?)

So, I think that if C# gets something like asynchronous sequences (mixing iterators and await), the pull-based design that is used by F# asyncSeq is a lot more sensible. Rx works much better when you use it through LINQ-style queries.

EDIT: (partly in reply to @HaloFour's comment below) - I think that it makes sense to support the async iterator syntax for IAsyncEnumerable<T> (*), but not for IObservable<T>, because you would end up with very odd behavior of foreach containing await on Task<T>.

(*) As a side-note, I find IAsyncEnumerable<T> quite odd because it lets you call MoveNext repeatedly without waiting for the completion of the first - this is probably never desirable (and AsyncSeq<T> in F# does not make that possible).

tpetricek on 22 Apr 2015

@tpetricek The difference in behavior between IAsyncEnumerable<T> and IObservable<T> is exactly why I think async iterators should support both, it gives the programmer the capacity to decide whether it's a push- or pull-model and abstracts the difference to the consumer. I think a lot of scenarios benefit from a push-model, such as launching a bunch of operations simultaneously and wanting to process the results as they are available.

Beyond that hopefully both interfaces will enjoy support of all of the common LINQ operators plus those operators that apply to asynchronous streams.

HaloFour on 22 Apr 2015

@tpetricek - The FSharp.Control.AsyncSeq documentation has been clarified to use the terminology "asynchronous pull", rather than just "pull", i.e. a pull operation that returns asynchronously, Async<T>. I'll leave it to others to debate what exactly the difference is between an "asynchronous pull" and a "synchronous push" :)

dsyme on 23 Apr 2015

It would be nice if reading from an async sequence had constant stack usage, and if simple associative operations like concatenation had decent performance whether left- or right-associated. E.g., reading from the IEnumerables constructed by the following functions

        static IEnumerable<int> LeftAssocEnum(int i)
        {
            var acc = Enumerable.Empty<int>();
            while (i > 0)
            {
                acc = Enumerable.Concat(acc, new int[] { i });
                i--;
            }
            return acc;
        }

        static IEnumerable<int> RightAssocEnum(int i)
        {
            var acc = Enumerable.Empty<int>();
            while (i > 0)
            {
                acc = Enumerable.Concat(new int[] { i }, acc);
                i--;
            }
            return acc;
        }

causes a StackOverflowException for sufficiently large i, and both IEnumerables have quadratic complexity.

radekm on 25 Apr 2015

@radekm For your kind of usage (sequence is materialized, size is known in advance) you can already use List<int>.

    static IEnumerable<int> LeftAssocEnum(int i)
    {
        var acc = new List<int>(i);
        while (i > 0)
        {
            acc.Add(i);
            i--;
        }
        return acc;
    }

Does your request mean that all possible implementations of IEnumerable<T> (including immutable and lazy ones) should behave like List<T>?

vladd on 25 Apr 2015

@vladd It was only a simple example, you can take for instance Fib()

        static IEnumerable<BigInteger> Fib()
        {
            return Fib(BigInteger.Zero, BigInteger.One);
        }

        static IEnumerable<BigInteger> Fib(BigInteger a, BigInteger b)
        {
            yield return a;
            foreach (var x in Fib(b, a + b))
            {
                yield return x;
            }
        }

which has the same problems. What I want is to compose complex asynchronous sequences from very simple and reusable parts. To do this the operators like concatenation must be efficient. Since I don't know how to do this in C# I'll give a few examples in Scala with scalaz-stream.

1) Recursion can be used to define streams:

    def fib(a: BigInt = 0, b: BigInt = 1): Process[Nothing, BigInt] =
        emit(a) ++ fib(b, a + b)

There is no risk of stack overflow and reading the first n items takes O(n) not O(n^2) (assuming that a + b is computed in constant time which is not true).
Note: fib(b, a + b) is passed by name so the above code terminates.

2) Even transformations of streams are easily composable:

    process1.take[Int](5).filter(_ > 0) ++ process1.id

This applies the filter only to the first 5 integers of the stream. You can use it with operator |>

    Process(1, 2, -3, -4, -5, -6, -7) |> (process1.take[Int](5).filter(_ > 0) ++ process1.id)

and it gives you 1, 2, -6, -7.
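For comparison on the C# side: the recursive `Fib` iterator above nests one enumerator per element, so every item is re-yielded through all enclosing levels. For this particular sequence a single loop avoids the nesting. This is only a sketch (the class name `Sequences` is illustrative) and it does not address the broader point about composing sequences from reusable parts.

```csharp
using System.Collections.Generic;
using System.Numerics;

public static class Sequences
{
    // One iterator, constant stack depth: each element is produced by a single MoveNext call.
    public static IEnumerable<BigInteger> Fib()
    {
        BigInteger a = BigInteger.Zero, b = BigInteger.One;
        while (true)
        {
            yield return a;
            (a, b) = (b, a + b);   // advance the pair (F(n), F(n+1))
        }
    }
}
```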

radekm on 25 Apr 2015

I think that it would be wise to have language parity with F# for supporting async pull sequences (e.g. IAsyncEnumerable<T>, AsyncSeq<'T>).
@tpetricek and @dsyme make very valid points here and the links are excellent and well worth reading as there appears to be confusion between when it is appropriate to use async pull vs IObservable<T>.

That leads me on to making some comments about Rx and why I _don't_ think it needs any language support (right now).

IObservable<T> is in the BCL. Fine, so people know about it.
Being a library, it can have a faster release cadence than the language. This has been particularly positive for the adoption and improvement of the library. As we speak Rx 3.0 is in development, and it may have breaking changes. Let's not mix libraries with languages. You can also see this now happening at the framework level.
Yup, we need better docs and education. I tried my best to do my part with IntroToRx.com.

@thomaslevesque says "I've been wishing for this feature ever since C# 5 came out.".
It seems to me that his example is a great candidate for Rx (async, lazy and support for cancellation).

@HaloFour "Observable.ForEach" _shudder_.
Please don't use this method.
It needs to be removed.
It has no cancellation support, nor does it have any error handling/OnError.

LeeCampbell on 5 May 2015

@LeeCampbell I'd largely be happy if the C# team did the same thing they did with await and provided a pattern that could be used to describe anything as an asynchronous stream. Then Rx could easily support that pattern, probably through an extension method that would describe the correct back-pressure behavior.

I think that there is a massive amount of information for Rx out there, but if nobody knows to look it might as well not exist. I think that it needs the same kind of campaign from MS that LINQ and asynchrony received. _Some_ kind of inclusion in the languages pushes that point. I've been doing a lot of Java dev lately and it annoys me how much excitement there seems to be building around Rx that I don't see on the .NET side.

HaloFour on 5 May 2015

I am interested to see how you would see this work. I think the way you work with an AsyncEnum and the way you work with an IObservable sequence are quite different. The former you poll and pull from until complete and then you move on to the next statement.

    IAsyncEnumerable<int> sequence = CreateAsynEnumSeq();
    Output.WriteLine("Awaiting");
    await sequence.ForEachAsync(Output.WriteLine);
    Output.WriteLine("Done");

The latter you set up a subscription providing callbacks and then move on immediately. The callbacks for an Observable sequence are called at some future point in time.

    IObservable<int> sequence = CreateObservableSeq();
    Output.WriteLine("Subscribing");
    sequence.Subscribe(Output.WriteLine, ()=>Output.WriteLine("Done"));
    Output.WriteLine("Only Subscribed to, but not necessarily done.");

With this in mind, they are (to me at least) totally different things, so I am not sure why or how language support would help here. I would like to see a sample of your ideas. I can see some usefulness for language support of AsyncEnum sequences, again, at least to get language parity with F#.

LeeCampbell on 7 May 2015

@LeeCampbell

To give you an idea, I already currently have an extension method for IObservable<T> called GetAsyncEnumerator which returns my own IAsyncEnumerator<T> implementation:

    public IObservable<int> Range(int start, int count, int delay) {
        // The element type is given explicitly; it cannot be inferred from the untyped lambda parameter.
        return Observable.Create<int>(async observer => {
            for (int i = 0; i < count; i++) {
                await Task.Delay(delay);
                observer.OnNext(i + start);
            }
        });
    }

    public async Task TestRx() {
        Random random = new Random();
        IObservable<int> observable = Range(0, 20, 1000);
        using (IAsyncEnumerator<int> enumerator = observable.GetAsyncEnumerator()) {
            while (await enumerator.MoveNextAsync()) {
                Console.WriteLine(enumerator.Current);
                await Task.Delay(random.Next(0, 2000));
            }
        }
    }

There are several overloads to GetAsyncEnumerator depending on if/how you want to buffer the observed values. By default it creates an unbounded ConcurrentQueue into which the observed values are collected and MoveNextAsync polls for the next value available in that queue. The other three options are to use a bounded queue, to have IObservable<T>.OnNext block until there is a corresponding call to MoveNextAsync or to have IObservable<T>.OnNext not block but have MoveNextAsync return the latest available value, if there is one. There are also overloads that accept a CancellationToken, of course, and IAsyncEnumerator<T>.Dispose unsubscribes the observer.
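A rough sketch of this kind of push-to-pull bridge, not the implementation described above: an `IObserver<T>` writes into a `System.Threading.Channels` channel, and the consumer pulls from it. `BufferingObserver<T>`, `RangeObservable` and `BridgeDemo` are hypothetical names; the observable here pushes synchronously on `Subscribe` just to keep the example free of an Rx dependency. It assumes C# 8 and .NET Core 3.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Observer that parks pushed values in a channel so they can be pulled later.
public sealed class BufferingObserver<T> : IObserver<T>
{
    private readonly Channel<T> _channel;
    public BufferingObserver(Channel<T> channel) => _channel = channel;

    // What happens when the buffer is full depends on the channel's options (see below).
    public void OnNext(T value) => _channel.Writer.TryWrite(value);
    public void OnError(Exception error) => _channel.Writer.TryComplete(error);
    public void OnCompleted() => _channel.Writer.TryComplete();

    public IAsyncEnumerable<T> ReadAllAsync() => _channel.Reader.ReadAllAsync();
}

// Minimal push source: emits every value during Subscribe, then completes.
public sealed class RangeObservable : IObservable<int>
{
    private readonly int _start, _count;
    public RangeObservable(int start, int count) { _start = start; _count = count; }

    public IDisposable Subscribe(IObserver<int> observer)
    {
        for (int i = 0; i < _count; i++) observer.OnNext(_start + i);
        observer.OnCompleted();
        return new NoopDisposable();
    }

    private sealed class NoopDisposable : IDisposable { public void Dispose() { } }
}

public static class BridgeDemo
{
    // The buffering policy is chosen by the caller through the channel it passes in.
    public static async Task<List<int>> PullAsync(IObservable<int> source, Channel<int> buffer)
    {
        var observer = new BufferingObserver<int>(buffer);
        using (source.Subscribe(observer))
        {
            var seen = new List<int>();
            await foreach (int value in observer.ReadAllAsync()) seen.Add(value);
            return seen;
        }
    }
}
```

Passing `Channel.CreateUnbounded<int>()` keeps every pushed value, while a bounded channel created with `FullMode = BoundedChannelFullMode.DropOldest` keeps only the most recent ones. Those two choices correspond loosely to the "unbounded queue" and "latest value" options listed above.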

I hope that kind of answers your question. It's early and I didn't get much sleep last night. Basically, I am treating the IObservable<T> as an IAsyncEnumerable<T> and bridging between the two isn't all that difficult. The big difference is that the observable can continue to emit values and not have to wait for someone to poll.

HaloFour on 7 May 2015

Guys who are interested in IObservable support -- can you describe the benefit integrating this into the language would bring?

scalablecory on 7 May 2015

@scalablecory

1. The interface is already in the BCL and has been since .NET 4.0.
2. Rx already provides an amazing degree of support for working with IObservable<T> and marries the existing language concepts of asynchrony with LINQ beautifully.
3. I think that "push" model asynchronous enumeration is very useful for a lot of scenarios, specifically when you need to dispatch a series of asynchronous requests at once and then process them as the results become available. This is currently pretty obnoxious to handle with async methods alone.
4. Volta needs MS love.

To Devil's Advocate my own arguments:

1. Probably moot as we'd likely need a streaming analog to GetAwaiter so we're stuck waiting on a BCL change anyway.
2. Someone's bound to write all of the same LINQ methods to work against IAsyncEnumerable<T>, despite being a pretty massive duplication of effort. Rx already has, it would be silly to do it _again_.
3. I'm sure that IAsyncEnumerable<T> can wrap a "push" notification source. I'm already doing it.
4. Java clearly loves Volta more. :wink:

Now, given the _probability_ of Devil's Advocate point 1, some streaming analog to GetAwaiter, support for IObservable<T> from the consuming side could be accomplished by extension methods within Rx, and I'd be perfectly happy with that.

Now, for my arguments from the generating side, I'd like to revisit my use case of dispatching a bunch of asynchronous operations. This is something that the current project I work on does incredibly frequently, basically _n+1_ operations against web services where the first response provides a bunch of IDs that then need to be resolved individually*. If async streams return IAsyncEnumerable<T> where the coroutine isn't continued until the consumer asks for the next value then you don't really have the facility to perform the operations in parallel.

    public async IAsyncEnumerable<User> QueryUsers(int organizationId, CancellationToken ct) {
        Organization organization = await ws.GetOrganization(organizationId, ct);
        foreach (int userId in organization.UserIds) {
            User user = await ws.GetUser(userId);
            yield return user; // can't continue until consumer calls IAsyncEnumerator.MoveNext
        }
    }

Granted, there could be BCL methods to make this a little easier, but it feels like something that can be supported out of the box:

    public async IObservable<User> QueryUsers(int organizationId, CancellationToken ct) {
        Organization organization = await ws.GetOrganization(organizationId, ct);
        foreach (int userId in organization.UserIds) {
            User user = await ws.GetUser(userId);
            yield return user; // Effectively the same as calling IObserver.OnNext(user)
        }
    }

HaloFour on 7 May 2015

@HaloFour Just like you can currently decide whether to process IEnumerable<T> in series (foreach) or in parallel (Parallel.ForEach()), there could be a similar distinction for IAsyncEnumerable<T>; you don't need IObservable<T> for that.

The problem I have with IObservable<T> is that it's pretty much impossible to process it in series without either blocking the producer or using some kind of buffer.
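One way to get concurrent requests while staying in the pull model is to start all operations up front and yield results as they complete. The sketch below uses the async-iterator syntax from C# 8.0 (which postdates this discussion); `GetUserAsync` is a hypothetical stand-in for the per-ID web service call, and the delays are arbitrary.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelPull
{
    // Hypothetical stand-in for a web service call that resolves a single ID.
    private static async Task<string> GetUserAsync(int userId)
    {
        await Task.Delay(userId * 50);
        return "user" + userId;
    }

    public static async IAsyncEnumerable<string> QueryUsersAsync(IEnumerable<int> userIds)
    {
        // All requests are started here, on the consumer's first MoveNextAsync call,
        // and then run concurrently.
        List<Task<string>> pending = userIds.Select(id => GetUserAsync(id)).ToList();

        // Results are handed out as they finish; the consumer still pulls one at a time.
        while (pending.Count > 0)
        {
            Task<string> finished = await Task.WhenAny(pending);
            pending.Remove(finished);
            yield return await finished;
        }
    }
}
```

The requests that are still in flight keep running while the consumer processes an item, so a slow consumer delays delivery but not the work itself. Calling `Task.WhenAny` in a loop costs quadratic time in the number of tasks, which is fine for a sketch but worth replacing for large batches.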

svick on 7 May 2015

@HaloFour Let me rephrase my question.

Putting aside the "push" vs "pull" or "Rx" vs "IAsyncEnumerable" debate, for this proposal to gain weight, it needs to show solid benefits for _language integration_. These benefits have not yet been shown for the Rx side of things.

My two cents is that language integration wouldn't provide a significant benefit over Observable.Create. Its sole purpose would be to provide feature parity with "yield return", which I don't think is a good reason. Bringing popularity to Rx is also not a good reason.

IAsyncEnumerable, on the other hand, has no "Create" method. You can make one, but it's horribly inefficient. It is however possible to implement efficiently in straight IL, so there'd be a huge benefit to language integration and having the compiler generate the complex state machine around it.

If the argument devolves into simply that one of push or pull is better than the other, and thus we should forget about the inferior one and not bother integrating with it, I think that's pretty short-sighted. Both models do things that the other is simply unable to _efficiently_ accomplish.

(Also, Ix-Async already implements IAsyncEnumerable with all that good LINQ integration if you want to check it out)

scalablecory on 7 May 2015

Well put @scalablecory.

LeeCampbell on 7 May 2015

@scalablecory

Sorry if it comes across that I'm having a debate, or that it's an either/or proposition. I don't think that "push" is better than "pull" or vice versa, only that there are use cases for both and that it would be nice to support multiple forms of "streams" within async methods. I think that achieving feature parity with F# would also be a very good thing.

> My two cents is that language integration wouldn't provide a significant benefit over Observable.Create.

You're right. I think that it would be _nice_ to have yield support for IObservable<T>, but Observable.Create is already very easy to use.

> IAsyncEnumerable, on the other hand, has no "Create" method.

Sure it does, Ix-Async offers AsyncEnumerable.Create as well as a few other helper methods. Plus you can already convert between IObservable<T> and IAsyncEnumerable<T>. Of course if we're talking about a _third_ Microsoft-provided IAsyncEnumerable<T> then no, nobody has written those methods yet.

What I'm really interested in hearing is any preliminary ideas regarding how these streams would be consumed. What a hypothetical foreach would look like with async streams. If it's based on a fairly-loosely defined pattern like GetAwaiter then support for IAsyncEnumerable<T> and IObservable<T> should both be quite easy to provide without the compiler having to know about either or any of those interfaces. To me, that's the ideal solution.

HaloFour on 7 May 2015

@HaloFour I believe you're mistaking EnumerableEx.Create for AsyncEnumerable.Create. There is nothing in Ix that provides "yield return" semantics for async sequence creation.

scalablecory on 7 May 2015

@scalablecory You're right, I am thinking of EnumerableEx. Doesn't seem like too far of a stretch to get an AsyncEnumerable.Create to work with pretty much the same syntax, though:

    IAsyncEnumerable<int> ae = AsyncEnumerable.Create(async yield => {
        for (int i = 0; i < 10; i++) {
            await Task.Delay(1000);
            await yield.Return(i);
        }
    });

HaloFour on 7 May 2015

I created an implementation of Async Pull Sequences a while back, with LINQ support.

The syntax for an AsyncEnumerable was quite similar to the proposed one:

    AsyncEnumerable.Create<T>(async producer =>
    {
        await producer.Yield(value).ConfigureAwait(false);
    });

BernhardGlueck on 16 May 2015

As a proof of concept (and also to get some insight into Roslyn), I started implementing async iterators here. On this branch, the compiler is able to compile this file (called AsyncIterators.cs)

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;

    namespace AsyncIterators
    {
        public class EntryPoint
        {
            public static void Main(string[] args)
            {
                var enumerator = new EntryPoint()
                    .AsyncIterator()
                    .GetEnumerator();

                while (enumerator.MoveNext().Result)
                {
                    Console.WriteLine(enumerator.Current);
                }

                Console.ReadLine();
            }

            public IAsyncEnumerable<int> AsyncIterator()
            {
                for (var i = 0; i < int.MaxValue; i++)
                {
                    await Task.Delay(500);
                    yield return i;
                }
            }
        }
    }

Call the Roslyn compiler like this:

    csc "Your\Path\To\AsyncIterators.cs" /reference:"Your\Path\To\System.Interactive.Async.dll"

Get System.Interactive.Async.dll through the corresponding Nuget package.

The compiled .exe should output an incrementing number twice a second (as seen in the code).

The modification was pretty straightforward, there are two rewritings involved: First, the iterator is rewritten into a state machine (just like for synchronous iterators), but instead of IEnumerable and IEnumerator, IAsyncEnumerable and IAsyncEnumerator are implemented. The implementation of the async MoveNext method (that returns a Task<bool>) is subsequently rewritten into a state machine (just like for async methods). The resulting IL will therefore have two nested state machines.
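To illustrate the two-step rewriting, here is a hand-written approximation of what the first step might produce for a loop like the one in `AsyncIterator`. It is not the compiler's actual output; the class name, the state encoding and the `limit` parameter are made up for the example, and the interface plumbing is omitted.

```csharp
using System.Threading.Tasks;

// Hand-written stand-in for the iterator state machine of:
//   for (var i = 0; i < limit; i++) { await Task.Delay(1); yield return i; }
public sealed class CountingAsyncEnumerator
{
    private readonly int _limit;
    private int _state;   // 0 = not started, 1 = suspended after a yield, -1 = finished
    private int _i;       // the loop variable, hoisted into a field

    public CountingAsyncEnumerator(int limit) => _limit = limit;

    public int Current { get; private set; }

    // This is still an ordinary async method. The second rewriting step
    // would turn its body into the usual async state machine.
    public async Task<bool> MoveNext()
    {
        switch (_state)
        {
            case 0: _i = 0; break;     // first call: run the loop initializer
            case 1: _i++; break;       // resume after "yield return": run the increment
            default: return false;     // already finished
        }

        if (_i >= _limit) { _state = -1; return false; }

        await Task.Delay(1);           // the await from the iterator body
        Current = _i;                  // "yield return i"
        _state = 1;
        return true;
    }
}
```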

I also tested enclosing the body of AsyncIterator in using/try-finally blocks; this also works.

Note that this is essentially only a proof of concept: In this commit, I point out the first language design issue that occurred to me. Also, it is unclear how code in async iterators should access the CancellationToken that is passed to MoveNext.

Let me hear what you think.

danielcweber on 27 May 2015

@danielcweber Great job!

scalablecory on 27 May 2015

@danielcweber

Awesome job. I like how you reused Ix's IAsyncEnumerable. I wonder if Roslyn would eventually also support that interface as well as a BCL-provided one.

Also, it is unclear how code in async iterators should access the CancellationToken that is passed to MoveNext.

I had been wondering that myself. One potential solution might be to have the iterator method accept (require?) a CancellationToken argument and the generated state machine would effectively "merge" the two tokens together into one per every invocation of MoveNext? That might be too much voodoo and it doesn't help if the iterator doesn't accept a CancellationToken argument.

HaloFour on 27 May 2015

I'd almost like a syntax sort of like this:

IAsyncEnumerable<int> GetSequence() with token
{
    if(token.IsCancellationRequested)

But that may be too extreme of a change...

scalablecory on 27 May 2015

@HaloFour what's the BCL-provided interface for async iterators?
@scalablecory I also thought about having a variable (eg. named "ct") implicitly in scope, like "this" is also implicitly in scope. Of course, you may not have a parameter with that name but I guess that would be tolerable.

danielcweber on 27 May 2015

@danielcweber

There isn't one. I am under the assumption that the feature would depend upon such an interface being added to the BCL as to not require a project to take a hard dependency on Ix-Async. Even if the feature is convention-driven and would work with Ix-Async I would expect an identical-looking interface to be added to the BCL.

HaloFour on 27 May 2015

I believe the only truly systematic and consistent approach is to have the MoveNext operation on the AsyncIterator accept the Cancellation token.

That is, the natural and systematic translation to make any method M asynchronous is as follows: "any operation M generating result R translates to an async method M taking a cancellation token CT and giving result Task<R>".

Altering the scope and flow of cancellation tokens to be different to this tends to be like fiddling with assembly code and registers - things that look sensible come back to bite you later. It's possible passing the token to GetEnumerator will work but my guess is it will have problems in some cases.

Note that F# async avoids this problem by hiding cancellation tokens (they are implicitly threaded through the asynchronous computation structure - you only supply a cancellation token when starting an overall composite Async). In F# code you generally don't have to pass cancellation tokens explicitly at all. That simplification was dropped in the C# version of the feature. Anyway, we should get Tomas Petricek to add this to his summary of differences between the two models.

Cheers
Don

dsyme on 27 May 2015

I believe the only truly systematic and consistent approach is to have the MoveNext operation on the AsyncIterator accept the Cancellation token.

Of course, and that is to be expected. The question is how to expose that token in the iterator.

That is, the natural and systematic translation to make any method M asynchronous is as follows: "any operation M generating result R translates to an async method M taking a cancellation token CT and giving result Task<R>".

But these sequences themselves also represent an asynchronous operation which can potentially be cancelled as a whole, such as a single HTTP call which is returning a stream of data asynchronously which is parsed into a sequence of elements. You'd need a reliable mechanism to convey cancellation to that process which may need to be separate from cancelling the current iteration. This gets even hairier if you want to distinguish cancelling the attempt to move to the next element from cancelling the sequence.

Altering the scope and flow of cancellation tokens to be different to this tends to be like fiddling with assembly code and registers

Isn't that describing the coroutine shenanigans done with iterator and asynchronous state machines in general? :smile:

I think that ultimately there are probably two options. Provide a keyword that will access the current cancellation token, or add the concept of a current thread-local cancellation token to the BCL.

HaloFour on 28 May 2015

@HaloFour, when would you need to cancel MoveNext without canceling the whole iteration and vice versa?

paulomorgado on 28 May 2015

@paulomorgado Probably not. At best canceling the MoveNext operation would leave the sequence in an indeterminate state. But I do think that we'll want a simple cancellation mechanism that unifies the three that exist in the async sequence pattern:

CancellationToken passed to the async iterator function.
CancellationToken passed to the IAsyncEnumerator.MoveNext method.
The IDisposable interface implemented by IAsyncEnumerator.

I think that having the MoveNext method accept a CancellationToken makes things really hairy. There is no existing syntax that would allow for the iterator to accept that token. If that iterator is based on multiple operations already in flight even if you could obtain that token it couldn't really be used to affect those operations. Any existing foreach syntax lacks the notion of passing an argument to the MoveNext function. All of these problems would need to be addressed and I fear the syntax that would arise as the answer.

I'm thinking that maybe we keep it simple and that cancellation via CancellationToken is optional and only available if the async iterator explicitly accepts one as an argument. When that token is cancelled the entire sequence is then cancelled.

HaloFour on 28 May 2015

@HaloFour: By "the async iterator explicitly accepts one as an argument" you mean the method that returns IAsyncEnumerable<T> and contains yield return statements having a parameter of type CancellationToken? Or do you mean that the GetEnumerator method of IAsyncEnumerable should take a CancellationToken ?

danielcweber on 28 May 2015

We have to take into account that there are two sides to an enumerable: the producer and the consumer.

Although most developers will be consumers, ease of production will benefit all. So, language support for creating asynchronous enumerators will have its own value.

We also have to take into account that some types might be both synchronous and asynchronous enumerables. I don't think the existing foreach keyword can be reused. At least, not alone.

Maybe the best compromise will be something like this:

foreach async(var item in collection, cancellationToken)
{
    ...
}

But it gets a lot trickier if we want to use LINQ operators. Should the pattern be augmented to pass around a cancellation token?

Or should we see how observables/qbservables fit in all this?

paulomorgado on 28 May 2015

@danielcweber The method that uses yield return, etc. That's likely the only method the consumer will be calling directly and passing arbitrary arguments.

@paulomorgado Rescanning the thread, it doesn't appear that any modifications to foreach to consume async sequences have been discussed yet. I do expect that some syntax changes would be required to make that happen and maybe there is room to fit in a CancellationToken as you describe. But you'd also need syntax in the producer to accept that syntax. That could involve syntax changes to yield return although that could only provide a value for the second iteration and onward.

Observables do this very differently. Once you subscribe you don't ask for the next values, they just come to you. Cancellation can be implied by unsubscribing, which is done by disposing of the subscription. Rx provides a helper class CancellationDisposable which can trigger a CancellationToken when disposed which can allow the producer to react to being unsubscribed.

HaloFour on 28 May 2015

@HaloFour The sequence is cold at the point where you call the generator method. This is not an appropriate place to pass a cancellation token.

GetEnumerator() could work, but it's not exactly in line with existing practice. Right now, just about everywhere in .NET, you pass in a token to the method that does the actual work, not to some encompassing factory instance.

MoveNext() is the most in line with existing practice -- both in the Ix implementation, as well as with streaming sources like DbDataReader and Entity Framework.

scalablecory on 28 May 2015

@scalablecory I know, it's just the only place where the consumer would normally explicitly pass any arguments and the only place where iterator methods can accept arguments. Anything beyond that is going to require some fun syntax candy for both the consumer and the producer.

The consumer is probably relatively simple. We'll probably end up with something similar to what @paulomorgado suggested.

For the producer, the only thing that really makes sense to me is a new context sensitive keyword or expression that would allow access to the CancellationToken parameter of MoveNext. However, where I grapple a little with this is how that might behave if the iterator method also accepted a CancellationToken parameter, or if the iterator method is a cold enumeration over a hot sequence.

HaloFour on 28 May 2015

As async streams are now being brainstormed in #5383, I took the occasion and rebased my proof of concept of async streams found here. It still works for this simple example (and probably for more complex ones, too).

danielcweber on 16 Oct 2015

Interesting idea.

weitzhandler on 21 Oct 2015

(I'm assuming discussion on async streams properly belongs here rather than on the language design meeting notes.)

#5383 suggests that cancellation tokens may well be handled in the same way as await configuration, but it's not clear to me how feasible that is, at least if we want yield return support. I can see how an extension method can easily make an `IAsyncEnumerable<T>` which just doesn't flow the context, for any arbitrary `IAsyncEnumerable<T>`... but the code in the iterator block would need to get at the desired cancellation token.

It feels to me like the GetEnumerator method should be passed the cancellation token - because I _would_ expect a single token while iterating over the whole sequence.

One extra note: should IAsyncEnumerator<T> implement IAsyncDisposable as well? There was a brief mention in the language notes, but it didn't crop up in the example and I haven't seen it mentioned here. This may be quite complicated when it comes to cancellation, as you may need _two_ cancellation tokens - if an iteration operation times out, you still want to get a shot at disposing of the sequence, on the other hand you might want an "overall" timeout. Here be dragons, I suspect.

jskeet on 12 Nov 2015

Wouldn't cancellation tokens simply behave as they always have?

async IAsyncEnumerable<int> SlowNumbersAsync(int from, int to, CancellationToken token)
{
    for (var i = from; i <= to && !token.IsCancellationRequested; i++)
    {
        await Task.Delay(100, token);
        yield return i;
    }
}

foreach await (var item in SlowNumbersAsync(100, 200, token))

A system for configuring arbitrary tokens would be great but I don't think it's necessarily specific to foreach await.

jcdickinson on 12 Nov 2015

Well, the question is whether the IAsyncEnumerable itself knows the cancellation token, or whether it's _each time you iterate_ that knows the cancellation token. Would it make sense to have an IAsyncEnumerable<string> which (lazily) fetched stock tickers from a web service, and which could be reused multiple times, with a different cancellation token each time you iterate over it? Maybe, maybe not. The fact that there are three steps involved (creating the sequence, creating the iterator, and then calling MoveNext, multiple times) leads to lots of choices...

jskeet on 12 Nov 2015

Looking at it a few different ways here:

If we go the GetEnumerator() route, it breaks convention -- you're no longer passing the token to the method doing the work. But, it also presents the most efficient operation in that you won't have to create any proxy cancellation tokens.

If we go the MoveNext() route, it keeps convention, but will either be confusing to use or inefficient to implement. Consider my previous suggestion:

IAsyncEnumerable<int> GetSequence() with token
{
    if(token.IsCancellationRequested)

Here, the GetEnumerator() route will ensure token never changes, as one would normally expect. A naive MoveNext(CancellationToken) sugar will change the meaning of token after every yield return -- efficient, but clearly confusing. A more anchored implementation will need to wrap it in a proxy to ensure token doesn't change:

CancellationTokenSource proxy;
MoveNext(CancellationToken token)
{
    using(token.Register(()=>proxy.Cancel()))
    {
        // user's code.
    }
}

Which is clearly not very efficient if we consider this overhead for every item.
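For the "merge the tokens" idea, the BCL already has `CancellationTokenSource.CreateLinkedTokenSource`. Below is a minimal sketch of what linking on every call might look like; `LinkedTokenDemo` and its methods are hypothetical names, and `Task.Delay` merely stands in for the user's iterator code. It also makes the per-call overhead described above visible: one linked source is created and disposed per `MoveNext`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LinkedTokenDemo
{
    // Hypothetical stand-in for a generated MoveNext: "sequenceToken" is the token
    // given to the iterator method, "moveNextToken" the one given to this call.
    public static async Task<bool> MoveNextAsync(
        CancellationToken sequenceToken, CancellationToken moveNextToken)
    {
        // One linked source is allocated (and disposed) per call.
        using (var linked = CancellationTokenSource.CreateLinkedTokenSource(
            sequenceToken, moveNextToken))
        {
            try
            {
                // Stands in for the user's code; waits until cancelled.
                await Task.Delay(Timeout.Infinite, linked.Token);
                return true;
            }
            catch (OperationCanceledException)
            {
                return false; // cancelled through either token
            }
        }
    }

    // Returns true if both a per-call and a per-sequence cancellation were observed.
    public static async Task<bool> CancelledByEitherTokenAsync()
    {
        using (var sequenceCts = new CancellationTokenSource())
        using (var moveNextCts = new CancellationTokenSource())
        {
            var first = MoveNextAsync(sequenceCts.Token, moveNextCts.Token);
            moveNextCts.Cancel();               // cancel only this call
            var firstResult = await first;

            var second = MoveNextAsync(sequenceCts.Token, CancellationToken.None);
            sequenceCts.Cancel();               // cancel the whole sequence
            var secondResult = await second;

            return !firstResult && !secondResult;
        }
    }
}
```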

scalablecory on 12 Nov 2015

Then there are the syntax issues with potentially dealing with tokens at those three places.

From the iterator side you'd obviously need a way to get at those tokens. I don't really see a great way to accomplish this without some wacky syntax or voodoo.

Probably the most voodoo-y route would be to have the generated iterator class automatically link the three potential CancellationTokens into a single token replacing the value of the token passed as a parameter to the iterator method (by overwriting the field in the enumerator during MoveNextAsync). I don't know what overhead that might entail, perhaps it would require decorating a CancellationToken parameter with an attribute.

From the consumer side I think the least messy option if you wanted to affect cancellation of an existing IAsyncEnumerable<T> (you're not calling the iterator method directly) would be a ConfigureCancellation method wrapping that instance of IAsyncEnumerable<T> passing the specified CancellationToken to GetEnumerator and then wrapping that IAsyncEnumerator<T> also passing the token to MoveNextAsync. That would remain compatible with a foreach await syntax. If the consumer wanted more control than that they could always just call GetEnumerator and/or MoveNextAsync themselves and pass arbitrary tokens to either.
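A minimal sketch of such a wrapper, assuming interface shapes that accept a token in both places. Every name here (`ICancellableAsyncEnumerable<T>`, `ICancellableAsyncEnumerator<T>`, `ConfigureCancellation`, `TokenProbe`) is hypothetical and only illustrates the wrapping; none of them is a BCL or Ix API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical interface shapes that accept a token in both places.
public interface ICancellableAsyncEnumerable<T>
{
    ICancellableAsyncEnumerator<T> GetEnumerator(CancellationToken token);
}

public interface ICancellableAsyncEnumerator<T> : IDisposable
{
    Task<bool> MoveNextAsync(CancellationToken token);
    T Current { get; }
}

public static class CancellationExtensions
{
    // Hypothetical helper: wraps "source" so that "token" is what reaches
    // GetEnumerator and every MoveNextAsync call.
    public static ICancellableAsyncEnumerable<T> ConfigureCancellation<T>(
        this ICancellableAsyncEnumerable<T> source, CancellationToken token)
    {
        return new ConfiguredEnumerable<T>(source, token);
    }

    private sealed class ConfiguredEnumerable<T> : ICancellableAsyncEnumerable<T>
    {
        private readonly ICancellableAsyncEnumerable<T> _source;
        private readonly CancellationToken _token;

        public ConfiguredEnumerable(ICancellableAsyncEnumerable<T> source, CancellationToken token)
        {
            _source = source;
            _token = token;
        }

        // The argument is ignored on purpose: a foreach-style consumer would
        // not supply one, so the configured token is substituted.
        public ICancellableAsyncEnumerator<T> GetEnumerator(CancellationToken ignored)
        {
            return new ConfiguredEnumerator<T>(_source.GetEnumerator(_token), _token);
        }
    }

    private sealed class ConfiguredEnumerator<T> : ICancellableAsyncEnumerator<T>
    {
        private readonly ICancellableAsyncEnumerator<T> _inner;
        private readonly CancellationToken _token;

        public ConfiguredEnumerator(ICancellableAsyncEnumerator<T> inner, CancellationToken token)
        {
            _inner = inner;
            _token = token;
        }

        public T Current { get { return _inner.Current; } }
        public Task<bool> MoveNextAsync(CancellationToken ignored) { return _inner.MoveNextAsync(_token); }
        public void Dispose() { _inner.Dispose(); }
    }
}

// Tiny producer used only to observe which token MoveNextAsync receives.
public sealed class TokenProbe : ICancellableAsyncEnumerable<int>, ICancellableAsyncEnumerator<int>
{
    public ICancellableAsyncEnumerator<int> GetEnumerator(CancellationToken token) { return this; }
    public int Current { get; private set; }

    public Task<bool> MoveNextAsync(CancellationToken token)
    {
        if (token.IsCancellationRequested) return Task.FromResult(false);
        Current++;
        return Task.FromResult(true);
    }

    public void Dispose() { }
}
```

With this shape, a consumer that passes no meaningful token (as a foreach-style loop would) still ends up with the configured token at the producer, while a consumer wanting finer control can skip the wrapper and call the interface methods directly.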

HaloFour on 12 Nov 2015

foreach only works with enumerables and must call GetEnumerator() on it to get the enumerator.

But what if this was legal?

foreach(var i in Range(0,10).GetEnumerator()) WriteLine(i);

Then we could have this for non-cancellable async enumerators:

foreach(async var i in AsyncRange(0,10)) WriteLine(i);

And this for cancellable async enumerators:

foreach(async var i in AsyncRange(0,10).GetAsyncEnumerator(ct)) WriteLine(i);

For a cancellable MoveNextAsync one would have to write the whole code. But I expect these cases to be less frequent than having a unique cancellation token for the enumerator.

And nothing prevents the enumerable itself from being cancellable:

AsyncRange(0,10, ct)

paulomorgado on 13 Nov 2015

@jskeet I understand your concern now and completely agree. An interesting interaction here is that IAsyncEnumerable.Dispose is actually an implicit cancellation token. How does that (if at all) interact with a CancellationToken explicitly provided to the method?

jcdickinson on 16 Nov 2015

@jcdickinson Goodness knows! But I think both of these features really need to be considered closely together. I can easily imagine that a solution to one may cause issues for the other.

jskeet on 16 Nov 2015

IMO any discussion here should also consider ValueTask, as this is perhaps the type of scenario that _might_ have bunches of "I have stuff without needing to actually be async right now" _and_ have distinct return values, making pre-cached completed tasks not viable.

mgravell on 4 Dec 2015

If IObservable<T> ever gets language support I think that would be great for events to be able to return an IObservable<(object Sender, EventArgs Args)>, just like F#. In defence of that, a language construct would be a lot better instead of a Subscribe with a bunch of lambda expressions IMO.

alrz on 13 Dec 2015

As for async foreach and such, I'd like to suggest fork loops, which act like the fork-join model.

For example,

// waits in each iteration
async Task SaveAllAsync(Foo[] data) {
    foreach(var item in data) {
        await item.SaveAsync();
    }
}

// same as above
async Task SaveAllAsync(Foo[] data) {
    for(int i = 0; i < data.Length; ++i) {
        await data[i].SaveAsync();
    } 
}

// this won't wait and just run in parallel
async Task SaveAllAsync(Foo[] data) {
    fork(var foo in data) {
        await foo.SaveAsync();
    } // join
}

// same as above
async Task SaveAllAsync(Foo[] data) {
    fork(var i = 0; i < data.Length; ++i) {
        await data[i].SaveAsync();
    } // join
}

With async sequences, one might use yield return inside the loop to return a push-based collection,

async IObservable<Result> ProcessAllAsync(Foo[] data) {
    fork(var foo in data) {
        yield return await foo.ProcessAsync();
    }
}

In contrast, if you use a regular foreach it will return an IAsyncEnumerable,

async IAsyncEnumerable<Result> ProcessAllAsync(Foo[] data) {
    foreach(var foo in data) {
        yield return await foo.ProcessAsync();
    }
}

Also it should be able to be used to iterate both IObservable<T> and IAsyncEnumerable<T> collections.

alrz on 18 Jan 2016

@HaloFour I'd much rather see Parallel.* extended for TPL than add a fork keyword. Language integration gives us nothing here.

scalablecory on 18 Jan 2016

@scalablecory Wrong mention. :smile:

@alrz I like the idea. If async methods could return observable sequences I think it would be pretty useful. However, I also worry that it might be a little _too_ easy and tempt people into using it without understanding the ramifications of having multiple threads silently spawned within their iterative-looking code. All of a sudden all of the nuances of synchronization and thread-safety apply to locals (yes, I am aware that this is true today with closures, but that does involve a little more opt-in.) As a new keyword it could be possible that limitations would be imposed such that variables declared outside of the block could not be modified, but that would differ from any other semantics in C#.

HaloFour on 18 Jan 2016

@HaloFour "having multiple threads silently spawned within their iterative-looking code." There need not be any thread, actually; it depends on the implementation of the awaited method. The point is that it doesn't wait for each await in each iteration. You can think of it as a list of running tasks and a Task.WhenAll, but a little nicer: it provides language support for processing tasks as they complete, which is a common pattern, I guess.

As for synchronization, we can wait for all the tasks in parallel but process them in order, meaning that a completed task might be waiting for another iteration to get to an await or end of the loop block, this makes sense because all of them will join at the end of the loop. Similarly, it doesn't "move next" until it gets to an await. But not allowing modifications might be limiting and eventually not really sufficient, one might say in a lock block once again you _can_ modify variables and so on.

PS: On second thought (after opening the issue, actually), I'm now convinced that language support doesn't really buy much:

fork(var item in new[] { 2, 3, 1 }) {
    await Task.Delay(item*1000);
    Console.WriteLine(item);
}

// easy peasy

await Task.WhenAll(new[] { 2, 3, 1 }.Select(async item => {
    await Task.Delay(item*1000);
    Console.WriteLine(item);
}));

and also it would be worse because the loop hides a task and doesn't support cancellation etc.

alrz on 18 Jan 2016

@alrz If it's akin to Task.WhenAll then the default (and expected) implementation would be that more than one of those operations are executing concurrently, so there would definitely be threads involved. Without that there is absolutely no difference between fork and await foreach.

HaloFour on 18 Jan 2016

@HaloFour However, my example satisfies the first rule of fork, "it doesn't "move next" until it gets to an await" but not the second, "a completed task might be waiting for another iteration to get to an await or end of the loop" so synchronization is not guaranteed.

The problem with await foreach or await using (#114) is that await applies to an expression (the Task), which mustn't be ignored. With these constructs there is no reasonable way to have ConfigureAwait, CancellationToken, etc. And that's OK because these are not related to the language; await is not specific to Task, so tying foreach to these types is not a good idea IMO.

alrz on 18 Jan 2016

@alrz await expressions do not have to be Tasks, they can be anything that correctly follows the awaitable convention. await has no concept of cancellation tokens and ConfigureAwait just happens to be an instance method of Task. For example, await Task.Yield() is legal, but Task.Yield() returns a YieldAwaitable, not a Task, and there is no way to either cancel or configure the return of that method.
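As a small illustration of the awaitable convention, here is a sketch of a type that can be awaited without being a Task. `ImmediateValue` and `AwaitableDemo` are hypothetical names used only for this example. The compiler requires just a `GetAwaiter()` method whose result implements `INotifyCompletion` and exposes `IsCompleted` and `GetResult()`.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical awaitable that is not a Task: "await" only needs GetAwaiter().
public struct ImmediateValue
{
    private readonly int _value;

    public ImmediateValue(int value) { _value = value; }

    public Awaiter GetAwaiter() { return new Awaiter(_value); }

    // The awaiter implements INotifyCompletion and exposes
    // IsCompleted and GetResult, which is all the pattern asks for.
    public struct Awaiter : INotifyCompletion
    {
        private readonly int _value;

        public Awaiter(int value) { _value = value; }

        public bool IsCompleted { get { return true; } }   // always completes synchronously
        public int GetResult() { return _value; }
        public void OnCompleted(Action continuation) { continuation(); }
    }
}

public static class AwaitableDemo
{
    public static async Task<int> AddAsync(int a, int b)
    {
        // No Task, no ConfigureAwait and no CancellationToken on the awaited values.
        int x = await new ImmediateValue(a);
        int y = await new ImmediateValue(b);
        return x + y;
    }
}
```

Nothing on `ImmediateValue` offers cancellation or configuration, which is why such knobs would have to come from the awaited type (or from helpers around it) rather than from `await` itself.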

Note that I do think that both cancellation and configuration are useful. I think that offering extension methods off of IAsyncEnumerable<T> should satisfy the requirement:

IAsyncEnumerable<int> sequence = GetSequenceAsync(1, 10, CancellationToken.None);

CancellationTokenSource cts = new CancellationTokenSource();
cts.CancelAfter(500);
foreach (var number in sequence.ConfigureAwait(false).WithCancellationToken(cts.Token)) {
    Console.WriteLine(number);
}

HaloFour on 18 Jan 2016

@HaloFour Yeah I know, exactly my point, anyway. As it turns out, Task.WhenAll is not safe when you have shared state, that's where fork can be helpful.

alrz on 18 Jan 2016

I think CoreFxLab's Channels are closely related to the subject.

omariom on 10 Feb 2016

@omariom Neat. I wonder how it differs from Rx and why there is an effort to seemingly duplicate that library. Can't be NIH, they invented it. @stephentoub ?

HaloFour on 10 Feb 2016

@HaloFour Rx is for push-based reactive streams. Channels are pull-based and more similar to TPL Dataflow blocks (though a bit lower on the abstraction stack and less opinionated)

i3arnon on 10 Feb 2016

I put together a proposal for async iterators here:
https://github.com/ljw1004/roslyn/blob/features/async-return/docs/specs/feature%20-%20async%20iterators.md

Allows to "async-foreach" over an async enumerator, as suggested by @paulomorgado
Puts the cancellation token into the call to GetEnumerator rather than MoveNextAsync
Uses IAsyncEnumerable.ConfigureAwait / IAsyncEnumerator.ConfigureAwait as suggested in LDM #5383
Consumer is completely pattern-based to support ValueTask if needed
Producer is completely pattern-based to allow an async iterator method to return IObservable<T> or IAsyncActionWithProgress<T> or IAsyncEnumerable<T> or IAsyncEnumerator<T>
A novel mechanism for getting hold of cancellation token from within the body of the async method. I've used async contextual keyword to refer to the current instance of an async method, similar to this and base. This mechanism is useful more generally for an async method or async iterator method to interact with its builder (equivalently, the object that it returns). Windows tasks like IAsyncAction can use it to handle IAsyncAction.Cancel() or IAsyncActionWithProgress.OnProgress(). I believe that IObservable would use it too, but need to learn more.

ljw1004 on 27 Apr 2016

I really like the new proposal. I think it handles the CancellationToken problem really nicely. Adding an example to the spec of how the async contextual keyword would work for your standard Task-returning async method would be useful as well. Does this mean there would no longer be a need for endless overloads in our async methods that accept CancellationToken?

I prefer the foreach (await var x in asyncStream) { ... } syntax, as it is the most consistent with how we await asynchronous tasks elsewhere currently. It feels weird to put the word async in there when what is happening is an await.

I think the iterator modifier section needs expansion, as it is in the title, but the text doesn't actually ever mention an iterator modifier. I am strongly in favor of adding an optional iterator keyword that can be added to the method signatures of normal iterators and async iterators. People who don't want it don't have to use it, but those of us that do can set style rules so that an analyzer and quick fix can make all the methods consistent.

It is much more useful to have the method signature document what kind of method it is, rather than having to scan through the entire body to guess whether it's an iterator or not. All that being said, this is a minor point that is probably separable from the rest of the proposal given that it is contentious.

Overall, really looking forward to seeing a prototype of this feature!

MgSam on 27 Apr 2016

// EXAMPLE 7:
// async C f()
// {
//   var ct = async.CancellationToken;
//   await Task.Delay(1, ct);
//   yield 1;
// }
// expands to this:

I think async.CancellationToken could be problematic in that async could be a variable or type in scope inside an async method already. await.CancellationToken seems available though...

bbarry on 27 Apr 2016

@bbarry Thanks. Missed the comment there.

MgSam on 27 Apr 2016

Is this going to use the same shape of IEnumerator<T> i.e. MoveNextAsync and Current? Wouldn't it make sense to make it similar to what F# does for its AsyncSeq?

type IAsyncEnumerator<'T> =
    abstract MoveNext : unit -> Async<'T option>
    inherit IDisposable

alrz on 27 Apr 2016

@bbarry I suggested that the "async" contextual keyword is only a contextual keyword in async methods that return a tasklike other than void and Task and Task<T>. There are no such methods today. In methods where it's a contextual keyword, then by definition it will never refer to a variable or type in scope, and there won't be a back-compat break. But you're right to bring up await as another possibility. I guess a prototype could include both options to see which one people like.

@alrz I think Async<'T option> would be a bad idea for C# because (1) option doesn't exist, (2) it would be prohibitively expensive, causing a heap allocation for every element of the sequence. By contrast Task<bool> never requires a heap allocation for an already-completed task, since Task.FromResult(true) and Task.FromResult(false) are both pre-allocated static singletons in the BCL.

ljw1004 on 27 Apr 2016

@ljw1004 Why not make contextual async a possibility for existing Task-returning async methods as well? As I mentioned above, it is annoying code bloat to make CancellationToken overloads for every single async method. First class language support for supporting cancellation would be a huge improvement.

The backwards compat issue doesn't seem major. If you're foolish enough to name a type or local variable async then the compiler can report an error, and you'll fix the error when you upgrade. The benefit seems worth the cost, IMO.

MgSam on 27 Apr 2016

@MgSam could you spell precisely out how you see this mechanism being able to avoid CancellationToken overloads? I can't see it...

ljw1004 on 27 Apr 2016

@ljw1004 I'm talking about the two interface invocations that are required for every element. However, it could be something like this:

interface IAsyncEnumerator<T> {
  Task<bool> TryGetNextAsync(out T value);
}

But since out is actually ref in CLR, the T cannot be defined as covariant.

interface IAsyncEnumerator<out T> {
  ITask<T?> GetNextAsync();
}

This one works as long as we could use ? on unconstrained generics (#9932) and we have a covariant ITask<T> interface.

Anyhow, since async has its own overheads, these micro-optimizations probably aren't very important.

alrz on 27 Apr 2016

I'd imagine you'd need a slightly different calling convention.

``` C#
async Task Foo()
{
    await Task.Delay(1000);
    if(async.CancellationToken.IsCancellationRequested) return;
    await Task.Delay(1000);
}

var ct = default(CancellationToken);
await Foo(), ct;
```

MgSam on 27 Apr 2016

@MgSam

What exactly do you propose that translate into? Have the compiler automatically generate overloads? Or automatically add optional parameters? Stuff the CancellationToken into a thread-local? All of those options sound pretty nasty. I just add a default CancellationToken parameter and not worry about overloads.

HaloFour on 27 Apr 2016

@alrz The interface used here is a "pattern" like how foreach works elsewhere. You wouldn't have to actually implement IAsyncEnumerable<T> in order to make it work (I think example 6 is confusing though).

It looks like this type would satisfy enough requirements to work inside a foreach:

class Foo
{
  ConfiguredTaskAwaitable<bool> MoveNextAsync() { ... }
  int Current {get { ... } }
}

used as:

async Task<int> SumAsync(Foo f)
{
  int result = 0;
  foreach(async var i in f)
  {
    result+=i;
  }
  return result;
}

compiling as if it was:

async Task<int> SumAsync(Foo f)
{
  int result = 0;
  {
    try
    {
      while(await f.MoveNextAsync())
      {
        int i = f.Current;
        result+=i;
      }
    }
    finally
    {
      (f as IDisposable).Dispose();
    }
  }
  return result;
}

(and either I am reading ex 6 wrong or it is disposing something it probably shouldn't)

bbarry on 27 Apr 2016

@bbarry My intention is that async enumerators must all implement IDisposable.

It sounds like you find that odd. Here's another alternative: (1) if you foreach over an _enumerable_, then the foreach statement has acquired a resource and must dispose of it; (2) if you foreach over an _enumerator_ then you're the one who acquired it and you're the one who must release it...

foreach (async var x in enumerable) { ... }
// becomes
using (var enumerator = enumerable.GetEnumerator())
    foreach (async var x in enumerator) { ... }


foreach (async var x in enumerator) { ... }
// becomes
while (await enumerator.MoveNextAsync()) { var x = enumerator.Current; ... }

But actually that feels problematic also. It would mean that these two statements have very different disposal semantics:

foreach (var x in enumerable) { ... }
foreach (var x in enumerable.GetEnumerator(cts.Token)) { ... }

It would also mean that folks who write async enumerator methods (rather than async enumerable) would have to expect that realistically most callers would never end up disposing of them. They'd instead just foreach over them as is easiest.

I wonder if Dispose() even has any role to play? Why would async enumerators have IDisposable rather than some hypothetical IAsyncDisposable? Maybe it would be cleaner to cut out the call to Dispose entirely?

ljw1004 on 27 Apr 2016

@HaloFour Generating an overload seems the most straightforward because it guarantees backwards compat but I haven't thought carefully about every edge case.

I also just add a default value for the CancellationToken, but very often people forget to do this, and once they've forgotten, it breaks the cancellation chain. It makes it impossible to cancel the method and any async methods that method itself calls, even if they do support cancellation. And, as you know, optional parameters are problematic in public interfaces.

Manually adding the cancellation parameter is also a ton of extra code bloat on every single async method (which often is most of them, nowadays).

Having it just be in the context of every async method would both make it consistent with this new feature and allow methods to more strictly focus on business logic rather than retyping this plumbing over and over again. Combine that with an analyzer that warns you if you forget to pass an existing cancellation token to any async method you call, and you'd get a big win in cancellation support over the current state, where most code written doesn't use it.

MgSam on 27 Apr 2016

I think it is odd that you are disposing something that you potentially don't own.

You cannot today foreach over an IEnumerator, only an IEnumerable (and similar constructs). If you could I would think it should not be a compiler task to dispose of it as part of the foreach statement. (passing IEnumerator-like types around that are designed to be mutated seems smelly in the first place, but that is another point I suppose)

I think the pattern might be better if it were:

interface IAsyncEnumerable<T>
{
   IAsyncEnumerator<T> GetEnumerator();
   IAsyncEnumerable<T> ConfigureAwait(bool b);
   IAsyncEnumerable<T> RegisterCancellationToken(CancellationToken cancel = default(CancellationToken));
}

and so

class Foo
{
  Foo GetEnumerator() { return this; }
  ConfiguredTaskAwaitable<bool> MoveNextAsync() { ... }
  int Current {get { ... } }
}

with the same usage method compiling as if it was:

async Task<int> SumAsync(Foo f)
{
  int result = 0;
  {
    Foo e = null;
    try
    {
      e = f.GetEnumerator();
      while(await e.MoveNextAsync())
      {
        int i = e.Current;
        result+=i;
      }
    }
    finally
    {
      (e as IDisposable)?.Dispose();
    }
  }
  return result;
}

bbarry on 27 Apr 2016

@bbarry @ljw1004

What about having extension methods for ConfigureAwait and RegisterCancellationToken rather than having the IAsyncEnumerable<T> interface have to implement methods for either?

HaloFour on 27 Apr 2016

@HaloFour the IAsyncEnumerable<T> interface doesn't actually have to be implemented at all (it actually can't exist per se in the general form used by the foreach statement because the return type from GetEnumerator() is the pattern IAsyncEnumerator<T>, not any actual type on its own...). It exists for the purpose of the spec conversation here as the interface methods the state machine would generate in order to support this pattern.

bbarry on 27 Apr 2016

@ljw1004 re the CancellationToken passed to GetEnumerator() instead of MoveNextAsync(): This has always been my preference, however the following was actually one of the strong arguments for having it on MoveNextAsync() when I discussed it years ago with the RX folks:

This feels weird. We've always passed in cancellation token at the granularity of an async method, and there's no reason to change that.

Copy & paste error or did I miss what you meant? :smile:

divega on 27 Apr 2016

@divega I explained badly. Let me rephrase:

Some folks suggest that each individual call to MoveNextAsync should have its own cancellation token. This feels weird. We've always passed in cancellation token at the granularity of an async method, and there's no reason to change that: from the perspective of users, the async method they see is the _async iterator method_.

It's also weird because on the consumer side, in the _async-foreach_ construct, there's no good place for the user to write a cancellation token that will go into each MoveNextAsync: from the perspective of the users, they don't even see the existence of MoveNextAsync.

Also: it would be weird to attempt to write an async iterator method where the cancellation-token that you want to listen to gets changed out under your feet every time you do a yield:

// Here I'm trying to write an async iterator that can handle a fresh CancellationToken
// from each MoveNextAsync...
async IAsyncEnumerable<int> f()
{
   var cts = CancellationTokenSource.CreateLinkedTokenSource(async.CancellationToken);
   var t1 = Task.Delay(100, cts.Token);
   var t2 = Task.Delay(100, cts.Token);
   await t1;
   yield 1;

   // ??? at this point can I even trust "cts" any more?
   // or do I need to create a new linked token source for further work that I do now?

   yield 2;

   await t2;
   // ?? is this even valid anymore, given that it's using an outdated cancellation token?
   // If someone cancels that latest freshest token passed in the latest MoveNextAsync,
   // how will that help cancel the existing outstanding tasks?
}

ljw1004 on 27 Apr 2016

from the perspective of users, the async method they see is the async iterator method.

Ah, I get what you mean now. Worth clarifying that is from the perspective of foreach consumers IMO because in the underlying methods things are the other way around, i.e. GetEnumerator() is not async.

I agree with everything else although it seems to be a bit unfortunate that you have to call GetEnumerator(ct) explicitly in async foreach. Ideally users shouldn't need to know about GetEnumerator() either, like they don't need to know when using non-async foreach. Any chance we could get language sugar for passing the CancellationToken? E.g.

foreach(await var o in asyncCollection using ct)

BTW, foreach(await ... looks good, but I suggest also considering await foreach for the list of possible syntax.

divega on 27 Apr 2016

Some folks suggest that each individual call to MoveNextAsync should have its own cancellation token. This feels weird. We've always passed in cancellation token at the granularity of an async method, and there's no reason to change that: from the perspective of users, the async method they see is the async iterator method.

@ljw1004 Remember, MoveNextAsync() _is_ the async method here. In .NET, you pass the token to the thing doing the work, which is not GetEnumerator().

I think people are getting hung up here in that this is both a .NET add and a C# add. Those two things shouldn't compromise each other. IAsyncEnumerable needs to remain intuitive and consistent with the rest of .NET, because it won't always be used with C# sugar on top.

Look at the very similar DbDataReader class.

scalablecory on 27 Apr 2016

Throwing my two cents in here but has this syntax been considered?

foreach(var item in await collection) 
{
}

IMO it would make sense in C++ too:

foreach(auto item : await collection)
{
}

Without the await the code reads as: "For each item that has been retrieved from the collection's values"
With the await the code reads as: "For each item that has been retrieved from the collection by awaiting the collection for values"

Or alternatively, we have async and await -- why not introduce an awaitin keyword?

foreach(var item awaitin collection) 
{
}

tumtumtum on 27 Apr 2016

foreach(var item in await collection) { }

That is what you would write if collection was a Task<IEnumerable<T>> (or similar awaitable type). The distinction that you may need to await for each element is important.
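To make the distinction concrete, here is a minimal sketch in C# as it exists today. `LoadAllAsync` and `AwaitWholeCollectionDemo` are hypothetical names used only for illustration. The whole collection is awaited once, and the loop itself is an ordinary synchronous foreach:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitWholeCollectionDemo
{
    // Hypothetical source: the entire sequence becomes available after a single await.
    static async Task<IEnumerable<int>> LoadAllAsync()
    {
        await Task.Delay(1);
        return new[] { 1, 2, 3 };
    }

    public static async Task<int> SumAsync()
    {
        int sum = 0;
        // One await for the whole collection; no await happens between elements.
        foreach (var item in await LoadAllAsync())
        {
            sum += item;
        }
        return sum;
    }
}
```

An async sequence differs in that each step of the loop may itself need to wait, which is why this existing syntax cannot be reused for it.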

divega on 27 Apr 2016

👍2

@bbarry Oh my! This IDisposable stuff is crucial. Let me try to lay down step-by-step the issues and their consequences...

The Dispose question

When you write an async method with try/finally blocks, you _certainly_ expect the finally blocks to execute. Everything in the language+libraries should be geared to support this.
Therefore the enumerator must implement IDisposable (which is the only way that an object can indicate that it needs some final action to be run), and the foreach must do a dispose.
If the async foreach construct is able to consume _enumerables_ then of course it calls GetEnumerator() and then calls Dispose() on the enumerator. Nothing new here.
If the async foreach construct is able to consume _enumerators_ then it must also call Dispose() on the enumerator. This is unusual -- because foreach doesn't look like using, and it's strange that foreach disposes of something it didn't acquire. But it's necessary. Because if we don't do it, then we'll push developers into a pit of failure where they typically fail to dispose of enumerators, and hence fail to execute the finally blocks in async methods. For instance:

// This is a common and easy idiom. We need to make sure it calls Dispose().
foreach (async var x in GetStream()) {
   if (x == 15) break;
}

// It would be ugly boilerplate if the user instead was expected to manually wrap
// each foreach inside a using:
using (var enumerator = GetStream()) {
  foreach (async var x in enumerator) {
    if (x == 15) break;
  }
}

There is one saving grace. The meaning of an enumerator is that it can be consumed exactly once. So if you async foreach over it once, then no one can ever foreach over it again. Therefore it doesn't matter that you've disposed of it.
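The synchronous language already behaves this way for enumerables, which gives a feel for the stakes. The sketch below uses today's C# iterators, with hypothetical names `DisposeDemo` and `GetStream`: breaking out of a foreach disposes the enumerator and so runs the iterator's finally block, while driving the enumerator by hand and forgetting to dispose it silently skips the finally block.

```csharp
using System.Collections.Generic;

public static class DisposeDemo
{
    // Endless iterator with a finally block; 'log' records when cleanup runs.
    static IEnumerable<int> GetStream(List<string> log)
    {
        try
        {
            for (int i = 10; ; i++) yield return i;
        }
        finally
        {
            log.Add("finally");
        }
    }

    // foreach disposes the enumerator on break, which runs the iterator's finally block.
    public static List<string> BreakOutOfForeach()
    {
        var log = new List<string>();
        foreach (var x in GetStream(log))
        {
            if (x == 15) break;
        }
        log.Add("after loop");
        return log;
    }

    // The enumerator is driven by hand and never disposed: the finally block never runs.
    public static List<string> ForgetToDispose()
    {
        var log = new List<string>();
        var e = GetStream(log).GetEnumerator();
        while (e.MoveNext())
        {
            if (e.Current == 15) break;
        }
        log.Add("after loop");
        return log;
    }
}
```

`BreakOutOfForeach()` logs "finally" before "after loop", whereas `ForgetToDispose()` logs only "after loop". An async foreach that consumed enumerators without disposing them would put every caller in the second situation.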

The CancellationToken question

There are three places at which cancellation tokens might be communicated:

1. At every MoveNextAsync(token).
   - This has two killer disadvantages: it's too hard to write an async iterator method which has to deal with a different token after every yield; and there's no way in the async foreach construct to provide a different token after each one.
2. With every _enumerator_
   - it might be passed in the call to an async iterator method itself GetStream(token) if async iterators can return enumerators
   - it might be passed in the call to GetEnumerator(token) if you have an enumerable in hand
3. With every _enumerable_
   - it would be passed to the async iterator method GetStreamFactory(token) if async iterators can return enumerables.

I'd initially discounted option 3 because I thought that everyone who kicks off a stream deserves to be able to cancel the stream themselves; because it felt weird that you could cancel it once and then everyone who tries to get the enumeration in future will find it cancelled without really knowing why. It means that the enumerable isn't a true pure _factory_ of independent identical streams.

But actually, IEnumerables today aren't true factories either. Consider:

class Baby
{
  private List<string> poops = new List<string> { "morning", "noon", "night" };
  public void DoPoop(string s) => poops.Add(s);
  public IEnumerable<string> GetPoops(string prefix) => poops.Where(p => p.StartsWith(prefix));
}

var b = new Baby();
var p = b.GetPoops("n");
foreach (var s in p) Console.WriteLine(s); // "noon, night"
b.DoPoop("now!!!");
foreach (var s in p) Console.WriteLine(s); // "noon, night, now!!!"

(My wife gave me a 30 minute break from baby-care duties so I could write up these notes...)

Anyway, I've kind of persuaded myself that it might be tolerable to have cancellation done at the level of enumerable, not enumerator.

Design options

In the light of the two points above, let's talk through the design options again...

Option1: Streams (async enumerators) are foremost. An async iterator method can only produce enumerators like IAsyncEnumerator<T>, not enumerables. The async foreach construct can only consume enumerators, not enumerables.
- _Advantage:_ cancellation has a clear and obvious meaning, no trickery involved. You pass a cancellation token to the async iterator method, end of story.
- _Disadvantage:_ async foreach has to dispose of the enumerator, which feels weird, but at least has the "saving grace" that the stream could in any case never be re-used.
Option2: Async enumerables are foremost. An async iterator method can produce enumerators and enumerables. The async foreach construct can only consume enumerables; it does so by calling GetEnumerator() and then disposing of the enumerator. There is no special support for cancellation, so folks have to cancel at the level of the _enumerable_ not the _enumerator_.
- _Advantage:_ this is the option most in keeping with the existing language. It also treats Dispose very cleanly.
- _Disadvantage:_ it has a sort of "nod and a wink" that even though folks will typically use the feature for one-off streams that are produced once and consumed once, it nevertheless wraps them up in the form of "stream factories" that aren't pure factories of identical streams. The ubiquity of cancellation tokens means you'll never achieve pure factories.
Option3: Enumerators and enumerables are both equally foremost. An async iterator method can produce either. The async foreach construct can symmetrically consume either, and calls Dispose no matter which one it's given. Cancellation is done at the level of the _enumerator_ by leveraging the novel async contextual keyword within async methods.
- Key to this option is the _symmetry_ of async foreach. That's what allows you to either consume foreach (async var x in GetStreamFactory()) or foreach (async var x in GetStreamFactory().GetEnumerator(token)), and not have to think about it.
- _Advantage:_ this option feels the best match for the inherent nature of streams and factories. Also the async contextual keyword is useful for other things such as IAsyncAction. I conjecture it might also be needed for IObservable but don't yet know.
- _Disadvantage:_ Just like Option1 it feels goofy that async foreach (enumerator) disposes of the enumerator when it's done -- but at least like Option1 it still has the saving grace that the enumerator could never be re-used anyway so it doesn't matter. The main disadvantage is that the new async contextual keyword is a heavyweight addition to the language.

At the moment, I think Option2 has the best chance of getting into the language. I think I should prototype Option3 since it's a strict superset of the other options. That'll allow for some good experiments.

ljw1004 on 28 Apr 2016

👍2

@ljw1004 sorry if I am being repetitive... Could we have option 2 but with a twist: that async foreach adds optional syntax that accepts a CancellationToken to be passed to the GetEnumerator() call? Then a cancellation would only affect that particular enumerator but users wouldn't need to call GetEnumerator(cancellationToken) themselves.

divega on 28 Apr 2016

@divega why not do that in the library? Why not have a function

interface IAsyncEnumerable<T>
{
   IAsyncEnumerable<T> WithCancellation(CancellationToken c);
}

with the meaning that every enumerator which is subsequently gotten from this new IAsyncEnumerable will end up using the cancellation token? In the typical idiom, async foreach (var x in f.WithCancellation(c)), there'll only be a single enumerator gotten from it in any case...
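A minimal sketch of how such a WithCancellation could behave, assuming the interface shapes discussed in this thread (they are declared locally here and are not a shipped API). `Counter` and `WithCancellationDemo` are hypothetical names. WithCancellation returns a new enumerable bound to the token, and every enumerator obtained from it observes that token:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical interfaces in the shape discussed in this thread.
public interface IAsyncEnumerator<T> : IDisposable
{
    Task<bool> MoveNextAsync();
    T Current { get; }
}

public interface IAsyncEnumerable<T>
{
    IAsyncEnumerator<T> GetEnumerator();
    IAsyncEnumerable<T> WithCancellation(CancellationToken token);
}

// Hypothetical source: yields 0..count-1, awaiting a short delay before each item.
public sealed class Counter : IAsyncEnumerable<int>
{
    private readonly int _count;
    private readonly CancellationToken _token;

    public Counter(int count, CancellationToken token = default(CancellationToken))
    {
        _count = count;
        _token = token;
    }

    // Returns a new enumerable bound to the token; the original is left untouched.
    public IAsyncEnumerable<int> WithCancellation(CancellationToken token)
        => new Counter(_count, token);

    public IAsyncEnumerator<int> GetEnumerator() => new Enumerator(_count, _token);

    private sealed class Enumerator : IAsyncEnumerator<int>
    {
        private readonly int _count;
        private readonly CancellationToken _token;
        private int _current = -1;

        public Enumerator(int count, CancellationToken token) { _count = count; _token = token; }

        public int Current => _current;

        public async Task<bool> MoveNextAsync()
        {
            if (_current + 1 >= _count) return false;
            await Task.Delay(1, _token); // throws OperationCanceledException once the token is cancelled
            _current++;
            return true;
        }

        public void Dispose() { }
    }
}

public static class WithCancellationDemo
{
    // Hand-written equivalent of "async foreach (var x in source.WithCancellation(token))".
    public static async Task<int> SumAsync(IAsyncEnumerable<int> source, CancellationToken token)
    {
        int sum = 0;
        using (var e = source.WithCancellation(token).GetEnumerator())
        {
            while (await e.MoveNextAsync()) sum += e.Current;
        }
        return sum;
    }
}
```

Because the token is captured by the decorated enumerable, MoveNextAsync needs no token parameter, and the consumer never has to call GetEnumerator explicitly.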

ljw1004 on 28 Apr 2016

👍1

Yeah sorry I did edit but not sure why it didn't get saved.

How about awaitin?

foreach(var item await in collection) 
{
}

async, await and awaitin

tumtumtum on 28 Apr 2016

🎉1 👎1 👍1

@ljw1004

Anyway, I've kind of persuaded myself that it might be tolerable to have cancellation done at the level of enumerable, not enumerator.

I think of cancellation like any other form of composition allowed in LINQ, like TakeWhile, and in that case it does make sense for whatever method this is to return a composed IAsyncEnumerable<T>. Then I could compose the cancellation into the sequence and hand it off to some other method to enumerate over without that method having to be aware and without that method needing an overload for IAsyncEnumerator<T>.

I think that this design would also allow the CancellationToken argument to be moved back to the MoveNextAsync method which would make it more consistent with async method design. The standard enumeration emitted by C# with foreach could pass CancellationToken.None, still leaving room for exploring the potential of having some other way of providing that token in the language. The IAsyncEnumerator<T> provided by the decorated IAsyncEnumerable<T> could then replace that with the token provided in the WithCancellation method, or merge them if they're both cancellable, or whatever.

HaloFour on 28 Apr 2016

👍3

@ljw1004

why not do that in the library? Why not have a function

That of course would work, but it may not be as nice as it could be. See, unlike with async iterators, we can already do a decent job for async foreach completely in the libraries, e.g. for current incarnations of IAsyncEnumerable<T> we have extension methods that go more or less like this:

``` C#
public static async Task ForEachAsync<T>(
    this IAsyncEnumerable<T> source,
    Action<T> action,
    CancellationToken cancellationToken = default(CancellationToken))
{
    CheckNotNull(source, nameof(source));
    CheckNotNull(action, nameof(action));

    using (var enumerator = source.GetEnumerator())
    {
        while (await enumerator.MoveNextAsync(cancellationToken))
        {
            action(enumerator.Current);
        }
    }
}
```

Besides the obvious extra method and delegate, I happen to regard async foreach integration in the language mostly as sugar that can make IAsyncEnumerable<T> much nicer to consume. So if it is language sugar and if cancellation is such an important feature of async, why not take it the full way?

BTW, I am very sympathetic with any resistance to add arbitrary things to the language, e.g. adding a language feature that is aware of CancellationToken (although perhaps it could be just about passing any state to the GetEnumerator() method?) is perhaps a leap we haven't taken before and also having to pick among new and existing keywords and patterns for this starts to sound very expensive. But I just wonder if it is worth doing it in this case.

divega on 28 Apr 2016

(although perhaps it could be just about passing any state to the GetEnumerator() method?)

Considering IEnumerable to be an IEnumerator factory (which 99% of the time in normal workflow code is only ever asked to create a single instance), I think it is more about enabling IEnumerable to generate the IEnumerator instance with the correct state.

Perhaps, :spaghetti: (pretend IEnumerable/IEnumerator didn't matter as an interface but as a pattern), it is really about some arbitrary parameter list to the .MoveNext method:

foreach (var x in enumerable) (state, ...)
{
  ...
}

translates that parameter list to the MoveNext method call (as a completely orthogonal feature to async foreach)?

It would then be reasonably intuitive for a syntax:

var cancellationToken = ...;
await foreach(var x in asyncenumerable) (cancellationToken)
{
  ...
}

(you could reassign cancellationToken here if you so desired a new one for the next MoveNext inside the block for the foreach statement)

bbarry on 28 Apr 2016

@HaloFour If we do move cancellation token into MoveNextAsync(...), could you try writing the body of an async enumerator or enumerable which takes advantage of it? and which handles the case where a different token gets passed in each time? I'd love to see a concrete example. I persuaded myself it can't be done.

ljw1004 on 28 Apr 2016

@ljw1004

Admittedly that's a little tricky and there probably isn't a method that doesn't involve some degree of voodoo. You either need some kind of "ambient" token, like your async.CancellationToken concept or something buried in the framework like a CancellationToken.Current or you need something that could wire it up to any CancellationToken defined as a parameter. You get this problem regardless of where the cancellation token comes from, though, unless the only place you can provide one is in the call to the async iterator directly.

The last option is very voodoo-y. 🍝 In short, the async iterator state machine would take any CancellationToken parameter and instead of using it as is would create and manage its own CancellationTokenSource and pass along that token to the iterator method. Then the state machine would wire up each token provided on MoveNextAsync so that, if it is already cancelled or later becomes cancelled, it cancels the token source, which in turn cancels the token seen by the iterator. I don't know if the overhead associated with something like that would be too excessive.

HaloFour on 28 Apr 2016

Yes, I think registering each token is the only way it could be done. Unfortunately the common path of CancellationTokenSource.InternalRegister is non-trivial, not the kind of overhead I'd love to have on every loop iteration.

scalablecory on 28 Apr 2016

@ljw1004 @HaloFour personally I prefer the CancellationToken on GetEnumerator() for many reasons. One that comes to mind right now is that conceptually I want to use the CancellationToken to cancel the whole enumeration, so calling MoveNextAsync() after the enumeration got cancelled should be futile. If the CancellationToken was instead in MoveNextAsync() and I could pass a different one each time, then it seems that the cancellation doesn't affect the whole enumeration and I am free to retry calling MoveNextAsync() with a different CancellationToken after I got the first exception.

So please consider my suggestion to have optional syntax to pass the CancellationToken in async foreach on its own merits :smile:

BTW, the idea of having an optional parameter list for GetEnumerator() alongside foreach() doesn't seem completely crazy to me.

divega on 28 Apr 2016

@divega

The experience would be the same regardless of whether the token was passed to GetEnumerator or MoveNextAsync. You'd either need some kind of change to the syntax of foreach to accept that token, or you'd need a method (extension or otherwise) to decorate the IAsyncEnumerable<T> with the token. In the case of foreach syntax it could simply pass the same token on each iteration. If you _needed_ to pass a different token each time then you'd have to manually create and consume the IAsyncEnumerator<T>.

From the async iterator it doesn't matter if the token comes from GetEnumerator or MoveNextAsync, it's going to result in some kind of voodoo in order to consume that token.

As such I'd much rather follow the already defined convention for async methods and have the CancellationToken exist on MoveNextAsync since it is the only actual async method.

HaloFour on 28 Apr 2016

👍1

I think conceptually

interface IAsyncEnumerable<T>
{
    IAsyncEnumerator<T> GetEnumerator(CancellationToken token = default(CancellationToken));
};

Makes the most sense to me; however I'm not sure how it fits nicely with foreach.

@ljw1004 exploring the WithCancellation approach; were you thinking something like

interface IAsyncEnumerable<T>
{
    IAsyncEnumerator<T> GetEnumerator(CancellationToken token = default(CancellationToken));
};

// Extension methods must live in a non-generic static class, so T moves to the method.
static class IAsyncEnumeratorExtensions
{
    static IAsyncEnumerator<T> WithCancellation<T>(this IAsyncEnumerable<T> enumerable, CancellationToken cancellationToken)
    {
        return enumerable.GetEnumerator(cancellationToken);
    }
}

Then something like this would work

foreach (var item await in collection) 
{
   ...
}

// or

foreach (var item await in collection.WithCancellation(ct)) 
{
   ...
}

benaadams on 28 Apr 2016

@HaloFour I disagree. For an async iterator method to consume a different cancellation token from each individual MoveNext requires an entirely new level of voodoo. Consider:

// Here I'm trying to write an async iterator that can handle a fresh CancellationToken
// from each MoveNextAsync...
async IAsyncEnumerable<int> f()
{
   var cts = CancellationTokenSource.CreateLinkedTokenSource(async.CancellationToken);
   var t1 = Task.Delay(100, cts.Token);
   var t2 = Task.Delay(100, cts.Token);
   await t1;
   yield 1;

   // ??? at this point can I even trust "cts" any more?
   // or do I need to create a new linked token source for further work that I do now?

   yield 2;

   await t2;
   // ?? is this even valid anymore, given that it's using an outdated cancellation token?
   // If someone cancels that latest freshest token passed in the latest MoveNextAsync,
   // how will that help cancel the existing outstanding tasks?
}

If you want a fresh cancellation token on each MoveNext you will have to tell me the semantics of how one should write an async iterator method that responds to cancellation correctly. You can assume we have "minor level voodoo" in the form of async.CancellationToken which retrieves the most recent cancellation token that was passed to MoveNext. But I want you to build on top of that to explain how you'll actually author the async iterator method in a sane coherent way, building upon that minor level voodoo. I think it can't be done, as in my example above.

ljw1004 on 28 Apr 2016

@ljw1004 That is a problem for C# to solve, not for IAsyncEnumerator to solve. The solution should be one that doesn't result in one or the other feeling out of place.

scalablecory on 28 Apr 2016

@ljw1004

Not necessarily. The async iterator state machine could provide a single async.CancellationToken that never changes, but internally it manages its own CancellationTokenSource and links the tokens on each MoveNextAsync. That way even if you grab async.CancellationToken at the very beginning of the iterator into a local and reuse that local throughout the rest of the body the semantics would be as expected. The issue is, of course, the potential overhead. In the most common cases the CancellationToken would be either not cancellable (CancellationToken.None) or would be the same CancellationToken. Both situations are easy to detect and linkage can simply be skipped.
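A rough sketch of that linking logic, written as a standalone helper that a generated state machine might own. `IteratorCancellation` and `Observe` are hypothetical names, and nothing here reflects an actual compiler design. It exposes one stable token to the iterator body and re-links only when a new cancellable token arrives:

```csharp
using System;
using System.Threading;

// Hypothetical helper owned by the iterator's state machine.
public sealed class IteratorCancellation : IDisposable
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private CancellationToken _lastToken;                 // token seen on the previous call
    private CancellationTokenRegistration _registration;  // link from _lastToken to _cts

    // The single, never-changing token the iterator body observes.
    public CancellationToken Token => _cts.Token;

    // Called at the start of each MoveNextAsync(token).
    public void Observe(CancellationToken token)
    {
        // Fast paths: a token that can never be cancelled, or the same token as last time.
        if (!token.CanBeCanceled || token == _lastToken) return;

        _registration.Dispose();   // drop the link to the previous token
        _lastToken = token;
        // If 'token' is already cancelled, the callback runs immediately.
        _registration = token.Register(() => _cts.Cancel());
    }

    public void Dispose()
    {
        _registration.Dispose();
        _cts.Dispose();
    }
}
```

In this sketch a token from an earlier call stops having any effect once a different token has been supplied. Whether that is the right semantics is exactly the open question in this exchange.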

@scalablecory

Exactly, I'm arguing this on the premise that the language should adhere to the conventions of the framework, not change the conventions to suit the needs of the language.

HaloFour on 28 Apr 2016

With the cancellation token given to GetEnumerator the implementation of IAsyncEnumerator can then just take the token in its .ctor

benaadams on 28 Apr 2016

@benaadams

Yes, which nobody (or very few people) will actually be writing. So you still end up with voodoo in actually consuming that token from within an iterator, and you're breaking the async method convention.

HaloFour on 28 Apr 2016

@benaadams You asked "WithCancellation approach; were you thinking something like...". No, what I was thinking was this:

interface IAsyncEnumerable<T>
{
   IAsyncEnumerator<T> GetEnumerator();
   IAsyncEnumerable<T> WithCancellation(CancellationToken);
}

In other words, I was thinking of it in exactly the same light as LINQ TakeWhile, following the suggestion of @HaloFour

ljw1004 on 28 Apr 2016

👍1

@ljw1004 does adding the extra method to the interface add any extra value vs. just adding an extension method? i.e. people will have to implement two methods. You can still add a WithCancellation to the strongly typed class to override it.

Re: syntax I like foreach (var item await in asyncCollection)

As what it represents is conceptually like

foreach (var task in asyncCollection)
{
    var item = await task;
    // ...
}

So you are awaiting what is returned from the in

benaadams on 28 Apr 2016

@HaloFour Cancellation should exist at the IAsyncEnumerator level, not at MoveNext; unless you are suggesting cancelling one operation of the iteration and then continuing iterating, rather than cancelling the whole iteration.

benaadams on 28 Apr 2016

@benaadams if you mean at the IAsyncEnumerator level, 👍

divega on 28 Apr 2016

@divega lol, yes - edited :grin:

benaadams on 28 Apr 2016

@benaadams

I'm not suggesting that you can cancel one MoveNextAsync and start another, no. That doesn't make much sense. Neither does being able to call MoveNextAsync twice or any other variety of garbage that we get saddled with by making IAsyncEnumerator<T> follow the IEnumerator<T> pattern. Certainly, once a single iteration is cancelled, so is the entire sequence.

But if we're going to move the voodoo back that far I'd say that we'd be better served by nixing the voodoo completely. You pass the CancellationToken directly to the async iterator method and nowhere else. That way there's no voodoo in the iterator method body at all. You lose the functionality of being able to cancel the async operation once it's in flight, but we have precedent for this behavior already with async methods since you can't cancel a Task. At most, maybe wire up some voodoo so that Dispose on the IAsyncEnumerator<T> triggers a cancellation, then at least the client has some control.
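As a sketch of that last idea, here is a hand-written enumerator (no iterator syntax involved) that receives its token once at creation and treats Dispose as a cancellation request. `Ticker` and its `delay` parameter are hypothetical, and the interface is declared locally in the shape used in this thread:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical interface in the shape discussed in this thread.
public interface IAsyncEnumerator<T> : IDisposable
{
    Task<bool> MoveNextAsync();
    T Current { get; }
}

// Hypothetical stream: yields 0, 1, 2, ... forever, waiting 'delay' before each item.
public sealed class Ticker : IAsyncEnumerator<int>
{
    private readonly CancellationTokenSource _cts;
    private readonly TimeSpan _delay;
    private int _current = -1;

    // The caller's token is supplied exactly once, when the stream is created.
    public Ticker(CancellationToken callerToken, TimeSpan delay)
    {
        _cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        _delay = delay;
    }

    public int Current => _current;

    public async Task<bool> MoveNextAsync()
    {
        // Observes both the caller's token and cancellation triggered by Dispose().
        await Task.Delay(_delay, _cts.Token);
        _current++;
        return true;
    }

    // Disposing cancels any MoveNextAsync still in flight.
    public void Dispose()
    {
        _cts.Cancel();
        _cts.Dispose();
    }
}
```

With this shape the consumer's only cancellation lever after creation is Dispose, which matches the "at least the client has some control" trade-off described above.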

HaloFour on 28 Apr 2016

I think I've been thinking of this too narrowly...

The language is mostly neutral whether cancellation happens at enumerable, enumerator or movenext level. With some exceptions...

if you want cancellation at enumerator level but want async iterator methods to produce enumerables, then you need something like my async contextual keyword voodoo.
if you want cancellation at MoveNext level, then again you need the voodoo.
if you want cancellation at enumerator level but want to foreach over enumerables, then you also need to be able to foreach over enumerators

So the debate about cancellation at MoveNext vs Enumerator vs Enumerable should be framed in terms of which of those basic language features are an acceptable price to pay: is the voodoo acceptable? is it acceptable to foreach over an enumerator?

I want to write a prototype which does support those two basic language features. If I get that far, then the cancellation debate is worth having by people implementing different libraries/builders that implement cancellation at the different levels, and seeing which one "wins".

There's a third basic language feature which @divega brought up that's more important to address: should cancellation be added as a first-class concept to the language, and if so then what syntax should the foreach statement use for it? Diego suggested foreach (var x await in xx using cancel). Are there any other contenders? Which other parts of the language would this new feature work in? And what exactly should it do?

ljw1004 on 28 Apr 2016

Diego suggested foreach (var x await in xx using cancel) . Are there any other contenders?

I am not emotionally attached to that choice of syntax or keyword :smile:

~~@HaloFour~~ @bbarry suggested this:

foreach (var x await in xx) (cancel)

In his suggestion I like to interpret (cancel) as a parameter list to be passed to GetEnumerator(), which at that point can contain arbitrary parameters and not just the CancellationToken.

divega on 28 Apr 2016

@ljw1004 For the sake of prototyping I think that the entire cancellation argument could be deferred. Why stress over voodoo that may not be necessary? Are we terribly inconvenienced that we can't cancel Tasks returned from async methods today?

@divega

@HaloFour suggested this:

Nope, @bbarry did. I personally think that's pretty hideous. :grin:

HaloFour on 28 Apr 2016

@HaloFour sorry, my mistake.

divega on 28 Apr 2016

@HaloFour, @divega

I suggested:

await foreach (var x in xx) (cancel)

(await before foreach)

The reason is, foreach becomes(ish):

var e = xx.GetEnumerator();
while(e.MoveNext())
{
  var x = e.Current;
}

...

in the idea that you might want to pass a parameter list to the MoveNext() method:

//foreach (var x in xx) (param0, param1)
var e = xx.GetEnumerator();
while(e.MoveNext(param0, param1))
{
  var x = e.Current;
}

with the notion that foreach (var x in xx) is roughly a standin for the method call MoveNext...

Anyway, with this concept in mind (having nothing to do with async), it becomes a straightforward leap to think "oh I want to await the MoveNext method", so await foreach (resulting in while(await e.MoveNextAsync()), or in while(await e.MoveNextAsync(token)) for await foreach(var x in xx) (token)).

I personally like await foreach(var x in xx) as the async foreach syntax (over foreach(async var x in xx)) but I really don't care much; syntax is best left to a late decision made long after we have figured out and proposed acceptable semantics.

I don't really understand why someone might want to pass in a different token to each call of MoveNext, but the general pattern of

var e = xx.Factory();
while(e.SomeWork(param0, param1))
{
  var x = e.CurrentEvaluatedState;
  ...
}

could be succinctly represented with a foreach pattern as:

foreach (var x in xx) (param0, param1) { ... }

and maybe?? that is useful?

bbarry on 28 Apr 2016

My take

Interfaces

interface IAsyncEnumerable<T>
{
    IAsyncEnumerator<T> GetEnumerator(CancellationToken cancellationToken = default(CancellationToken));
};

interface IAsyncEnumerator<T> : IDisposable
{
    Task<bool> MoveNextAsync();
    T Current { get; }
}

public static class IAsyncEnumeratorExtensions
{
    public static IAsyncEnumerator<T> WithCancellation<T>(this IAsyncEnumerable<T> enumerable, CancellationToken cancellationToken)
    {
        return enumerable.GetEnumerator(cancellationToken);
    }
}

Not sure if the WithCancellation can be done entirely with generic constraints to avoid boxing; if it could that would be nice.

Manual

class ManualAsyncEnumerable : IAsyncEnumerable<int>
{
    public ManualAsyncEnumerator GetEnumerator(CancellationToken cancellationToken = default(CancellationToken))
    {
        return new ManualAsyncEnumerator(cancellationToken);
    }
};

struct ManualAsyncEnumerator : IAsyncEnumerator<int>
{
    private CancellationToken _cancellationToken;
    private int _current;
    public ManualAsyncEnumerator(CancellationToken cancellationToken)
    {
        _cancellationToken = cancellationToken;
        _current = -1;
    }
    public async Task<bool> MoveNextAsync()
    {
        if (_current < 9)
        {
            await Task.Delay(100, _cancellationToken);
            _current++;
            return true;
        }
        return false;
    }
    public int Current => _current;
    public void Dispose() { }
};

Automagic

async IAsyncEnumerable<int> AsyncYieldingEnumerator(CancellationToken cancellationToken = default(CancellationToken))
{
    for (var i = 0; i < 10; i++)
    {
        await Task.Delay(100, cancellationToken);
        yield i;
    }
}

Add the constraint that foreach ... await in ... requires either a GetEnumerator(CancellationToken cancellationToken) method or IAsyncEnumerable<T>. That differentiates it from foreach ... in ... and gives compile errors if you interchange the types, making:

async Task Method(CancellationToken cancellationToken)
{
    ManualAsyncEnumerable enumerable = new ManualAsyncEnumerable();

    foreach (var val await in enumerable)
    {
        // ..
    }
    // non boxed cancellation
    foreach (var val await in enumerable.GetEnumerator(cancellationToken))
    {
        // ..
    }
    // boxed cancellation
    foreach (var val await in enumerable.WithCancellation(cancellationToken))
    {
        // ..
    }
    foreach (var val await in AsyncYieldingEnumerator())
    {
        // ..
    }
    // cancellation
    foreach (var val await in AsyncYieldingEnumerator().WithCancellation(cancellationToken))
    {
        // ..
    }
}

benaadams on 28 Apr 2016

ofc not having a different sig would mean you could blend sync and async in one enumerator - which may be a good or bad thing...

struct AsyncEnumerator : IAsyncEnumerator<T>, IEnumerator<T>
{
    public bool MoveNext()
    {
       ...
    }
    public async Task<bool> MoveNextAsync()
    {
       ...
    }
    public T Current => _current;
    public void Dispose() { }
};

benaadams on 29 Apr 2016

@benaadams https://github.com/dotnet/roslyn/issues/261#issuecomment-215581366 I'm almost certain foreach by spec requires the IEnumerable interface GetEnumerator and then uses the one it can by a pattern based lookup on the type or on IEnumerable<T> or it requires GetEnumerator, MoveNext and Current to be in the correct places, so by not implementing IEnumerable and using MoveNextAsync on these one could not do the former there :grin: (§8.8.4)

bbarry on 29 Apr 2016

@bbarry but you could implement both; except you'd have to choose which one was hidden as an explicit interface as you couldn't have two methods differing by return type. Which would mean you couldn't use a struct enumerator for both as one would have to go via interface.

benaadams on 29 Apr 2016

I believe you could implement both implicitly as long as your type on Current was the same.

@ljw1004's proposal doesn't necessitate the IAsyncEnumerable<T> interface used either, merely a pattern of:

CollectionType {
  IteratorType GetEnumerator();
}
IteratorType {
   ElementType Current {get;}
   AwaitableType<bool> MoveNextAsync();
}

edit and under option 1 (https://github.com/dotnet/roslyn/issues/261#issuecomment-215308615), the collection type is not necessary either. edit: (my vote would be for option 2 but for the sake of consideration everything after that comment I am thinking about option 3)

bbarry on 29 Apr 2016

@bbarry sorry I wasn't really very clear; I mean you might want two different enumerator types, one for async and one for sync. Although in practice they could be smudged together into the same type, they are generally very different things (e.g. the async one would have extra data for cancellation state and its state machine/builder).

Putting them together means people often end up doing bad things like only implementing one and then doing the other as sync-over-async or async-over-sync; which normally ends unhappily.

benaadams on 29 Apr 2016

Yeah I got that. I am not sure how you might do it with a compiler supported async+yield based method compiled into a builder, but surely if you were writing one manually you could (and it is desirable to do so) wind up with a single type that implements both (I mean that is effectively what any async supporting BCL stream-like type already is right? the methods are merely named something else).

bbarry on 29 Apr 2016

but you could implement both; except you'd have to choose which one was hidden as an explicit interface as you couldn't have two methods differing by return type. Which would mean you couldn't use a struct enumerator for both as one would have to go via interface.

A good reason to name the async version differently, e.g. GetAsyncEnumerator().

divega on 29 Apr 2016

I am trying to create a ValueTaskEnumerable<T> and ValueTaskEnumerableBuilder<T> to play with the spec proposal and am having trouble getting past:

The return type for an async iterator method must be a generic Tasklike Tasklike<T>.

~~I don't think this is true. The iterator method must be Tasklike<bool> right (we are talking about MoveNextAsync() right?)?~~

edit: oh nvmd me

bbarry on 29 Apr 2016

@bbarry The "async iterator method" is the one that's written

async IAsyncEnumerable<int> f()
{
}

This method f must have a return type IAsyncEnumerable<int> which is a _tasklike_. I'm re-using this word from the previous proposal. All this word means is that it has an attribute on it that points to the builder:

[Tasklike(typeof(MyAsyncEnumerableBuilder<>))]
interface IAsyncEnumerable<T>

The word _tasklike_ is a poor choice of word within this feature. It would be better renamed [HasFactory(...)] interface IAsyncEnumerable<T> or [HasBuilder(...)] interface IAsyncEnumerator<T>.

Note: the stuff in my spec describes the requirements of the builder type. What I've written is (I think) correct for an enumerator-builder, but incorrect for a "factory" (i.e. an enumerable-builder). I have to go back and revisit that.

ljw1004 on 29 Apr 2016

https://gist.github.com/bbarry/9ae25647d65ad82bfe218125087677ea

Seems to work. I haven't attempted to get too crazy (hand writing the compiler generated state machine gets old fast).

bbarry on 30 Apr 2016

@bbarry do you reckon we could juggle things around so

if you call an async iterator that returns an enumerable and iterate through it entirely on the hot path then you get zero allocations
if you call an async iterator that returns IAsyncEnumerable then there'll obviously be one heap allocation for the IAsyncEnumerable, but can it be done with zero extra?

Feel free to change the dance between state machine and builder to achieve this...

ljw1004 on 30 Apr 2016

I think that is possible. I have ValueTaskEnumerable as a class because I am assigning the state machine from Start, so I currently have an allocation in Create for that. There is a bit of complexity in remembering that I'm handling multiple mutable structs which I was unprepared for at 6pm on a Friday (I thought I had it written this way correctly at first but it was writing each yielded value to the screen twice when I ran it).

I was going to write a ValueTaskEnumerator and associated builder and try and refine the api first.

bbarry on 30 Apr 2016

I've implemented a prototype version of C# which supports _consumption_ of async enumerators and async enumerables, foreach (await var x in e). Install instructions are here:
https://github.com/ljw1004/roslyn/blob/features/async-return/docs/specs/feature%20-%20async%20iterators.md

(I'm on vacation until May 16th so won't start work on the _production_ side until then. First step, before implementing any production changes in the compiler, will be to figure out exactly what the builder and state machine should look like in order to be as efficient as possible.)

ljw1004 on 1 May 2016

Upon further thought I don't think it is possible for zero allocations (for a compilation produced machine). One would need type information for the state machine in the enumerator in order to hold the state between calls of MoveNext, but I don't know the name of the type.

Essentially I think I need to write something like this:

public static async ValueTaskEnumerable<int, *> YieldingStuff()
{
  await Task.Delay(1000);
  yield return 1;
  await Task.Delay(1000);
  yield return 2;
}

Which could be compile time transformed to ValueTaskEnumerable<int, Cg3>.

On the bright side I am fairly certain I can do it with vNext generators (#5561):

public static ValueTaskEnumerable<int, Af.Enumerator> AfYieldingStuff()
{
  Af.Await(Task.Delay(1000));
  Af.Yield(1);
  Af.Await(Task.Delay(1000));
  Af.Yield(2);
  return Af.Break<ValueTaskEnumerable<int, Af.Enumerator>>();
}

and generate the necessary method and state machine pre-compilation.

bbarry on 1 May 2016

@bbarry I imagined something like a generic class MyEnumerableBuilder so the state machine struct can be embedded as a field in the builder. I wonder if that has any mileage?

ljw1004 on 1 May 2016

I think you need that to build the right type (at least a generic method that creates the correct tasklike...) but you still need the type information somehow in the ValueTaskEnumerable<> and/or ValueTaskEnumerator<> types in order to hold references to them in the containing methods, don't you?

bbarry on 1 May 2016

For the zero-allocation version, I'm imagining a single struct ValueTaskEnumerator<> which has a field which is the struct builder, which in turn has a field which is the struct state machine. Or some other arrangement of the last two.

For the single-allocation version (which returns an IAsyncEnumerable), I'm imagining that it returns a class which implements both IAsyncEnumerable and IAsyncEnumerator, and where GetEnumerator() just returns this if it's the first enumerator request of it, and the class has a field which is the struct builder, which in turn has a field which is the struct state machine.
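A rough sketch of that single-allocation shape, for illustration only. The two interfaces are cut-down stand-ins for the ones discussed in this thread, and CountingSequence is a hypothetical hand-written type whose counter plays the part of the compiler-generated state machine (the builder is left out entirely). The only point it tries to show is that the first GetEnumerator() call can hand back the same heap object.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

namespace Sketch
{
    // Cut-down stand-ins for the interfaces discussed in this thread.
    public interface IAsyncEnumerator<T>
    {
        Task<bool> MoveNextAsync();
        T Current { get; }
    }

    public interface IAsyncEnumerable<T>
    {
        IAsyncEnumerator<T> GetEnumerator();
    }

    // Hypothetical type: one heap object acts as both enumerable and enumerator.
    public sealed class CountingSequence : IAsyncEnumerable<int>, IAsyncEnumerator<int>
    {
        private readonly int _count;
        private int _claimed; // 0 until this instance has been handed out as an enumerator
        private int _next;

        public CountingSequence(int count) { _count = count; }

        public int Current { get; private set; }

        public IAsyncEnumerator<int> GetEnumerator()
        {
            // The first request reuses this instance; later requests allocate a fresh one.
            if (Interlocked.CompareExchange(ref _claimed, 1, 0) == 0)
            {
                return this;
            }
            return new CountingSequence(_count) { _claimed = 1 };
        }

        public async Task<bool> MoveNextAsync()
        {
            if (_next >= _count)
            {
                return false;
            }
            await Task.Yield(); // stand-in for real asynchronous work
            Current = _next++;
            return true;
        }
    }

    public static class Program
    {
        public static async Task Main()
        {
            var xs = new CountingSequence(3);
            var e = xs.GetEnumerator(); // same object as xs, so no second allocation
            while (await e.MoveNextAsync())
            {
                Console.WriteLine(e.Current);
            }
        }
    }
}
```

The Interlocked.CompareExchange is an assumption about how "first enumerator request" would be detected safely; the existing sync iterators use a similar trick, but the async design could well differ.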

ljw1004 on 1 May 2016

I've updated the implementation I wrote of ValueTaskEnumerable to explicitly implement the interfaces alongside the patterns and separated out the Enumerator implementation and builder.

The interfaces I am implementing are:

public interface IAsyncEnumerable<out T>
{
  IAsyncEnumerable<T> ConfigureAwait(bool continueOnCapturedContext);
  IAsyncEnumerator<T> GetEnumerator();
  IAsyncEnumerable<T> WithCancellationToken(CancellationToken token);
}

public interface IAsyncEnumerator<out T>
{
  IAsyncEnumerator<T> ConfigureAwait(bool continueOnCapturedContext);
  T Current { get; }
  ConfiguredTaskAwaitable<bool> MoveNextAsync();
  IAsyncEnumerator<T> SetCancellationToken(CancellationToken token);
}

Notes (in no particular order):

It still allocates the enumerator because the only alternative is to make the compiler-generated type of the state machine known as a field in the enumerator. As near as I can tell some sort of additional language support would be necessary to make that happen. Allocation doesn't need to happen inside the loop on a non-awaited hot path (talking about allocations of Task<bool>...); these allocations _do_ happen in the explicit IAsyncEnumerator usage (though I believe they are cached instances in Task.FromResult).
@jaredpar is right.
I haven't found any need for the stateMachine parameter on YieldValue
Adding type safety around calling ConfigureAwait and WithCancellationToken such that you cannot call them more than once is possible but annoying.
Writing them as extension methods doesn't seem to provide any benefits and makes the code significantly more complex.
It would be really nice if ConfiguredValueTaskAwaitable could be implicitly cast to ConfiguredTaskAwaitable.
I have no idea what needs to be in the compiler generated SetStateMachine method
What if cancellation was implicit to the state machine?
Producing an "enumerable" [Q2] seems pointless beyond the fact that it more closely matches the pattern currently used in foreach.
Consuming an "enumerator" implies that the enumerator type IDisposable implementation must be allowed to be called more than once.
I really don't like copying the implementation between the pattern implementation and interface implementation for MoveNextAsync.

https://gist.github.com/bbarry/9ae25647d65ad82bfe218125087677ea

bbarry on 2 May 2016

For science, here's something that is zero allocations and almost works (I bet I could make it work with a bit of effort) but demonstrates what I mean about needing the state machine exposed: https://gist.github.com/bbarry/0fca79b6ac8f9ea642a768024560aaa5

bbarry on 2 May 2016

Are async iterator methods even needed as a language feature? I believe it is possible to write them as a library solution today. Like this:

public static IAsyncEnumerable<T2> Select<T1, T2>(IAsyncEnumerable<T1> items, Func<T1, Task<T2>> selector) {
  var yieldSource = new YieldSource<T2>(); //design like TCS
  var sequenceTask = SelectCoreAsync(items, selector, yieldSource.YieldConsumer);
  return yieldSource.GetEnumerable(sequenceTask);
}

static async Task SelectCoreAsync<T1, T2>(
       this IAsyncEnumerable<T1> items,
       Func<T1, Task<T2>> selector,
       AsyncYieldConsumer<T2> yieldConsumer) {
  foreach (await var item in items) {
   var result = await selector(item);
   await yieldConsumer.Yield(result); //library-based yield
  }
}

Seems clean enough. It does not accomplish all design goals but a lot of them at zero language cost.

YieldSource is a library-provided helper class. It should not be too hard to write that. This design is a "push pull adapter" but thanks to async there is no need for background threads. Basically, every yieldConsumer.Yield call deposits the value and completes the current outstanding MoveNextAsync task. YieldSource only needs to buffer a single item at a time.

To clarify, this proposal is only about the production side. I do not question async foreach for the consumption side.
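To make the idea concrete, here is a minimal sketch of what such a helper might look like. It is an assumption-laden simplification: it folds YieldSource, the yield consumer and the enumerator into one hypothetical class, supports exactly one producer and one consumer, and does not propagate producer exceptions or cancellation. Unlike the description above, Yield here waits for an outstanding MoveNextAsync before depositing the value, which keeps the buffer at a single item.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical single-producer, single-consumer rendezvous; not an existing library type.
public sealed class YieldSource<T>
{
    private readonly SemaphoreSlim _demand = new SemaphoreSlim(0); // consumer asked for an item
    private readonly SemaphoreSlim _supply = new SemaphoreSlim(0); // producer answered
    private bool _completed;

    public T Current { get; private set; }

    // Producer side: wait for an outstanding MoveNextAsync, deposit the value, complete it.
    public async Task Yield(T value)
    {
        await _demand.WaitAsync();
        Current = value;
        _supply.Release();
    }

    // Producer side: answer the next MoveNextAsync with "no more items".
    public async Task Complete()
    {
        await _demand.WaitAsync();
        _completed = true;
        _supply.Release();
    }

    // Consumer side: request one item and wait for the producer's answer.
    public async Task<bool> MoveNextAsync()
    {
        if (_completed)
        {
            return false;
        }
        _demand.Release();
        await _supply.WaitAsync();
        return !_completed;
    }
}

public static class Program
{
    private static async Task ProduceAsync(YieldSource<int> source)
    {
        for (var i = 1; i <= 3; i++)
        {
            await Task.Delay(10);
            await source.Yield(i * i); // library-based yield
        }
        await source.Complete();
    }

    public static async Task Main()
    {
        var source = new YieldSource<int>();
        var producer = ProduceAsync(source);
        while (await source.MoveNextAsync())
        {
            Console.WriteLine(source.Current);
        }
        await producer;
    }
}
```

No background thread is involved: the producer and consumer simply take turns completing each other's awaits.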

GSPP on 7 May 2016

@GSPP do you not need an initial IAsyncEnumerable<T> to pass in to both of those methods?

bbarry on 7 May 2016

@bbarry it can be implemented manually by deriving from that interface. There will not be async enumerator CLR support. You can do anything manually that the C# 7 compiler otherwise would generate for you.

GSPP on 7 May 2016

Are async iterator methods even needed as a language feature?

@bbarry While it can be done in current C#, adding language support could make it a lot easier to implement it efficiently -- the same sorts of gains you'd see with ContinueWith vs async/await.

scalablecory on 7 May 2016

@bbarry That thing about "needing the state machine exposed" will, I think, sink the idea of having a non-allocating iterator. Here's why... @CyrusNajmabadi

ValueIterator<int> f() { yield return 1; yield return 2; }

foreach (var x in f()) { Console.WriteLine(x); }

The above code is a simple case of a non-allocating iterator. It relies upon f() returning a value-type, and we subsequently call MoveNext() upon it. But the compiler has to emit the body of MoveNext() somewhere. It can only ever emit this body into a compiler-generated type, struct fStateMachine.

_Therefore, ValueIterator<int> must contain that state-machine as a field._

@bbarry achieved this by actually having the method return ValueIterator<int,fStateMachine>. But this is weird, because the compiler-generated state machine type ends up being exposed in the signature. We couldn't allow this. (Indeed, the developer can't even utter the name of the state machine type).
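For anyone who wants to see why the state machine type leaks into the signature, here is a hand-written sketch. Everything in it is hypothetical: IIteratorStateMachine<T> is an invented interface, and FStateMachine is a manual stand-in for the struct the compiler would generate for f().

```csharp
using System;

// Invented interface so the wrapper can call into the state machine without boxing.
public interface IIteratorStateMachine<T>
{
    bool MoveNext();
    T Current { get; }
}

// The wrapper stores the state machine inline, so it must know its concrete type.
public struct ValueIterator<T, TStateMachine>
    where TStateMachine : struct, IIteratorStateMachine<T>
{
    private TStateMachine _machine;

    public ValueIterator(TStateMachine machine) { _machine = machine; }

    // foreach is pattern-based, so these three members are all it needs.
    public ValueIterator<T, TStateMachine> GetEnumerator() { return this; }
    public bool MoveNext() { return _machine.MoveNext(); }
    public T Current { get { return _machine.Current; } }
}

// Manual stand-in for the compiler-generated struct of
// "yield return 1; yield return 2;".
public struct FStateMachine : IIteratorStateMachine<int>
{
    private int _state;

    public int Current { get; private set; }

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0: Current = 1; _state = 1; return true;
            case 1: Current = 2; _state = 2; return true;
            default: return false;
        }
    }
}

public static class Program
{
    // The second type argument is the part a developer could not normally name.
    private static ValueIterator<int, FStateMachine> F()
    {
        return new ValueIterator<int, FStateMachine>(new FStateMachine());
    }

    public static void Main()
    {
        foreach (var x in F())
        {
            Console.WriteLine(x);
        }
    }
}
```

Dropping the TStateMachine parameter would force the field to be an interface or object reference, which brings the allocation back.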

The only other solution I can imagine is something like anonymous types...

private var f() { yield return 1; yield return 2; }

The compiler will say: "I see the method returns var and has yield return in it. Therefore I will construct a compiler-generated value-type enumerable. The user can either cast it to IEnumerable<int>, or can foreach over it directly to avoid all allocation." The method would have to be either private or a local function to allow this. It would unfortunately have to rely upon type inference, and we'd need to solve circular cases:

private var f() { yield return g().First(); } // infers element type "int"
private var g() { yield return 1; }

private var f2() { yield return g2().First(); }
private var g2() { yield return f2().First(); } // error: circular

I guess the compiler-generated type would indeed be ValueEnumerable<T,TStateMachine> just as @bbarry suggested. This type would have to exist in the framework and be blessed by the compiler. There'd be no way for the compiler to use a different type other than ValueEnumerable.

But you wouldn't even be able to write efficient LINQ extension methods on it because only VB (not C#) allows the receiver of an extension method to be ref...

T First<T,TSM>(ref ValueEnumerable<T,TSM> enumerable)

And what about async iterators? Once again, the compiler would have to tie the feature to one concrete value-type-async-iterator from the framework.

ljw1004 on 10 Jun 2016

The method would have to be either private or a local function to allow this.

This would also be extremely unfortunate as you would not be able to use this for types you want to be publicly enumerated in a non-allocating fashion.

It can only ever emit this body into a compiler-generated type struct fStateMachine

You hit the nail on the head. If we wanted to allow these methods to be public, then we'd have to allow something like the following (syntax not final):

``` c#
public struct Enumerator GetValues() { yield ... };
```

or, alternatively

``` c#
class C
{
    public struct enum Enumerator;
    public Enumerator GetValues() { yield; }
}
```

In both these cases, the effective idea is to be able to name the struct that will hold the state machine. In either case, you are not allowed to supply any body for the type. Instead, you are simply naming it, and stating what accessibility/scope it would have.

The compiler would then fill this type when necessary. It would then be an error to reference this type more than once as the return type of a yielding method.

This would work, but would unfortunately still not solve the Linq problem. Now you have a uniquely named type for each yielding scenario. There would be no way to write a consistent set of linq methods that could operate on it...

_unless_ we had these structs implement some special interface that they then explicitly implemented.

The linq methods would be defined like:

First<T>(this ref T value) where T : SomeSpecialInterface

But, as lucian mentioned, we can't use ref in extension methods in C#.

However, i think this scenario is pretty compelling. So maybe we could relax that C# restriction and put it more in line with VB.

CyrusNajmabadi on 10 Jun 2016

@ljw1004, is the implementation of the interfaces required or can it be pattern based?

paulomorgado on 14 Jun 2016

@paulomorgado I think it should be entirely pattern based

ljw1004 on 14 Jun 2016

👍2

Just a little _bump_ to see where this is and how we can get some progress?

clairernovotny on 1 Aug 2016

👍1

I've been working on other things and am starting vacation tomorrow. I'm planning to get back to it in September.

ljw1004 on 2 Aug 2016

👍3

@aelij wrote a sampler for AsyncEnumerable using arbitrary async returns https://arbel.net/2016/08/10/n-async-the-next-generation/

benaadams on 10 Aug 2016

Okay, it's September now, so as promised I'm getting back to async streams. My current plan of attack is:

1. Gather together as many "stakeholders" as possible, i.e. folks who have APIs or frameworks or whatever that might work nicer with async streams
2. Concurrently, continue work with my prototype to do the "production" side. As before, it will be a flexible prototype that supports a _superset_ of the different possible approaches, so we can evaluate them all.
3. Have me and others implement [1] on top of [2] in a variety of ways. See which way wins.

Things to make sure we answer with a definite yes or no (and if yes, then with working code):

IAsyncDisposable (required to allow await inside finally blocks of async iterators)
zero-allocation (as per the above discussion, seems impossible without some goofy language features)
where do async streams get cancellation-token and ConfigureAwait
if "we" provide IAsyncEnumerable, who is "we", and where does it go?
can we do IAsyncOperationWithProgress or IAsyncActionWithProgress

So far for [1] I have EF and RX and BCL streams/files (e.g. @NickCraver above) on my list. If anyone has other ideas of APIs that you think would benefit from async streams, please let me know.

ljw1004 on 1 Sep 2016

👍2

Yep, my work with Google Cloud APIs would benefit. We're currently using System.Interactive.Async and IAsyncEnumerable, if that's relevant. Feel free to include me in any emails, and I'd be happy to join a Hangout or Skype call.

jskeet on 1 Sep 2016

@ljw1004 Service Fabric reliable collections expose yet another async enumerable.

aelij on 1 Sep 2016

@ljw1004 Another area which might benefit from async enumerables is filesystem access APIs. Right now Directory.EnumerateFiles returns just a synchronous IEnumerable<string>, while StorageFolder.GetFilesAsync either returns the whole list asynchronously or lets you fetch files asynchronously in batches.

I think, async enumeration fits perfectly here.

vladd on 1 Sep 2016

👍1

An important use case would be async LINQ and async PLINQ. Even if just Where, Select and ToList/Array are supported that's a big win. It's rare to order, group or join by a key that's obtained asynchronously.

GSPP on 1 Sep 2016

@ljw1004 I've long used Ix-Async (now System.Interactive.Async) in my CSV/etc. parsing library.

I've also experimented making my own "performant" Ix-Async (it is quite chatty) by having a chunked extension to IAsyncEnumerable to pull down multiple items at once, as is the case with most I/O. I unfortunately don't have this code anymore, but with ValueTask that complexity may no longer be prudent.

scalablecory on 2 Sep 2016

@vladd unfortunately the underlying win32 filesystem apis are not async.

scalablecory on 2 Sep 2016

there may also be some crossover here with the "channels" stuff that David Fowler is looking at; this is _primarily_ IO focused - essentially it reverses the direction of Stream from pull to push (with async goodness). Perhaps viable as a potential "source" for an async iterator

mgravell on 2 Sep 2016

@mgravell not to be dense here, but isn't Rx already the opposite of Ix? That was the mathematical design. The channels could probably produce IObservable sequences, right?

clairernovotny on 2 Sep 2016

Recap of async streams

We propose to do a language feature and associated library support for C#8 to support async streams. An async stream is a sequence of values where you might need to await the next value.

0. Async stream scenarios

I like streams in general for reading files and communicating over pipes. I wonder if this language feature will become a preferred alternative to the normal .NET stream types, in cases where you want to communicate records rather than bytes?
RX
IX
Azure ServiceFabric
.NET BCL APIs, e.g. enumerate directories? what others?
BufferBlock?
IAsyncOperationWithProgress? IAsyncActionWithProgress?
... ?

1. Language: consumption of async streams

Here's a strawman:

IAsyncEnumerable<int> xs = GetStream();
foreach (await var x in xs) { ... }

// Expansion:
{
    var e = xs.GetEnumerator();
    try {
        while (await e.MoveNextAsync()) {
            T x = e.Current;
            ...
        }
    }
    finally {
        (e as IDisposable).Dispose(); // plus a few idiosyncrasies here
    }
}

I've suggested the syntax foreach (await var x in xs) but this is up for discussion
I've suggested it should operate upon an IAsyncEnumerable but it's up for discussion whether it should operate upon an IAsyncEnumerator
Presumably it should all be pattern-based, so it works with any xs that satisfies a pattern
Up for discussion how .ConfigureAwait(false) will fit in. I assume xs.ConfigureAwait(false), but this means ConfigureAwait can't be an extension method on IAsyncEnumerable.
Up for discussion how CancellationToken will work: as an argument to GetStream(), or GetEnumerator(), or MoveNext().

2. Language: production of async streams

Here's a strawman:

async IAsyncEnumerable<int> GetStream()
{
   yield 1;
   await Task.Delay(1, async.CancellationToken);
   yield 2;
}

This should be pattern-based to return any suitable type, similar to how normal async methods can return any tasklike type as of C#7. The language team will define the pattern that _builders_ must satisfy, and library authors will implement that pattern to enable their type to be used as the return type from an async iterator method.
Can you have await inside a finally block for an async stream? -- this would require us to solve IAsyncDisposable and I don't like it...
Up for discussion whether to require the async modifier
Can you have both yield and return inside, e.g. for IAsyncOperationWithProgress<T,U>?
Where does cancellation get in? I propose a contextual keyword async inside the method, which binds to a pattern-based part of the builder, and the builder can choose which methods/properties to expose via this keyword.
Can we implement a zero-heap-allocation version of this? -- No, per discussion earlier in this thread, at least not without a serious language design.
Can folks use this for plain (non-async) iterators? -- up for discussion
Should it warn if you lack await operators in the method? -- hopefully not!
In VB, where iterators can be used as lambdas, we'll need some more design work for type inference. In C#, where iterators can be used as local functions, we'll have to verify that it works.

3. Async stream pattern

Here's my strawman of an example type that satisfies the async stream pattern for consumption. I've written it suggestively as an interface, but if it's all pattern-based then you can completely bypass this interface while still fulfilling the pattern.

interface IAsyncEnumerable<T>
{
   IAsyncEnumerator<T> GetEnumerator(CancellationToken cancel = default(CancellationToken));
   IAsyncEnumerable<T> ConfigureAwait(bool b);
}

interface IAsyncEnumerator<T>
{
   IAsyncEnumerator<T> ConfigureAwait(bool b);
   T Current {get;}
   ConfiguredTaskAwaitable<bool> MoveNextAsync();
}

4. Library: .NET implementation of `IAsyncEnumerable`

Where will the .NET implementation of IAsyncEnumerable live? and its builder?

IX have implemented the type and most of the LINQ operators on it.
Azure ServiceFabric has a definition of the interface too.
If it's a core part of .NET, will it have to move into System ? (Which team will own it?)

5. Library: LINQ operators on `IAsyncEnumerable`

We'll presumably want LINQ operators. I hope they will be fast enough. I don't know which ones.

IAsyncEnumerable<int> xs;
IEnumerable<int> ys;

xs = xs.Select(x => x + await t);
xs = ys.Select(x => x + await t); // ???

6. Library: `IObservable`

I'd expect this feature to be a handy way to produce IObservables. Is there anything here for consuming them as well?

ljw1004 on 6 Sep 2016

👍2 ❤1 🎉1

Please consider a ValueTask type for the return of MoveNext()

That's not necessary. If MoveNext completes synchronously, it can return a cached Task<bool> (which is what async methods already do if they return bool and complete synchronously). And if it completes asynchronously, a task would still need to be allocated, even with ValueTask. That actually then ends up making ValueTask a more expensive choice, as it'll always be handing back both a T and a Task<T> (so more data being moved), plus it'll likely end up slightly bloating whatever builder it's stored in.
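As a small illustration of the synchronous case, here is a hypothetical enumerator over data that is already in memory. BufferedEnumerator is an invented name; it is not an async method, it just hands back one of two shared Task<bool> instances, so no task is allocated per element.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical enumerator whose data is always available synchronously.
public sealed class BufferedEnumerator
{
    // Shared completed tasks, reused for every call.
    private static readonly Task<bool> s_true = Task.FromResult(true);
    private static readonly Task<bool> s_false = Task.FromResult(false);

    private readonly int[] _buffer;
    private int _index = -1;

    public BufferedEnumerator(int[] buffer) { _buffer = buffer; }

    public int Current { get { return _buffer[_index]; } }

    public Task<bool> MoveNextAsync()
    {
        if (_index + 1 < _buffer.Length)
        {
            _index++;
            return s_true;
        }
        return s_false;
    }
}

public static class Program
{
    public static async Task Main()
    {
        var e = new BufferedEnumerator(new[] { 10, 20 });
        while (await e.MoveNextAsync())
        {
            Console.WriteLine(e.Current);
        }
    }
}
```

A real enumerator would fall back to an allocated task only when it actually has to wait for data.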

stephentoub on 6 Sep 2016

👍1

Up for discussion how .ConfigureAwait(false) will fit in. I assume xs.ConfigureAwait(false), but this means ConfigureAwait can't be an extension method on IAsyncEnumerable.

@ljw1004, I didn't understand this part. I do not think any of the ConfigureAwait stuff should be part of the interface. If foreach is pattern based, then ConfigureAwait can be an extension method on IAsyncEnumerable, providing its own MoveNextAsync/Current members, with MoveNextAsync returning a ConfiguredTaskAwaitable, and the foreach will bind to that. You'd have something like:

``` C#
public static ConfiguredAsyncEnumerable<T> ConfigureAwait<T>(this IAsyncEnumerable<T> source, bool continueOnCapturedContext) { ... }
...
public struct ConfiguredAsyncEnumerable<T> // optionally but not required to implement interface
{
    public ConfiguredAsyncEnumerator<T> GetEnumerator() { ... }
}

public struct ConfiguredAsyncEnumerator<T> // optionally but not required to implement interface
{
    public ConfiguredTaskAwaitable<bool> MoveNextAsync() { ... }
    public T Current { get { ... } }
}
```
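A filled-in version of that shape, as a sketch under assumptions: the interfaces are cut-down stand-ins declared locally, Countdown is a hypothetical source, and the loop in Main is the hand-written expansion a pattern-based foreach would presumably produce. The wrapper just forwards to Task<bool>.ConfigureAwait.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

namespace Sketch
{
    // Cut-down stand-ins for the interfaces discussed in this thread.
    public interface IAsyncEnumerator<T> { Task<bool> MoveNextAsync(); T Current { get; } }
    public interface IAsyncEnumerable<T> { IAsyncEnumerator<T> GetEnumerator(); }

    public struct ConfiguredAsyncEnumerable<T>
    {
        private readonly IAsyncEnumerable<T> _source;
        private readonly bool _continueOnCapturedContext;

        public ConfiguredAsyncEnumerable(IAsyncEnumerable<T> source, bool continueOnCapturedContext)
        {
            _source = source;
            _continueOnCapturedContext = continueOnCapturedContext;
        }

        public ConfiguredAsyncEnumerator<T> GetEnumerator()
        {
            return new ConfiguredAsyncEnumerator<T>(_source.GetEnumerator(), _continueOnCapturedContext);
        }
    }

    public struct ConfiguredAsyncEnumerator<T>
    {
        private readonly IAsyncEnumerator<T> _inner;
        private readonly bool _continueOnCapturedContext;

        public ConfiguredAsyncEnumerator(IAsyncEnumerator<T> inner, bool continueOnCapturedContext)
        {
            _inner = inner;
            _continueOnCapturedContext = continueOnCapturedContext;
        }

        public T Current { get { return _inner.Current; } }

        // The configured awaitable is what the loop ends up awaiting.
        public ConfiguredTaskAwaitable<bool> MoveNextAsync()
        {
            return _inner.MoveNextAsync().ConfigureAwait(_continueOnCapturedContext);
        }
    }

    public static class AsyncEnumerableExtensions
    {
        public static ConfiguredAsyncEnumerable<T> ConfigureAwait<T>(
            this IAsyncEnumerable<T> source, bool continueOnCapturedContext)
        {
            return new ConfiguredAsyncEnumerable<T>(source, continueOnCapturedContext);
        }
    }

    // Hypothetical source that yields start, start - 1, ..., 1.
    public sealed class Countdown : IAsyncEnumerable<int>, IAsyncEnumerator<int>
    {
        private int _remaining;

        public Countdown(int start) { _remaining = start + 1; }

        public int Current { get { return _remaining; } }

        public IAsyncEnumerator<int> GetEnumerator() { return this; }

        public async Task<bool> MoveNextAsync()
        {
            await Task.Yield(); // stand-in for real asynchronous work
            return --_remaining > 0;
        }
    }

    public static class Program
    {
        public static async Task Main()
        {
            // Hand-written expansion of a pattern-based async foreach.
            var e = new Countdown(3).ConfigureAwait(false).GetEnumerator();
            while (await e.MoveNextAsync())
            {
                Console.WriteLine(e.Current);
            }
        }
    }
}
```

Nothing here touches the interfaces themselves, which is the argument for keeping ConfigureAwait out of them.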

stephentoub on 6 Sep 2016

👍2

Up for discussion whether to require the async modifier

I'm likely in the minority, but I'd actually like to see us:

Add an iterator keyword
Allow but not require iterator on iterators (not require for reasons of compat)
Require "async iterator" on async iterators

Then it's clear just from looking at the signature what kind of transformation is going to be done.

iterator => sync iterator
async => async method
async iterator => async iterator

I personally get annoyed with iterators in C# today that I need to scour the method's implementation looking for yields to know whether this is a method the compiler is going to transform or not.

Where does cancellation get in? I propose a contextual keyword async inside the method, which binds to a pattern-based part of the builder, and the builder can choose which methods/properties to expose via this keyword

This is useful, but it's also orthogonal to async streams and I believe should be discussed separately. It provides similar value for arbitrary async return types of regular async methods, not just iterators. It's also not specific to cancellation. For example, the canonical example we've used is returning an IAsyncOperationWithProgress, in which case you'd like this async-exposed object to provide both a CancellationToken for cancellation (triggered by a consumer calling IAsyncOperationWithProgress.Cancel()) as well as an IProgress<TProgress> for outputting progress reports.

Can you have both yield and return inside, e.g. for IAsyncOperationWithProgress?

I would prefer to see this example handled via the previously discussed async keyword allowed in the body. I would prefer not to see both of these allowed in the body; seems too subtle and confusing, plus extra complication for the builder to be able to handle both.

Should it warn if you lack await operators in the method? -- hopefully not!

How does this differ from regular async methods? Are you proposing removing the warning from there, too?

I've suggested the syntax foreach (await var x in xs)

Up for discussion how CancellationToken will work: as an argument to GetStream(), or GetEnumerator(), or MoveNext()

My preference would be for it to be embedded into the enumerator. While there may be a few niche/corner-case scenarios where passing a token to MoveNext such that it can be changed on each call would be desirable, it'd potentially be a huge performance hit to support that (e.g. registration for cancellation per MoveNext rather than once for the whole iteration), and it does not seem like a good tradeoff to me. By building the token into the enumerator, you allow for any operation on the enumerator to be canceled while being able to amortize any associated costs for cancellation across the whole iteration.

If it's a core part of .NET, will it have to move into System ? (Which team will own it?)

It could be added to System.Threading.Tasks.Extensions, which is already a standalone NuGet package that works across various versions. Or we could introduce a separate package for it. But I'm not seeing why it would need to be pushed into a core library like mscorlib.

We'll presumably want LINQ operators. I hope they will be fast enough. I don't know which ones.

"which ones" has a few parts to it... it's not just which operators, but also which combination of sync/async with each operator, e.g. for the single Select method we have today of the form:

``` C#
public static IEnumerable<U> Select<T,U>(this IEnumerable<T> source, Func<T, U> selector);
```

introducing async enumerables means potentially adding multiple forms for this one overload:

``` C#
public static IAsyncEnumerable<U> SelectAsync<T,U>(this IEnumerable<T> source, Func<T, ValueTask<U>> selector);
public static IAsyncEnumerable<U> SelectAsync<T,U>(this IAsyncEnumerable<T> source, Func<T, U> selector);
public static IAsyncEnumerable<U> SelectAsync<T,U>(this IAsyncEnumerable<T> source, Func<T, ValueTask<U>> selector);
```

and that's just for an overload that takes a single source and a single delegate. The potential combinations explode if you start considering allowing some delegates in an overload to be async or not, to allow methods that operate on multiple enumerables to mix and match whether those are sync or async, etc.

Whatever is done, I suggest it be its own standalone library in corefx, e.g. System.Linq.Async, separate from the definitions of the interfaces, and a package re-usable across all relevant platforms.

stephentoub on 6 Sep 2016

👍4

It could be added to System.Threading.Tasks.Extensions, which is already a standalone NuGet package that works across various versions. Or we could introduce a separate package for it. But I'm not seeing why it would need to be pushed into a core library like mscorlib.

Wouldn't it need to go into mscorlib to enable adding APIs that expose async iterators to types defined there? Otherwise, it seems to already have a home in System.Interactive.Async as the de facto location and implementation of the operators. The assembly name and namespaces could be updated if it's better to use something else.

clairernovotny on 6 Sep 2016

Wouldn't it need to go into mscorlib to enable adding APIs that expose async iterators to types defined there?

Only if there were existing mscorlib types to which we wanted to add such APIs. And if/when that happens, then the types can be type-forwarded down. I do not believe we should do that from the get-go. We already have many "mscorlibs", for desktop, for coreclr, for .NET Native, for Xamarin, for Unity, ... adding these as a separate package basically makes it "just work" for all of them, and for previous versions already shipped. Then later if/when we want an API added, we can factor in the work necessary.

stephentoub on 6 Sep 2016

👍3

@ljw1004 What happens if I await an async sequence that never returns?
What benefits would language support bring rather than libraries?

AdamSpeight2008 on 6 Sep 2016

That's not necessary. If MoveNext completes synchronously, it can return a cached Task (which is what async methods already do if they return bool and complete synchronously). And if it completes asynchronously, a task would still need to be allocated, even with ValueTask.

Note for interested readers: the builder is where this code would be. An async method a user might write would be something like this:

async IAsyncEnumerable<int> Foo() {
    yield 1;
    yield 2;
}

the builder for this would create a state machine:

public Task<bool> MoveNextAsync() {
    switch (currentstate) {
    case 0: //starting
      value = 1;
      currentstate = 1;
      return Task.FromResult(true); //implementations of the builder need to do the right thing here
      break; 
...

This (combined with the fact that you couldn't have a zero allocations implementation overall without considerable language support) is why ValueTask<T> isn't really helpful for IAsyncEnumerable<T>.

edit: I did write an implementation above using ValueTask and it is rather pointless (as it must allocate the enumerator and can use cached tasks for the MoveNext results).
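To make the cached-task point concrete, here is a small sketch, assuming a hypothetical hand-written `TwoValueEnumerator` in place of the compiler-generated state machine. Every synchronously completing call hands back one of two preallocated `Task<bool>` instances, so `MoveNextAsync` itself allocates nothing:

```csharp
using System.Threading.Tasks;

namespace CachedTaskSketch
{
    // Hypothetical hand-written enumerator over the values 1 and 2.
    public sealed class TwoValueEnumerator
    {
        // Allocated once and shared by every synchronously completing call.
        private static readonly Task<bool> s_true = Task.FromResult(true);
        private static readonly Task<bool> s_false = Task.FromResult(false);

        private int _state;

        public int Current { get; private set; }

        public Task<bool> MoveNextAsync()
        {
            switch (_state)
            {
                case 0: // starting
                    Current = 1;
                    _state = 1;
                    return s_true;
                case 1: // after the first yield
                    Current = 2;
                    _state = 2;
                    return s_true;
                default: // sequence finished
                    return s_false;
            }
        }
    }
}
```

The enumerator object is still a heap allocation, which is the remaining cost the comment above refers to.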

bbarry on 6 Sep 2016

@stephentoub,

I'm likely in the minority, but I'd actually like to see us:

Add an iterator keyword

Allow but not require iterator on iterators (not require for reasons of compat)

Require "async iterator" on async iterators

Then it's clear just from looking at the signature what kind of transformation is going to be done.

iterator => sync iterator

async => async method

async iterator => async iterator

I personally get annoyed with iterators in C# today that I need to scour the method's implementation looking for yields to know whether this is a method the compiler is going to transform or not.

👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍

I pushed for that for C#!

paulomorgado on 6 Sep 2016

👎2 👍1

yeah, i'm definitely not a fan of that. I prefer to not force ceremony when coding. I would have preferred to not have 'async' either, but the desire to have things like "await (expr)" unambiguously parse overrode that hope.

Ceremony is ok if it means that the general code inside will be simpler and cleaner. If, however, there are no ambiguity problems, then i would not want any ceremony.

CyrusNajmabadi on 7 Sep 2016

👍2

Note, IMO, that syntactic ship has likely sailed. C# has definitely (for many releases now) erred on the side of brevity. VB has been the language to use if you prefer a more verbose approach to the structure of your code. I do not envision any reason why the next version of C# or VB would change this. It's really one of the few reasons you would pick one of these languages over the other.

CyrusNajmabadi on 7 Sep 2016

👍1

Since you brought up VB, @CyrusNajmabadi, the point I'm making (and, I guess, @stephentoub) is that C# is all about "there is no magic". async tells you that "This is not really a method! The compiler will just let you write this as if it were and will do all the tedious work for you.". The same for iterators, except that you need to read the code to find out that it's an iterator.

paulomorgado on 7 Sep 2016

The same for iterators, except that you need to read the code to find out that it's an iterator.

Yes. For example, take this code:

``` C#
public IEnumerable<Result> GetResults(string key)
{
    if (key == null)
    {
        throw new ArgumentNullException(nameof(key));
    }
    ...
```

Just from reading the beginning of the method, I can't tell you whether passing in null will end up throwing an exception or not, which is very counterintuitive given that in any other unannotated method, I could definitively tell you that.  If it were instead:

``` C#
public iterator IEnumerable<Result> GetResults(string key)
{
    if (key == null)
    {
        throw new ArgumentNullException(nameof(key));
    }
    ...
```

I can see just from the signature that there's something special here and that the normal rules don't apply.

I realize the language is never going to enforce adding such a keyword to iterators; that would be a gigantic breaking change. But with it available as an option, we can have our own style enforcement to ensure that it's used. And as we're introducing new kinds of methods (e.g. async iterators), we can make it available (and, ideally from my perspective, required).
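For readers who have not run into this, a small self-contained sketch of the behavior in question. `DeferredValidation` and the `"-result"` payload are made up for illustration; the deferred execution of iterator bodies is standard C# behavior:

```csharp
using System;
using System.Collections.Generic;

namespace DeferredValidationSketch
{
    public static class DeferredValidation
    {
        // Compiled as an iterator because of the yield below: nothing in the body,
        // including the null check, runs until the first MoveNext call.
        public static IEnumerable<string> GetResults(string key)
        {
            if (key == null)
            {
                throw new ArgumentNullException(nameof(key));
            }
            yield return key + "-result";
        }

        // Returns true if the exception surfaces only once enumeration starts.
        public static bool ThrowsOnlyWhenEnumerated()
        {
            IEnumerable<string> results = GetResults(null); // does not throw here
            try
            {
                foreach (string unused in results) { }
                return false;
            }
            catch (ArgumentNullException)
            {
                return true;
            }
        }
    }
}
```

Delete the `yield return` and replace it with an ordinary `return`, and the same null check throws at the call site, which is why the signature alone is not enough to predict the behavior.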

stephentoub on 7 Sep 2016

👍3

Then it's clear just from looking at the signature what kind of transformation is going to be done.

I'm curious, why is it important? However, I think for C# an analyzer could enforce annotating such methods with an attribute, because as @CyrusNajmabadi said, this is one of the things that distinguishes C# from VB in terms of conciseness. Not everyone would like that, considering that it wouldn't be ambiguous otherwise; I assume that's why yield return was chosen instead of yield in the first place.

alrz on 7 Sep 2016

I'm curious, why is it important?

Because, at least in the code bases I work in most of the time, one is a bug, the other isn't. There's also a performance impact to iterators. As someone who reviews a ton of code, the more obvious such cases, the more likely it is that I'll notice bugs (behavioral or performance) before they ever get checked in. And as someone writing code, having to explicitly mark the method in order to enable the compiler's significant transformation makes the transformation very explicitly opt-in rather than it happening almost as an accident of how the method got implemented.

stephentoub on 7 Sep 2016

👍3

I kind of agree. It's a bit too late to fix it with iterators (although maybe a last-minute change to local function iterators?). However, I'd probably be satisfied if VS were to show iterators differently, and if an iterator keyword were to be added to the language to support a code style that would error if it was omitted.

HaloFour on 7 Sep 2016

👍3

the more obvious such cases, the more likely it is that I'll notice bugs

As I said in my edit, I think an analyzer suffices (you could even forbid it altogether!). Unless there is an actual ambiguity that the keyword could resolve, I don't really think that it would be worth introducing a keyword that was not required up until now.

alrz on 7 Sep 2016

I don't really think that it would be worth introducing a keyword that was not required up until now

We will need to agree to disagree.

stephentoub on 7 Sep 2016

👍3 😄1

While I don't like verbosity, I have to admit that the iterator keyword would be a responsible addition which would make code a bit safer, like in strongly typed FP langs. Speaking of strongly typed FP langs, another stream library that might make a good reference is https://github.com/nessos/Streams.

lambdakris on 7 Sep 2016

Should the possibility of reserving async in arbitrary tasklike methods be considered as a separate issue item for potential inclusion in the same release as #10902 in the event that this slips?

bbarry on 7 Sep 2016

@bbarry I'm no longer concerned about reserving async. Mads suggested that it can follow exactly the same kind of rules as nameof -- i.e. look it up, and bind to a symbol if there's one defined with that name, otherwise treat it as the contextual keyword. ("It's a keyword only in async method/lambda places where it's not already an identifier").

ljw1004 on 7 Sep 2016

Now that we have analyzers, i definitely lean toward the idea that the language should decide on a good, consistent, default. People with specialized needs can write analyzers to help call out the issue.

For example, for @stephentoub's case, i would write an analyzer that would warn me if i had argument checking at the start of a method that then contained yields within it. I might even go further and have a code fix provider that then refactored the method to call a sibling helper (or maybe even a nested function :)). That would then be a cool analyzer/fix combo to make available as an extension for those who want it.
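A sketch of what such a code fix might produce, assuming the nested-function variant (local functions, C# 7). The `EagerValidation` class and its payload are illustrative only, not output from any existing analyzer:

```csharp
using System;
using System.Collections.Generic;

namespace EagerValidationSketch
{
    public static class EagerValidation
    {
        // Not an iterator itself (there is no yield directly in this body),
        // so the argument check runs as soon as the method is called.
        public static IEnumerable<string> GetResults(string key)
        {
            if (key == null)
            {
                throw new ArgumentNullException(nameof(key));
            }
            return Iterate();

            // The local function is the iterator; only its body is deferred.
            IEnumerable<string> Iterate()
            {
                yield return key + "-result";
            }
        }
    }
}
```

Callers see the same signature as before, but `GetResults(null)` now throws at the call rather than on the first `MoveNext`.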

CyrusNajmabadi on 7 Sep 2016

👍1

interface IAsyncEnumerator

Changing topics to another part of the proposal...

The main approach we've all been assuming for the interface is:

``` C#
public interface IAsyncEnumerator<T>
{
    Task<bool> MoveNextAsync();
    T Current { get; }
}
```

With that, a loop like:

``` C#
foreach (await T item in enumerable)
{
    …
}
```

would compile down to something like:

``` C#
var e = enumerable.GetEnumerator();
while (await e.MoveNextAsync())
{
    T item = e.Current;
    …
}
```

I think we should consider (at least discuss) an alternative.

The existing `IEnumerator<T>` has a well-known design concern: it requires two interface calls per element, one for MoveNext and one for Current.  There are various ways to address that for an `IAsyncEnumerator<T>`, one of which would be along the lines of this alternative:

``` C#
public interface IAsyncEnumerator<T>
{
    Task<bool> WaitForNextAsync();
    bool TryGetNext(out T item);
}
```

which with the aforementioned foreach loop would compile down to something like:

``` C#
var e = enumerable.GetEnumerator();
while (await e.WaitForNextAsync())
{
    while (e.TryGetNext(out T item))
    {
        …
    }
}
```

The idea is that WaitForNextAsync would return a synchronously completed task if data was already available (or if it knew there would never be more data) and otherwise would perform whatever operation was necessary to bring down the next piece of data, but it wouldn’t actually take that data from the enumerator; that would be done by TryGetNext, which would return data available if there is any, otherwise returning false.

One obvious advantage of this is that it addresses the two-interface-calls-per-iteration issue. Worst case, there are two interface calls per iteration, if each call to WaitForNextAsync only makes a single item available for TryGetNext, but best case, there’s only one interface call for each element.
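A rough sketch of how the two-method shape behaves when data arrives in batches. The interface is the alternative proposed above; `BatchedEnumerator`, `BatchedConsumer`, and the `WaitCalls` counter are hypothetical names added here only to make the call pattern observable:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

namespace BatchedSketch
{
    // The alternative interface shape from the comment above.
    public interface IAsyncEnumerator<T>
    {
        Task<bool> WaitForNextAsync();
        bool TryGetNext(out T item);
    }

    // Hypothetical source whose items become available one batch at a time.
    public sealed class BatchedEnumerator : IAsyncEnumerator<int>
    {
        private readonly Queue<int[]> _pendingBatches;
        private int[] _current = new int[0];
        private int _index;

        public int WaitCalls { get; private set; }

        public BatchedEnumerator(IEnumerable<int[]> batches)
        {
            _pendingBatches = new Queue<int[]>(batches);
        }

        public async Task<bool> WaitForNextAsync()
        {
            WaitCalls++;
            if (_index < _current.Length) return true; // data already buffered
            while (_pendingBatches.Count > 0)
            {
                await Task.Yield(); // stand-in for real I/O
                _current = _pendingBatches.Dequeue();
                _index = 0;
                if (_current.Length > 0) return true;
            }
            return false; // no more data will ever arrive
        }

        public bool TryGetNext(out int item)
        {
            if (_index < _current.Length)
            {
                item = _current[_index++];
                return true;
            }
            item = default(int);
            return false;
        }
    }

    public static class BatchedConsumer
    {
        // The nested loop a foreach would expand to under this proposal.
        public static async Task<List<int>> DrainAsync(BatchedEnumerator e)
        {
            var items = new List<int>();
            while (await e.WaitForNextAsync())
            {
                while (e.TryGetNext(out int item)) items.Add(item);
            }
            return items;
        }
    }
}
```

The larger each batch, the closer the per-element cost gets to a single `TryGetNext` call; with one-element batches it degrades to the two-call case described above.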

However, the `IEnumerator<T>` two-call design has another downside: it’s not possible to create a thread-safe implementation.  Without locking external to the interface implementation, multiple consumers can’t consume the same enumerator, as the call to MoveNextAsync/Current can’t be made atomic.  That’s not the case with the alternate model.  You can implement a thread-safe WaitForNextAsync/TryGetNext pair, as TryGetNext itself can use whatever synchronization it needs internally to both get the next item and tell you whether it was successful.  Worst case, if the caller loses a race condition with another consumer, TryGetNext can return false and you can loop back around to try again.
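As an illustration of the atomicity argument, a minimal sketch assuming a hypothetical `SharedEnumerator<T>` backed by a pre-filled `ConcurrentQueue<T>` (so waiting always completes synchronously here). "Is there an item" and "take it" happen in one `TryDequeue` call, and a consumer that loses the race simply loops back to `WaitForNextAsync`:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace ThreadSafeSketch
{
    // Hypothetical enumerator that several consumers may share.
    public sealed class SharedEnumerator<T>
    {
        private readonly ConcurrentQueue<T> _items;

        public SharedEnumerator(IEnumerable<T> items)
        {
            _items = new ConcurrentQueue<T>(items);
        }

        // No new items arrive in this sketch, so the answer is known synchronously.
        public Task<bool> WaitForNextAsync() => Task.FromResult(!_items.IsEmpty);

        // Atomic: checks for an item and removes it in a single operation.
        public bool TryGetNext(out T item) => _items.TryDequeue(out item);
    }

    public static class SharedConsumer
    {
        // Counts how many items this particular consumer managed to take.
        public static async Task<int> CountAsync<T>(SharedEnumerator<T> e)
        {
            int taken = 0;
            while (await e.WaitForNextAsync())
            {
                // May return false right after a successful wait if another
                // consumer got there first; the outer loop then tries again.
                while (e.TryGetNext(out T item)) taken++;
            }
            return taken;
        }
    }
}
```

Running two `CountAsync` calls against the same enumerator hands each item to exactly one of them, which the MoveNextAsync/Current pair cannot guarantee without external locking.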

Now, today an `IEnumerator<T>` isn’t thread-safe, and so no one can use it that way, but I think there are more use cases for supporting thread-safety in the async world.  These async enumerators are likely to be used in some cases for producer/consumer models, and once you get there, it’s not far before you want to allow multiple consumers off of the same data stream, parallelized processing of results, etc.  It would be possible to enable such multi-consumer scenarios off of an enumerable, where the enumerable coordinates safety over a single underlying stream, handing out single-consumer enumerators that all coordinate with each other.  But that violates the notion of how we’ve been talking about enumerables, as something where another call to GetEnumerator essentially restarts the operation.

There's also another use case for this model. Imagine you wanted a construct like:

``` C#
IAsyncEnumerator<Student> students = …;
IAsyncEnumerator<Teacher> teachers = …;
select
{
    case students as Student s:
        HandleStudent(s);
        break;
    case teachers as Teacher t:
        HandleTeacher(t);
        break;
}
```

Anyway, food for thought.

stephentoub on 8 Sep 2016

👍11 ❤1

how about:

await foreach (var thing in Method()) { //I still prefer this syntax
   ...
}

//becomes

var eble = Method();
var etor = eble.GetEnumerator();
while (await eble.WaitForNextAsync(ref etor))
{
    while (eble.TryGetNext(ref etor, out T item))
    {
        ...
    }
}

method is async IAsyncEnumerable<Foo> Method() { ... }
...

then via generators functionality it would be possible to do something like:

public async ValueAsyncEnumerable<Foo, GenerateStateMachine> Method()
{
   if(NotHotPath) { await Something(); }
   yield new Foo();
}
public partial struct GenerateStateMachine {}

where ValueAsyncEnumerable<T, Ttor> implements the pattern IAsyncEnumerable<T>

and the generator can provide the implementation and voila: zero allocation IAsyncEnumerable:

public replace ValueAsyncEnumerable<Foo, GenerateStateMachine> Method()
{
    var eble = default(ValueAsyncEnumerable<Foo, GenerateStateMachine>);
    eble.ThisArg = this;
    return eble;
}
public partial struct GenerateStateMachine {
   ...
}

bbarry on 8 Sep 2016

@bbarry

while (await eble.WaitForNextAsync(ref etor))

Except that the implementation can not be async,

CS1988: Async methods cannot have ref or out parameters

I'm skewed towards Task<(bool, T)> TryGetNext(). It's an await but a single call per element in all cases vs. two interface calls per element in the worst case and no await in other cases. Absence of WaitForNextAsync would also simplify the implementation but I'm not sure if that is a concern.

alrz on 8 Sep 2016

I'm skewed towards Task<(bool, T)> TryGetNext()

That's very expensive: we can cache a Task for true and for false, but we can't cache Tasks for any possible (bool, T). The end result would be an allocation on every TryGetNext call, even the ones that complete synchronously.
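A tiny sketch of the difference, assuming hypothetical static helpers that stand in for the two interface shapes. A synchronous "yes" can always be the same `Task<bool>` object, while a result that carries the item needs a task per value:

```csharp
using System.Threading.Tasks;

namespace TupleTaskSketch
{
    public static class TaskCaching
    {
        private static readonly Task<bool> s_true = Task.FromResult(true);

        // Two-call shape: every synchronous success can share one task object.
        public static Task<bool> WaitForNextAsync() => s_true;

        // Single-call shape: the task wraps the item itself, so there is no
        // fixed, finite set of results that could be preallocated.
        public static Task<(bool, int)> TryGetNext(int item) => Task.FromResult((true, item));
    }
}
```

A `ValueTask<(bool, T)>` would avoid the allocation for synchronous completions, which is the variation raised later in the thread.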

stephentoub on 8 Sep 2016

👍2

@stephentoub Right, that is definitely a concern. However, it can be declared covariant over T on master unless I'm missing something.

public interface IAsyncEnumerator<out T> {
    Task<(bool, T)> TryGetNext();
}

By contrast, out parameter types are not allowed to be covariant because they are represented as ref in the CLR.

F# uses option for this (source), I'm assuming that interoperability between the two is not an issue for now,

type IAsyncEnumerator<'T> =
    abstract MoveNext : unit -> Async<'T option>
    inherit IDisposable

In C# however, there is no such type. If we don't need to be able to yield nulls from an async sequence, we can take advantage of #9932,

Task<T?> TryGetNext();

I think Option<T> is still a good investment though.

alrz on 8 Sep 2016

What about ValueTask<(bool, T)> TryGetNext()? Should benchmark this against the version with two virtual calls.

I think using two while loops complicates things a bit, both for the iterator implementation and the consumer.

aelij on 8 Sep 2016

Is it even clear that per-item overhead matters so much for the async case? Async usually is being used for IO heavy code. It tends to be not particularly chatty.

So if I run an async stream over 100 URLs to download the per-item overhead is unmeasurable for all of the solutions that were discussed here.

The only chatty use case I can think of is async ADO.NET using data readers where each field is accessed using an async API. But that's almost always a bad idea in a web app because no throughput can be gained doing that. If a lot of threads are waiting on the database then the database is overloaded anyway. So we must assume few threads which makes async memory savings moot.

Maybe the solution here is a less chatty data reader API. For example, there could be a DataReader.FillObject<T>(T obj) that fills in an entire object (or object[] with all the column values). Could use runtime code gen to optimally deserialize the TDS stream directly into the object.

Are there other chatty use cases? We'd need to talk about 100,000 operations per second per core before async overhead becomes meaningful.

GSPP on 8 Sep 2016

@GSPP long stretches of code running without any allocations in the heap means you could do things like run in environments with an altered GC designed under that constraint (or potentially no GC at all, allocate a block at the start of a run where all "allocations" inside that call stack are made).

It isn't necessarily about the performance of some specific bit of code, but about what environments the code might be able to run in.

bbarry on 8 Sep 2016

@alrz why would the implementation of Task<bool> WaitForNextAsync<T>(ref T enumerator) need to be async?

The general population of users writing async IAsyncEnumerable<T> methods would not need to care and the rare people implementing the pattern would likely simply do something with the value and then perhaps call an internal async method. For example a BufferedNetworkStream might be implemented:

public class BufferedNetworkStream : IAsyncEnumerable<byte, int>
{
    List<byte> _buffer = new List<byte>(); 
    ...
    private async Task<bool> ReadFromUnderlyingAsync()
    {
        if (_remote.Finished) return false;
        if (_remote.HasAvailable) { _buffer.AddRange(_remote.ReadAvailable()); }
        else { _buffer.AddRange(await _remote.ReadAvailableAsync()); }
        return true;
    }
    public Task<bool> WaitForNextAsync(ref int index)
    {
        if (index < _buffer.Count) { return Task.FromResult(true); }
        if (_remote.Finished) { return Task.FromResult(false); }
        return ReadFromUnderlyingAsync();
    }
    public bool TryGetNext(ref int index, out byte value)
    {
        if (index >= _buffer.Count)
        {
            value = default(byte);
            return false;
        }
        value = _buffer[index++];
        return true;
    }
    public int GetEnumerator() => 0;
}

bbarry on 8 Sep 2016

Is it even clear that per-item overhead matters so much for the async case? Async usually is being used for IO heavy code. It tends to be not particularly chatty.

IO can be incredibly chatty; it might not seem so at aggregate "GetURLAsync" level but it is at the socket level.

benaadams on 9 Sep 2016

@GSPP

Is it even clear that per-item overhead matters so much for the async case? Async usually is being used for IO heavy code. It tends to be not particularly chatty.

I believe a very common pattern in website APIs is paging (for example Wikipedia API and Stack Overflow API use it, to name the ones I've used). The way it works is that you make a request for the first _n_ results, then another request for the next _n_ results, etc.

And in these cases, some form of async enumerable would be very useful. But (unless you want to have an ugly interface using something like IAsyncEnumerable<IEnumerable<Item>>), they're also fairly chatty.

svick on 9 Sep 2016

@alrz I'm pretty sure variance shouldn't work with tuples, so I have reported it: https://github.com/dotnet/roslyn/issues/13705.

svick on 9 Sep 2016

👍1

@svick I was suspicious myself hence the "unless I'm missing something" part. 😄

Same would apply to reference-type-capable Nullable or non-existent Option along with out params. So we probably wouldn't get covariant IAsyncEnumerable anyways.

alrz on 9 Sep 2016

Please don't make IAsyncEnumerable invariant if there is any way around it. IEnumerable variance has been a tremendous boon for me.

jnm2 on 9 Sep 2016

👍3

Maybe the solution here is a less chatty data reader API. For example, there could be a DataReader.FillObject(T obj) that fills in an entire object (or object[] with all the column values). Could use runtime code gen to optimally deserialize the TDS stream directly into the object.

@GSPP Not sure how well this is documented but ReadAsync() is already supposed to work like that unless you specify CommandBehavior.SequentialAccess on ExecuteReaderAsync(CommandBehavior). I.e. ReadAsync() will only complete when enough has been read from the network that none of the calls to the sync GetXyz(int ordinal) methods would block for that row.

divega on 10 Sep 2016

@svick API paging is very non-chatty with respect to async overhead because the HTTP call is about 1e6x more expensive.

@benaadams that is a good point, but should apply less to sequences than it applies to "scalar" tasks. Or does it? Maybe new patterns will emerge. But I have written quite a few IO layers of various kinds and for some reason I don't recall any instance where I had a chatty sequence based on IO. For some reason it's either scalars or low-volume sequences.

My favorite trick to make chatty sequences non-chatty is chunking. I sometimes do: items.AsChunked(1000).AsParallel().Select(chunk => chunk.ConvertAll(x=> ...)).AsEnumerable().Flatten(). So that's a workaround that will apply to async sequences as well. (But so far that trick was only needed for CPU-bound work since IO is too bulky to need it.)
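A self-contained sketch of that chunking pipeline. `AsChunked` and `Flatten` are not BCL methods, so this assumes a hypothetical `InChunks` helper and uses `SelectMany` in place of `Flatten`; `AsOrdered` is added here only so the output order is deterministic:

```csharp
using System.Collections.Generic;
using System.Linq;

namespace ChunkingSketch
{
    public static class ChunkingExtensions
    {
        // Hypothetical stand-in for the AsChunked helper mentioned above.
        public static IEnumerable<List<T>> InChunks<T>(this IEnumerable<T> source, int size)
        {
            var chunk = new List<T>(size);
            foreach (T item in source)
            {
                chunk.Add(item);
                if (chunk.Count == size)
                {
                    yield return chunk;
                    chunk = new List<T>(size);
                }
            }
            if (chunk.Count > 0) yield return chunk; // final partial chunk
        }

        // Work is handed to PLINQ one chunk at a time, so the per-item
        // coordination cost is paid once per chunk instead of once per element.
        public static List<int> SquareAll(IEnumerable<int> items, int chunkSize)
        {
            return items.InChunks(chunkSize)
                        .AsParallel()
                        .AsOrdered()
                        .Select(chunk => chunk.ConvertAll(x => x * x))
                        .AsEnumerable()
                        .SelectMany(chunk => chunk)
                        .ToList();
        }
    }
}
```

The same idea carries over to an async sequence of chunks: one asynchronous step per chunk, followed by plain synchronous iteration inside it.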

@divega I was thinking about that as well.

GSPP on 10 Sep 2016

Currently, neither Task<(bool, T)> TryGetNext(); nor ValueTask<(bool, T)> TryGetNext(); will work where T is Span<T>, as it is a stack-only type and can't be awaited.

So you'd need the two steps where one was async but didn't return T and the other was sync and returned T to be able to work with Span<T>.

benaadams on 10 Sep 2016

Following on from @stephentoub https://github.com/dotnet/roslyn/issues/261#issuecomment-245616932, @bbarry https://github.com/dotnet/roslyn/issues/261#issuecomment-245644988 / https://github.com/dotnet/roslyn/issues/261#issuecomment-245676226 (and @jaredpar's article)

Take it all the way back to sync also?

public interface IEnumerable<T, TEnumerator>
{
    TEnumerator StartEnumeration { get; } 
    bool TryGetNext(ref TEnumerator enumerator, out T item);
}

public interface IAsyncEnumerable<T, TEnumerator>
{
    TEnumerator StartAsyncEnumeration { get; } 
    Task<bool> WhenNextAsync(ref TEnumerator enumerator);
    bool TryGetNext(ref TEnumerator enumerator, out T item);
}

benaadams on 13 Sep 2016

😕1 👍1

I have a similar fear about the second type parameter that I had about variance in the first place. Is that going to hamper abstraction? For example, if I need to consume an IAsyncEnumerable with item types deriving from MyBaseType, how would I do it with IAsyncEnumerable`2? Must all code that consumes an enumerable become generic? If I take three enumerables, am I forced to add three otherwise useless generic parameters along the call chain?

jnm2 on 13 Sep 2016

Same as before
For async

foreach (await MyBaseType item in Collectionish)
{
    // …
}

For non-async

foreach (MyBaseType item in Collectionish)
{
    // …
}

benaadams on 13 Sep 2016

@benaadams I'm thinking specifically of method signatures. For example, I'll commonly consume a (possibly variant) IEnumerable as a constructor param. If IEnumerable has two type parameters, what do I do? I'm not adding a generic parameter TCtorParam1Enumerator to the whole class just to be able to receive an IEnumerable in the constructor.
Or in a method that takes multiple enumerables, won't I be forced to expose one generic type parameter in the method signature for each enumerable?

Or, let's say I want to see if I can enumerate MyBaseTypes from some object. What does that look like? var enumerable = collection as IEnumerable<MyBaseType, ?>;

The real issue is that TEnumerator is an implementation detail of the enumerable that no consumer will ever want to care about or have polluting the API.

jnm2 on 13 Sep 2016

@jnm2 This problem is solved in Rust/Swift with "associated types" i.e. generic types that we might not care about most of the time, something like this:

interface IEnumerable<out TItem> {
  type TEnumerator;
  ...
}

However, this probably needs proper CLR support and an extensive type inference among other things.

alrz on 13 Sep 2016

@alrz That's where my mind went; I figured some language had a solution for this. But what if we want IAsyncEnumerable before or without those changes to the CLR?

jnm2 on 13 Sep 2016

The implementation for async, yielding methods need not use the enumerator generic:

a user might type:

async IAsyncEnumerable<int> Foo() {
    yield 1;
    await SomethingAsync();
    yield 2;
}

The data type that the compiler generates could be:

public struct GeneratedMachine : IAsyncStateMachine, IEnumerator<int> {...}

And the method could become:

IAsyncEnumerable<int> Foo() {
    var sm = default(GeneratedMachine);
    sm.ThisArg = this;
    sm.State = -1;
    return AsyncEnumerableBuilder.Create<int, GeneratedMachine>(sm);
}

(or some similar syntax)

The built in implementation would still be allocating, but at some point in the future (when generators come) a non-allocating version could be generated which makes use of the named machine class so that it could be exposed on the method signature.

bbarry on 13 Sep 2016

C# Language Design Meeting, 2016.09.07 - async streams

Today's C# language design meeting was about async streams. I'm putting these notes up mostly raw and unedited.

Quick summary and tentative course of direction

You can write async iterator method that returns ABLE and ATOR
foreach will work on ABLEs but not ATORs
IAsyncEnumerable<int>.Configure(cancel, false) will produce a ConfiguredAsyncEnumerable. It can be an extension method.
Hopefully there'll be overloads so it's easy to provide either cancellation token or ConfigureAwait boolean if you don't need both.
The returned type ConfiguredAsyncEnumerable<T> will not implement IAsyncEnumerable<T>, but will fulfill the foreachable pattern.
GetEnumerator(CancellationToken cancel = default(CancellationToken))
The expansion of foreach will just be .GetEnumerator(), using whatever default was provided by the API.
In this way the compiler doesn't even know about cancellation tokens!
(Question: I wonder if the existing foreach on iterators also just invokes the syntax .GetEnumerator()
and allows for default parameters and CallerMemberName?)
We will call it .GetAsyncEnumerator() not .GetEnumerator() for returning an async enumerator.
[editor's note: I don't understand the following sentence]
That way, configure-await and cancellation can be done in a uniform manner on MoveNextAsync().
You can have await in finally blocks in async iterators. IAsyncEnumerator will implement IAsyncDisposable
but not IDisposable. (There is a larger design space around IAsyncDisposable that we'll do).
foreach (await var x in xs) { ... } <-- there will be special syntax to indicate that foreach is async different.
The choice of syntax is still up for discussion.
Note: if an existing IEnumerable type wishes to also implement IAsyncEnumerable, then the existing foreach (var x in xs) syntax
will necessarily have to continue binding to the synchronous version.
The interface design. Do the async ones inherit the synchronous ones? <-- no, don't like implicit syncification of async stuff.

What about IAsyncDisposable as well? <-- not sure. Think of RX CompositeDisposables. Or ASP.NET request scoping
which calls Dispose on the things that are done with the request [Oren]. That reflection-based stack would have
to be enlightened about preferring IAsyncDisposable.
Since lots of components do a runtime type check...
Not clear if "block" or "fire-and-forget" is the best approach. I guess block for client-side resources.
Should we have a non-generic version of async iterator methods? e.g. if we're producing something out of reflection? -- Not sure.
Cancellation. The thing you're trying to cancel is the ENUMERATION, not the factory.

Should we have GetAsyncEnumerator() or GetAsyncEnumeratorAsync()?

If there is any async work needed to obtain the enumerator, it's always possible to defer that to the first MoveNextAsync.

Can we avoid the double-interface-dispatch of the current MoveNext/Current pattern?

When you foreach over a normal IEnumerable, it gets back an object of static type IEnumerator. Then for every element in the sequence it calls IEnumerator.MoveNext() and IEnumerator.Current. That is two interface-dispatches. They're not the fastest. Can we reduce it to just one interface-dispatch? e.g.

Task<bool> TryGetNextAsync(out value) // <-- doesn't work because you can't use out parameters for async work

while (await enumerator.GetNextChunkAsync())
{
   while (enumerator.TryMoveNext(out value)) <-- this would work since it's not async
   {
   }
}

This API pattern would reflect the chunkiness of typical buffered async streams. The hope is that the inner loop could be more efficient.
Indeed it avoids having both ".Current" and ".MoveNext" (two interface calls).

Concern: when you write an async iterator method, then each chunk would have only 1 element inside it, so you'd still have two interface dispatches.

Concern: in ADO.NET, each payload might have < 1, == 1 or > 1 frames in it. I guess that GetNextChunkAsync() would keep fetching payloads until it got >= 1 frames in it.

Should IAsyncEnumerable inherit from IEnumerable

This goes back to an EntityFramework observation that they'd like to have types expose both a synchronous foreach (var x in e) API, and also an asynchronous foreach (await var x in e) API.

Presumably if a type implemented both, then its synchronous IEnumerable version would typically block.

It would always be possible to write a helper method IAsyncEnumerable<T>.AsEnumerable() which takes an async stream and produces a blocking synchronous enumerable from it.
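
As a rough illustration of such a helper, the sketch below blocks on each step of an async enumerator. The interface `IAsyncEnumeratorSketch<T>` is a stand-in that follows the MoveNextAsync/Current shape discussed in this thread, `AsyncRange` is a toy source, and the helper takes an enumerator rather than an enumerable purely to keep the example short. All three names are assumptions.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical interface following the MoveNextAsync/Current shape from this thread.
public interface IAsyncEnumeratorSketch<T>
{
    Task<bool> MoveNextAsync();
    T Current { get; }
}

// Toy async source yielding 0..count-1, with an asynchronous hop before each item.
public sealed class AsyncRange : IAsyncEnumeratorSketch<int>
{
    private readonly int _count;
    private int _next;

    public AsyncRange(int count) { _count = count; }

    public int Current { get; private set; }

    public async Task<bool> MoveNextAsync()
    {
        await Task.Yield();
        if (_next >= _count) return false;
        Current = _next++;
        return true;
    }
}

public static class BlockingAdapter
{
    // Produces a synchronous sequence by blocking the calling thread on every step.
    public static IEnumerable<T> AsEnumerable<T>(IAsyncEnumeratorSketch<T> source)
    {
        while (source.MoveNextAsync().GetAwaiter().GetResult())
        {
            yield return source.Current;
        }
    }
}
```

Blocking like this is sync-over-async: on a single-threaded synchronization context it can deadlock, which is one reason to keep the conversion explicit rather than inherited.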

Today we have no (few?) cases of implicit async->sync conversion. Conclusion: no, IAsyncEnumerable should not inherit from IEnumerable.

Non-async iterator methods?

Should we do these things for non-async iterator methods too? -- (1) allow a custom builder and a custom return type, (2) maybe have a more performant pattern that avoids double-interface-dispatch and maybe even avoids heap allocation entirely.

If we do, then it'd be weird to require the modifier "async" on the iterator method.

Also if we do, then I can imagine that there'd be a lot more use of "custom iterator return types" than there is for "custom tasklike return types". The only custom tasklike you really need is ValueTask. But for custom iterator return types, there might be as many as one custom iterator return type per collection type. If so, it would be weird to still be using an attribute [AsyncMethodBuilder] rather than a proper language keyword.

Disposable vs Enumerator

If you foreach over an enumerator, then it shouldn't dispose.
If foreach doesn't dispose, then it's too dangerous and shouldn't exist.

Cancellation

CONSUMPTION SIDE: how do we pass cancellation token into the thing? Here are possible ways we could pass in Cancellation and ConfigureAwait...

var xs = GetAsyncEnumerable();
foreach (await var x in xs.WithCancellation(cancel))
{
}

foreach (await var x in xs.ConfigureAwait(false))
{
}

foreach (await var x in xs.Config(cancel,false)) { ... }  <-- machinery for passing it to the right place
foreach (await var x in xs) (cancel) { ... }  <-- second parameter list passed to implicit .GetAsyncEnumerator() call
foreach (await var x in xs) .ConfigureAwait(false) { ... } <-- implicitly done to the implicit .GetAsyncEnumerator() call

PRODUCTION SIDE: If cancellation were to be passed in at the enumerable level, it's not a good idea. Here's how it would look:

async IAsyncEnumerable<int> f(CancellationToken cancel)
{
   await Task.Delay(1, cancel);
   yield return 1;
   await Task.Delay(2, cancel);
}

Problem is that this way means that a single cancellation will destroy the factory for hereafter. It can't provide a fresh cancellation token for each one.

_Interesting point_. This style of cancellation makes it harder to _compose_ async streams like we do with LINQ.

But it would satisfy the 99.9% case though!

PRODUCTION SIDE: If cancellation were to be passed in at the Enumerator level, how would it look?

// Idea 1: contextual keyword "async" inside an async iterator method
// This is a bigger more general more powerful feature.
async IAsyncEnumerable<int> f()
{
  yield return 1;
  await Task.Delay(1, async.Cancel);
}


// Idea 2: An async iterator method can take a SECOND parameter list, which is used
// for the implicit declaration of the GetEnumerator method.
async IAsyncEnumerable<int> f() (CancellationToken cancel)
{
  yield return 1;
  await Task.Delay(1, cancel);
}

IAsyncDisposable

Let's consider an async iterator method with an "await" inside a "finally". It looks like the code below. The only way to support that "await inside finally" is if we have an IAsyncDisposable.

async IAsyncEnumerable<int> f()
{
  try
  {
     yield return 1;
     await t;
  }
  finally
  {
     //await t;
  }
}

// Here's the expansion of consuming this:
{
    IAsyncEnumerator<int> enumerator = enumerable.GetEnumerator(cancel);
    try
    {
        while (await enumerator.MoveNextAsync())
        {
            var x = enumerator.Current;
            Console.WriteLine(x);
        }
    }
    finally
    {
        await? (enumerator as IAsyncDisposable)?.DisposeAsync();
    }
}

_Interesting point_. What kind of ConfigureAwait or CancellationToken will be passed to the implicit await DisposeAsync()? ANSWER: we establish the convention that any time you have an IAsyncDisposable in hand, you trust that it was already created with an appropriate cancellation token and configure-await inside it. In an async using block, you'd probably pass a cancellation token to the method that constructed and returned the IAsyncDisposable to you.

If we have IAsyncDisposable, then inside the body of an async foreach or using clause, there are several control-flow constructs which do an IMPLICIT AWAIT. I never like implicit awaits. One idea is to explicitly signal with the await keyword that, inside such a code block, these control-flow constructs will do an await (or, in the case of the return statement, maybe more than one)...

foreach (await var x in xs)
{
   Console.WriteLine(x);
   if (x > 10) await return 5; // could do arbitrarily many awaits!!!
   await break;
   await goto label;
}

await using (var ad = GetAsyncDisposable())
{
   // the same "await return" and "await break" and "await goto"
}

EntityFramework. From an expressivity perspective, e.g. ADO.NET, it really is desirable to have an async disposal of the stream.

Sync and Async together

EntityFramework. We care about co-existence of Sync and Async.

If an object implements both IDisposable and IAsyncDisposable, then which one should folks pick who have an object in hand?

Guidelines used to say "If you implement IDisposable then you should call Dispose". They will have to be updated to say that IAsyncDisposable is also valid. And double-dispose should continue to not throw, even if one of them is disposed via IAsyncDisposable.

Syntax

What syntax to use for async foreach? How about using the "async" modifier? ...

async foreach (var x in xs) { ... }
async using (var x = f()) { ... }

I like "await" because async says "here is something that can await" while await says "I might yield"

await foreach (var x in xs) { ... } <-- what are we actually awaiting? It looks like we're awaiting the entire foreach?
await using (var x = f()) { ... }

foreach (await var x in xs) { ... }
using (await var x = f()) { ... }

We've said we want to add IAsyncDisposable. If so, then there will have to be an "async using" construct. It would probably be nice if the syntax for async-using and async-foreach be aligned.

LINQ

xs.Select(x => x + 1)
xs.SelectAsync(async (x) => x + await t)

We'd have to change the translation of LINQ syntax so that if an await is inside the body then it should stick in an async modifier on the lambda.

What to do about overloads? Here they are:

1. IEnumerable<U> Select<T,U>(this IEnumerable<T>, Func<T,U> lambda)
2. IAsyncEnumerable<U> Select<T,U>(this IEnumerable<T>, Func<T,Task<U>> asyncLambda)

3. IAsyncEnumerable<U> Select<T,U>(this IAsyncEnumerable<T>, Func<T,U> lambda)  <-- this has existed for several years in IX
4. IAsyncEnumerable<U> Select<T,U>(this IAsyncEnumerable<T>, Func<T,ValueTask<U>> asyncLambda) <-- ValueTask is new, hence not much of a corner problem!

But consider if someone wrote code xx.Select(x => Task.FromResult(y)). It would be weird if this used to generate IEnumerable<Task<int>>, but we added an overload, and now it generates IAsyncEnumerable<int>. -- So how about we just remove the first two overloads? If you want to convert to async, then you call .AsAsync() on an IEnumerable.
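
For reference, this is the behaviour that exists today and that overload 2 would silently change. The sketch uses only the standard `Enumerable.Select`; the class and method names are made up for the example.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class SelectOverloadDemo
{
    // With only the classic Enumerable.Select in scope, the lambda's return type
    // Task<int> simply becomes the element type of the resulting sequence.
    public static IEnumerable<Task<int>> Project(IEnumerable<int> xx)
    {
        return xx.Select(x => Task.FromResult(x + 1));
    }
}
```

If an overload taking Func<T,Task<U>> on IEnumerable<T> were added and preferred, the same call site would presumably start producing an async sequence of int instead, which is the breaking change described above.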

What to do about syntax in LINQ expressions? If the following treated await as a keyword rather than identifier, it would technically be a breaking change:

var yy1 = from x in xx.AsAsync()
          select x + await (t);

How about: if you're writing a LINQ expression inside an async method, then await is already a reserved keyword. For the cases where you want to write an async LINQ expression inside a synchronous method, you can also stick an async modifier there:

var yy2 = async
          from x in xx.AsAsync()
          from y in yy
          select x + await t;

Expression trees

IQueryable<U> Select<T,U>(this IQueryable<T>, Expression<Func<T,U>> quotedLambda)

What about executing IQueryable asynchronously? I.e. what about introducing expression trees for await? or do we want an entirely new IAsyncQueryable<T> interface?

Also, what about async methods like .FirstAsync() ?

BCL: we should take what IX has and see what we want to promote.

I don't understand if IAsyncQueryable represents an async network connection to a sync database, or whether it represents a connection to an async server.

We need to discuss expression trees for await.

ljw1004 on 15 Sep 2016

👍6

Idea for non-allocating iterators. The central problem is that if you want a non-allocating iterator, then callers of your method have to declare on their stack an instance of the corresponding state machine. Therefore they need to know the state machine type.

Folks in this thread have suggested possibilities, e.g. @CyrusNajmabadi suggested some magic similar to how VB does it (VB creates unutterable typenames for anonymous methods; it also implicitly creates utterable backing properties when you declare a WithEvents property). @bbarry suggests using the generator feature.

Here's another idea, courtesy of @gafter. It's born from the observation of how we already write non-allocating iterators over ImmutableList<T>. The way it's done today is that you declare the type of your state machine explicitly, and you hand-author the state machine. How about instead if you still have to declare the type, but the compiler fills in the implementation?

public async iterator MyEnumerator<int, struct SM> GetEnumerator() { ... }

This declares a method named GetEnumerator. It also declares alongside that method a nested type named SM with the same accessibility as the method. The compiler fills in the definition of this nested type itself.

This way, consumers who invoke GetEnumerator() will be able to keep an instance of the state machine on their stack.

ljw1004 on 15 Sep 2016

👍3

I think that should be:

that method a nested type named MyEnumerator

But yes, this is definitely an approach that can work. I've discussed this with @MadsTorgersen before and i think it's quite nice.

another option would be to just have public async struct IEnumerator<...> GetEnumerator() and have the compiler come up with a struct type with an unutterable name. The downside is that we don't generally like having public types with awful .Net runtime names. I personally don't mind, but it's probably a lot more work rather than just having the user declare the name themselves.

We can avoid a new contextual keyword by having them do something like:

public async struct MyEnumerator GetEnumerator() btw.

CyrusNajmabadi on 15 Sep 2016

👍1

Note: i was enamored with the form public async struct IEnumerator<...> GetEnumerator() because it's somewhat similar to what rust does where you just state what traits you have, but that doesn't require that you actually be allocating on the heap. I think it would be very interesting to have that capability in our language across a wide gamut of cases. But it's a much larger thing to design if we go that route.

CyrusNajmabadi on 15 Sep 2016

🎉1

I am not really tied to the generator feature (I am, but for so many other reasons I'm looking forward to). I'd be perfectly happy if this just worked:

public partial struct SM {} //user puts nothing here
public async ValueAsyncEnumerable<int, SM> Foo() { ... }

I'm simply saying if the consuming foreach lowering accepts the pattern that permits the allocation free form to exist, we get the feature for the cost of someone (perhaps me if nobody else does it first) writing that generator library with no further changes to the compiler.

bbarry on 16 Sep 2016

await? (enumerator as IAsyncDisposable)?.DisposeAsync(); caught my eye. Please say that await? will happen! It's the exact analog of the synchronous foo?.Bar() and it's made its absence felt.

jnm2 on 16 Sep 2016

@jnm2 I've missed await? e as well. I don't think anyone's made a solid proposal for it. I wrote it here just because I was too lazy to write it out longhand. To implement it would be a small amount of straightforward work.

But...

await? e;
await (e ?? Task.CompletedTask);
var x = await (e ?? Task.FromResult(0));

This would only work for await statements (i.e. where you discard the result). It looks pretty cryptic, so much so that I don't think it's better than the longhand using ??. In my personal opinion it wouldn't meet the language design bar.

But...

If you like it, please file an issue, and we'll get C# LDM to discuss it!

ljw1004 on 16 Sep 2016

I've missed await? e as well. I don't think anyone's made a solid proposal for it.

I'm trying to figure out the scenario here. This seems to indicate that this would happen when you had a null task and wanted to 'await' it.

However, that seems super weird to me. First, if you have an actual 'async' method, you would never get a null task. So this would only be because your method chose to return null. But why use null when Task.CompletedTask exists?

In general, you use null because either every value in your domain is taken, or there is no good sentinel value that your domain can provide. Neither of those is the case here. It seems like you'd have to go out of your way to use 'null', which seems odd given how poorly it would interact with everything else in the task world.

CyrusNajmabadi on 16 Sep 2016

@ljw1004

My vote for await within the statement, e.g.:

foreach (await var x in xs) { ... }
using (await var x = f()) { ... }

It just feels cleaner to me.

HaloFour on 16 Sep 2016

i'm a fan of :

async foreach (var x in xs) { ... }
async using (var x = f()) { ... }

It's a 'foreach' statement that operates 'asynchronously'. It's a 'using' statement that operates 'asynchronously'.

'await var x' reads very strangely to me. While i can _sorta_ see it in the foreach (you're going to await the availability of each variable), it doesn't make any sense to me for 'using' (where you're going to await the disposal of the instance).

'await' very much feels like a marker telling me "ok, right here i'm going to yield and come back once the value is available". But that's not what i'm trying to indicate with these constructs. Instead, i'm just trying to say that the operation i'm performing will operate asynchronously instead of synchronously.

'async' currently means 'awaits' can/will happen inside (i.e. when you have an async-lambda). That fits what's going on here, except that it's also telling the compiler: insert the 'awaits' implicitly as per the pattern we spec out.

CyrusNajmabadi on 16 Sep 2016

👍2

@CyrusNajmabadi

I'm trying to figure out the scenario here. This seems to indicate that this would happen when you had a null task and wanted to 'await' it.

It's simpler than that. We used to have to do if (foo != null) foo.Bar(); and syntactic sugar was added so that ? replaced the if: foo?.Bar();
The same sugar could be applied to if (foo != null) await foo.BarAsync();. It feels more natural beside the synchronous version.
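
For comparison, here are the two forms that can already be written in C# 6 without any new syntax. This is only a sketch; `Worker`, `BarAsync` and `NullAwaitDemo` are hypothetical names.

```csharp
using System.Threading.Tasks;

// Hypothetical type with an async method; Calls counts how often it was invoked.
public sealed class Worker
{
    public int Calls { get; private set; }

    public Task BarAsync()
    {
        Calls++;
        return Task.CompletedTask;
    }
}

public static class NullAwaitDemo
{
    // Form 1: the explicit null check.
    public static async Task WithIfAsync(Worker foo)
    {
        if (foo != null) await foo.BarAsync();
    }

    // Form 2: null-conditional call plus a ?? fallback to an already completed task.
    public static async Task WithCoalesceAsync(Worker foo)
    {
        await (foo?.BarAsync() ?? Task.CompletedTask);
    }
}
```

Both forms skip the call when foo is null; a hypothetical await? would be shorthand for the second.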

Hey look what I found: https://github.com/dotnet/roslyn/issues/7171

jnm2 on 16 Sep 2016

Ah... i see the use-case now. Note: i could see this being a general null-accepting pattern elsewhere in the language. i.e. "foreach? (var v in foo?.GetValues())"

Essentially, the case is "i'm using ?. elsewhere, so that's how null flows into the system. But then common language patterns (await/foreach/etc) explicitly die on these nulls".

Thanks for making this clear for me!

CyrusNajmabadi on 16 Sep 2016

👍1

@CyrusNajmabadi

'await' very much feels like a marker telling me "ok, right here i'm going to yield and come back once the value is available". But that's not what i'm trying to indicate with these constructs.

Isn't that exactly what you're trying to indicate with these constructs? You are going to yield and come back once the operation has completed, specifically the operation of moving to the next value or of disposing of the resource. You are awaiting per the grammar and conventions as established by C# 5.0. Although I do agree that given the convention there doesn't seem to be an appropriate place to use await with using.

Given the choice I'd prefer await foreach or await using but I don't like the look of the two word keyword phrases.

HaloFour on 16 Sep 2016

ok, right here i'm going to yield and come back once the value is available

Isn't that exactly what you're trying to indicate with these constructs

No. For 'using' i'm not yielding in order to come back once the value is available. so 'using (await var v...' doesn't make sense to me. I'm eventually 'await'ing a call to DisposeAsync, because the _using_ itself is asynchronous.

For 'foreach' i can squint and somewhat see it. In that case i am 'await'ing the value being made available to me. But that's still because it's an asynchronous foreach. I'm still going to have to 'await' other things as well (eg. the DisposeAsync if the IAsyncEnumerable returns an IAsyncDisposable). To me, the overall operation (which is an aggregation of many things) happens asynchronously. As such, as the marker is on the construct, it makes sense for it to be 'async' to me, not 'await'.

Because using (await var v...) feels so strange to me, that leads me to want anything other than that. And once i have something else (like await using (var v... or async using (var v ...), then i want my 'foreach' to match.

But my brain keeps coming around to "what's the difference between this foreach and normal foreach, or this using and a normal using". The answer is always "it executes asynchronously", and as such the async keyword seems to click with my brain the best.

CyrusNajmabadi on 16 Sep 2016

👍2

Note: this is all a personal opinion. Just wanted to share some insight on how my brain groks this stuff.

CyrusNajmabadi on 16 Sep 2016

@CyrusNajmabadi

No. For using i'm not yielding in order to come back once the value is available.

No, you're not. You're awaiting the completion of a task, that task being the disposal of a resource. That's also exactly why you can await a simple Task, no value involved.

But I agree, using is a bizarre syntax case. There isn't a good place to stick the await. I think it should follow what foreach does just out of consistency.

But my brain keeps coming around to "what's the difference between this foreach and normal foreach, or this using and a normal using". The answer is always "it executes asynchronously", and as such the async keyword seems to click with my brain the best.

Yes, but it doesn't work with C# 5.0 where async does not mean this. An async method isn't any more asynchronous than a non-async method, and has no capacity for being asynchronous without await. If it weren't for the syntax ambiguity that required resolution it's even questionable if the async keyword would've appeared in C# at all.

Just wanted to share some insight on how my brain groks this stuff.

This isn't the first argument over the keywords chosen and how they are used when it comes to asynchrony. But I think that C# should remain consistent with itself even if it doesn't quite align with how those words work.

And of course this is also personal opinion. 😀

HaloFour on 16 Sep 2016

No, you're not. You're awaiting the completion of a task, that task being the disposal of a resource. That's also exactly why you can await a simple Task, no value involved.

I'm amenable to that interpretation. But here's why it's not what my brain naturally clicks to. Specifically, when i use 'await', i'm awaiting 'something awaitable'. (And, for this discussion, i'm just going to say "i'm awaiting a task"). As such, i should be able to take that task and do whatever else i wanted with it. i.e. i can write:

``` c#
await complex_expr;
```

Or i can do:

``` c#
var v = complex_expr;
await v;
```

That's not the case here with using/foreach. They are not tasks themselves. They are not 'awaitables' themselves. I cannot do:

var v = foreach (...) { }
await v;

As such, this doesn't compose properly with the construct as we originally added it. I await an awaitable thing. That awaitable thing is something i can otherwise manipulate. As that's not what's going on here, 'await' feels strange to me.

On the other hand, 'async' fits here for me both because it describes the operation of the statement (it executes asynchronously), and because it allows for 'awaits' inside (albeit ones that are implicitly inserted by the compiler).

Note: i am amenable to 'await statement', it's not awful to me :) And i far prefer it to 'statement (await var'.

CyrusNajmabadi on 16 Sep 2016

👍2

'await var x' reads very strangely to me. While i can sorta see it in the foreach (you're going to await the availability of each variable), it doesn't make any sense to me for 'using' (where you're going to await the disposal of the instance).

My thoughts exactly.

I agree that it would be nice to have similar syntax for "async foreach" and "async using", but I think it's more important for each to have an appropriate and intuitive syntax, even if it's not the same one.

thomaslevesque on 16 Sep 2016

I like async foreach better, it is also what F# uses to consume async sequences:

async { for url, length in pages do
  printfn "%s (%d)" url length
} |> Async.Start

Actually, async just defines an _asynchronous context_ which may await (yield) at some point. Note that you only use asyncSeq if you want to produce an async sequence which will be denoted by async IAsyncEnumerable<T> or async iterator IAsyncEnumerable<T> in C#.

As for async using I think it reads better also in case of RAII (#181), because you don't await in the declaration location. async using res = new Res(); it starts an asynchronous resource acquisition block.

Should we have a non-generic version of async iterator methods? e.g. if we're producing something out of reflection?

I really like to see this be tackled in the language (#6248?). Non-generic/generic interface hierarchies are really awkward and they are not limited to iterators. You can have the same type with IAsyncEnumerable<object> anyways. And of course, I really hope it does not end up being invariant.

alrz on 16 Sep 2016

The foreach itself is not asynchronous, asynchronous is the sequence it is iterating over. And also the using block is not asynchronous, it just operates on the resource that has asynchronous disposal. Therefore I propose this syntax:

foreach (async var x in xs)

and

using (async var x = new AsyncDisposable())

It also removes ambiguity in the following example:

async
using (var a = new Disposable())
using (var b = new Disposable())
{

} // which resource is disposed synchronously and which asynchronously?

stepanbenes on 16 Sep 2016

I prefer async.
await var x, await using (...), and await foreach (...) look like awaiting for a result of var, using, and foreach _expressions_. These are not expressions so far, but possible in future, especially var expressions (declaration expressions) with high probability.

ufcpp on 16 Sep 2016

The only thing that I don't like about async foreach is that it blocks the whole iteration once it's awaiting a task. It would be nice to be able to "move on" when we are iterating over an async sequence (#8014).

alrz on 16 Sep 2016

Still prefer await before in as it suggests a per loop await

foreach (var x await in xs)

As the await happens on each loop as per var x = await xs or suggesting a pattern similar to

while (var x = await xs.GetNextAsync())

Whereas await before var suggests single await

foreach (await var x in xs)

Rather than a per loop await

The function containing the foreach would need to be async, but adding async to the foreach doesn't seem to add any clarity.

On the other hand for using that would be closer to an async item?

using(async var x = f()) 
{ 
    x + await t;
}

Similar to how the aforementioned lambdas work

xs.SelectAsync(async (x) => x + await t);

Though async lambdas often confuse me, so not sure it's the best pattern; but it does match what's there more?

benaadams on 16 Sep 2016

❤4

I love await in.

jnm2 on 16 Sep 2016

Following on from that, is an async using Task-returning?

var t0 = using(async var x = f()) 
{ 
    x + await t;
}
var t1 = using(async var x = f()) 
{ 
    x + await t;
}

await Task.WhenAll(t0, t1);

await Task.Run(using(async var x = f()) 
{ 
    x + await t;
});

Should you be awaiting an async using?

await using(async var x = f()) 
{ 
    x + await t;
}

benaadams on 16 Sep 2016

@benaadams

Still prefer await before in as it suggests a per loop await

Then await in await is possible.

foreach (var x await in await GetDataAsync())

It will confuse novices :)

omariom on 16 Sep 2016

If the GetDataAsync returns Task<IAsyncEnumerable<T>> though is that any different to

foreach (await var x in await GetDataAsync()) { ... }

benaadams on 16 Sep 2016

I'm not a fan of async at all. That drastically changes the existing meaning (which is 'awaiting is allowed') or introduces the concept of implicit awaiting, something which I am also not a fan of. If the statement causes an await, I want to see the keyword await even if it's in a weird place.

await in await GetDataAsync()

Actually I think it's motivating how clear and precise this is. It tells novices that they are initially awaiting before the loop starts and they are also awaiting each item in the collection.

jnm2 on 16 Sep 2016

👍2

@jnm2 That's my issue with async, too. I don't disagree that the current usage is ambiguous/confusing, but I think that the language shouldn't redefine what it means based on where it's used.

As for foreach (await var x in await GetDataAsync()) { ... } or foreach (var x await in await GetDataAsync()) { ... } I think that both convey pretty accurately that you're awaiting a scalar containing a sequence. I prefer the former as it puts a little more space between where the two awaits go which I think can aid in avoiding confusion. I can see the StackOverflow posts where the developer put await on the wrong side of in for the situation.

HaloFour on 16 Sep 2016

Still not getting why that would happen? IAsyncEnumerable isn't awaitable; IAsyncEnumerator is awaitable. So as long as it wasn't returning itself, in the common case

foreach (var x await in await data)

Would be a compile error?

foreach (var x await in data)

Would be fine

foreach (var x in await GetDataAsync())

Would sync enumerate over an IEnumerable that was async returned from GetDataAsync() (e.g. Task<IEnumerable>)

benaadams on 16 Sep 2016

@benaadams

For IAsyncEnumerable<T> by itself, probably yeah, assuming nobody sticks a GetAwaiter extension method somewhere. But I think double-await would be perfectly legal for Task<IAsyncEnumerable<T>> or anything that supports GetAwaiter with a Result that has GetAsyncEnumerator.

HaloFour on 16 Sep 2016

Concern: when you write an async iterator method, then each chunk would have only 1 element inside it, so you'd still have two interface dispatches.

Why? If I write code like:

``` c#
async IAsyncEnumerable f()
{
    foreach (var batch in batches)
    {
        var results = await batch.GetAsync();

        foreach (var result in results)
        {
            yield return result;
        }
    }
}
```

Then the compiler-generated code could produce one chunk for each batch, since there is no await while a single batch is being processed, no?

svick on 16 Sep 2016

@svick I guess you're right! GetNextChunkAsync() will return an IEnumerable<T>. And this IEnumerable<T> can just keep returning values as it encounters them, until it comes to an await, whereupon it will say that it has no values left. Nice.

ljw1004 on 16 Sep 2016

``` C#
using (var disposable = GetDisposable())
{
    // ...
}
```

roughly stands for:

``` C#
var disposable = GetDisposable();
try
{
    // ...
}
finally
{
    disposable.Dispose();
}
```

So, if an async disposable would work roughly like this:

``` C#
var disposable = GetAsyncDisposable();
try
{
    // ...
}
finally
{
    await disposable.DisposeAsync();
}
```

it should be expressed like this:

``` C#
await using (var disposable = GetAsyncDisposable())
{
    // ...
}
```

paulomorgado on 20 Sep 2016

As for an async foreach, the foreach syntax doesn't explicitly express an assignment to the variable. It just declares it and the assignment is implicit.

So, how would one express a variable that awaits its value? async var v? var async v?

I think I prefer foreach(await var v in vs).

paulomorgado on 20 Sep 2016

Could we constrain the T to awaitable? ie IAsyncEnumerable<async T>
(See #13928)

AdamSpeight2008 on 20 Sep 2016

Could we constrain the T to awaitable? ie IAsyncEnumerable<async T>

Interesting idea, but it should probably be a separate proposal (what you do with the T isn't relevant to how async sequences are handled).

Anyway, I'm not sure how it would work; await is pattern based and is a purely language-level feature, so I don't think the CLR could enforce the constraint.

thomaslevesque on 20 Sep 2016

_Thoughts on syntax_

When it comes time to retire this thread and start a new one, I'm going to start _two_ new threads: one for syntax, one for semantics.

ljw1004 on 23 Sep 2016

😄2

Async Streams - use-cases

Several architects met to discuss their use-cases of async streams, and what light those use-cases shed on API patterns. These are the meeting notes. It was a free-flowing and fast-moving discussion, and I took notes as best I could. Sorry in advance for any errors that crept into the notes, or confusing bits.

Azure ServiceFabric (Jeff Richter, Vaclav Turecek)
IX/RX (Bart de Smet, Oren Novotny)
BCL and TPL (Stephen Toub)
Entity Framework (Diego Vega, Andrew Peters)
C# language (Mads Torgersen, Lucian Wischik)
[WinRT? - maybe Brent Rector has opinions on all the patterns that currently use continuations for chunky APIs]
[OData for its continuation tokens? - not interested right now]
[Azure tools? - should check with Jeff, and Jason Hogg the storage PM]
[XML deserialization? - should check up on this]

Cancellation granularity

Q. What level should you cancel at?

IAsyncEnumerator.MoveNextAsync(CancellationToken cancel)       // <- at the MoveNext level
IAsyncEnumerable.GetAsyncEnumerator(CancellationToken cancel)  // <- at the enumerATOR level
IAsyncEnumerable GetStream(CancellationToken cancel)           // <- at the enumerABLE level

It doesn't make sense to cancel one MoveNextAsync() and then continue to use the rest of the sequence.

If you have an enumerABLE then it's fine if the only cancellation granularity is at the enumerator. But if enumerATOR is the only one, then granularity of cancellation is more subtle.

EntityFramework. They're content with current IX shape of IAsyncEnumerable. For cancellation, IX takes it in MoveNextAsync(CancellationToken), but they don't actively use that per-move-next granularity... if we said just one cancellation token per enumeration then no customer would notice. However they are concerned about patterns that associate cancellation with the whole enumerable.

EntityFramework. It does mixed server+client evaluation - figures out what parts we can do on server vs must do on client. We already flow cancellation through to server. So if we have one entry point for cancellation, that wouldn't be a problem.

ServiceFabric would be okay with doing a timeout (i.e. cancellation) once per iteration, rather than once per MoveNext.

Q. Is it enough to offer cancellation only in the GetAsyncEnumerator() pattern itself? Or must it be common to all async streams via the IAsyncEnumerable interface?

I don't know if that question is a meaningful one. But it led to an important use-case...

LINQ scenario. Imagine .Select(_).Select(_).Where(_) and you want cancellation to be propagated through each of the enumerators in the chain.

It's still an open question how you'd actually cancel when you have one of those LINQ queries. PLINQ had an operator .WithCancellation() which automatically propagates it to the whole query. Maybe we'd do the same.
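
For readers who haven't used it, this is roughly what the existing PLINQ operator looks like in use. The sketch relies only on the shipped `ParallelEnumerable.WithCancellation` extension; the class and method names are made up for the example.

```csharp
using System.Linq;
using System.Threading;

public static class PlinqCancellationDemo
{
    public static int[] Run(CancellationToken cancel)
    {
        return Enumerable.Range(0, 1000)
            .AsParallel()
            .WithCancellation(cancel)  // one token, attached once, for the whole query
            .Select(x => x * 2)
            .Where(x => x % 3 == 0)
            .ToArray();
    }
}
```

The token is attached once and applies to every operator in the chain. If it is cancelled, executing the query is expected to throw OperationCanceledException. An async-stream equivalent would need the same kind of flow through each enumerator in the chain.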

Cancellation token optional?

Q. Would it be a bad thing to suggest using a default value in GetAsyncEnumerator(CancellationToken cancel = default(CancellationToken))? Against the CLS guidelines?

Q. Can you enumerate something without requiring folks to provide a cancellation token?

ServiceFabric scenario. It is _critically important_ here to force folks to pay attention to cancel.

We don't expect most users to see the GetEnumerator method. Most users will foreach over it.

Maybe a Roslyn analyzer could warn if someone does foreach (await var x in e) on an e that hasn't yet had cancellation passed to it? Or you could even have an enumerable-like type which _isn't even foreachable_ (doesn't satisfy the foreach pattern), and you have to explicitly call .WithCancellation() on it to get back something that can be foreach'd.

_This is interesting_. We can make IAsyncEnumerable.WithCancellation(...) return an IAsyncEnumerableWithCancellation type, if we wish to enforce anything about it.
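A minimal sketch of that idea, written against the `IAsyncEnumerable<T>` and `await foreach` that later shipped with C# 8, purely so that it compiles. `TokenRequiredStream<T>`, `TokenRequiredDemo` and `Numbers` are hypothetical names, not anything proposed in this meeting. The source type deliberately has no `GetAsyncEnumerator`, so the only route to a loop goes through `WithCancellation`:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical type: it has no GetAsyncEnumerator(), so it cannot be
// foreach'd directly. Callers must go through WithCancellation(token).
public sealed class TokenRequiredStream<T>
{
    private readonly Func<CancellationToken, IAsyncEnumerable<T>> _open;

    public TokenRequiredStream(Func<CancellationToken, IAsyncEnumerable<T>> open)
    {
        _open = open;
    }

    // Returns something that does satisfy the foreach pattern.
    public IAsyncEnumerable<T> WithCancellation(CancellationToken token)
    {
        return _open(token);
    }
}

public static class TokenRequiredDemo
{
    // Hypothetical producer that checks the token before every element.
    private static async IAsyncEnumerable<int> Numbers(
        int count, [EnumeratorCancellation] CancellationToken token = default)
    {
        for (int i = 1; i <= count; i++)
        {
            token.ThrowIfCancellationRequested();
            await Task.Yield();
            yield return i;
        }
    }

    public static async Task<int> SumAsync(int count, CancellationToken token)
    {
        var stream = new TokenRequiredStream<int>(t => Numbers(count, t));

        int sum = 0;
        // "await foreach (var x in stream)" would not compile here.
        await foreach (var x in stream.WithCancellation(token))
        {
            sum += x;
        }
        return sum;
    }
}
```

Whether the wrapper returns a plain `IAsyncEnumerable<T>` (as here) or a dedicated `IAsyncEnumerableWithCancellation` type is exactly the open design choice; the sketch only shows that the "not foreachable until you pass a token" part can be enforced by the type system.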

What about a syntax specifically for cancellation?

foreach (await var x in xs) cancelToken { ... }   // <- a space specially for the cancellationToken
foreach (await var x in xs) (cancelToken) { ... } // <- a second argument list to be passed to the implicit call to GetEnumerator()

What are the boundaries of a buffered fetch?

ServiceFabric. Imagine you're iterating over an async sequence. It likely buffers under the hood, e.g. fetching 100 records in one go. There are cases where each fetch actively costs money. So you'd like a way to know in advance whether the next MoveNextAsync() will be on a buffer boundary or not. This could be represented either by IAsyncEnumerable<IEnumerable<Record>>, or by just IAsyncEnumerable<Record> with some additional signal.

EntityFramework. This doesn't have an obvious ABI way to know where the boundaries of the buffer are - you get a payload from the server and you don't know if you're going to find == 1, > 1 or < 1 part of a record in the buffer. There isn't a good way to know whether the next thing can be resolved immediately or not. If IAsyncEnumerable wanted a functionality to ask "will the next one be synchronous or not", then ADO.NET can't answer that.

_This is interesting_. There are circumstances like ServiceFabric where knowing whether the next one will be async is interesting, and others (like async iterator methods and ADO.NET) where it's not possible.
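One way to read the "sequence of batches" option from the ServiceFabric scenario: each outer element is one fetch, so the consumer sees exactly where the (possibly billed) boundaries fall. A hedged sketch using the interface shape that later shipped in C# 8; `FetchPagesAsync`, `ConsumeAsync`, the page size and the `int` records are all invented for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BufferedFetchDemo
{
    // Hypothetical source: every outer element stands for one round trip.
    public static async IAsyncEnumerable<IReadOnlyList<int>> FetchPagesAsync(int total, int pageSize)
    {
        for (int start = 0; start < total; start += pageSize)
        {
            await Task.Delay(1); // stands in for the costly fetch
            var page = new List<int>();
            for (int i = start; i < Math.Min(start + pageSize, total); i++)
            {
                page.Add(i);
            }
            yield return page;
        }
    }

    // Returns (number of fetches, number of records) so the effect of
    // stopping on a page boundary is visible to the caller.
    public static async Task<(int Fetches, int Records)> ConsumeAsync(
        int total, int pageSize, int maxFetches)
    {
        int fetches = 0, records = 0;
        await foreach (var page in FetchPagesAsync(total, pageSize))
        {
            fetches++;
            foreach (var item in page) // synchronous inner loop, no await per record
            {
                records++;
            }
            if (fetches == maxFetches) break; // stop exactly on a buffer boundary
        }
        return (fetches, records);
    }
}
```

With 250 records in pages of 100 and a budget of two fetches, `ConsumeAsync(250, 100, 2)` stops after 200 records without paying for the third round trip. This only works for sources that know their own boundaries, which is the point EntityFramework raises about ADO.NET.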

GetAsyncEnumerator vs GetAsyncEnumeratorAsync.

The equivalent of GetAsyncEnumerator in ADO.NET is ExecuteReaderAsync() (i.e. the GetEnumerator equivalent _itself_ is async). Then it has multiple ReadAsync() on it. But in EF this is actually wrapped up inside IX, and the asynchronous work of ExecuteReaderAsync is deferred until the first MoveNextAsync.

A type that supports both synchronous and asynchronous iteration?

interface IAsyncEnumerable<T> { IAsyncEnumerator<T> GetEnumerator(); }
class C : IAsyncEnumerable<int>, IEnumerable<int> { ... not very nice ... }

// vs

interface IAsyncEnumerable<T> { IAsyncEnumerator<T> GetEnumeratorAsync(); }
class C : IAsyncEnumerable<int>, IEnumerable<int> { ... works fine ... }

foreach (await var x in c) // <-- consume it asynchronously
foreach (var x in c)       // <-- or consume it synchronously; it's up to you

If queries are still going to be executable both synchronously and asynchronously, then a single object would have both .GetEnumerator() and .GetAsyncEnumerator(cancel). This looks weird, but we hope that users don't typically see any calls to GetEnumerator anyway.
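For what it's worth, here is a sketch of a single object carrying both methods, written against the `IAsyncEnumerable<T>` interface that later shipped with C# 8 (where the async method is named `GetAsyncEnumerator`, so it doesn't clash with `GetEnumerator`). `DualSequence` and `DualDemo` are hypothetical names for the example:

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical type exposing both iteration styles over the same data.
public sealed class DualSequence : IEnumerable<int>, IAsyncEnumerable<int>
{
    private readonly int[] _items;

    public DualSequence(params int[] items) => _items = items;

    // Synchronous path.
    public IEnumerator<int> GetEnumerator() => ((IEnumerable<int>)_items).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // Asynchronous path, with one token for the whole enumeration.
    public async IAsyncEnumerator<int> GetAsyncEnumerator(CancellationToken cancel = default)
    {
        foreach (var item in _items)
        {
            cancel.ThrowIfCancellationRequested();
            await Task.Yield();
            yield return item;
        }
    }
}

public static class DualDemo
{
    public static async Task<(int Sync, int Async)> SumBothWaysAsync()
    {
        var c = new DualSequence(1, 2, 3);

        int syncSum = 0;
        foreach (var x in c) syncSum += x;        // consume it synchronously

        int asyncSum = 0;
        await foreach (var x in c) asyncSum += x; // or consume it asynchronously

        return (syncSum, asyncSum);
    }
}
```

Both loops see the same elements; which one runs is decided at the consumption site, which is the "defer the decision until right at the end" property the EF team describes.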

EntityFramework. EF has some LINQ extension methods that are asynchronous, because the EF team wanted to provide an async experience: .FirstAsync(), .CountAsync(). Also, you can write the LINQ query and then call .ForEachAsync() or .ToListAsync() on it. In general we want to get out of this hack business. Our IQueryables happen to also be IAsyncEnumerables which means that so long as you stay in this world, we can defer the decision about how to execute it until right at the end. Likewise our IQueryProviders.

_This is an interesting question_. Can .GetEnumerator() and .GetEnumeratorAsync() exist on a single type?

IAsyncDisposable

[I can't find my notes on this topic]. But it sounded like there's a solid need - from EntityFramework at the very least - to support async disposal.

Actor scenario

Actors are built on ServiceFabric. We get proxies to various types of sequences - we do "Get me an observable", and get a local proxy to a remote observable. Then we write queries on it. Then we call SubscribeAsync(). Almost exactly the same as RX. The only difference is that Subscribe is async, and it takes a query identifier. There's also a DisposeAsync() to dispose of a bunch of them.

Expression trees

Bart. We have a Metadata API to discover what's available in the service. It exposes IAsyncQueryable, with all the expected extension methods on them. When you do the .ForEachAsync() extension method, it sends the expression tree to the server, and translates the async expression tree to the equivalent synchronous query to execute on the server. This is where the client really wants the ability to write metadata queries to the server to ask which streams are available, but doesn't want to do it using classic blocking IQueryable.

Exactly the same thing in Reactive, e.g. when folks want to do async inside .Where() clauses. Even though expression trees can't capture await, we let them do .Where(x => DummyAwait(...)) for a method we defined T DummyAwait<T>(Task<T> x). The expression tree shows a call to this method. It's the responsibility of whoever consumes (interprets/compiles/executes) that expression tree to do the right thing -- maybe do it synchronously on the server, or do it asynchronously on the client.
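A small sketch of the marker-method trick. The signature `T DummyAwait<T>(Task<T> x)` is from the notes above; the blocking body, `IsEvenAsync` and the `DummyAwaitDemo` class are assumptions made up for the example, not the Reactive team's code:

```csharp
using System;
using System.Linq.Expressions;
using System.Threading.Tasks;

public static class DummyAwaitDemo
{
    // Marker method: expression trees cannot contain 'await', so a call to
    // this method is what the tree records. This body simply blocks.
    public static T DummyAwait<T>(Task<T> task) => task.GetAwaiter().GetResult();

    // Hypothetical async predicate used inside the query.
    public static Task<bool> IsEvenAsync(int x) => Task.FromResult(x % 2 == 0);

    // Shows what a consumer of the tree sees: a call to the marker method.
    public static string DescribeBody()
    {
        Expression<Func<int, bool>> predicate = x => DummyAwait(IsEvenAsync(x));
        var call = (MethodCallExpression)predicate.Body;
        return call.Method.Name;
    }

    // One possible consumer: compile the tree and run it synchronously.
    public static bool RunSynchronously(int value)
    {
        Expression<Func<int, bool>> predicate = x => DummyAwait(IsEvenAsync(x));
        return predicate.Compile()(value);
    }
}
```

`DescribeBody()` returns `"DummyAwait"`, which is the hook a query provider would look for. A provider running on the client could instead rewrite that node into a real await; the sketch only shows the synchronous interpretation.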

EntityFramework. We have a similar scenario for queries that need to be translated.

.NET Framework APIs

We don't have any particular APIs in the .NET Framework that are burning to move over to IAsyncEnumerable.

Probably most of the libraries that adopt IAsyncEnumerable will be distributed-systems APIs.

Where will the types go?

IAsyncEnumerable is currently defined in IX (and used by EntityFramework). It is also separately defined in ServiceFabric.

We'd like to make sure it has a common home.

IX has already implemented the common LINQ operators for IAsyncEnumerable - has parity. (It used to be done with TaskCompletionSource but they've just finished a rewrite into familiar System.Linq iterator pattern, but would still benefit from language features for async enumerable).

The remaining TODO on this IX project is: we don't have any Select(this IAsyncEnumerable<T> xs, Func<T, Task<U>> lambda) - i.e. don't have anything that takes in a Task-returning lambda. Partly that's because we want to see where the dust settles on ValueTask-returning lambdas.

IX is a .NET Foundation project.

ServiceFabric. This defines its own IAsyncEnumerable. ServiceFabric provides a reliable distributed dictionary. It lets you iterate over key+value pairs. It lets you serialize key+value pairs to clusters. Its current IAsyncEnumerable has Reset() and MoveNext(CancellationToken).

Note: if one particular library needs some special functionality in its async enumerator type, it can expose extra fields/methods on the concrete enumerable, even if they're lacking from the main IAsyncEnumerable.

ljw1004 on 23 Sep 2016

👍4 ❤3

What are the boundaries of a buffered fetch

Could do something like

interface IAsyncEnumerable 
{
    // ...

    bool NextMoveWillNotYield();
}

Which will help with the fast path batch pattern @svick outlined

benaadams on 24 Sep 2016

@benaadams

I like how @stephentoub's proposal does this.

var e = enumerable.GetEnumerator();
while (await e.WaitForNextAsync())
{
    while (e.TryGetNext(out T item))
    {
        …
    }
}

omariom on 24 Sep 2016

👍3

Covariance

Folks like @alrz have talked about covariance in this thread already.

// VS2010 introduced this conversion in C#:
IEnumerable<string>   -->   IEnumerable<object>

// I think it's important to allow this one too:
IAsyncEnumerable<string>   -->   IAsyncEnumerable<object>

That has implications on the pattern, also as mentioned by @alrz:

interface IAsyncEnumerator<out T>
{
   bool TryGetNext(out T item);  // error: out parameters can't be covariant

   Task<(bool, T)> TryGetNext(); // bad, involves too many allocations

   ITask<T?> GetNextAsync(); // again, involves an allocation each time, and needs new lang features

   T TryGetNext(out bool succeeded); // is pretty weird
}

The only one of these that works without extra allocations is the last one. It's not nice on the consumption side...

// I like consuming "bool TryGetNext(out T value)"
while (en.TryGetNext(out var x)) { ... }

// I don't like consuming "T TryGetNext(out bool succeeded)"
while (true)
{
   var x = en.TryGetNext(out bool b); if (!b) break;
   ...
}

// I suppose I could consume it in just a single line...
while (TryGetNext(out var b) is var x && b) { ... }

That last single-liner was proposed by @MadsTorgersen and uses pattern matching.

The option T TryGetNext(out bool succeeded) is also weird because it involves synthesizing a returned value even in cases where it failed. That's doable of course with default(T) but feels weird.
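To make the trade-off concrete, here's a small sketch showing that this shape really does stay covariant and what the default(T) synthesis looks like. `ITryEnumerator<T>`, `ListTryEnumerator<T>` and `CovarianceDemo` are names made up for the example; nothing here is a proposed API:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical covariant shape: T appears only in output position, because
// the 'out bool' parameter does not mention T.
public interface ITryEnumerator<out T>
{
    T TryGetNext(out bool succeeded);
}

public sealed class ListTryEnumerator<T> : ITryEnumerator<T>
{
    private readonly IReadOnlyList<T> _items;
    private int _index;

    public ListTryEnumerator(IReadOnlyList<T> items)
    {
        _items = items;
    }

    public T TryGetNext(out bool succeeded)
    {
        succeeded = _index < _items.Count;
        // On failure a value still has to be synthesized: default(T).
        return succeeded ? _items[_index++] : default(T);
    }
}

public static class CovarianceDemo
{
    // Accepts the enumerator at the 'object' instantiation.
    public static List<object> Drain(ITryEnumerator<object> en)
    {
        var result = new List<object>();
        while (true)
        {
            var x = en.TryGetNext(out bool ok);
            if (!ok) break;
            result.Add(x);
        }
        return result;
    }

    public static int Run()
    {
        ITryEnumerator<string> strings =
            new ListTryEnumerator<string>(new[] { "a", "b", "c" });

        // Implicit covariant conversion: ITryEnumerator<string> -> ITryEnumerator<object>.
        return Drain(strings).Count;
    }
}
```

Swapping the signature to `bool TryGetNext(out T item)` makes the `out T` on the interface a compile error, which is the constraint driving this whole sub-thread.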

ljw1004 on 26 Sep 2016

👍2

This solves the "too many allocations" problem:

Task<bool> WaitForNextAsync();
(bool, T) TryGetNext();

Whatever the outcome, I strongly feel that variance is more important than ease of consumption.

jnm2 on 26 Sep 2016

@jnm2 As @svick pointed out, that is not possible because tuples are invariant, so it is not an option.

alrz on 26 Sep 2016

Oh right. So we're forced to have a T return value, whether from a Current property or the MoveNext method, or an IMoveNextResult<out T> or T[] which is an allocation. Returning T? is definitely the coolest here and I want that new language feature, but I'm guessing waiting for that isn't a good option.
Covariance is worth it.

Someday, out could be introduced as a first-class construct to the CLR separate from ref so that out params don't block covariance. Or, since we should move to tuple returns instead of out parameters... why again is it good for tuples to be invariant?

jnm2 on 26 Sep 2016

If we're going to choose the two-method path, I think the following has a sensible structure,

public interface IAsyncEnumerator<out T>
{
    Task<bool> WaitForNextAsync();
    T? TryGetNext();
}

With non-nullable reference types on the way, I think this should be really considered as an actual use case.

alrz on 26 Sep 2016

@alrz what if null is a valid value of the sequence? It's not very common, but not unheard of either.

thomaslevesque on 26 Sep 2016

@alrz

You very likely couldn't have T? without either class or struct constraints.

HaloFour on 26 Sep 2016

@thomaslevesque How many times has it occurred to you to return null from an async Task<T> method? Besides being rare, it's absurd. Since ADTs are also on the table, I think you should use something like Maybe<T> to indicate that the sequence may yield Nothing.

@HaloFour That is filed as #9932. It would be unfortunate not to be able to use an unconstrained T? IMO.

alrz on 26 Sep 2016

👎1

@thomaslevesque How many times has it occurred to you to return null from an async Task<T> method?

About as often as returning null from a method that returns T, which is not rare at all. Admittedly it's much less common in sequences, but I don't think the API should forbid it entirely.

thomaslevesque on 26 Sep 2016

while (TryGetNext(out var b) is var x && b) { ... }

I kinda like that. x being readonly is an interesting bonus also. It is unfortunate that this call is backwards compared to every other TryGet* method out there though.

I don't think handwritten consumption shapes should weigh heavily on the design here.

bbarry on 26 Sep 2016

@alrz

That has two problems. First, it binds this proposal to the fate of another much-less-certain proposal, one that would likely require CLR and BCL changes in order to accomplish. Second, it still results in the loss of variance since structs can't be variant.

HaloFour on 26 Sep 2016

Pattern: familiar vs efficient

_(I'm summing up some currently-under-discussion design points)..._

What should the async foreach pattern be like? First option is to be familiar like IEnumerable. Here's an example type that would satisfy the pattern:

interface IAsyncEnumerable<out T> {
   IAsyncEnumerator<T> GetAsyncEnumerator();
}

interface IAsyncEnumerator<out T> {
   T Current {get;}
   Task<bool> MoveNextAsync();
}

The other option is to be more efficient. We can be more efficient in a few ways: (1) by avoiding heap allocation and returning just structs which include state machine and method builder and enumerator, so the caller can allocate them all on the stack; (2) by avoiding the double-interface dispatch; (3) by having a tight non-async inner loop. There's been lots of discussion on fine-tuning the precise best way to achieve these efficiency goals, but here are some simplistic versions:

// (1) avoiding heap allocation entirely
// Declaration side:
async iterator MyAsyncEnumerable<int, struct StateMachine> GetStream() { ... }
// Consumption side:
foreach (var x in GetStream()) { ... }

// (2) avoiding the double-interface-dispatch
interface IAsyncEnumerator<out T> {
   Task<Tuple<T,bool>> TryGetNextAsync();
}

// (3) avoiding async when working through the batch...
while (await enumerator.GetNextChunkAsync())
{
   while (enumerator.TryMoveNext(out value)) { ... }
}

_As the discussion has progressed, I've seen the "efficient" versions become steadily less appealing..._

Heap allocations. Why do we care about avoiding heap allocations entirely? I see that eliminating heap allocation is desirable for in-memory data structures that you iterate over with IEnumerable. But when it comes to _async_ streams, if the entire stream ever gets a cold await, then it will necessarily involve allocation. The only folks who will benefit from heap-free async streams are those who write middleware, e.g. something that can sit equally on top of either a MemoryStream or a NetworkStream, and still be as efficient as possible in the first case. (We did indeed see such a need when we introduced async).

In all, the heavy-weight language work needed to support heap-free async streams seems way disproportionate to the benefit. (That language work might be struct SM like I wrote in the above code, or the ability for a method to have var as its return type, or the ability to declare something that looks like a struct and also a method).

Heap-free streams as we've envisaged them will only apply to consumption by foreach: as soon as you use the LINQ extension methods, then you get boxing. @MadsTorgersen and @CyrusNajmabadi spent some time exploring LINQ alternatives that avoid boxing. The gist of it was that you could pass the type of the enumerable and enumerator. This is clever, but looks pretty heavyweight.

// This is familiar LINQ extension method
static void Select<T,U>(this IEnumerable<T> src, Func<T,U> lambda)

// We could plumb more information through to avoid boxing
static void Select<T,Table, Tator, ...>(this IEnumerable<T,Table,Tator> src, ...)

At this point we waved our hands and said "We've discussed and investigated escape analysis in the past -- something where some part of the infrastructure can see that the IEnumerable doesn't escape from the method even despite its use in LINQ queries. If we had such escape analysis then it would benefit everyone, including the folks who need it most, rather than being an efficiency oddity specific to async streams that requires everyone to rewrite their code."

Conclusion: should give up on heap-free async streams.

Avoid double interface dispatch. Sometimes we believe that interface dispatch onto a struct is one of the slowest parts of .NET. Other times we think it's cached and so pretty fast. We haven't benchmarked this yet. There's no point pursuing it for efficiency's sake unless it's been benchmarked.

The downside of "avoid double interface dispatch" is that it doesn't work nicely with covariance. And covariance is more important. The only way to retain covariance in just a single interface dispatch is something like this: T TryGetNext(out bool succeeded). That's hard to stomach.

One attractive feature of a TryGetNext method is that it makes the enumerator _atomic_. (When it's split into a MoveNext method call followed by a Current property fetch, that can't be atomic, and will lead to race conditions if two threads try to consume an enumerator at the same time). Atomicity is nice. It means, for instance, that async streams could be consumed by a "choice" language primitive (similar to the choice operator in CCS and pi calculus, similar to what Go uses).

Avoid async when working through the batch. Let's work through concretely how this would work when you iterate over an async iterator method. There are subtleties here that aren't at first obvious...

async iterator IAsyncEnumerable<int> GetStream()
{
   while (true)
   {
      await buf.GetNextByteAsync();
      yield return buf.NextByte;
   }
}

var enumerator = GetStream().GetEnumerator();
while (await enumerator.GetNextChunkAsync())
{
   while (enumerator.TryMoveNext(out value)) { ... }
}

The question is, _what granularity do the chunks come in?_

The easy answer is that GetNextChunkAsync() will progress as far as the next yield return, and then TryMoveNext will succeed exactly once. This is easy to implement but it defeats most of the value-prop of chunking in the pattern -- because each chunk will be exactly one item big.

A more complicated answer is that TryMoveNext will execute as much of the method as possible. It will have the ability even to kick off an await by calling GetAwaiter() and then awaiter.IsCompleted. Only when it comes to the end of the async iterator method or to a _cold await_ will it finally return false. There's something a little hairy about this, about having the await be finished by a different method from the one that started it. Does it also have implications for IAsyncDisposable? Not sure.
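To see why the chunk size matters, here's a hand-written enumerator that follows the two-level loop, using the GetNextChunkAsync/TryMoveNext names from the snippet above. `ChunkedEnumerator`, `ChunkDemo`, the queue-based buffer and the `AsyncWaits` counter are assumptions made up for the example; a compiler-generated async iterator would look nothing like this:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical hand-written enumerator following the two-level loop.
public sealed class ChunkedEnumerator
{
    private readonly Queue<int[]> _pendingChunks;
    private readonly Queue<int> _buffer = new Queue<int>();

    // Counts how many asynchronous waits the consumer had to pay for.
    public int AsyncWaits { get; private set; }

    public ChunkedEnumerator(IEnumerable<int[]> chunks)
    {
        _pendingChunks = new Queue<int[]>(chunks);
    }

    // One (possibly cold) await per chunk.
    public async Task<bool> GetNextChunkAsync()
    {
        if (_pendingChunks.Count == 0) return false;
        await Task.Yield(); // stands in for the real asynchronous fetch
        AsyncWaits++;
        foreach (var item in _pendingChunks.Dequeue()) _buffer.Enqueue(item);
        return true;
    }

    // Synchronous: drains whatever the last chunk delivered.
    public bool TryMoveNext(out int value)
    {
        if (_buffer.Count > 0)
        {
            value = _buffer.Dequeue();
            return true;
        }
        value = default(int);
        return false;
    }
}

public static class ChunkDemo
{
    public static async Task<(int Sum, int Waits)> SumAsync(IEnumerable<int[]> chunks)
    {
        var enumerator = new ChunkedEnumerator(chunks);
        int sum = 0;
        while (await enumerator.GetNextChunkAsync())
        {
            while (enumerator.TryMoveNext(out int value)) sum += value;
        }
        return (sum, enumerator.AsyncWaits);
    }
}
```

Feeding it `{1,2,3}` and `{4,5}` costs two awaits for five items; feeding it five one-element chunks costs five awaits for the same data, which is the "easy answer" case where the inner loop buys nothing.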

Conclusion: we should of course benchmark the single-interface-dispatch and the buffers. But they would have to show _dramatic_ wins to be worth the concomitant ugliness.

ljw1004 on 26 Sep 2016

👍7

Cancellation and ConfigureAwait

_(I'm summing up some currently-under-discussion design points)..._

We'd talked about cancellation being done in two ways:

// way 1
using (var ator = able.GetAsyncEnumerator(token))

// way 2
foreach (await var x in xs.WithCancellation(token))

To avoid the compiler+language having to know about cancellation, we could define the first method with a default parameter: IAsyncEnumerator<int> GetAsyncEnumerator(CancellationToken cancel = default(CancellationToken)).

We'd talked about ConfigureAwait being done only in the second way:

foreach (await var x in xs.ConfigureAwait(false))

We'd talked about a shorthand .Configure() method to let you easily both provide a cancellation token and configure the await.

We'd talked about how when you obtain an object that implements IAsyncDisposable, then knowledge of how it should await/cancel has already been stored inside the object, and so a consumer need only await it.
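A sketch of the two ways side by side, written against the interfaces that later shipped with C# 8 so that it compiles (there the syntax is `await foreach`, and `WithCancellation`/`ConfigureAwait` are extension methods on `IAsyncEnumerable<T>`). `Ticks`, `Way1Async`, `Way2Async` and `CancellationWaysDemo` are hypothetical names for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationWaysDemo
{
    // Hypothetical endless producer. The attribute routes the token given to
    // GetAsyncEnumerator(token) or WithCancellation(token) into 'cancel'.
    public static async IAsyncEnumerable<int> Ticks(
        [EnumeratorCancellation] CancellationToken cancel = default)
    {
        for (int i = 0; ; i++)
        {
            cancel.ThrowIfCancellationRequested();
            await Task.Yield();
            yield return i;
        }
    }

    // Way 1: pass the token when obtaining the enumerator by hand.
    public static async Task<int> Way1Async(int stopAfter)
    {
        using var cts = new CancellationTokenSource();
        int seen = 0;
        try
        {
            await using var ator = Ticks().GetAsyncEnumerator(cts.Token);
            while (await ator.MoveNextAsync())
            {
                if (++seen == stopAfter) cts.Cancel();
            }
        }
        catch (OperationCanceledException) { }
        return seen;
    }

    // Way 2: attach the token (and the await configuration) in the foreach.
    public static async Task<int> Way2Async(int stopAfter)
    {
        using var cts = new CancellationTokenSource();
        int seen = 0;
        try
        {
            await foreach (var x in Ticks().WithCancellation(cts.Token).ConfigureAwait(false))
            {
                if (++seen == stopAfter) cts.Cancel();
            }
        }
        catch (OperationCanceledException) { }
        return seen;
    }
}
```

Both methods stop the endless producer after `stopAfter` elements: one token covers the whole enumeration, and the producer observes it the next time it is resumed.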

_QUESTIONS_.

Q1. Why do we need both "way1" and "way2"? Can't we just do it with "way2"?

Q2. Normally you can do foreach (var x in xs) and common datatypes give you back a struct type for your enumerator, e.g. List<T>.Enumerator. Is this still possible with the .ConfigureCancellation() approach?

Q3. Does .ConfigureCancellation() get defined in IAsyncEnumerable? Or is it an extension method on IAsyncEnumerable? Or is it left to be defined by any concrete types that wants to offer it?

Q4. Does .ConfigureCancellation return an IAsyncEnumerable that can be composed further? Or does it return an ICancelledAsyncEnumerable which can't be used by the LINQ combinators but which does satisfy the foreach pattern?

Q5. For folks like ServiceFabric who wish to _force_ you to provide a cancellation token, would we make it so IAsyncEnumerable itself _doesn't_ satisfy the async foreach pattern? -- Answer: no. This can be done by an analyzer. This enforcement doesn't belong in IAsyncEnumerable.

ljw1004 on 26 Sep 2016

👍2

Home of IAsyncEnumerable

_(I'm summing up some currently-under-discussion design points)..._

Where is the home of IAsyncEnumerable? There's one defined in IX (which doesn't quite satisfy our needs because it has cancellation as a parameter of MoveNextAsync). There's one defined in Azure ServiceFabric (which again isn't quite right).

It feels like the interface type IAsyncEnumerable itself should be defined in System or mscorlib or somewhere central, similar to IObservable. But the LINQ library methods should be provided in IX.

Not sure about the extension method IAsyncEnumerable<T> AsAsyncEnumerable<T>(this IEnumerable<T> src).

What do folks @onovotny think about this?

ljw1004 on 26 Sep 2016

I think that it makes sense to have the interface itself live somewhere central so that methods from mscorlib or System could potentially return it as a signature with its own internal implementation.

For the home of the LINQ implementation, I would recommend IX. We would of course welcome any and all contributions from Microsoft and other teams. IX would adapt and use whatever the final signature is for the IAsyncEnumerable, so I'm not too concerned about the current implementation having a MoveNext with a cancellation token. We would need to coordinate with partner teams, like EF, on how/when to make that breaking change, but a major version increment should cover that scenario with good justification.

I would think the extension methods should go along with IX as well -- basically, if you want to do IX Async, that's the real library to reference for the main logic.

clairernovotny on 27 Sep 2016

@ljw1004 I think new extension methods for IAsyncEnumerable should be part of the BCL, not IX/RX. The set of people using RX (or that even know about RX) is a tiny fraction of the total user base of .NET.

Personally, I also find RX really hard to use because the documentation is spotty / dated. We shouldn't tie a brand-new language feature to an old library designed for a world in which the language feature didn't yet exist.

MgSam on 27 Sep 2016

👍3

The set of people using RX (or that even know about RX) is a tiny fraction of the total user base of .NET.

And that's even more true of IX

thomaslevesque on 27 Sep 2016

@MgSam @thomaslevesque

I completely agree. However I think the reason for that failing is largely due to the lack of attention Rx/Ix received from Microsoft. My opinion has always been that since Rx is so far along with its implementation of LINQ on both push and pull streams that it makes a great deal of sense for Microsoft to take advantage of their own work and build async streams on top of that. If that entailed bringing Rx under the BCL proper, which I imagine would require a bit of a refactoring, I would not be opposed to that. It likely requires it just from a consistency point of view since Rx and TPL diverged quite a bit in common areas.

It makes me sad that Rx seems to be getting more love on the JVM these days than on .NET where it was born. And here we are arguing over the details of reimplementing much of it.

HaloFour on 27 Sep 2016

👍1

@HaloFour By the time C# 7+1 ships Rx/Ix will likely be over 10 years old. It is filled with outdated paradigms and the web is filled with outdated documentation.

I think a much better tack would be adding a new library for IAsyncEnumerable to the BCL, and taking the parts of Rx/Ix that fit with the new paradigm and are worth keeping and adding them in.

Rx/Ix never went through the same kind of rigorous API review that BCL methods do. Adding the good parts to the BCL will ensure that they do.
Importantly, the new library should also not be called Rx/Ix. Rx/Ix have had too many changes over time, and too much outdated documentation exists. This makes Google searches for the library a minefield of bad info. In addition, many devs that I've spoken to have given up when trying to use Rx, so it likely has a negative connotation for many.

MgSam on 28 Sep 2016

This thread has raised several good points about Ix and Rx.

A few thoughts here:

Some of the comments have addressed Rx as opposed to Ix. Ix, despite living in the Rx repository, can version and release independently of Rx. For the purposes of this thread and discussion, I think it would be helpful to focus on Ix rather than Rx. Of course, we'd be more than happy to have discussions about the current state and future of Rx over on the Rx repo.

There was some discussion here around the lack of discoverability around Ix. What do you think could help fix that? Ix today already has the System.Interactive.Async NuGet package name to better reflect its core framework nature. Is it something that can be addressed by having more direct references to it from the BCL teams?

When it comes to documentation, we wholeheartedly agree that it could be made better. That's also an area that can and will be improved. We would be very interested in what kinds of changes would make the documentation better.

About suggestions that Ix hasn't had the same sort of API review and was designed for a time pre-async and has outdated techniques, I would respectfully disagree. The API of Ix has been carefully thought out by the team and treated with the same level of review that an "in-box" library would have. Furthermore, Ix Async itself was rewritten this summer to use a modern approach based on what System.Linq does today. The resulting code has fewer allocations and is much easier to reason about and debug. Code coverage for its tests is over 95%.

I would suggest that Ix overall is hardened and time-tested with very thorough test coverage and review practices. The location of the code should not matter, but lest that be a blocker, the code could be split into its own repo should there be a need.

Overall, the path to implementing whatever shape of the interface results from this discussion is far shorter and mostly done already. That's a huge boost.

I would also like to state that community contributions are very welcome. In fact, it was a community member that did the initial refactor to introduce async and await to Ix. We're very open to external contributions.

clairernovotny on 30 Sep 2016

👍2

@onovotny

We would be very interested in what kinds of changes would make the documentation better.

My main issue with the documentation of Rx is that the only API reference on MSDN is horribly out of date.

The situation seems to be even worse for Ix: there's nothing on MSDN or anywhere else, and sometimes methods don't even have XML documentation comments.

For example, when I search for "AsyncEnumerable.Buffer" (or "IAsyncEnumerable Buffer"), I find:

https://fuqua.io/Rx.NET/ix-docs/html/M_System_Linq_AsyncEnumerable_Buffer__1.htm, which looks like an unofficial auto-generated documentation, but doesn't explain what the method does at all (likely caused by the next point)
https://github.com/Reactive-Extensions/Rx.NET/blob/master/Ix.NET/Source/System.Interactive.Async/Buffer.cs, the actual source, with no XML (or any other) documentation comments

svick on 1 Oct 2016

👍3

Interesting to compare notes with the concurrent development in JavaScript. http://www.2ality.com/2016/10/asynchronous-iteration.html

jnm2 on 12 Oct 2016

@onovotny I agree with your comments about Ix being carefully designed and implemented. As you know, we were able to switch vNext of EF Core (currently on our dev branch) to the new version that contains the reimplementation of the operators, and we appreciate the advantages of the new implementations.

That said, there are still a few issues we should discuss with regard to the API alignment with the idiomatic patterns used across .NET for async. In particular we believe that query operators that return a single T should use the Async suffix. The fact that they don't is one of the reasons we have so far kept the current IAsyncEnumerable<T> type mostly hidden from application developers using EF Core.

In summary, I actually don't have a strong opinion on where the LINQ operators for IAsyncEnumerable<T> should live, but if a future major version of Ix adopted the new definition of the type in the BCL and switched to names that align better with the async pattern that would help a lot.

cc @anpete

divega on 1 Dec 2016

👍2

@divega AFAIK, the reason the Async suffix isn't used is so that the operators work with the LINQ syntax conventions. If updating the LINQ syntax to resolve *Async versions is in scope, then that would answer that, but I'm not sure that's under consideration.

clairernovotny on 1 Dec 2016

@onovotny are you referring to VB query comprehension syntax? Otherwise do you know which operators map to C# comprehension syntax for which this would be a problem? As far as I remember VB provides sugar for many more operators but is strict about their return types. If I am remembering correctly, this means the current naming probably doesn't help.

To clarify, I said previously that it was about IAsyncEnumerable<T> operators that return T, but it is actually applicable to all awaitable operators, including things like ToListAsync().

BTW, this could be an interesting criterion for deciding where to place operators. Let's say there are two groups:

Those that facilitate the _consumption_ of async collections
Those that facilitate query _composition_

I believe the second group is for more advanced/less common scenarios and having to include an extra dependency to use them wouldn't hurt too much.

divega on 1 Dec 2016

I'm specifically referring to the LINQ syntax from, select, groupby, that uses a duck pattern match to find matching methods/overloads. It doesn't care about the return types and works today with Ix Async :)

I agree there seem to be two groups as you describe, and there's no keyword equivalent of ToListAsync, FirstAsync or SingleOrDefaultAsync, etc. Those could easily be renamed as part of the version that implements the new interface design.

clairernovotny on 1 Dec 2016

@onovotny ok, I was referring specifically to the ones that can be awaited. For from, select, where, groupby, join, etc. I agree the names should remain the same.

divega on 1 Dec 2016

👍1

This is now tracked at https://github.com/dotnet/csharplang/issues/43

gafter on 28 Mar 2017

👍1

Here's my take on async enumerators/sequences using C# 7's Task-Like types. The approach could potentially act as a playground for the language feature.

Andrew-Hanlon on 9 Jul 2017

👍1

@Andrew-Hanlon AsyncEnumerator<> looks like the approach I've taken so far.
The thing I'm afraid of is that finally blocks (using statements) won't run if the enumeration is incomplete, even though the enumerator is disposed.

jnm2 on 9 Jul 2017

👍1

@jnm2 That's a good point. Making the AsyncEnumerator<> itself disposable could work, and simply set and bury a specific exception on the inner tasks, thereby triggering any inner disposables. I'll add this to the repo. Thanks.

Andrew-Hanlon on 10 Jul 2017

Shouldn't it be IDisposableAsync? 😄

paulomorgado on 10 Jul 2017

😄2

Just wanted to add my vote for enabling await within linq queries. In the proposal, there's a statement: "and this isn't a particularly high-value thing to invest in right now", but I disagree. It would make real-life interaction with APIs sooo much more elegant. There are many situations where you need to make several async API calls, each referring to results from previous API calls. Stringing them all together with nested select or let clauses in a single linq query would make that sort of code much more pleasant.

gzak on 23 Oct 2017

👍1

