Both C# and VB have support for iterator methods and for async methods, but no support for a method that is both an iterator and async. Its return type would presumably be something like IObservable<T> or IAsyncEnumerable<T> (which would be like IEnumerable but with a MoveNext returning Task<bool>).
This issue is a placeholder for a feature that needs design work.
I personally would love to see support for both IObservable<T> and IAsyncEnumerable<T>. The former interface at least already exists in the BCL and there is fantastic support for it in Rx. The latter interface has already been defined in the sister-project Ix (and under System.Collections.Generic no less) so would this language feature involve taking a dependency on that project or duplicating that interface in the BCL?
Ultimately switching between the two would be pretty easy (and already supported in Rx/Ix), but from a sequence producer point of view they would behave a little differently in that yield return for IObservable<T> would likely continue executing immediately whereas for IAsyncEnumerable<T> it would wait until the next invocation of MoveNext().
Also, if considering support for IObservable<T> you might want to consider requiring that the generator method accept a CancellationToken which would indicate when the subscriber has unsubscribed.
From the consumer point of view they should probably behave the same way. Observable.ForEach allows the action to execute concurrently and I think that it would probably be pretty unintuitive to allow the foreach body to have multiple concurrent threads (assuming that they're not being dispatched through a SynchronizationContext). If the implementation is similar to how await works whatever intermediary (SequenceAwaiter, etc.) could handle the details of buffering the results from an IObservable<T> or an extension method could just turn it into an IAsyncEnumerable<T>.
@HaloFour Observable.Create already provides an optimal implementation of this that language extensions wouldn't add any value to.
IAsyncEnumerable, however, has no optimal way to generate a sequence other than implementing the interface manually. It's fairly easy to make something that emulates yield return but it is super inefficient so this is badly needed.
I don't disagree. Rx is awesome like that. I advocate for it mostly to bring Rx closer to the BCL so that people are more aware that it exists, and also because those core interfaces are at least a part of the BCL whereas IAsyncEnumerable<T> is brand new to the BCL (and duplicates Ix).
I'm not familiar with Ix, so I can't comment on any existing IAsyncEnumerable, but I would rather the team start fresh when thinking about async enumerables rather than try to build off IObservable. Rx was an interesting project, but it was designed mostly before async existed and then later tried to bolt the two concepts together with varying success. Present-day Rx also has a very cluttered API surface area with poor documentation all around.
async/await enables code that looks almost identical to synchronous code; I'd like to be able to work with asynchronous sequences as effortlessly as you can work with IEnumerable today. I've definitely wanted to mix yield return and async/await before so this is a feature that would be very helpful.
Indeed, there is a lot of duplication between the two because they were developed independently and Rx never had the resources that BCL/PFX had. I also don't think that Rx/Ix could be merged into the BCL as is.
The Ix IAsyncEnumerable<T> interface is exactly as described here, basically identical to IEnumerable<T> except that MoveNext() returns Task<bool>. As mentioned the big difference between something like IObservable<T> and IAsyncEnumerable<T> is that the latter is still a pull-model as the generator really couldn't continue until the consumer called MoveNext() again. In my opinion this would make it less suitable for certain concurrent processing scenarios since the producer code isn't running between each iteration. An IObservable<T> async iterator could continue executing immediately after yielding a value.
In my opinion supporting both would be worthwhile. The compiler could generate different state machines depending on the return type of the async iterator.
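As a rough illustration of the pull model described above, here is a minimal sketch of the interface shape as the thread describes it (IEnumerable<T>, but with MoveNext returning Task<bool>) together with a hand-written producer. The PullSketch namespace, DelayedRange, ProducerSteps and Demo are hypothetical names for this example; the real Ix interface may differ in details such as cancellation, which is left out here.

```csharp
using System;
using System.Threading.Tasks;

namespace PullSketch
{
    // Shape described in the thread: like IEnumerator<T>, but MoveNext returns Task<bool>.
    public interface IAsyncEnumerator<out T> : IDisposable
    {
        T Current { get; }
        Task<bool> MoveNext();
    }

    public interface IAsyncEnumerable<out T>
    {
        IAsyncEnumerator<T> GetEnumerator();
    }

    // Hand-written producer (hypothetical): yields 0..count-1 with a delay per step.
    public sealed class DelayedRange : IAsyncEnumerable<int>
    {
        private readonly int _count;
        private readonly int _delayMs;

        // Counts how many times the producer has actually advanced.
        public int ProducerSteps;

        public DelayedRange(int count, int delayMs)
        {
            _count = count;
            _delayMs = delayMs;
        }

        public IAsyncEnumerator<int> GetEnumerator() => new Enumerator(this);

        private sealed class Enumerator : IAsyncEnumerator<int>
        {
            private readonly DelayedRange _owner;
            private int _next;

            public Enumerator(DelayedRange owner) { _owner = owner; }

            public int Current { get; private set; }

            public async Task<bool> MoveNext()
            {
                if (_next >= _owner._count) return false;
                await Task.Delay(_owner._delayMs); // producer work happens only inside MoveNext
                _owner.ProducerSteps++;
                Current = _next++;
                return true;
            }

            public void Dispose() { }
        }
    }

    public static class Demo
    {
        // Pulls two items and reports how far the producer got.
        public static async Task<int> ConsumeTwoAsync()
        {
            var source = new DelayedRange(count: 10, delayMs: 10);
            using (var e = source.GetEnumerator())
            {
                await e.MoveNext();
                await e.MoveNext();
            }
            return source.ProducerSteps;
        }
    }
}
```

ConsumeTwoAsync returns 2: the producer advances once per MoveNext call and never runs ahead of the consumer, which is the behavior being contrasted with IObservable<T> above.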
I've been wishing for this feature ever since C# 5 came out. Being able to write something like yield return await FooAsync() would be very useful; currently when I have an async method that returns a collection, I just return a Task<IReadOnlyCollection<T>>, because implementing laziness has too much overhead.
I noticed that Roslyn already has an IAsyncEnumerable<T> interface here. That's pretty much the design I had in mind, although I had forgotten about cancellation. To make it really useful, we would also need an async version of foreach (including a way to pass a CancellationToken to MoveNextAsync).
@thomaslevesque, the Roslyn link is 404.
Uh... looks like it's no longer there. A search for IAsyncEnumerable returns nothing (the name only appears in a comment). Perhaps it was moved and renamed to something else, or it was just removed.
Entity Framework uses the IAsyncEnumerable pattern to enable async database queries. In EF6 we had our own version of the interface, but in EF7 we have taken a dependency on IX-Async.
@anpete Seems to me that if async streams depends specifically on a new BCL IAsyncEnumerable<T> interface that not only will it not be a very usable feature until more projects move to the newer frameworks but there will also be a lot of confusion between the different-yet-identical interfaces that already exist.
Perhaps the compiler could support the different interfaces by convention, or have an easy way to unify them through a common utility extension method. But if, for whatever reason, they need to be converted back to their proper specific interface that would still pose problems.
I believe quite strongly that not at least trying to integrate the BCL and the Roslyn languages with Rx/Ix is a massive wasted opportunity.
Just to provide some additional background, this can already be done in F# (because F# "computation expressions", the mechanism behind both iterators and asyncs, are flexible enough). So, the C# design might learn something useful from the F# approach to this. See:
Probably the most interesting consideration here is what is the programming model:
You can convert between the two, but going from Rx to AsyncSeq is tricky (you can either drop values when the caller is not accepting them, or cache values and produce them later).
The thing that makes AsyncSeq nicer from a sequential programming perspective (i.e. when you write a statement-based method) is that it works well with things like for loops. Consider:
asyncSeq {
  for x in someAsyncSeqSource do
    do! Async.Sleep(1000)
    processValue x }
Here, we wait 1 second before consuming the next value from someAsyncSeqSource. This works nicely with the pull-mode (we just ask for the next value after 1 second waiting), but it would be really odd to do this based on Rx (are you going to start the loop body multiple times in parallel? or cache? or drop values?)
So, I think that if C# gets something like asynchronous sequences (mixing iterators and await), the pull-based design that is used by F# asyncSeq is a lot more sensible. Rx works much better when you use it through LINQ-style queries.
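A rough C# counterpart of the asyncSeq loop above, written with the async iterator and await foreach syntax that later shipped in C# 8 (none of which existed when this thread was written). Source, RunAsync and the log list are hypothetical names, and the delay is shortened from one second to keep the example quick.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PullLoopSketch
{
    // Hypothetical source: records the moment each value is actually produced.
    public static async IAsyncEnumerable<int> Source(List<string> log)
    {
        for (var i = 0; i < 3; i++)
        {
            await Task.Yield();
            log.Add($"produce {i}");
            yield return i;
        }
    }

    // Slow consumer: waits inside the loop body before processing each value.
    public static async Task<List<string>> RunAsync()
    {
        var log = new List<string>();
        await foreach (var x in Source(log))
        {
            await Task.Delay(50); // the F# example waits 1000 ms here
            log.Add($"process {x}");
        }
        return log;
    }
}
```

The log comes back strictly alternating (produce 0, process 0, produce 1, process 1, ...), because the producer only resumes when the loop asks for the next value. Nothing is buffered, dropped, or run in parallel.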
EDIT: (partly in reply to @HaloFour's comment below) - I think that it makes sense to support the async iterator syntax for IAsyncEnumerable<T> (*), but not for IObservable<T>, because you would end up with very odd behavior of foreach containing await on Task<T>.
(*) As a side-note, I find IAsyncEnumerable<T> quite odd because it lets you call MoveNext repeatedly without waiting for the completion of the first - this is probably never desirable (and AsyncSeq<T> in F# does not make that possible).
@tpetricek The difference in behavior between IAsyncEnumerable<T> and IObservable<T> is exactly why I think async iterators should support both, it gives the programmer the capacity to decide whether it's a push- or pull-model and abstracts the difference to the consumer. I think a lot of scenarios benefit from a push-model, such as launching a bunch of operations simultaneously and wanting to process the results as they are available.
Beyond that hopefully both interfaces will enjoy support of all of the common LINQ operators plus those operators that apply to asynchronous streams.
@tpetricek - The FSharp.Control.AsyncSeq documentation has been clarified to use the terminology "asynchronous pull", rather than just "pull", i.e. a pull operation that returns asynchronously, Async<T>. I'll leave it to others to debate what exactly the difference is between an "asynchronous pull" and a "synchronous push" :)
It would be nice if reading from an async sequence had constant stack usage and simple associative operations like concatenation had decent performance no matter whether left- or right-associated. E.g. reading from the IEnumerables constructed by the following functions
static IEnumerable<int> LeftAssocEnum(int i)
{
    var acc = Enumerable.Empty<int>();
    while (i > 0)
    {
        acc = Enumerable.Concat(acc, new int[] { i });
        i--;
    }
    return acc;
}
static IEnumerable<int> RightAssocEnum(int i)
{
    var acc = Enumerable.Empty<int>();
    while (i > 0)
    {
        acc = Enumerable.Concat(new int[] { i }, acc);
        i--;
    }
    return acc;
}
causes StackOverflowException for sufficiently large i and both IEnumerables have quadratic complexity.
@radekm For your kind of usage (sequence is materialized, size is known in advance) you can already use List<int>.
static IEnumerable<int> LeftAssocEnum(int i)
{
    var acc = new List<int>(i);
    while (i > 0)
    {
        acc.Add(i);
        i--;
    }
    return acc;
}
Does your request mean that all possible implementations of IEnumerable<T> (including immutable and lazy ones) should behave like List<T>?
@vladd It was only a simple example, you can take for instance Fib()
static IEnumerable<BigInteger> Fib()
{
    return Fib(BigInteger.Zero, BigInteger.One);
}
static IEnumerable<BigInteger> Fib(BigInteger a, BigInteger b)
{
    yield return a;
    foreach (var x in Fib(b, a + b))
    {
        yield return x;
    }
}
which has the same problems. What I want is to compose complex asynchronous sequences from very simple and reusable parts. To do this the operators like concatenation must be efficient. Since I don't know how to do this in C# I'll give a few examples in Scala with scalaz-stream.
1) Recursion can be used to define streams:
def fib(a: BigInt = 0, b: BigInt = 1): Process[Nothing, BigInt] =
  emit(a) ++ fib(b, a + b)
There is no risk of stack overflow and reading the first n items takes O(n) not O(n^2) (assuming that a + b is computed in constant time which is not true).
Note: fib(b, a + b) is passed by name so the above code terminates.
2) Even transformations of streams are easily composable:
process1.take[Int](5).filter(_ > 0) ++ process1.id
This applies the filter only to the first 5 integers of the stream. You can use it with operator |>
Process(1, 2, -3, -4, -5, -6, -7) |> (process1.take[Int](5).filter(_ > 0) ++ process1.id)
and it gives you 1, 2, -6, -7.
I think that it would be wise to have language parity with F# for supporting async pull sequences (e.g. IAsyncEnumerable<T>, AsyncSeq<'T>).
@tpetricek and @dsyme make very valid points here and the links are excellent and well worth reading as there appears to be confusion between when it is appropriate to use async pull vs IObservable<T>.
That leads me on to making some comments about Rx and why I _don't_ think it needs any language support (right now).
IObservable<T> is in the BCL. Fine, so people know about it.
@thomaslevesque says "I've been wishing for this feature ever since C# 5 came out.".
It seems to me that his example is a great candidate for Rx (async, lazy and support for cancellation).
@HaloFour "Observable.ForEach" _shudder_.
Please don't use this method.
It needs to be removed.
It has no cancellation support, nor does it have any error handling/OnError
@LeeCampbell I'd largely be happy if the C# team did the same thing they did with await and provided a pattern that could be used to describe anything as an asynchronous stream. Then Rx could easily support that pattern, probably through an extension method that would describe the correct back-pressure behavior.
I think that there is a massive amount of information for Rx out there, but if nobody knows to look it might as well not exist. I think that it needs the same kind of campaign from MS that LINQ and asynchrony received. _Some_ kind of inclusion in the languages pushes that point. I've been doing a lot of Java dev lately and it annoys me how much excitement there seems to be building around Rx that I don't see on the .NET side.
I am interested to see how you would see this work. I think the way you work with an AsyncEnum and the way you work with an IObservable sequence are quite different. The former you poll and pull from until complete and then you move on to the next statement.
IAsyncEnumerable<int> sequence = CreateAsynEnumSeq();
Output.WriteLine("Awaiting");
await sequence.ForEachAsync(Output.WriteLine);
Output.WriteLine("Done");
The latter you set up a subscription providing callbacks and then move on immediately. The callbacks for an Observable sequence are called at some future point in time.
IObservable<int> sequence = CreateObservableSeq();
Output.WriteLine("Subscribing");
sequence.Subscribe(Output.WriteLine, ()=>Output.WriteLine("Done"));
Output.WriteLine("Only Subscribed to, but not necessarily done.");
With this in mind, they (to me at least) are totally different things, so I am not sure why or how language support would help here. Would like to see a sample of your ideas. I can see some usefulness for language support of AsyncEnum sequences, again, at least to get language parity with F#.
@LeeCampbell
To give you an idea, I already currently have an extension method for IObservable<T> called GetAsyncEnumerator which returns my own IAsyncEnumerator<T> implementation:
public IObservable<int> Range(int start, int count, int delay) {
    // The element type is given explicitly; it cannot be inferred from the untyped lambda.
    return Observable.Create<int>(async observer => {
        for (int i = 0; i < count; i++) {
            await Task.Delay(delay);
            observer.OnNext(i + start);
        }
    });
}
public async Task TestRx() {
    Random random = new Random();
    IObservable<int> observable = Range(0, 20, 1000);
    using (IAsyncEnumerator<int> enumerator = observable.GetAsyncEnumerator()) {
        while (await enumerator.MoveNextAsync()) {
            Console.WriteLine(enumerator.Current);
            await Task.Delay(random.Next(0, 2000));
        }
    }
}
There are several overloads to GetAsyncEnumerator depending on if/how you want to buffer the observed values. By default it creates an unbounded ConcurrentQueue into which the observed values are collected and MoveNextAsync polls for the next value available in that queue. The other three options are to use a bounded queue, to have IObservable<T>.OnNext block until there is a corresponding call to MoveNextAsync or to have IObservable<T>.OnNext not block but have MoveNextAsync return the latest available value, if there is one. There are also overloads that accept a CancellationToken, of course, and IAsyncEnumerator<T>.Dispose unsubscribes the observer.
I hope that kind of answers your question. It's early and I didn't get much sleep last night. Basically, I am treating the IObservable<T> as an IAsyncEnumerable<T> and bridging between the two isn't all that difficult. The big difference is that the observable can continue to emit values and not have to wait for someone to poll.
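The default (unbounded buffer) bridging behavior described above can be sketched roughly as follows. This is not the GetAsyncEnumerator implementation from the comment: it swaps the polled ConcurrentQueue for an unbounded System.Threading.Channels channel and uses the C# 8 async iterator syntax that shipped after this thread. ToAsyncEnumerableBuffered, ChannelObserver and ThreeValues are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ObservableBridgeSketch
{
    // Buffers every pushed value in an unbounded channel; the consumer pulls at its own pace.
    public static async IAsyncEnumerable<T> ToAsyncEnumerableBuffered<T>(this IObservable<T> source)
    {
        var channel = Channel.CreateUnbounded<T>();
        using (source.Subscribe(new ChannelObserver<T>(channel.Writer)))
        {
            await foreach (var item in channel.Reader.ReadAllAsync())
            {
                yield return item;
            }
        } // leaving the using block unsubscribes, also when the consumer stops early
    }

    private sealed class ChannelObserver<T> : IObserver<T>
    {
        private readonly ChannelWriter<T> _writer;

        public ChannelObserver(ChannelWriter<T> writer) { _writer = writer; }

        public void OnNext(T value) => _writer.TryWrite(value);       // never blocks the producer
        public void OnError(Exception error) => _writer.TryComplete(error);
        public void OnCompleted() => _writer.TryComplete();
    }

    // Minimal hand-rolled observable for the demo: pushes 1..3 synchronously on Subscribe.
    private sealed class ThreeValues : IObservable<int>
    {
        public IDisposable Subscribe(IObserver<int> observer)
        {
            for (var i = 1; i <= 3; i++) observer.OnNext(i);
            observer.OnCompleted();
            return new NoopDisposable();
        }

        private sealed class NoopDisposable : IDisposable
        {
            public void Dispose() { }
        }
    }

    public static async Task<List<int>> DemoAsync()
    {
        var result = new List<int>();
        await foreach (var x in new ThreeValues().ToAsyncEnumerableBuffered())
        {
            result.Add(x);
        }
        return result;
    }
}
```

DemoAsync collects 1, 2, 3 even though all three values were pushed before the consumer pulled anything, which is the point made above: the observable keeps emitting without waiting for someone to poll. A bounded variant could presumably be built the same way on Channel.CreateBounded.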
Guys who are interested in IObservable support -- can you describe the benefit integrating this into the language would bring?
@scalablecory
IObservable<T> and marries the existing language concepts of asynchrony with LINQ beautifully.
async methods alone.
To Devil's Advocate my own arguments:
GetAwaiter so we're stuck waiting on a BCL change anyway.
IAsyncEnumerable<T>, despite being a pretty massive duplication of effort. Rx already has, it would be silly to do it _again_.
IAsyncEnumerable<T> can wrap a "push" notification source. I'm already doing it.
Now, given the _probability_ of Devil's Advocate point 1, some streaming analog to GetAwaiter, support for IObservable<T> from the consuming side could be accomplished by extension methods within Rx, and I'd be perfectly happy with that.
Now, for my arguments from the generating side, I'd like to revisit my use case of dispatching a bunch of asynchronous operations. This is something that the current project I work on does incredibly frequently, basically _n+1_ operations against web services where the first response provides a bunch of IDs that then need to be resolved individually*. If async streams return IAsyncEnumerable<T> where the coroutine isn't continued until the consumer asks for the next value then you don't really have the facility to perform the operations in parallel.
public async IAsyncEnumerable<User> QueryUsers(int organizationId, CancellationToken ct) {
    Organization organization = await ws.GetOrganization(organizationId, ct);
    foreach (int userId in organization.UserIds) {
        User user = await ws.GetUser(userId);
        yield return user; // can't continue until consumer calls IAsyncEnumerator.MoveNext
    }
}
Granted, there could be BCL methods to make this a little easier, but it feels like something that can be supported out of the box:
public async IObservable<User> QueryUsers(int organizationId, CancellationToken ct) {
    Organization organization = await ws.GetOrganization(organizationId, ct);
    foreach (int userId in organization.UserIds) {
        User user = await ws.GetUser(userId);
        yield return user; // Effectively the same as calling IObserver.OnNext(user)
    }
}
@HaloFour Just like you can currently decide whether to process IEnumerable<T> in series (foreach) or in parallel (Parallel.ForEach()), there could be a similar distinction for IAsyncEnumerable<T>; you don't need IObservable<T> for that.
The problem I have with IObservable<T> is that it's pretty much impossible to process it in series without either blocking the producer or using some kind of buffer.
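One possible way to keep a pull-based sequence while still running the n+1 lookups concurrently is to start every call up front and yield results as they complete. A minimal sketch, using the C# 8 syntax that shipped later; GetUserAsync is a hypothetical stand-in for ws.GetUser and returns strings rather than User objects to stay self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FanOutSketch
{
    // Hypothetical stand-in for a web service call such as ws.GetUser(userId).
    private static async Task<string> GetUserAsync(int userId)
    {
        await Task.Delay(10 * userId);
        return $"user-{userId}";
    }

    // Starts every lookup immediately, then yields results in completion order.
    public static async IAsyncEnumerable<string> QueryUsers(IEnumerable<int> userIds)
    {
        // ToList forces all calls to start now instead of lazily per iteration.
        var pending = userIds.Select(id => GetUserAsync(id)).ToList();
        while (pending.Count > 0)
        {
            var finished = await Task.WhenAny(pending);
            pending.Remove(finished);
            yield return await finished;
        }
    }

    public static async Task<List<string>> DemoAsync()
    {
        var users = new List<string>();
        await foreach (var user in QueryUsers(new[] { 3, 1, 2 }))
        {
            users.Add(user);
        }
        return users;
    }
}
```

Results arrive in completion order rather than request order, and the repeated Task.WhenAny makes this quadratic in the number of pending tasks, so it is only reasonable for small batches. The consumer still pulls one item at a time.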
@HaloFour Let me rephrase my question.
Putting aside the "push" vs "pull" or "Rx" vs "IAsyncEnumerable" debate, for this proposal to gain weight, it needs to show solid benefits for _language integration_. These benefits have not yet been shown for the Rx side of things.
My two cents is that language integration wouldn't provide a significant benefit over Observable.Create. Its sole purpose would be to provide feature parity with "yield return", which I don't think is a good reason. Bringing popularity to Rx is also not a good reason.
IAsyncEnumerable, on the other hand, has no "Create" method. You can make one, but it's horribly inefficient. It is however possible to implement efficiently in straight IL, so there'd be a huge benefit to language integration and having the compiler generate the complex state machine around it.
If the argument devolves into simply that one of push or pull is better than the other, and thus we should forget about the inferior one and not bother integrating with it, I think that's pretty short-sighted. Both models do things that the other is simply unable to _efficiently_ accomplish.
(Also, Ix-Async already implements IAsyncEnumerable with all that good LINQ integration if you want to check it out)
Well put @scalablecory.
@scalablecory
Sorry if it comes across that I'm having a debate, or that it's an either/or proposition. I don't think that "push" is better than "pull" or vice versa, only that there are use cases for both and that it would be nice to support multiple forms of "streams" within async methods. I think that achieving feature parity with F# would also be a very good thing.
My two cents is that language integration wouldn't provide a significant benefit over Observable.Create.
You're right. I think that it would be _nice_ to have yield support for IObservable<T>, but Observable.Create is already very easy to use.
IAsyncEnumerable, on the other hand, has no "Create" method.
Sure it does, Ix-Async offers AsyncEnumerable.Create as well as a few other helper methods. Plus you can already convert between IObservable<T> and IAsyncEnumerable<T>. Of course if we're talking about a _third_ Microsoft-provided IAsyncEnumerable<T> then no, nobody has written those methods yet.
What I'm really interested in hearing is any preliminary ideas regarding how these streams would be consumed. What a hypothetical foreach would look like with async streams. If it's based on a fairly-loosely defined pattern like GetAwaiter then support for IAsyncEnumerable<T> and IObservable<T> should both be quite easy to provide without the compiler having to know about either or any of those interfaces. To me, that's the ideal solution.
@HaloFour I believe you're mistaking EnumerableEx.Create for AsyncEnumerable.Create. There is nothing in Ix that provides "yield return" semantics for async sequence creation.
@scalablecory You're right, I am thinking of EnumerableEx. Doesn't seem like too far of a stretch to get an AsyncEnumerable.Create to work with pretty much the same syntax, though:
IAsyncEnumerable<int> ae = AsyncEnumerable.Create(async yield => {
    for (int i = 0; i < 10; i++) {
        await Task.Delay(1000);
        await yield.Return(i);
    }
});
I created an implementation of Async Pull Sequences a while back, with LINQ support.
The syntax for an AsyncEnumerable was quite similar to the proposed one:
AsyncEnumerable.Create<T>(async producer =>
{
    await producer.Yield(value).ConfigureAwait(false);
});
As a proof of concept (and also to get some insight into Roslyn), I started implementing async iterators here. On this branch, the compiler is able to compile this file (called AsyncIterators.cs)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
namespace AsyncIterators
{
    public class EntryPoint
    {
        public static void Main(string[] args)
        {
            var enumerator = new EntryPoint()
                .AsyncIterator()
                .GetEnumerator();
            while (enumerator.MoveNext().Result)
            {
                Console.WriteLine(enumerator.Current);
            }
            Console.ReadLine();
        }
        public IAsyncEnumerable<int> AsyncIterator()
        {
            for (var i = 0; i < int.MaxValue; i++)
            {
                await Task.Delay(500);
                yield return i;
            }
        }
    }
}
Call the Roslyn compiler like this:
csc "Your\Path\To\AsyncIterators.cs" /reference:"Your\Path\To\System.Interactive.Async.dll"
Get System.Interactive.Async.dll through the corresponding Nuget package.
The compiled .exe should output an incrementing number twice a second (as seen in the code).
The modification was pretty straightforward, there are two rewritings involved: First, the iterator is rewritten into a state machine (just like for synchronous iterators), but instead of IEnumerable and IEnumerator, IAsyncEnumerable and IAsyncEnumerator are implemented. The implementation of the async MoveNext method (that returns a Task<bool>) is subsequently rewritten into a state machine (just like for async methods). The resulting IL will therefore have two nested state machines.
I also tested enclosing the body of AsyncIterator in using/try-finally blocks, this also works.
Note that this is essentially only a proof of concept: In this commit, I point out the first language design issue that occurred to me. Also, it is unclear how code in async iterators should access the CancellationToken that is passed to MoveNext.
Let me hear what you think.
@danielcweber Great job!
@danielcweber
Awesome job. I like how you reused Ix's IAsyncEnumerable. I wonder if Roslyn would eventually also support that interface as well as a BCL-provided one.
Also, it is unclear how code in async iterators should access the CancellationToken that is passed to MoveNext.
I had been wondering that myself. One potential solution might be to have the iterator method accept (require?) a CancellationToken argument and the generated state machine would effectively "merge" the two tokens together into one per every invocation of MoveNext? That might be too much voodoo and it doesn't help if the iterator doesn't accept a CancellationToken argument.
I'd almost like a syntax sort of like this:
IAsyncEnumerable<int> GetSequence() with token
{
if(token.IsCancellationRequested)
But that may be too extreme of a change...
@HaloFour what's the BCL-provided interface for async iterators?
@scalablecory I also thought about having a variable (eg. named "ct") implicitly in scope, like "this" is also implicitly in scope. Of course, you may not have a parameter with that name but I guess that would be tolerable.
@danielcweber
There isn't one. I am under the assumption that the feature would depend upon such an interface being added to the BCL as to not require a project to take a hard dependency on Ix-Async. Even if the feature is convention-driven and would work with Ix-Async I would expect an identical-looking interface to be added to the BCL.
I believe the only truly systematic and consistent approach is to have the MoveNext operation on the AsyncIterator accept the Cancellation token.
That is, the natural and systematic translation to make any method M asynchronous is as follows: "any operation M generating result R translates to an async method M taking a cancellation token CT and giving result Task<R>".
Altering the scope and flow of cancellation tokens to be different to this tends to be like fiddling with assembly code and registers - things that look sensible come back to bite you later. It's possible passing the token to GetEnumerator will work but my guess is it will have problems in some cases.
Note that F# async avoids this problem by hiding cancellation tokens (they are implicitly threaded through the asynchronous computation structure - you only supply a cancellation token when starting an overall composite Async). In F# code you generally don't have to pass cancellation tokens explicitly at all. That simplification was dropped in the C# version of the feature. Anyway, we should get Tomas Petricek to add this to his summary of differences between the two models.
Cheers
Don
I believe the only truly systematic and consistent approach is to have the MoveNext operation on the AsyncIterator accept the Cancellation token.
Of course, and that is to be expected. The question is how to expose that token in the iterator.
That is, the natural and systematic translation to make any method M asynchronous is as follows: "any operation M generating result R translates to an async method M taking a cancellation token CT and giving result Task<R>".
But these sequences themselves also represent an asynchronous operation which can potentially be cancelled as a whole, such as a single HTTP call which is returning a stream of data asynchronously which is parsed into a sequence of elements. You'd need a reliable mechanism to convey cancellation to that process which may need to be separate from cancelling the current iteration. This gets even hairier if you want to distinguish cancelling the attempt to move to the next element from cancelling the sequence.
Altering the scope and flow of cancellation tokens to be different to this tends to be like fiddling with assembly code and registers
Isn't that describing the coroutine shenanigans done with iterator and asynchronous state machines in general? :smile:
I think that ultimately there are probably two options. Provide a keyword that will access the current cancellation token, or add the concept of a current thread-local cancellation token to the BCL.
@HaloFour, when would you need to cancel MoveNext without canceling the whole iteration and vice versa?
@paulomorgado Probably not. At best canceling the MoveNext operation would leave the sequence in an indeterminate state. But I do think that we'll want a simple cancellation mechanism that unifies the three that exist in the async sequence pattern:
- CancellationToken passed to the async iterator function.
- CancellationToken passed to the IAsyncEnumerator.MoveNext method.
- IDisposable interface implemented by IAsyncEnumerator.
I think that having the MoveNext method accept a CancellationToken makes things really hairy. There is no existing syntax that would allow for the iterator to accept that token. If that iterator is based on multiple operations already in flight even if you could obtain that token it couldn't really be used to affect those operations. Any existing foreach syntax lacks the notion of passing an argument to the MoveNext function. All of these problems would need to be addressed and I fear the syntax that would arise as the answer.
I'm thinking that maybe we keep it simple and that cancellation via CancellationToken is optional and only available if the async iterator explicitly accepts one as an argument. When that token is cancelled the entire sequence is then cancelled.
@HaloFour: By "the async iterator explicitly accepts one as an argument" you mean the method that returns IAsyncEnumerable<T> and contains yield return statements having a parameter of type CancellationToken? Or do you mean that the GetEnumerator method of IAsyncEnumerable should take a CancellationToken?
We have to take into account that there are two sides to an enumerable: the producer and the consumer.
Although most developers will be consumers, ease of production will benefit all. So, language support for creating asynchronous enumerators will have its own value.
We also have to take into account that some types might be both synchronous and asynchronous enumerables. I don't think the existing foreach keyword can be reused. At least, not alone.
Maybe the best compromise will be something like this:
foreach async(var item in collection, cancellationToken)
{
    ...
}
But it gets a lot trickier if we want to use LINQ operators. Should the pattern be augmented to pass around a cancellation token?
Or should we see how observables/qbservables fit in all this?
@danielcweber The method that uses yield return, etc. That's likely the only method the consumer will be calling directly and passing arbitrary arguments.
@paulomorgado Rescanning the thread it doesn't appear that any modifications to foreach to consume async sequences have been discussed yet. I do expect that some syntax changes would be required to make that happen and maybe there is room to fit in a CancellationToken as you describe. But you'd also need syntax in the producer to accept that syntax. That could involve syntax changes to yield return although that could only provide a value for the second iteration and onward.
Observables do this very differently. Once you subscribe you don't ask for the next values, they just come to you. Cancellation can be implied by unsubscribing, which is done by disposing of the subscription. Rx provides a helper class CancellationDisposable which can trigger a CancellationToken when disposed which can allow the producer to react to being unsubscribed.
@HaloFour The sequence is cold at the point where you call the generator method. This is not an appropriate place to pass a cancellation token.
GetEnumerator() could work, but it's not exactly in line with existing practice. Right now, just about everywhere in .NET, you pass in a token to the method that does the actual work, not to some encompassing factory instance.
MoveNext() is the most in line with existing practice -- both in the Ix implementation, as well as with streaming sources like DbDataReader and Entity Framework.
@scalablecory I know, it's just the only place where the consumer would normally explicitly pass any arguments and the only place where iterator methods can accept arguments. Anything beyond that is going to require some fun syntax candy for both the consumer and the producer.
The consumer is probably relatively simple. We'll probably end up with something similar to what @paulomorgado suggested.
For the producer, the only thing that really makes sense to me is a new context sensitive keyword or expression that would allow access to the CancellationToken parameter of MoveNext. However, where I grapple a little with this is how that might behave if the iterator method also accepted a CancellationToken parameter, or if the iterator method is a cold enumeration over a hot sequence.
As async streams are now being brainstormed in #5383, I took the occasion and rebased my proof of concept of async streams found here. It still works for this simple example (and probably for more complex ones, too).
Interesting idea.
(I'm assuming discussion on async streams properly belongs here rather than on the language design meeting notes.)
IAsyncEnumerable<T>
which just doesn't flow the context, for any arbitrary IAsyncEnumerable<T>
... but the code in the iterator block would need to get at the desired cancellation token.
It feels to me like the GetEnumerator
method should be passed the cancellation token - because I _would_ expect a single token while iterating over the whole sequence.
One extra note: should IAsyncEnumerator<T>
implement IAsyncDisposable
as well? There was a brief mention in the language notes, but it didn't crop up in the example and I haven't seen it mentioned here. This may be quite complicated when it comes to cancellation, as you may need _two_ cancellation tokens - if an iteration operation times out, you still want to get a shot at disposing of the sequence, on the other hand you might want an "overall" timeout. Here be dragons, I suspect.
Wouldn't cancellation tokens simply behave as they always have?
async IAsyncEnumerable<int> SlowNumbersAsync(int from, int to, CancellationToken token)
{
for (var i = from; i <= to && !token.IsCancellationRequested; i++)
{
await Task.Delay(100, token);
yield return i;
}
}
foreach await (var item in SlowNumbersAsync(100, 200, token))
A system for configuring arbitrary tokens would be great but I don't think it's necessarily specific to foreach await
.
Well, the question is whether the IAsyncEnumerable itself knows the cancellation token, or whether it's _each time you iterate_ that knows the cancellation token. Would it make sense to have an IAsyncEnumerable<string>
which (lazily) fetched stock tickers from a web service, and which could be reused multiple times, with a different cancellation token each time you iterate over it? Maybe, maybe not. The fact that there are three steps involved (creating the sequence, creating the iterator, and then calling MoveNext - multiple times) leads to lots of choices...
Looking at it a few different ways here:
If we go the GetEnumerator()
route, it breaks convention -- you're no longer passing the token to the method doing the work. But, it also presents the most efficient operation in that you won't have to create any proxy cancellation tokens.
If we go the MoveNext()
route, it keeps convention, but will either be confusing to use or inefficient to implement. Consider my previous suggestion:
IAsyncEnumerable<int> GetSequence() with token
{
if(token.IsCancellationRequested)
Here, the GetEnumerator()
route will ensure token
never changes, as one would normally expect. A naive MoveNext(CancellationToken)
sugar will change the meaning of token
after every yield return
-- efficient, but clearly confusing. A more anchored implementation will need to wrap it in a proxy to ensure token
doesn't change:
CancellationTokenSource proxy;
MoveNext(CancellationToken token)
{
using(token.Register(()=>proxy.Cancel()))
{
// user's code.
}
}
Which is clearly not very efficient if we consider this overhead for every item.
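To make the per-item cost concrete, here is a minimal hand-written sketch of the proxy approach. `CountingEnumerator` and its members are hypothetical names for this example; it is not compiler output and not the Ix interface, just one way the wrapping described above could be written out by hand.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical hand-written enumerator that accepts a token on every MoveNextAsync call.
public sealed class CountingEnumerator : IDisposable
{
    // Stable source whose token the "iterator body" observes for its whole lifetime.
    private readonly CancellationTokenSource _proxy = new CancellationTokenSource();
    private readonly int _count;

    public CountingEnumerator(int count) { _count = count; }

    public int Current { get; private set; }

    public async Task<bool> MoveNextAsync(CancellationToken token)
    {
        // A registration is created and disposed for every element:
        // this is the overhead being discussed.
        using (token.Register(() => _proxy.Cancel()))
        {
            if (Current >= _count) return false;
            // The user's code only ever sees the stable proxy token.
            await Task.Delay(1, _proxy.Token);
            Current++;
            return true;
        }
    }

    public void Dispose() => _proxy.Dispose();
}
```

Note that once any per-call token fires, the proxy stays cancelled for all later calls, which is one more way the MoveNext(CancellationToken) shape can surprise people.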
Then there are the syntax issues with potentially dealing with tokens at those three places.
From the iterator side you'd obviously need a way to get at those tokens. I don't really see a great way to accomplish this without some wacky syntax or voodoo.
Probably the most voodoo-y route would be to have the generated iterator class automatically link the three potential CancellationToken
s into a single token replacing the value of the token passed as a parameter to the iterator method (by overwriting the field in the enumerator during MoveNextAsync
). I don't know what overhead that might entail, perhaps it would require decorating a CancellationToken
parameter with an attribute.
From the consumer side I think the least messy option if you wanted to affect cancellation of an existing IAsyncEnumerable<T>
(you're not calling the iterator method directly) would be a ConfigureCancellation
method wrapping that instance of IAsyncEnumerable<T>
passing the specified CancellationToken
to GetEnumerator
and then wrapping that IAsyncEnumerator<T>
also passing the token to MoveNextAsync
. That would remain compatible with a foreach await
syntax. If the consumer wanted more control than that they could always just call GetEnumerator
and/or MoveNextAsync
themselves and pass arbitrary tokens to either.
foreach only works with enumerables and must call GetEnumerator()
on it to get the enumerator.
But what if this was legal?
foreach(var i in Range(0,10).GetEnumerator()) WriteLine(i);
Then we could have this for non-cancellable async enumerators:
foreach(async var i in AsyncRange(0,10)) WriteLine(i);
And this for cancellable async enumerators:
foreach(async var i in AsyncRange(0,10).GetAsyncEnumerator(ct)) WriteLine(i);
For a cancellable MoveNextAsync
one would have to write the whole code. But I expect these cases to be less frequent than having a unique cancellation token for the enumerator.
And nothing prevents the enumerable itself from being cancellable:
AsyncRange(0,10, ct)
@jskeet I understand your concern now and completely agree. An interesting interaction here is that IAsyncEnumerable.Dispose
is actually an implicit cancellation token. How does that (if at all) interact with a CancellationToken
explicitly provided to the method?
@jcdickinson Goodness knows! But I think both of these features really need to be considered closely together. I can easily imagine that a solution to one may cause issues for the other.
IMO any discussion here should also consider ValueTask, as this is perhaps the type of scenario that _might_ have bunches of "I have stuff without needing to actually be async right now" _and_ have distinct return values, making pre-cached completed tasks not viable.
If IObservable<T>
ever gets language support I think that would be great for events to be able to return an IObservable<(object Sender, EventArgs Args)>
, just like F#. In defence of that, a language construct would be a lot better instead of a Subscribe
with a bunch of lambda expressions IMO.
As for async foreach
and such I'd like to suggest fork
loops which act like fork-join model.
For example,
// waits in each iteration
async Task SaveAllAsync(Foo[] data) {
foreach(var item in data) {
await item.SaveAsync();
}
}
// same as above
async Task SaveAllAsync(Foo[] data) {
for(int i = 0; i < data.Length; ++i) {
await data[i].SaveAsync();
}
}
// this won't wait and just run in parallel
async Task SaveAllAsync(Foo[] data) {
fork(var foo in data) {
await foo.SaveAsync();
} // join
}
// same as above
async Task SaveAllAsync(Foo[] data) {
fork(var i = 0; i < data.Length; ++i) {
await data[i].SaveAsync();
} // join
}
With async sequences, one might use yield return
inside the loop to return a push-based collection,
async IObservable<Result> ProcessAllAsync(Foo[] data) {
fork(var foo in data) {
yield return await foo.ProcessAsync();
}
}
In contrast, if you use a regular foreach
it will return an IAsyncEnumerable
,
async IAsyncEnumerable<Result> ProcessAllAsync(Foo[] data) {
foreach(var foo in data) {
yield return await foo.ProcessAsync();
}
}
Also it should be able to be used to iterate both IObservable<T>
and IAsyncEnumerable<T>
collections.
@HaloFour I'd much rather see Parallel.*
extended for TPL than add a fork
keyword. Language integration gives us nothing here.
@scalablecory Wrong mention. :smile:
@alrz I like the idea. If async
methods could return observable sequences I think it would be pretty useful. However, I also worry that it might be a little _too_ easy and tempt people into using it without understanding the ramifications of having multiple threads silently spawned within their iterative-looking code. All of a sudden all of the nuances of synchronization and thread-safety apply to locals (yes, I am aware that this is true today with closures, but that does involve a little more opt-in.) As a new keyword it could be possible that limitations would be imposed such that variables declared outside of the block could not be modified, but that would differ from any other semantics in C#.
@HaloFour "having multiple threads silently spawned within their iterative-looking code." There can be no thread actually, it depends on the implementation of the awaited method but the point is that it doesn't wait for each await
in each iteration, you can think of it as a list of running tasks and a Task.WhenAll
but a little nicer, it provides language support for processing tasks as they complete which is a common pattern, I guess.
As for synchronization, we can wait for all the tasks in parallel but process them in order, meaning that a completed task might be waiting for another iteration to get to an await
or end of the loop block, this makes sense because all of them will join at the end of the loop. Similarly, it doesn't "move next" until it gets to an await
. But not allowing modifications might be limiting and eventually not really sufficient, one might say in a lock
block once again you _can_ modify variables and so on.
PS: On second thought (after opening the issue, actually), I'm now convinced that language support doesn't really buy much,
fork(var item in new[] { 2, 3, 1 }) {
await Task.Delay(item*1000);
Console.WriteLine(item);
}
// easy peasy
await Task.WhenAll(new[] { 2, 3, 1 }.Select(async item => {
await Task.Delay(item*1000);
Console.WriteLine(item);
}));
and also it would be worse because the loop hides a task and doesn't support cancellation etc.
@alrz If it's akin to Task.WhenAll
then the default (and expected) implementation would be that more than one of those operations are executing concurrently, so there would definitely be threads involved. Without that there is absolutely no difference between fork
and await foreach
.
@HaloFour However, my example satisfies the first rule of fork
, "it doesn't "move next" until it gets to an await
" but not the second, "a completed task might be waiting for another iteration to get to an await
or end of the loop" so synchronization is not guaranteed.
The problem with await foreach
or await using
(#114) is that await
applies to an expression — the Task
, which mustn't be ignored. With these constructs there is no reasonable way to have ConfigureAwait
, CancellationToken
, etc. And that's ok because these are not related to the language, await
is not specific to Task
, so tying foreach
to these types is not a good idea IMO.
@alrz await
expressions do not have to be Task
s, they can be anything that correctly follows the awaitable convention. await
has no concept of cancellation tokens and ConfigureAwait
just happens to be an instance method of Task
. For example, await Task.Yield()
is legal, but Task.Yield()
returns a YieldAwaitable
, not a Task
, and there is no way to either cancel or configure the return of that method.
Note that I do think that both cancellation and configuration are useful. I think that offering extension methods off of IAsyncEnumerable<T>
should satisfy the requirement:
IAsyncEnumerable<int> sequence = GetSequenceAsync(1, 10, CancellationToken.None);
CancellationTokenSource cts = new CancellationTokenSource();
cts.CancelAfter(500);
foreach (var number in sequence.ConfigureAwait(false).WithCancellationToken(cts.Token)) {
Console.WriteLine(number);
}
@HaloFour Yeah I know, exactly my point, anyway. As it turns out, Task.WhenAll
is not safe when you have shared state, that's where fork
can be helpful.
I think CoreFxLab's Channels are closely related to the subject.
@omariom Neat. I wonder how it differs from Rx and why there is an effort to seemingly duplicate that library. Can't be NIH, they invented it. @stephentoub ?
@HaloFour Rx is for push-based reactive streams. Channels are pull-based and more similar to TPL Dataflow blocks (though a bit lower on the abstraction stack and less opinionated)
I put together a proposal for async iterators here:
https://github.com/ljw1004/roslyn/blob/features/async-return/docs/specs/feature%20-%20async%20iterators.md
GetEnumerator
rather than MoveNextAsync
IAsyncEnumerable.ConfigureAwait / IAsyncEnumerator.ConfigureAwait
as suggested in LDM #5383
IObservable<T>
or IAsyncActionWithProgress<T>
or IAsyncEnumerable<T>
or IAsyncEnumerator<T>
async
contextual keyword to refer to the current instance of an async method, similar to this
and base
. This mechanism is useful more generally for an async method or async iterator method to interact with its builder (equivalently, the object that it returns). Windows tasks like IAsyncAction
can use it to handle IAsyncAction.Cancel()
or IAsyncActionWithProgress.OnProgress()
. I believe that IObservable
would use it too, but need to learn more.
I really like the new proposal. I think it handles the CancellationToken problem really nicely. Adding an example to the spec of how the async
contextual keyword would work for your standard Task-returning async method would be useful as well. Does this mean there would no longer be a need for endless overloads in our async methods that accept CancellationToken?
I prefer the foreach (await var x in asyncStream) { ... }
syntax, as it is the most consistent with how we await asynchronous tasks elsewhere currently. It feels weird to put the word async
in there when what is happening is an await
.
I think the iterator
modifier section needs expansion, as it is in the title, but the text doesn't actually ever mention an iterator
modifier. I am strongly in favor of adding an optional iterator
keyword that can be added to the method signatures of normal iterators and async iterators. People who don't want it don't have to use it, but those of us that do can set style rules so that an analyzer and quick fix can make all the methods consistent.
It is much more useful to have the method signature document what kind of method it is, rather than having to scan through the entire body to guess whether it's an iterator or not. All that being said, this is a minor point that is probably separable from the rest of the proposal given that it is contentious.
Overall, really looking forward to see a prototype of this feature!
// EXAMPLE 7:
// async C f()
// {
// var ct = async.CancellationToken;
// await Task.Delay(1, ct);
// yield 1;
// }
// expands to this:
I think async.CancellationToken
could be problematic in that async
could be a variable or type in scope inside an async
method already. await.CancellationToken
seems available though...
@bbarry Thanks. Missed the comment there.
Is this going to use the same shape of IEnumerator<T>
i.e. MoveNextAsync
and Current
? Wouldn't it make sense to make it similar to what F# does for its AsyncSeq
?
type IAsyncEnumerator<'T> =
abstract MoveNext : unit -> Async<'T option>
inherit IDisposable
Related: #9932, http://blog.paranoidcoding.com/2014/08/19/rethinking-enumerable.html
@bbarry I suggested that the "async" contextual keyword is only a contextual keyword in async methods that return a tasklike other than void
and Task
and Task<T>
. There are no such methods today. In methods where it's a contextual keyword, then by definition it will never refer to a variable or type in scope, and there won't be a back-compat break. But you're right to bring up await
as another possibility. I guess a prototype could include both options to see which one people like.
@alrz I think Async<'T option>
would be a bad idea for C# because (1) option doesn't exist, (2) it would be prohibitively expensive, causing a heap allocation for every element of the sequence. By contrast Task<bool>
never requires a heap allocation for an already-completed task, since Task.FromResult(true)
and Task.FromResult(false)
are both pre-allocated static singletons in the BCL.
@ljw1004 Why not make contextual async
a possibility for existing Task-returning async methods as well? As I mentioned above, it is annoying code bloat to make CancellationToken overloads for every single async method. First class language support for supporting cancellation would be a huge improvement.
The backwards compat issue doesn't seem major. If you're foolish enough to name a type or local variable async
then the compiler can report an error, and you'll fix the error when you upgrade. The benefit seems worth the cost, IMO.
@MgSam could you spell precisely out how you see this mechanism being able to avoid CancellationToken overloads? I can't see it...
@ljw1004 I'm talking about the two interface invocations that are required for every element. However, it could be something like this:
interface IAsyncEnumerator<T> {
Task<bool> TryGetNextAsync(out T value);
}
But since out
is actually ref
in CLR, the T
cannot be defined as covariant.
interface IAsyncEnumerator<out T> {
ITask<T?> GetNextAsync();
}
This one works as long as we could use ?
on unconstrained generics (#9932) and we have a covariant ITask<T>
interface.
Anyhow, since async
has its own overheads, these micro optimizations probably aren't very important.
I'd imagine you'd need a slightly different calling convention.
``` C#
async Task Foo()
{
await Task.Delay(1000);
if(async.CancellationToken.IsCancellationRequested) return;
await Task.Delay(1000);
}
var ct = default(CancellationToken);
await Foo(), ct;
```
@MgSam
What exactly do you propose that translate into? Have the compiler automatically generate overloads? Or automatically add optional parameters? Stuff the CancellationToken
into a thread-local? All of those options sound pretty nasty. I just add a default CancellationToken
parameter and not worry about overloads.
@alrz The interface used here is a "pattern" like how foreach
works elsewhere. You wouldn't have to actually implement IAsyncEnumerable<T>
in order to make it work (I think example 6 is confusing though).
It looks like this type would satisfy enough requirements to work inside a foreach:
class Foo
{
ConfiguredTaskAwaitable<bool> MoveNextAsync() { ... }
int Current {get { ... } }
}
used as:
async Task<int> SumAsync(Foo f)
{
int result = 0;
foreach(async var i in f)
{
result+=i;
}
return result;
}
compiling as if it was:
async Task<int> SumAsync(Foo f)
{
int result = 0;
{
try
{
while(await f.MoveNextAsync())
{
int i = f.Current;
result+=i;
}
}
finally
{
(f as IDisposable)?.Dispose();
}
}
return result;
}
(and either I am reading ex 6 wrong or it is disposing something it probably shouldn't)
@bbarry My intention is that async enumerators must all implement IDisposable
.
It sounds like you find that odd. Here's another alternative: (1) if you foreach over an _enumerable_, then the foreach statement has acquired a resource and must dispose of it; (2) if you foreach over an _enumerator_ then you're the one who acquired it and you're the one who must release it...
foreach (async var x in enumerable) { ... }
// becomes
using (var enumerator = enumerable.GetEnumerator())
foreach (async var x in enumerator) { ... }
foreach (async var x in enumerator) { ... }
// becomes
while (await enumerator.MoveNextAsync()) { var x = enumerator.Current; ... }
But actually that feels problematic also. It would mean that these two statements have very different disposal semantics:
foreach (var x in enumerable) { ... }
foreach (var x in enumerable.GetEnumerator(cts.Token)) { ... }
It would also mean that folks who write async enumerator methods (rather than async enumerable) would have to expect that realistically most callers would never end up disposing of them. They'd instead just foreach over them as is easiest.
I wonder if Dispose()
even has any role to play? Why would async enumerators have IDisposable
rather than some hypothetical IAsyncDisposable
? Maybe it would be cleaner to cut out the call to Dispose
entirely?
@HaloFour Generating an overload seems the most straightforward because it guarantees backwards compat but I haven't thought carefully about every edge case.
I also just add a default value for the CancellationToken, but very often people forget to do this, and once they've forgotten, it breaks the cancellation chain. It makes it impossible to cancel the method and any async methods that method itself calls, even if they do support cancellation. And, as you know, optional parameters are problematic in public interfaces.
Manually adding the cancellation parameter is also a ton of extra code bloat on every single async method (which often is most of them, nowadays).
Having it just be in context of every async method would both make it consistent with this new feature and allow methods to more strictly focus on business logic rather than retyping this plumbing over and over again. Combined with an analyzer that warns you if you forget to pass an existing cancellation token to any async method you call, and you'd get a big win in cancellation support over current state, where most code written doesn't use it.
I think it is odd that you are disposing something that you potentially don't own.
You cannot today foreach
over an IEnumerator
, only an IEnumerable
(and similar constructs). If you could I would think it should not be a compiler task to dispose of it as part of the foreach
statement. (passing IEnumerator
-like types around that are designed to be mutated seems smelly in the first place, but that is another point I suppose)
I think the pattern might be better if it were:
interface IAsyncEnumerable<T>
{
IAsyncEnumerator<T> GetEnumerator();
IAsyncEnumerable<T> ConfigureAwait(bool b);
IAsyncEnumerable<T> RegisterCancellationToken(CancellationToken cancel = default(CancellationToken));
}
and so
class Foo
{
Foo GetEnumerator() { return this; }
ConfiguredTaskAwaitable<bool> MoveNextAsync() { ... }
int Current {get { ... } }
}
with the same usage method compiling as if it was:
async Task<int> SumAsync(Foo f)
{
int result = 0;
{
Foo e = null;
try
{
e = f.GetEnumerator();
while(await e.MoveNextAsync())
{
int i = e.Current;
result+=i;
}
}
finally
{
(e as IDisposable)?.Dispose();
}
}
return result;
}
@bbarry @ljw1004
What about having extension methods for ConfigureAwait
and RegisterCancellationToken
rather than having the IAsyncEnumerable<T>
interface have to implement methods for either?
@HaloFour the IAsyncEnumerable<T>
interface doesn't actually have to be implemented at all (it actually can't exist per se in the general form used by the foreach
statement because the return type from GetEnumerator()
is the pattern IAsyncEnumerator<T>
, not any actual type on its own...). It exists for the purpose of the spec conversation here as the interface methods the state machine would generate in order to support this pattern.
@ljw1004 re the CancellationToken
passed to GetEnumerator()
instead of MoveNextAsync()
: This has always been my preference, however the following was actually one of the strong arguments for having it on MoveNextAsync()
when I discussed it years ago with the RX folks:
This feels weird. We've always passed in cancellation token at the granularity of an async method, and there's no reason to change that.
Copy & paste error or did I miss what you meant? :smile:
@divega I explained badly. Let me rephrase:
Some folks suggest that each individual call to MoveNextAsync
should have its own cancellation token. This feels weird. We've always passed in cancellation token at the granularity of an async method, and there's no reason to change that: from the perspective of users, the async method they see is the _async iterator method_.
It's also weird because on the consumer side, in the _async-foreach_ construct, there's no good place for the user to write a cancellation token that will go into each MoveNextAsync
: from the perspective of the users, they don't even see the existence of MoveNextAsync
.
Also: it would be weird to attempt to write an async iterator method where the cancellation-token that you want to listen to gets changed out under your feet every time you do a yield:
// Here I'm trying to write an async iterator that can handle a fresh CancellationToken
// from each MoveNextAsync...
async IAsyncEnumerable<int> f()
{
var cts = CancellationTokenSource.CreateLinkedTokenSource(async.CancellationToken);
var t1 = Task.Delay(100, cts.Token);
var t2 = Task.Delay(100, cts.Token);
await t1;
yield 1;
// ??? at this point can I even trust "cts" any more?
// or do I need to create a new linked token source for further work that I do now?
yield 2;
await t2;
// ?? is this even valid anymore, given that it's using an outdated cancellation token?
// If someone cancels that latest freshest token passed in the latest MoveNextAsync,
// how will that help cancel the existing outstanding tasks?
}
from the perspective of users, the async method they see is the async iterator method.
Ah, I get what you mean now. Worth clarifying that is from the perspective of foreach
consumers IMO because in the underlying methods things are the other way around, i.e. GetEnumerator()
is not async.
I agree with everything else although it seems to be a bit unfortunate that you have to call GetEnumerator(ct)
explicitly in async foreach
. Ideally users shouldn't need to know about GetEnumerator()
either, like they don't need to know when using non-async foreach
. Any chance we could get language sugar for passing the CancellationToken
? E.g.
foreach(await var o in asyncCollection using ct)
BTW, foreach(await ...
looks good, but I suggest also considering await foreach
for the list of possible syntax.
Some folks suggest that each individual call to MoveNextAsync should have its own cancellation token. This feels weird. We've always passed in cancellation token at the granularity of an async method, and there's no reason to change that: from the perspective of users, the async method they see is the async iterator method.
@ljw1004 Remember, MoveNextAsync()
_is_ the async method here. In .NET, you pass the token to the thing doing the work, which is not GetEnumerator()
.
I think people are getting hung up here in that this is both a .NET add and a C# add. Those two things shouldn't compromise each other. IAsyncEnumerable
needs to remain intuitive and consistent with the rest of .NET, because it won't always be used with C# sugar on top.
Look at the very similar DbDataReader
class.
Throwing my two cents in here but has this syntax been considered?
foreach(var item in await collection)
{
}
IMO it would make sense in C++ too:
foreach(auto item : await collection)
{
}
Without the await the code reads as: "For each item that has been retrieved from the collection's values"
With the await the code reads as: "For each item that has been retrieved from the collection by awaiting the collection for values"
Or alternatively, we have async and await -- why not introduce an awaitin keyword?
foreach(var item awaitin collection)
{
}
Re
foreach(var item in await collection)
{
}
That is what you would write if collection was a Task<IEnumerable<T>>
(or similar awaitable type). The distinction that you may need to await for each element is important.
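To spell that distinction out, here is a small sketch of what the `in await collection` form already means today. `LoadAllAsync` and `AwaitOnceSketch` are hypothetical names for this example; the point is only that there is a single await up front and the loop itself stays synchronous.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitOnceSketch
{
    // Hypothetical producer: the whole collection becomes available after one await.
    public static async Task<IEnumerable<int>> LoadAllAsync()
    {
        await Task.Delay(1);
        return new[] { 1, 2, 3 };
    }

    public static async Task<int> SumAsync()
    {
        int sum = 0;
        // One await for the entire sequence; each iteration is an ordinary
        // synchronous MoveNext, so no element can be awaited individually.
        foreach (var item in await LoadAllAsync())
        {
            sum += item;
        }
        return sum;
    }
}
```

An async sequence needs the opposite: a potential suspension at every element, which is why the await (or async) marker has to attach to the iteration rather than to the collection expression.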
@bbarry Oh my! This IDisposable
stuff is crucial. Let me try to lay down step-by-step the issues and their consequences...
try/finally
blocks, you _certainly_ expect the finally
blocks to execute. Everything in the language+libraries should be geared to support this.
IDisposable
(which is the only way that an object can indicate that it needs some final action to be run), and the foreach
must do a dispose.
async foreach
construct is able to consume _enumerables_ then of course it calls GetEnumerator()
and then calls Dispose()
on the enumerator. Nothing new here.
async foreach
construct is able to consume _enumerators_ then it must also call Dispose()
on the enumerator. This is unusual -- because foreach
doesn't look like using
, and it's strange that foreach
disposes of something it didn't acquire. But it's necessary. Because if we don't do it, then we'll push developers into a pit of failure where they typically fail to dispose of enumerators, and hence fail to execute the finally
blocks in async methods. For instance:
// This is a common and easy idiom. We need to make sure it calls Dispose().
foreach (async var x in GetStream()) {
if (x == 15) break;
}
// It would be ugly boilerplate if the user instead was expected to manually wrap
// each foreach inside a using:
using (var enumerator = GetStream()) {
foreach (async var x in enumerator) {
if (x == 15) break;
}
}
There is one saving grace. The meaning of an enumerator is that it can be consumed exactly once. So if you async foreach
over it once, then no one can ever foreach over it again. Therefore it doesn't matter that you've disposed of it.
There are three places at which cancellation tokens might be communicated:
MoveNextAsync(token)
.yield
; and there's no way in the async foreach
construct to provide a different token after each one.
GetStream(token)
if async iterators can return enumerators
GetEnumerator(token)
if you have an enumerable in hand
I'd initially discounted option 3 because I thought that everyone who kicks off a stream deserves to be able to cancel the stream themselves; because it felt weird that you could cancel it once and then everyone who tries to get the enumeration in future will find it cancelled without really knowing why. It means that the enumerable isn't a true pure _factory_ of independent identical streams.
But actually, IEnumerable
s today aren't true factories either. Consider:
class Baby
{
private List<string> poops = new List<string> { "morning", "noon", "night" };
public void DoPoop(string s) => poops.Add(s);
public IEnumerable<string> GetPoops(string prefix) => poops.Where(p => p.StartsWith(prefix));
}
var b = new Baby();
var p = b.GetPoops("n");
foreach (var s in p) Console.WriteLine(s); // "noon, night"
b.DoPoop("now!!!");
foreach (var s in p) Console.WriteLine(s); // "noon, night, now!!!"
(My wife gave me a 30 minute break from baby-care duties so I could write up these notes...)
Anyway, I've kind of persuaded myself that it might be tolerable to have cancellation done at the level of enumerable, not enumerator.
In the light of the two points above, let's talk through the design options again...
IAsyncEnumerator<T>
, not enumerables. The async foreach
construct can only consume enumerators, not enumerables.
async foreach
has to dispose of the enumerator, which feels weird, but at least has the "saving grace" that the stream could in any case never be re-used.
async foreach
construct can only consume enumerables; it does so by calling GetEnumerator()
and then disposing of the enumerator. There is no special support for cancellation, so folks have to cancel at the level of the _enumerable_ not the _enumerator_.
async foreach
construct can symmetrically consume either, and calls Dispose
no matter which one it's given. Cancellation is done at the level of the _enumerator_ by leveraging the novel async
contextual keyword within async methods.
async foreach
. That's what allows you to either consume foreach (async var x in GetStreamFactory())
or foreach (async var x in GetStreamFactory().GetEnumerator(token))
, and not have to think about it.
async
contextual keyword is useful for other things such as IAsyncAction
. I conjecture it might also be needed for IObservable
but don't yet know.
async foreach (enumerator)
disposes of the enumerator when it's done -- but at least like Option1 it still has the saving grace that the enumerator could never be re-used anyway so it doesn't matter. The main disadvantage is that the new async
contextual keyword is a heavyweight addition to the language.
At the moment, I think Option2 has the best chance of getting into the language. I think I should prototype Option3 since it's a strict superset of the other options. That'll allow for some good experiments.
@ljw1004 sorry if I am being repetitive... Could we have option 2 but with a twist: that `async foreach` adds optional syntax that accepts a `CancellationToken` to be passed to the `GetEnumerator()` call? Then a cancellation would only affect that particular enumerator but users wouldn't need to call `GetEnumerator(cancellationToken)` themselves.
@divega why not do that in the library? Why not have a function
interface IAsyncEnumerable<T>
{
IAsyncEnumerable<T> WithCancellation(CancellationToken c);
}
with the meaning that every enumerator which is subsequently gotten from this new IAsyncEnumerable will end up using the cancellation token? In the typical idiom, async foreach (var x in f.WithCancellation(c))
, there'll only be a single enumerator gotten from it in any case...
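To make the library-only approach concrete, here is a minimal sketch of `WithCancellation` as a decorator on the enumerable. The interface shapes follow the snippets in this thread and are not a shipped API; `DelayedRange` is a hypothetical source invented purely for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical interface shapes taken from this thread (not a shipped API).
public interface IAsyncEnumerator<T> : IDisposable
{
    T Current { get; }
    Task<bool> MoveNextAsync();
}

public interface IAsyncEnumerable<T>
{
    IAsyncEnumerator<T> GetEnumerator();
    IAsyncEnumerable<T> WithCancellation(CancellationToken token);
}

// Illustrative source: yields 0..count-1 and observes a token fixed per enumerable.
public sealed class DelayedRange : IAsyncEnumerable<int>
{
    private readonly int _count;
    private readonly CancellationToken _token;

    public DelayedRange(int count, CancellationToken token = default(CancellationToken))
    {
        _count = count;
        _token = token;
    }

    // Returns a new enumerable; every enumerator obtained from it uses the token.
    public IAsyncEnumerable<int> WithCancellation(CancellationToken token)
        => new DelayedRange(_count, token);

    public IAsyncEnumerator<int> GetEnumerator() => new Enumerator(_count, _token);

    private sealed class Enumerator : IAsyncEnumerator<int>
    {
        private readonly int _count;
        private readonly CancellationToken _token;
        private int _current = -1;

        public Enumerator(int count, CancellationToken token)
        {
            _count = count;
            _token = token;
        }

        public int Current => _current;

        public async Task<bool> MoveNextAsync()
        {
            if (_current + 1 >= _count) return false;
            await Task.Delay(1, _token); // throws once the token is cancelled
            _current++;
            return true;
        }

        public void Dispose() { }
    }
}
```

Because the token travels with the enumerable, a method that merely receives an `IAsyncEnumerable<int>` can enumerate it without knowing cancellation was composed in, much like a LINQ operator.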
Yeah sorry I did edit but not sure why it didn't get saved.
How about awaitin?
foreach(var item await in collection)
{
}
async, await and awaitin
@ljw1004
Anyway, I've kind of persuaded myself that it might be tolerable to have cancellation done at the level of the enumerable, not the enumerator.
I think of cancellation like any other form of composition allowed in LINQ, like TakeWhile
, and in that case it does make sense for whatever method this is to return a composed IAsyncEnumerable<T>
. Then I could compose the cancellation into the sequence and hand it off to some other method to enumerate over without that method having to be aware and without that method needing an overload for IAsyncEnumerator<T>
.
I think that this design would also allow the CancellationToken
argument to be moved back to the MoveNextAsync
method which would make it more consistent with async method design. The standard enumeration emitted by C# with foreach
could pass CancellationToken.None
, still leaving room for exploring the potential of having some other way of providing that token in the language. The IAsyncEnumerator<T>
provided by the decorated IAsyncEnumerable<T>
could then replace that with the token provided in the WithCancellation
method, or merge them if they're both cancellable, or whatever.
@ljw1004
why not do that in the library? Why not have a function
That of course would work, but it may not be as nice as it could be. See, unlike with async iterators, we can already do a decent job for async foreach
completely in the libraries, e.g. for current incarnations of IAsyncEnumerable<T>
we have extension methods that go more or less like this:
``` C#
public static async Task ForEachAsync<T>(
    this IAsyncEnumerable<T> source,
    Action<T> action,
    CancellationToken cancellationToken = default(CancellationToken))
{
    CheckNotNull(source, nameof(source));
    CheckNotNull(action, nameof(action));
    // Dispose the enumerator even if the action or MoveNextAsync throws.
    using (var enumerator = source.GetEnumerator())
    {
        while (await enumerator.MoveNextAsync(cancellationToken))
        {
            action(enumerator.Current);
        }
    }
}
```
Besides the obvious extra method and delegate I happen to regard async foreach
integration in the language mostly as sugar that can make IAsyncEnumerable<T>
much nicer to consume. So if it is language sugar and if cancellation is such an important feature of async, why not take it the full way?
BTW, I am very sympathetic to any resistance to adding arbitrary things to the language, e.g. adding a language feature that is aware of `CancellationToken` (although perhaps it could be just about passing any state to the `GetEnumerator()` method?) is perhaps a leap we haven't taken before, and also having to pick among new and existing keywords and patterns for this starts to sound very expensive. But I just wonder if it is worth doing it in this case.
(although perhaps it could be just about passing any state to the
GetEnumerator()
method?)
Considering IEnumerable
to be an IEnumerator
factory (which 99% of the time in normal workflow code is only ever asked to create a single instance), I think it is more about enabling IEnumerable
to generate the IEnumerator
instance with the correct state.
Perhaps, :spaghetti: (pretend IEnumerable
/IEnumerator
didn't matter as an interface but as a pattern), it is really about some arbitrary parameter list to the .MoveNext
method:
foreach (var x in enumerable) (state, ...)
{
...
}
translates that parameter list to the MoveNext
method call (as a completely orthogonal feature to async foreach
)?
It would then be reasonably intuitive for a syntax:
var cancellationToken = ...;
await foreach(var x in asyncenumerable) (cancellationToken)
{
...
}
(you could reassign cancellationToken here if you so desired a new one for the next MoveNext
inside the block for the foreach
statement)
@HaloFour If we do move cancellation token into MoveNextAsync(...), could you try writing the body of an async enumerator or enumerable which takes advantage of it? and which handles the case where a different token gets passed in each time? I'd love to see a concrete example. I persuaded myself it can't be done.
@ljw1004
Admittedly that's a little tricky and there probably isn't a method that doesn't involve some degree of voodoo. You either need some kind of "ambient" token, like your async.CancellationToken
concept or something buried in the framework like a CancellationToken.Current
or you need something that could wire it up to any CancellationToken
defined as a parameter. You get this problem regardless of where the cancellation token comes from, though, unless the only place you can provide one is in the call to the async iterator directly.
The last option is very voodoo-y. 🍝 In short, the async iterator state machine would take any CancellationToken
parameter and instead of using it as is would create and manage its own CancellationTokenSource
and pass along that token to the iterator method. Then the state machine would wire up each token provided on MoveNextAsync
so that if it is cancelled or becomes cancelled that it cancels the token source which would in turn cancel the token in the iterator. I don't know if the overhead associated with something like that would be too excessive.
Yes, I think registering each token is the only way it could be done. Unfortunately the common path of CancellationTokenSource.InternalRegister
is non-trivial, not the kind of overhead I'd love to have on every loop iteration.
@ljw1004 @HaloFour personally I prefer the CancellationToken
on GetEnumerator()
for many reasons. One that comes to mind right now is that conceptually I want to use the CancellationToken
to cancel the whole enumeration, so calling MoveNextAsync()
after the enumeration got cancelled should be futile. If the CancellationToken
was instead in MoveNextAsync()
and I could pass a different one each time, then it seems that the cancellation doesn't affect the whole enumeration and I am free to retry calling MoveNextAsync()
with a different CancellationToken
after I got the first exception.
So please consider my suggestion to have optional syntax to pass the CancellationToken
in async foreach
on its own merits :smile:
BTW, the idea of having an optional parameter list for GetEnumerator()
alongside foreach()
doesn't seem completely crazy to me.
@divega
The experience would be the same regardless of whether the token was passed to GetEnumerator
or MoveNextAsync
. You'd either need some kind of change to the syntax of foreach
to accept that token, or you'd need a method (extension or otherwise) to decorate the IAsyncEnumerable<T>
with the token. In the case of foreach
syntax it could simply pass the same token on each iteration. If you _needed_ to pass a different token each time then you'd have to manually create and consume the IAsyncEnumerator<T>
.
From the async iterator it doesn't matter if the token comes from GetEnumerator
or MoveNextAsync
, it's going to result in some kind of voodoo in order to consume that token.
As such I'd much rather follow the already defined convention for async methods and have the CancellationToken
exist on MoveNextAsync
since it is the only actual async method.
I think conceptually
interface IAsyncEnumerable<T>
{
IAsyncEnumerator<T> GetEnumerator(CancellationToken token = default(CancellationToken));
};
Makes the most sense to me; however I'm not sure how it fits nicely with foreach
.
@ljw1004 exploring the WithCancellation
approach; were you thinking something like
interface IAsyncEnumerable<T>
{
    IAsyncEnumerator<T> GetEnumerator(CancellationToken token = default(CancellationToken));
}
static class IAsyncEnumeratorExtensions
{
    // Extension methods must live in a non-generic static class,
    // so the type parameter moves onto the method.
    static IAsyncEnumerator<T> WithCancellation<T>(this IAsyncEnumerable<T> enumerable, CancellationToken cancellationToken)
    {
        return enumerable.GetEnumerator(cancellationToken);
    }
}
Then something like this would work
foreach (var item await in collection)
{
...
}
// or
foreach (var item await in collection.WithCancellation(ct))
{
...
}
@HaloFour I disagree. For an async iterator method to consume a different cancellation token from each individual MoveNext requires an entirely new level of voodoo. Consider:
// Here I'm trying to write an async iterator that can handle a fresh CancellationToken
// from each MoveNextAsync...
async IAsyncEnumerable<int> f()
{
var cts = CancellationTokenSource.CreateLinkedTokenSource(async.CancellationToken);
var t1 = Task.Delay(100, cts.Token);
var t2 = Task.Delay(100, cts.Token);
await t1;
yield 1;
// ??? at this point can I even trust "cts" any more?
// or do I need to create a new linked token source for further work that I do now?
yield 2;
await t2;
// ?? is this even valid anymore, given that it's using an outdated cancellation token?
// If someone cancels that latest freshest token passed in the latest MoveNextAsync,
// how will that help cancel the existing outstanding tasks?
}
If you want a fresh cancellation token on each MoveNext you will have to tell me the semantics of how one should write an async iterator method that responds to cancellation correctly. You can assume we have "minor level voodoo" in the form of async.CancellationToken
which retrieves the most recent cancellation token that was passed to MoveNext. But I want you to build on top of that to explain how you'll actually author the async iterator method in a sane coherent way, building upon that minor level voodoo. I think it can't be done, as in my example above.
@ljw1004 That is a problem for C# to solve, not for IAsyncEnumerator
to solve. The solution should be one that doesn't result in one or the other feeling out of place.
@ljw1004
Not necessarily. The async iterator state machine could provide a single async.CancellationToken
that never changes, but internally it manages its own CancellationTokenSource
and links the tokens on each MoveNextAsync
. That way even if you grab async.CancellationToken
at the very beginning of the iterator into a local and reuse that local throughout the rest of the body the semantics would be as expected. The issue is, of course, the potential overhead. In the most common cases the CancellationToken
would be either not cancellable (CancellationToken.None
) or would be the same CancellationToken
. Both situations are easy to detect and linkage can simply be skipped.
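A minimal sketch of that linking idea, assuming a hypothetical helper (`TokenRelay`, not part of any framework) that the state machine would own. It exposes one stable token to the iterator body and registers each per-call token only when it can be cancelled and differs from the previous one. Dropping the previous registration when a new token arrives is a policy choice made here for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical helper; the name and shape are assumptions for illustration.
public sealed class TokenRelay : IDisposable
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private CancellationToken _lastLinked;               // starts as CancellationToken.None
    private CancellationTokenRegistration _registration; // default value is safe to dispose

    // The single token the iterator body would observe; it never changes.
    public CancellationToken Stable => _cts.Token;

    // Number of registrations performed, exposed only to show the fast paths.
    public int LinkCount { get; private set; }

    // Would be called at the start of each MoveNextAsync(token).
    public void Observe(CancellationToken perCall)
    {
        if (!perCall.CanBeCanceled) return;  // e.g. CancellationToken.None: skip linkage
        if (perCall == _lastLinked) return;  // same token as last time: skip linkage
        _registration.Dispose();             // drop the previous link
        _lastLinked = perCall;
        _registration = perCall.Register(() => _cts.Cancel());
        LinkCount++;
    }

    public void Dispose()
    {
        _registration.Dispose();
        _cts.Dispose();
    }
}
```

In the common cases described above (no token, or the same token every time) the loop pays for at most one registration, which is where the overhead concern would or would not bite.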
@scalablecory
Exactly, I'm arguing this on the premise that the language should adhere to the conventions of the framework, not change the conventions to suit the needs of the language.
With the cancellation token given to GetEnumerator
the implementation of IAsyncEnumerator
can then just take the token in its .ctor
@benaadams
Yes, which very few people will actually be writing. So you still end up with voodoo in actually consuming that token from within an iterator, and you're breaking the async method convention.
@benaadams You asked "WithCancellation approach; were you thinking something like...". No, what I was thinking was this:
interface IAsyncEnumerable<T>
{
IAsyncEnumerator<T> GetEnumerator();
IAsyncEnumerable<T> WithCancellation(CancellationToken token);
}
In other words, I was thinking of it in exactly the same light as LINQ TakeWhile
, following the suggestion of @HaloFour
@ljw1004 does adding the extra method to the interface add any extra value vs just adding an extension method? i.e. people will have to implement two methods. You can still add a `WithCancellation` to the strongly typed class to override it.
Re: syntax I like foreach (var item await in asyncCollection)
As what it represents is conceptually like
foreach (var task in asyncCollection)
{
var item = await task;
// ...
}
So you are awaiting what is returned from the in
@HaloFour Cancellation should exist at the `IAsyncEnumerator` level, not at `MoveNext`, unless you are suggesting cancelling one operation of the iteration and then continuing to iterate, rather than cancelling the whole iteration.
@benaadams if you mean at the IAsyncEnumerator
level, 👍
@divega lol, yes - edited :grin:
@benaadams
I'm not suggesting that you can cancel one MoveNextAsync
and start another, no. That doesn't make much sense. Neither does being able to call MoveNextAsync
twice or any other variety of garbage that we get saddled with by making IAsyncEnumerator<T>
follow the IEnumerator<T>
pattern. Certainly once a single iteration is cancelled, so is the entire sequence.
But if we're going to move the voodoo back that far I'd say that we'd be better served by nixing the voodoo completely. You pass the CancellationToken
directly to the async iterator method and nowhere else. That way there's no voodoo in the iterator method body at all. You lose the functionality of being able to cancel the async operation once it's in flight, but we already have precedent for this behavior with async methods since you can't cancel a Task
. At most, maybe wire up some voodoo so that Dispose
on the IAsyncEnumerator<T>
triggers a cancellation, then at least the client has some control.
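As a rough sketch of that last idea, assuming a hand-written enumerator standing in for whatever the compiler would generate: the token arrives once, up front, and `Dispose` cancels any work still in flight. `SlowTicks` and its 30-second delay are hypothetical placeholders.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical hand-written enumerator; the token is supplied once, at creation.
public sealed class SlowTicks : IDisposable
{
    private readonly CancellationTokenSource _cts;
    private int _disposed;

    public SlowTicks(CancellationToken callerToken)
    {
        // Cancelled either by the caller's token or by Dispose below.
        _cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
    }

    public int Current { get; private set; } = -1;

    public async Task<bool> MoveNextAsync()
    {
        await Task.Delay(TimeSpan.FromSeconds(30), _cts.Token); // stands in for slow I/O
        Current++;
        return true;
    }

    public void Dispose()
    {
        if (Interlocked.Exchange(ref _disposed, 1) == 1) return; // safe to call twice
        _cts.Cancel();  // abandons any MoveNextAsync still in flight
        _cts.Dispose();
    }
}
```

The iterator body only ever sees one token, so there is no per-iteration linking; the price is that the consumer's only mid-flight lever is `Dispose`.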
I think I've been thinking of this too narrowly...
The language is mostly neutral whether cancellation happens at enumerable, enumerator or movenext level. With some exceptions...
- … `async` contextual keyword voodoo.

So the debate about cancellation at MoveNext vs Enumerator vs Enumerable should be framed in terms of which of those basic language features are an acceptable price to pay: is the voodoo acceptable? is it acceptable to foreach over an enumerator?
I want to write a prototype which does support those two basic language features. If I get that far, then the cancellation debate is worth having by people implementing different libraries/builders that implement cancellation at the different levels, and seeing which one "wins".
There's a third basic language feature which @divega brought up that's more important to address: should cancellation be added as a first-class concept to the language, and if so then what syntax should the foreach statement use for it? Diego suggested foreach (var x await in xx using cancel)
. Are there any other contenders? Which other parts of the language would this new feature work in? And what exactly should it do?
Diego suggested
foreach (var x await in xx using cancel)
. Are there any other contenders?
I am not emotionally attached to that choice of syntax or keyword :smile:
@HaloFour @bbarry suggested this:
foreach (var x await in xx) (cancel)
In his suggestion I like to interpret `(cancel)` as a parameter list to be passed to `GetEnumerator()`, which at that point can contain arbitrary parameters and not just the `CancellationToken`.
@ljw1004 For the sake of prototyping I think that the entire cancellation argument could be deferred. Why stress over voodoo that may not be necessary? Are we terribly inconvenienced that we can't cancel Task
s returned from async
methods today?
@divega
@HaloFour suggested this:
Nope, @bbarry did. I personally think that's pretty hideous. :grin:
@HaloFour sorry, my mistake.
@HaloFour, @divega
I suggested:
await foreach (var x in xx) (cancel)
(await
before foreach)
The reason is, foreach
becomes(ish):
var e = xx.GetEnumerator();
while(e.MoveNext())
{
var x = e.Current;
}
...
in the idea that you might want to pass a parameter list to the MoveNext()
method:
//foreach (var x in xx) (param0, param1)
var e = xx.GetEnumerator();
while(e.MoveNext(param0, param1))
{
var x = e.Current;
}
with the notion that foreach (var x in xx)
is roughly a standin for the method call MoveNext
...
Anyway with this concept in mind (having nothing to do with async) it becomes a straightforward leap to think "oh I want to await the `MoveNext` method", so `await foreach` (resulting in `while(await e.MoveNextAsync())`, or `while(await e.MoveNextAsync(token))` for `await foreach(var x in xx) (token)`).
I personally like `await foreach(var x in xx)` as the async foreach syntax (over `foreach(async var x in xx)`) but I really don't care much; syntax is best left to a late decision made long after we have figured out and proposed acceptable semantics.
I don't really understand why someone might want to pass in a different token to each call of MoveNext
, but the general pattern of
var e = xx.Factory();
while(e.SomeWork(param0, param1))
{
var x = e.CurrentEvaluatedState;
...
}
could be succinctly represented with a foreach
pattern as:
foreach (var x in xx) (param0, param1) { ... }
and maybe?? that is useful?
My take
Interfaces
interface IAsyncEnumerable<T>
{
IAsyncEnumerator<T> GetEnumerator(CancellationToken cancellationToken = default(CancellationToken));
};
interface IAsyncEnumerator<T> : IDisposable
{
Task<bool> MoveNextAsync();
T Current { get; }
}
public static class IAsyncEnumeratorExtensions
{
public static IAsyncEnumerator<T> WithCancellation<T>(this IAsyncEnumerable<T> enumerable, CancellationToken cancellationToken)
{
return enumerable.GetEnumerator(cancellationToken);
}
}
Not sure if the WithCancellation
can be done entirely with generic constraints to avoid boxing; if it could that would be nice.
Manual
class ManualAsyncEnumerable : IAsyncEnumerable<int>
{
public ManualAsyncEnumerator GetEnumerator(CancellationToken cancellationToken = default(CancellationToken))
{
return new ManualAsyncEnumerator(cancellationToken);
}
};
struct ManualAsyncEnumerator : IAsyncEnumerator<int>
{
private CancellationToken _cancellationToken;
private int _current;
public ManualAsyncEnumerator(CancellationToken cancellationToken)
{
_cancellationToken = cancellationToken;
_current = -1;
}
public async Task<bool> MoveNextAsync()
{
if (_current < 9)
{
await Task.Delay(100, _cancellationToken);
_current++;
return true;
}
return false;
}
public int Current => _current;
public void Dispose() { }
};
Automagic
async IAsyncEnumerable<int> AsyncYieldingEnumerator(CancellationToken cancellationToken = default(CancellationToken))
{
for (var i = 0; i < 10; i++)
{
await Task.Delay(100, cancellationToken);
yield i;
}
}
Add the constraint that `foreach ... await in ...` needs the collection to have the method `GetEnumerator(CancellationToken cancellationToken)` or be an `IAsyncEnumerable<T>`, which helps differentiate it from `foreach ... in ...` and gives compile errors if you interchange the types, making:
async Task Method(CancellationToken cancellationToken)
{
    ManualAsyncEnumerable enumerable = new ManualAsyncEnumerable();
    foreach (var val await in enumerable)
    {
        // ..
    }
    // non boxed cancellation
    foreach (var val await in enumerable.GetEnumerator(cancellationToken))
    {
        // ..
    }
    // boxed cancellation
    foreach (var val await in enumerable.WithCancellation(cancellationToken))
    {
        // ..
    }
    foreach (var val await in AsyncYieldingEnumerator())
    {
        // ..
    }
    // cancellation
    foreach (var val await in AsyncYieldingEnumerator().WithCancellation(cancellationToken))
    {
        // ..
    }
}
ofc not having a different sig would mean you could blend sync and async in one enumerator - which may be a good or bad thing...
struct AsyncEnumerator : IAsyncEnumerator<T>, IEnumerator<T>
{
public bool MoveNext()
{
...
}
public async Task<bool> MoveNextAsync()
{
...
}
public T Current => _current;
public void Dispose() { }
};
@benaadams https://github.com/dotnet/roslyn/issues/261#issuecomment-215581366 I'm almost certain `foreach` by spec requires the `IEnumerable` interface `GetEnumerator` and then uses the one it can by a pattern based lookup on the type or on `IEnumerable<T>`, or it requires `GetEnumerator`, `MoveNext` and `Current` to be in the correct places, so by not implementing `IEnumerable` and using `MoveNextAsync` on these one could not do the former there :grin: (§8.8.4)
@bbarry but you could implement both; except you'd have to choose which one was hidden as an explicit interface as you couldn't have two methods differing by return type. Which would mean you couldn't use a struct enumerator for both as one would have to go via interface.
I believe you could implement both implicitly as long as your type on Current
was the same.
@ljw1004's proposal doesn't necessitate the IAsyncEnumerable<T>
interface used either, merely a pattern of:
CollectionType {
IteratorType GetEnumerator();
}
IteratorType {
ElementType Current {get;}
AwaitableType<bool> MoveNextAsync();
}
edit and under option 1 (https://github.com/dotnet/roslyn/issues/261#issuecomment-215308615), the collection type is not necessary either. edit: (my vote would be for option 2 but for the sake of consideration everything after that comment I am thinking about option 3)
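To illustrate the pattern-based shape above, here is a small sketch with types that implement no interface at all, plus a hand-written loop showing roughly what an async `foreach` over them might lower to. `Countdown` and `PatternDemo` are hypothetical names, and the real lowering was still under design at this point in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Satisfies the pattern by shape only: GetEnumerator, Current, MoveNextAsync.
public sealed class Countdown
{
    private readonly int _from;
    public Countdown(int from) { _from = from; }
    public Iterator GetEnumerator() => new Iterator(_from);

    public sealed class Iterator
    {
        private int _next;
        public Iterator(int from) { _next = from; }
        public int Current { get; private set; }

        public async Task<bool> MoveNextAsync()
        {
            if (_next <= 0) return false;
            await Task.Yield(); // stands in for real asynchronous work
            Current = _next--;
            return true;
        }
    }
}

public static class PatternDemo
{
    // Hand-written equivalent of an async foreach over a Countdown.
    public static async Task<List<int>> CollectAsync(Countdown source)
    {
        var seen = new List<int>();
        var e = source.GetEnumerator();
        while (await e.MoveNextAsync())
        {
            seen.Add(e.Current);
        }
        return seen;
    }
}
```

The iterator here is a class on purpose: an `async` method on a struct runs against a copy of the struct, so state changes made after an `await` would be lost.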
@bbarry sorry I wasn't really very clear; I mean you might want two different enumerator types, one for async and one for sync, as although in practice they get smudged together into the same type, they are generally very different things (e.g. the async one would have extra data for cancellation state and its state machine/builder).
Putting them together means people often end up doing bad things like only implementing one and then doing the other as sync-over-async or async-over-sync; which normally ends unhappily.
Yeah I got that. I am not sure how you might do it with a compiler supported async
+yield
based method compiled into a builder, but surely if you were writing one manually you could (and it is desirable to do so) wind up with a single type that implements both (I mean that is effectively what any async supporting BCL stream-like type already is right? the methods are merely named something else).
but you could implement both; except you'd have to choose which one was hidden as an explicit interface as you couldn't have two methods differing by return type. Which would mean you couldn't use a struct enumerator for both as one would have to go via interface.
A good reason to name the async version differently, e.g. GetAsyncEnumerator()
.
I am trying to create a ValueTaskEnumerable<T>
and ValueTaskEnumerableBuilder<T>
to play with the spec proposal and am having trouble getting past:
- The return type for an async iterator method must be a generic Tasklike
Tasklike<T>
.
I don't think this is true. The iterator method must be Tasklike<bool>
right (we are talking about MoveNextAsync()
right?)?
edit: oh nvmd me
@bbarry The "async iterator method" is the one that's written
async IAsyncEnumerable<int> f()
{
}
This method f
must have a return type IAsyncEnumerable<int>
which is a _tasklike_. I'm re-using this word from the previous proposal. All this word means is that it has an attribute on it that points to the builder:
[Tasklike(typeof(MyAsyncEnumerableBuilder<>))]
interface IAsyncEnumerable<T>
The word _tasklike_ is a poor choice of word within this feature. It would be better renamed [HasFactory(...)] interface IAsyncEnumerable<T>
or [HasBuilder(...)] interface IAsyncEnumerator<T>
.
Note: the stuff in my spec describes the requirements of the builder type. What I've written is (I think) correct for an enumerator-builder, but incorrect for a "factory" (i.e. an enumerable-builder). I have to go back and revisit that.
https://gist.github.com/bbarry/9ae25647d65ad82bfe218125087677ea
Seems to work. I haven't attempted to get too crazy (hand writing the compiler generated state machine gets old fast).
@bbarry do you reckon we could juggle things around so
Feel free to change the dance between state machine and builder to achieve this...
I think that is possible. I have ValueTaskEnumerable
as a class because I am assigning the state machine from Start
, so I currently have an allocation in Create
for that. There is a bit of complexity in remembering that I'm handling multiple mutable structs which I was unprepared for at 6pm on a Friday (I thought I had it written this way correctly at first but it was writing each yielded value to the screen twice when I ran it).
I was going to write a ValueTaskEnumerator
and associated builder and try and refine the api first.
I've implemented a prototype version of C# which supports _consumption_ of async enumerators and async enumerables, foreach (await var x in e)
. Install instructions are here:
https://github.com/ljw1004/roslyn/blob/features/async-return/docs/specs/feature%20-%20async%20iterators.md
(I'm on vacation until May 16th so won't start work on the _production_ side until then. First step, before implementing any production changes in the compiler, will be to figure out exactly what the builder and state machine should look like in order to be as efficient as possible.)
Upon further thought I don't think it is possible for zero allocations (for a compilation produced machine). One would need type information for the state machine in the enumerator in order to hold the state between calls of MoveNext
, but I don't know the name of the type.
Essentially I think I need to write something like this:
public static async ValueTaskEnumerable<int, *> YieldingStuff()
{
await Task.Delay(1000);
yield return 1;
await Task.Delay(1000);
yield return 2;
}
Which could be compile time transformed to ValueTaskEnumerable<int, Cg3>
.
On the bright side I am fairly certain I can do it with vNext generators (#5561):
public static ValueTaskEnumerable<int, Af.Enumerator> AfYieldingStuff()
{
Af.Await(Task.Delay(1000));
Af.Yield(1);
Af.Await(Task.Delay(1000));
Af.Yield(2);
return Af.Break<ValueTaskEnumerable<int, Af.Enumerator>>();
}
and generate the necessary method and state machine pre-compilation.
@bbarry I imagined something like a generic class MyEnumerableBuilder
I think you need that to build the right type (at least a generic method that creates the correct tasklike...) but you still need the type information somehow in the ValueTaskEnumerable<>
and/or ValueTaskEnumerator<>
types in order to hold references to them in the containing methods, don't you?
For the zero-allocation version, I'm imagining a single struct `ValueTaskEnumerator<>` which has a field which is the struct builder, which in turn has a field which is the struct state machine. Or some other arrangement of the last two.
For the single-allocation version (which returns an IAsyncEnumerable
), I'm imagining that it returns a class which implements both IAsyncEnumerable
and IAsyncEnumerator
, and where GetEnumerator()
just returns this
if it's the first enumerator request of it, and the class has a field which is the struct builder, which in turn has a field which is the struct state machine.
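A minimal hand-written sketch of the single-allocation shape, assuming the interface shapes used elsewhere in this thread. `Squares` is a hypothetical stand-in for the compiler-generated class; its fields play the role of the builder and state machine.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical interface shapes taken from this thread (not a shipped API).
public interface IAsyncEnumerator<T> : IDisposable
{
    T Current { get; }
    Task<bool> MoveNextAsync();
}

public interface IAsyncEnumerable<T>
{
    IAsyncEnumerator<T> GetEnumerator();
}

// One object plays both roles, so the common single-enumeration case allocates once.
public sealed class Squares : IAsyncEnumerable<int>, IAsyncEnumerator<int>
{
    private readonly int _count;
    private int _handedOut; // 0 until the first GetEnumerator call claims this instance
    private int _index = -1;

    public Squares(int count) { _count = count; }

    public IAsyncEnumerator<int> GetEnumerator()
    {
        // The first caller reuses this instance; later callers get a fresh copy.
        if (Interlocked.CompareExchange(ref _handedOut, 1, 0) == 0) return this;
        return new Squares(_count) { _handedOut = 1 };
    }

    public int Current => _index * _index;

    public Task<bool> MoveNextAsync()
    {
        _index++;
        return Task.FromResult(_index < _count);
    }

    public void Dispose() { }
}
```

This mirrors the trick the C# compiler already uses for synchronous iterators that return `IEnumerable<T>`, where the generated object hands itself out as the first enumerator.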
I've updated the implementation I wrote of ValueTaskEnumerable to explicitly implement the interfaces alongside the patterns and separated out the Enumerator implementation and builder.
The interfaces I am implementing are:
public interface IAsyncEnumerable<out T>
{
IAsyncEnumerable<T> ConfigureAwait(bool continueOnCapturedContext);
IAsyncEnumerator<T> GetEnumerator();
IAsyncEnumerable<T> WithCancellationToken(CancellationToken token);
}
public interface IAsyncEnumerator<out T>
{
IAsyncEnumerator<T> ConfigureAwait(bool continueOnCapturedContext);
T Current { get; }
ConfiguredTaskAwaitable<bool> MoveNextAsync();
IAsyncEnumerator<T> SetCancellationToken(CancellationToken token);
}
Notes (in no particular order):
- … `Task<bool>` ...); these allocations _do_ happen in the explicit `IAsyncEnumerator` usage (though I believe they are cached instances in `Task.FromResult`).
- … `YieldValue`
- … `ConfigureAwait` and `WithCancellationToken` such that you cannot call them more than once is possible but annoying.
- … `ConfiguredValueTaskAwaitable` could be implicitly cast to `ConfiguredTaskAwaitable`.
- … `SetStateMachine` method
- … `foreach`.
- … `IDisposable` implementation must be allowed to be called more than once.
- … `MoveNextAsync`.

https://gist.github.com/bbarry/9ae25647d65ad82bfe218125087677ea
For science, here's something that is zero allocations and almost works (I bet I could make it work with a bit of effort) but demonstrates what I mean about needing the state machine exposed: https://gist.github.com/bbarry/0fca79b6ac8f9ea642a768024560aaa5
Are async iterator methods even needed as a language feature? I believe it is possible to write them as a library solution today. Like this:
public static IAsyncEnumerable<T2> Select<T1, T2>(IAsyncEnumerable<T1> items, Func<T1, Task<T2>> selector) {
var yieldSource = new YieldSource<T2>(); //design like TCS
var sequenceTask = SelectCoreAsync(items, selector, yieldSource.YieldConsumer);
return yieldSource.GetEnumerable(sequenceTask);
}
static async Task SelectCoreAsync<T1, T2>(
this IAsyncEnumerable<T1> items,
Func<T1, Task<T2>> selector,
AsyncYieldConsumer<T2> yieldConsumer) {
foreach (await var item in items) {
var result = await selector(item);
await yieldConsumer.Yield(result); //library-based yield
}
}
Seems clean enough. It does not accomplish all design goals but a lot of them at zero language cost.
YieldSource
is a library-provided helper class. It should not be too hard to write that. This design is a "push pull adapter" but thanks to async there is no need for background threads. Basically, every yieldConsumer.Yield
call deposits the value and completes the current outstanding MoveNextAsync
task. YieldSource
only needs to buffer a single item at a time.
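For what it's worth, here is one way such a helper could be sketched. It is a simplified, hypothetical `YieldSource<T>` with a `Start`/`Yield`/`MoveNextAsync` surface rather than the `YieldConsumer`/`GetEnumerable` split used in the snippet above, and it assumes a single producer and a single consumer. Two semaphores implement the one-item handoff.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical one-slot push/pull adapter; single producer, single consumer.
public sealed class YieldSource<T>
{
    private readonly SemaphoreSlim _requested = new SemaphoreSlim(0); // consumer wants an item
    private readonly SemaphoreSlim _signalled = new SemaphoreSlim(0); // item deposited or producer ended
    private Task _producer;
    private T _slot;
    private bool _hasItem;
    private bool _finished;

    public T Current { get; private set; }

    // Starts the producer; it parks in its first Yield until MoveNextAsync is called.
    public void Start(Func<YieldSource<T>, Task> body)
    {
        _producer = RunAsync(body);
    }

    private async Task RunAsync(Func<YieldSource<T>, Task> body)
    {
        try { await body(this); }
        finally { _signalled.Release(); } // wake the consumer one last time
    }

    // Producer side: the library-based "yield return".
    public async Task Yield(T value)
    {
        await _requested.WaitAsync();
        _slot = value;
        _hasItem = true;
        _signalled.Release();
    }

    // Consumer side: completes when an item is deposited or the producer ends.
    public async Task<bool> MoveNextAsync()
    {
        if (_finished) return false;
        _requested.Release();
        await _signalled.WaitAsync();
        if (_hasItem)
        {
            _hasItem = false;
            Current = _slot;
            return true;
        }
        _finished = true;
        await _producer; // surfaces any exception thrown by the producer
        return false;
    }
}

public static class YieldSourceDemo
{
    public static async Task<List<int>> RunAsync()
    {
        var source = new YieldSource<int>();
        source.Start(async y =>
        {
            for (var i = 1; i <= 3; i++)
            {
                await Task.Delay(1);
                await y.Yield(i * 10);
            }
        });

        var seen = new List<int>();
        while (await source.MoveNextAsync()) seen.Add(source.Current);
        return seen;
    }
}
```

No background thread is dedicated to the producer: it only advances when the consumer asks for the next item, and only one value is ever buffered.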
To clarify, this proposal is only about the production side. I do not question async foreach for the consumption side.
@GSPP do you not need an initial IAsyncEnumerable<T>
to pass in to both of those methods?
@bbarry it can be implemented manually by deriving from that interface. There will not be async enumerator CLR support. You can do anything manually that the C# 7 compiler otherwise would generate for you.
Are async iterator methods even needed as a language feature?
@bbarry While it can be done in current C#, adding language support could make it a lot easier to implement it efficiently -- the same sorts of gains you'd see with ContinueWith vs async/await.
@bbarry That thing about "needing the state machine exposed" will, I think, sink the idea of having a non-allocating iterator. Here's why... @CyrusNajmabadi
ValueIterator<int> f() { yield return 1; yield return 2; }
foreach (var x in f()) { Console.WriteLine(x); }
The above code is a simple case of a non-allocating iterator. It relies upon `f()` returning a value-type, and we subsequently call `MoveNext()` upon it. But the compiler has to emit the body of `MoveNext()` somewhere. It can only ever emit this body into a compiler-generated type `struct fStateMachine`.
_Therefore, ValueIterator<int>
must contain that state-machine as a field._
@bbarry achieved this by actually having the method return ValueIterator<int,fStateMachine>
. But this is weird, because the compiler-generated state machine type ends up being exposed in the signature. We couldn't allow this. (Indeed, the developer can't even utter the name of the state machine type).
The only other solution I can imagine is something like anonymous types...
private var f() { yield return 1; yield return 2; }
The compiler will say: "I see the method returns var
and has yield return
in it. Therefore I will construct a compiler-generated value-type enumerable. The user can either cast it to IEnumerable<int>
, or can foreach
over it directly to avoid all allocation." The method would have to be either private or a local function to allow this. It would unfortunately have to rely upon type inference, and we'd need to solve circular cases:
private var f() { yield return g().First(); } // infers element type "int"
private var g() { yield return 1; }
private var f2() { yield return g2().First(); }
private var g2() { yield return f2().First(); } // error: circular
I guess the compiler-generated type would indeed be `ValueEnumerable<T,TStateMachine>` just as @bbarry suggested. This type would have to exist in the framework and be blessed by the compiler. There'd be no way for the compiler to use a type other than `ValueEnumerable`.
But you wouldn't even be able to write efficient LINQ extension methods on it because only VB (not C#) allows the receiver of an extension method to be `ref`...
T First<T,TSM>(ref ValueEnumerable<T,TSM> enumerable)
And what about async iterators? Once again, the compiler would have to tie the feature to one concrete value-type-async-iterator from the framework.
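To make the shape concrete, here is a hand-written sketch of what such a compiler-generated value-type enumerable could look like for the `yield return 1; yield return 2;` body. The names `OneTwoEnumerable`, `Enumerator` and `ValueIteratorDemo` are invented for illustration and are not part of any proposal; the only thing relied on is that `foreach` binds to a `GetEnumerator`/`MoveNext`/`Current` pattern on structs.

```csharp
using System;

// Hypothetical stand-in for a compiler-generated value-type iterator.
public struct OneTwoEnumerable
{
    public Enumerator GetEnumerator() => new Enumerator();

    // Plays the role of "struct fStateMachine": the state lives in the struct itself.
    public struct Enumerator
    {
        private int _state;
        public int Current { get; private set; }

        public bool MoveNext()
        {
            switch (_state)
            {
                case 0: Current = 1; _state = 1; return true;   // yield return 1;
                case 1: Current = 2; _state = 2; return true;   // yield return 2;
                default: return false;                          // end of sequence
            }
        }
    }
}

public static class ValueIteratorDemo
{
    public static OneTwoEnumerable F() => new OneTwoEnumerable();

    // foreach binds to the struct pattern directly; no interface is involved.
    public static int Sum()
    {
        int total = 0;
        foreach (var x in F()) total += x;
        return total;
    }
}
```

`ValueIteratorDemo.Sum()` walks the sequence with the enumerator held as a plain struct local. Nothing here implements `IEnumerable<int>`, so the cast mentioned above is left out of the sketch.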
> The method would have to be either private or a local function to allow this.
This would also be extremely unfortunate as you would not be able to use this for types you want to be publicly enumerated in a non-allocating fashion.
> It can only ever emit this body into a compiler-generated type `struct fStateMachine`
You hit the nail on the head. If we wanted to allow these methods to be public, then we'd have to allow something like the following (syntax not final):
``` c#
public struct Enumerator GetValues() { yield ... };
```
or, alternatively
``` c#
class C
{
    public struct enum Enumerator;
    public Enumerator GetValues() { yield; }
}
```
In both these cases, the effective idea is to be able to name the struct that will hold the state machine. In either case, you are not allowed to supply any body for the type. Instead, you are simply naming it, and stating what accessibility/scope it would have.
The compiler would then fill this type when necessary. It would then be an error to reference this type more than once as the return type of a yielding method.
This would work, but would unfortunately still not solve the Linq problem. Now you have a uniquely named type for each yielding scenario. There would be no way to write a consistent set of linq methods that could operate on it...
_unless_ we had these structs implement some special interface that they then explicitly implemented.
The linq methods would be defined like:
``` c#
First<T>(this ref T value) where T : SomeSpecialInterface
```
But, as lucian mentioned, we can't use ref in extension methods in C#.
However, i think this scenario is pretty compelling. So maybe we could relax that C# restriction and put it more in line with VB.
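For what it's worth, C# 7.2 and later do accept a `this ref` receiver when the receiver is a struct or a type parameter constrained to `struct`. A minimal sketch under that assumption, with `IIntIterator`, `CountTo` and `ValueLinq` as made-up names standing in for the special interface and one of the generated structs:

```csharp
using System;

// Hypothetical "special interface" that each generated struct would implement.
public interface IIntIterator
{
    bool MoveNext();
    int Current { get; }
}

// Hand-written stand-in for one uniquely named, compiler-filled struct.
public struct CountTo : IIntIterator
{
    private readonly int _limit;
    private int _current;

    public CountTo(int limit) { _limit = limit; _current = 0; }

    public int Current => _current;

    public bool MoveNext()
    {
        if (_current >= _limit) return false;
        _current++;
        return true;
    }
}

public static class ValueLinq
{
    // Generic over the concrete struct, so the interface calls do not box it,
    // and 'ref' means the caller's iterator is advanced rather than a copy.
    public static int First<TIter>(this ref TIter iterator)
        where TIter : struct, IIntIterator
    {
        if (!iterator.MoveNext())
            throw new InvalidOperationException("Sequence contains no elements");
        return iterator.Current;
    }
}
```

Usage would be `var it = new CountTo(3); int first = it.First();`, after which `it.Current` is `1` because the receiver was passed by reference. The element type is fixed to `int` here to keep type inference simple; a generic element type would need more machinery.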
@ljw1004, is the implementation of the interfaces required or can it be pattern based?
@paulomorgado I think it should be entirely pattern based
Just a little _bump_ to see where this is and how we can get some progress?
I've been working on other things and am starting vacation tomorrow. I'm planning to get back to it in September.
@aelij wrote a sampler for AsyncEnumerable using arbitrary async returns https://arbel.net/2016/08/10/n-async-the-next-generation/
Okay, it's September now, so as promised I'm getting back to async streams. My current plan of attack is:
Things to make sure we answer with a definite yes or no (and if yes, then with working code):
So far for [1] I have EF and RX and BCL streams/files (e.g. @NickCraver above) on my list. If anyone has other ideas of APIs that you think would benefit from async streams, please let me know.
Yep, my work with Google Cloud APIs would benefit. We're currently using System.Interactive.Async and IAsyncEnumerable, if that's relevant. Feel free to include me in any emails, and I'd be happy to join a Hangout or Skype call.
@ljw1004 Service Fabric reliable collections expose yet another async enumerable.
@ljw1004 Another area which might benefit from async enumerables is filesystem access APIs. Right now `Directory.EnumerateFiles` returns just a synchronous `IEnumerable<string>`, while `StorageFolder.GetFilesAsync` either returns the whole list asynchronously or allows fetching files asynchronously in batches.
I think async enumeration fits perfectly here.
An important use case would be async LINQ and async PLINQ. Even if just `Where`, `Select` and `ToList/Array` are supported that's a big win. It's rare to order, group or join by a key that's obtained asynchronously.
@ljw1004 I've long used Ix-Async (now System.Interactive.Async) in my CSV/etc. parsing library.
I've also experimented making my own "performant" Ix-Async (it is quite chatty) by having a chunked extension to IAsyncEnumerable to pull down multiple items at once, as is the case with most I/O. I unfortunately don't have this code anymore, but with ValueTask that complexity may no longer be prudent.
@vladd unfortunately the underlying win32 filesystem apis are not async.
there may also be some crossover here with the "channels" stuff that David Fowler is looking at; this is _primarily_ IO focused - essentially it reverses the direction of `Stream` from pull to push (with async goodness). Perhaps viable as a potential "source" for an async iterator
@mgravell not to be dense here, but isn't Rx already the opposite of Ix? That was the mathematical design. The channels could probably produce IObservable sequences, right?
We propose to do a language feature and associated library support for C#8 to support async streams. An async stream is a sequence of values where you might need to await for the next value.
Here's a strawman:
IAsyncEnumerable<int> xs = GetStream();
foreach (await var x in xs) { ... }
// Expansion:
{
try {
while (await e.MoveNextAsync()) {
T x = e.Current;
...
}
}
finally {
(e as IDisposable).Dispose(); // plus a few idiosyncrasies here
}
}
- I've suggested the syntax `foreach (await var x in xs)` but this is up for discussion
- The strawman is written against `IAsyncEnumerable`, but it's up for discussion whether it should operate upon an `IAsyncEnumerator`, or upon any `xs` that satisfies a pattern.
- Up for discussion how `.ConfigureAwait(false)` will fit in. I assume `xs.ConfigureAwait(false)`, but this means `ConfigureAwait` can't be an extension method on `IAsyncEnumerable`.
- Up for discussion how `CancellationToken` will work: as an argument to `GetStream()`, or `GetEnumerator()`, or `MoveNext()`.
Here's a strawman:
async IAsyncEnumerable<int> GetStream()
{
yield 1;
await Task.Delay(1, async.CancellationToken);
yield 2;
}
- `await` inside a `finally` block for an async stream? -- this would require us to solve `IAsyncDisposable` and I don't like it...
- Up for discussion whether to require the `async` modifier
- Can you have both `yield` and `return` inside, e.g. for `IAsyncOperationWithProgress<T,U>`?
- Where does cancellation get in? I propose a contextual keyword `async` inside the method, which binds to a pattern-based part of the builder, and the builder can choose which methods/properties to expose via this keyword.
- Should it warn if you lack `await` operators in the method? -- hopefully not!
Here's my strawman of an example type that satisfies the async stream pattern for consumption. I've written it suggestively as an interface, but if it's all pattern-based then you can completely bypass this interface while still fulfilling the pattern.
interface IAsyncEnumerable<T>
{
IAsyncEnumerator<T> GetEnumerator(CancellationToken cancel = default(CancellationToken));
IAsyncEnumerable<T> ConfigureAwait(bool b);
}
interface IAsyncEnumerator<T>
{
IAsyncEnumerator<T> ConfigureAwait(bool b);
T Current {get;}
ConfiguredTaskAwaitable<bool> MoveNextAsync();
}
IAsyncEnumerable
Where will the .NET implementation of `IAsyncEnumerable` live? and its builder?
If it's a core part of .NET, will it have to move into `System`? (Which team will own it?)
IAsyncEnumerable
We'll presumably want LINQ operators. I hope they will be fast enough. I don't know which ones.
IAsyncEnumerable<int> xs;
IEnumerable<int> ys;
xs = xs.Select(x => x + await t);
xs = ys.Select(x => x + await t); // ???
IObservable
I'd expect this feature to be a handy way to produce `IObservable`s. Is there anything here for consuming them as well?
Please consider a ValueTask type for the return of MoveNext()
That's not necessary. If MoveNext completes synchronously, it can return a cached `Task<bool>` (which is what async methods already do if they return bool and complete synchronously). And if it completes asynchronously, a task would still need to be allocated, even with ValueTask. That actually then ends up making ValueTask a more expensive choice, as it'll always be handing back both a T and a `Task<T>` (so more data being moved), plus it'll likely end up slightly bloating whatever builder it's stored in.
> Up for discussion how .ConfigureAwait(false) will fit in. I assume xs.ConfigureAwait(false), but this means ConfigureAwait can't be an extension method on IAsyncEnumerable.
@ljw1004, I didn't understand this part. I do not think any of the ConfigureAwait stuff should be part of the interface. If foreach is pattern based, then ConfigureAwait can be an extension method on IAsyncEnumerable, providing its own MoveNextAsync/Current members, with MoveNextAsync returning a ConfiguredTaskAwaitable, and the foreach will bind to that. You'd have something like:
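``` C#
// Generic arguments and member signatures reconstructed from the description above;
// the parameter name is an assumption.
public static ConfiguredAsyncEnumerable<T> ConfigureAwait<T>(this IAsyncEnumerable<T> source, bool continueOnCapturedContext) { ... }
...
public struct ConfiguredAsyncEnumerable<T>
{
    public ConfiguredAsyncEnumerator<T> GetEnumerator() { ... }
}
public struct ConfiguredAsyncEnumerator<T>
{
    public ConfiguredTaskAwaitable<bool> MoveNextAsync() { ... }
    public T Current { get { ... } }
}
```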
> Up for discussion whether to require the async modifier
I'm likely in the minority, but I'd actually like to see us:
- Add an `iterator` keyword
- Allow but not require `iterator` on iterators (not require for reasons of compat)
- Require "async iterator" on async iterators
Then it's clear just from looking at the signature what kind of transformation is going to be done.
- `iterator` => sync iterator
- `async` => async method
- `async iterator` => async iterator
I personally get annoyed with iterators in C# today that I need to scour the method's implementation looking for yields to know whether this is a method the compiler is going to transform or not.
> Where does cancellation get in? I propose a contextual keyword async inside the method, which binds to a pattern-based part of the builder, and the builder can choose which methods/properties to expose via this keyword
This is useful, but it's also orthogonal to async streams and I believe should be discussed separately. It provides similar value for arbitrary async return types of regular async methods, not just iterators. It's also not specific to cancellation. For example, the canonical example we've used is returning an `IAsyncOperationWithProgress`, in which case you'd like this `async`-exposed object to provide both a CancellationToken for cancellation (triggered by a consumer calling `IAsyncOperationWithProgress.Cancel()`) as well as an `IProgress<TProgress>` for outputting progress reports.
> Can you have both yield and return inside, e.g. for IAsyncOperationWithProgress?
I would prefer to see this example handled via the previously discussed `async` keyword allowed in the body. I would prefer not to see both of these allowed in the body; seems too subtle and confusing, plus extra complication for the builder to be able to handle both.
> Should it warn if you lack await operators in the method? -- hopefully not!
How does this differ from regular async methods? Are you proposing removing the warning from there, too?
> I've suggested the syntax foreach (await var x in xs)
+1
> Up for discussion how CancellationToken will work: as an argument to GetStream(), or GetEnumerator(), or MoveNext()
My preference would be for it to be embedded into the enumerator. While there may be a few niche/corner-case scenarios where passing a token to MoveNext such that it can be changed on each call would be desirable, it'd potentially be a huge performance hit to support that (e.g. registration for cancellation per MoveNext rather than once for the whole iteration), and it does not seem like a good tradeoff to me. By building the token into the enumerator, you allow for any operation on the enumerator to be canceled while being able to amortize any associated costs for cancellation across the whole iteration.
> If it's a core part of .NET, will it have to move into System ? (Which team will own it?)
It could be added to System.Threading.Tasks.Extensions, which is already a standalone NuGet package that works across various versions. Or we could introduce a separate package for it. But I'm not seeing why it would need to be pushed into a core library like mscorlib.
> We'll presumably want LINQ operators. I hope they will be fast enough. I don't know which ones.
"which ones" has a few parts to it... it's not just which operators, but also which combination of sync/async with each operator, e.g. for the single Select method we have today of the form:
``` C#
public static IEnumerable<U> Select<T,U>(this IEnumerable<T> source, Func<T, U> selector);
```
introducing async enumerables means potentially adding multiple forms for this one overload:
``` C#
public static IAsyncEnumerable<U> SelectAsync<T,U>(this IEnumerable<T> source, Func<T, ValueTask<U>> selector);
public static IAsyncEnumerable<U> SelectAsync<T,U>(this IAsyncEnumerable<T> source, Func<T, U> selector);
public static IAsyncEnumerable<U> SelectAsync<T,U>(this IAsyncEnumerable<T> source, Func<T, ValueTask<U>> selector);
```
and that's just for an overload that takes a single source and a single delegate. The potential combinations explode if you start considering allowing some delegates in an overload to be async or not, to allow methods that operate on multiple enumerables to mix and match whether those are sync or async, etc.
Whatever is done, I suggest it be its own standalone library in corefx, e.g. System.Linq.Async, separate from the definitions of the interfaces, and a package re-usable across all relevant platforms.
> It could be added to System.Threading.Tasks.Extensions, which is already a standalone NuGet package that works across various versions. Or we could introduce a separate package for it. But I'm not seeing why it would need to be pushed into a core library like mscorlib.
Wouldn't it need to go into `mscorlib` to enable adding API's that expose async iterators to types defined there? Otherwise, it seems to already have a home in `System.Interactive.Async` as the de-facto location and implementation of the operators? The assembly name and namespaces could be updated if it's better to use something else.
> Wouldn't it need to go into mscorlib to enable adding API's that expose async iterators to types defined there?
Only if there were existing mscorlib types to which we wanted to add such APIs. And if/when that happens, then the types can be type-forwarded down. I do not believe we should do that from the get-go. We already have many "mscorlibs", for desktop, for coreclr, for .NET Native, for Xamarin, for Unity, ... adding these as a separate package basically makes it "just work" for all of them, and for previous versions already shipped. Then later if/when we want an API added, we can factor in the work necessary.
@ljw1004 What happens if I await an async sequence that never returns?
What benefits would language support bring rather than libraries?
> That's not necessary. If MoveNext completes synchronously, it can return a cached `Task<bool>` (which is what async methods already do if they return bool and complete synchronously). And if it completes asynchronously, a task would still need to be allocated, even with ValueTask.
Note for interested readers: the builder is where this code would be. An async method a user might write would be something like this:
async IAsyncEnumerable<int> Foo() {
yield 1;
yield 2;
}
the builder for this would create a state machine:
public Task<bool> MoveNextAsync() {
switch (currentstate) {
case 0: //starting
value = 1;
currentstate = 1;
return Task.FromResult(true); //implementations of the builder need to do the right thing here
break;
...
This (combined with the fact that you couldn't have a zero allocations implementation overall without considerable language support) is why `ValueTask<T>` isn't really helpful for `IAsyncEnumerable<T>`.
edit: I did write an implementation above using ValueTask and it is rather pointless (as it must allocate the enumerator and can use cached tasks for the MoveNext results).
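Filling in the elided cases, a complete hand-written version of that state machine might look like the following. This is only a sketch: `FooEnumerator` is a hypothetical name, and a real builder would also have to handle the asynchronous path, which is omitted here.

```csharp
using System.Threading.Tasks;

// Hypothetical hand-written equivalent of the state machine for: yield 1; yield 2;
public sealed class FooEnumerator
{
    // Allocated once and shared by every synchronous completion.
    private static readonly Task<bool> TrueTask = Task.FromResult(true);
    private static readonly Task<bool> FalseTask = Task.FromResult(false);

    private int _state;
    public int Current { get; private set; }

    public Task<bool> MoveNextAsync()
    {
        switch (_state)
        {
            case 0: Current = 1; _state = 1; return TrueTask;   // yield 1;
            case 1: Current = 2; _state = 2; return TrueTask;   // yield 2;
            default: return FalseTask;                          // end of stream
        }
    }
}
```

Every call that completes synchronously hands back one of two shared task instances, so the per-item cost is the enumerator object itself rather than a task per `MoveNextAsync`.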
@stephentoub,
> I'm likely in the minority, but I'd actually like to see us:
> - Add an `iterator` keyword
> - Allow but not require `iterator` on iterators (not require for reasons of compat)
> - Require "async iterator" on async iterators
> Then it's clear just from looking at the signature what kind of transformation is going to be done.
> - `iterator` => sync iterator
> - `async` => async method
> - `async iterator` => async iterator
> I personally get annoyed with iterators in C# today that I need to scour the method's implementation looking for yields to know whether this is a method the compiler is going to transform or not.
👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍
I pushed for that for C#!
yeah, i'm definitely not a fan of that. I prefer to not force ceremony when coding. I would have preferred to not have 'async' either, but the desire to have things like "await (expr)" unambiguously parse overrode that hope.
Ceremony is ok if it means that the general code inside will be simpler and cleaner. If, however, there are no ambiguity problems, then i would not want any ceremony.
Note, IMO, that syntactic ship has likely sailed. C# has definitely (for many releases now) erred on the side of brevity. VB has been the language to use if you prefer a more verbose approach to the structure of your code. I do not envision any reason why the next version of C# or VB would change this. It's really one of the few reasons you would pick one of these languages over the other.
Since you brought up VB, @CyrusNajmabadi, the point I'm making (and, I guess, @stephentoub) is that C# is all about "there is no magic". `async` tells you that "This is not really a method! The compiler will just let you write this as if it were and will do all the tedious work for you.". The same for iterators, except that you need to read the code to find out that it's an iterator.
> The same for iterators, except that you need to read the code to find out that it's an iterator.
Yes. For example, take this code:
``` C#
public IEnumerable<Result> GetResults(string key)
{
    if (key == null)
    {
        throw new ArgumentNullException(nameof(key));
    }
    ...
```
Just from reading the beginning of the method, I can't tell you whether passing in null will end up throwing an exception or not, which is very counterintuitive given that in any other unannotated method, I could definitively tell you that. If it were instead:
``` C#
public iterator IEnumerable<Result> GetResults(string key)
{
    if (key == null)
    {
        throw new ArgumentNullException(nameof(key));
    }
    ...
```
I can see just from the signature that there's something special here and that the normal rules don't apply.
I realize the language is never going to enforce adding such a keyword to iterators; that would be a gigantic breaking change. But with it available as an option, we can have our own style enforcement to ensure that it's used. And as we're introducing new kinds of methods (e.g. async iterators), we can make it available (and, ideally from my perspective, required).
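The behaviour being described can be checked with an ordinary iterator in current C#. In this sketch `DeferredValidationDemo` and `WhereDoesItThrow` are illustrative names, and `string` stands in for the `Result` type:

```csharp
using System;
using System.Collections.Generic;

public static class DeferredValidationDemo
{
    // An ordinary C# iterator: the whole body, including the null check, is deferred.
    public static IEnumerable<string> GetResults(string key)
    {
        if (key == null)
        {
            throw new ArgumentNullException(nameof(key));
        }
        yield return key + "-result";
    }

    // Reports at which point the exception surfaces.
    public static string WhereDoesItThrow()
    {
        IEnumerable<string> results;
        try
        {
            results = GetResults(null); // only creates the state machine
        }
        catch (ArgumentNullException)
        {
            return "at call";
        }

        try
        {
            foreach (var r in results) { } // the first MoveNext runs the null check
        }
        catch (ArgumentNullException)
        {
            return "at enumeration";
        }
        return "never";
    }
}
```

`WhereDoesItThrow()` returns `"at enumeration"`: the call with `null` succeeds, and the exception only appears once something iterates the result.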
> Then it's clear just from looking at the signature what kind of transformation is going to be done.
I'm curious, why is it important? However, I think for C# an analyzer could enforce annotating such methods with an attribute, because as @CyrusNajmabadi said, this is one of the things that distinguishes C# from VB in terms of conciseness. Not everyone would like that, considering that it wouldn't be ambiguous otherwise. I assume that's why `yield return` is chosen instead of `yield` in the first place.
> I'm curious, why is it important?
Because, at least in the code bases I work in most of the time, one is a bug, the other isn't. There's also a performance impact to iterators. As someone who reviews a ton of code, the more obvious such cases, the more likely it is that I'll notice bugs (behavioral or performance) before they ever get checked in. And as someone writing code, having to explicitly mark the method in order to enable the compiler's significant transformation makes the transformation very explicitly opt-in rather than it happening almost as an accident of how the method got implemented.
I kind of agree. It's a bit too late to fix it with iterators (although maybe a last-minute change to local function iterators?) However I'd probably be satisfied if VS were to show iterators differently, and if an `iterator` keyword were to be added to the language to support a code style that would error if it was omitted.
> the more obvious such cases, the more likely it is that I'll notice bugs
As I said in my edit, I think an analyzer suffices (you could even forbid it altogether!). Unless there is an actual ambiguity that the keyword could resolve, I don't really think that it would be worth introducing a keyword that was not required up until now.
> I don't really think that it would be worth introducing a keyword that was not required up until now
We will need to agree to disagree.
While I don't like verbosity, I have to admit that the iterator keyword would be a responsible addition which would make code a bit safer, like in strongly typed FP langs. Speaking of strongly typed FP langs, another stream library that might make a good reference is https://github.com/nessos/Streams.
Should the possibility of reserving `async` in arbitrary tasklike methods be considered as a separate issue item for potential inclusion in the same release as #10902 in the event that this slips?
@bbarry I'm no longer concerned about reserving `async`. Mads suggested that it can follow exactly the same kind of rules as `nameof` -- i.e. look it up, and bind to a symbol if there's one defined with that name, otherwise treat it as the contextual keyword. ("It's a keyword only in async method/lambda places where it's not already an identifier").
Now that we have analyzers, i definitely lean toward the idea that the language should decide on a good, consistent, default. People with specialized needs can write analyzers to help call out the issue.
For example, for @stephentoub's case, i would write an analyzer that would warn me if i had argument checking at the start of a method that then contained yields within it. I might even go further and have a code fix provider that then refactored the method to call a sibling helper (or maybe even a nested function :)). That would then be a cool analyzer/fix combo to make available as an extension for those who want it.
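A sketch of the refactoring such a code fix might produce, using a local function (C# 7) as the nested helper. `EagerValidationDemo` and `Iterate` are made-up names, and `string` stands in for whatever element type the real method yields:

```csharp
using System;
using System.Collections.Generic;

public static class EagerValidationDemo
{
    // Non-iterator wrapper: the argument check runs at the call site.
    public static IEnumerable<string> GetResults(string key)
    {
        if (key == null)
        {
            throw new ArgumentNullException(nameof(key));
        }
        return Iterate();

        // The local function holds the yields, so only this part is deferred.
        IEnumerable<string> Iterate()
        {
            yield return key + "-result";
        }
    }

    // True when passing null fails immediately rather than on enumeration.
    public static bool ThrowsAtCall()
    {
        try
        {
            GetResults(null);
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }
}
```

Because the outer method contains no `yield`, it is compiled as a normal method and the null check is no longer part of the generated state machine.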
interface IAsyncEnumerator
Changing topics to another part of the proposal...
The main approach we've all been assuming for the interface is:
``` C#
public interface IAsyncEnumerator<T>
{
    Task<bool> MoveNextAsync();
    T Current { get; }
}
```
With that, a loop like:
``` C#
foreach (await T item in enumerable)
{
    …
}
```
would compile down to something like:
``` C#
var e = enumerable.GetEnumerator();
while (await e.MoveNextAsync())
{
    T item = e.Current;
    …
}
```
I think we should consider (at least discuss) an alternative.
The existing `IEnumerator<T>` has a well-known design concern: it requires two interface calls per element, one for MoveNext and one for Current. There are various ways to address that for an `IAsyncEnumerator<T>`, one of which would be along the lines of this alternative:
``` C#
public interface IAsyncEnumerator<T>
{
    Task<bool> WaitForNextAsync();
    bool TryGetNext(out T item);
}
```
which with the aforementioned foreach loop would compile down to something like:
``` C#
var e = enumerable.GetEnumerator();
while (await e.WaitForNextAsync())
{
    while (e.TryGetNext(out T item))
    {
        …
    }
}
```
The idea is that WaitForNextAsync would return a synchronously completed task if data was already available (or if it knew there would never be more data) and otherwise would perform whatever operation was necessary to bring down the next piece of data, but it wouldn’t actually take that data from the enumerator; that would be done by TryGetNext, which would return data available if there is any, otherwise returning false.
One obvious advantage of this is that it addresses the two-interface-calls-per-iteration issue. Worst case, there are two interface calls per iteration, if each call to WaitForNextAsync only makes a single item available for TryGetNext, but best case, there’s only one interface call for each element.
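A toy sketch of that consumer loop against a source whose data arrives in pages. The interface is renamed `IBatchedAsyncEnumerator<T>` here to make clear it is hypothetical, `PagedSource` and `BatchedConsumer` are invented for the example, and `Task.Yield()` merely stands in for real I/O:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical interface mirroring the alternative shape discussed above.
public interface IBatchedAsyncEnumerator<T>
{
    Task<bool> WaitForNextAsync();
    bool TryGetNext(out T item);
}

// Toy source that "receives" its data one page at a time.
public sealed class PagedSource : IBatchedAsyncEnumerator<int>
{
    private readonly Queue<int[]> _pages;
    private readonly Queue<int> _ready = new Queue<int>();

    public int WaitCalls { get; private set; }

    public PagedSource(IEnumerable<int[]> pages)
    {
        _pages = new Queue<int[]>(pages);
    }

    public async Task<bool> WaitForNextAsync()
    {
        WaitCalls++;
        if (_ready.Count > 0) return true;    // data already buffered
        if (_pages.Count == 0) return false;  // no more data will ever arrive
        await Task.Yield();                   // stands in for real I/O
        foreach (var value in _pages.Dequeue()) _ready.Enqueue(value);
        return true;
    }

    public bool TryGetNext(out int item)
    {
        if (_ready.Count > 0)
        {
            item = _ready.Dequeue();
            return true;
        }
        item = default(int);
        return false;
    }
}

public static class BatchedConsumer
{
    // The nested-loop expansion: await once per batch, then drain synchronously.
    public static async Task<List<int>> DrainAsync(IBatchedAsyncEnumerator<int> e)
    {
        var items = new List<int>();
        while (await e.WaitForNextAsync())
        {
            while (e.TryGetNext(out int item)) items.Add(item);
        }
        return items;
    }
}
```

With pages `{1, 2, 3}` and `{4, 5}`, draining takes three `WaitForNextAsync` calls (two that load a page, one that reports the end) for five items; a `MoveNextAsync`-per-item design would await six times.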
However, the `IEnumerator<T>` two-call design has another downside: it’s not possible to create a thread-safe implementation. Without locking external to the interface implementation, multiple consumers can’t consume the same enumerator, as the call to MoveNextAsync/Current can’t be made atomic. That’s not the case with the alternate model. You can implement a thread-safe WaitForNextAsync/TryGetNext pair, as the TryGetNext itself can use whatever synchronization it needs internally to both get the next item and tell you whether it was successful. Worst case, if the caller loses a race condition with another consumer, TryGetNext can return false and you can loop back around to try again.
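The atomicity argument can be illustrated with an existing type that has the same shape: `ConcurrentQueue<T>.TryDequeue` fetches an item and reports success in a single call. This is only an analogy, not an async enumerator, and `AtomicTryGetDemo` is a made-up name:

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AtomicTryGetDemo
{
    // Several consumers pull from one shared source. Returns how many items were
    // taken in total and their sum, so duplicates or losses would show up.
    public static (int count, long sum) ConsumeConcurrently(int itemCount, int consumers)
    {
        var source = new ConcurrentQueue<int>(Enumerable.Range(1, itemCount));
        int count = 0;
        long sum = 0;

        var workers = Enumerable.Range(0, consumers).Select(_ => Task.Run(() =>
        {
            // One call both takes the item and says whether there was one,
            // which a separate MoveNext()/Current pair cannot do atomically.
            while (source.TryDequeue(out int item))
            {
                Interlocked.Increment(ref count);
                Interlocked.Add(ref sum, item);
            }
        })).ToArray();

        Task.WaitAll(workers);
        return (count, sum);
    }
}
```

However the consumers interleave, each item is observed exactly once, because losing a race simply shows up as `TryDequeue` returning `false`.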
Now, today an `IEnumerator<T>` isn’t thread-safe, and as such no one can use it as such, but I think there are more use cases for supporting thread-safety in the async world. These async enumerators are likely to be used in some cases for producer/consumer models, and once you get there, it’s not far before you want to allow multiple consumers off of the same data stream, parallelized processing of results, etc. It would be possible to enable such multi-consumer scenarios off of an enumerable, where the enumerable coordinates safety over a single underlying stream, handing out single-consumer enumerators that all coordinate with each other. But that violates the notion of how we’ve been talking about enumerables, as something where another call to GetEnumerator essentially restarts the operation.
There's also another use case for this model. Imagine you wanted a construct like:
``` C#
IAsyncEnumerator<Student> students = …;
IAsyncEnumerator<Teacher> teachers = …;
select
{
    case students as Student s:
        HandleStudent(s);
        break;
    case teachers as Teacher t:
        HandleTeacher(t);
        break;
}
```
The idea here is that we have two async iterators, and we take one item from one of them and process it, whichever arrives first, kind of like a WhenAny. This can’t be done correctly on top of the MoveNextAsync/Current pattern: you would have to call MoveNextAsync on both iterators, at which point you’ve already moved the other one forward (and it may still be in flight when you exit the select). In contrast, because the act of saying “try to ensure more data is available synchronously” is separate from the act of saying “get me synchronously available data”, this can be done with the alternate model (though, full disclosure, it would likely require the WaitForNextAsync implementation to be thread-safe to be used in this manner, or at least tolerate another call to WaitForNextAsync while a previous one is still in flight).
(These reasons are part of why the channels prototype I did on corefxlab essentially has these two methods as the main part of the readable channel interface:
https://github.com/dotnet/corefxlab/blob/master/src/System.Threading.Tasks.Channels/System/Threading/Tasks/Channels/IChannel.cs#L32-L45)
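A rough approximation of that `select` in today's C#, using the shipped `System.Threading.Channels` types. I'm assuming their `WaitToReadAsync`/`TryRead` pair plays the role of `WaitForNextAsync`/`TryGetNext`; `SelectSketch` and `HandleAllAsync` are invented names, and plain strings stand in for `Student` and `Teacher`:

```csharp
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class SelectSketch
{
    // Handles one item at a time from whichever reader has data first,
    // until both readers are completed and drained.
    public static async Task<List<string>> HandleAllAsync(
        ChannelReader<string> students, ChannelReader<string> teachers)
    {
        var handled = new List<string>();

        // One outstanding "is data available?" wait per reader.
        // Waiting never takes an item, so the losing side is left untouched.
        Task<bool> studentsReady = students.WaitToReadAsync().AsTask();
        Task<bool> teachersReady = teachers.WaitToReadAsync().AsTask();

        while (studentsReady != null || teachersReady != null)
        {
            Task<bool> winner;
            if (studentsReady == null) winner = teachersReady;
            else if (teachersReady == null) winner = studentsReady;
            else winner = await Task.WhenAny(studentsReady, teachersReady);

            bool hasData = await winner;
            if (winner == studentsReady)
            {
                if (hasData && students.TryRead(out string s)) handled.Add("student:" + s);
                studentsReady = hasData ? students.WaitToReadAsync().AsTask() : null;
            }
            else
            {
                if (hasData && teachers.TryRead(out string t)) handled.Add("teacher:" + t);
                teachersReady = hasData ? teachers.WaitToReadAsync().AsTask() : null;
            }
        }
        return handled;
    }
}
```

The relative order of students and teachers depends on which wait completes first, but no item is lost or taken early, since only `TryRead` removes data from a reader.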
Anyway, food for thought.
how about:
await foreach (var thing in Method()) { //I still prefer this syntax
...
}
//becomes
var eble = Method();
var etor = eble.GetEnumerator();
while (await eble.WaitForNextAsync(ref etor))
{
while (eble.TryGetNext(ref etor, out T item))
{
...
}
}
method is async IAsyncEnumerable<Foo> Method() { ... }
...
then via generators functionality it would be possible to do something like:
public async ValueAsyncEnumerable<Foo, GenerateStateMachine> Method()
{
if(NotHotPath) { await Something(); }
yield new Foo();
}
public partial struct GenerateStateMachine {}
where `ValueAsyncEnumerable<T, Ttor>` implements the pattern `IAsyncEnumerable<T>` and the generator can provide the implementation and voila: zero allocation `IAsyncEnumerable`:
public replace ValueAsyncEnumerable<Foo, GenerateStateMachine> Method()
{
var eble = default(ValueAsyncEnumerable<Foo, GenerateStateMachine>);
eble.ThisArg = this;
return eble;
}
public partial struct GenerateStateMachine {
...
}
@bbarry
> while (await eble.WaitForNextAsync(ref etor))
Except that the implementation can not be `async`,
CS1988: Async methods cannot have ref or out parameters
I'm skewed towards `Task<(bool, T)> TryGetNext()`. It's an `await` but a single call per element in all cases vs. two interface calls per element in the worst case and no `await` in other cases. Absence of `WaitForNextAsync` would also simplify the implementation but I'm not sure if that is a concern.
> I'm skewed towards `Task<(bool, T)> TryGetNext()`
That's very expensive: we can cache a Task
@stephentoub Right, that is definitely a concern. However, it can be declared covariant over `T` on master unless I'm missing something.
public interface IAsyncEnumerator<out T> {
Task<(bool, T)> TryGetNext();
}
While `out` parameter types are not allowed to be covariant because they are represented as `ref` in CLR.
F# uses `option` for this (source), I'm assuming that interoperability between the two is not an issue for now,
type IAsyncEnumerator<'T> =
abstract MoveNext : unit -> Async<'T option>
inherit IDisposable
In C# however, there is no such type. If we don't need to be able to yield nulls from an async sequence, we can take advantage of #9932,
Task<T?> TryGetNext();
I think `Option<T>` is still a good investment though.
What about `ValueTask<(bool, T)> TryGetNext()`? Should benchmark this against the version with two virtual calls.
I think using two while loops complicates things a bit, both for the iterator implementation and the consumer.
Is it even clear that per-item overhead matters so much for the async case? Async usually is being used for IO heavy code. It tends to be not particularly chatty.
So if I run an async stream over 100 URLs to download the per-item overhead is unmeasurable for all of the solutions that were discussed here.
The only chatty use case I can think of is async ADO.NET using data readers where each field is accessed using an async API. But that's almost always a bad idea in a web app because no throughput can be gained doing that. If a lot of threads are waiting on the database then the database is overloaded anyway. So we must assume few threads which makes async memory savings moot.
Maybe the solution here is a less chatty data reader API. For example, there could be a `DataReader.FillObject<T>(T obj)` that fills in an entire object (or `object[]` with all the column values). Could use runtime code gen to optimally deserialize the TDS stream directly into the object.
Are there other chatty use cases? We'd need to talk about 100,000 operations per second per core before async overhead becomes meaningful.
@GSPP long stretches of code running without any allocations in the heap means you could do things like run in environments with an altered GC designed under that constraint (or potentially no GC at all, allocate a block at the start of a run where all "allocations" inside that call stack are made).
It isn't necessarily about the performance of some specific bit of code, but about what environments the code might be able to run in.
@alrz why would the implementation of `Task<bool> WaitForNextAsync<T>(ref T enumerator)` need to be async?
The general population of users writing `async IAsyncEnumerable<T>` methods would not need to care and the rare people implementing the pattern would likely simply do something with the value and then perhaps call an internal async method. For example a BufferedNetworkStream might be implemented:
public class BufferedNetworkStream : IAsyncEnumerable<byte, int>
{
List<byte> _buffer = new List<byte>();
...
private async Task<bool> ReadFromUnderlyingAsync()
{
if (_remote.Finished) return false;
if (_remote.HasAvailable) { _buffer.AddRange(_remote.ReadAvailable()); }
else { _buffer.AddRange(await _remote.ReadAvailableAsync()); }
return true;
}
public Task<bool> WaitForNextAsync(ref int index)
{
if (index < _buffer.Count) { return Task.FromResult(true); }
if (_remote.Finished) { return Task.FromResult(false); }
return ReadFromUnderlyingAsync();
}
public bool TryGetNext(ref int index, out byte value)
{
if (index >= _buffer.Count)
{
value = default(byte);
return false;
}
value = _buffer[index++];
return true;
}
public int GetEnumerator() => 0;
}
> Is it even clear that per-item overhead matters so much for the async case? Async usually is being used for IO heavy code. It tends to be not particularly chatty.
IO can be incredibly chatty; it might not seem so at aggregate "GetURLAsync" level but it is at the socket level.
@GSPP
Is it even clear that per-item overhead matters so much for the async case? Async usually is being used for IO heavy code. It tends to be not particularly chatty.
I believe a very common pattern in website APIs is paging (for example Wikipedia API and Stack Overflow API use it, to name the ones I've used). The way it works is that you make a request for the first _n_ results, then another request for the next _n_ results, etc.
And in these cases, some form of async enumerable would be very useful. But (unless you want to have an ugly interface using something like IAsyncEnumerable<IEnumerable<Item>>
), they're also fairly chatty.
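A minimal sketch of the paging case, written with the async iterator syntax that later shipped in C# 8 rather than any of the shapes proposed at this point in the thread. `FetchPageAsync` is a hypothetical stand-in for the HTTP call and serves 25 fake items; the point is only that the consumer sees a flat `IAsyncEnumerable<int>` instead of a sequence of pages.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

const int TotalItems = 25; // size of the fake remote collection (assumption)

// Hypothetical paged API: returns up to pageSize items starting at offset.
async Task<List<int>> FetchPageAsync(int offset, int pageSize)
{
    await Task.Yield(); // stands in for the HTTP round trip
    var page = new List<int>();
    for (int i = offset; i < Math.Min(offset + pageSize, TotalItems); i++) page.Add(i);
    return page;
}

// Flattens the pages: one await per page, synchronous yields within a page.
async IAsyncEnumerable<int> GetAllItemsAsync(int pageSize)
{
    for (int offset = 0; ; offset += pageSize)
    {
        List<int> page = await FetchPageAsync(offset, pageSize);
        if (page.Count == 0) yield break;
        foreach (int item in page) yield return item;
    }
}

int count = 0;
await foreach (int item in GetAllItemsAsync(10)) count++;
Console.WriteLine(count); // 25
```

Only one `MoveNextAsync` call in ten actually suspends here; the others complete synchronously, which is the per-item cost the discussion above is about.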
@alrz I'm pretty sure variance shouldn't work with tuples, so I have reported it: https://github.com/dotnet/roslyn/issues/13705.
@svick I was suspicious myself hence the "unless I'm missing something" part. 😄
Same would apply to reference-type-capable Nullable out
params. So we probably wouldn't get covariant IAsyncEnumerable
Please don't make IAsyncEnumerable
invariant if there is any way around it. IEnumerable
variance has been a tremendous boon for me.
Maybe the solution here is a less chatty data reader API. For example, there could be a DataReader.FillObject<T>(T obj)
that fills in an entire object (or object[] with all the column values). Could use runtime code gen to optimally deserialize the TDS stream directly into the object.
@GSPP Not sure how well this is documented but ReadAsync()
is already supposed to work like that unless you specify CommandBehavior.SequentialAccess
on ExecuteReaderAsync(CommandBehavior)
. I.e. ReadAsync()
will only complete when enough has been read from the network that none of the calls to the sync GetXyz(int ordinal)
methods would block for that row.
@svick API paging is very non-chatty with respect to async overhead because the HTTP call is about 1e6x more expensive.
@benaadams that is a good point, but should apply less to sequences than it applies to "scalar" tasks. Or does it? Maybe new patterns will emerge. But I have written quite a few IO layers of various kinds and for some reason I don't recall any instance where I had a chatty sequence based on IO. For some reason it's either scalars or low-volume sequences.
My favorite trick to make chatty sequences non-chatty is chunking. I sometimes do: items.AsChunked(1000).AsParallel().Select(chunk => chunk.ConvertAll(x=> ...)).AsEnumerable().Flatten()
. So that's a workaround that will apply to async sequences as well. (But so far that trick was only needed for CPU-bound work since IO is too bulky to need it.)
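A rough sketch of that chunking trick using only BCL methods. `AsChunked` and `Flatten` above are the poster's own helpers; here `Enumerable.Chunk` (.NET 6 and later) and `SelectMany` stand in for them, and `SquareInChunks` is a made-up name. `AsOrdered()` is added only so the output order is deterministic.

```csharp
using System;
using System.Linq;

// Processes items chunk by chunk so the per-item PLINQ overhead is paid once per chunk.
int[] SquareInChunks(int[] items, int chunkSize) =>
    items.Chunk(chunkSize)                 // stand-in for AsChunked(n)
         .AsParallel().AsOrdered()
         .Select(chunk => Array.ConvertAll(chunk, x => x * x))
         .AsEnumerable()
         .SelectMany(chunk => chunk)       // stand-in for Flatten()
         .ToArray();

int[] squares = SquareInChunks(Enumerable.Range(1, 10).ToArray(), 4);
Console.WriteLine(string.Join(",", squares)); // 1,4,9,16,25,36,49,64,81,100
```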
@divega I was thinking about that as well.
Currently, neither Task<(bool, T)> TryGetNext();
nor ValueTask<(bool, T)> TryGetNext();
will work where T
is Span<T>
as it is a stack only type and can't be awaited for.
So you'd need the two steps where one was async but didn't return T
and the other was sync and returned T
to be able to work with Span<T>
.
Following on from @stephentoub https://github.com/dotnet/roslyn/issues/261#issuecomment-245616932, @bbarry https://github.com/dotnet/roslyn/issues/261#issuecomment-245644988 / https://github.com/dotnet/roslyn/issues/261#issuecomment-245676226 (and @jaredpar's article)
Take it all the way back to sync also?
public interface IEnumerable<T, TEnumerator>
{
TEnumerator StartEnumeration { get; }
bool TryGetNext(ref TEnumerator enumerator, out T item);
}
public interface IAsyncEnumerable<T, TEnumerator>
{
TEnumerator StartAsyncEnumeration { get; }
Task<bool> WhenNextAsync(ref TEnumerator enumerator);
bool TryGetNext(ref TEnumerator enumerator, out T item);
}
I have a similar fear about the second type parameter that I had about variance in the first place. Is that going to hamper abstraction? For example, if I need to consume an IAsyncEnumerable
with item types deriving from MyBaseType
, how would I do it with IAsyncEnumerable`2? Must all code that consumes an enumerable become generic? If I take three enumerables, am I forced to add three otherwise useless generic parameters along the call chain?
Same as before
For async
foreach (await MyBaseType item in Collectionish)
{
// …
}
For non-async
foreach (MyBaseType item in Collectionish)
{
// …
}
@benaadams I'm thinking specifically of method signatures. For example, I'll commonly consume a (possibly variant) IEnumerable
as a constructor param. If IEnumerable
has two type parameters, what do I do? I'm not adding a generic parameter TCtorParam1Enumerator
to the whole class just to be able to receive an IEnumerable
in the constructor.
Or in a method that takes multiple enumerables, won't I be forced to expose one generic type parameter in the method signature for each enumerable?
Or, let's say I want to see if I can enumerate MyBaseTypes from some object. What does that look like? var enumerable = collection as IEnumerable<MyBaseType, ?>;
The real issue is that TEnumerator
is an implementation detail of the enumerable that no consumer will ever want to care about or have polluting the API.
@jnm2 This problem is solved in Rust/Swift with "associated types" i.e. generic types that we might not care about most of the time, something like this:
interface IEnumerable<out TItem> {
type TEnumerator;
...
}
However, this probably needs proper CLR support and extensive type inference, among other things.
@alrz That's where my mind went; I figured some language had a solution for this. But what if we want IAsyncEnumerable
before or without those changes to the CLR?
The implementation for async
, yield
ing methods need not use the enumerator generic:
a user might type:
async IAsyncEnumerable<int> Foo() {
yield return 1;
await SomethingAsync();
yield return 2;
}
The data type that the compiler generates could be:
public struct GeneratedMachine : IAsyncStateMachine, IEnumerator<int> {...}
And the method could become:
IAsyncEnumerable<int> Foo() {
var sm = default(GeneratedMachine);
sm.ThisArg = this;
sm.State = -1;
return AsyncEnumerableBuilder.Create<int, GeneratedMachine>(sm);
}
(or some similar syntax)
The built in implementation would still be allocating, but at some point in the future (when generators come) a non-allocating version could be generated which makes use of the named machine class so that it could be exposed on the method signature.
Today's C# language design meeting was about async streams. I'm putting these notes up mostly raw and unedited.
IAsyncEnumerable<int>.Configure(cancel, false) will produce a ConfiguredAsyncEnumerable
ConfiguredAsyncEnumerable<T> will not implement IAsyncEnumerable<T>, but will fulfill the foreachable pattern.
GetEnumerator(CancellationToken cancel = default(CancellationToken))
foreach on iterators also just invokes the syntax .GetEnumerator()
.GetAsyncEnumerator() not .GetEnumerator() for returning an async enumerator.
foreach (await var x in xs) { ... } <-- there will be special syntax to indicate that foreach is async different.
foreach (var x in xs) syntax
The interface design. Do the async ones inherit the synchronous ones? <-- no, don't like implicit syncification of async stuff.
What about IAsyncDisposable as well? <-- not sure. Think of RX CompositeDisposables. Or ASP.NET request scoping
which calls Dispose on the things that are done with the request [Oren]. That reflection-based stack would have
to be enlightened about preferring IAsyncDisposable.
Since lots of components do a runtime type check...
Not clear if "block" or "fire-and-forget" is the best approach. I guess block for client-side resources.
If there is any async work needed to obtain the enumerator, it's always possible to defer that to the first MoveNextAsync.
When you foreach over a normal IEnumerable
, it gets back an object of static type IEnumerator
. Then for every element in the sequence it calls IEnumerator.MoveNext()
and IEnumerator.Current
. That is two interface-dispatches. They're not the fastest. Can we reduce it to just one interface-dispatch? e.g.
Task<bool> TryGetNextAsync(out value) // <-- doesn't work because you can't use out parameters for async work
while (await enumerator.GetNextChunkAsync())
{
while (enumerator.TryMoveNext(out value)) <-- this would work since it's not async
{
}
}
This API pattern would reflect the chunkiness of typical buffered async streams. The hope is that the inner loop could be more efficient.
Indeed it avoids having both ".Current" and ".MoveNext" (two interface calls).
Concern: when you write an async iterator method, then each chunk would have only 1 element inside it, so you'd still have two interface dispatches.
Concern: in ADO.NET, each payload might have < 1
, == 1
or > 1
frames in it. I guess that GetNextChunkAsync() would keep fetching payloads until it got >= 1
frames in it.
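To make the two-level loop concrete, here is a small sketch of an enumerator shaped like the `GetNextChunkAsync` / `TryMoveNext` idea in these notes. `ChunkedEnumerator` and `ChunkDemo` are hypothetical names, the "payloads" are in-memory arrays, and `Task.Yield()` stands in for real I/O. An empty payload is skipped, matching the "keep fetching until >= 1 frames" guess above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var source = new ChunkedEnumerator(new[] { new[] { 1, 2 }, new int[0], new[] { 3 } });
int sum = await ChunkDemo.SumAsync(source);
Console.WriteLine(sum); // 6

// Hypothetical enumerator: one async call per chunk, one sync call per item.
sealed class ChunkedEnumerator
{
    private readonly Queue<int[]> _pending;
    private int[] _chunk = Array.Empty<int>();
    private int _index;

    public ChunkedEnumerator(IEnumerable<int[]> payloads) => _pending = new Queue<int[]>(payloads);

    // Keeps fetching payloads until one holds at least one item, or the source is exhausted.
    public async Task<bool> GetNextChunkAsync()
    {
        while (_pending.Count > 0)
        {
            await Task.Yield(); // stands in for real I/O
            _chunk = _pending.Dequeue();
            _index = 0;
            if (_chunk.Length > 0) return true;
        }
        return false;
    }

    // Synchronous, so an out parameter is allowed: drains the current chunk.
    public bool TryMoveNext(out int value)
    {
        if (_index < _chunk.Length) { value = _chunk[_index++]; return true; }
        value = default;
        return false;
    }
}

static class ChunkDemo
{
    public static async Task<int> SumAsync(ChunkedEnumerator e)
    {
        int sum = 0;
        while (await e.GetNextChunkAsync())          // outer loop: async, per chunk
            while (e.TryMoveNext(out int value))     // inner loop: sync, per item
                sum += value;
        return sum;
    }
}
```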
This goes back to an EntityFramework observation that they'd like to have types expose both a synchronous foreach (var x in e)
API, and also an asynchronous foreach (await var x in e)
API.
Presumably if a type implemented both, then its synchronous IEnumerable version would typically block.
It would always be possible to write a helper method IAsyncEnumerable<T>.AsEnumerable()
which takes an async stream and produces a blocking synchronous enumerable from it.
Today we have no (few?) cases of implicit async->sync conversion. Conclusion: no, IAsyncEnumerable
Should we do these things for non-async iterator methods too? -- (1) allow a custom builder and a custom return type, (2) maybe have a more performant pattern that avoids double-interface-dispatch and maybe even avoids heap allocation entirely.
If we do, then it'd be weird to require the modifier "async" on the iterator method.
Also if we do, then I can imagine that there'd be a lot more use of "custom iterator return types" than there is for "custom tasklike return types". The only custom tasklike you really need is ValueTask[AsyncMethodBuilder]
rather than a proper language keyword.
CONSUMPTION SIDE: how do we pass cancellation token into the thing? Here are possible ways we could pass in Cancellation and ConfigureAwait...
var xs = GetAsyncEnumerable();
foreach (await var x in xs.WithCancellation(cancel))
{
}
foreach (await var x in xs.ConfigureAwait(false))
{
}
foreach (await var x in xs.Config(cancel,false)) { ... } <-- machinery for passing it to the right place
foreach (await var x in xs) (cancel) { ... } <-- second parameter list passed to implicit .GetAsyncEnumerator() call
foreach (await var x in xs) .ConfigureAwait(false) { ... } <-- implicitly done to the implicit .GetAsyncEnumerator() call
PRODUCTION SIDE: If cancellation were to be passed in at the enumerable level, it's not a good idea. Here's how it would look:
async IAsyncEnumerable<int> f(CancellationToken cancel)
{
await Task.Delay(1, cancel);
yield return 1;
await Task.Delay(2, cancel);
}
Problem is that this way means that a single cancellation will destroy the factory for hereafter. It can't provide a fresh cancellation token for each one.
_Interesting point_. This style of cancellation makes it harder to _compose_ async streams like we do with LINQ.
But it would satisfy the 99.9% case though!
PRODUCTION SIDE: If cancellation were to be passed in at the Enumerator level, how would it look?
// Idea 1: contextual keyword "async" inside an async iterator method
// This is a bigger more general more powerful feature.
async IAsyncEnumerable<int> f()
{
yield return 1;
await Task.Delay(1, async.Cancel);
}
// Idea 2: A async iterator method can take a SECOND parameter list, which is used
// for the implicit declaration of the GetEnumerator method.
async IAsyncEnumerable<int> f() (CancellationToken cancel)
{
yield return 1;
await Task.Delay(1, cancel);
}
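For comparison, a sketch in the shape that later shipped in C# 8, which is neither of the two ideas above: the token is passed at the enumerator level via `WithCancellation` and surfaces inside the iterator through a parameter marked `[EnumeratorCancellation]`. `CancelDemo`, `CountAsync` and `ConsumeAsync` are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

using var cts = new CancellationTokenSource();
int seen = await CancelDemo.ConsumeAsync(cts, stopAfter: 3);
Console.WriteLine($"cancelled after {seen} items"); // cancelled after 3 items

static class CancelDemo
{
    // The token handed to GetAsyncEnumerator (here via WithCancellation) flows into `cancel`.
    public static async IAsyncEnumerable<int> CountAsync(
        [EnumeratorCancellation] CancellationToken cancel = default)
    {
        for (int i = 1; ; i++)
        {
            await Task.Delay(1, cancel);
            yield return i;
        }
    }

    public static async Task<int> ConsumeAsync(CancellationTokenSource cts, int stopAfter)
    {
        int seen = 0;
        try
        {
            await foreach (int x in CountAsync().WithCancellation(cts.Token).ConfigureAwait(false))
            {
                seen++;
                if (x == stopAfter) cts.Cancel();
            }
        }
        catch (OperationCanceledException)
        {
            // Thrown by Task.Delay on the next MoveNextAsync after Cancel().
        }
        return seen;
    }
}
```

Because each `GetAsyncEnumerator` call receives its own token, the enumerable itself stays reusable, which addresses the "destroys the factory" concern raised above.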
Let's consider an async iterator method with an "await" inside a "finally". It looks like the code below. The only way to support that "await inside finally" is if we have an IAsyncDisposable
.
async IAsyncEnumerable<int> f()
{
try
{
yield return 1;
await t;
}
finally
{
//await t;
}
}
// Here's the expansion of consuming this:
{
IAsyncEnumerator<int> enumerator = enumerable.GetEnumerator(cancel);
try
{
while (await enumerator.MoveNextAsync())
{
var x = enumerator.Current;
Console.WriteLine(x);
}
}
finally
{
await? (enumerator as IAsyncDisposable)?.DisposeAsync();
}
}
_Interesting point_. What kind of ConfigureAwait or CancellationToken will be passed to the implicit
await DisposeAsync()
. ANSWER: we establish the convention that any time you have an IAsyncDisposable
in hand, you trust that it was already created with an appropriate cancellation token and configure-await inside it. In an async using block, you'd probably pass a cancellation token to the method that constructed and returned to you the IAsyncDisposable.
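A small sketch of the "await inside finally" case, using the syntax that eventually shipped in C# 8, where `IAsyncEnumerator<T>` derives from `IAsyncDisposable`. `Numbers` and its `sink` parameter are made up for illustration. Leaving the loop early makes `await foreach` await `DisposeAsync()`, and that is what runs the iterator's `finally` block.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

async IAsyncEnumerable<int> Numbers(List<string> sink)
{
    try
    {
        yield return 1;
        await Task.Yield();
        yield return 2;
    }
    finally
    {
        await Task.Yield();   // an await inside finally, only possible with async disposal
        sink.Add("cleanup");
    }
}

var log = new List<string>();
await foreach (int x in Numbers(log))
{
    log.Add($"item {x}");
    break; // implicit await of DisposeAsync happens here
}
Console.WriteLine(string.Join(", ", log)); // item 1, cleanup
```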
If we have IAsyncDisposable, then inside the body of an async foreach
or using
clause, there are several control-flow constructs which do an IMPLICIT AWAIT. I never like implicit awaits. One idea is to explicitly signal with the await
keyword that, inside such a code block, these control-flow constructs will do an await (or, in the case of the return statement, maybe more than one)...
foreach (await var x in xs)
{
Console.WriteLine(x);
if (x > 10) await return 5; // could do arbitrarily many awaits!!!
await break;
await goto label;
}
await using (var ad = GetAsyncDisposable())
{
// the same "await return" and "await break" and "await goto"
}
EntityFramework. From an expressivity perspective, e.g. ADO.NET, it really is desirable to have an async disposal of the stream.
EntityFramework. We care about co-existence of Sync and Async.
If an object implements both IDisposable and IAsyncDisposable, then which one should folks pick who have an object in hand?
Guidelines used to say "If you implement IDisposable then you should call Dispose". They will have to be
updated to say that IAsyncDisposable is also valid. And double-dispose should continue to not throw, even if one of them is disposed via IAsyncDisposable.
What syntax to use for async foreach? How about using the "async" modifier? ...
async foreach (var x in xs) { ... }
async using (var x = f()) { ... }
I like "await" because async says "here is something that can await" while await says "I might yield"
await foreach (var x in xs) { ... } <-- what are we actually awaiting? It looks like we're awaiting the entire foreach?
await using (var x = f()) { ... }
foreach (await var x in xs) { ... }
using (await var x = f()) { ... }
We've said we want to add IAsyncDisposable
. If so, then there will have to be an "async using" construct. It would probably be nice if the syntax for async-using and async-foreach be aligned.
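For reference, a minimal sketch of an async using block in the form that later shipped in C# 8 (`await using`), one of the candidate spellings listed above. `Resource` is a hypothetical type; `Task.Yield()` stands in for real asynchronous cleanup.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var events = new List<string>();
await using (var res = new Resource(events))
{
    events.Add("body");
} // DisposeAsync is awaited here, before execution continues
Console.WriteLine(string.Join(", ", events)); // body, disposed

// Hypothetical resource with asynchronous cleanup.
sealed class Resource : IAsyncDisposable
{
    private readonly List<string> _events;
    public Resource(List<string> events) => _events = events;

    public async ValueTask DisposeAsync()
    {
        await Task.Yield();
        _events.Add("disposed");
    }
}
```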
xs.Select(x => x + 1)
xs.SelectAsync(async (x) => x + await t)
We'd have to change the translation of LINQ syntax so that if an await is inside the body then it should stick in an async modifier on the lambda.
What to do about overloads? Here they are:
1. IEnumerable<U> Select<T,U>(this IEnumerable<T>, Func<T,U> lambda)
2. IAsyncEnumerable<U> Select<T,U>(this IEnumerable<T>, Func<T,Task<U>> asyncLambda)
3. IAsyncEnumerable<U> Select<T,U>(this IAsyncEnumerable<T>, Func<T,U> lambda) <-- this exists as of several years in IX
4. IAsyncEnumerable<U> Select<T,U>(this IAsyncEnumerable<T>, Func<T,ValueTask<U>> asyncLambda) <-- ValueTask is new, hence not much of a corner problem!
But consider if someone wrote code xx.Select(x => Task.FromResult(y))
. It would be weird if this used to generate IEnumerable<Task<int>>
, but we
added an overload, and now it generates IAsyncEnumerable<int>
. -- So how about we just remove the first two overloads? If you want to convert to async, then you call .AsAsync() on an IEnumerable
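To illustrate the concern with code that compiles today: with only the classic overload (1) available, a task-returning lambda simply produces a sequence of tasks, so adding overload (2) would silently change the meaning of this call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

int[] xx = { 1, 2, 3 };

// Binds to Select<int, Task<int>>: the Task<int> is just the element type.
IEnumerable<Task<int>> tasks = xx.Select(x => Task.FromResult(x * 2));

int[] results = await Task.WhenAll(tasks);
Console.WriteLine(string.Join(",", results)); // 2,4,6
```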
What to do about syntax in LINQ expressions? If the following treated await
as a keyword rather than identifier, it would technically be a breaking change:
var yy1 = from x in xx.AsAsync()
select x + await (t);
How about: if you're writing a LINQ expression inside an async method, then await
is already a reserved keyword. For the cases where you want to write an async LINQ expression inside a synchronous method, you can also stick an async
modifier there:
var yy2 = async
from x in xx.AsAsync()
from y in yy
select x + await t;
IQueryable<U> Select<T,U>(this IQueryable<T>, Expression<Func<T,U>> quotedLambda)
What about executing IQueryable asynchronously? I.e. what about introducing expression trees for await? or do we want an entirely new IAsyncQueryable<T>
interface?
Also, what about async methods like .FirstAsync() ?
BCL: we should take what IX has and see what we want to promote.
I don't understand if IAsyncQueryable represents an async network connection to a sync database, or whether it represents a connection to an async server.
We need to discuss expression trees for await.
Idea for non-allocating iterators. The central problem is that if you want a non-allocating iterator, then callers of your method have to declare on their stack an instance of the corresponding state machine. Therefore they need to know the state machine type.
Folks in this thread have suggested possibilities, e.g. @CyrusNajmabadi suggested some magic similar to how VB does it (VB creates unutterable typenames for anonymous methods; it also implicitly creates utterable backing properties when you declare a WithEvents property). @bbarry suggests using the generator feature.
Here's another idea, courtesy of @gafter. It's born from the observation of how we already write non-allocating iterators over ImmutableList<T>
. The way it's done today is that you declare the type of your state machine explicitly, and you hand-author the state machine. How about instead if you still have to declare the type, but the compiler fills in the implementation?
public async iterator MyEnumerator<int, struct SM> GetEnumerator() { ... }
This declares a method named GetEnumerator
. It also declares alongside that method a nested type named SM
with the same accessibility as the method. The compiler fills in the definition of this nested type itself.
This way, consumers who invoke GetEnumerator() will be able to keep an instance of the state machine on their stack.
I think that should be:
that method a nested type named
MyEnumerator
But yes, this is definitely an approach that can work. I've discussed this with @MadsTorgersen before and i think it's quite nice.
another option would be to just have public async struct IEnumerator<...> GetEnumerator()
and have the compiler come up with a struct type with an unutterable name. The downside is that we don't generally like having public types with awful .Net runtime names. I personally don't mind, but it's probably a lot more work rather than just having the user declare the name themselves.
We can avoid a new contextual keyword by having them do something like:
public async struct MyEnumerator GetEnumerator()
btw.
Note: i was enamored with the form public async struct IEnumerator<...> GetEnumerator()
because it's somewhat similar to what rust does where you just state what traits you have, but that doesn't require that you actually be allocating on the heap. I think it would be very interesting to have that capability in our language across a wide gamut of cases. But it's a much larger thing to design if we go that route.
I am not really tied to the generator feature (I am, but for so many other reasons I'm looking forward to). I'd be perfectly happy if this just worked:
public partial struct SM {} //user puts nothing here
public async ValueAsyncEnumerable<int, SM> Foo() { ... }
I'm simply saying if the consuming foreach
lowering accepts the pattern that permits the allocation free form to exist, we get the feature for the cost of someone (perhaps me if nobody else does it first) writing that generator library with no further changes to the compiler.
await? (enumerator as IAsyncDisposable)?.DisposeAsync();
caught my eye. Please say that await?
will happen! It's the exact analog of the synchronous foo?.Bar()
and it's made its absence felt.
@jnm2 I've missed await? e
as well. I don't think anyone's made a solid proposal for it. I wrote it here just because I was too lazy to write it out longhand. To implement it would be a small amount of straightforward work.
But...
await? e;
await (e ?? Task.CompletedTask);
var x = await (e ?? Task.FromResult(0));
This would only work for await statements (i.e. where you discard the result). It looks pretty cryptic, so much so that I don't think it's better than the longhand using ??
. In my personal opinion it wouldn't meet the language design bar.
But...
If you like it, please file an issue, and we'll get C# LDM to discuss it!
I've missed await? e as well. I don't think anyone's made a solid proposal for it.
I'm trying to figure out the scenario here. This seems to indicate that this would happen when you had a null task and wanted to 'await' it.
However, that seems super weird to me. First, if you have an actual 'async' method, you would never get a null task. So this would only be because your method chose to return null. But why use null when Task.CompletedTask exists?
In general, you use null because either every value in your domain is taken, or there is no good sentinel value that your domain can provide. Neither of those is the case here. It seems like you'd have to go out of your way to use 'null', which seems odd given how poorly it would interact with everything else in the task world.
@ljw1004
My vote for await
within the statement, e.g.:
foreach (await var x in xs) { ... }
using (await var x = f()) { ... }
It just feels cleaner to me.
i'm a fan of :
``` c#
async foreach (var x in xs) { ... }
async using (var x = f()) { ... }
```
It's a 'foreach' statement that operates 'asynchronously'. It's a 'using' statement that operates 'asynchronously'.
'await var x' reads very strangely to me. While i can _sorta_ see it in the foreach (you're going to await the availability of each variable), it doesn't make any sense to me for 'using' (where you're going to await the disposal of the instance).
'await' very much feels like a marker telling me "ok, right here i'm going to yield and come back once the value is available". But that's not what i'm trying to indicate with these constructs. Instead, i'm just trying to say that the operation i'm performing will operate asynchronously instead of synchronously.
'async' currently means 'awaits' can/will happen inside (i.e. when you have an async-lambda). That fits what's going on here, except that it's also telling the compiler: insert the 'awaits' implicitly as per the pattern we spec out.
@CyrusNajmabadi
I'm trying to figure out the scenario here. This seems to indicate that this would happen when you had a null task and wanted to 'await' it.
It's simpler than that. We used to have to do if (foo != null) foo.Bar();
and syntactic sugar was added so that ?
replaced the if
: foo?.Bar();
The same sugar could be applied to if (foo != null) await foo.BarAsync();
. It feels more natural beside the synchronous version.
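A short sketch of what the longhand looks like today, for comparison with the proposed `await?`. Awaiting the result of `?.` directly would throw a `NullReferenceException` when the receiver is null, so the null case has to be mapped to a completed task by hand. `callback` and `runs` are illustrative names.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

int runs = 0;
Func<Task>? callback = null;

// Longhand for the proposed `await? callback?.Invoke();`
await (callback?.Invoke() ?? Task.CompletedTask);
Console.WriteLine(runs); // 0, the null callback was skipped

callback = async () => { await Task.Yield(); runs++; };
await (callback?.Invoke() ?? Task.CompletedTask);
Console.WriteLine(runs); // 1
```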
Hey look what I found: https://github.com/dotnet/roslyn/issues/7171
Ah... i see the use-case now. Note: i could see this being a general null-accepting pattern elsewhere in the language. i.e. "foreach? (var v in foo?.GetValues())"
Essentially, the case is "i'm using ?. elsewhere, so that's how null flows into the system. But then common language patterns (await/foreach/etc) explicitly die on these nulls".
Thanks for making this clear for me!
@CyrusNajmabadi
'await' very much feels like a marker telling me "ok, right here i'm going to yield and come back once the value is available". But that's not what i'm trying to indicate with these constructs.
Isn't that exactly what you're trying to indicate with these constructs? You are going to yield and come back once the operation has completed, specifically the operation of moving to the next value or of disposing of the resource. You are await
ing per the grammar and conventions as established by C# 5.0. Although I do agree that given the convention there doesn't seem to be an appropriate place to use await
with using
.
Given the choice I'd prefer await foreach
or await using
but I don't like the look of the two word keyword phrases.
ok, right here i'm going to yield and come back once the value is available
Isn't that exactly what you're trying to indicate with these constructs
No. For 'using' i'm not yielding in order to come back once the value is available. so 'using (await var v...' doesn't make sense to me. I'm eventually 'await'ing a call to DisposeAsync, because the _using_ itself is asynchronous.
For 'foreach' i can squint and somewhat see it. In that case i am 'await'ing the value being made available to me. But that's still because it's an asynchronous foreach. I'm still going to have to 'await' other things as well (eg. the DisposeAsync if the IAsyncEnumerable returns an IAsyncDisposable). To me, the overall operation (which is an aggregation of many things) happens asynchronously. As such, as the marker is on the construct, it makes sense for it to be 'async' to me, not 'await'.
Because using (await var v...)
feels so strange to me, that leads me to want anything other than that. And once i have something else (like await using (var v...
or async using (var v ...
), then i want my 'foreach' to match.
But my brain keeps coming around to "what's the difference between this foreach and normal foreach, or this using and a normal using". The answer is always "it executes asynchronously", and as such the async keyword seems to click with my brain the best.
Note: this is all a personal opinion. Just wanted to share some insight on how my brain groks this stuff.
@CyrusNajmabadi
No. For
using
i'm not yielding in order to come back once the value is available.
No, you're not. You're await
ing the completion of a task, that task being the disposal of a resource. That's also exactly why you can await
a simple Task
, no value involved.
But I agree, using
is a bizarre syntax case. There isn't a good place to stick the await
. I think it should follow what foreach
does just out of consistency.
But my brain keeps coming around to "what's the difference between this foreach and normal foreach, or this using and a normal using". The answer is always "it executes asynchronously", and as such the async keyword seems to click with my brain the best.
Yes, but it doesn't work with C# 5.0 where async
does not mean this. An async
method isn't any more asynchronous than a non-async
method, and has no capacity for being asynchronous without await
. If it weren't for the syntax ambiguity that required resolution it's even questionable if the async
keyword would've appeared in C# at all.
Just wanted to share some insight on how my brain groks this stuff.
This isn't the first argument over the keywords chosen and how they are used when it comes to asynchrony. But I think that C# should remain consistent with itself even if it doesn't quite align with how those words work.
And of course this is also personal opinion. 😀
No, you're not. You're awaiting the completion of a task, that task being the disposal of a resource. That's also exactly why you can await a simple Task, no value involved.
I'm amenable to that interpretation. But here's why it's not what my brain naturally clicks to. Specifically, when i use 'await', i'm awaiting 'something awaitable'. (And, for this discussion, i'm just going to say "i'm awaiting a task"). As such, i should be able to take that task and do whatever else i wanted with it. i.e. i can write:
``` c#
await complex_expr;
```
Or i can do:
``` c#
var v = complex_expr;
await v;
```
That's not the case here with using/foreach. They are not tasks themselves. They are not 'awaitables' themselves. I cannot do:
``` c#
var v = foreach (...) { }
await v;
```
As such, this doesn't compose properly with the construct as we originally added it. I await an awaitable thing. That awaitable thing is something i can otherwise manipulate. As that's not what's going on here, 'await' feels strange to me.
On the other hand, 'async' fits here for me both because it describes the operation of the statement (it executes asynchronously), and because it allows for 'awaits' inside (albeit ones that are implicitly inserted by the compiler).
Note: i am amenable to 'await statement', it's not awful to me :) And i far prefer it to 'statement (await var'.
'await var x' reads very strangely to me. While i can sorta see it in the foreach (you're going to await the availability of each variable), it doesn't make any sense to me for 'using' (where you're going to await the disposal of the instance).
My thoughts exactly.
I agree that it would be nice to have similar syntax for "async foreach" and "async using", but I think it's more important for each to have an appropriate and intuitive syntax, even if it's not the same one.
I like async foreach
better, it is also what F# uses to consume async sequences,
async { for url, length in pages do
printfn "%s (%d)" url length
} |> Async.Start
Actually, async
just defines an _asynchronous context_ which may await
(yield) at some point. Note that you only use asyncSeq
if you want to produce an async sequence which will be denoted by async IAsyncEnumerable<T>
or async iterator IAsyncEnumerable<T>
in C#.
As for async using
I think it reads better also in case of RAII (#181), because you don't await
in the declaration location. async using res = new Res();
it starts an asynchronous resource acquisition block.
Should we have a non-generic version of async iterator methods? e.g. if we're producing something out of reflection?
I really like to see this be tackled in the language (#6248?). Non-generic/generic interface hierarchies are really awkward and they are not limited to iterators. You can have the same type with IAsyncEnumerable<object>
anyways. And of course, I really hope it does not end up being invariant.
The foreach itself is not asynchronous, asynchronous is the sequence it is iterating over. And also the using block is not asynchronous, it just operates on the resource that has asynchronous disposal. Therefore I propose this syntax:
foreach (async var x in xs)
and
using (async var x = new AsyncDisposable())
It also removes ambiguity in the following example:
async
using (var a = new Disposable())
using (var b = new Disposable())
{
} // which resource is disposed synchronously and which asynchronously?
I prefer async
.
await var x
, await using (...)
, and await foreach (...)
look like awaiting the result of var
, using
, and foreach
_expressions_. These are not expressions today, but they could become expressions in the future, especially var
expressions (declaration expressions), which seem quite likely.
The only thing that I don't like about async foreach
is that it blocks the whole iteration once it's awaiting a task. It would be nice to be able to "move on" when we are iterating over an async sequence (#8014).
Still prefer await
before in
as it suggests a per loop await
foreach (var x await in xs)
As the await happens on each loop as per var x = await xs
or suggesting a pattern similar to
while (var x = await xs.GetNextAsync())
Whereas await before var suggests single await
foreach (await var x in xs)
Rather than a per loop await
The function containing the foreach would need to be async, but adding async to the foreach doesn't seem to add any clarity.
On the other hand for using
that would be closer to an async
item?
using(async var x = f())
{
x + await t;
}
Similar to how the aforementioned lambdas work
xs.SelectAsync(async (x) => x + await t);
Though async lambdas often confuse me, so not sure it's the best pattern; but it does match what's there more?
I love await in
.
Following on from that is an async using Task returning?
var t0 = using(async var x = f())
{
x + await t;
}
var t1 = using(async var x = f())
{
x + await t;
}
await Task.WhenAll(t0, t1);
Or
await Task.Run(using(async var x = f())
{
x + await t;
});
Should you be awaiting an async using?
await using(async var x = f())
{
x + await t;
}
@benaadams
Still prefer await before in as it suggests a per loop await
Then await in await
is possible.
C#
foreach (var x await in await GetDataAsync())
It will confuse novices :)
If the GetDataAsync
returns Task<IAsyncEnumerable<T>>
though is that any different to
foreach (await var x in await GetDataAsync()) { ... }
I'm not a fan of async
at all. That drastically changes the existing meaning (which is 'awaiting is allowed') or introduces the concept of implicit awaiting, something which I am also not a fan of. If the statement causes an await, I want to see the keyword await
even if it's in a weird place.
await in await GetDataAsync()
Actually I think it's motivating how clear and precise this is. It tells novices that they are initially awaiting before the loop starts and they are also awaiting each item in
the collection.
@jnm2 That's my issue with async
, too. I don't disagree that the current usage is ambiguous/confusing, but I think that the language shouldn't redefine what it means based on where it's used.
As for foreach (await var x in await GetDataAsync()) { ... }
or foreach (var x await in await GetDataAsync()) { ... }
I think that both convey pretty accurately that you're awaiting a scalar containing a sequence. I prefer the former as it puts a little more space between where the two await
s go which I think can aid in avoiding confusion. I can see the StackOverflow posts where the developer put await
on the wrong side of in
for the situation.
Still not getting why that would happen? IAsyncEnumerable
isn't awaitable; IAsyncEnumerator
is awaitable; so as long as it wasn't returning itself in the common case
foreach (var x await in await data)
Would be a compile error?
foreach (var x await in data)
Would be fine
foreach (var x in await GetDataAsync())
Would sync enumerate over an IEnumerable
that was async returned from GetDataAsync()
(e.g. Task<IEnumerable>
)
@benaadams
For IAsyncEnumerable<T>
by itself, probably yeah, assuming nobody sticks a GetAwaiter
extension method somewhere. But I think double-await
would be perfectly legal for Task<IAsyncEnumerable<T>>
or anything that supports GetAwaiter
with a Result
that has GetAsyncEnumerator
.
Concern: when you write an async iterator method, then each chunk would have only 1 element inside it, so you'd still have two interface dispatches.
Why? If I write code like:
``` c#
async IAsyncEnumerable<Result> GetResults() // method name and element type are placeholders; the original omitted them
{
foreach (var batch in batches)
{
var results = await batch.GetAsync();
foreach (var result in results)
{
yield return result;
}
}
}
```
Then the compiler-generated code could produce one chunk for each batch, since there is no await
while a single batch is being processed, no?
@svick I guess you're right! GetNextChunkAsync()
will return an IEnumerable<T>
. And this IEnumerable<T>
can just keep returning values as it encounters them, until it comes to an await, whereupon it will say that it has no values left. Nice.
If
``` C#
using (var disposable = GetDisposable())
{
    // ...
}
```
roughly stands for:
``` C#
var disposable = GetDisposable();
try
{
    // ...
}
finally
{
    disposable.Dispose();
}
```
So, if an async disposable would work roughly like this:
``` C#
var disposable = GetAsyncDisposable();
try
{
    // ...
}
finally
{
    await disposable.DisposeAsync();
}
```
it should be expressed like this:
``` C#
await using (var disposable = GetAsyncDisposable())
{
    // ...
}
```
As for an async foreach
the foreach
syntax doesn't explicitly express an assignment to the variable. It just declares it and the assignment is implicit.
So, how would one express a variable that awaits its value? async var v
? var async v
?
I think I prefer foreach(await var v in vs)
.
Could we constrain the T
to awaitable? ie IAsyncEnumerable<async T>
(See #13928)
Could we constrain the T to awaitable? ie
IAsyncEnumerable<async T>
Interesting idea, but it should probably be a separate proposal (what you do with the T
isn't relevant to how async sequences are handled).
Anyway, I'm not sure how it would work; await
is pattern based and is a purely language-level feature, so I don't think the CLR could enforce the constraint.
_Thoughts on syntax_
When it comes time to retire this thread and start a new one, I'm going to start _two_ new threads: one for syntax, one for semantics.
Several architects met to discuss their use-cases of async streams, and what light those use-cases shed on API patterns. These are the meeting notes. It was a free-flowing and fast-moving discussion, and I took notes as best I could. Sorry in advance for any errors that crept into the notes, or confusing bits.
Q. What level should you cancel at?
IAsyncEnumerator.MoveNextAsync(CancellationToken cancel) // <- at the MoveNext level
IAsyncEnumerable.GetAsyncEnumerator(CancellationToken cancel) // <- at the enumerATOR level
IAsyncEnumerable GetStream(CancellationToken cancel) // <- at the enumerABLE level
It doesn't make sense to cancel one MoveNextAsync()
and then continue to use the rest of the sequence.
If you have an enumerABLE then it's fine if the only cancellation granularity is at the enumerator. But if enumerATOR is the only one, then granularity of cancellation is more subtle.
EntityFramework. They're content with current IX shape of IAsyncEnumerable. For cancellation, IX takes it in MoveNextAsync(CancellationToken)
, but they don't actively use that per-move-next granularity... if we said just one cancellation token per enumeration then no customer would notice. However they are concerned about patterns that associate cancellation with the whole enumerable.
EntityFramework. It does mixed server+client evaluation - figures out what parts we can do on server vs must do on client. We already flow cancellation through to server. So if we have one entry point for cancellation, that wouldn't be a problem.
ServiceFabric would be okay with doing a timeout (i.e. cancellation) once per iteration, rather than once per MoveNext.
Q. Is it enough to offer cancellation only in the GetAsyncEnumerator()
pattern itself? Or must it be common to all async streams via the IAsyncEnumerable
interface?
I don't know if that question is a meaningful one. But it led to an important use-case...
LINQ scenario. Imagine .Select(_).Select(_).Where(_)
and you want cancellation to be propagated through each of the enumerators in the chain.
It's still an open question how you'd actually cancel when you have one of those LINQ queries. PLINQ had an operator .WithCancellation()
which automatically propagates it to the whole query. Maybe we'd do the same.
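To make the LINQ scenario concrete, here is a rough sketch of a token handed to the outermost `GetAsyncEnumerator` flowing down through a chain of `Select` operators to the source. Everything in it is an assumption for illustration: the interface shapes are just one of the candidates from these notes, and `AsyncSeq`, `CountUp` and `TakeUntilCancelledAsync` are made-up names, not part of IX or any proposal.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shapes: cancellation is supplied once, at the enumerATOR level.
public interface IAsyncEnumerable<out T>
{
    IAsyncEnumerator<T> GetAsyncEnumerator(CancellationToken cancel);
}

public interface IAsyncEnumerator<out T>
{
    T Current { get; }
    Task<bool> MoveNextAsync();
}

public static class AsyncSeq
{
    // Source: counts 1, 2, 3, ... and checks the token before each step.
    sealed class Counter : IAsyncEnumerable<int>
    {
        public IAsyncEnumerator<int> GetAsyncEnumerator(CancellationToken cancel) => new Ator(cancel);

        sealed class Ator : IAsyncEnumerator<int>
        {
            readonly CancellationToken _cancel;
            public Ator(CancellationToken cancel) { _cancel = cancel; }
            public int Current { get; private set; }
            public Task<bool> MoveNextAsync()
            {
                _cancel.ThrowIfCancellationRequested();
                Current++;
                return Task.FromResult(true);
            }
        }
    }

    public static IAsyncEnumerable<int> CountUp() => new Counter();

    // Operator: does no cancellation work itself, it only forwards the token upstream.
    public static IAsyncEnumerable<U> Select<T, U>(this IAsyncEnumerable<T> src, Func<T, U> f)
        => new SelectAble<T, U>(src, f);

    sealed class SelectAble<T, U> : IAsyncEnumerable<U>
    {
        readonly IAsyncEnumerable<T> _src;
        readonly Func<T, U> _f;
        public SelectAble(IAsyncEnumerable<T> src, Func<T, U> f) { _src = src; _f = f; }
        public IAsyncEnumerator<U> GetAsyncEnumerator(CancellationToken cancel)
            => new SelectAtor<T, U>(_src.GetAsyncEnumerator(cancel), _f);
    }

    sealed class SelectAtor<T, U> : IAsyncEnumerator<U>
    {
        readonly IAsyncEnumerator<T> _src;
        readonly Func<T, U> _f;
        public SelectAtor(IAsyncEnumerator<T> src, Func<T, U> f) { _src = src; _f = f; }
        public U Current => _f(_src.Current);
        public Task<bool> MoveNextAsync() => _src.MoveNextAsync();
    }

    // Consumes an endless query and cancels after `limit` items; returns how many were seen.
    public static async Task<int> TakeUntilCancelledAsync(int limit)
    {
        using (var cts = new CancellationTokenSource())
        {
            var e = CountUp().Select(x => x * 10).Select(x => x + 1).GetAsyncEnumerator(cts.Token);
            int seen = 0;
            try
            {
                while (await e.MoveNextAsync())
                {
                    seen++;
                    if (seen == limit) cts.Cancel();
                }
            }
            catch (OperationCanceledException)
            {
                // The source noticed the token even though it was handed to the outermost operator.
            }
            return seen;
        }
    }
}
```

In this sketch each operator only forwards the token, so the propagation falls out of the `GetAsyncEnumerator(cancel)` signature; a `.WithCancellation()` operator in the PLINQ style would presumably need to achieve the same effect.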
Q. Would it be a bad thing to suggest using a default value in GetAsyncEnumerator(CancellationToken cancel = default(CancellationToken))
? Against the CLS guidelines?
Q. Can you enumerate something without requiring folks to provide a cancellation token?
ServiceFabric scenario. It is _critically important_ here to force folks to pay attention to cancel.
We don't expect most users to see the GetEnumerator method. Most users will foreach over it.
Maybe a Roslyn analyzer could warn if someone does foreach (await var x in e)
upon an e
that hasn't yet had cancellation passed to it? Or you could even have an enumerable-like type which _isn't even foreachable_ (doesn't satisfy the foreach pattern), and you have to explicitly call .WithCancellation()
on it to get back something that can be foreach'd.
_This is interesting_. We can make
IAsyncEnumerable.WithCancellation(...)
return an IAsyncEnumerableWithCancellation
type, if we wish to enforce anything about it.
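A minimal sketch of the "not even foreachable until you pass a token" idea, assuming the async foreach pattern looks for a `GetAsyncEnumerator()` method. The type names `ReliableStream<T>` and `CancellableStream<T>` are invented for this illustration and are not ServiceFabric APIs.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical: deliberately has NO GetAsyncEnumerator, so it cannot satisfy a foreach pattern.
public sealed class ReliableStream<T>
{
    readonly T[] _items;
    public ReliableStream(params T[] items) { _items = items; }

    // The only way to get something enumerable is to supply a token.
    public CancellableStream<T> WithCancellation(CancellationToken cancel)
        => new CancellableStream<T>(_items, cancel);
}

// Hypothetical: this is the type that satisfies the (assumed) pattern.
public sealed class CancellableStream<T>
{
    readonly T[] _items;
    readonly CancellationToken _cancel;
    internal CancellableStream(T[] items, CancellationToken cancel) { _items = items; _cancel = cancel; }

    public Enumerator GetAsyncEnumerator() => new Enumerator(_items, _cancel);

    public sealed class Enumerator
    {
        readonly T[] _items;
        readonly CancellationToken _cancel;
        int _index = -1;
        internal Enumerator(T[] items, CancellationToken cancel) { _items = items; _cancel = cancel; }

        public T Current => _items[_index];

        public Task<bool> MoveNextAsync()
        {
            _cancel.ThrowIfCancellationRequested();
            _index++;
            return Task.FromResult(_index < _items.Length);
        }
    }
}

public static class StreamDemo
{
    public static async Task<int> SumAsync(ReliableStream<int> stream, CancellationToken cancel)
    {
        // stream.GetAsyncEnumerator() would be a compile error: the token has to come first.
        var e = stream.WithCancellation(cancel).GetAsyncEnumerator();
        int sum = 0;
        while (await e.MoveNextAsync()) sum += e.Current;
        return sum;
    }
}
```

The trade-off, as Q4 later in the notes hints, is that a type like `CancellableStream<T>` would not compose with LINQ-style operators unless it also implemented the common interface.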
foreach (await var x in xs) cancelToken { ... } // <- a space specially for the cancellationToken
foreach (await var x in xs) (cancelToken) { ... } // <- a second argument list to be passed to the implicit call to GetEnumerator()
ServiceFabric. Imagine you're iterating over an async sequence. It likely buffers under the hood, e.g. fetching 100 records in one go. There are cases where each fetch actively costs money. So you'd like a way to know in advance whether the next MoveNextAsync()
will be on a buffer boundary or not. This could be represented either by IAsyncEnumerable<IEnumerable<Record>>
, or by just IAsyncEnumerable<Record>
with some additional signal.
EntityFramework. This doesn't have an obvious ABI way to know where the boundaries of the buffer are - you get a payload from the server and you don't know if you're going to find == 1
, > 1
or < 1
part of a record in the buffer. There isn't a good way to know whether the next thing can be resolved immediately or not. If IAsyncEnumerable wanted a functionality to ask "will the next one be synchronous or not", then ADO.NET can't answer that.
_This is interesting_. There are circumstances like ServiceFabric where knowing whether the next one will be async is interesting, and others (like async iterator methods and ADO.NET) where it's not possible.
The equivalent of GetAsyncEnumerator
in ADO.NET is ExecuteReaderAsync()
(i.e. the GetEnumerator equivalent _itself_ is async). Then it has multiple ReadAsync() on it. But in EF this is actually wrapped up inside IX, and the asynchronous work of ExecuteReaderAsync is deferred until the first MoveNextAsync.
interface IAsyncEnumerable<T> { IAsyncEnumerator<T> GetEnumerator(); }
class C : IAsyncEnumerable<int>, IEnumerable<int> { ... not very nice ... }
// vs
interface IAsyncEnumerable<T> { IAsyncEnumerator<T> GetEnumeratorAsync(); }
class C : IAsyncEnumerable<int>, IEnumerable<int> { ... works fine ... }
foreach (await var x in c) // <-- consume it asynchronously
foreach (var x in c) // <-- or consume it synchronously; it's up to you
If queries are still going to be executable both synchronously and asynchronously, then a single object would have both .GetEnumerator() and .GetAsyncEnumerator(cancel). This looks weird, but we hope that users don't typically see any calls to GetEnumerator anyway.
EntityFramework. EF has some linq extension methods that are asynchronous. EF team wanted to provide an async experience. .FirstAsync(), .CountAsync(). Also, if you write the LINQ, and then do ForEachAsync() on it or .ToListAsync(). In general want to get out of this hack business. Our IQueryables happen to also be IAsyncEnumerables which means that so long as you stay in this world, we can defer the decision about how to execute it until right at the end. Likewise our IQueryProviders.
_This is an interesting question_. Can .GetEnumerator() and .GetEnumeratorAsync() exist on a single type?
[I can't find my notes on this topic]. But it sounded like there's a solid need - from EntityFramework at the very least - to support async disposal.
Actors are built on ServiceFabric. We get proxies to various types of sequences - we do "Get me an observable", and get a local proxy to a remote observable. Then we write queries on it. Then we call SubscribeAsync()
. Almost exactly the same as RX. The only difference is that Subscribe is async, and it takes a query identifier. There's also a DisposeAsync()
to dispose of a bunch of them.
Bart. We have a Metadata API to discover what's available in the service. It exposes IAsyncQueryable, with all the expected extension methods on them. When you do the .ForEachAsync()
extension method, it sends the expression tree to the server, and translates the async expression tree to the equivalent synchronous query to execute on the server. This is where client really wants ability to write metadata queries to the server to say what are the streams that are available, but don't want to do it using classic blocking IQueryable.
Exactly the same thing in Reactive, e.g. when folks want to do async inside .Where() clauses. Even though expression trees can't capture await, we let them do .Where(x => DummyAwait(...)) for a method we defined T DummyAwait<T>(Task<T> x)
. The expression tree shows a call to this method. It's the responsibility of whoever consumes (interprets/compiles/executes) that expression tree to do the right thing -- maybe do it synchronously on the server, or do it asynchronously on the client.
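For anyone who hasn't seen the marker-method trick, here is a small sketch of it. `DummyAwait` has the signature quoted above; `IsInterestingAsync` and `ContainsDummyAwait` are hypothetical helpers added only for this example, and the blocking body of `DummyAwait` is just one possible fallback, not how the real provider behaves.

```csharp
using System;
using System.Linq.Expressions;
using System.Threading.Tasks;

public static class DummyAwaitDemo
{
    // Marker method: makes "awaiting" type-check inside a lambda that becomes an expression tree.
    // This fallback body blocks; a consumer of the tree would normally rewrite the call instead.
    public static T DummyAwait<T>(Task<T> task) => task.GetAwaiter().GetResult();

    // Hypothetical async predicate, used only for the illustration.
    public static Task<bool> IsInterestingAsync(int x) => Task.FromResult(x % 2 == 0);

    // 'await' is not allowed in an expression tree, but a plain method call is.
    public static readonly Expression<Func<int, bool>> Predicate =
        x => DummyAwait(IsInterestingAsync(x));

    // What whoever interprets the tree might check first: is the body a call to the marker?
    public static bool ContainsDummyAwait(Expression<Func<int, bool>> e) =>
        e.Body is MethodCallExpression call && call.Method.Name == nameof(DummyAwait);
}
```

Once the marker is found, the consumer is free to evaluate the inner task synchronously on the server or asynchronously on the client, which is the flexibility described above.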
EntityFramework. We have a similar scenario for queries that need to be translated.
We don't have any particular APIs in the .NET Framework that are burning to move over to IAsyncEnumerable.
Probably most of the libraries that adopt IAsyncEnumerable will be distributed-systems APIs.
IAsyncEnumerable
is currently defined in IX (and used by EntityFramework). It is also separately defined in ServiceFabric.
We'd like to make sure it has a common home.
IX has already implemented the common LINQ operators for IAsyncEnumerable - has parity. (It used to be done with TaskCompletionSource but they've just finished a rewrite into the familiar System.Linq iterator pattern; it would still benefit from language features for async enumerables.)
The remaining TODO on this IX project is: we don't have any Select(this IAsyncEnumerable<T> xs, Func<T, Task<U>> lambda)
- i.e. we don't have anything that takes in Task/ValueTask
-returning lambdas.
IX is a .NET Foundation project.
ServiceFabric. This defines its own IAsyncEnumerable. ServiceFabric provides a reliable distributed dictionary. It lets you iterate over key+value pairs. It lets you serialize key+value pairs to clusters. Its current IAsyncEnumerable has Reset()
and MoveNext(CancellationToken)
.
Note: if one particular library needs some special functionality in its async enumerator type, it can expose extra fields/methods on the concrete enumerable, even if they're lacking from the main IAsyncEnumerable.
What are the boundaries of a buffered fetch
Could do something like
interface IAsyncEnumerable
{
// ...
bool NextMoveWillNotYield();
}
Which will help with the fast path batch pattern @svick outlined
@benaadams
I like how @stephentoub's proposal does this.
C#
var e = enumerable.GetEnumerator();
while (await e.WaitForNextAsync())
{
while (e.TryGetNext(out T item))
{
…
}
}
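Here is a rough hand-written enumerator in that two-method shape, to show where the await actually happens. It is only a sketch: `BatchEnumerator` and `ChunkDemo` are made-up names, the jagged array stands in for pages fetched from a remote source, and `Task.Delay` is a placeholder for real I/O.

```csharp
using System.Threading.Tasks;

// Hypothetical enumerator: one asynchronous wait per batch, then a synchronous drain.
public sealed class BatchEnumerator
{
    readonly int[][] _batches;   // stands in for pages fetched from a remote source
    int _batch = -1;
    int _index;

    public BatchEnumerator(params int[][] batches) { _batches = batches; }

    // Asynchronously moves to the next batch; false once the source is exhausted.
    public async Task<bool> WaitForNextAsync()
    {
        await Task.Delay(1).ConfigureAwait(false);   // placeholder for the real fetch
        _batch++;
        _index = 0;
        return _batch < _batches.Length;
    }

    // Synchronously hands out items from the current batch; false at the batch boundary.
    public bool TryGetNext(out int item)
    {
        if (_batch >= 0 && _batch < _batches.Length && _index < _batches[_batch].Length)
        {
            item = _batches[_batch][_index++];
            return true;
        }
        item = default(int);
        return false;
    }
}

public static class ChunkDemo
{
    // Returns { sum of items, number of batches that required an await }.
    public static async Task<int[]> DrainAsync(BatchEnumerator e)
    {
        int sum = 0, chunks = 0;
        while (await e.WaitForNextAsync())
        {
            chunks++;
            while (e.TryGetNext(out int item)) sum += item;
        }
        return new[] { sum, chunks };
    }
}
```

With two batches of sizes 2 and 1, the consumer awaits three times (two fetches plus the final "no more" answer) but never awaits per item, which is where the benefit of this shape comes from.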
Folks like @alrz have talked about covariance in this thread already.
// VS2010 introduced this conversion in C#:
IEnumerable<string> --> IEnumerable<object>
// I think it's important to allow this one too:
IAsyncEnumerable<string> --> IAsyncEnumerable<object>
That has implications on the pattern, also as mentioned by @alrz:
interface IAsyncEnumerator<out T>
{
bool TryGetNext(out T item); // error: out parameters can't be covariant
Task<(bool, T)> TryGetNext(); // bad, involves too many allocations
ITask<T?> GetNextAsync(); // again, involves an allocation each time, and needs new lang features
T TryGetNext(out bool succeeded); // is pretty weird
}
The only one of these that works without extra allocations is the last one. It's not nice on the consumption side...
// I like consuming "bool TryGetNext(out T value)"
while (en.TryGetNext(out var x)) { ... }
// I don't like consuming "T TryGetNext(out bool succeeded)"
while (true)
{
var x = en.TryGetNext(out bool b); if (!b) break;
...
}
// I suppose I could consume it in just a single line...
while (TryGetNext(out var b) is var x && b) { ... }
That last single-liner was proposed by @MadsTorgersen and uses pattern matching.
The option T TryGetNext(out bool succeeded)
is also weird because it involves synthesizing a returned value even in cases where it failed. That's doable of course with default(T)
but feels weird.
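To check that the weird-looking option really does keep variance, here is a small sketch. The interface combines `T TryGetNext(out bool succeeded)` with a `WaitForNextAsync` method as an assumption; `ArrayEnumerator<T>` and `CovarianceDemo` are hypothetical names for the illustration.

```csharp
using System.Text;
using System.Threading.Tasks;

// T only appears in output position, so the interface can be declared covariant.
public interface IAsyncEnumerator<out T>
{
    Task<bool> WaitForNextAsync();
    T TryGetNext(out bool succeeded);
}

// Hypothetical in-memory implementation: one chunk that is already available.
public sealed class ArrayEnumerator<T> : IAsyncEnumerator<T>
{
    readonly T[] _items;
    int _index;
    bool _delivered;

    public ArrayEnumerator(params T[] items) { _items = items; }

    public Task<bool> WaitForNextAsync()
    {
        bool hasChunk = !_delivered;   // a real source would await I/O here
        _delivered = true;
        return Task.FromResult(hasChunk);
    }

    public T TryGetNext(out bool succeeded)
    {
        succeeded = _index < _items.Length;
        // On failure a value still has to be synthesized, hence default(T).
        return succeeded ? _items[_index++] : default(T);
    }
}

public static class CovarianceDemo
{
    public static async Task<string> JoinAsync()
    {
        IAsyncEnumerator<string> strings = new ArrayEnumerator<string>("a", "b");
        IAsyncEnumerator<object> objects = strings;   // legal only because of 'out T'
        var sb = new StringBuilder();
        while (await objects.WaitForNextAsync())
        {
            while (true)
            {
                object x = objects.TryGetNext(out bool ok);
                if (!ok) break;
                sb.Append(x);
            }
        }
        return sb.ToString();
    }
}
```

Swapping the method for `bool TryGetNext(out T item)` makes the interface declaration itself fail to compile with `out T`, which is the error noted in the list above.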
This solves the "too many allocations" problem:
c#
Task WaitForNextAsync();
(bool, T) TryGetNext();
Whatever the outcome, I strongly feel that variance is more important than ease of consumption.
@jnm2 As @svick pointed out, that is not possible because tuples are invariant, so it is not an option.
Oh right. So we're forced to have a T
return value, whether from a Current
property or the MoveNext
method, or an IMoveNextResult<out T>
or T[]
which is an allocation. Returning T?
is definitely the coolest here and I want that new language feature, but I'm guessing waiting for that isn't a good option.
Covariance is worth it.
Someday, out
could be introduced as a first-class construct to the CLR separate from ref
so that out params don't block covariance. Or, since we should move to tuple returns instead of out parameters... why again is it good for tuples to be invariant?
If we're going to choose the two-method path, I think the following has a sensible structure,
public interface IAsyncEnumerator<out T>
{
Task<bool> WaitForNextAsync();
T? TryGetNext();
}
With non-nullable reference types on the way, I think this should be really considered as an actual use case.
@alrz what if null is a valid value of the sequence? It's not very common, but not unheard of either.
@alrz
You very likely couldn't have T?
without either class
or struct
constraints.
@thomaslevesque How many times has it occurred to you to return null
from an async Task<T>
method? Besides being rare, it's absurd. Since ADTs are also on the table, I think you should use something like Maybe<T>
to indicate that the sequence may yield Nothing
.
@HaloFour That is filed as #9932. It would be unfortunate not to be able to use an unconstrained T?
IMO.
@thomaslevesque How many times has it occurred to you to return null from an async Task
method?
About as often as returning null from a method that returns T
, which is not rare at all. Admittedly it's much less common in sequences, but I don't think the API should forbid it entirely.
while (TryGetNext(out var b) is var x && b) { ... }
I kinda like that. x
being readonly
is an interesting bonus also. It is unfortunate that this call is backwards compared to every other TryGet*
method out there though.
I don't think handwritten consumption shapes should weigh heavily on the design here.
@alrz
That has two problems. First, it binds this proposal to the fate of another much-less-certain proposal, one that would likely require CLR and BCL changes in order to accomplish. Second, it still results in the loss of variance since structs can't be variant.
_(I'm summing up some currently-under-discussion design points)..._
What should the async foreach pattern be like? First option is to be familiar like IEnumerable
. Here's an example type that would satisfy the pattern:
interface IAsyncEnumerable<out T> {
IAsyncEnumerator<T> GetAsyncEnumerator();
}
interface IAsyncEnumerator<out T> {
T Current {get;}
Task<bool> MoveNextAsync();
}
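For reference, this is roughly what I'd expect `foreach (await var x in xs) { ... }` to expand to under the familiar pattern. It is a sketch under assumptions: the interfaces are copied from above, `DelayedRange` and `Lowering` are made-up names, and disposal and cancellation are left out entirely.

```csharp
using System.Threading.Tasks;

public interface IAsyncEnumerable<out T>
{
    IAsyncEnumerator<T> GetAsyncEnumerator();
}

public interface IAsyncEnumerator<out T>
{
    T Current { get; }
    Task<bool> MoveNextAsync();
}

// Hypothetical source: yields 1..count with a small delay before each item.
public sealed class DelayedRange : IAsyncEnumerable<int>
{
    readonly int _count;
    public DelayedRange(int count) { _count = count; }
    public IAsyncEnumerator<int> GetAsyncEnumerator() => new Ator(_count);

    sealed class Ator : IAsyncEnumerator<int>
    {
        readonly int _count;
        public Ator(int count) { _count = count; }
        public int Current { get; private set; }

        public async Task<bool> MoveNextAsync()
        {
            await Task.Delay(1).ConfigureAwait(false);   // placeholder for real async work
            if (Current >= _count) return false;
            Current++;
            return true;
        }
    }
}

public static class Lowering
{
    // Hand-written equivalent of: foreach (await var x in xs) sum += x;
    public static async Task<int> SumAsync(IAsyncEnumerable<int> xs)
    {
        int sum = 0;
        IAsyncEnumerator<int> e = xs.GetAsyncEnumerator();
        while (await e.MoveNextAsync())   // one await per element
        {
            int x = e.Current;            // second interface call per element
            sum += x;
        }
        return sum;
    }
}
```

The loop body makes the two costs discussed next visible: an await on every element, and two interface dispatches (`MoveNextAsync` then `Current`) per element.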
The other option is to be more efficient. We can be more efficient in a few ways: (1) by avoiding heap allocation and returning just structs which include state machine and method builder and enumerator, so the caller can allocate them all on the stack; (2) by avoiding the double-interface dispatch; (3) by having a tight non-async inner loop. There's been lots of discussion on fine-tuning the precise best way to achieve these efficiency goals, but here are some simplistic versions:
// (1) avoiding heap allocation entirely
// Declaration side:
async iterator MyAsyncEnumerable<int, struct StateMachine> GetStream() { ... }
// Consumption side:
foreach (var x in GetStream()) { ... }
// (2) avoiding the double-interface-dispatch
interface IAsyncEnumerator<out T> {
Task<Tuple<T,bool>> TryGetNextAsync();
}
// (3) avoiding async when working through the batch...
while (await enumerator.GetNextChunkAsync())
{
while (enumerator.TryMoveNext(out value)) { ... }
}
_As the discussion has progressed, I've seen the "efficient" versions become steadily less appealing..._
Heap allocations. Why do we care about avoiding heap allocations entirely? I see that eliminating heap allocation is desirable for in-memory data structures that you iterate over with IEnumerable
. But when it comes to _async_ streams, if the entire stream ever gets a cold await, then it will necessarily involve allocation. The only folks who will benefit from heap-free async streams are those who write middleware, e.g. something that can sit equally on top of either a MemoryStream
or a NetworkStream
, and still be as efficient as possible in the first case. (We did indeed see such a need when we introduced async).
In all, the heavy-weight language work needed to support heap-free async streams seems way disproportionate to the benefit. (That language work might be struct SM
like I wrote in the above code, or the ability for a method to have var
as its return type, or the ability to declare something that looks like a struct and also a method).
Heap-free streams as we've envisaged them will only apply to consumption by foreach
: as soon as you use the LINQ extension methods, then you get boxing. @MadsTorgersen and @CyrusNajmabadi spent some time exploring LINQ alternatives that avoid boxing. The gist of it was that you could pass the type of the enumerable and enumerator. This is clever, but looks pretty heavyweight.
// This is familiar LINQ extension method
static void Select<T,U>(this IEnumerable<T> src, Func<T,U> lambda)
// We could plumb more information through to avoid boxing
static void Select<T,Table, Tator, ...>(this IEnumerable<T,Table,Tator> src, ...)
At this point we waved our hands and said "We've discussed and investigated escape analysis in the past -- something where some part of the infrastructure can see that the IEnumerable
doesn't escape from the method even despite its use in LINQ queries. If we had such escape analysis then it would benefit everyone, including the folks who need it most, rather than being an efficiency oddity specific to async streams that requires everyone to rewrite their code."
Conclusion: should give up on heap-free async streams.
Avoid double interface dispatch. Sometimes we believe that interface dispatch onto a struct is one of the slowest parts of .NET. Other times we think it's cached and so pretty fast. We haven't benchmarked this yet. There's no point pursuing it for efficiency's sake unless it's been benchmarked.
The downside of "avoid double interface dispatch" is that it doesn't work nicely with covariance. And covariance is more important. The only way to retain covariance in just a single interface dispatch is something like this: T TryGetNext(out bool succeeded)
. That's hard to stomach.
One attractive feature of a TryGetNext
method is that it makes the enumerator _atomic_. (When it's split into a MoveNext
method call followed by a Current
property fetch, that can't be atomic, and will lead to race conditions if two threads try to consume an enumerator at the same time). Atomicity is nice. It means, for instance, that async streams could be consumed by a "choice" language primitive (similar to the choice operator in CCS and pi calculus, similar to what Go uses).
Avoid async when working through the batch. Let's work through concretely how this would work when you iterate over an async iterator method. There are subtleties here that aren't at first obvious...
async iterator IAsyncEnumerable<int> GetStream()
{
while (true)
{
await buf.GetNextByteAsync();
yield return buf.NextByte;
}
}
var enumerator = GetStream().GetEnumerator();
while (await enumerator.GetNextChunkAsync())
{
while (enumerator.TryMoveNext(out value)) { ... }
}
The question is, _what granularity do the chunks come in?_
The easy answer is that GetNextChunkAsync()
will progress as far as the next yield return
, and then TryMoveNext
will succeed exactly once. This is easy to implement but it defeats most of the value-prop of chunking in the pattern -- because each chunk will be exactly one item big.
A more complicated answer is that TryMoveNext
will execute as much of the method as possible. It will have the ability even to kick off an await
by calling GetAwaiter()
and then awaiter.IsCompleted
. Only when it comes to the end of the async iterator method or to a _cold await_ will it finally return false. There's something a little hairy about this, about having the await
be finished by a different method from the one that started it. Does it also have implications for IAsyncDisposable
? Not sure.
Conclusion: we should of course benchmark the single-interface-dispatch and the buffers. But they would have to show _dramatic_ wins to be worth the concomitant ugliness.
_(I'm summing up some currently-under-discussion design points)..._
We'd talked about cancellation being done in two ways:
// way 1
using (var ator = able.GetAsyncEnumerator(token))
// way 2
foreach (await var x in xs.WithCancellation(token))
To avoid the compiler+language having to know about cancellation, we could define the first method with a default parameter: IAsyncEnumerator<int> GetAsyncEnumerator(CancellationToken cancel = default(CancellationToken))
.
We'd talked about ConfigureAwait
being done only in the second way:
foreach (await var x in xs.ConfigureAwait(false))
We'd talked about a shorthand .Configure()
method to let you easily both provide a cancellation token and configure the await.
We'd talked about how when you obtain an object that implements IAsyncDisposable
, then knowledge of how it should await/cancel has already been stored inside the object, and so a consumer need only await it.
_QUESTIONS_.
Q1. Why do we need both "way1" and "way2"? Can't we just do it with "way2"?
Q2. Normally you can do foreach (var x in xs)
and common datatypes give you back a struct type for your enumerator, e.g. List<T>.Enumerator
. Is this still possible with the .ConfigureCancellation()
approach?
Q3. Does .ConfigureCancellation()
get defined in IAsyncEnumerable
? Or is it an extension method on IAsyncEnumerable
? Or is it left to be defined by any concrete types that want to offer it?
Q4. Does .ConfigureCancellation
return an IAsyncEnumerable
that can be composed further? Or does it return an ICancelledAsyncEnumerable
which can't be used by the LINQ combinators but which does satisfy the foreach pattern?
Q5. For folks like ServiceFabric who wish to _force_ you to provide a cancellation token, would we make it so IAsyncEnumerable
itself _doesn't_ satisfy the async foreach pattern? -- Answer: no. This can be done by an analyzer. This enforcement doesn't belong in IAsyncEnumerable
.
_(I'm summing up some currently-under-discussion design points)..._
Where is the home of IAsyncEnumerable
? There's one defined in IX (which doesn't quite satisfy our needs because it has cancellation as a parameter of MoveNextAsync). There's one defined in Azure ServiceFabric (which again isn't quite right).
It feels like the interface type IAsyncEnumerable
itself should be defined in System or mscorlib or somewhere central, similar to IObservable
. But the LINQ library methods should be provided in IX.
Not sure about the extension method IAsyncEnumerable<T> AsAsyncEnumerable<T>(this IEnumerable<T> src)
.
What do folks @onovotny think about this?
I think that it makes sense to have the interface itself live somewhere central so that methods from mscorlib or System could potentially return it as a signature with its own internal implementation.
For the home of the LINQ implementation, I would recommend IX. We would of course welcome any and all contributions from Microsoft and other teams. IX would adapt and use whatever the final signature is for the IAsyncEnumerable
, so I'm not too concerned about the current implementation having a MoveNext with a cancellation token. We would need to coordinate with partner teams, like EF, on how/when to make that breaking change, but a major version increment should cover that scenario with good justification.
I would think the extension methods should go along with IX as well -- basically, if you want to do IX Async, that's the real library to reference for the main logic.
@ljw1004 I think new extension methods for IAsyncEnumerable
should be part of the BCL, not IX/RX. The set of people using RX (or that even know about RX) is a tiny fraction of the total user base of .NET.
Personally, I also find RX really hard to use because the documentation is spotty / dated. We shouldn't tie a brand-new language feature to an old library designed for a world in which the language feature didn't yet exist.
The set of people using RX (or that even know about RX) is a tiny fraction of the total user base of .NET.
And that's even more true of IX
@MgSam @thomaslevesque
I completely agree. However I think the reason for that failing is largely due to the lack of attention Rx/Ix received from Microsoft. My opinion has always been that since Rx is so far along with its implementation of LINQ on both push and pull streams that it makes a great deal of sense for Microsoft to take advantage of their own work and build async streams on top of that. If that entailed bringing Rx under the BCL proper, which I imagine would require a bit of a refactoring, I would not be opposed to that. It likely requires it just from a consistency point of view since Rx and TPL diverged quite a bit in common areas.
It makes me sad that Rx seems to be getting more love on the JVM these days than on .NET where it was born. And here we are arguing over the details of reimplementing much of it.
@HaloFour By the time C# 7+1 ships Rx/Ix will likely be over 10 years old. It is filled with outdated paradigms and the web filled with outdated documentation.
I think a much better tack would be adding a new library for IAsyncEnumerable
to the BCL, and taking the parts of Rx/Ix that fit with the new paradigm and are worth keeping and adding them in.
This thread has raised several good points about Ix and Rx.
A few thoughts here:
Some of the comments have addressed Rx as opposed to Ix. Ix, despite living in the Rx repository, can version and release independently of Rx. For the purposes of this thread and discussion, I think it would be helpful to focus on Ix rather than Rx. Of course, we'd be more than happy to have discussions about the current state and future of Rx over on the Rx repo.
There was some discussion here around the lack of discoverability around Ix. What do you think could help fix that? Ix today already has the System.Interactive.Async
NuGet package name to better reflect its core framework nature. Is it something that can be addressed by having more direct references to it from the BCL teams?
When it comes to documentation, we wholeheartedly agree that it could be made better. That's also an area that can and will be improved. We would be very interested in what kinds of changes would make the documentation better.
About suggestions that Ix hasn't had the same sort of API review and was designed for a time pre-async and has outdated techniques, I would respectfully disagree. The API of Ix has been carefully thought out by the team and treated with the same level of review that an "in-box" library would have. Furthermore, Ix Async itself was rewritten this summer to use a modern approach based on what System.Linq
does today. The resulting code has fewer allocations and is much easier to reason about and debug. Code coverage for its tests is over 95%.
I would suggest that Ix overall is hardened and time-tested with very thorough test coverage and review practices. The location of the code should not matter, but lest that be a blocker, the code could split into its own repo should there be a need.
Overall, the path to implementing whatever shape of the interface results from this discussion is far shorter and mostly done already. That's a huge boost.
I would also like to state that community contributions are very welcome. In fact, it was a community member that did the initial refactor to introduce async
and await
to Ix. We're very open to external contributions.
@onovotny
We would be very interested in what kinds of changes would make the documentation better.
My main issue with the documentation of Rx, is that the only API reference on MSDN is horribly out of date.
The situation seems to be even worse for Ix: there's nothing on MSDN or anywhere else, and sometimes methods don't even have XML documentation comments.
For example, when I search for "AsyncEnumerable.Buffer" (or "IAsyncEnumerable Buffer"), I find:
Interesting to compare notes with the concurrent development in JavaScript. http://www.2ality.com/2016/10/asynchronous-iteration.html
@onovotny I agree with your comments about Ix being carefully designed and implemented. As you know, we were able to switch vNext of EF Core (currently on our dev branch) to the new version that contains the reimplementation of the operators, and we appreciate the advantages of the new implementations.
That said, there are still a few issues we should discuss with regard to the API alignment with the idiomatic patterns used across .NET for async. In particular, we believe that query operators that return a single `T` should use the `Async` suffix. The fact that they don't is one of the reasons we have so far kept the current `IAsyncEnumerable<T>` type mostly hidden from application developers using EF Core.
In summary, I actually don't have a strong opinion on where the LINQ operators for `IAsyncEnumerable<T>` should live, but if a future major version of Ix adopted the new definition of the type in the BCL and switched to names that align better with the async pattern, that would help a lot.
cc @anpete
@divega AFAIK, the reason the `Async` suffix isn't used is so that the operators work with the LINQ syntax conventions. If updating the LINQ syntax to resolve `*Async` versions is in scope, then that would answer that, but I'm not sure that's under consideration.
@onovotny are you referring to VB query comprehension syntax? Otherwise do you know which operators map to C# comprehension syntax for which this would be a problem? As far as I remember VB provides sugar for many more operators but is strict about their return types. If I am remembering correctly, this means the current naming probably doesn't help.
To clarify, I said previously that it was about `IAsyncEnumerable<T>` operators that return `T`, but it is actually applicable to all awaitable operators, including things like `ToListAsync()`.
BTW, this could be an interesting criterion to decide where to place operators. Let's say there are two groups:
I believe the second group is for more advanced/less common scenarios and having to include an extra dependency to use them wouldn't hurt too much.
I'm specifically referring to the LINQ syntax `from`, `select`, `groupby`, that uses a duck pattern match to find matching methods/overloads. It doesn't care about the return types and works today with Ix Async :)
I agree there seem to be two groups as you describe, and there's no keyword equivalent of `ToListAsync`, `FirstAsync` or `SingleOrDefaultAsync`, etc. Those could easily be renamed as part of the version that implements the new interface design.
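To illustrate the duck-typed binding described above, here is a minimal sketch. `FakeAsyncSequence<T>` and `QueryPatternDemo` are hypothetical names made up for this example (they are not Ix types), and the "async" part is faked with `Task.FromResult` so the block stays self-contained. The point is only that the query keywords bind to `Where` and `Select` by name and shape, while an awaitable operator with no keyword can carry the `Async` suffix without affecting query syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for an async sequence. It implements no LINQ interface.
public sealed class FakeAsyncSequence<T>
{
    private readonly IReadOnlyList<T> _items;

    public FakeAsyncSequence(IReadOnlyList<T> items) => _items = items;

    // The "where" clause binds to this method purely by name and signature.
    public FakeAsyncSequence<T> Where(Func<T, bool> predicate)
    {
        var result = new List<T>();
        foreach (var item in _items)
            if (predicate(item)) result.Add(item);
        return new FakeAsyncSequence<T>(result);
    }

    // The "select" clause binds to this method in the same way.
    public FakeAsyncSequence<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        var result = new List<TResult>();
        foreach (var item in _items) result.Add(selector(item));
        return new FakeAsyncSequence<TResult>(result);
    }

    // Awaitable operator with no query keyword, so the Async suffix is harmless here.
    public Task<List<T>> ToListAsync() => Task.FromResult(new List<T>(_items));
}

public static class QueryPatternDemo
{
    public static Task<List<int>> EvenSquaresAsync()
    {
        var source = new FakeAsyncSequence<int>(new[] { 1, 2, 3, 4 });

        // Translated by the compiler to source.Where(...).Select(...).
        var query = from x in source
                    where x % 2 == 0
                    select x * x;

        return query.ToListAsync();
    }
}
```

Awaiting `QueryPatternDemo.EvenSquaresAsync()` yields a list containing 4 and 16. Renaming `Where` or `Select` to an `Async`-suffixed form would stop the query expression from compiling, which is the constraint being discussed.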
@onovotny ok, I was referring specifically to the ones that can be awaited. For `from`, `select`, `where`, `groupby`, `join`, etc. I agree the names should remain the same.
This is now tracked at https://github.com/dotnet/csharplang/issues/43
Here's my take on async enumerators/sequences using C# 7's Task-Like types. The approach could potentially act as a playground for the language feature.
@Andrew-Hanlon `AsyncEnumerator<>` looks like the approach I've taken so far.
The thing I'm afraid of is that finally blocks (using statements) won't run if the enumeration is incomplete, even though the enumerator is disposed.
@jnm2 That's a good point. Making the `AsyncEnumerator<>` itself disposable could work, and simply set and bury a specific exception on the inner tasks, thereby triggering any inner disposables. I'll add this to the repo. Thanks.
Shouldn't it be `IDIsposableAsync`? 😄
Just wanted to add my vote for enabling `await` within LINQ queries. In the proposal, there's a statement: "and this isn't a particularly high-value thing to invest in right now", but I disagree. It would make real-life interaction with APIs sooo much more elegant. There are many situations where you need to make several async API calls, each referring to results from previous API calls. Stringing them all together with nested `select` or `let` clauses in a single LINQ query would make that sort of code much more pleasant.
Changing topics to another part of the proposal...
The main approach we've all been assuming for the interface is:
``` C#
public interface IAsyncEnumerator<T>
{
    Task<bool> MoveNextAsync();
    T Current { get; }
}
```
With this shape, a foreach loop over an async sequence would compile down to something like:
``` C#
var e = enumerable.GetEnumerator();
while (await e.MoveNextAsync())
{
    T item = e.Current;
    …
}
```
The alternative is to split this into two methods, WaitForNextAsync and TryGetNext, which with the aforementioned foreach loop would compile down to something like:
``` C#
var e = enumerable.GetEnumerator();
while (await e.WaitForNextAsync())
{
    while (e.TryGetNext(out T item))
    {
        …
    }
}
```
The idea here is that we have two async iterators, and we take one item from one of them and process it, whichever arrives first, kind of like a WhenAny. This can’t be done correctly on top of the MoveNextAsync/Current pattern: you would have to call MoveNextAsync on both iterators, at which point you’ve already moved the other one forward (and it may still be in flight when you exit the select). In contrast, because the act of saying “try to ensure more data is available synchronously” is separate from the act of saying “get me synchronously available data”, this can be done with the alternate model (though, full disclosure, it would likely require the WaitForNextAsync implementation to be thread-safe to be used in this manner, or at least tolerate another call to WaitForNextAsync while a previous one is still in flight).
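As a rough sketch of the "take one item from whichever source arrives first" scenario, something like the following could work on top of the two-method model. Everything named here is an assumption made for illustration: `ITwoStepAsyncEnumerator<T>`, `QueueSource<T>` and `FirstAvailable.TakeAsync` are hypothetical, not part of the proposal, the BCL, Ix or the channels prototype. `QueueSource<T>` is written to tolerate overlapping `WaitForNextAsync` calls, which is the caveat mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical name for the two-method shape; not an actual BCL or Ix type.
public interface ITwoStepAsyncEnumerator<T>
{
    Task<bool> WaitForNextAsync();   // completes with false once the sequence has ended
    bool TryGetNext(out T item);     // synchronous; returns false if nothing is buffered
}

// Illustrative in-memory source that tolerates overlapping WaitForNextAsync calls.
public sealed class QueueSource<T> : ITwoStepAsyncEnumerator<T>
{
    private readonly object _gate = new object();
    private readonly Queue<T> _items = new Queue<T>();
    private TaskCompletionSource<bool> _waiter;
    private bool _completed;

    public void Post(T item) => Signal(() => _items.Enqueue(item));
    public void Complete() => Signal(() => _completed = true);

    // Applies a state change under the lock, then wakes any pending waiter outside it.
    private void Signal(Action change)
    {
        TaskCompletionSource<bool> waiter;
        bool hasData;
        lock (_gate)
        {
            change();
            hasData = _items.Count > 0;
            waiter = _waiter;
            _waiter = null;
        }
        waiter?.TrySetResult(hasData);
    }

    public Task<bool> WaitForNextAsync()
    {
        lock (_gate)
        {
            if (_items.Count > 0) return Task.FromResult(true);
            if (_completed) return Task.FromResult(false);
            if (_waiter == null)
                _waiter = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
            return _waiter.Task;   // concurrent callers share one pending task
        }
    }

    public bool TryGetNext(out T item)
    {
        lock (_gate)
        {
            if (_items.Count > 0) { item = _items.Dequeue(); return true; }
            item = default(T);
            return false;
        }
    }
}

public static class FirstAvailable
{
    // Takes one item from whichever source has data first, consuming nothing from the other.
    public static async Task<(bool Found, T Item)> TakeAsync<T>(
        ITwoStepAsyncEnumerator<T> left, ITwoStepAsyncEnumerator<T> right)
    {
        while (true)
        {
            if (left.TryGetNext(out T item) || right.TryGetNext(out item))
                return (true, item);

            Task<bool> leftWait = left.WaitForNextAsync();
            Task<bool> rightWait = right.WaitForNextAsync();
            Task<bool> first = await Task.WhenAny(leftWait, rightWait);

            // If the first source to answer has ended, only the other can still produce.
            if (!await first)
            {
                Task<bool> other = ReferenceEquals(first, leftWait) ? rightWait : leftWait;
                if (!await other) return (false, default(T));
            }
        }
    }
}
```

The wait that loses the race stays in flight after `TakeAsync` returns, but it only signals availability and removes nothing from its source. With MoveNextAsync/Current, the losing call would already have advanced its iterator.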
(These reasons are part of why the channels prototype I did on corefxlab essentially has these two methods as the main part of the readable channel interface:
https://github.com/dotnet/corefxlab/blob/master/src/System.Threading.Tasks.Channels/System/Threading/Tasks/Channels/IChannel.cs#L32-L45)
Anyway, food for thought.