Runtime: Add missing async methods in System.Data.Common and implement IAsyncDisposable

Created on 1 Feb 2019 · 36Comments · Source: dotnet/runtime

Provider implementation tracking issues

SqlClient: https://github.com/dotnet/SqlClient/issues/113
Npgsql: https://github.com/npgsql/npgsql/issues/2481
MySqlConnector: https://github.com/mysql-net/MySqlConnector/issues/642

Specification

This issue consolidates dotnet/runtime#18225, dotnet/runtime#24244.

There are several APIs in System.Data.Common which possibly involve I/O, where we're missing async APIs altogether. This proposal adds these missing APIs, with default implementations that call the sync counterparts - much like the existing async methods on those APIs. In addition, all System.Data.Common types that implement IDisposable would be updated to implement IAsyncDisposable as well.

Note that in other cases async methods already exist but return Task, and we'd like to add overloads returning ValueTask. This is out of scope for this issue (but we'll work on these soon, see dotnet/runtime#19858, dotnet/runtime#25297).

This would ideally go into .NET Standard 2.1.

Proposal:

```c#
class DbConnection : IAsyncDisposable
{
protected virtual ValueTask BeginDbTransactionAsync(IsolationLevel isolationLevel, CancellationToken cancellationToken);

// The following delegate to BeginDbTransactionAsync, like their sync counterparts
public ValueTask<DbTransaction> BeginTransactionAsync(CancellationToken cancellationToken = default);
public ValueTask<DbTransaction> BeginTransactionAsync(IsolationLevel isolationLevel, CancellationToken cancellationToken = default);

public virtual Task ChangeDatabaseAsync(string databaseName, CancellationToken cancellationToken = default);
public virtual Task CloseAsync(CancellationToken cancellationToken = default);
public virtual ValueTask DisposeAsync();

}

class DbTransaction : IAsyncDisposable
{
public virtual Task CommitAsync(CancellationToken cancellationToken = default);
public virtual Task RollbackAsync(CancellationToken cancellationToken = default);
public virtual ValueTask DisposeAsync();
}

class DbCommand : IAsyncDisposable
{
public virtual Task PrepareAsync(CancellationToken cancellationToken = default);
public virtual ValueTask DisposeAsync();
}

class DbDataReader : IAsyncDisposable
{
public virtual Task CloseAsync(CancellationToken cancellationToken = default);
public virtual ValueTask DisposeAsync();
}
```

Note that this is only about adding async methods where non currently exist. dotnet/runtime#25297 is about adding ValueTask overloads where Task-returning async methods already exist (and where we have to think about naming etc.).

Returning Task vs. ValueTask

For the return value of the new proposed methods, our current guiding principle has been to opt for Task when the method in question represents a full database roundtrip operation, where any perf advantages of ValueTask are likely to be dominated by networking etc. Thus, method such as CommitAsync() and RollbackAsync() would return Task (aside from them having no return value).

Methods used for processing a resultset being streamed do seem like they'd benefit from returning ValueTask; examples include DbDataReader.ReadAsync(), DbDataReader.GetFieldValueAsync<T>(), which are out of scope of this proposal.

Cases such as DbConnection.CloseAsync() and DbDataReader.CloseAsync() seem to be between the two and we could probably go either way with them.

It would definitely be good to get some input on this.

Cancellation token overloads

Existing async methods in System.Data.Common come in pairs - a virtual one accepting CancellationToken and a non-virtual one delegating to the first. For the new methods we can just add one method with cancellation token defaulting to CancellationToken.None.

/cc @divega @ajcvickers @terrajobst @stephentoub @bgrainger

Edit history

| Date | Modification |
| ----- | ------------ |
| 16/4 | DbConnection.Open() returns Task instead of ValueTask as per @terrajobst's question
| 18/4 | Added DbDataReader.Get{Stream,TextReader}Async()
| 18/4 | Removed DbDataReader.Get{Stream,TextReader}Async() again (GetFieldValueAsync<T>()) already provides this functionality)
| 18/4 | Removed DbCommand.CancelAsync()
| 15/5 | Changed BeginTransactionAsync() overloads to return ValueTask instead of Task, and changed to use the standard ADO.NET API pattern, as per design review. Changed DbDataReader.Close() to return Task instead of ValueTask.

api-approved area-System.Data

Source

roji

👍4

Most helpful comment

@stephentoub mentioned that he had a flowchart about decision making on ValueTask which might be helpful to review.

If we use ValueTask<T> in the interface definition and there are providers which can provide synchronous results (in memory databases providers like SqLite in some modes) can run fast and avoid the task allocation and it makes sense to enable that scenario where it can be of benefit.

For the rest of the big old out of process providers we're never going to return synchronously as we'll always end up with a .AsTask call and wasting a reference worth of copy time, but really we're going to have to allocate a TaskCompletionSource and some sort of object to hold state and perhaps more async support machinery so we can't get a low alloc path anyway. So do we choose to prevent the faster path being available to everyone for the sake of avoiding a struct copy in a common case?

The existing async methods were defined before ValueTask<T> was available and if it had been available at the time i think the same discussion would have occured. In this case because we can't preclude a provider being able to give synchrnous results we shouldn't prevent that possibility at the api layer IMO.

Wraith2 on 16 May 2019

👍2

All 36 comments

Cases such as DbConnection.CloseAsync(), DbDataReader.CloseAsync(), DbCommand.Cancel() seem to be between the two and we could probably go either way with them.

Don't think default(ValueTask) has any advatange over Task.CompletedTask in this scenario?

/cc @stephentoub

benaadams on 1 Feb 2019

public virtual Task ChangeDatabase(string databaseName, CancellationToken cancellationToken = default);

ChangeDatabaseAsync ?

bgrainger on 1 Feb 2019

Cases such as DbConnection.CloseAsync(), DbDataReader.CloseAsync(), DbCommand.Cancel() seem to be between the two and we could probably go either way with them.

For most providers, that support pooling, it seems like DbConnection.Close[Async] would by default just synchronously return the connection to the pool (and not perform any I/O). Perhaps the ability to return Task.CompletedTask from a (non-asynchronous) implementation means this isn't a problem in practice? Cf. https://github.com/mysql-net/MySqlConnector/issues/467.

For MySQL, DbCommand.Cancel[Async] is always (lots of) network I/O, to send a "kill query" command to the server (and wait for acknowledgement). I would put it firmly in the "likely to be dominated by networking etc." category.

bgrainger on 1 Feb 2019

@benaadams,

Cases such as DbConnection.CloseAsync(), DbDataReader.CloseAsync(), DbCommand.Cancel() seem to be between the two and we could probably go either way with them.

Don't think default(ValueTask) has any advatange over Task.CompletedTask in this scenario?

I agree.. It's a question of whether we expect these operations (command cancellation, physical connections) to occur intensively enough, and whether the I/O involved to be light enough, so that ValueTask would have a perf impact. I agree that it probably doesn't make much sense.

@bgrainger,

ChangeDatabaseAsync?

Yep, corrected...

For most providers, that support pooling, it seems like DbConnection.Close(Async) would by default just synchronously return the connection to the pool (and not perform any I/O). Perhaps the ability to return Task.CompletedTask from a (non-asynchronous) implementation means this isn't a problem in practice? Cf. mysql-net/MySqlConnector#467.

I agree, and like with OpenAsync() it makes sense to me for CloseAsync() to return a Task. It also fits squarely with the idea that calls involving a network roundtrip should return Task (because the Task vs. ValueTask difference is negligible anyway).

But just as a note, the fact that an operation is usually expected to return synchronously doesn't necessarily seem decisive - both Task- and ValueTask- returning methods can have zero overhead in this case. I think it matters more what happens when the operation does return asynchronously, and how often that's expected to occur. Compare with NpgsqlDataReader.Read(), which is also expected synchronously in most cases because rows would typically be buffered in memory - it still seems to make sense to have a version returning ValueTask (not a database roundtrip, but rather continuously parsing results).

For MySQL, DbCommand.Cancel(Async) is always (lots of) network I/O, to send a "kill query" command to the server (and wait for acknowledgement). I would put it firmly in the "likely to be dominated by networking etc." category.

I agree...

roji on 1 Feb 2019

It's a question of whether we expect these operations (command cancellation, physical connections) to occur intensively enough, and whether the I/O involved to be light enough, so that ValueTask would have a perf impact.

Yeah, but then if its instant, you can return a Task.CompletedTask which is no allocations an IntPtr sized to pass around.

ValueTask<T> makes a lot of sense for most scenarios; but the non-generic version ValueTask makes most sense when its IValueTaskSource backed at some point as its 2 x IntPtr in size so can't be passed via register; so can be slower if you aren't taking advantage of its other properties.

benaadams on 1 Feb 2019

👍1

~Note: rather than providing an empty virtual implementation of DisposeAsync(), we probably need to provide the async analogue of the dispose pattern (Dispose(bool) called by both Dispose() and the finalizer). Need to confirm.~

After thinking about this again (and not at 3AM), this doesn't seem to make any sense - finalizers shouldn't be doing I/O or calling async methods, and the classes in question call into the synchronous Dispose(bool).

roji on 7 Feb 2019

Task BeginTransactionAsync: shouldn't it have Task<DbTransaction> return type?

MaceWindu on 9 Feb 2019

Task BeginTransactionAsync: shouldn't it have Task<DbTransaction> return type?

Thanks, yes - corrected.

roji on 9 Feb 2019

Video

@roji @divega

What does it mean to cancel CancelAsync()? Also, how is calling CancelAsync() different from cancelling the cancellation token that was passed to the operation that CancelAsync() would cancel?
Is there a reason why CloseAsync() returns ValueTask? It doesn't seem to on the perf critical path?
Should we really add DbDataReader.CloseAsync() seems odd to add an API that is known to be not useful.

terrajobst on 16 Apr 2019

What does it mean to cancel CancelAsync()?

Unlike some other async cancellations, cancelling an ongoing database operation typically involves a (potentially) heavy networking operation (e.g. establish a new connection to the database). For this reason it may make sense to treat the cancellation as its own async operation that may need to be cancelled (although I admit it may be a bit contrived).

Assuming we decide to keep CancelAsync() (see below), we could drop the cancellation token (although this would be the first case I've heard of where an async method doesn't accept one).

Also, how is calling CancelAsync() different from cancelling the cancellation token that was passed to the operation that CancelAsync() would cancel?

There's at least the possibility of two different cancellation types. simply passing the token down to the socket layer (which now has actual value since https://github.com/dotnet/corefx/issues/24430), vs. triggering the database-specific cancellation mechanism I described above (e.g. establishing a new connection to the database). It's true that there's a lot of ambiguity here though.
It's possible for CancelAsync() to be called on a synchronous operation, in which case there's no token (although it may be contrived for an application to sometimes use sync, sometimes async).
DbCommand already has a synchronous Cancel(), so this adds its async counterpart for consistency.

But I do admit it could make sense to not have CancelAsync() at all, and say that Cancel() is there for sync only, whereas sync has the original cancellation token - that does seem to be a simpler API surface around cancellation. @divega what do you think?

Is there a reason why CloseAsync() returns ValueTask? It doesn't seem to on the perf critical path?

For DbDataReader.CloseAsync() - assuming we actually add it (see next point) - it returns ValueTask for consistency with DisposeAsync(), to which it's very similar. Note that unlike with DbConnection, there's no way to reopen a DbDataReader, so disposing and closing really mean the same thing (unless some provider decides to implement them differently). Since DisposeAsync() must return ValueTask (because of IDisposableAsync), it seems to make sense to return the same from CloseAsync().

For DbConnection.CloseAsync(), I think you're right - will update the proposal.

Should we really add DbDataReader.CloseAsync() seems odd to add an API that is known to be not useful.

It's definitely not absolutely necessary, but:

The sync counterpart Close() already exists, and it seems like it would be inconsistent/surprising to not have it.
There's the possibility that some provider out there decided to do something different for Close()/Dispose(), so they would need CloseAsync() to maintain the same distinction with async.
There doesn't seem to be a big disadvantage in adding CloseAsync() specifically (apart from the general added API surface). Just like with Close()/Dispose(), providers would typically implement one by calling the other.

roji on 16 Apr 2019

Just to clarify one of my points from above: in the current state of things, some providers already implement DbCommand.Cancel() via a database-specific mechanism. It should be possible to trigger that same mechanism asynchronously, but if we do that when the cancellation token is triggered we lose the option to pass it down into the socket layer.

roji on 16 Apr 2019

Note: added DbDataReader.GetStreamAsync() and DbDataReader.GetTextReaderAsync() to the proposal above - async support seems especially important for these column-streaming methods.

/cc @divega

roji on 18 Apr 2019

👍1

Streams can currently be returned by using GetFieldValueAsync<TextReader> etc, will the new methods offer different behaviour or defaults?

Wraith2 on 18 Apr 2019

Since DisposeAsync() must return ValueTask (because of IDisposableAsync), it seems to make sense to return the same from CloseAsync().

Because its basically a pass-through?

Question on mismatch here then

class DbConnection : IAsyncDisposable
{
    public virtual Task CloseAsync(CancellationToken cancellationToken = default);
    public virtual ValueTask DisposeAsync();
}

benaadams on 18 Apr 2019

@Wraith2

Streams can currently be returned by using GetFieldValueAsync

Oh, that is actually new to me (and not supported in Npgsql) - thanks. This makes GetStream() and GetStreamReader() very similar to, say, GetString() in that they're just an alternative to GetFieldValue<T>(). If that's the case, then it indeed doesn't make any sense to introduce async overloads.

Have opened https://github.com/npgsql/npgsql/issues/2446 to track adding this to Npgsql, and will remove GetStreamAsync() and GetTextReaderAsync() from the proposal.

roji on 18 Apr 2019

@benaadams my statement above was about DbDataReader.{Close,Dispose}(), not DbConnection.

Basically:

DbConnection can be opened and closed multiple times. This makes Close() and Dispose() different, so non-passthrough. This is why I've modified DbConnection.Close() to return Task as @terrajobst suggested.
DbDataReader can't be reopened - it's returned by DbCommand.ExecuteReader() and consumed by the user, until it's disposed. Because of this, Close() seems identical to Dispose(), and the only reason to actually add CloseAsync() is for API consistency (because sync Close() already exists). We also never know what peculiar distinction between the two is already implemented by some provider out there, so it seems like a good idea to allow the same in async.

roji on 18 Apr 2019

👍2

Docs for streaming are at https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sqlclient-streaming-support . The GetFieldValueAsync docs don't list the stream types but i know GetFieldValue supports some stream types. It might be worth verifying the details and seeing if it'd be better to use/extend the existing support or use the new methods you suggest.

Wraith2 on 18 Apr 2019

@Wraith2 if you say SqlClient allows GetFieldValueAsync<Stream> (and TextReader), that's enough for me even if it isn't clearly documented. Logically there isn't really much difference between GetStream() and GetString(), and the less new method clutter we introduce, the better...

roji on 18 Apr 2019

Regarding cancellation, after thinking about this a bit more I'm convinced we don't need CancelAsync().

As I wrote above, the main point behind having CancelAsync() was to allow having two cancellation types: a database-specific cancellation which contacts the server and aborts the current command, and a simple client-side cancellation enabled by https://github.com/dotnet/corefx/issues/24430.

However, while a socket cancellation makes sense in many scenarios, it doesn't seem to make sense here: if the socket read (or write) for a currently-executing command is cancelled, pending data is left pending but the logic for reading and parsing that data (i.e. DbCommand.ExecuteReader()) has been cancelled - protocol sync is lost. As a result, it's very difficult to imagine the connection actually being usable after a socket connection. In other words, with ADO.NET specifically a socket cancellation seems to have the same effect as forcibly closing the socket, which is something users can already do via Close().

So here's a summary of what I think we should have:

We have DbCommand.Cancel() for triggering a database-specific cancellation mechanism, which works also for sync operations.
We have the cancellation token passed to DbCommand.ExecuteReaderAsync() (and the others), which does the exact same thing. Both are best effort, no-guarantee APIs
If for some reason the user really wants to stop the I/O operation (e.g. because of a network partition), they should simply close the connection on which the command is executing.

I'll update the proposal for now, but any comments on the above are very welcome.

roji on 18 Apr 2019

@Wraith2 if you say SqlClient allows GetFieldValueAsync

I'm wrong. GetFieldValue and GetfieldValueAsync will allow XmlReader because I added it (per request) in https://github.com/dotnet/corefx/pull/34880 which was in response to https://github.com/dotnet/corefx/issues/30958 by @bricelam and the reasoning that all field types including streams should be accessible through the same overloads. The same could be done with TextReader and Stream to return binary and text data respectively.

There are already GetStream, GetXmlReader and GetTextReader sync versions though so your proposal mirrors those. I think the choice comes down to whether it makes more sense to mirror the existing synchronous api into async versions or whether to take the newer api design permitted by IsDbNull and GetfieldValueAsync. I have no strong feeling either way

Wraith2 on 18 Apr 2019

@Wraith2 thanks for testing this.

I still think that getting Stream and TextReader via GetFieldValue (and GetFieldValueAsync) makes a lot of sense - there's no real reason for these to be different from other types like string or int. The surface API of DbDataReader is already pretty huge, I'd rather avoid adding more methods than we absolutely have to.

roji on 18 Apr 2019

👍1

@roji Remember not all providers are necessarily using ~~TCP/IP~~sockets. Is there no reasonable scenario where closing would need to do async work? On any kind of transport? What about if Close closes a file--could that be async?

ajcvickers on 18 Apr 2019

@ajcvickers it's true that not all providers use socket. However, the proposal is definitely to keep CloseAsync(), but to not introduce a new CancelAsync() method in addition to the original operation's cancellation token, which already exists as a cancellation mechanism. So it seems fine for closing to do async work, and even for the cancellation token to trigger async I/O.

IMHO the question is whether it makes sense to have the ADO.NET API provide two cancellation paths - CancelAsync() and the standard cancellation token mechanism. Even for non-TCP providers, it's difficult to see how that would make sense, and how users would understand which mechanism would do what...

Do you have a specific scenario/direction in mind? I may be missing something...

roji on 18 Apr 2019

@roji Sorry--I misread. I'm fine with there being no CancelAsync.

ajcvickers on 19 Apr 2019

DbDataReader.Close() seems identical to Dispose(), and the only reason to actually add CloseAsync() is for API consistency (because sync Close() already exists). We also never know what peculiar distinction between the two is already implemented by some provider out there, so it seems like a good idea to allow the same in async.

I remember one interesting distinction we discussed with @bricelam while he was working on https://github.com/aspnet/EntityFrameworkCore/pull/15001, is that Dispose() is usually implemented to not throw.

I still think that getting Stream and TextReader via GetFieldValue (and GetFieldValueAsync) makes a lot of sense - there's no real reason for these to be different from other types like string or int. The surface API of DbDataReader is already pretty huge, I'd rather avoid adding more methods than we absolutely have to.

Agreed. Should have a new issue on SqlClient to support more stream types with GetFieldValue/GetFieldValueAsync? Should this be covered by ADO.NET specification tests?

divega on 19 Apr 2019

👍1

Should have a new issue on SqlClient to support more stream types with GetFieldValue/GetFieldValueAsync?

If the area owners are ok with the change then I'll probably pick that up since I did the XmlReader one and know the code in that area. A new tracking issue would be handy.

Wraith2 on 19 Apr 2019

Probably worth asking @saurabh500 if he sees any fundamental problem with it.

divega on 19 Apr 2019

Opened dotnet/SqlClient#27 for the Stream/TextReader question, let's leave it out of this issue for now.

roji on 19 Apr 2019

👍1

@terrajobst I think this is ready for another look - your comments above have been addressed:

CancelAsync() has been removed entirely
DbConnection.CloseAsync() now returns Task instead of ValueTask
After some discussion we'd like to keep DbDataReader.CloseAsync() for both API consistency and other reasons (e.g. provider-specific difference between close and dispose, dispose not throwing where close may, etc.).

roji on 19 Apr 2019

Video

To follow the existing pattern, BeginConnection should be non-virtual and we should add a single BeginDbConnectionAsync that takes an isolation level
Consider for reach API whether you want Task<T> or ValueTask<T>

terrajobst on 14 May 2019

Consider for reach API whether you want Task<T> or ValueTask<T>

Based on the conversation in the API review this morning, I think for all APIs in this set that need to return a value, we can just go with ValueTask <T>. My reasoning is that these are all just “maybe async” APIs: the default implementation falls back to the sync version of the method and for all providers that won’t/can’t implement async this will yield the lowest runtime cost.

divega on 15 May 2019

Based on the conversation in the API review this morning, I think for all APIs in this set that need to return a value, we can just go with ValueTask .

Agreed - although this seems to only include BeginTransactionAsync(), have made the change to the specs above.

I've also done the modifications to have it follow the standard ADO.NET API pattern (protected virtual BeginDbTransactionAsync, public non-virtual BeginTransactionAsync), and have changed DbDataReader.CloseAsync() to return Task instead of ValueTask (no downside, slightly cleaner and more consistent with other non-generic async methods, theoretical small perf advantage as the ValueTask overhead is removed).

I've also removed the blocking-partner label. I think this design can be implemented now, will try to get around to it soon.

roji on 16 May 2019

protected virtual DbTransaction BeginDbTransactionAsync(IsolationLevel isolationLevel, CancellationToken cancellationToken);

Should this be Task<DbTransaction>?

My reasoning is that these are all just “maybe async” APIs: the default implementation falls back to the sync version of the method

I'm sympathetic to this reasoning, but wonder if it makes the most sense to optimise for "old" drivers. Certainly for MySqlConnector, BeginTransactionAsync does perform network I/O to start a transaction on the server. (I imagine this would be true for any driver, since server state has to be changed.) Any driver that implements BeginTransactionAsync with async I/O will necessarily have to return a Task, so ValueTask is now adding unnecessary overhead.

This also fits the pattern of existing APIs (e.g., ExecuteReaderAsync), which execute the sync method and wrap it with Task.FromResult.

bgrainger on 16 May 2019

@stephentoub mentioned that he had a flowchart about decision making on ValueTask which might be helpful to review.

Wraith2 on 16 May 2019

👍2

@bgrainger and @Wraith2 I somehow missed your comments.

One case for ValueTask is in-memory as @Wraith2 mentioned, and possibly also embedded databases (e.g. Sqlite).

However, a more compelling case is an optimization which Npgsql already performs: BeginTransaction() doesn't actually do anything, but rather enqueues the protocol message in an internal buffer - nothing is sent. When the user next executes a query, that query is sent with the transaction start message prepended to it. This saves a roundtrip as the transaction start and the query are effectively batched.

In theory other database drivers can do the same thing (depending on protocol details, of course), and it can be a very valuable optimization. FYI the same trick is done with the "session reset" message that Npgsql sends when a connection is returned to the pool. ValueTask seems justified to support these cases.

roji on 23 May 2019

FYI for all those following, we're proposing to remove the cancellation token from {DbDataReader,DbConnection}.CloseAsync: https://github.com/dotnet/corefx/pull/39070, see discussion in https://github.com/dotnet/standard/pull/1283#pullrequestreview-255383035

roji on 1 Jul 2019

👍1

Was this page helpful?

0 / 5 - 0 ratings