SqlClient's connection resiliency adds 10 seconds to database existence checks. Consider exposing a way to override it at runtime

Created on 21 Dec 2016  路  12Comments  路  Source: dotnet/SqlClient

This is to follow up on a specific issue a customer created today on EF Core at https://github.com/aspnet/EntityFramework/issues/7283, but the problem has been brought to our attention before.

Both EF6 and EF Core provide APIs that allow applications to create and initialize databases, check for their existence and modify their schemas. These operations can happen at design time (e.g. when using the migrations feature) or at runtime. Attempting to connect to a database that doesn't exist is part of the normal flow of these operations.

In .NET Framework 4.5.1 SqlClient introduced a connection resiliency feature that performs automatic retries if failures occur during SqlConnection.Open(). Two new settings ConnectRetryCount and ConnectRetryInterval were added to the list of settings recognized in connection strings and a default behavior was adopted so if the first attempt to connect to a database fails a retry will occur after 10 seconds if these settings are not specified. The same feature is now included in the .NET Core version of SqlClient.

With only the default behavior this feature introduces a lag of 10 second for any calling code that attempts to connect to a database that doesn't exist. It also prevents any calling code from implementing its own (potentially more efficient) retry logic correctly. This severely affects customer experience and can potentially affect runtime performance.

Two main approaches have been proposed to mitigate this issue on EF code, but they have severe disadvantages:

  1. Recommend customers to disable SqlClient's connection resiliency altogether by setting ConnectRetryCount=0 when working EF and make sure the feature is disabled any time EF runtime or tooling creates a connection string, or even throw if an attempt is made to use a connection that has the feature enabled. The main disadvantages of this approach are:

    • It is not discoverable for customers and so users will still experience the 10 second lags (or exceptions if we decide to throw) for the default case.
    • It prevents the connection resiliency feature in SqlClient from being available in scenarios in which it could actually have been helpful.
    • EF6 supports older versions of .NET Framework than 4.5.1 and for those versions these settings are not valid. The logic would need to take the executing version of .NET Framework into account.
  2. Store aside a copy of the original connection string so that we can modify it with ConnectRetryCount = 0 and create a separate SqlConnection object to perform existence checks.

    • The main problem with this approach is that both EF Core and EF6 support passing a SqlConnection object to be used in the EF context. It is possible that the password would have already been removed from the ConnectionString in that connection object if it was open before, so in that case the connection string would not contain enough credentials to be able to create a separate functional connection object.

At this point we believe that if SqlConnection exposed a way to programmatically disable connection retries without requiring the modification of the connection string we could modify EF6 and EF Core code to restore the correct behavior and eliminate 10 second lags. We are happy to discuss other options with the SqlClient team.

Also note that this issue applies to both .NET Core and .NET Framework.

cc @ajcvickers

Note on how EF Core implements its own retry logic to check for database existence:

In general, any code that is calling SqlConnection.Open() can leverage contextual knowledge to make connection retries more efficient. E.g.:

  • When EF Core performs existence checks, it will only retry on certain errors if they occur immediately after a database has been created.
  • For a regular existence checks (i.e. those that don't happen immediately after the database has been created) EF Core assumes that immediate failures coming from a database server mean that the database effectively doesn't exist and can avoid any retry logic.
Enhancement Performance

Most helpful comment

We could introduce an overload of Open and OpenAsync which take in some connection override options instead of providing a set of APIs to target only the Connection Resiliency feature override.

I propose the following

```C#
class SqlConnection {
//... Other APIs
public void Open(SqlConnectionOverrides options);
public Task OpenAsync(SqlConnectionOverrides options);

// Other Apis

}

[Flags]
enum SqlConnectionOverrides : int
{
None = 0,
DisableAzureConnectionResiliency = 1, // This probably requires a better name
// More options can be added to this flag if needed for any other overrides
}
```

All 12 comments

As recently announced in the .NET Blog, focus on new SqlClient features an improvements is moving to the new Microsoft.Data.SqlClient package. For this reason, we are moving this issue to the new repo at https://github.com/dotnet/SqlClient. We will still use https://github.com/dotnet/corefx to track issues on other providers like System.Data.Odbc and System.Data.OleDB, and general ADO.NET and .NET data access issues.

@ajcvickers @roji @saurabh500 @David-Engel

I was looking at this issue seems a simple to do if we can conclude a solution..

Diego's proposition:

"At this point we believe that if SqlConnection exposed a way to programmatically disable connection retries without requiring the modification of the connection string we could modify EF6 and EF Core code to restore the correct behavior and eliminate 10 second lags. We are happy to discuss other options with the SqlClient team."

With my initial investigation, I think we can do this if this is needed. In order to do this, we'd have to expose a SqlConnection property "ConnectRetryCount" (internal reference is already on connection level) and handle it's value change in SqlConnectionPoolKey to make sure it's sent back to the matching connection pool when connection is closed.

Are there any other scenarios/limitations to this change that I'm missing? Opening this old request for discussions/alternative ideas.

Thanks for looking at this again @cheenamalhotra... I think we should consider @divega's 2nd point above:

Store aside a copy of the original connection string so that we can modify it with ConnectRetryCount = 0 and create a separate SqlConnection object to perform existence checks.
The main problem with this approach is that both EF Core and EF6 support passing a SqlConnection object to be used in the EF context. It is possible that the password would have already been removed from the ConnectionString in that connection object if it was open before, so in that case the connection string would not contain enough credentials to be able to create a separate functional connection object.

I also had a version of this exact problem in Npgsql, wher connection string changes are required. My solution was to simply add a CloneWith(string connectionString), which accepts a new connection string to override settings in the old one, and returns a new connection instance. Security settings are carried over from the original connection to the cloned one.

This solution seems better because it works for any connection string parameter, not just the retry policy. For example, it's sometimes necessary to create a non-pooling connection from an existing connection which has already been opened; I do this for database existence checks in the EF Core provider, to make sure an actual physical (unpooled) connection is done.

How does that sound? I'm assuming that the actual credentials are kept inside SqlClient, since IIRC it should be possible today to call regular Clone on an open connection, and open the cloned connection (so the password is kept somewhere). Adding CloneWith (or similar) should be trivial. If we feel this is useful, we could even add this to DbConnection in ADO.NET.

PS Some quick testing seems to show that Microsoft.Data.SqlClient continues to expose the password in the connection string even after Open is called, opened #448.

For an example of how the Npgsql EF Core provider does database connection checks with CloneWith, see this.

Thanks @roji

Store aside a copy of the original connection string so that we can modify it with ConnectRetryCount = 0 and create a separate SqlConnection object to perform existence checks.

That does sound like a good solution to me. This can be better handled at EF layer than for us to manage pool modification properties, in case more properties need similar support in future.

Edit: SqlClient would expose CloneWith method on SqlConnection to clone properties from provided Connection String.

I will look into the new issue, thanks!

Just to be sure we're saying the same thing, what I was describing would require SqlClient to add a CloneWith method. Upper layers such as EF can't fully handle this because they need to support scenarios where they're given an already-opened SqlConnection object and not a connection string.

Taking a little step back here, the real problem is the 10 second hang when the server is reachable but the database does not exist. This would solve all the cases in EF where people get the unwanted delay.

To state this a different way, if the server can't be reached, then it's not unreasonable to spend some time trying to reach it. But if the server can be reached and gives a definitive answer that no database with the given name exists, then this answer should be returned immediately.

@roji I'm very reluctant to start cloning connections. It has been a huge and horrifying can of worms in the past.

We should discuss this in our sync next week. /cc @bricelam

Taking a little step back here, the real problem is the 10 second hang when the server is reachable but the database does not exist. This would solve all the cases in EF where people get the unwanted delay.

Ah, I see - that behavior indeed seems odd. However, if memory serves, right after creating a new database in SQL Server, there's a period of time during which the database still does not exist, as SQL Server is preparing it (or whatever). If that is so, then retrying for a non-existing database does make some sense, by allowing people to connect successfully immediately after creating a new database. But I'm not sure if the above is actually correct.

Regardless of the above, I think CloneWith could be necessary for other purposes, e.g. to disable pooling while performing a database existence check. Our current check in EF Core opens pooled connections (which doesn't actually do anything if there are idle pooled connections). Granted, we do execute SELECT 1 after that - so if the database is down we'd know it - but there some scenarios where executing something on an existing connection could work, while creating a new physical connection could fail. So I think we should ideally disable pooling and open a new connection, an in order to do that we'd need CloneWith. This can be a completely separate issue which I can open.

I'm very reluctant to start cloning connections. It has been a huge and horrifying can of worms in the past.

OK, I guess I don't have that history. This mechanism has worked quite well in Npgsql but may be more complicated for SqlClient, I'd be interested in hearing about the problems you've seen.

We could introduce an overload of Open and OpenAsync which take in some connection override options instead of providing a set of APIs to target only the Connection Resiliency feature override.

I propose the following

```C#
class SqlConnection {
//... Other APIs
public void Open(SqlConnectionOverrides options);
public Task OpenAsync(SqlConnectionOverrides options);

// Other Apis

}

[Flags]
enum SqlConnectionOverrides : int
{
None = 0,
DisableAzureConnectionResiliency = 1, // This probably requires a better name
// More options can be added to this flag if needed for any other overrides
}
```

@roji @ajcvickers
In testing the proposal, I discovered that if you use OpenAsync(), you get a connection failure instantly with the current code (SDS or MDS).

@saurabh500 OpenAsync() looks completely wrong to me from an async perspective. I'd like to discuss it in our next sync to make sure I'm understanding it correctly.

Hi @ajcvickers

Closing the issue as PR #463 addresses the issue.

Was this page helpful?
0 / 5 - 0 ratings