Sqlclient: EF core 3/SqlConnection with MultipleActiveResultSets=True is very slow in performance across network

Created on 1 Apr 2020  路  16Comments  路  Source: dotnet/SqlClient

Describe the bug

We are observing performance issues while calling sql server using EF core 3.1.2 or plain simple SqlConnection. Our current web app is hosted in West US region on Azure and the sql server is at West US 2. The moment we change the connection string to not having _MultipleActiveResultSets=True_, it just works fine. This does not happen if sql & app are deployed in same region. Sometimes its failing with below exception.

Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
 ---> System.ComponentModel.Win32Exception (258): Unknown error 258
   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at Microsoft.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransactionYukon(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction(TransactionRequest transactionRequest, String name, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlInternalConnection.BeginSqlTransaction(IsolationLevel iso, String transactionName, Boolean shouldReconnect)
   at Microsoft.Data.SqlClient.SqlConnection.BeginTransaction(IsolationLevel iso, String transactionName)
   at Microsoft.Data.SqlClient.SqlConnection.BeginTransaction(IsolationLevel iso)
   at reprocli.Program.Scenario4(String connString, Int32 number)
   at reprocli.Program.<>c__DisplayClass0_0.<Main>b__0(Int32 n)
   at System.Linq.Parallel.ForAllOperator`1.ForAllEnumerator`1.MoveNext(TInput& currentElement, Int32& currentKey)
   at System.Linq.Parallel.ForAllSpoolingTask`2.SpoolingWork()
   at System.Linq.Parallel.SpoolingTaskBase.Work()
   at System.Linq.Parallel.QueryTask.BaseWork(Object unused)
   at System.Linq.Parallel.QueryTask.<>c.<.cctor>b__10_0(Object o)
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__274_0(Object obj)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
ClientConnectionId:005d2aae-9409-4711-aaa0-b03b70f2832e
Error Number:-2,State:0,Class:11
ClientConnectionId before routing:e3300799-fdd0-40a4-84ea-b9f383596b12
Routing Destination:fed2c41af7dc.tr5.westus2-a.worker.database.windows.net,11063<---

To reproduce

Just create a SqlConnection and call it from a console app. My table has ~100 thousands rows.

```c#
static void Main(string[] args)
{
Console.WriteLine(DateTime.UtcNow);
string constring = @"<>";
using (SqlConnection con = new SqlConnection(constring))
{
using (SqlCommand cmd = new SqlCommand("SELECT * FROM YourTable", con))
{
cmd.CommandType = CommandType.Text;
using (SqlDataAdapter sda = new SqlDataAdapter(cmd))
{
using (DataTable dt = new DataTable())
{
sda.Fill(dt);
}
}
}
}
Console.WriteLine(DateTime.UtcNow);

        Console.WriteLine(DateTime.UtcNow);
        constring = @"<<connection string without mars>>";
        using (SqlConnection con = new SqlConnection(constring))
        {
            using (SqlCommand cmd = new SqlCommand("SELECT * FROM YourTable", con))
            {
                cmd.CommandType = CommandType.Text;
                using (SqlDataAdapter sda = new SqlDataAdapter(cmd))
                {
                    using (DataTable dt = new DataTable())
                    {
                        sda.Fill(dt);
                    }
                }
            }
        }
        Console.WriteLine(DateTime.UtcNow);
        Console.ReadLine();
    }

```

Expected behavior

It should take same time to execute both commands, the time difference is huge in this case.
image

Further technical details

System.Data.SqlClient version: 4.8.1 OR M.D.SqlClient 1.1.1
.NET target: netcoreapp3.1
SQL Server version: Azure SQL
Operating system: Hosted on Azure Web App on Windows

Most helpful comment

I looked more into this and I agree with @Wraith2 this seems to be due to back-and-forth network communication. Configuring Packet Size would definitely benefit in this case.

I modified your example to perform warm up of connection pool that removes other lags, and tested on local server with 100k rows of data (using table schema as provided), I don't see any considerable difference between Mars and Non-Mars in both .NET Framework and .NET Core, below are results:

----- SQL Server 2019 -----
MARS + Packet Size 12k : 857 ms
MARS  : 631 ms
NON MARS  : 630 ms

----- Azure SQL DB -----
MARS + Packet Size 12k : 4885 ms
MARS : 7162 ms
NON MARS : 795 ms

I believe this issue can be closed.

Understood. Its still a significant difference between MARS and non-MARS, 1 sec vs 5 secs, but I get the reason :)

----- Azure SQL DB -----
MARS + Packet Size 12k : 4885 ms
MARS : 7162 ms
NON MARS : 795 ms

All 16 comments

There is no M.D.SqlClient version 4.8.1

There is no M.D.SqlClient version 4.8.1

I guess I missed it, We are using "System.Data.SqlClient" 4.8.1

And same behavior for M.D.SqlClient 1.1.1. My table has ~100 thousands rows.

Hi @sanjaydebnath

I do see that difference, but it's not as bad when testing locally on Windows.
We get below results:

---- NET CORE ----
Elapsed Time in milliseconds (MARS): 1399
Elapsed Time in milliseconds (Non-MARS): 289

---- NET FX ----
Elapsed Time in milliseconds (MARS): 666
Elapsed Time in milliseconds (Non-MARS): 288

Seems like in App Service this difference increases by many folds, which should be investigated.
We should investigate this issue eitherway, since although there are differences between netcore/netfx, but when using Native SNI, we should not see double execution time with MARS enabled.

My table has ~100 thousands rows.

That explains the big difference. I tested with a simple SELECT @@VERSION so maybe we should look into DataAdapter area in NETCore.

But the basic connection is also slow, so that too I believe.

As always with netcore perf issues, if you can get me a captive reproduction against northwind or a scratch database I can profile it and see if there's anything obvious I can do.

Hi @sanjaydebnath

Could you post your table metadata information here to investigate datatypes in processing.
Or Provide a SQL Script for table in use.

Hi @sanjaydebnath

Could you post your table metadata information here to investigate datatypes in processing.
Or Provide a SQL Script for table in use.

It's a history enabled table.

CREATE TABLE [dbo].MyTable ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
WITH
(
SYSTEM_VERSIONING = ON ( HISTORY_TABLE = [dbo].[MyTableHistory] )
)
GO

I've done some poking around and I think what you're seeing is an unavoidable artefact of how mars works. Each mars enabled packet can hold 8 bytes less data than a standard packet, since the packet length cannot vary until the end of the stream this means that each full packet in a non-mars connection spans two packets in a mars connection. That spanning increases wait times for each packet which is affected by the roundtrip time you experience so it effectively magnifies latency.

Can you try increasing your packet size to 12 K and see if the perceived delay is lower?

increasing your packet size to 12 K

Thanks for the details. I'm not sure how can we do that on a cloud/Azure env (current setup). The web app is hosted on Azure Web App and DB is on Azure SQL. Both are not inside dedicated private network.

It's a property of the connection string Packet Size you just set it to the value in bytes you want.

It's a property of the connection string Packet Size you just set it to the value in bytes you want.

Cool, thanks. Let me try that.

It's a property of the connection string Packet Size you just set it to the value in bytes you want.

Better with higher pkt size. but still way more than non MARS connection string for a select of 100K rows.

image

Ok. So that's better. The maximum packet size you can use with a secured connection (all azure connection are ssl enabled) is 16K-8 bytes as far as I remember, if it errors take a few chunks of 8 bytes off until it starts working and you're at max supported packet size.

If I'm right (and the evidence of my test suggests I'm either right or lucky) then you're stuck. There isn't a way to engineer around the way mars packets are constructed and received. My suggestion would always be to turn of mars because it sucks for performance and complicates the network layer a lot. If you can't turn off mars see if you can reduce the amount of data transferred. Less data needs less packets so you'll get it faster. If you can't do either of those things then use the larger packet size but make sure you leave a note for the person that maintains the code telling them why you're using the larger packet size.

Ok. So that's better. The maximum packet size you can use with a secured connection (all azure connection are ssl enabled) is 16K-8 bytes as far as I remember, if it errors take a few chunks of 8 bytes off until it starts working and you're at max supported packet size.

If I'm right (and the evidence of my test suggests I'm either right or lucky) then you're stuck. There isn't a way to engineer around the way mars packets are constructed and received. My suggestion would always be to turn of mars because it sucks for performance and complicates the network layer a lot. If you can't turn off mars see if you can reduce the amount of data transferred. Less data needs less packets so you'll get it faster. If you can't do either of those things then use the larger packet size but make sure you leave a note for the person that maintains the code telling them why you're using the larger packet size.

Ya I don't need MARS for all queries but a few. The data size is big for some 'Get' cases and for now my work around is keep two connection strings in Key Vault & switch accordingly (while creating EF Core context) depending on query, which seems working fine for now. So for all 'Get' calls I use non-MARS connection string except a couple of places where we need to call stored proc that returns multiple result sets (need MARS). So at least 95% 'get/select' are faster now.

I looked more into this and I agree with @Wraith2 this seems to be due to back-and-forth network communication. Configuring Packet Size would definitely benefit in this case.

I modified your example to perform warm up of connection pool that removes other lags, and tested on local server with 100k rows of data (using table schema as provided), I don't see any considerable difference between Mars and Non-Mars in both .NET Framework and .NET Core, below are results:

----- SQL Server 2019 -----
MARS + Packet Size 12k : 857 ms
MARS  : 631 ms
NON MARS  : 630 ms

----- Azure SQL DB -----
MARS + Packet Size 12k : 4885 ms
MARS : 7162 ms
NON MARS : 795 ms

I believe this issue can be closed.

I looked more into this and I agree with @Wraith2 this seems to be due to back-and-forth network communication. Configuring Packet Size would definitely benefit in this case.

I modified your example to perform warm up of connection pool that removes other lags, and tested on local server with 100k rows of data (using table schema as provided), I don't see any considerable difference between Mars and Non-Mars in both .NET Framework and .NET Core, below are results:

----- SQL Server 2019 -----
MARS + Packet Size 12k : 857 ms
MARS  : 631 ms
NON MARS  : 630 ms

----- Azure SQL DB -----
MARS + Packet Size 12k : 4885 ms
MARS : 7162 ms
NON MARS : 795 ms

I believe this issue can be closed.

Understood. Its still a significant difference between MARS and non-MARS, 1 sec vs 5 secs, but I get the reason :)

----- Azure SQL DB -----
MARS + Packet Size 12k : 4885 ms
MARS : 7162 ms
NON MARS : 795 ms

Was this page helpful?
0 / 5 - 0 ratings