When heavy traffic (telemetry messages) hits, I assign a task to each message and run it with async/await. Inside each task there is a long-running method that writes some entities to the DB, and it started throwing the errors below. I cannot use Thread.Sleep or a lock because that would cause lag in the DB. The application is built on .NET Core 2 and writes data to Azure SQL Server.
```
System.AggregateException: One or more errors occurred. (Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.) ---> System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
   at System.Data.Common.ADP.ExceptionWithStackTrace(Exception e)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenAsync(CancellationToken cancellationToken, Boolean errorsExpected)
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.BeginTransactionAsync(IsolationLevel isolationLevel, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.BeginTransactionAsync(CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.ExecuteAsync(Tuple`2 parameters, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.Internal.SqlServerExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(IReadOnlyList`1 entriesToSave, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.DbContext.SaveChangesAsync(Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)
   at TT.Core.Services.ProductionService.UpdateOeeActualProduction(Equipment equipment, EquipmentShift equipmentShift, JobEquipment jobEquipment, Job job, EquipmentTelemetryModel telemetryData) in /src/TT.Core.Services/ProductionService.cs:line 1029
   at TT.Core.Services.ProductionService.UpdateProductionFromTelemetry(EquipmentTelemetryModel telemetryData, Boolean createDefaultJob) in /src/TT.Core.Services/ProductionService.cs:line 751
   --- End of inner exception stack trace ---
   at TT.Core.Services.ProcessingLiveMessagingService.ProcessTelemetryMQTTMessage(String message, Boolean createDefaultJob, Boolean createJobFromTelemetrys) in /src/TT.Core.Services/ProcessingLiveMessagingService.cs:line 165
   at TT.Core.Telemetry.WebJob.TriggerProcessing.ProcessEventsAsync(String item, Boolean createDefaultJob, Boolean createJobFromTelemetry) in /src/TT.Core.Telemetry.WebJob/TriggerProcessing.cs:line 43
   at TT.Core.Telemetry.WebJob.Program.<>c__DisplayClass33_1.d.MoveNext() in /src/TT.Core.Telemetry.WebJob/Program.cs:line 450
---> (Inner Exception #0) System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
   at System.Data.Common.ADP.ExceptionWithStackTrace(Exception e)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenAsync(CancellationToken cancellationToken, Boolean errorsExpected)
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.BeginTransactionAsync(IsolationLevel isolationLevel, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.BeginTransactionAsync(CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.ExecuteAsync(Tuple`2 parameters, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.Internal.SqlServerExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(IReadOnlyList`1 entriesToSave, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.DbContext.SaveChangesAsync(Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)
   at TT.Core.Services.ProductionService.UpdateOeeActualProduction(Equipment equipment, EquipmentShift equipmentShift, JobEquipment jobEquipment, Job job, EquipmentTelemetryModel telemetryData) in /src/TT.Core.Services/ProductionService.cs:line 1029
   at TT.Core.Services.ProductionService.UpdateProductionFromTelemetry(EquipmentTelemetryModel telemetryData, Boolean createDefaultJob) in /src/TT.Core.Services/ProductionService.cs:line 751<---
```
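For reference, the processing path looks roughly like this. This is a simplified sketch with illustrative names, not the actual TT.Core code: one task is started per incoming message, and each task opens its own pooled connection.

```csharp
using System;
using System.Threading.Tasks;

class TelemetrySketch
{
    // Stand-in for the real per-message work, which opens a pooled SQL
    // connection and performs a long-running EF Core SaveChangesAsync.
    static async Task ProcessMessageAsync(string message)
    {
        await Task.Delay(TimeSpan.FromSeconds(30)); // simulates the slow DB write
    }

    // Called once per incoming MQTT message.
    static void HandleIncoming(string message)
    {
        // One unthrottled task per message: during a traffic burst this asks
        // for far more concurrent connections than the pool can hand out.
        _ = Task.Run(() => ProcessMessageAsync(message));
    }
}
```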
@bilalmalik777 If the tasks start and connect as soon as each message comes in, they will all try to obtain/open a connection from the pool immediately. If all connections are busy with the long-running operation (the pool defaults to 100 max), connection.Open() calls will begin to time out (the default connection timeout is 15 seconds) with the exception you noted. You have some options. You can increase the max connection pool size and/or the connection timeout, but that won't scale forever. You can queue the messages and process them with your own thread/job pool whose max size is less than the max connection pool size. You may also want to look into optimizing your long-running method so it doesn't take so long.
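To make the throttling option concrete, here is a minimal sketch (illustrative names; the connection-string values are examples, not recommendations). A SemaphoreSlim caps the number of in-flight handlers below the max pool size, so each handler that runs can actually get a connection:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ThrottledTelemetryProcessor
{
    // Option 1: buy headroom in the connection string (example values only;
    // this helps, but as noted above it won't scale forever).
    const string ConnectionString =
        "Server=tcp:yourserver.database.windows.net;Database=yourdb;" +
        "Max Pool Size=200;Connect Timeout=30";

    // Option 2: cap in-flight handlers below the max pool size so every
    // handler that starts can actually obtain a connection.
    static readonly SemaphoreSlim Throttle = new SemaphoreSlim(50, 50);

    public static async Task HandleMessageAsync(string message)
    {
        await Throttle.WaitAsync();   // excess messages wait here instead of
        try                           // timing out inside connection.Open()
        {
            await ProcessMessageAsync(message); // the long-running DB write
        }
        finally
        {
            Throttle.Release();
        }
    }

    static Task ProcessMessageAsync(string message) => Task.CompletedTask; // stand-in
}
```

The cap (50 here) should stay comfortably below Max Pool Size so other parts of the application can still obtain connections.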
@David-Engel Thanks for your response.
I moved all of the code into a single stored procedure, but now it is throwing the following error. I think the DB connections are not being freed.
```
System.AggregateException: One or more errors occurred. (A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 40 - Could not open a connection to SQL Server)) ---> System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 40 - Could not open a connection to SQL Server)
   at TT.Core.Services.ProcessingLiveMessagingService.ProcessTelemetryMQTTMessage(EquipmentTelemetryModel telemetry, Boolean createDefaultJob, Boolean createJobFromTelemetrys) in /src/TT.Core.Services/ProcessingLiveMessagingService.cs:line 160
   at TT.Core.Telemetry.WebJob.TriggerProcessing.ProcessEventsAsync(EquipmentTelemetryModel telemetry, Boolean createDefaultJob, Boolean createJobFromTelemetry) in /src/TT.Core.Telemetry.WebJob/TriggerProcessing.cs:line 44
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at TT.Core.Telemetry.WebJob.Program.<>c__DisplayClass32_1.b__3() in /src/TT.Core.Telemetry.WebJob/Program.cs:line 438
---> (Inner Exception #0) System.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 40 - Could not open a connection to SQL Server)
   at TT.Core.Services.ProcessingLiveMessagingService.ProcessTelemetryMQTTMessage(EquipmentTelemetryModel telemetry, Boolean createDefaultJob, Boolean createJobFromTelemetrys) in /src/TT.Core.Services/ProcessingLiveMessagingService.cs:line 160
   at TT.Core.Telemetry.WebJob.TriggerProcessing.ProcessEventsAsync(EquipmentTelemetryModel telemetry, Boolean createDefaultJob, Boolean createJobFromTelemetry) in /src/TT.Core.Telemetry.WebJob/TriggerProcessing.cs:line 44
ClientConnectionId:00000000-0000-0000-0000-000000000000<---
```
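The stored-procedure call is wrapped roughly like this (a simplified sketch; the procedure and parameter names are made up for illustration), with using blocks so the connection should go back to the pool even on failure:

```csharp
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

class TelemetryWriter
{
    // Disposing the connection (leaving the using block) is what returns it
    // to the pool; an opened connection that is never disposed is leaked and
    // eventually starves the pool.
    public static async Task SaveTelemetryAsync(string connectionString, string payload)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.UpdateProductionFromTelemetry", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@payload", payload); // illustrative parameter
            await connection.OpenAsync();
            await command.ExecuteNonQueryAsync();
        } // connection goes back to the pool here, even if an exception was thrown
    }
}
```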
One more point I did not mention in the previous comment: we run our application in Kubernetes. Multiple pods are running because the application is built on a microservice architecture. Only one pod has this problem: it handles MQTT, and heavy traffic bursts trigger the issue. Before moving to Kubernetes, we used Azure IoT Hub for telemetry messaging; it worked perfectly, and it is still in use in production and working flawlessly.
We are only one pod away from moving fully onto Kubernetes.
Hi @bilalmalik777
We would request that you test the issue with the latest GA version of the driver and confirm whether it is still reproducible. We made some fixes in async connection pooling behavior, so we are wondering if this is related.
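For reference, a direct package reference in the csproj overrides the driver version EF Core pulls in transitively (the version below is a placeholder; use the latest GA release of System.Data.SqlClient, the driver your stack trace shows):

```xml
<!-- Illustrative csproj fragment; the version is a placeholder. -->
<ItemGroup>
  <PackageReference Include="System.Data.SqlClient" Version="4.8.6" />
</ItemGroup>
```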
Thanks!
@bilalmalik777 any updates on this issue?
Are you still experiencing this problem with the latest driver?
@cheenamalhotra, I am on annual leave from the office; I will confirm in a couple of weeks.
Closing as inactive and addressed.
@cheenamalhotra My apologies, I did not update you regarding the issue. It was resolved some time ago. The problem actually had nothing to do with SqlClient; the issue was something else.