When heavy traffic (telemetry messages) hits, I assign a task to each message and run it with async/await. Inside each task there is a long-running method that writes some entities to the DB, and it started throwing the errors below. I cannot use Thread.Sleep or a lock because that would cause lag in the DB. The application is built on .NET Core 2 and writes data to Azure SQL Server.
```
System.AggregateException: One or more errors occurred. (Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.) ---> System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
   at System.Data.Common.ADP.ExceptionWithStackTrace(Exception e)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenAsync(CancellationToken cancellationToken, Boolean errorsExpected)
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.BeginTransactionAsync(IsolationLevel isolationLevel, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.BeginTransactionAsync(CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.ExecuteAsync(Tuple`2 parameters, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.Internal.SqlServerExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(IReadOnlyList`1 entriesToSave, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.DbContext.SaveChangesAsync(Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)
   at TT.Core.Services.ProductionService.UpdateOeeActualProduction(Equipment equipment, EquipmentShift equipmentShift, JobEquipment jobEquipment, Job job, EquipmentTelemetryModel telemetryData) in /src/TT.Core.Services/ProductionService.cs:line 1029
   at TT.Core.Services.ProductionService.UpdateProductionFromTelemetry(EquipmentTelemetryModel telemetryData, Boolean createDefaultJob) in /src/TT.Core.Services/ProductionService.cs:line 751
   --- End of inner exception stack trace ---
   at TT.Core.Services.ProcessingLiveMessagingService.ProcessTelemetryMQTTMessage(String message, Boolean createDefaultJob, Boolean createJobFromTelemetrys) in /src/TT.Core.Services/ProcessingLiveMessagingService.cs:line 165
   at TT.Core.Telemetry.WebJob.TriggerProcessing.ProcessEventsAsync(String item, Boolean createDefaultJob, Boolean createJobFromTelemetry) in /src/TT.Core.Telemetry.WebJob/TriggerProcessing.cs:line 43
   at TT.Core.Telemetry.WebJob.Program.<>c__DisplayClass33_1.d.MoveNext() in /src/TT.Core.Telemetry.WebJob/Program.cs:line 450
---> (Inner Exception #0) System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
   at System.Data.Common.ADP.ExceptionWithStackTrace(Exception e)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenAsync(CancellationToken cancellationToken, Boolean errorsExpected)
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.BeginTransactionAsync(IsolationLevel isolationLevel, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.BeginTransactionAsync(CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.ExecuteAsync(Tuple`2 parameters, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Storage.Internal.SqlServerExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(IReadOnlyList`1 entriesToSave, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.DbContext.SaveChangesAsync(Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)
   at TT.Core.Services.ProductionService.UpdateOeeActualProduction(Equipment equipment, EquipmentShift equipmentShift, JobEquipment jobEquipment, Job job, EquipmentTelemetryModel telemetryData) in /src/TT.Core.Services/ProductionService.cs:line 1029
   at TT.Core.Services.ProductionService.UpdateProductionFromTelemetry(EquipmentTelemetryModel telemetryData, Boolean createDefaultJob) in /src/TT.Core.Services/ProductionService.cs:line 751<---
```
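For reference, the processing path looks roughly like this. This is a simplified sketch with illustrative names, not the actual TT.Core code: one task is started per incoming message, and each task opens its own pooled connection.

```csharp
using System;
using System.Threading.Tasks;

class TelemetrySketch
{
    // Stand-in for the real per-message work, which opens a pooled SQL
    // connection and performs a long-running EF Core SaveChangesAsync.
    static async Task ProcessMessageAsync(string message)
    {
        await Task.Delay(TimeSpan.FromSeconds(30)); // simulates the slow DB write
    }

    // Called once per incoming MQTT message.
    static void HandleIncoming(string message)
    {
        // One unthrottled task per message: during a traffic burst this asks
        // for far more concurrent connections than the pool can hand out.
        _ = Task.Run(() => ProcessMessageAsync(message));
    }
}
```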
@bilalmalik777 If the tasks start and connect as soon as each message comes in, they will all try to obtain/open a connection from the pool immediately. If all connections are busy with the long-running operation (the pool defaults to 100 max), connection.Open() calls will begin to time out (the default connection timeout is 15 seconds) with the exception you noted. You have some options. You can increase the max connection pool size and/or the connection timeout, but that won't scale forever. You can queue the messages and process them with your own thread/job pool whose max size is less than the max connection pool size. You may also want to look into optimizing your long-running method so it doesn't take so long.
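To make the throttling option concrete, here is a minimal sketch (illustrative names; the connection-string values are examples, not recommendations). A SemaphoreSlim caps the number of in-flight handlers below the max pool size, so each handler that runs can actually get a connection:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ThrottledTelemetryProcessor
{
    // Option 1: buy headroom in the connection string (example values only;
    // this helps, but as noted above it won't scale forever).
    const string ConnectionString =
        "Server=tcp:yourserver.database.windows.net;Database=yourdb;" +
        "Max Pool Size=200;Connect Timeout=30";

    // Option 2: cap in-flight handlers below the max pool size so every
    // handler that starts can actually obtain a connection.
    static readonly SemaphoreSlim Throttle = new SemaphoreSlim(50, 50);

    public static async Task HandleMessageAsync(string message)
    {
        await Throttle.WaitAsync();   // excess messages wait here instead of
        try                           // timing out inside connection.Open()
        {
            await ProcessMessageAsync(message); // the long-running DB write
        }
        finally
        {
            Throttle.Release();
        }
    }

    static Task ProcessMessageAsync(string message) => Task.CompletedTask; // stand-in
}
```

The cap (50 here) should stay comfortably below Max Pool Size so other parts of the application can still obtain connections.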
@David-Engel Thanks for your response.
I moved all of the code into a single stored procedure, but now it is throwing the following error. I think the DB connections are not being freed.
```
System.AggregateException: One or more errors occurred. (A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 40 - Could not open a connection to SQL Server)) ---> System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 40 - Could not open a connection to SQL Server)
   at TT.Core.Services.ProcessingLiveMessagingService.ProcessTelemetryMQTTMessage(EquipmentTelemetryModel telemetry, Boolean createDefaultJob, Boolean createJobFromTelemetrys) in /src/TT.Core.Services/ProcessingLiveMessagingService.cs:line 160
   at TT.Core.Telemetry.WebJob.TriggerProcessing.ProcessEventsAsync(EquipmentTelemetryModel telemetry, Boolean createDefaultJob, Boolean createJobFromTelemetry) in /src/TT.Core.Telemetry.WebJob/TriggerProcessing.cs:line 44
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at TT.Core.Telemetry.WebJob.Program.<>c__DisplayClass32_1.b__3() in /src/TT.Core.Telemetry.WebJob/Program.cs:line 438
---> (Inner Exception #0) System.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 40 - Could not open a connection to SQL Server)
   at TT.Core.Services.ProcessingLiveMessagingService.ProcessTelemetryMQTTMessage(EquipmentTelemetryModel telemetry, Boolean createDefaultJob, Boolean createJobFromTelemetrys) in /src/TT.Core.Services/ProcessingLiveMessagingService.cs:line 160
   at TT.Core.Telemetry.WebJob.TriggerProcessing.ProcessEventsAsync(EquipmentTelemetryModel telemetry, Boolean createDefaultJob, Boolean createJobFromTelemetry) in /src/TT.Core.Telemetry.WebJob/TriggerProcessing.cs:line 44
ClientConnectionId:00000000-0000-0000-0000-000000000000<---
```
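The stored-procedure call is wrapped roughly like this (a simplified sketch; the procedure and parameter names are made up for illustration), with using blocks so the connection should go back to the pool even on failure:

```csharp
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

class TelemetryWriter
{
    // Disposing the connection (leaving the using block) is what returns it
    // to the pool; an opened connection that is never disposed is leaked and
    // eventually starves the pool.
    public static async Task SaveTelemetryAsync(string connectionString, string payload)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.UpdateProductionFromTelemetry", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@payload", payload); // illustrative parameter
            await connection.OpenAsync();
            await command.ExecuteNonQueryAsync();
        } // connection goes back to the pool here, even if an exception was thrown
    }
}
```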
One more point I did not mention in the previous comment: we run our application in Kubernetes. Multiple pods are running because the application is built on a microservice architecture. Only one pod has this problem: it handles MQTT, and heavy traffic bursts trigger the issue. Before moving to Kubernetes, we used Azure IoT Hub for telemetry messaging; it worked perfectly, and it is still in use in production and working flawlessly.
We are only one pod away from moving fully onto Kubernetes.
Hi @bilalmalik777
We would request that you test the issue with the latest GA version of the driver and confirm whether it is still reproducible. We made some fixes in async connection pooling behavior, so we are wondering if this is related.
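For reference, a direct package reference in the csproj overrides the driver version EF Core pulls in transitively (the version below is a placeholder; use the latest GA release of System.Data.SqlClient, the driver your stack trace shows):

```xml
<!-- Illustrative csproj fragment; the version is a placeholder. -->
<ItemGroup>
  <PackageReference Include="System.Data.SqlClient" Version="4.8.6" />
</ItemGroup>
```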
Thanks!
@bilalmalik777 any updates on this issue?
Are you still experiencing this problem with the latest driver?
@cheenamalhotra, I am on annual leave from the office; I will confirm in a couple of weeks.
Closing as inactive and addressed.
@cheenamalhotra My apologies, I did not update you regarding the issue. It was resolved some time ago. The problem actually had nothing to do with SqlClient; the issue was something else.