Sqlclient: NullReference exception in TdsParserStateObjectManaged.ReadAsyncCallback

Created on 1 Oct 2020  路  12Comments  路  Source: dotnet/SqlClient

Describe the bug

Since I migrated my api to use Microsoft.Data.SqlClient I sometime encounter NullReferenceException in the logs of my api.
The issue seems totally random.
It doesn't cause a crash during a request because the exception seems to occur between requests on the GC thread.

Exception message:
A Task's exception(s) were not observed either by Waiting on the Task or accessing its Exception property. 
As a result, the unobserved exception was rethrown by the finalizer thread. (Object reference not set to an instance of an object.)

Stack trace:
at void Microsoft.Data.SqlClient.SNI.TdsParserStateObjectManaged.ReadAsyncCallback(SNIPacket packet, uint error)
at void System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, object state)
at void System.Threading.Tasks.Task.ExecuteWithThreadLocal(ref Task currentTaskSlot, Thread threadPoolThread)

To reproduce

Not found a reliable way to reproduce it.

Expected behavior

No exception.

Further technical details

Microsoft.Data.SqlClient version: 2.0.1
.NET target: Core 3.1
SQL Server version: Azure Sql Database
Operating system: Docker Container
Connection string: tcp, connection timeout 30

Additional context
The NullReference exception seems to occur here
https://github.com/dotnet/SqlClient/blob/d14c0b1196de6271b36251d7aff63cfeadcad799/src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObjectManaged.cs#L75

This object is set to null in the dispose method
https://github.com/dotnet/SqlClient/blob/d14c0b1196de6271b36251d7aff63cfeadcad799/src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObjectManaged.cs#L93

The problem seems to be caused by the callback that is executed after the handle disposal.
From the code the callback never seems to be detached.

Most helpful comment

I have also not seen this error for 4 days since upgrading the SqlClient package where it would previously occur at least daily.

All 12 comments

Hi @kvpt ,

When you encounter this issue, have you enabled MARS in your connection string?

Ideally, we can help to debug this locally if any repo can be provided. If not, can you try to enable the EventSource to get more information when this happens?

Though it looks like for .NET Core only, I just would like to confirm with you that this doesn't happen for .NET Framework, is it correct?

Thanks,

Hi @karinazhou,

I used the default connection string provided from the azure portal for the azure sql database, so MARS is not enabled.

I can't give you a repo to test but if I can have a link to a procedure to enable debug or trace or something else I can do it.

The application run only on NetCore, inside a linux container on alpine image running inside a kubernetes cluster.
Previously the application was running on NetCore 2.2, System.Data.SqlClient 4.8.2 and EntityFrameworkCore 2.2.6 in the same environment without issue.
The issue started after the migration to NetCore 3.1, Microsoft.Data.SqlClient 2.0.1 and EntityFrameworkCore 3.1.8.

Thanks

@kvpt One thing you can try is to use perfcollect tool with your application in the docker environment. The instructions can be found collecting-in-a-docker-container. It will generate a zip file and it can be opened in PerfView on Windows. Ideally, it will contain the eventsource information.

Another option you can also try is to add EventListener to your running application. Here is a simple example of a sample EventListener. Basically, you need to add the sample EventListener to your application and add a new instance to your application to capture the EventSource information. For more examples, please refer to this. By redirecting all the out messages to another TextWriter will help for investigation.

Thanks,

Hi @karinazhou,
I added the EventListener in my app and configured it as mentioned (had to disable app trimming for it to work but it's ok for the debug)
Here the export of the logs of the faulted query : ExportLog.xlsx
From what I see in the logs the pattern is always the same, each time the exception happen it's when a request timeout.

@kvpt Thank you for the update. We will look into the log.

Hi @kvpt ,

I go through the trace log you provided before and I don't see anything strange from the log. The first several connections work normally and when the network is down, the connection in use is broken and disposed.

I tried to simulate your scenario on my ubuntu vm with a simple program which keeps opening connection, running query, and closing connection in async mode for several times. When running the program, I manually fail the network and it gives me the exception
System.IO.IOException: Unable to read data from the transport connection: Connection reset by peer.
and
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 35 - An internal exception was caught)

When the exception is hit, the connection is broken and then be doomed and disposed. My trace log has the same pattern as what you have. However, I don't see NullReferenceException during my testing. I think we still need a repro to investigate more on this.

Ubuntu-18.10-02.log

Thanks,

I have also been seeing these exceptions in my application:

System.NullReferenceException: Object reference not set to an instance of an object.
at Microsoft.Data.SqlClient.SNI.TdsParserStateObjectManaged.ReadAsyncCallback(Microsoft.Data.SqlClient.SNI.SNIPacket packet, System.UInt32 error) at offset 30
at System.Threading.Tasks.ContinuationTaskFromResultTask`1.InnerInvoke() at offset 67
at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread threadPoolThread, System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state) at offset 76
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task& currentTaskSlot, System.Threading.Thread threadPoolThread) at offset 292

I am using Microsoft.Data.SqlClient 2.0.1 and also running in a Docker container.

Hi @vln104

Could you try your application code with latest SqlClient version 2.1.0 we released recently that contains Null check on _sessionHandle (mentioned in issue description) and let us know?

@cheenamalhotra I deployed the last package update today and no error for me since the deployment.
I will continue to monitor it next week as the issue not occur every day, but always several times during a week.
I will come back to you at the end of next week.

Hi @vln104

Could you try your application code with latest SqlClient version 2.1.0 we released recently that contains Null check on _sessionHandle (mentioned in issue description) and let us know?

Hi,

I can update the package in my dev environment and observe for a few days to try and report back by the end of the week.

Thanks,

Linh

@cheenamalhotra After a week in production, I confirm that this issue no longer occurs.

I have also not seen this error for 4 days since upgrading the SqlClient package where it would previously occur at least daily.

Was this page helpful?
0 / 5 - 0 ratings