Sqlclient: SqlCommand.Cancel() does not work on Linux

Created on 17 May 2019  路  15Comments  路  Source: dotnet/SqlClient

SqlCommand.Cancel() doesn't appear to work using System.Data.SqlClient on Linux (Ubuntu 19.04, .NET Core SDK 3.0.100-preview5-011568) - the command runs to completion, and only then throws SqlException. The issue does not seem to repro on Windows (with .NET Core).

Repro:

```c#
using (var cmd = new SqlCommand("WAITFOR DELAY '00:00:05'", conn))
{
// sync cancellation
Task.Delay(1000).ContinueWith(t => cmd.Cancel());
var sw = Stopwatch.StartNew();
try
{
cmd.ExecuteNonQuery();
}
catch (Exception e)
{
Console.WriteLine($"Sync cancellation: caught {e.GetType().Name} after {sw.ElapsedMilliseconds}ms");
// Does not interrupt the call (completes after 5 seconds) but still generates SqlException at the end
}

// async cancellation
sw.Restart();
try
{
    await cmd.ExecuteNonQueryAsync(new CancellationTokenSource(1000).Token);
}
catch (Exception e)
{
    Console.WriteLine($"Async cancellation: caught {e.GetType().Name} after {sw.ElapsedMilliseconds}ms");
    // Interrupts and generates SqlException immediately
}

}
```

/cc @divega @karinazhou

Most helpful comment

I've tried the idea above. It works as far as I can tell. The full manual test suite runs through without unusual problems (TVPMain really need fixing at some point).

It's a bit annoying fix because of the current packet cloning which I've fixed in the managed networking rework PR. It will also be blocked by the same issue as that PR https://github.com/dotnet/corefx/pull/35363 where an unrelated test that fails anyway will continue to fail so it won't get merged. So once things get unblocked I'll be able to fix this.

All 15 comments

Thank you for repro steps. We will investigate more on this one.

Triage: This would be nice to fix for 1.0, but may not rise above other GA tasks at this point.

FWIW it may be worth confirming whether this somehow stems from a more low-level issue (i.e. some sort of corefx networking issue) as opposed to an SqlClient-issue.

It's a managed implementation issue. I've reproduced it on windows using your test script above and have added a new test.

There are lock(this) statements in the managed send and receive methods on SniTCPHandle. So what happens is you start a query which ultimately tracks down into Receive which takes the lock and then from another thread you call Cancel which tracks down into SendAttention and that tries to Send and blocks on the lock. When the receive times out the lock is released the send works, the attention is processed and everything unravels in the correct order just a lot later that you want it to.

I think send and receive aren't mutually exclusive but that each one is not concurrent with itself. Changing the lock to be operation specific consumes a bit of object space but there shouldn't be too many tcp handles around since they're reused. It will need some careful testing because locking in this library is definitely "here be dragons" territory.

@Wraith2 your explanation makes perfect sense, and I can see how allowing concurrent send/receive is not to be done lightly... :)

My 2 c. Send and receive are mutually exclusive. All sends are also mutually exclusive except for the Attention packet.
Good catch @Wraith2
Cancel ATTN packet is the only packet in the protocol which can be sent out of band.

This is a bug, and I can understand based on @Wraith2 explanation why this is happening. What is the fix?

Based on your assertions i think the fix is to change the Send implementation to require the lock unless the packet is detectably OOB (ATTN is the only one of these i know of)) from the packet header.

image

Uploading a snippet from the TDS protocol to make this clearer.

I would say, the client should try to send the attention packet ASAP, by inserting it into the TDS stream, as soon as it finishes writing the current packet.

All sends are also mutually exclusive except for the Attention packet.

To refine this further, all requests are send independently of each other except for ATTN packet which can be interleaved in between other packets for an ongoing request.

So i think my proposal is to have two locks and these rules.

1) all in-band send and receive operations MUST aquire the in-band lock (currently Monitor.Enter(this))
2) out-of-band communications MAY aquire (would be Monitor.TryEnter(this, out aquired) and detect attn packet header flag)
2) all send operations MUST aquire the send lock (new _sendLock, lock(_sendLock))

Just be careful you don't end up in a state where you're writing an out-of-band cancellation, without having acquired the lock, just as a user send starts - with the two interfering with one another. I'm not sure of the cross-OS guarantees with regards to atomicity of a single write (i.e. is a ~read~ write guaranteed to complete without interference even if another write is in progress).

FYI in PostgreSQL the protocol obviates this kind of problem by making clients send cancellations on a totally separate TCP connection (although that sometimes creates its own difficulties).

Just be careful you don't end up in a state where you're writing an out-of-band cancellation, without having acquired the lock, just as a user send starts - with the two interfering with one another

I don't think it would be possible with my suggested scheme. the attn would not have the in-band lock but must have the send lock, the first in-band would finish and release the in-band lock, another send could then get the in-band lock but would always block on the send lock which can only complete when the attn has been sent.

I've tried the idea above. It works as far as I can tell. The full manual test suite runs through without unusual problems (TVPMain really need fixing at some point).

It's a bit annoying fix because of the current packet cloning which I've fixed in the managed networking rework PR. It will also be blocked by the same issue as that PR https://github.com/dotnet/corefx/pull/35363 where an unrelated test that fails anyway will continue to fail so it won't get merged. So once things get unblocked I'll be able to fix this.

@Wraith2 great to hear. Note @divega's comment here - we do have async cancellation working, it's probably worth understanding how/why. If we have totally different mechanisms for sync/async cancellation (which we probably do as one is buggy and not the other), it may be worth exploring merging them at least to some extent (unless there's a good reason).

It may also be worth working on this issue and on #44 together, as both are cancellation-related.

Was this page helpful?
0 / 5 - 0 ratings