I am using bulk copy to copy 100 million rows from one database to another on the same server. I am seeing terribly poor runtimes with the asynchronous version of WriteToServerAsync compared to the synchronous WriteToServer methods
````sql
-- SQL used to generate sample data
DECLARE @RowAmount AS INT = 100000000;
WITH InfiniteRows (RowNumber) AS (
-- Anchor member definition
SELECT 1 AS RowNumber
UNION ALL
-- Recursive member definition
SELECT a.RowNumber + 1 AS RowNumber
FROM InfiniteRows a
WHERE a.RowNumber < @RowAmount
)
-- Statement that executes the CTE
SELECT CONVERT(VARCHAR(255), NEWID()) AS VAL
INTO temp
FROM InfiniteRows
OPTION (MAXRECURSION 0);
GO
```c#
async Task Main()
{
using (var sqlConnection = new Microsoft.Data.SqlClient.SqlConnection(@"Data Source=.\SQLEXPRESS;Initial Catalog=Scratch;Integrated Security=True"))
{
await sqlConnection.OpenAsync();
using (var command = new Microsoft.Data.SqlClient.SqlCommand("SELECT VAL FROM dbo.temp", sqlConnection))
{
using (var bulk = new Microsoft.Data.SqlClient.SqlBulkCopy(@"Data Source=.\SQLEXPRESS;Initial Catalog=Scratch;Trusted_Connection=True;Integrated Security=True;MultipleActiveResultSets=True", Microsoft.Data.SqlClient.SqlBulkCopyOptions.KeepNulls | Microsoft.Data.SqlClient.SqlBulkCopyOptions.TableLock))
{
bulk.SqlRowsCopied += (o, e) =>
{
Console.WriteLine(e.RowsCopied);
};
bulk.ColumnMappings.Add(new Microsoft.Data.SqlClient.SqlBulkCopyColumnMapping("VAL", "VAL"));
bulk.BulkCopyTimeout = 0;
bulk.BatchSize = 10000;
bulk.EnableStreaming = true;
bulk.DestinationTableName = "temp1";
bulk.NotifyAfter = 500000;
// In an empty table, finishes in 2 Mins 17 Seconds
bulk.WriteToServer(command.ExecuteReader());
// an empty table, finishes in ~9 minutes
await bulk.WriteToServerAsync(await command.ExecuteReaderAsync());
}
}
}
}
I would have expected the Async calls to be atleast comparable to the synchronous calls.
Microsoft.Data.SqlClient version: 2.0.1
.NET target: ,NET Framework 4.7.2
SQL Server version: SQL Server Version 2016 Express
Operating system: Windows 10
Additional context
Add any other context about the problem here.
@rmalhotra85 Thank you for bringing this up to our attention. I will look into the issue today and get back to you soon.
I have checked the issue and was able to repro what you mentioned. We will look into a solution.
Thanks - fwiw, I see the same behavior in System.Data.SqlClient as well (in case that's helpful)
Is this something that will be improved by the changes in https://github.com/dotnet/SqlClient/pull/358 ? If not I can take a look at it (I like playing with profilers.) if you don't already have plans @JRahnama
Since this seems to be async-related, I'd suspect maybe something related to #593 rather than boxing, but who knows...
It could be. Bulk copy is quite separate from the other parts of the api but perhaps they cross over or the same pattern exists separately. I haven't really investigated bulk copy since I have little need for it personally.
Well, I was looking into BulkCopy code for last couple of weeks regarding another issue with Sparse columns. Basically dotnet core is not good with Asynchronous Pattern
, (the ones that function name starts with Begin and End) design and it may throw platform not supported error when trying to call delegates. The base code, for SqlBulkcopy, is calling the same methods with some if/else statement inside it and that needs to be changed to async/await (TAP) specially in dotnet core,
@Wraith2, by all means, give it a try and I really appreciate your help. It is fun working with packets and learning how driver talks to server. Most probably you will need Wireshark to take a look inside the packet you send to the server. Going through MS-TDS documents is the best help we can get. There are explanations about each type of header and identifiers especially with PLP(Partially Length-prefixed) columns.
@roji First I thought the issue comes from OpenAsync call. I did several testing with differrent approaches and it all went back to Bulkcopy. Our final goal should be, especially in dotnet core, to convert all Asynchronous Pattern
which comes from earlier versions of dotnet framework, All EventBased asynchronous patterns to Task-Based Async pattern(TAP) and simplify everything to use async/await keywords and that would be a big change.
@JRahnama I definitely agree it should be all the old-style async code should be replaced with async/await. Aside from significantly increasing code quality, it's likely to improve perf in a significant way. It's no small undertaking, however...
I put it through some profiler runs and no single item sticks out as a problem. There is some wasted memory that can be improved but the top 5 most allocated items are strings chars and task machinery that can't be removed. In cpu terms the cost looks to be async overhead from reading strings which as @roji said is a bit inefficient but since those strings are short it isn't too bad.
The three things to so that spring to mind are:
@rmalhotra85 something that can make the code run a bit quicker in async mode is taking the MARS out of your Bulkcopy Connection string that will make it better, but still slower than sync one. I will keep looking in the issue.