Hi, I've had an issue with uploading large files to blob storage which is blocking me.
When uploading a large file from a stream, the stream appears to restart without any exception being thrown. Because the file is large, the restarts mean it never fully uploads.
Sample code streaming from an SFTP directory to Azure:
(using Azure.Storage.Blobs 12.6.0, dotnet core 3.1, running locally on Windows 10)
await using (var stream = sftp.OpenRead(message.FilePath))
{
    var progressHandler = new Progress<long>();
    progressHandler.ProgressChanged += (_, uploadedBytes) =>
    {
        Log.Information("Streaming file {Name}. {BytesUploaded} of {TotalBytes} bytes transferred ({Percentage}) after {Duration}ms.",
            message.FileName, uploadedBytes, fileSizeInBytes, ((double)uploadedBytes / fileSizeInBytes).ToString("P"), timer.ElapsedMilliseconds);
    };

    await blob.UploadAsync(stream, progressHandler: progressHandler);
}
The same issue occurs when first streaming the whole file into memory.
var stream = new MemoryStream();
sftp.DownloadFile(message.FilePath, stream, delegate (ulong bytesTransferred)
{
    Log.Information("Downloading file {Name}. {BytesDownloaded} of {TotalBytes} bytes transferred ({Percentage}) after {Duration}ms.",
        message.FileName, bytesTransferred, fileSizeInBytes, ((double)bytesTransferred / fileSizeInBytes).ToString("P"), timer.ElapsedMilliseconds);
});

stream.Seek(0, SeekOrigin.Begin);

var progressHandler = new Progress<long>();
progressHandler.ProgressChanged += (_, uploadedBytes) =>
{
    Log.Information("Uploading file {Name}. {BytesUploaded} of {TotalBytes} bytes transferred ({Percentage}) after {Duration}ms.",
        message.FileName, uploadedBytes, fileSizeInBytes, ((double)uploadedBytes / fileSizeInBytes).ToString("P"), timer.ElapsedMilliseconds);
};

await blob.UploadAsync(stream, progressHandler: progressHandler);
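Aside: one variation I'm planning to try is pinning the chunking and parallelism explicitly, in case the defaults play a part. A minimal sketch, assuming the BlobUploadOptions/StorageTransferOptions overload in the 12.x packages (not verified against 12.6.0 exactly):

// Hypothetical variant: chunk the upload explicitly via StorageTransferOptions.
await blob.UploadAsync(stream, new BlobUploadOptions
{
    ProgressHandler = progressHandler,
    TransferOptions = new StorageTransferOptions
    {
        MaximumConcurrency = 1,               // one block in flight at a time
        MaximumTransferSize = 8 * 1024 * 1024 // stage in 8 MiB blocks
    }
});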
When looking at the logging output I see:
[18:45:42 INF] Streaming file large_file.zip. 50069504 of 238035600 bytes transferred (21.03%) after 99611ms.
[18:45:42 INF] Streaming file large_file.zip. 50200576 of 238035600 bytes transferred (21.09%) after 99883ms.
[18:45:43 INF] Streaming file large_file.zip. 50331648 of 238035600 bytes transferred (21.14%) after 100287ms.
[18:45:44 INF] Streaming file large_file.zip. 0 of 238035600 bytes transferred (0.00%) after 101325ms.
[18:45:44 INF] Streaming file large_file.zip. 131072 of 238035600 bytes transferred (0.06%) after 101789ms.
[18:45:45 INF] Streaming file large_file.zip. 262144 of 238035600 bytes transferred (0.11%) after 102212ms.
[18:45:45 INF] Streaming file large_file.zip. 393216 of 238035600 bytes transferred (0.17%) after 102625ms.
Is there anything I can try to gain further insight into what error is occurring and causing the restart? I'd have assumed error handling and "PutBlock" retries would be handled by the SDK, given these methods seem to have been removed from the latest package.
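In the meantime I'm going to try capturing the SDK's own diagnostic events to see the underlying requests and retries. A minimal sketch, assuming Azure.Core's AzureEventSourceListener (my assumption, not something from the docs for this exact scenario):

// Echo the Azure SDK's request/response/retry events to the console while reproducing.
using System.Diagnostics.Tracing;
using Azure.Core.Diagnostics;

using var azureSdkLog = AzureEventSourceListener.CreateConsoleLogger(EventLevel.Verbose);
// ...run the upload as above; each request, response and retry should now be logged.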
Many thanks,
For completeness, the question has also been asked on StackOverflow: https://stackoverflow.com/questions/64668945/streaming-large-files-to-azure-blob-storage-stream-restarting
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.
Approaching this from a new angle, this time attempting to write the file using the BlockBlobClient, as detailed in the following tests: https://github.com/Azure/azure-sdk-for-net/blob/master/sdk/storage/Azure.Storage.Blobs/tests/BlockBlobClientTests.cs
Interestingly, I hit the same issue during upload (block 11 of 25), although this time I'm able to catch a 400 InvalidBlobOrBlock error from the server:
The specified blob or block content is invalid.
RequestId:6b1744eb-f01e-0044-7190-b25542000000
Time:2020-11-04T09:58:15.2075900Z
Status: 400 (The specified blob or block content is invalid.)
ErrorCode: InvalidBlobOrBlock
Headers:
Server: Windows-Azure-Blob/1.0,Microsoft-HTTPAPI/2.0
x-ms-request-id: 6b1744eb-f01e-0044-7190-b25542000000
x-ms-client-request-id: 40697463-8818-4421-a4ba-b2a3745326fb
x-ms-version: 2019-12-12
x-ms-error-code: InvalidBlobOrBlock
Date: Wed, 04 Nov 2020 09:58:14 GMT
Content-Length: 234
Content-Type: application/xml
Latest code for completeness:
var client = new BlockBlobClient(_storageOptions.ConnectionString, _storageOptions.Container, message.FileName);

const int blockSize = 10000000; // ~10 MB per block
var blockIndex = 0;
var totalBlocks = (int)Math.Ceiling(fileSizeInBytes / (double)blockSize);
var committedBlocks = new List<string>(totalBlocks);

var progressHandler = new Progress<long>();
progressHandler.ProgressChanged += (_, uploadedBytes) =>
{
    Log.Information("Uploading file {Name} block {Index} of {TotalBlocks}. {BytesUploaded} of {TotalBytes} bytes transferred ({Percentage}) after {Duration}ms.",
        message.FileName,
        committedBlocks.Count + 1,
        totalBlocks,
        uploadedBytes,
        blockSize,
        ((double)uploadedBytes / blockSize).ToString("P"),
        timer.ElapsedMilliseconds);
};

try
{
    await using var stream = sftp.OpenRead(message.FilePath);
    var buffer = new byte[blockSize];
    int bytesRead;
    // Read may return fewer bytes than requested (e.g. the final block),
    // so only stage the bytes actually read.
    while ((bytesRead = stream.Read(buffer, 0, blockSize)) > 0)
    {
        var blockId = Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes($"{message.FileName}_{blockIndex}"));
        await client.StageBlockAsync(blockId, new MemoryStream(buffer, 0, bytesRead), progressHandler: progressHandler);
        committedBlocks.Add(blockId);
        blockIndex++;
    }
    await client.CommitBlockListAsync(committedBlocks);
}
catch (Exception ex)
{
    Log.Error(ex, "Unable to upload file.");
    throw;
}
Is there something I'm doing fundamentally wrong here?
Any help would be much appreciated.
It would appear that ensuring my blockId is always the same length has fixed the above 400 Bad Request error, as described in the following blog: https://gauravmantri.com/2013/05/18/windows-azure-blob-storage-dealing-with-the-specified-blob-or-block-content-is-invalid-error/
Could this also be an underlying issue with the blob.UploadAsync(stream); method?
For clarity:
From:
var blockId = Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes($"{message.FileName}_{blockIndex}"));
To:
var blockId = Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes($"{blockIndex:D4}")); // zero-padding the integer 0000-9999
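For anyone hitting the same thing: the service requires every block ID within a blob to be the same length, which is why the counter fails once it gains a digit (block index 9 to 10). A GUID-based ID sidesteps the counter-width limit entirely; a minimal sketch with a hypothetical helper name:

// Hypothetical helper: a GUID always encodes to the same-length base64 string,
// so the block count isn't capped by the width of a zero-padded counter.
static string NewBlockId() =>
    Convert.ToBase64String(Guid.NewGuid().ToByteArray());

The commit order still comes from the committedBlocks list passed to CommitBlockListAsync, so random IDs are safe here.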
Hi,
Thanks for bringing this issue to our attention.
There are two issues I want to address. The first is the restart of the upload: I will have to look into reproducing this and see why we are restarting the upload (e.g. whether there was a network interruption, a random 500 error in the middle of the upload, or something odd happening when reading from the stream).
The second issue I find very interesting: that the block ID always has to be the same length when uploading, or else an InvalidBlobOrBlock error occurs. I will write a specific test for this and let you know what I find.
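Something along these lines (just a sketch with hypothetical fixture names, assuming NUnit as in our other storage tests, not the actual test in the repo):

[Test]
public async Task StageBlockAsync_BlockIdsWithDifferentLengths()
{
    // Stage two blocks whose base64 IDs decode to different lengths
    // and expect InvalidBlobOrBlock on the second.
    BlockBlobClient blob = _containerClient.GetBlockBlobClient("block-id-length-test");
    var data = new byte[1024];

    string shortId = Convert.ToBase64String(Encoding.UTF8.GetBytes("1"));
    string longId = Convert.ToBase64String(Encoding.UTF8.GetBytes("10"));

    await blob.StageBlockAsync(shortId, new MemoryStream(data));

    RequestFailedException ex = Assert.ThrowsAsync<RequestFailedException>(
        async () => await blob.StageBlockAsync(longId, new MemoryStream(data)));
    Assert.AreEqual("InvalidBlobOrBlock", ex.ErrorCode);
}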