Query/Question
I'm trying to upload a large file (over 10 GB) to Azure Blob Storage using SAS tokens.
I generate the tokens like this:
import java.util.Date
import com.microsoft.azure.storage.{CloudStorageAccount, SharedAccessAccountPolicy}

val storageConnectionString = s"DefaultEndpointsProtocol=https;AccountName=${accountName};AccountKey=${accountKey}"
val storageAccount = CloudStorageAccount.parse(storageConnectionString)
val client = storageAccount.createCloudBlobClient()
val container = client.getContainerReference(CONTAINER_NAME)
val blockBlob = container.getBlockBlobReference(path)
// Account-level SAS policy: all permissions, valid from now
val policy = new SharedAccessAccountPolicy()
policy.setPermissionsFromString("racwdlup")
val startMillis = new Date().getTime
val expiryMillis = startMillis + 8640000 // expiry offset in milliseconds
policy.setSharedAccessStartTime(new Date(startMillis))
policy.setSharedAccessExpiryTime(new Date(expiryMillis))
// Resource types: service, container, object; services: blob, file, queue, table
policy.setResourceTypeFromString("sco")
policy.setServiceFromString("bfqt")
val token = storageAccount.generateSharedAccessSignature(policy)
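A side note on the expiry offset in the snippet above, since it matters for long uploads: 8640000 ms is 2 hours 24 minutes, not a full day (which would be 86400000 ms). The sketch below only does that arithmetic with the JDK. `SasWindow` and its methods are hypothetical names, and the payload size is taken from the Content-Length reported later in this thread, so treat it as an approximation of the file size.

```scala
import java.time.Duration

object SasWindow {
  // Validity window implied by a millisecond offset between start and expiry.
  def window(offsetMillis: Long): Duration = Duration.ofMillis(offsetMillis)

  // Sustained throughput (bytes per second) needed to move `sizeBytes`
  // before a token with the given window expires.
  def minBytesPerSecond(sizeBytes: Long, w: Duration): Double =
    sizeBytes.toDouble / w.getSeconds

  def main(args: Array[String]): Unit = {
    val w = window(8640000L)                       // offset used in the snippet above
    println(s"token valid for ${w.toMinutes} min") // 144 min, i.e. 2 h 24 min
    println(s"one day would be ${Duration.ofDays(1).toMillis} ms")
    // 10114368132 is the Content-Length from the 400 response in this thread,
    // used here as an approximation of the file size.
    println(f"needs at least ${minBytesPerSecond(10114368132L, w) / 1e6}%.2f MB/s")
  }
}
```

If the upload cannot sustain that rate, requests made after the expiry time will be rejected, so the window probably needs to be longer for a file of this size.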
Then I tried the Put Blob API and hit the following error:
$ curl -X PUT -H 'Content-Type: multipart/form-data' -H 'x-ms-date: 2020-09-04' -H 'x-ms-blob-type: BlockBlob' -F [email protected] https://ACCOUNT.blob.core.windows.net/CONTAINER/10gb.csv\?ss\=bfqt\&sig\=.... -v
< HTTP/1.1 413 The request body is too large and exceeds the maximum permissible limit.
< Content-Length: 290
< Content-Type: application/xml
< Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
< x-ms-request-id: f08a1473-301e-006a-4423-837a27000000
< x-ms-version: 2019-02-02
< x-ms-error-code: RequestBodyTooLarge
< Date: Sat, 05 Sep 2020 01:24:35 GMT
* HTTP error before end of send, stop sending
<
<?xml version="1.0" encoding="utf-8"?><Error><Code>RequestBodyTooLarge</Code><Message>The request body is too large and exceeds the maximum permissible limit.
RequestId:f08a1473-301e-006a-4423-837a27000000
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, close notify (256):
Time:2020-09-05T01:24:35.7712576Z</Message><MaxLimit>268435456</MaxLimit></Error>%
After that, I tried uploading it as a PageBlob (the documentation says a page blob can be up to 8 TiB):
$ curl -X PUT -H 'Content-Type: multipart/form-data' -H 'x-ms-date: 2020-09-04' -H 'x-ms-blob-type: PageBlob' -H 'x-ms-blob-content-length: 1099511627776' -F [email protected] https://ACCOUNT.blob.core.windows.net/CONTAINER/10gb.csv\?ss\=bfqt\&sig\=... -v
< HTTP/1.1 400 The value for one of the HTTP headers is not in the correct format.
< Content-Length: 331
< Content-Type: application/xml
< Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
< x-ms-request-id: b00d5c32-101e-0052-3125-83dee7000000
< x-ms-version: 2019-02-02
< x-ms-error-code: InvalidHeaderValue
< Date: Sat, 05 Sep 2020 01:42:24 GMT
* HTTP error before end of send, stop sending
<
<?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidHeaderValue</Code><Message>The value for one of the HTTP headers is not in the correct format.
RequestId:b00d5c32-101e-0052-3125-83dee7000000
* Closing connection 0
* TLSv1.2 (OUT), TLS alert, close notify (256):
Time:2020-09-05T01:42:24.5137237Z</Message><HeaderName>Content-Length</HeaderName><HeaderValue>10114368132</HeaderValue></Error>%
What is the proper way to upload such a large file?
Why is this not a Bug or a feature Request?
I'm just looking for recommendations on how to upload large files.
Setup (please complete the following information if applicable):
lazy val azureLibs = Seq(
"com.azure" % "azure-storage-blob" % "12.7.0",
"com.microsoft.azure" % "azure-storage" % "8.6.5"
)
Hi @dzlab,
Your second request fails because Put Blob on a page blob only creates an empty (zero-length) page blob, so Content-Length must be set to 0. I've pasted the relevant part of the Put Blob documentation you linked:
Content-Length | Required. The length of the request. For a page blob or an append blob, the value of this header must be set to zero, as Put Blob is used only to initialize the blob. To write content to an existing page blob, call Put Page. To write content to an append blob, call Append Block.
To actually upload data to a page blob, call the Put Page REST operation after creating the page blob with Put Blob.
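To make the two-step page blob flow concrete, here is a minimal sketch of the arithmetic involved. It assumes the documented page blob constraints: the blob length and every written range are aligned to 512 bytes, and one Put Page request carries at most 4 MiB. `PageBlobPlan` and its methods are hypothetical names, and the payload size is the Content-Length from the 400 response above, not a measured file size.

```scala
object PageBlobPlan {
  val PageSize: Long = 512L                    // page blobs are addressed in 512-byte pages
  val MaxPutPageBytes: Long = 4L * 1024 * 1024 // assumed per-request Put Page limit (4 MiB)

  // Round a payload size up to the next 512-byte boundary. This is the value
  // to send as x-ms-blob-content-length when creating the blob with Put Blob.
  def alignedLength(payloadBytes: Long): Long =
    ((payloadBytes + PageSize - 1) / PageSize) * PageSize

  // Inclusive (start, end) byte ranges, one per Put Page request.
  def ranges(blobLength: Long, chunk: Long = MaxPutPageBytes): Iterator[(Long, Long)] =
    Iterator.iterate(0L)(_ + chunk)
      .takeWhile(_ < blobLength)
      .map(start => (start, math.min(start + chunk, blobLength) - 1))

  def main(args: Array[String]): Unit = {
    val payload = 10114368132L // Content-Length reported in the 400 response above
    val length = alignedLength(payload)
    println(s"x-ms-blob-content-length: $length")
    ranges(length).take(2).foreach { case (s, e) => println(s"range: bytes=$s-$e") }
    println(s"Put Page requests needed: ${ranges(length).size}")
  }
}
```

At 4 MiB per request a file of this size needs a couple of thousand Put Page calls, which is one reason a block blob is usually the more natural fit for a plain file upload.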
Just out of curiosity, is there a reason you are issuing requests directly to the REST endpoint instead of using an SDK?
@gapra-msft thanks for the clarification.
I'm actually experimenting with whatever option is available for uploading large files.
I first tried multi-part requests, uploading every chunk as a block with CloudBlockBlob.uploadBlock and committing the blocks at the end to form the final blob, but that was slow. I guess I would need to buffer the chunks.
Then I found the SAS token approach. I'm just testing with curl for now, but I will probably end up using the JS SDK on the client side to do the upload.
Do you have any recommendations?
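For the block-based approach described above, a rough sketch of the planning step may help. It assumes the documented block blob limits for the service version seen in the responses (2019-02-02): at most 100 MiB per Put Block and at most 50,000 blocks per blob. `BlockPlan` and its methods are hypothetical names, and the code only computes the plan and the Put Block List body; it does not send any requests.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64

object BlockPlan {
  val MaxBlocks: Int = 50000 // assumed committed-block limit for a block blob

  // Number of Put Block calls needed for a payload at a given block size.
  def blockCount(payloadBytes: Long, blockSize: Long): Long =
    (payloadBytes + blockSize - 1) / blockSize

  // Block IDs must be Base64 strings of equal length within one blob, so the
  // index is zero-padded to a fixed width before encoding. The ID still has
  // to be URL-encoded when placed in the blockid query parameter.
  def blockId(index: Int): String =
    Base64.getEncoder.encodeToString(f"block-$index%06d".getBytes(UTF_8))

  // Request body for Put Block List (PUT <blob-url>?comp=blocklist&<sas>).
  def blockListXml(ids: Seq[String]): String =
    ids.map(id => s"<Latest>$id</Latest>")
      .mkString("""<?xml version="1.0" encoding="utf-8"?><BlockList>""", "", "</BlockList>")

  def main(args: Array[String]): Unit = {
    val payload = 10114368132L         // Content-Length from the 400 response in this thread
    val blockSize = 100L * 1024 * 1024 // 100 MiB per block
    val n = blockCount(payload, blockSize).toInt
    require(n <= MaxBlocks, "block size too small for this payload")
    println(s"$n blocks; first id ${blockId(0)}")
    println(blockListXml((0 until 3).map(blockId)))
  }
}
```

Each block is sent with Put Block (PUT <blob-url>?comp=block&blockid=<id>&<sas>), and blocks can be uploaded in parallel and in any order, because the order in the final Put Block List body determines the blob's content. That parallelism is usually what makes this path faster than uploading blocks one at a time.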
Hi @dzlab
At least in the Java SDK, we have buffered upload methods (upload/uploadWithResponse/uploadFromFile) that let you set the level of concurrency and the block size. I would suggest using those for large uploads so you can tune the parameters to your network speed and data size.
As far as JavaScript goes, I'm not sure whether they have equivalents, but if you ask in their repo, I'm sure they would be happy to answer language-specific questions.
Is there a reason you decided to use a SAS token instead of a different authentication mode?
@gapra-msft Probably sort of off topic. But what would be the disadvantages of using a SAS token?
I wouldn't say there are any disadvantages to using SAS in particular. SAS is helpful when you want to provide fine-grained access to a resource (for example, write access, read access, or some combination of permissions). Here is a doc that might be more helpful.
Please let me know if you have any further questions about it.
So this would be fine-grained permissions over just the storage account, right? Say I want the region of a storage account: would I then have to use the management SDK, which would require Azure AD?
Yes, the SAS token I am referring to can be either for account operations (for example, listing all containers, etc) or resource level operations (like uploading/downloading/listing blobs in a container).
I'm not too familiar with the management SDK so I'm not sure what the different options are for authenticating there.
@gapra-msft Fine-grained control over access to resources is actually what I need, as the upload will be performed by users of our app.
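Since the tokens will be handed to app users, it may be worth comparing what the account SAS from the original question grants with what a block blob upload needs. The sketch below is only a string comparison. `SasScope` is a hypothetical helper, and the "needed" values (Create and Write permissions, the Blob service, the Object resource type) are an assumption about an upload-only client rather than something confirmed in this thread.

```scala
object SasScope {
  // Account SAS permission letters and their meanings.
  val permissionNames: Map[Char, String] = Map(
    'r' -> "Read", 'a' -> "Add", 'c' -> "Create", 'w' -> "Write",
    'd' -> "Delete", 'l' -> "List", 'u' -> "Update", 'p' -> "Process")

  // Letters that are granted but not needed, in their original order.
  def excess(granted: String, needed: String): String =
    granted.filter(c => needed.indexOf(c) < 0)

  def main(args: Array[String]): Unit = {
    // Granted values come from the policy in the original question.
    val extraPermissions = excess("racwdlup", "cw")
    println("unneeded permissions: " + extraPermissions.map(permissionNames).mkString(", "))
    println("unneeded services: " + excess("bfqt", "b"))     // file, queue, table
    println("unneeded resource types: " + excess("sco", "o")) // service, container
  }
}
```

A service SAS generated for a single blob or container would narrow the scope further than any account SAS can.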