Azure-sdk-for-go: CreateBlockBlobFromReader fails due to missing content length parameter

Created on 18 May 2017 · 25 Comments · Source: Azure/azure-sdk-for-go

Usage:
blob.CreateBlockBlobFromReader(reader, nil)

Output:
storage: service returned error: StatusCode=411, ErrorCode=MissingContentLengthHeader, ErrorMessage=Content-Length HTTP header is missing

fmt.Printf("Content length: %d", blob.Properties.ContentLength) - this outputs 0

All 25 comments

PR #627

Howdy @radu-matei,

I'm investigating this issue now, thanks for bringing this to our attention. I'll get back to you ASAP on how we should move forward.

My gut reaction is that this should be solved by better calculating blob.Properties.ContentLength, not making that calculation the responsibility of the caller.

-Martin

For the life of me, I'm not sure why our client.go is even setting Content-Length. It seems to me that if we just leave it unset, the HTTP stack in Go will correctly set it.

@radu-matei, can you confirm that this issue is fixed by v10.0.3-beta?

@marstr I just ran into this issue, running

- name: github.com/azure/azure-sdk-for-go
  version: 26132835cbefa2669a306b777f34b929b56aa0a2

so I think the answer to your question is no

@Blackbaud-ChrisJenkins - until there is a fix in the SDK you can pass the size yourself, as in https://github.com/Azure/azure-sdk-for-go/pull/627

Any updates on this issue? I am unable to use the SDK because of it (I have to use my fork instead)

Howdy, sorry I didn't get to this earlier. I've been on vacation without internet access for the last 5 days or so. @mcardosos and I will look into this today and get another patch out to really get this issue fixed.

To mitigate this, @mcardosos wrote up another Pull Request that counts the size of the blob before submitting a request, you can see it in #638. In the long run, I really think this is something that should get pushed off on the Go standard library as much as possible. However, we'll do this in the spirit of unblocking folks.

How about 10.0.4-beta, has that addressed this issue?

it's working for me now, thanks!

Awesome! closing this then :)

What happens if the reader I pass is huge?
I see that you do:

var buf bytes.Buffer
n, err = io.Copy(&buf, blob)

That means reading the entire reader into memory, just to compute the content length.
If azure allows uploads up to ~4TB in size, shouldn't the sdk allow that too?

The whole Go philosophy of Readers is to stream data, not read it all at once (think Unix pipes). Otherwise I'd simply send the whole byte array, not an io.Reader.

Not to make service comparisons, but aws-s3 sdk takes the approach of reading from the Reader, chunking the input every N bytes and uploading the parts much like what the AppendBlob would do.

Our use case is streaming database backups (several GBs) to azure on the fly (without saving to disk), so ReadSeeker is not an option either.

What do you guys think?
Many thanks!

@fermin-silva I'd like to understand more about why a ReadSeeker wouldn't work for your case. I ask because we're looking at updating the storage APIs to take ReadSeekers in a future version in order to simplify transient failure retry logic without the SDK having to perform any buffering.

Hi @jhendrixMSFT, because I can't seek a stream whose content is not stored anywhere (neither on disk nor in memory), and whose data is not fully generated when the upload starts.

Imagine I have a multipart stream coming into my server and I want to pass it to Azure, or I'm invoking a subcommand and taking the standard output of that program (mysqldump, pg_dump, imagemagick, whatever). Or perhaps my binary program could read from stdin, do some magic, and then upload to Azure. Storing all the intermediate data is in most cases not an option (let alone storing it in memory).

Going back to the aws-s3 approach, the user is in charge of choosing the upload part size when the reader is not seekable. If the reader is seekable, then the SDK can calculate it accordingly. If one chooses a very small part size and the whole upload ends up being more than 10k chunks or so, the upload fails.

Haven't dug very deep in the azure java sdk, but it appears to me it supports this scenario: https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/CloudBlockBlob.java#L629 using the if (useOpenWrite) {

Thanks for the fast reply guys, especially on a Friday :)

Thanks for the explanation. I think there is some confusion here with this API, it maps directly to https://docs.microsoft.com/en-us/rest/api/storageservices/put-blob which when used for initial block blob creation has a 256MB size limit.
Today we don't have something comparable to the Java API you cited which uploads a file in its entirety, breaking it into smaller chunks as required (we plan to add this functionality in a future release).

It was also confusing to me because the Append Blob APIs are implemented too, so I thought they could be used as well. But those have a maximum part size of 4MB, so large files (more than 10k parts) are not supported either.

So, to recap, nowadays there's no way of uploading large files to azure with this SDK? Should I split the file in 256MB blobs myself? But then the download and re-assembly of those blobs to a single file seem a bit cumbersome.

Thanks again!

I really think there should be a consistent way of uploading (larger) files.

In the case of VHDs, you can use azure-vhd-utils, but then you are stuck either using an older version of this SDK or having two versions of this SDK, both of which are in my opinion really bad ideas.

Is there a clear way of doing this?
Thanks!

@fermin-silva @radu-matei The current implementation requires you to split a large file into multiple chunks for upload. For download you shouldn't have to perform any assembling of blocks; you can use the Get API.
We will be adding convenience methods to simplify uploading of large blobs in a (not too distant) future release.

Azure Storage blobs and AWS do not allow an arbitrarily large file to be uploaded in one HTTP operation. Therefore, uploading in blocks is required by both services. Furthermore, I/O operations can always fail (for many reasons), and therefore I/O operations must be retried. In order to retry an upload, the data must be in a memory buffer and there must be a way to seek back to the beginning of the buffer. So, for data coming in from a non-seekable source, that data must first be buffered in memory (a seekable source) before the upload to Azure Storage is initiated. This is the required, mandatory building block. By the way, each upload block operation only needs the size (content length) of the block, not the length of the full file. Once Azure's PutBlockList operation is called, Azure will assemble all the blocks into a single blob and set the full-size content length automatically.

Now, on top of this building block, a function can easily be implemented that streams a large amount of data by splitting it into blocks, uploading each block, and then, after the last block, calling PutBlockList to assemble the full-size blob. I don't know if the current Go SDK for Azure Storage has this function, but our future Go SDK will definitely have it.

Thank you for your response.
Is there any open issue that I can subscribe to, so I know when this will be live? Maybe https://github.com/Azure/azure-sdk-for-go/issues/551 ?

Otherwise I can create one for you

Not at this time. But we will make an announcement on the Azure blog when our new Go SDK is in preview.

Just for future reference, this is the preview you were talking about right?

https://azure.microsoft.com/en-us/blog/preview-the-new-azure-storage-sdk-for-go-storage-sdks-roadmap/

Thanks

Yes, this is our new blob SDK for Go.

-- Jeffrey Richter