Hi folks! Apologies if this doesn't qualify as an issue but posting it as a last resort after spending hours looking for a definitive answer to -
Does the SDK support, or plan to support, multipart upload using presigned PUT URLs?
I didn't find anything like that in the library documentation. The S3 blog has a post about Managed Uploads, which intelligently chunks the upload, but that doesn't seem to have any parameter for a signed URL.
The only thing I could find is #468, which is about a couple of years old. Libraries like EvaporateJS and this older library by @Yuriy-Leonov seem to support it. That makes me think this SDK might support it as well (which would be much preferable), but I was unable to confirm that.
And if it doesn't support it today, is this something that is on the radar? And if not, is there a recommended way of implementing it (_the chunking of large files and the signing process for the different parts_)?
Thank you!
@oyeanuj
Are you trying to perform a multipart upload from a browser? The Managed Uploader you linked to accomplishes that, and works directly with File objects. You can also use it indirectly by calling s3.upload. s3.upload will perform a multipart upload behind the scenes if your file is larger than 5 MB.
The libraries you linked to don't appear to be using presigned urls to handle multipart uploads. EvaporateJS does ask for a signingUrl, but this is actually a url to a service you host that returns a v4 signature, which isn't the same as a presigned url.
Can you provide some more feedback on what you're trying to accomplish with the SDK?
@chrisradek thank you for responding!
My use case was uploading a file from the client, without sending it through the server side (Ruby). I was using it in the context of React/Redux, so I didn't want to deal with getting forms through createPresignedPost. From my research, it seemed the simplest and most often recommended way to do that was generating a presigned URL to make a PUT request from the client.
Since the question yesterday, I chatted with @jeskew and @dinvlad, and it seems that if I wanted to do multipart without sending the files through my server, and without createPresignedPost, I'd have to use an STS token (which seems a bit more complicated than creating a presigned_url).
So, at this moment, I am doing a simple upload without chunking or multipart support. But I'd love to be able to do that, since the presigned URL method feels the cleanest way to upload, and I will soon need to upload files up to 2 GB. So FWIW, I'd love to put in a vote for that in your backlog.
(_and yes, you are right that those libraries require a server-side signature, my bad_)
I have a similar use case where I am using a pre-signed URL to upload a large zip file from the client side without interacting with the server. Multipart upload would be ideal. I agree with @oyeanuj that the presigned URL method feels the cleanest way to upload, and multipart support for it would be ideal.
@ssshah5 Presigned URLs for a multipart upload aren't something that a client library could offer on its own -- you would need to coordinate between the client and server to get the appropriate URLs signed -- but I'll mark this as a feature request. In the meantime, you might want to look at using the createPresignedPost method to construct an HTML form. That will allow the browser to manage access to the filesystem and upload the file as a multipart form.
@jeskew - Thank you for the details and for escalating this as a feature request. For our use case: the client reaches out to the server to get a pre-signed URL in order to PUT objects to object storage. The server then generates the URL (using the S3 node module) and returns it to the client-side code (bash). The client-side code then tries to use this single URL to upload the entire object. Since the size of the object can be huge, we are using the HTTP streaming data mechanism (chunked transfer encoding). However, this doesn't seem to work with S3 object storage, and I get an error back (Content-Length is missing), probably because it expects the transfer to send the ending bits at the end of the first chunk. This scenario works fine with Swift object storage, where a single temporary URL allows transmitting streaming data without generating temporary (pre-signed) URLs for each chunk.
Do you think it would be possible to upload chunked data using a single pre-signed URL?
Thanks
@ssshah5 - S3 offers a mechanism for chunked transfer encoding, but it requires that each chunk be individually signed and that the length of the complete object be known beforehand.
@jeskew — I used to use the block blob upload feature of Azure with a signed URI. It was really convenient for uploading large files from the client side, directly into storage.
How it works:
1° you generate a signed URI on the server side with write access
2° the client splits the file, assigns a UUID to every chunk, and uploads them using the same signed URI
3° the client sends the list of UUIDs and Azure re-creates the file from the chunks sent in 2°
If I understand your last post correctly, this is not possible on Amazon? (because every single PUT chunk request must be individually signed)
If that's the case, how can we upload big files, from the client side, directly to S3, without sending the key to the clients?
For reference of the Azure API mentioned above, a blob storage client can be created with a presigned url (shared-access signature): https://github.com/Azure/azure-storage-node/blob/master/browser/azure-storage.blob.export.js#L35
i.e. with that, you can do a normal upload which handles the chunking and sends the list of parts automatically: https://github.com/Azure/azure-storage-node/blob/master/lib/services/blob/blobservice.browser.js#L70
Really hope aws-sdk-js can provide this feature.
Just encountered this; surprised to see there's no way in the SDK to leverage pre-signed URLs.
The reason is so we can offer multi-part upload, from the web browser, but of course keep AWS keys server-side only. How else is this supposed to work? Thanks!
Agreed. This would be a great feature. I was trying to find a way to do it and found this post. Seems like it is not doable today.
Agree, this IS a VERY needed feature. Hope we can see it available soon.
I managed to achieve this in a serverless architecture by creating a Canonical Request for each part upload using Signature Version 4. You will find the document here: https://sandyghai.github.io/AWS-S3-Multipart-Upload-Using-Presigned-Url/
do you have a code example? the instructions aren't really that clear in my case.
I was also looking for this and I ended up using STS to generate temporary security tokens for my client locked down to the particular bucket and path that I wanted to give them access to.
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_request.html
I found this video on youtube about it.
https://www.youtube.com/watch?v=4_csSXc_GNU
Perhaps that will help someone looking at this issue.
I am in the same boat; my use case is also partial uploads of big files on the JS client side. I want people to be able to resume uploads if they lose their connection, without losing all previously uploaded chunks. And I don't want to expose any credentials (thus not using the SDK on the client).
~I will update this comment once I solve it.~
UPDATE: following @sandyghai guide, I was able to do it.
There may be syntax errors, as my backend does not use Express, but I felt writing it _à la_ Express would help other devs understand it more easily.
Context: I have an API (behind auth obviously) to which users can send files, and it uploads them to S3. As I didn't want to set IAM for each user of my app, nor put the SDK in the front-end, I decided to go with a back-end authorized approach.
```js
app.post('/upload', async (req, res) => {
  let UploadId = req.body.UploadId;
  const params = {
    Bucket: 'my-bucket-name',
    Key: req.body.filename
  };
  // Initialize the multipart upload - no need to do it on the client (although you can)
  if (req.body.part === 1) {
    const createRequest = await s3.createMultipartUpload(params).promise();
    UploadId = createRequest.UploadId;
  }
  // Save the UploadId in your front-end, you will need it.
  // Also sending the uploadPart pre-signed URL for this part
  res.send({
    signedURL: s3.getSignedUrl('uploadPart', {
      ...params,
      UploadId, // required when presigning uploadPart
      Expires: 60 * 60 * 24, // this is optional, but I find 24hs very useful
      PartNumber: req.body.part
    }),
    UploadId,
    ...params
  });
});
```
```js
app.post('/upload-complete', async (req, res) => {
  const UploadId = req.body.UploadId;
  const params = {
    Bucket: 'my-bucket-name',
    Key: req.body.filename
  };
  const data = await s3.completeMultipartUpload({
    ...params,
    MultipartUpload: {
      Parts: req.body.parts // [{ PartNumber, ETag }, ...]
    },
    UploadId
  }).promise();
  // data = {
  //   Bucket: "my-bucket-name",
  //   ETag: "some-hash",
  //   Key: "filename.ext",
  //   Location: "https://my-bucket-name.s3.amazonaws.com/filename.ext"
  // }
  res.send({
    ...data
  });
});
```
TL;DR: it is possible, so feel free to close the ticket, IMHO.
Hi friends!
I realized that this was a topic that did not have much documentation, so I made a demo repo in case anyone wanted to reference my implementation of multipart+presigned uploads to S3.
https://github.com/prestonlimlianjie/aws-s3-multipart-presigned-upload
Do you have front end and back end working code for your solution?
I have already posted the back-end code.
The front-end doesn't do anything special, just a fetch with method PUT, passing the binary buffer as the body.
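That front-end piece could look something like this (a sketch, not the actual code from the thread; the chunk size, helper names, and error handling are my own assumptions):

```javascript
// 10 MB parts; S3's minimum part size is 5 MB, except for the last part.
const CHUNK_SIZE = 10 * 1024 * 1024;

// Split a File/Blob into part-sized slices without reading it into memory.
function sliceFile(file, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  for (let start = 0; start < file.size; start += chunkSize) {
    chunks.push(file.slice(start, start + chunkSize));
  }
  return chunks;
}

// PUT one chunk to its presigned URL and return the ETag, which the
// completeMultipartUpload call needs later.
async function uploadPart(signedURL, chunk) {
  const res = await fetch(signedURL, { method: 'PUT', body: chunk });
  if (!res.ok) throw new Error(`Part upload failed: ${res.status}`);
  return res.headers.get('ETag');
}
```

Note that reading the ETag response header in a browser requires the bucket's CORS configuration to expose it (`ExposeHeaders: ETag`).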
@tomasdev , thank you very much for your back-end API example code!
If I read your back-end code correctly, for every individual part the client makes a call to your API to get a new signed URL. So it seems you do have some relatively specific client code beyond a standard file upload going on. As @shawnly suggested, it would be helpful to share it.
I think what we all want, though, is a more direct support of S3 Multipart Upload in the browser using a pre-signed URL such that a single signed URL could be used for the entire upload process of a single file regardless of number of parts.
A process that must make an API call to our own API between every chunk would surely slow the otherwise direct S3 upload way down. Kind of defeats some of the genius of "direct upload to S3 from browser".
kudos also to @sandyghai for his work and sharing it with us.
:coffee:
That's not how multipart uploads work, you'd need authentication on each request.
My front-end is within an Electron app, so it uses fs to read files in chunks, and I can't share it due to legal contracts with my company. But it should be doable with a FileReader API stream like https://github.com/maxogden/filereader-stream
That's not how multipart uploads work, you'd need authentication on each request.
I understand that. Consider that today, the S3 JavaScript SDK supports a multipart upload. It makes it very simple for the user; the user does not have to manage the individual parts--that's hidden in the SDK. But it only works with your actual credentials. The desire, therefore, is for the SDK to have a method that can do the same thing, but accept a pre-signed URL for the auth. If they wanted to, AWS could support a signed URL that authenticates a single file ID and all its parts.
In the meantime, I am going to try the approach used by you and sandyghai, but with a twist. My thought is to add a "reqCount" param to my custom API that is responsible for making the s3.getSignedUrl() call. I'll go into a loop and generate multiple signed URLs, adding 1 to the part number each time. This way, my API can, for example, do s3.createMultipartUpload(), and return 10 signed URLs to my client--one each for parts 1 - 10. This would cut down my API calls by a factor of 10.
Better yet, it would be trivial to use file.size to estimate how many parts will be needed. This would allow me to initiate the upload and return all signed URLs for all parts in a single request to my custom API.
Of course once all parts are uploaded, I need to make an additional call to my custom API to do s3.completeMultipartUpload().
What do you think of this approach? :coffee:
I've just encountered this issue as well; my only solution for now is to use STS to create a temporary set of credentials, assuming an upload-only role with a policy restricting it to the sole object location. I further went on to add a condition which restricts based on the requester's IP address for good measure.
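That IP restriction can be expressed as a condition in the session policy, roughly like this (the bucket, prefix, and address are placeholders):

```javascript
// One statement of an STS session policy: allow uploads to a single prefix,
// but only from the client's IP address, captured server-side when the
// temporary credentials are issued.
const statement = {
  Effect: 'Allow',
  Action: ['s3:PutObject'],
  Resource: ['arn:aws:s3:::my-bucket/uploads/*'],
  Condition: {
    IpAddress: { 'aws:SourceIp': '203.0.113.10/32' } // the requester's IP
  }
};
```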
@TroyWolf Do you have a working code for this ? I am trying to build the exact same thing.
@mohit-sentieo , I did get the solution working exactly as I described earlier in this thread!
Since the parts can be uploaded in any order, I also developed code that watches the transfer rate and using some basic math, I spin up simultaneous uploads up to 8 at a time--keeping an eye on the transfer rate. If it starts to slow down significantly, I scale back down to say 2 uploads at a time--this is all done in the browser.
The beautiful thing is I don't need any server resources to deal with large files because the file parts are uploaded direct from client browser to S3.
I REALLY wanted to come back here with a tidy code solution to share with the community, but it's a lot of pieces. It would take me many hours to turn it into something I can share, and even then I'm not sure it would be clear enough for most folks. That, combined with the fact that demand for this solution is apparently very low, has kept me from it--note that you and @McSheps are the only ones asking about this here in the last 8 months.
I am willing to help you, though. :coffee:
> That combined with the fact the demand for this solution is apparently very low--note you and @McSheps are the only ones asking about this here in the last 8 months.
@TroyWolf
Github discourages commenting just for the sake of +1ing feature requests. In my opinion, many people used the +1 reaction on the initial post during the last 8 months, so this feature is still actively demanded.
Your solution is brilliant, though it's very low-level and needs a lot of work and testing to be production-ready. You are basically reimplementing multipart upload strategies to optimize data transfer rates!
In this case, I prefer the STS approach mentioned in earlier posts: just generate a set of temporary credentials for a sub path in the bucket. The youtube video already linked above is quite simple to follow.
But then, you need to authorize a whole folder for each upload, which may be a little more than what you want to allow to your clients, even if you organize them so that existing files are not included in the policy (like, create a folder with a unique name for each upload).
> In my opinion, many people used the +1 reaction on the initial post during the last 8 months, so this feature is still actively demanded.
I failed to notice those +1 reactions. You are correct, @madmox
Any updates related to this?
An update as I've learned a lot more about the strategy to upload files direct from browser to S3 bucket using multipart and presigned URLs. I have developed server and client code to support resume as well as parallel chunk uploads in a parts queue. My latest client solution also supports MD5 checksum on all parts.
Previously I shared a strategy where my API call to start the upload returned an array of all the presigned URLs--one for each chunk. However, to support an MD5 checksum on each chunk, you'll make an API call to fetch each presigned URL individually--passing in the chunk's MD5 hash generated in the browser. Otherwise you'd have to read the entire file into browser memory to generate all the MD5 hashes up front.
My original concern about potentially making hundreds of API calls to get presigned URLs individually was unfounded.
@TroyWolf could you please share some code examples for your strategy? It seems a very interesting approach for the same problem I got in my project.
@danielcastrobalbi and all, Due to client agreements for custom solutions I've developed around this, I can't freely share the source in public at this time. I am free to consult privately or even develop similar solutions, but I'd have to ensure you aren't a direct competitor of my existing file upload clients before we dive in too deep.
Just figuring out the architecture for an "S3 Multipart Presigned URL Upload from the Browser" solution was a pretty daunting task for me. I've tried to outline this previously in this thread, but let me try again--adding in more bits from my experience so far.
I like to think about "3 players" in the mix:
While this is an "upload straight from the browser" solution, you still need your own API to handle parts of the process. The actual file uploading, though, does not go through your own API. This is a major advantage of this solution. You won't pay for server CPU, memory and bandwidth to proxy the file into your S3 bucket. In addition, it's typically faster without your server as a middleman in the upload.
A very high-level view of the process:
A lot of nitty-gritty between those lines!
Resume is made possible by the fact AWS holds onto the uploaded parts until you either complete the MPU or send a request to delete the MPU. There is an AWS API request that takes an MPU ID and reports back on what parts are already uploaded. To resume the file upload, just start at the next chunk.
You can upload the parts in any order and many in parallel if you want. This is how speed gains can be realized.
Pro tip: Use a bucket lifecycle rule to automatically abort incomplete MPUs after some period of time. The hidden danger is that incomplete MPU parts sit hidden in your bucket, costing you storage, and there is no way to see them in the AWS Console UI! You can, for example, tell the bucket to abort MPUs that are more than 5 days old.
If anyone wants consulting for this, please feel free to reach out to me at [email protected]. ☕️
I used this snippet to upload an mp4 file:
https://gist.github.com/sevastos/5804803
Works like a charm, you just have to tweak it as per your requirements
After so many tries, I figured it out!
I created code snippets for you guys to implement it wherever you would like (I really like open source, but when it is a tiny amount of code I prefer not to use third-party libraries). Hope it helps you 👍
https://www.altostra.com/blog/multipart-uploads-with-s3-presigned-url