Multer: Add core stream storage driver

Created on 27 Jun 2016  ·  15Comments  ·  Source: expressjs/multer

So I am working with the "transform file" branch from #356 and I realized that what I really want is just a stream storage driver. Basically it would be one that just doesn't read from the stream at all and relies on you doing that in your handler method. What do you think about this?

Here is basically what I was thinking it would look like:

function StreamStorage (opts) {}

StreamStorage.prototype._handleFile = function _handleFile (req, file, cb) {
  cb(null, {})
}

StreamStorage.prototype._removeFile = function _removeFile (req, file, cb) {
  // Not sure if we would need to flush out the stream or not here...
  cb(null)
}

module.exports = function (opts) {
  return new StreamStorage(opts)
}

Most helpful comment

Hmmm, I actually have some ideas here, I'll try and write them down after work...

All 15 comments

Hmmm, I actually have some ideas here, I'll try and write them down after work...

Okay so during the last couple of months I've been thinking a lot of the api we expose, how it works with the storage modules and how easy it is to consume.

The first sign that things weren't the best they could was when #303 was opened. This was implemented as a storage module even though it just wanted to modify the stream.

We also have a lot of callbacks option that feels a bit too much, e.g. should we really have a callback for the directory, and one for the filename? For some apps that makes sense though, but for some it doesn't.

I have also been looking a bit at how other libs/frameworks does it, and how things are being used generally in the Node.js community.

What I've started to feel is that streams are a very good way of thinking about the "files" that are being uploaded, but that we currently do too much job in making sure that it's as effective as possible. This makes us loose in usability, and also hurts some scenarios. E.g. with the S3 module you can be forced to upload something for it just to be removed because of some kind of error, this costs money. In cases like that I would have preferred it to use some memory or temporary disk space.

Another big use case is doing some kind of validation on the file and then sending it to, e.g. S3. When you decide to add the validations, you can no longer use the multer-s3 module. This, in my opinion, takes a way a lot of the power of multer. It might be easy to get started with, but it doesn't always scale with the user.

What I would like to discuss moving towards, would be an api where we wouldn't give the users the ability to choose a storage engine. Instead we would handle "accepting" the files and storing them temporarily. That could be done in memory or, when the file size hits a threshold, on disk.

We could then expose information about the file together with a stream, that could be consumed however the end user see fit. It could be transformed, stored on disk, validated or uploaded to a third party service.

I imagine the interface looking something like this.

interface File {
  fieldname: string
  filename: string
  encoding: string
  mimetype: string
  size: number
  stream: ReadableStream
}

The user would then be given infinite opportunities to do whatever they want to the file. e.g.

Save it to disk

req.file.stream
  .pipe(fs.createWriteStream('uploads/test.png'))

Encrypt it and upload to S3

req.file.stream
  .pipe(crypto.createCipher('aes-256-gcm', 'test-key'))
  .pipe(uploadFileToS3('test.png'))

Load image, resize it and save to disk

req.file.stream.pipe(concat(buffer => {
  sharp(buffer)
    .resize(128, 128)
    .toFile('uploads/test.png')
}))

We could potentially provide some utility functions to make saving the raw file to disk easier, and we could also speed it up by only moving the file if it's already on disk. This would be similar to the famous php function move_uploaded_file.

multer.saveFileAs(req.file, 'uploads/test.png', err => {
  if (err) return next(err)

  res.send('Image uploaded!')
})

Any thoughts on this would be highly appreciated.

ping @hacksparrow @jpfluger @Fishrock123 @gabeio (I know I missed someone, sorry)

What I've started to feel is that streams are a very good way of thinking about the "files" that are being uploaded

Couldn't agree more, I purposely look for module that expose a stream interface these days.

We could potentially provide some utility functions to make saving the raw file to disk easier

I think that is "nice", but not even required. If we published storage modules separately that enabled users to so these things but documented them really well I think that is as good if not better than bundling that functionality.

I think moving to a default storage driver of stream would solve #174 as well.

Really there is no need of a storage driver in the module if you just move to external storage supported via streams in this module.

The cons to this approach are a higher cognitive load on users because they have to choose their storage mechanism and implement the stream consumer. But this can be solved really well with documentation and a close knit family of support modules.

Simple API is best API™ 😀

I like the suggested idea.

One of the factors that changed the initial relatively simpler API of multer was dealing with security issues. Would the new API take care of the security issues that the current API takes care of?

If we decide on the new API, we have to make the transition to the new API as smooth and informed as possible, so as not to frustrate our users who might be referring to old tutorials or articles.

@hacksparrow do you have links to discussions of the security issues the current api deals with? Just so I can be as informed as possible.

@wesleytodd, it was more of "fill up disk with files" kinda issue. So technically not a security issue, but an issue nonetheless. @LinusU will have a better idea about it.

Technically because the new api is a streaming interface there is no chance of filling the disk, as it is up to the implementor to write the stream somewhere. This would mean that multer could focus on handling multipart form uploads without having to deal with the actual storage part, making this api even simpler, and offloading any other risks onto other modules.

That could be done in memory or, when the file size hits a threshold, on disk.

Now it might become an "eat up your RAM" issue 😁, and there is still a chance that we will write to the disk.

We just gotta be careful, otherwise the API looks good to me.

Now it might become an "eat up your RAM" issue 😁, and there is still a chance that we will write to the disk.

Very true, I thought that is what the limits are for? https://github.com/expressjs/multer#limits

Yes, limits will help, but there are unknowns in the new direction, that we are not aware about, which could cause issues.

Best would be to review the "tangible" API when its ready for review.

I think the uses cases for transforming req.files into a stream are valid. Once you have something, I'd be interested to test it and see how it plays with my implementations.

Now that I think about it.... req.file versus req.files... Would req.file be the stream? And req.files contain resulting meta-data or actual data, if that storage engine allowed for it?

I am not proposing any changes to the req.files/req.file api. It would still be an array of files where each one would be an object with a stream property (or a single file with a stream).

My proposed change would be to add a default storage engine that left the stream as is and didn't consume it for you. That way you could handle the stream in your response handler instead of having to load the data into memory or a file before using it. This is also exactly what @LinusU outlined in his much more detailed response above.

I will put together a PR for this tonight so we can all look/test/discuss.

I see... and thus the property for stream on the file interface. Good idea!

So, part of what makes this all possible is that the stream is actually consumed so that the parser can parse all of the multipart input. So I don't think this idea will actually be functional in any way. I messed a bunch with busboy and multer to try to find a way to stream the input without reading it all into memory like the buffer storage works, and I could not find a way.

So, to get the interface like we discussed above I think we would loose all the benefits of a streaming interface (low mem footprint & not having to write to a temp file). I think for my project I am going to move forward with the transformFile methodology from the other PR. I am not totally happy with it, but I don't think that moving the stream consumption to the middleware will work like I wanted.

Feel free to re-open if you know something that I don't or am missing :)

Was this page helpful?
0 / 5 - 0 ratings

Related issues

nickretallack picture nickretallack  ·  4Comments

edi picture edi  ·  4Comments

ChristianRich picture ChristianRich  ·  4Comments

tjabdoullah picture tjabdoullah  ·  4Comments

josephstgh picture josephstgh  ·  3Comments