Meteor-files: Need your word for v2.0

Created on 13 Dec 2016  路  29Comments  路  Source: veliovgroup/Meteor-Files

Hello everyone.

Today I would like to ask community to give a right path for Meteor-Files 2.0. Any feature requests, suggestion and ideas is highly appreciated.

Lately this lib gained an attention, thank you to everyone who has a chance to use is. Lots of contributors have made this lib much better.

Now it's time for v2.0, time to move from CoffeeScript to ES6, time to remove all obsolete code.

Here is some of my thoughts and questions to you:

  1. As HTTP upload shows itself as faster, stable and more efficient than DDP (WebSockets), should we drop support for DDP? And put more focus on efficient and asynchronous uploads via HTTP? I've been playing with uploads directly from Web Workers - no more UI freezes, 1.5x speed increase, POST uploads from 3rd parties. To find out more about upload transports take a look on this article.
  2. Drop dependency from MongoDB. A lot of questions and issues about meta data, additional fields, etc. - This looks like bottleneck, which ties developers hands, and forces to rely on MongoDB, and to create records in strict schema. How about to put all control in hands of developer, and let you to decide where you should store file and how to describe its metadata? On Client you call upload method, and on Server you receive everything you need at afterUpload hook. This one may also help to move this lib to NPM. Another good option is to split this lib into two - one which manages files as collections, and another which handles uploads.
  3. Protected files and uploads are also popular topic here. Need to standardise authentication method. Looking forward old friend Authorization header with current Meteor token for uploads. For file requests (download, display, play) it's still unclear. As discussed here we can not rely on cookies as they are not accessible at various circumstances, especially on production with mobile apps. Authentication headers is not under control when you displaying file in HTML. Get query is not safe, HTTPS (SSL) partly solves get query parameters security.
  4. Is any of developers uses public files? Like when it served by proxy-server nginx, apache, etc.
  5. What API do you prefer/most use?:

    • Callbacks

    • Events

    • Promises

    • Other?

As always main idea to keep this project flexible as possible, by providing thin API to work with files and uploads without overhead.

SECURITY enhancement help wanted

Most helpful comment

I vote to keep ddp.

All 29 comments

@dr-dimitru this looks like a very solid feature list and we are looking forward to see 2.0 already.

  1. I didn't really work with these internals of MF, faster and POST sounds great of course, but web worker sound like a potential legacy-support problem case.
  2. this sounds very good, we have had to work around some parts of the MF database organization a couple times (bugs with the schema) and would've preferred to embed the meteor files documents at times and couldn't. I think current behaviour, or something similar, should be available as an implementation that users can choose to use or instead configure themselves. I imagine something like a metadataHandler object/interface. You could have a class that accepts constructor arguments like the database name and some of the current 'global' constructor arguments and creates a Mongodb-backend.

This reminds me of something i would like to mention: the current API is rather opaque and inconsistent in naming and parameters. It is workable because the API is documented well enough, but I think structuring the API into different parts would greatly help this: MeteorFiles.Collection(storage, metadata, options), MeteorFiles.StorageBackend and MeteorFiles.MetadataBackend or something like that would make it much easier to grasp and extend in single places by grouping the callbacks and configuration options by purpose and defining clear interfaces.

If 2.0 is going to break backwards compatibility anyway, I would suggest fixing some of the naming issues in the same sweep. For example there is no clear style for callback options in the configuration, there are some names like on(Before/After/Initiate)*, then there are downloadCallback and namingFunction and finally some properties that can be values or functions creating them: protected, interceptDownload... This is very confusing because all of them have different structures; some are called function or callback and obviously should be functions, some are verbs and don't even sound like they should be set by the user (interceptDownload) and some are adjectives and don't sound like they could be scripted.

  1. We talked about this in the other issue, I don't think it needs a comment here
  2. We don't use public currently because all of our files are controlled with rather strict permission tests (or should be, the authentication issue is relevant here). I think it is still a useful feature. In my proposal above it could very easily be made possible by an alternative StorageBackend implementation that returns a link built according to a user function and saves files at a path supplied like that.

@s-ol thank you for feedback

Agree about mess with naming through library. This project started more than two yeas ago, when it was less than 200 lines. Both v1.5.0 and v1.6.0 was attempts to rewrite lib from the scratch, but due to backward compatibility all naming was saved. You can participate in reviewing 2.0 once its branch will be pushed.

And thanks for all contributions from your end, during this year.

Please add a TypeScript definitions file. This will also make it easier - even for non-TypeScript users - to better understand the structure and usage of the package. Thanks in advance!

@Avijobo Thank you.
Here is something you can start with: https://github.com/VeliovGroup/Meteor-Files/issues/226

Just found the package (yep, looking to get rid of CFS :P).

As the (extreme) modularity fan I am, I don't see the point in splitting the lib in tow (files VS handling uploads).

I mean: does anyone on Earth need a file collection if his app is not allowing to upload files?

PS: May I understand from issue #51 that cluster is supported? That's the main reason we're moving away from CFS (although we'll set a files microservice in another machine this time). Nop, we don't have any experienced devop in the team :P

@luixal I'm not sure I understand what you mean:

As the (extreme) modularity fan I am, I don't see the point in splitting the lib in tow (files VS handling uploads).

files vs handling uploads? I don't understand what the 'file' part of that separation would be. What I am proposing is to split between

  • __Data Storage__: where and how the data is written to (local FS, local DB, remote CDN...), where and how to access and link the data (remote URL, stream from file, stream from remote URL...),
  • __Metadata Storage__: where and how the metadata is stored and accessed (seperate Mongo Collection, seperate Mongo Collection w/o versions, embedded Collection, other DB) and I guess
  • __Data Transfer__: how the data is transported from client to server (DDP, WS, HTTP POST...).

Not all of these things need to be strictly seperate, and not all of them need to be very extensible (I didn't suggest to abstract the Data Transfer layer, though it would be possible), but I think especially the two types of storage are important to be patchable conveniently, any average-sized app will have to touch them eventually.

I mean: does anyone on Earth need a file collection if his app is not allowing to upload files?

no, but there are tons of use cases for different storage and metadata-storage backends (and the separation in between).

For example:

  • embedding the metadata inside another collection (very strong point considering mongo's limitations in joins etc.)
  • storing and retrieving the files from remote hosting services like S3 etc.
    yes, this is possible right now by specifying two callbacks, overloading a class method and breaking the schema in at least two places, and you still need to manually delete the files after you have uploaded them. This could, and should be a lot cleaner to do, especially when you start to integrate it a bit deeper with application logic.
  • doing that for multiple collections that only have the backend in common (you can create a custom constructor, but that's pretty much the same as implementing a tiny backend interface and less clean and reusable anyway)
  1. Drop it, but keep it possible to add other protocols later on.
  2. Yes.
  3. Yes.
  4. All our files a restricted by default, imho public and protected files should coexist.

I think that this is heading into the very right direction. After about one year with this great library we'd love to see a more modularized solution written in ES6. From a developer's perspective this project should primarily aim on the heavy lifting e.g. getting data from client to server and vice versa, this must of course include authorization and later on maybe sophisticated solutions for different protocols like webRTC etc.
In addition all we need are pre- and post-hooks, where context must contain the meta data, every hook can cancel further progress and data consistency is automatically maintained while transfers are in progress (to avoid orphaned data left on storage).

@menelike

In addition all we need are pre- and post-hooks, where context must contain the meta data, every hook can cancel further progress and data consistency is automatically maintained while transfers are in progress (to avoid orphaned data left on storage).

Right, agree what all hooks, callbacks, events should share same context with .abort() method and meta-data.

Btw what API do you prefer/most use?:

  1. Callbacks
  2. Events
  3. Promises

@dr-dimitru we are using react client-side and try to use a data-driven approach throughout the whole project, so currently we use promises. When I get back to work tomorrow I might be able to provide you with a brief overview of how we currently use MF1.0, that might give you a good idea of our view and use cases.

I don't think events are appropriate for what is a very linear process and callback chains are mostly just 'dumb' promises so I do believe they might be the objectively best here.

@s-ol thank you, waiting for more info from your end.

I guess you were talking about client. As on server it is not a linear process it's just events of incoming data, requests, etc...

@dr-dimitru no, i was actually meaning the server, but I didn't explain well what I meant. Meteor Files so far does not have a process like I described, but we added that to it in our usage idiom and I hope I can show you how and why soon. But basically the process encompasses data validation, processing (thumbnails, storing size metadata) and kicking of database work in other collections.

I think callbacks may be more _standard_ (that's why I would go with callbacks. That and supporting wrapAsync as in the issue you have referenced above), but we are using promises more often lately. They are quite similar (IMHO), and your projects provide good documentation (thanks for that!), so any of them would be a good solution.

Maybe you could choose one way to go at first and add the others (or ask for PR) later on :)

So here's a couple of "problems" we faced in our up/download api programming with meteor files:

  1. we need to generate thumbnails of various sizes (conditionally)
  2. some images have EXIF rotation which is problematic for displaying on websites
  3. we need to know metadata (mostly image dimensions) in application logic
  4. we need to create and update associated data in the files collection and our own collections
  5. we need store data in the azure cloud

most of these things are fairly straightforward to do and documented in the meteor-files wiki, but it becomes a lot of boilerplate if you have a couple of different collections.

1., 2. and 3. we do via graphicsmagic and originally plainly in the onAfterUpload like it is recommended on the wiki too. However using the asynchronous gm library to autorotate, write back, loop through all sizes, write back etc. becomes a lot of callbacks with a lot of things that can go wrong. In the face of this we started using promises. Our intermediate solution looked something like this:

// because it's tedious to collect all those 'err' arguments
export const rejectErrArg = (reject, callback) => (err, result) => {
  if (err) reject(err);
  else callback(result);
};

export const autoOrient = docRef => new Promise((resolve, reject) => {
  if (!docRef.isImage) resolve(docRef);
  gm(docRef.path).autoOrient().write(docRef.path, rejectErrArg(reject, () => resolve(docRef)));
});

export const resize = (newSize, quality = 85) => docRef => new Promise((resolve, reject) => {
  if (!docRef.isImage) resolve(docRef);
  if (docRef.versions.original.meta.width <= newSize && docRef.versions.original.meta.height <= newSize) {
    resolve(docRef);
    return;
  }
  gm(docRef.path)
    .format(rejectErrArg(reject, format => {
      let { type, extension } = docRef;
      const newPath = `${docRef.path}_${newSize}.${docRef.extension}`;
      gm(docRef.path)
        .resize(newSize, newSize, '>')
        .quality(quality)
        .write(newPath, rejectErrArg(reject, () => {
          gm(newPath).size(rejectErrArg(reject, dimensions => {
            const { width, height } = dimensions;
            // store metadata in docRef.versions
            resolve(docRef);
          }));
        }));
    }));
});
function onAfterUpload(fileRef) {
        autoOrient(fileRef)
          .then(size)
          .then(createSketch)
          .then(resize(docRef, 64))
          .then(resize(docRef, 128))
          .then(resize(docRef, 256))
          .then(resize(docRef, 512))
          .then(docRef => {
            this.collection.update(
              { _id: docRef._id },
              {
                $set: {
                  versions: docRef.versions,
                  // other metadata we need to store
                },
              }
            );
          })
          .catch(err => {
            console.error(err);
            throw new Meteor.Error(err);
          });

        // some other post hooks: insert into referenced collection
    }

this is how we used it for quite a while, but we couldn't really reuse as much of this code as we wanted to and when we started to work on 5 (azure cloud storage integration) we wanted to make the whole thing more modular for various reasons. For one we wanted to be able to reuse the azure handling code in multiple collections, but we also needed the ability to fall back to an equally supported local filesystem solution onDemand (for debugging / local development purposes).

Integrating azure was very straightforward by following along the lines of the examples in the wiki. What we found great is having access to the node HTTP response / stream object. For some implementation reasons we do not want to use public CDN URLs to our files and instead loop the response through our meteor instance. Access to the http.response stream object means we dont have to deal with chunking and other I/O ourselves and can let the nodejs implementation take care of it:

function interceptDownloadAzure(http, fileRef, name) {
  const { versions, meta: { azureContainer } } = fileRef;
  const { azureBlobId } = versions[name] || {};

  if (containerName && blobId) {
    blobService.getBlobToStream(azureContainer, azureBlobId, http.response, logError);
    return true;
  }

  return false;
};

In the development of all this we realized that all of our data processing followed the same process pattern, with different steps according to the type of data:

media: upload -> fix EXIF rotation -> store size metadata -> select first frame for gifs -> generate thumbnails -> [upload to azure -> delete local files -> save azure access metadata to collection] -> post message
avatars: store size metadata -> check size against avatar conventions -> generate thumbnails -> [upload to azure -> delete local files -> save azure access metadata to collection]-> update profile

in the end we came up with this concept:


/*
 * creates a Meteor-Files Collection that transforms documents using a Chain of callbacks
 *
 * additional Options:
 * - preChain: an Array of callbacks [(docRef) => docRefOrPromise] to run before files are stored
 * - postChain: an Array of callbacks [(docRef) => docRefOrPromise] to run after files are stored
 * - onTransformFailed: a Callback [(docRef, err) =>] to run when a Chain is rejected
 *
 * the docRef is supposed to be modified (preferrably) in-place in the preChain. After the preChain,
 * the document's 'versions' field will automatically be updated in the Collection.
 * If the document is changed in the postChain, updates must be applied manually.
 *
 * NOTE: onAfterUpload must only be overwritted with caution, needs to run pre/postChain manually
 */
export class LocalTransformCollection extends FilesCollection {
  constructor(options) {
    const { preChain = [], postChain = [] } = options;

    // eslint-disable-next-line no-param-reassign
    options.onAfterUpload = options.onAfterUpload || function onAfterUploadStorage(fileRef) {
      if (Meteor.isServer) {
        const chain = [...preChain];
        chain.push(function updateDoc(docRef) {
          const { _id, versions } = docRef;

          this.collection.update(
            { _id },
            { $set: { versions } }
          );

          return docRef;
        });
        chain.push(...postChain);

        reducePromise(fileRef, chain.map(f => f.bind(this))).catch(err => {
          logError(err, 'in Storage chain');
          if (options.onTransformFailed) options.onTransformFailed(fileRef, err);
          throw new Meteor.Error(err);
        });
      }
    };

    super(options);
  }
}

and an alternate version:

/*
 * like LocalTransformCollection, but:
 * - uploads to azure blob storage
 * - automatically saves 'meta.containerName' aswell
 * - removes 'path' from all versions
 * - calls 'unlink' on document
 */
export class AzureTransformCollection extends TransformCollection {
...
}

what I like about this approach:

  • it is very modular and reusable, and quite transparent
  • error handling is straightforward
  • asynchroneous handling works well

things im not quite happy with:

  • metadata changes is a bit of a pain. every step that changes something in docRef needs to either update the collection directly or be followed by some other step that collects changes and updates them and such crossdependencies are pretty hard to see in the code.
  • MF limiting us to use a seperate collection and schema for files the way it does/did pushed our database layout in a direction I am personally not completely happy with

these both fall into point 4 from my list above I guess, and I hope you can address them in MF 2.0, as you have already mention them in your original questions in this thread.

Also note that I don't want to say this TransformCollection is the way to go. It is our solution to our personal set of problems with the limitations and interfaces of Meteor Files.

I hope this helps with your design considerations and I am looking forward to see what you come up with :)

@s-ol reviewing the comments here, I realize I didn't write back to yours (as I thought I'd done). Your case is much more complicated than ours (we're basically using a microservice that receives, stores and serves files, nothing else, no transformations and no external integrations for now), but I totally agree with one of your points:

embedding the metadata inside another collection (very strong point considering mongo's limitations in joins etc.)

That would be great. Also, I would keep it transparent to anyone who doesn't need to work with metadata separately (kind of a get helper that would return all info when asking for a file).

I can't tell about other points as we're not facing those problems right now.

@s-ol , @luixal thank you for additional and very detailed info.

I'm started API planning here any thoughts, comments and suggestions is highly appreciated. Naming was an issue in v1, here is a chance to make it better :)

All callbacks has camelCased names, and starts with on prefix, like onEventName, while events has no prefix .on('eventName', () => {/*...*/});.

Also here is idea to have global hooks on client (on Server all hooks is global by default), like from second column "Client Upload Instance Hooks", - here

The problem between events and callbacks, as you can't really return something from event, like onBeforeRemove is callback as you need to have possibility to return false to abort removal.

Here is some more questions:

  1. Should we rename Client's .insert() to .upload()?
  2. What naming is better onUploaded or onEnd (for events - uploaded, end) ?
  3. Recently @marcuslma asked about update method - have you ever needed to replace already uploaded file by the new one?
  4. Have you ever used the piping on the Client? If so, what for?

Dropping Mongo dependency:

As we're going to omit Mongo Collection dependency in v2, the FileCursor and FilesCursor will gone too. The basic idea - is to return path, URI to file and other metadata to onUploded hook and let to you decide what you're going to do with this data. You can store it to separate collection as it was in v1, or you may just to extend existing docs in other collections.

Like if you upload Avatar for user's profile, you will just update user's document. What do you think?

From my nooby point of view (I've just play around a little bit with meteor-files):

  1. Naming it insert keeps the standard name with mongo collections but, as you're removing that dependency, maybe upload is a better description (in the end, what you're really doing from the client is _uploading_ a file). I'll vote for upload and maybe keeping insert as kind of an alias.
  2. I'd keep name consistency with point 1: onUploaded (if you finally keep the insert, I'd call it onInserted.
  3. No. I thought that was the point about having file versions :)
  4. Haven't used it, but seem useful.


    About what to be returned when uploading/inserting a file, it would be great to have the full path on the returned object. I has to switch tasks and stopped working with this a week ago, but I was at that point, how do you get that url now?


    Important: I'm doing the upload through an API built with restivus so I need to block until the file is stored and return the response with that download link in it.
  1. If there is no real use-case for DDP uploads, it can be dropped

  2. I am eager to that. I only want to upload my files.

  3. 驴what about signed urls like in nginx? with DDP you get a signed url that allows you to get the file. It is valid both, upload and download.

  4. I think it is not necessary, if you can copy your file elsewhere in the FS, you can place it in a public served directory.

  5. callbaks and events should be the standard in node.

Thanks.

Hello @hacknlove ,
Thank you for your point of view.
Here is my comments:

  1. Right, HTTP seems the right way now
  2. Okay, most of us agree on this, but again how will you generate downloadable URLs? Especially if authorization is required to get access to "protected" files?
  3. Well, I've went through "signed urls" term. And I may assume authorization via get-parameter is fits it. As token is unique per user per session
  4. It's true what you're free to move uploaded files around, again to n.2 how you will manage URLs to the files?
  5. Isn't "Promises" is promising feature in node? You may not answer this, I'm agree about callbacks and events

@dr-dimitru just saw that it seems like I didn't reply to point 5 above; Promises are definitely the direction we are headed.

As we're going to omit Mongo Collection dependency in v2, the FileCursor and FilesCursor will gone too. The basic idea - is to return path, URI to file and other metadata to onUploded hook and let to you decide what you're going to do with this data. You can store it to separate collection as it was in v1, or you may just to extend existing docs in other collections.

While this is great for our - and most more complex - use cases, i think there should at least be some optional utility layer that has similar functionality to MF now, since many projects just need a very basic API to store files for them and don't want to worry about figuring out the whole data schema again I would believe.

Hi @s-ol ,

Thank you for update and opinion on the Promises.

  1. Speaking of dropping dependency from Mongo. I'm agree on what most devs looking for "drop-in" solution
  2. We can make collection dependency optional for fast prototyping
  3. Also I thought to move package to NPM, and similar to JoSk package developers can use .rawCollection() to pass Collection, but I'm still not sure how this will work on the Client... I guess it will be plain Meteor.Mongo instance, which will be updated/managed on server and updated synced with Client

@dr-dimitru

Sometimes I will copy the files to some folder served with NGINX and sometimes I am going to upload the files to S3
Bothways I am using signed urls to get protected files

To get the signed url I will call some meteor method to get the url server-side signed.

nginx configuration example:

  location ~ /private/ {
    secure_link $arg_md5,$arg_expires;
    secure_link_md5 "$secure_link_expires$uri some secret";

    if ($secure_link = "") {
    return 403;
    }

    if ($secure_link = "0") {
    return 410;
    }
    try_files $uri =404;
  }

urls signation example

const secret = 'some secret'
const crypto = require('crypto')
const generateSecurePathHash = function (url, expires, secret) {
  if (!url || !expires || !secret) {
    return undefined
  }
  return (new Buffer(crypto.createHash('md5').update(expires + url + ' ' + secret).digest()).toString('base64')).replace(/=/g, '').replace(/\+/g, '-').replace(/\//g, '_')
}
var nginxsign = function (url, secret, timeout) {
  var t = Date.now() + timeout
  return url + '?m=' + generateSecurePathHash(url, t, secret) + '&t=' + t
}

var s3sing = function (path) {
  return s3.getSignedUrl('getObject', {
    Bucket: 'dakapp',
    Key: path,
    Expires: 60 * 60 * 4
  })
}

Apropos of signed urls:
Yes, the signature goes in the url. But they have an expiration date.

In NGINX you can improve the security adding the IP or some header into the signature just do secure_link_md5 "$some_nginx_variable$secure_link_expires$uri some secret";
Sadly it is not easy get the client ip in a Meteor callback

There are lots of ways to deal with url creation.

I think each developer should choose the one that fits best with their needs.

Sometimes you have enough knowing the _id of the related object (a post for instance) to get the main image, doing https://example.com/images/post/{{_id}}.jpg

Some times, you need to store the filename in a field of the document you are showing, or the fullurls if you allow to link external resorces...

Just a comment: if dropping Mongo integration is in the roadmap, maybe another package (meteor-files-mongo-collections or similar) could be the options. Adding both packages you get the full solution if you're working with mongo.

Main problem here is avoiding to be the package hell CollectionFS is right now.

I'm currently trying to understand cookie usage in an app which uses Meteor and Ostrio files.

The cookie aspect looks problematic, as it reuses the same security token as is kept in localStorage for Meteor, so exposing that token via failings of both localStorage and cookies, so removing the use of cookies would seem a big security gain, as hardly any deployments secure cookies correctly. Is there some specific reason for doing file uploads this way, I'm presuming old versions of IE or some such.

Old wisdom: https://blog.meteor.com/why-meteor-doesnt-use-session-cookies-e988544f52c9

@SimonWaters I don't know about uploads, but we had issues with downloads on Cordova that prompted some investigation. In the end, for downloads, it comes down to the fact that cookies are just about the only data you can transmit in an <img> tag (without changing the URL for each user, which is a viable path IMO but complicates things a bit between data layer and front-end).

I think the authentication runs piggy-backs off the meteor tokens so that the leases don't have to be manually controlled by the app and you can't have weird situations where you are logged in but mf thinks you aren't or the other way around.

Had quick discussion with JS guru here. We are using data URLs in authenticated requests to deliver images in non-Meteor app over web sockets, but that is largely because we have other constraints on how images are handled rather than a preference for data URLs. However feels to me if you have to use cookies then perhaps the localStorage route was a mistake. Cookies have a horrid security model, but simply sticking the same value in both places feels like the wrong solution.

Hi @SimonWaters ,

  1. Session token stored in localStorage used internally by Meteor
  2. We sync cookie with token stored in localStorage to have access to token in HTTP headers
  3. If you're using HTTPS - headers (including cookies), path, query and transferred data is encrypted

Cookies want a secure flag and/or HSTS with include subdomains directive (pragmatically you want both because without HSTS users will click through warnings). I have plenty of practice securing them (and stealing them, and changing them). But the point of Meteor localStorage was to avoid the machinations of securing cookies and get a mechanism that uses the URI scheme in the same origin policy by default. I thought the proposal above was to lose the cookies, or has that gone?

I thought the proposal above was to lose the cookies, or has that gone?

Yes, by appending session token to the uri, which looks less secure, but by the fact is the same, will be token in a Header or URI Query - doesn't changing situation much.

WDYT?

I vote to keep ddp.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

rlhk picture rlhk  路  4Comments

ck23onGithub picture ck23onGithub  路  3Comments

stefanve picture stefanve  路  4Comments

menelike picture menelike  路  3Comments

tuarrep picture tuarrep  路  4Comments