Feathers: Add Support For Streams

Created on 29 Jun 2017 · 14 Comments · Source: feathersjs/feathers

In order to support larger datasets and provide better support for file uploads, we'd like to have core support for both input and output streams. I don't foresee this being a breaking change.

Why The Change

  • We could support file upload over sockets
  • We can better support moving large amounts of data in and out
  • We can support larger file upload and image/video streaming much better
  • I'm hopeful we might also gain some performance benefits. Better CPU usage and lower memory footprint when querying for or sending in a lot of data. We'll see...

Proposal

The hook chain is basically a poor man's stream. So why not _actually_ make hooks streams, instead of a promise chain where we "pipe" the hook object from one hook to the next?

What we can do to support streams is have the transport adapters (socket.io, rest, etc.) convert, if necessary, to a stream when data first comes in and when data is serialized and sent out to the client. This could be behind a feature flag when registering the transport adapter if necessary.

const app = feathers();

app.configure(rest({ stream: true }));
app.configure(socketio({ stream: true }));

The rest of the Feathers internals would just process hooks and service adapter calls as streams.

How does this affect service adapters

Service adapters would need to handle streams as an input and return results as a stream. Some of the underlying DB modules used inside Feathers service adapters already support streaming (e.g. mongoose, mongodb, etc.). If the adapter does not support streams natively we could streamify the results.
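
As a minimal sketch of what "streamifying" could mean for an adapter without native stream support, the result of a normal find call can be wrapped in an object-mode stream using Node's Readable.from. The streamifyFind helper and the in-memory users service below are hypothetical and only for illustration; neither is part of Feathers.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: wraps a promise-based find() so that it returns
// an object-mode Readable emitting one record at a time.
function streamifyFind(service) {
  return (params) => {
    async function* records() {
      const result = await service.find(params);
      // Support both plain arrays and paginated { data: [...] } results.
      const items = Array.isArray(result) ? result : result.data;
      yield* items;
    }
    return Readable.from(records());
  };
}

// Illustrative in-memory service standing in for a real adapter.
const users = {
  async find() {
    return [{ id: 1, name: 'alice' }, { id: 2, name: 'bob' }];
  }
};

const findStream = streamifyFind(users);
```

Note that the full result set is still loaded into memory before the first record is emitted, so a wrapper like this only adapts the interface. It does not reduce memory usage the way a native database cursor stream would.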

An alternative to updating the service adapters might be to have a couple hooks that convert the hook object to and from streams. Along with setting the stream: true flag, this could be included manually by the developer in order to enable/disable streaming on a service. However, it likely would defeat the purpose of using streams as app memory and CPU would likely spike whenever the conversion happens.
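
A rough sketch of such a pair of conversion hooks, under the assumption that hook.data arrives as a plain value and hook.result comes back as an object-mode stream. The names toStream and fromStream are made up for this example.

```javascript
const { Readable } = require('stream');

// Hypothetical "before" hook: expose hook.data as an object-mode stream.
const toStream = () => (hook) => {
  const items = Array.isArray(hook.data) ? hook.data : [hook.data];
  hook.data = Readable.from(items);
  return Promise.resolve(hook);
};

// Hypothetical "after" hook: buffer a streamed hook.result back into an array.
// This is the step where memory use grows with the size of the result.
const fromStream = () => async (hook) => {
  const items = [];
  for await (const item of hook.result) {
    items.push(item);
  }
  hook.result = items;
  return hook;
};
```

The buffering loop in fromStream is exactly the conversion cost described above: every record has to be held in memory before the hook chain can continue.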

How does this affect hooks

I don't think this would require any changes to hooks that people write. They would still return promises and/or the hook object. We already enforce this in Feathers core. Therefore, we would adapt core Feathers such that the hook chains are actually streams of Promise functions under the hood instead of a Promise chain of Promises.

Hook functions will still return promises but we'd "streamify" them and simply .pipe() the hook object through each hook function. Not too sure what this conversion looks like but here is some pseudo code and references to where we might tap in:

// Hooks are still written the same way
const log = (namespace) => (hook) => {
  console.log(`[${namespace}] I am running in a stream`, hook);
  return Promise.resolve(hook);
};

// They are still registered the same way
app.service('users').hooks({
  before: {
    all: [
      log('users')
    ]
  }
});

// Create an object-mode stream in the transport adapter.
// The no-op read() is required because data is pushed in manually.
const stream = require('stream');
const hookStream = new stream.Readable({ objectMode: true, read() {} });

// Inside core hooks we create our hook object. We might need to
// turn hook.data and hook.result into streams as well.
// https://github.com/feathersjs/feathers/blob/major/src/hooks.js#L40
// https://github.com/feathersjs/feathers-commons/blob/major/src/hooks.js#L43-L59

// Inside Feathers core when running hooks we streamify them.
// I think this is where the majority of the work will happen.
// https://github.com/feathersjs/feathers-commons/blob/major/src/hooks.js#L136-L175

// Pipe the result of a hook back into the stream
// https://github.com/feathersjs/feathers/blob/major/src/hooks.js#L63-L66
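
To make the "streamify and .pipe()" idea a bit more concrete, here is one way it might look using only Node's built-in stream module. This is a sketch under assumptions, not what Feathers core does: streamifyHook and runHooks are hypothetical names, and a single hook object is pushed through the pipeline per service call.

```javascript
const { Readable, Transform, Writable, pipeline } = require('stream');

// Wrap a promise-returning hook in an object-mode Transform stream.
function streamifyHook(hook) {
  return new Transform({
    objectMode: true,
    transform(context, _encoding, callback) {
      Promise.resolve(hook(context)).then(
        // A hook may return nothing; the original object then continues.
        (result) => callback(null, result || context),
        (error) => callback(error)
      );
    }
  });
}

// Pipe a single hook object through every hook and resolve with the result.
function runHooks(hooks, context) {
  return new Promise((resolve, reject) => {
    let output = context;
    const sink = new Writable({
      objectMode: true,
      write(chunk, _encoding, callback) {
        output = chunk;
        callback();
      }
    });
    pipeline(
      Readable.from([context]),
      ...hooks.map(streamifyHook),
      sink,
      (error) => (error ? reject(error) : resolve(output))
    );
  });
}
```

A call such as runHooks([log('users')], { method: 'find' }) would then behave like the current promise chain from the hook author's point of view, with a rejected hook surfacing as a rejected promise.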

What Changes

I believe this can be fully backwards compatible and will only change underlying core modules.

  • We need to add support for I/O streams to each transport plugin. We need to detect when streams are enabled and then convert the incoming data to a stream and pipe the result through the transport.
  • Hook chains would become streams of Promises instead of a chain of promises under the hood.
  • Service adapters would need to be updated to support input streams and return streams as the result, or
  • have hooks that can convert a hook object to a stream and vice-versa.

If we are to proceed, I think the next release (Crow) is the best time to do it especially if it requires service adapter changes because we're looking at changing the hook structure and query syntax at that time. This is likely to be a relatively small change to core and with the Buzzard release we'll have shuffled the core modules around with this in mind.

I don't think this will be a breaking change, so I believe it could be rolled out as a patch or minor release.

Obviously this is a proposal and likely to change. I've talked with @daffl and a couple users of Feathers outside the core team at length about this but we haven't spiked it out yet. We would :heart: some input from the community and the rest of the @feathersjs/core-team. Feel free to comment or simply give a 👍 or 👎 or if you're feeling super awesome feel free to take a stab at a prototype implementation.

Related Issues

Some work has already started this process:

All 14 comments

Feathers clients would have to support streams for client-side hooks and services. I wonder how large the polyfill would be.

A minified build of browserify-stream is 85k: https://github.com/marshallswain/stream

Not sure what the difference is, here: https://github.com/creatorrr/web-streams-polyfill, or if it's compatible, but it's 67k.

I want to point out that streams are more than just a chain hook. A stream can divide into several; input data is piped to different streams depending on what's to be done with it. Several streams may pipe into the same stream.

A pure stream implementation of hooks might be:

  • all service calls are passed into the "all" stream,
  • once all the "all" processing is done, the call is piped into one of the "find", "get", "create", etc. streams.
  • these streams pipe into the DB call.

I'm not suggesting this implementation be used, just pointing out that a fuller stream implementation would allow, for example, the "create" hook stream to divide and combine within itself.
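
For illustration only, the fork-and-join shape described above could be sketched with plain Node streams like this. The step helper and the all, methods and db streams are hypothetical stand-ins, and backpressure is ignored to keep the example short.

```javascript
const { Transform } = require('stream');

// Build an object-mode Transform from a synchronous mapping function.
const step = (fn) => new Transform({
  objectMode: true,
  transform(call, _encoding, callback) {
    callback(null, fn(call));
  }
});

// Hypothetical streams: one shared "all" stream, one per service method,
// and a final stream standing in for the DB call.
const all = step((call) => ({ ...call, trail: ['all'] }));
const methods = {
  find: step((call) => ({ ...call, trail: [...call.trail, 'find'] })),
  create: step((call) => ({ ...call, trail: [...call.trail, 'create'] }))
};
const db = step((call) => ({ ...call, trail: [...call.trail, 'db'] }));

// Fork: route each call leaving "all" to the stream for its method.
all.on('data', (call) => methods[call.method].write(call));

// Join: every method stream feeds the same DB stream.
// end: false keeps db open when an individual method stream ends.
Object.values(methods).forEach((method) => method.pipe(db, { end: false }));
```

Writing { method: 'find' } into all would send it through the "all", "find" and "db" stages in that order, while a "create" call would take the other branch before joining the same db stream.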

@marshallswain there is also:

We might be able to steal some inspiration from:


@eddyystop To some extent that's how I envisioned hooks working once @daffl suggested error and app level hooks. Where they are essentially "forks" in the hook chain and you can get back on the main track if you wanted to. For example,

_I might want to handle errors and log them to specific locations based on severity but only return an actual error to the client if it is severe._

This is a realistic use case if you are calling out to external/internal services to trigger async actions (e.g. notifications, emails, etc.) and the core request succeeds but the async action fails. Right now handling this use case is a bit clunky, even with error hooks.

@ekryski Anything from substack is likely godly.

I'm still wrapping my head around how far stream support can take us. The good: hooks become so powerful they shouldn't be called hooks anymore. The bad: the conceptual load increases and people who are iffy about working with Promises now have to absorb streams.

Would be great to discuss over some (pseudo)code examples of what it could look like. I know streams are nice for array-like objects and binary data but I'm not seeing how exactly it would improve handling the hook chain.

In general I think it would make sense to be able to consume and return streams in a service though (without necessarily changing how hooks work).

I ran into this timely and perhaps relevant snippet today:

There was a previous discussion on how do you know when someone is addicted to over-engineering, and looking at this giant UML diagram, I think this might be the case.
The amount of complexity I'm willing to accept is proportional to the difficulty of the problem. In this case it's manipulating web pages, which shouldn't be too hard. This isn't a knock on React in particular, but it seems all the major vendors are competing on complexity. React has "fibers", Ember has a "virtual machine", Angular has their own overblown architecture (I don't know anything about it, and don't want to).

Just brainstorming some problems to consider for huge files from personal experience:

  • giving up range requests (seeking to not yet downloaded positions in the video), HLS/DASH and other pseudo-streaming techniques that work really well
  • giving up low processor usage by using a dedicated specialized streaming server/CDN server
  • no outsourcing to Azure, Amazon, ... for direct file/media delivery without any processing on your platform
  • difficult to implement a strategy for distributing the files across different file servers and CDNs. If the disk is full, it's full.
  • the problem of directing the stream to the data server that has enough available bandwidth and also holds the file on disk (multi-server, server + data servers, or server + some cloud scenario)
  • all of this compared to simply sending a direct URL with an access token to the client and letting the client fetch it
  • reinventing the wheel: see tus.io

Pros:

  • I would love streams for medium-sized files (up to 20 MB maybe) when I know that this application will never scale beyond a one-server limit.
  • maybe for streaming webcam recordings upstream
  • actually I love streams in c++ a lot and I have a lot of ideas. All with smaller amounts of data ;)

I'll add my 2 cents on this as well, with https://kalisio.gitbooks.io/krawler as a real-world production experiment in mixing hooks and streams. Here are some issues we faced:

  • the stream API is chunk-oriented, which adds a layer of complexity over simple hook functions; you need to at least handle the special case of ending the stream
  • error management is harder: if multiple chunks of the same stream are processed at the same time in different "stages" of the hook chain, what happens when an error is raised can be really tricky
  • a lot of libraries or tasks are not "stream-ready", so you might lose all the benefit of streaming if a single hook in the middle of your hook chain cannot handle streams
  • streams mux and demux, so you always have to manage a variable set of input/output streams to be generic

I strongly believe hooks and streams to be really different beasts: if you see the hook chain as a processing pipeline then the streams are the data flow while the hooks are the transformation functions applied on this data flow. Ideally the processing functions should be abstracted from the implementation of the dataflow, streams or in-memory buffers, although this is not easy.

I did not provide any working solution but my feeling is that hooks are quite good now and if well implemented stream support should be something that will not break this fundamental functional feature.

Did anything come of this?

I'm going to close this since I tend to agree with @claustres plus it appears many of the benefits of iterators have been superseded by async/await. It might be possible to do some neat things here by returning an async iterator but plain JavaScript object payloads have been proving a good abstraction that works for any transport protocol and so far there didn't seem to be a compelling use case from a Feathers perspective where streams would have significant benefits (you can always stream larger datasets in normal Express middleware).
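
As a rough sketch of the async iterator idea: a service method written as an async generator can yield records one at a time, and a consumer can process them with for await. The messages object below is a plain hypothetical service that is not registered with Feathers, and whether a transport could consume such a result is left open.

```javascript
// Hypothetical service whose find() is an async generator.
const messages = {
  async *find(params = {}) {
    const limit = params.limit || 3;
    for (let id = 1; id <= limit; id++) {
      // A real implementation would read from a database cursor here.
      yield { id, text: `message ${id}` };
    }
  }
};

// Consume the records lazily, one at a time.
async function collectIds(service, params) {
  const ids = [];
  for await (const message of service.find(params)) {
    ids.push(message.id);
  }
  return ids;
}
```

Each yielded record is still a plain JavaScript object, so this keeps the payload abstraction described above while avoiding buffering the whole result in the service itself.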

I'm a bit disappointed here because it forces me to use the REST client and re-authenticate my users with Express (or I'm missing something) to be able to display a video stream on the client.

If hooks are the main problem here, would it be possible to bypass them completely for simple streams?

Authentication is already possible with the Express authentication middleware.

That's what I did first, but for some reason it didn't want to authenticate with the token that was issued via the socketio client.
