In order to support larger datasets and provide better support for file uploads, we'd like to have core support for both input and output streams. I don't foresee this being a breaking change.
The hook chain is basically a poor man's stream, so why not _actually_ make it a stream instead of a promise chain where we "pipe" the hook object from one hook to the next?
What we can do to support streams is have the transport adapters (socket.io, rest, etc.) convert, if necessary, to a stream when data first comes in and when data is serialized and sent out to the client. This could be behind a feature flag when registering the transport adapter if necessary.
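```js
const feathers = require('feathers');
const rest = require('feathers-rest');
const socketio = require('feathers-socketio');
const app = feathers();
app.configure(rest({ stream: true })); // `stream` is the proposed flag
app.configure(socketio({ stream: true }));
```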
The rest of the Feathers internals would just process hooks and service adapter calls as streams.
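A minimal sketch of what that transport-side conversion might look like, assuming Node's built-in `stream` module. `toObjectStream` and `ndjson` are hypothetical helper names, and newline-delimited JSON is only one possible wire format for streamed results; neither is part of any existing Feathers transport.

```js
const { Readable, Transform, Writable, pipeline } = require('stream');

// Hypothetical helper: wrap an incoming payload (a stream, an array or a
// single object) in an object-mode stream so core only ever sees streams.
function toObjectStream(payload) {
  if (payload instanceof Readable) return payload;
  return Readable.from(Array.isArray(payload) ? payload : [payload]);
}

// Hypothetical serializer: turn each outgoing object into one line of
// newline-delimited JSON before it is written to the client.
function ndjson() {
  return new Transform({
    writableObjectMode: true,
    transform(obj, _encoding, callback) {
      callback(null, JSON.stringify(obj) + '\n');
    }
  });
}

// Usage: a Writable that collects output stands in for an HTTP response.
let body = '';
const fakeResponse = new Writable({
  write(chunk, _encoding, callback) {
    body += chunk.toString();
    callback();
  }
});

pipeline(toObjectStream([{ id: 1 }, { id: 2 }]), ndjson(), fakeResponse, (err) => {
  if (err) throw err;
  console.log(body);
});
```

Because an already-streaming payload passes straight through, the flag could be switched on per transport without forcing a conversion when the data is a stream to begin with.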
Service adapters would need to handle streams as input and return results as a stream. Some of the underlying DB modules used inside Feathers service adapters already support streaming (e.g. mongoose, mongodb). If an adapter does not support streams natively, we could streamify the results.
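For an adapter without native streaming, one way to streamify results without loading everything at once is to page through `find` with the `$skip`/`$limit` query parameters and yield records one by one. This is a sketch under assumptions: the in-memory `service` stub stands in for a paginated adapter returning Feathers' `{ total, limit, skip, data }` shape, and `pages` is a hypothetical helper name.

```js
const { Readable } = require('stream');

// Stand-in for a non-streaming adapter with pagination enabled: `find`
// resolves to the paginated shape { total, limit, skip, data }.
const records = Array.from({ length: 5 }, (_, i) => ({ id: i }));
const service = {
  async find({ query }) {
    const { $skip = 0, $limit = 2 } = query;
    return {
      total: records.length,
      limit: $limit,
      skip: $skip,
      data: records.slice($skip, $skip + $limit)
    };
  }
};

// Hypothetical helper: fetch one page at a time and yield its records, so
// only a single page is held in memory at any moment.
async function* pages(svc, pageSize) {
  let skip = 0;
  while (true) {
    const page = await svc.find({ query: { $skip: skip, $limit: pageSize } });
    yield* page.data;
    skip += page.data.length;
    // Stop on an empty page as well, so a bad total can't loop forever
    if (page.data.length === 0 || skip >= page.total) return;
  }
}

const resultStream = Readable.from(pages(service, 2));
resultStream.on('data', (record) => console.log(record.id));
```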
An alternative to updating the service adapters might be to have a couple of hooks that convert the hook object to and from streams. Along with setting the `stream: true` flag, these hooks could be included manually by the developer in order to enable or disable streaming on a service. However, this would probably defeat the purpose of using streams, since app memory and CPU usage would likely spike whenever the conversion happens.
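A rough sketch of what such conversion hooks could look like, assuming Node's `stream` module and plain hook objects. `streamToData` and `resultToStream` are hypothetical names. Note that the first one has to buffer the entire stream into an array, which is exactly where the memory and CPU spike comes from.

```js
const { Readable } = require('stream');

// Hypothetical before hook: if hook.data is a stream, buffer it into an array
// so a non-streaming adapter can handle it. The whole payload ends up in
// memory here.
const streamToData = () => async (hook) => {
  if (hook.data instanceof Readable) {
    const items = [];
    for await (const item of hook.data) items.push(item);
    hook.data = items;
  }
  return hook;
};

// Hypothetical after hook: wrap an array result back into an object-mode stream.
const resultToStream = () => (hook) => {
  if (Array.isArray(hook.result)) {
    hook.result = Readable.from(hook.result);
  }
  return Promise.resolve(hook);
};

// Registration would look like any other hook, e.g.:
// app.service('messages').hooks({
//   before: { create: [streamToData()] },
//   after: { find: [resultToStream()] }
// });

const hook = { data: Readable.from([{ text: 'a' }, { text: 'b' }]) };
streamToData()(hook).then((h) => console.log(h.data));
```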
I don't think this would require any changes to hooks that people write. They would still return promises and/or the hook object, which we already enforce in Feathers core. We would adapt core Feathers so that the hook chains are actually streams of Promise-returning functions under the hood, instead of a Promise chain.
Hook functions will still return promises, but we'd "streamify" them and simply `.pipe()` the hook object through each hook function. I'm not too sure what this conversion looks like, but here is some pseudo code and references to where we might tap in:
```js
// Hooks are still written the same way
const log = (namespace) => (hook) => {
  console.log(`[${namespace}] I am running in a stream`, hook);
  return Promise.resolve(hook);
};

// They are still registered the same way
// (assumes `app` has a `users` service registered)
app.service('users').hooks({
  before: {
    all: [
      log('users')
    ]
  }
});

// Create a stream in the transport adapter. In object mode each chunk is a
// whole hook object; `read() {}` is a no-op because data is pushed in.
const { Readable } = require('stream');
const hookStream = new Readable({ objectMode: true, read() {} });

// Inside core hooks we create our hook object. We might need to
// turn hook.data and hook.result into streams as well.
// https://github.com/feathersjs/feathers/blob/major/src/hooks.js#L40
// https://github.com/feathersjs/feathers-commons/blob/major/src/hooks.js#L43-L59

// Inside Feathers core when running hooks we streamify them.
// I think this is where the majority of the work will happen.
// https://github.com/feathersjs/feathers-commons/blob/major/src/hooks.js#L136-L175

// Pipe the result of a hook back into the stream
// https://github.com/feathersjs/feathers/blob/major/src/hooks.js#L63-L66
```
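One way the "streamify" step could work, sketched with Node's built-in `stream` module: each existing promise-returning hook is wrapped in an object-mode `Transform`, and the hook object is piped through them. `streamifyHook` and `runHookChain` are hypothetical names rather than Feathers internals, and the sketch ignores the before/after/error split that core handles.

```js
const { Readable, Transform, Writable, pipeline } = require('stream');

// Hypothetical: wrap an ordinary promise-returning hook in an object-mode
// Transform so the hook object can be piped from one hook to the next.
function streamifyHook(hookFn) {
  return new Transform({
    objectMode: true,
    transform(hook, _encoding, callback) {
      Promise.resolve()
        .then(() => hookFn(hook)) // also turns a synchronous throw into a rejection
        // As today, a hook may return the hook object or nothing at all
        .then((result) => callback(null, result === undefined ? hook : result), callback);
    }
  });
}

// Hypothetical: run one hook object through a list of hooks as a stream
// pipeline and resolve with whatever comes out of the last hook.
function runHookChain(hook, hookFns) {
  return new Promise((resolve, reject) => {
    let output;
    const sink = new Writable({
      objectMode: true,
      write(result, _encoding, callback) {
        output = result;
        callback();
      }
    });
    pipeline(
      Readable.from([hook]),
      ...hookFns.map(streamifyHook),
      sink,
      (err) => (err ? reject(err) : resolve(output))
    );
  });
}

// Usage with hooks written exactly as they are today
const log = (namespace) => (hook) => {
  console.log(`[${namespace}] ${hook.method}`);
  return Promise.resolve(hook);
};
const addCreatedAt = () => (hook) => {
  hook.data.createdAt = Date.now(); // mutate in place, return nothing
};

runHookChain({ method: 'create', data: { text: 'hi' } }, [log('users'), addCreatedAt()])
  .then((hook) => console.log(hook.data));
```

A hook that throws or rejects makes the pipeline fail and `runHookChain` reject, which mirrors how a rejected promise stops today's chain.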
I believe this can be fully backwards compatible and will only change underlying core modules.
If we are to proceed, I think the next release (Crow) is the best time to do it, especially if it requires service adapter changes, because we're looking at changing the hook structure and query syntax at that time. This is likely to be a relatively small change to core, and with the Buzzard release we'll have shuffled the core modules around with this in mind.
I don't think this will be a breaking change, so I believe it could be rolled out as a patch or minor release.
Obviously this is a proposal and likely to change. I've talked with @daffl and a couple of users of Feathers outside the core team at length about this, but we haven't spiked it out yet. We would :heart: some input from the community and the rest of the @feathersjs/core-team. Feel free to comment or simply give a 👍 or 👎, or if you're feeling super awesome, take a stab at a prototype implementation.
Some work has already started this process:
Feathers clients would have to support streams for client-side hooks and services. I wonder how large the polyfill would be.
A minified build of browserify-stream is 85k: https://github.com/marshallswain/stream
Not sure how it differs, or whether it's compatible, but https://github.com/creatorrr/web-streams-polyfill is 67k.
I want to point out that streams are more than just a hook chain. A stream can divide into several: input data is piped to different streams depending on what's to be done with it, and several streams may pipe into the same stream.
A pure stream implementation of hooks might be:
I'm not suggesting this implementation be used, just pointing out that a fuller stream implementation would allow, for example, the "create" hook stream to divide and combine within itself.
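To make the divide-and-combine idea concrete, here is a minimal sketch using Node's built-in streams. It is not the implementation referred to above; `fork`, `merge` and `mapStream` are hypothetical helpers, and backpressure and error handling are left out to keep it short.

```js
const { Readable, PassThrough, Transform } = require('stream');

// Hypothetical fork: route each object down one of two branches depending on
// a predicate. Backpressure and error handling are omitted for brevity.
function fork(source, predicate) {
  const matched = new PassThrough({ objectMode: true });
  const unmatched = new PassThrough({ objectMode: true });
  source.on('data', (item) => (predicate(item) ? matched : unmatched).write(item));
  source.on('end', () => {
    matched.end();
    unmatched.end();
  });
  return [matched, unmatched];
}

// Hypothetical merge: combine several branches into one stream that ends
// only after every branch has ended.
function merge(...branches) {
  const out = new PassThrough({ objectMode: true });
  let open = branches.length;
  for (const branch of branches) {
    branch.pipe(out, { end: false });
    branch.on('end', () => {
      open -= 1;
      if (open === 0) out.end();
    });
  }
  return out;
}

// Map each object through a function, as a stream stage
const mapStream = (fn) => new Transform({
  objectMode: true,
  transform(item, _encoding, callback) {
    callback(null, fn(item));
  }
});

// Usage: inside a "create" stream, only admin users take an extra step
// before both branches rejoin.
const users = Readable.from([
  { id: 1, admin: true },
  { id: 2, admin: false },
  { id: 3, admin: true }
]);
const [admins, others] = fork(users, (user) => user.admin);
const merged = merge(admins.pipe(mapStream((user) => ({ ...user, role: 'admin' }))), others);
merged.on('data', (user) => console.log(user));
```

Objects from the two branches may arrive interleaved, so a real design would need to decide whether ordering matters after a merge.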
@marshallswain there is also:
We might be able to steal some inspiration from:
@eddyystop To some extent that's how I envisioned hooks working once @daffl suggested error and app-level hooks, where they are essentially "forks" in the hook chain and you can get back on the main track if you want to. For example,
_I might want to handle errors and log them to specific locations based on severity but only return an actual error to the client if it is severe._
This is a realistic use case if you are calling out to external/internal services to trigger async actions (e.g. notifications, emails) and the core request succeeds but the async action fails. Right now this use case is a bit clunky to handle, even with error hooks.
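A sketch of the severity-based error "fork" described in the example above, written as a plain hook function. Everything here is an assumption for illustration: `error.severity`, the `loggers` map and the `SEVERE` set are not Feathers APIs, and treating a cleared `hook.error` as "back on the main track" is a hypothetical convention, not current Feathers behaviour.

```js
// Severities that should still reach the client as errors (assumption)
const SEVERE = new Set(['fatal', 'critical']);

// Hypothetical error hook: "fork" each error to a log target picked by its
// severity, and only keep it as an error for the client if it is severe.
const routeErrors = (loggers) => async (hook) => {
  const severity = hook.error.severity || 'error';
  const log = loggers[severity] || loggers.error;
  await log(hook.error);

  if (!SEVERE.has(severity)) {
    // Hypothetical convention: clearing hook.error "gets back on the main
    // track", so the client receives hook.result (e.g. the result of the
    // core request) instead of the error.
    hook.error = undefined;
  }
  return hook;
};

// Usage: the core request succeeded, but the follow-up email failed
const logged = [];
const loggers = {
  warning: async (err) => logged.push(`warning log: ${err.message}`),
  error: async (err) => logged.push(`error log: ${err.message}`),
  fatal: async (err) => logged.push(`pager: ${err.message}`)
};
const emailFailed = Object.assign(new Error('email not sent'), { severity: 'warning' });

routeErrors(loggers)({ result: { id: 1 }, error: emailFailed })
  .then((hook) => console.log(hook.error, hook.result, logged));
```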
@ekryski Anything from substack is likely godly.
I'm still wrapping my head around how far stream support can take us. The good: hooks become so powerful they shouldn't be called hooks anymore. The bad: the conceptual load increases and people who are iffy about working with Promises now have to absorb streams.
It would be great to discuss some (pseudo)code examples of what it could look like. I know streams are nice for array-like objects and binary data, but I'm not seeing how exactly they would improve handling the hook chain.
In general I think it would make sense to be able to consume and return streams in a service though (without necessarily changing how hooks work).
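As a rough illustration of consuming and returning streams in a service without touching hooks, here is a hypothetical custom service, assuming only Node's `stream` and `crypto` modules. `UploadService` and the `data.file` field are made-up names; hooks would still receive an ordinary hook object, and only the payload inside it is a stream.

```js
const { Readable } = require('stream');
const crypto = require('crypto');

// Hypothetical custom service: create() consumes a stream passed as
// data.file and resolves to a plain object; get() resolves to a stream.
class UploadService {
  constructor() {
    // In-memory store for illustration only; a real service would write
    // the chunks to disk or blob storage instead of buffering them.
    this.files = new Map();
  }

  async create(data) {
    const hash = crypto.createHash('sha256');
    const chunks = [];
    for await (const chunk of data.file) {
      const buffer = Buffer.from(chunk);
      hash.update(buffer);
      chunks.push(buffer);
    }
    const id = hash.digest('hex');
    const contents = Buffer.concat(chunks);
    this.files.set(id, contents);
    return { id, size: contents.length };
  }

  async get(id) {
    if (!this.files.has(id)) {
      throw new Error(`No file with id ${id}`);
    }
    return Readable.from([this.files.get(id)]);
  }
}

const uploads = new UploadService();
uploads.create({ file: Readable.from(['hello ', 'world']) })
  .then(({ id, size }) => {
    console.log(size); // 11
    return uploads.get(id);
  })
  .then((stream) => stream.pipe(process.stdout));
```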
I ran into this timely and perhaps relevant snippet today:
> There was a previous discussion on how you know when someone is addicted to over-engineering, and looking at this giant UML diagram, I think this might be the case.
> The amount of complexity I'm willing to accept is proportional to the difficulty of the problem. In this case it's manipulating web pages, which shouldn't be too hard. This isn't a knock on React in particular, but it seems all the major vendors are competing on complexity. React has "fibers", Ember has a "virtual machine", Angular has their own overblown architecture (I don't know anything about it, and don't want to).
Just brainstorming some problems to consider for huge files from personal experience:
Pros:
I'll add my 2 cents on this as well, based on https://kalisio.gitbooks.io/krawler, a real-world production experiment in mixing hooks and streams. Here are some issues we faced:
I strongly believe hooks and streams are really different beasts: if you see the hook chain as a processing pipeline, then streams are the data flow, while hooks are the transformation functions applied to this data flow. Ideally the processing functions should be abstracted from the implementation of the data flow (streams or in-memory buffers), although this is not easy.
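To illustrate that separation, here is a minimal sketch, assuming Node's `stream` module, in which one transformation function is written once and applied unchanged to either an in-memory array or an object-mode stream. The function names and the temperature example are hypothetical.

```js
const { Readable, Transform } = require('stream');

// A processing function written once, with no knowledge of the data flow
const toCelsius = (reading) => ({ ...reading, value: ((reading.value - 32) * 5) / 9 });

// In-memory data flow: apply it to a buffered array
const applyToArray = (fn, items) => items.map(fn);

// Streaming data flow: apply the same function to each object in a stream
const applyToStream = (fn) => new Transform({
  objectMode: true,
  transform(item, _encoding, callback) {
    callback(null, fn(item));
  }
});

const readings = [{ sensor: 'a', value: 212 }, { sensor: 'b', value: 32 }];
console.log(applyToArray(toCelsius, readings));
Readable.from(readings)
  .pipe(applyToStream(toCelsius))
  .on('data', (reading) => console.log(reading));
```

The hard part hinted at above is everything this sketch leaves out, such as functions that need the whole dataset at once (sorting, totals), which cannot be applied item by item.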
I don't have a working solution to offer, but my feeling is that hooks are quite good now, and well-implemented stream support should not break this fundamental functional feature.
Did anything come of this?
I'm going to close this since I tend to agree with @claustres, plus it appears many of the benefits of iterators have been superseded by async/await. It might be possible to do some neat things here by returning an async iterator, but plain JavaScript object payloads have proven to be a good abstraction that works for any transport protocol, and so far there hasn't been a compelling use case from a Feathers perspective where streams would have significant benefits (you can always stream larger datasets in normal Express middleware).
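A small sketch of the "return an async iterator" idea, assuming only Node's `stream` module. `RowService` is hypothetical; `find()` itself stays a promise-returning method like any other service method, and only its result is lazy.

```js
const { Readable } = require('stream');

// Hypothetical service whose find() resolves to an async iterator instead of
// an array.
class RowService {
  constructor(rows, batchSize = 2) {
    this.rows = rows;
    this.batchSize = batchSize;
  }

  async find() {
    return this.iterate();
  }

  async *iterate() {
    for (let i = 0; i < this.rows.length; i += this.batchSize) {
      // Stand-in for fetching the next batch from a database cursor
      const batch = await Promise.resolve(this.rows.slice(i, i + this.batchSize));
      yield* batch;
    }
  }
}

async function main() {
  const rows = new RowService([{ id: 1 }, { id: 2 }, { id: 3 }]);

  // Consumers that understand async iterators use for await...of
  for await (const row of await rows.find()) {
    console.log(row.id);
  }

  // A transport could turn the same iterator into a stream, e.g. to write
  // newline-delimited JSON to an HTTP response.
  Readable.from(await rows.find())
    .on('data', (row) => process.stdout.write(JSON.stringify(row) + '\n'));
}

main();
```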
I'm a bit disappointed here, because it forces me to use the REST client and re-authenticate my users with Express (or maybe I'm missing something) to be able to display a video stream on the client.
If hooks are the main problem here, would it be possible to bypass them completely for simple streams?
Authentication is already possible with the Express authentication middleware.
That's what I tried first, but for some reason it wouldn't authenticate with the token that was issued via the Socket.io client.