The Readable and Writable objects are not well encapsulated: I can inject data into a Readable stream by calling Readable#push, and I can read data from a Writable stream by monkey-patching Writable#_read.
Further, these interfaces are not symmetrical. If I want to make data available on a readable side, why not use the write API? This is how it works on every other application: If a client creates a TCP stream, there's one "writable" side, and one "readable" side; yet if I want to make both sides in the same process, I have to use a different API, for some reason.
the Node.js stream library should include DuplexPair and SimplexPair objects, or equivalent factory functions.
interface SimplexPair {
Readable readable;
Writable writable;
}
interface DuplexPair {
Duplex client;
Duplex server;
}
SimplexPair creates two different but related objects, one Readable and one Writable; anything written to the Writable side is made available on the Readable side. For encapsulation, the objects would not have public references to each other, and there would be no way to make data available on the readable side without access to the writable side.
DuplexPair is the same, but both sides are writable and readable.
This paradigm should not be new to Node.js developers; a PassThrough stream is just a special case of SimplexPair where both sides are exposed on a single Duplex object.
Transform streams generate two pairs, and returns one from each:
function ROT13Pair(){
const input = new SimplexPair;
const output = new SimplexPair;
input.readable.on('data', function(buf){
output.writable.write(buf.toString().replace(/[a-zA-Z]/g, function(c){
const d = c.charCodeAt(0) + 13;
return String.fromCharCode( ((c<="Z")?90:122)>=d ? d : d-26 );
}));
});
return {
writable: input.writable,
readable: output.readable,
};
}
const { writable, readable } = new ROT13Pair;
process.stdin.pipe(writable);
readable.pipe(process.stdout);
This pattern is repeatable to any level:
function ROT26Pair(){
const input = new ROT13Pair;
const output = new ROT13Pair;
input.readable.pipe(output.writable);
return {
writable: input.writable,
readable: output.readable,
};
}
const { writable, readable } = new ROT26Pair;
process.stdin.pipe(writable);
readable.pipe(process.stdout);
... although I'm not sure how practical this particular example would be in production.
I would bet this style could also result in a modest performance improvement, since much of the logic around buffering and flow control could be re-implemented.
/cc @nodejs/streams
Without voicing a strong preference about whether this should be the “default” stream paradigm, I’d be okay with exposing the DuplexPair code that we already ship as an internal utility anyway.
@awwright As far as I am concerned, feel free to open a PR if you like – the current implementation of DuplexPair is in lib/internal/streams/duplexpair.js (and there is one in test/common/duplexpair.js that could be removed if we make the functionality a public API). Aside maybe from documentation, the PR wouldn’t have to be complex, and it’s usually easier to have a debate over a concrete suggestion.
@addaleax Sure, I'll be doing that in the near future (exposing the current DuplexPair implementation, and adding a SimplexPair). Feedback on naming and such is welcome first.
Uh, the transform examples aren't that great (they don't really offer anything beyond a Transform stream), but the idea is there would be another API that would create a Duplex stream from one Readable and one Writable:
process.stdin.pipe(new Transform(new ROT26Pair)).pipe(process.stdout);
And second, while I think exposing DuplexPair is an improvement, part of the request is better encapsulation—having a reference to Readable by itself shouldn't let me write data to that stream. This might require a re-implementation of some logic, which is why I figure much of the existing logic around buffering/flow control/object streams could be optimized away.
I'm fine in exposing DuplexPair, I don't think we should change the other streams API right now.
I'm not sure I agree with this. We've quite recently had problems with DuplexPair and I don't think the; how we want it to work, how it does work and how it can work is quite consistent.
I'm worried further extending the streams public API is going to make it even more difficult to make sure streams behave and are used correctly and can be reasoned about.
There is already a npm package that exposes this functionality where semver can be applied.
@ronag Well, I reported that bug, and my impression is it happened because regular streams (with the asymmetric API) are way too complicated for one person to understand how they function. And the same bug is in that "duplexpair" package, too, because it depends on a Node.js-derived stream to work.
Do you have a critique of this proposal _per se_ (by itself), or is the concern simply that the change is unwarranted for now?
And is encapsulation in Node.js simply not a priority? I've noticed a lot of other stuff has been getting ported to symbol properties, for instance.
@awwright: My main objection is from a personal perspective. I find the existing paradigms flawed but good enough for now. If we are going to introduce another overlapping paradigm to our users it should to according to me at least be significantly better and something we actively encourage and document. Not something we just add in.
are way too complicated for one person to understand how they function
Yes, and DuplexPair is still built on top of those. For me it's just another layer which makes it even harder to reason about. If we built DuplexPair from scratch and with well defined semantics I would be more positive but I don't believe that's the way to go either. Or if it in someway allowed us to get around some difficult stream related edge cases that are difficult to fix due to compat concerns.
Right now I'm not sure of either how DuplexPair works exactly or how it is supposed to work (even though I was involved with sorting out that bug).
Do you have a critique of this proposal per se (by itself), or is the concern simply that the change is unwarranted for now?
I like the proposal from the theoretical perspective but I think it's unwarranted right now from a practical stand-point. I would rather focus on trying to sort out the existing API's and make them more consistent and easier to use.
And is encapsulation in Node.js simply not a priority?
This issue is for me not about encapsulation. We could theoretically hide push behind private symbols (not sure what you mean about Writable._read, that doesn't exist). So it's not really about the fact that we have Readable and Writable but more of an implementation detail.
I have a lot of opinions on this but they are soft. I don't have any hard objections to this but would advise against it.
I would encourage you to discuss with @Fishrock123 who is working on a larger overhaul of streams and might make use of some of your feedback.
@ronag Fair enough, thanks for the explanation! I'll check that out for sure. Maybe it solves the same things.
@addaleax I'm guessing you have a differing opinion, may I hear?
@awwright I mean, in the end it boils down to exposing DuplexPair from streams, right? I don’t really see any issue with that. It doesn’t have to be a paradigm change, and we don’t have to adopt it elsewhere in core, but if people want to use it, why not.
(And regarding
Right now I'm not sure of either how DuplexPair works exactly or how it is supposed to work (even though I was involved with sorting out that bug).
… that can be put into words relatively directly: What is written to one of the Duplex streams is read from the other.)
I mean, in the end it boils down to exposing DuplexPair from streams, right? I don’t really see any issue with that. It doesn’t have to be a paradigm change, and we don’t have to adopt it elsewhere in core, but if people want to use it, why not.
Then why not use the npm package? If we expose something from node core at least in my view we officially supporting and encourage usage. It's a mark of this component is mature and thought through and usable without having to give extra thought to edge cases that we might take into account when using it in core. To me it's more than "just exposing it".
… that can be put into words relatively directly: What is written to one of the Duplex streams is read from the other.)
That's a bit too simplistic though in my opinion... there are a lot of edge cases that might need to be thought through...
I haven't given this enough thought and some of the above points might be nonsense but I hope that clarifies my concern.