I've tried Node 5.8.0, 5.10.0 and 4.4.2. I always receive string objects in my data handler unless I manually remove the decoder from the socket's `_readableState`:
const net = require('net');

const server = net.createServer((c) => {
  // See https://nodejs.org/api/stream.html#stream_readable_setencoding_encoding
  c.setEncoding(null);
  // Hack that must be added to make this work as expected
  delete c._readableState.decoder;
  c.on('data', (d) => {
    // Expect 'object' but get 'string' without deleting the decoder manually
    console.log(typeof d);
  });
});
This might be a bug in the Readable stream implementation, based on a cursory examination of https://github.com/nodejs/node/blob/master/lib/_stream_readable.js#L193
I haven't contributed to Node before, but if nulling out the decoder when enc === null in that function seems like the correct approach, I would be more than happy to submit a PR with tests.
Doc change happened in #5155.
@TriAnMan @silverwind @jasnell From what I can tell passing null has never disabled an existing StringDecoder instance?
Hmm, that doc change looks wrong, actually. string_decoder does fall back to 'utf8' when the encoding is falsy.
I've been using this workaround on v4.3.1 and v0.10.36.
Here is a code snippet:
var http = require('http');

var processServerResponse = function (data) {
  // Fix for broken utf8 symbols being converted into two Unicode
  // REPLACEMENT CHARACTERs
  data.setEncoding(null);
  var str = '';
  data.on('data', function (chunk) {
    str += chunk;
  });
  data.on('end', function () {
    // do something with str
  });
};

http.request({/* some valid http options */}, processServerResponse).end();
I don't know what typeof chunk is, but I can test.
If I use data.setEncoding(null), then typeof chunk is 'string'.
If I comment out the setEncoding call, then typeof chunk becomes 'object'.
This is for v4.3.1; I can't test v0.10.36 right now.
@mscdex ... as far as I know, utf8 is the default when setEncoding is passed null. If calling data.setEncoding('buffer') does not make it return a Buffer, then that's a change we should make. We use encoding = 'buffer' in other places (e.g. in fs) to explicitly ask for Buffers to be returned.
@jasnell IIRC it does not return buffers using 'buffer' since the value is passed to StringDecoder() which would treat it as a single-byte/binary string encoding.
It doesn't work with 'buffer' because it's not a valid encoding.
Buffer.isEncoding('buffer')
> false
The problem is caused by the falsy check in string_decoder.js as @silverwind already said.
Another quick fix is to use non-flowing mode.
'use strict';
const cp = require('child_process');
const child = cp.spawn('ls', { stdio: ['ignore', 'pipe', 'ignore'] });

// This works
child.stdout.on('readable', () => {
  let chunk;
  while (null !== (chunk = child.stdout.read())) {
    console.log('read %s', typeof chunk);
  }
});

// Type will always be a string
child.stdout.setEncoding(null);
child.stdout.on('data', (chunk) => {
  console.log('onData %s', typeof chunk);
});
This workaround also works – pretty hacky but does the trick until this is fixed (and backported?):
// Simulate socket.setEncoding(null):
if (socket._readableState) {
  delete socket._readableState.decoder;
  delete socket._readableState.encoding;
}
EDIT: Sorry, just noticed this is basically in the issue description – my bad for not reading fully!
PR to fix: #12766
The PR to fix this appears stalled. A doc-only update, at least as an interim measure, would be welcome. Labeling doc....
A year on, and I'm still having to apply the hack to read binary :(
@nodejs/streams
Ping @nodejs/streams ... @mafintosh @mcollina ... any thoughts here?
This needs to be done. It would have to wait until December :/
This also seems to be an issue with http.IncomingMessage.
I had to use the hack to read a binary body into a Buffer for an HTTP request. I was using Node v10.13.0.
@jheusala As long as you don't call setEncoding(), you should get Buffer chunks on the incoming message stream by default.
@mscdex That's what I thought, too. However, nowhere in my code is there another .setEncoding() call, nor do the strings "encoding" or "utf" appear anywhere. (I tried to build a smaller example, but it works as expected, so something else in my code must be turning it into a string stream.)
Strange. It started working on my complete app as well. I could swear it was receiving strings, not Buffers, last evening.
@nodejs/streams ... what's the status on this one? Should this remain open?
The work of @addaleax to fix this (https://github.com/nodejs/node/pull/12766) never landed.
I'm closing due to lack of activity/interest in this problem.
Should this behaviour be documented at least?
@dchest I don’t know if this can be fixed, though. There’s a specific reason why my PR wasn’t merged – the decoding step happens at the time the data is pushed into the readable stream but before it is actually read, that means, the buffered data inside the stream is potentially already decoded. We could re-encode it, but the problem is that the result is potentially different from the original input data in that case.
So I think this is something that would need to be documented, yes.
@addaleax isn't that true for any setEncoding call though? Why would it only be setEncoding(null) that would be affected by that?
@mhart Right, for other encodings this is problematic as well.
I think the way to fix this would be to move decoding to the point where the chunks are emitted, rather than the point where they are pushed into the stream. That would enable solving this for both null/'buffer' and other encodings.
> I think the way to fix this would be to move decoding to the point where the chunks are emitted, rather than the point where they are pushed into the stream.

That sounds like a way forward... though probably a bit difficult to implement.