Node: stream: socket.setEncoding(null) to receive binary Buffers rather than strings has no effect

Created on 4 Apr 2016 · 26 comments · Source: nodejs/node

  • Version: v5.8.0
  • Platform: Darwin 15.3.0 Darwin Kernel Version 15.3.0: Thu Dec 10 18:40:58 PST 2015; root:xnu-3248.30.4~1/RELEASE_X86_64 x86_64
  • Subsystem: stream

I've tried node 5.8.0, 5.10.0 and 4.4.2. I always receive strings in my 'data' handler unless I manually remove the decoder from the socket's _readableState:

const net = require( 'net' );

const server = net.createServer( ( c ) => {

    // See https://nodejs.org/api/stream.html#stream_readable_setencoding_encoding
    c.setEncoding( null );

    // Hack that must be added to make this work as expected
    delete c._readableState.decoder;

    c.on( 'data', ( d ) => {

        // Expect 'object' but get 'string' without deleting the decoder manually
        console.log( typeof d );
    } );

} );
Labels: doc, stream

Most helpful comment

A year on, and I'm still having to apply the hack to read binary :(

All 26 comments

This might be a bug in ReadableStream, based on a cursory examination of https://github.com/nodejs/node/blob/master/lib/_stream_readable.js#L193

I haven't contributed to Node before, but if nulling out the decoder when enc === null in that function seems like the correct approach, I would be more than happy to submit a PR with tests.

Doc change happened in #5155.

@TriAnMan @silverwind @jasnell From what I can tell passing null has never disabled an existing StringDecoder instance?

Hmm, that doc change looks wrong, actually. string_decoder does fall back to 'utf8' when the encoding is falsy.

I've been using this workaround on v4.3.1 and v0.10.36.
Here is a code snippet:

var http = require('http');

var processServerResponse = function (data) {
    data.setEncoding(null); // Fix for broken UTF-8 symbols split across chunks, which get converted into two Unicode REPLACEMENT CHARACTERs

    var str = '';
    data.on('data', function (chunk) {
        str += chunk;
    });
    data.on('end', function () {
        // do something with str
    });
};

http.request({/* some valid http options */}, processServerResponse).end();

I don't know what typeof chunk is, but I can test.

If I use data.setEncoding(null), then typeof chunk is 'string'.
If I comment out the setEncoding call, then typeof chunk becomes 'object'.

This is for v4.3.1, can't test v0.10.36 right now.

@mscdex ... as far as I know, utf8 is the default when setEncoding is null. If calling data.setEncoding('buffer') does not make it return a buffer, then that's a change we should make. We use encoding = 'buffer' in other places to explicitly ask for returning buffers (e.g. in 'fs')

@jasnell IIRC it does not return buffers using 'buffer' since the value is passed to StringDecoder() which would treat it as a single-byte/binary string encoding.

It doesn't work with 'buffer' because it's not a valid encoding.

Buffer.isEncoding('buffer')
> false

The problem is caused by the falsy check in string_decoder.js as @silverwind already said.
Another quick fix is to use non-flowing mode.

'use strict';
const cp = require('child_process');
const child = cp.spawn('ls', { stdio:['ignore','pipe','ignore'] });

// This works
child.stdout.on('readable', () => {
  let chunk;
  while (null !== (chunk = child.stdout.read())) {
    console.log('read %s', typeof chunk);
  }
});

// Type will always be a string
child.stdout.setEncoding(null);
child.stdout.on('data', (chunk) => {
  console.log('onData %s', typeof chunk);
});

This workaround also works – pretty hacky but does the trick until this is fixed (and backported?):

// Simulate socket.setEncoding(null):
if (socket._readableState) {
  delete socket._readableState.decoder
  delete socket._readableState.encoding
}

EDIT: Sorry, just noticed this is basically in the issue description – my bad for not reading fully!

PR to fix: #12766

The PR to fix appears stalled. A doc-only update, at least as an interim measure, would be welcome. Labeling doc....

A year on, and I'm still having to apply the hack to read binary :(

@nodejs/streams

Ping @nodejs/streams ... @mafintosh @mcollina ... any thoughts here?

This needs to be done. It would have to wait until December :/

This also seems to be an issue with http.IncomingMessage.

I had to use the hack to read a binary body into a Buffer for an HTTP request. I was using Node v10.13.0.

@jheusala As long as you don't call setEncoding(), you should get Buffer chunks on the incoming message stream by default.

@mscdex That's what I thought, too. However, nowhere in my code is there another .setEncoding() call, and I can't find the strings "encoding" or "utf" anywhere either. (I tried to build a smaller example, but it works as expected, so something else in my code must be turning it into a string stream.)

Strange. It started to work also on my complete app. I could swear it was receiving strings, not Buffers, last evening.

@nodejs/streams ... what's the status on this one? Should this remain open?

The work of @addaleax to fix this (https://github.com/nodejs/node/pull/12766) never landed.

I'm closing due to lack of activity/interest in this problem.

Should this behaviour be documented at least?

@dchest I don’t know if this can be fixed, though. There’s a specific reason why my PR wasn’t merged: the decoding step happens at the time the data is pushed into the readable stream, but before it is actually read. That means the buffered data inside the stream is potentially already decoded. We could re-encode it, but the result could then differ from the original input data.

So I think this is something that would need to be documented, yes.

@addaleax isn't that true for any setEncoding call though? Why would it only be setEncoding(null) that would be affected by that?

@mhart Right, for other encodings this is problematic as well.

I think the way to fix this would be to move decoding to the point where the chunks are emitted, rather than the point where they are pushed into the stream. That would enable solving this for both null/'buffer' and other encodings.

> I think the way to fix this would be to move decoding to the point where the chunks are emitted, rather than the point where they are pushed into the stream.

That sounds like a way forward... though probably a bit difficult to implement.

