Node: Can't convert Buffer into String correctly with binary encoding

Created on 8 May 2017 · 6 comments · Source: nodejs/node

Hello, I was working with binary files and Buffer, and encountered this problem.

Code:

const b = Buffer.from([0xca, 0xc5, 0x0e]);
console.log(b.toString('binary'));
console.log(b);

I would expect as output something like:

ÊÅ^N

which is exactly the 3 bytes 0xca, 0xc5, 0x0e
Instead, I got the equivalent of:

\xc3\x8a\xc3\x85\x0e

Is it the expected behavior? If so, how can I obtain the string "\xca\xc5\x0e" from the Buffer?

  • Version: 7.9.0
  • Platform: Mac OS X Sierra

I would expect as output something like: ÊÅ^N

That is the output that you get, though: The string ÊÅ^N – encoded as UTF-8.

which is exactly the 3 bytes 0xca, 0xc5, 0x0e

Yes, that’s ÊÅ^N encoded as ISO-8859-1 (which is known as latin1 in Node v6+, and binary before).
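A quick way to see the alias relationship in practice is to decode the same three bytes under both names and look at the resulting code points. This is a minimal sketch that assumes a Node version recent enough to accept the name 'latin1'. Each byte becomes exactly one character whose code point equals the byte value.

```javascript
const b = Buffer.from([0xca, 0xc5, 0x0e]);

// 'binary' and 'latin1' name the same encoding, so both decodes agree.
const viaBinary = b.toString('binary');
const viaLatin1 = b.toString('latin1');
console.log(viaBinary === viaLatin1); // true

// Latin-1 decoding maps each byte to the character with the same code point.
const codePoints = Array.from(viaLatin1, (ch) => ch.codePointAt(0).toString(16).padStart(2, '0'));
console.log(codePoints); // [ 'ca', 'c5', '0e' ]
```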

Is it the expected behavior?

Yes. You’re using console.log() to print the string to the terminal, which uses process.stdout.write() in its implementation; and unless you call process.stdout.setDefaultEncoding(…), the default encoding for stdout will be UTF-8 (because, practically speaking, that’s what terminal emulators support).
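The effect of a stream's default encoding can be reproduced without a terminal. In this minimal sketch, a plain Writable built by the hypothetical makeRecorder helper stands in for process.stdout and records the bytes it receives. That it converts strings to bytes the same way stdout does is an assumption made for illustration. A string written with the default encoding arrives as UTF-8, and after setDefaultEncoding('latin1') it arrives as the original three bytes.

```javascript
const { Writable } = require('stream');

// Hypothetical stand-in for process.stdout that records every byte written to it.
function makeRecorder() {
  const chunks = [];
  const sink = new Writable({
    write(chunk, encoding, callback) {
      // With decodeStrings left at its default (true), string chunks arrive here
      // already converted to a Buffer using write()'s encoding or the default one.
      chunks.push(Buffer.from(chunk));
      callback();
    },
  });
  return { sink, bytes: () => Buffer.concat(chunks).toString('hex') };
}

const s = Buffer.from([0xca, 0xc5, 0x0e]).toString('latin1');

// Default encoding (UTF-8): each accented character becomes two bytes.
const utf8 = makeRecorder();
utf8.sink.write(s);
console.log(utf8.bytes()); // c38ac3850e

// Default encoding switched to latin1: the original three bytes come out.
const latin1 = makeRecorder();
latin1.sink.setDefaultEncoding('latin1');
latin1.sink.write(s);
console.log(latin1.bytes()); // cac50e
```

On a real terminal, passing the Buffer itself, as in process.stdout.write(b), sends its bytes without any string-encoding step.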

So what’s going on is that Node takes your Buffer, decodes it to a string as if it were ISO-8859-1-encoded text (because you said so), then re-encodes that string into a Buffer as UTF-8 (because it has to encode it somehow to write it to stdout, and UTF-8 is the most sensible choice), and then sends that Buffer to the terminal (where, in all likelihood, it gets decoded as UTF-8 again).
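The first two steps of that pipeline can be replayed explicitly. This sketch stops before the terminal, so the final step (the terminal decoding UTF-8) is assumed rather than shown. The five bytes it produces are exactly the \xc3\x8a\xc3\x85\x0e reported in the question.

```javascript
const b = Buffer.from([0xca, 0xc5, 0x0e]);

// Step 1: decode the bytes as Latin-1, as requested by toString('binary').
const str = b.toString('latin1');
console.log(str.length); // 3

// Step 2: re-encode the string as UTF-8, which is what writing it to stdout does by default.
const reencoded = Buffer.from(str, 'utf8');
console.log(reencoded); // <Buffer c3 8a c3 85 0e>
```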

If so, how can I obtain the string "\xca\xc5\x0e" from the Buffer?

b.toString('binary') (or, equivalently, b.toString('latin1')) is actually the right answer: you can verify that by running b.toString('binary') === "\xca\xc5\x0e" on your example, which returns true as expected.
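Extending that check slightly gives a sketch that goes a little beyond the original thread: encoding the string back with the same 'latin1' encoding restores the original Buffer. Latin-1 maps all 256 byte values one-to-one onto characters, so this round trip works for arbitrary bytes.

```javascript
const b = Buffer.from([0xca, 0xc5, 0x0e]);
const s = b.toString('latin1');

// The check suggested above.
console.log(s === '\xca\xc5\x0e'); // true

// Encoding with the same encoding restores the original bytes.
const roundTrip = Buffer.from(s, 'latin1');
console.log(roundTrip.equals(b)); // true
```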

Let me know if this doesn’t help!

Thank you so much for the useful and complete explanation, you saved my day!

Okay, I’m closing this as an answered question then – feel free to ask any follow-up questions, here or on https://github.com/nodejs/help. :)

Hi there! Sorry, but I'm still a bit confused.

In the Node.js docs I see this example:

const buf1 = Buffer.from('this is a tést');
const buf2 = Buffer.from('7468697320697320612074c3a97374', 'hex');

console.log(buf1.toString());
// Prints: this is a tést
console.log(buf2.toString());
// Prints: this is a tést
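In this example, 'hex' and the default 'utf8' are two different ways of spelling the same bytes as a string. 'binary' is not a base-2 spelling at all: it is the Latin-1 encoding discussed earlier in this thread. The sketch below makes that concrete. The expected outputs assume the source file is saved as UTF-8.

```javascript
const buf1 = Buffer.from('this is a tést'); // UTF-8 by default
const buf2 = Buffer.from('7468697320697320612074c3a97374', 'hex');

// Both calls produce the same 15 bytes; 'hex' only describes how the input string spells them.
console.log(buf1.equals(buf2)); // true
console.log(buf1.length); // 15

// 'binary' (latin1) reads each byte as one character, so the two-byte 'é' turns into 'Ã©'.
console.log(buf1.toString('binary')); // this is a tÃ©st
```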

Why can't I interact with a binary representation the same way I can with hex?

For instance:

const buf3 = Buffer.from('11101...0100', 'binary') // '...' just used to truncate
buf3.toString()
// Prints '11101...0100', not 'this is a tést' (which I want!)

Furthermore:

buf1.toString('hex')
// Prints: 7468697320697320612074c3a97374

buf1.toString('binary')
// Prints: 'this is a tÃ©st', not '11101...0100' (which I want!)

Thank you very much for your help.
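Node's Buffer supports encodings such as 'utf8', 'hex', 'base64' and 'latin1' (alias 'binary'), but none of them spells bytes as base-2 digits. A string of 0s and 1s therefore has to be converted by hand. This is a minimal sketch. toBitString and fromBitString are hypothetical helper names, not Buffer APIs, and the sketch assumes 8 bits per byte, most significant bit first.

```javascript
// Hypothetical helper: spell each byte as 8 binary digits, most significant bit first.
function toBitString(buf) {
  return Array.from(buf, (byte) => byte.toString(2).padStart(8, '0')).join('');
}

// Hypothetical helper: parse groups of 8 binary digits back into bytes.
function fromBitString(bits) {
  if (bits.length % 8 !== 0 || /[^01]/.test(bits)) {
    throw new RangeError('expected only 0s and 1s, with a length that is a multiple of 8');
  }
  const bytes = [];
  for (let i = 0; i < bits.length; i += 8) {
    bytes.push(parseInt(bits.slice(i, i + 8), 2));
  }
  return Buffer.from(bytes);
}

const buf1 = Buffer.from('this is a tést');
const bits = toBitString(buf1);
console.log(bits.slice(0, 16)); // 0111010001101000  ('t' = 0x74, 'h' = 0x68)
console.log(fromBitString(bits).toString()); // this is a tést
```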

I see. Thank you.
