Node: Can't convert Buffer into String correctly with binary encoding

Created on 8 May 2017 · 6 comments · Source: nodejs/node

Hello, I was working with binary files and Buffers and ran into this problem.

Code:

const b = Buffer.from([0xca, 0xc5, 0x0e]);
console.log(b.toString('binary'));
console.log(b);

I would expect as output something like:

ÊÅ^N

which is exactly the 3 bytes 0xca, 0xc5, 0x0e.
Instead, I got the equivalent of:

\xc3\x8a\xc3\x85\x0e

Is this the expected behavior? If so, how can I obtain the string "\xca\xc5\x0e" from the Buffer?

Version: 7.9.0
Platform: Mac OS X Sierra

Labels: buffer, question


bennesp


I would expect as output something like: ÊÅ^N

That is the output that you get, though: The string ÊÅ^N – encoded as UTF-8.

which is exactly the 3 bytes 0xca, 0xc5, 0x0e

Yes, that’s ÊÅ^N encoded as ISO-8859-1 (known as latin1 in Node v6+; binary is the older name and is still accepted as an alias).
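As a quick illustration (a minimal sketch, not part of the original reply): latin1 decoding maps every byte to the code point with the same numeric value, so the resulting string has exactly one character per byte, and 'binary' produces the identical string.

```javascript
const b = Buffer.from([0xca, 0xc5, 0x0e]);

// 'latin1' maps each byte 0x00-0xFF to the code point U+0000-U+00FF with the same value.
const s = b.toString('latin1');

// 'binary' is an alias for 'latin1', so both calls return the same string.
console.log(s === b.toString('binary')); // true

// One character per byte, and each char code equals the original byte value.
const codes = Array.from(s, (ch) => ch.charCodeAt(0));
console.log(codes); // [ 202, 197, 14 ]
```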

Is it the expected behavior?

Yes. You’re using console.log() to print the string to the terminal, which uses process.stdout.write() in its implementation; and unless you call process.stdout.setDefaultEncoding(…), the default encoding for stdout will be UTF-8 (because, practically speaking, that’s what terminal emulators support).
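To see the default-encoding effect without depending on a terminal, here is a minimal sketch that writes the same string into an in-memory Writable stream standing in for stdout. The `captureBytes` helper is hypothetical, written only for this illustration. A Writable created with the default `decodeStrings: true` converts strings to bytes using its default encoding, which is UTF-8 unless `setDefaultEncoding()` changes it; the assumption here is that stdout picks the encoding the same way.

```javascript
const { Writable } = require('stream');

// Hypothetical helper: returns the raw bytes a Writable receives for one string write.
function captureBytes(setup, str) {
  const chunks = [];
  const sink = new Writable({
    write(chunk, encoding, callback) {
      // With decodeStrings left at its default (true), strings arrive here as Buffers.
      chunks.push(chunk);
      callback();
    },
  });
  setup(sink);
  sink.write(str); // the first write reaches write() synchronously
  return Buffer.concat(chunks);
}

const b = Buffer.from([0xca, 0xc5, 0x0e]);
const str = b.toString('latin1');

// Default encoding (UTF-8): each non-ASCII character becomes two bytes.
const asDefault = captureBytes(() => {}, str);
console.log(asDefault); // <Buffer c3 8a c3 85 0e>

// After setDefaultEncoding('latin1'), the original three bytes come back out.
const asLatin1 = captureBytes((s) => s.setDefaultEncoding('latin1'), str);
console.log(asLatin1); // <Buffer ca c5 0e>
```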

So what’s going on is this: Node takes your Buffer and decodes it to a string as if it were ISO-8859-1-encoded text (because you said so), then re-encodes that string into a Buffer as UTF-8 (because it has to encode it somehow to write it to stdout, and UTF-8 is the most sensible choice), and then sends that Buffer to the terminal, where it is, in all likelihood, decoded as UTF-8 again.
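The same chain can be reproduced step by step with plain Buffer calls. This is a sketch of the steps described above, not Node's internal code path; step 3 stands in for what a UTF-8 terminal presumably does with the bytes it receives.

```javascript
const b = Buffer.from([0xca, 0xc5, 0x0e]);

// Step 1: decode the bytes as ISO-8859-1, which is what toString('binary') asks for.
const text = b.toString('latin1');

// Step 2: re-encode the string as UTF-8 so it can be written to stdout.
const utf8Bytes = Buffer.from(text, 'utf8');
console.log(utf8Bytes); // <Buffer c3 8a c3 85 0e>

// Step 3: decoding those bytes as UTF-8 yields the same characters again.
const shown = utf8Bytes.toString('utf8');
console.log(shown === text); // true
```

The bytes from step 2 are exactly the `\xc3\x8a\xc3\x85\x0e` reported in the issue.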

If so, how can I obtain the string "\xca\xc5\x0e" from the Buffer?

b.toString('binary') (or equivalently b.toString('latin1')) is actually the right answer. You can verify that by running b.toString('binary') === "\xca\xc5\x0e" on your example, which returns true as expected.
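A short sketch of that check, plus the reverse direction: encoding the string with latin1 again recovers the original bytes, so no information is lost in the conversion.

```javascript
const b = Buffer.from([0xca, 0xc5, 0x0e]);
const s = b.toString('binary'); // same result as b.toString('latin1')

// The decoded string is exactly "\xca\xc5\x0e": three characters, one per byte.
console.log(s === '\xca\xc5\x0e'); // true
console.log(s.length); // 3

// Encoding with latin1 recovers the original Buffer.
const back = Buffer.from(s, 'latin1');
console.log(back.equals(b)); // true
```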

Let me know if this doesn’t help!

addaleax on 8 May 2017



Thank you so much for the useful and complete explanation, you saved my day!

bennesp on 8 May 2017


Okay, I’m closing this as an answered question then – feel free to ask any follow-up questions, here or on https://github.com/nodejs/help. :)

addaleax on 8 May 2017


Hi there! Sorry, but I'm still a bit confused.

In the Node.js docs I see this example:

const buf1 = Buffer.from('this is a tést');
const buf2 = Buffer.from('7468697320697320612074c3a97374', 'hex');

console.log(buf1.toString());
// Prints: this is a tést
console.log(buf2.toString());
// Prints: this is a tést

Why can't I interact with a binary representation the same way I can with hex?

For instance:

const buf3 = Buffer.from('11101...0100', 'binary') // '...' just used to truncate
buf3.toString()
// Prints '11101...0100', not 'this is a tést' (which I want!)

Furthermore:

buf1.toString('hex')
// Prints: 7468697320697320612074c3a97374

buf1.toString('binary')
// Prints: 'this is a tÃ©st', not '11101...0100' (which I want!)

Thank you very much for your help.
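The follow-up mixes two meanings of "binary". In Node, the 'binary' encoding is an alias for 'latin1' (one character per byte), not a string of 0 and 1 digits, so `Buffer.from(bits, 'binary')` stores each '0' or '1' character as the byte 0x30 or 0x31. Node has no built-in base-2 string encoding comparable to 'hex'. The helpers below (`toBits` and `fromBits` are hypothetical names) are a minimal sketch that converts bytes to 8 digits per byte and back, assuming the bit string has a length that is a multiple of 8.

```javascript
// Hypothetical helper: each byte becomes 8 binary digits, most significant bit first.
function toBits(buf) {
  return Array.from(buf, (byte) => byte.toString(2).padStart(8, '0')).join('');
}

// Hypothetical helper: parse groups of 8 binary digits back into bytes.
function fromBits(bits) {
  if (bits.length % 8 !== 0 || /[^01]/.test(bits)) {
    throw new Error('expected 0/1 digits with a length that is a multiple of 8');
  }
  const bytes = [];
  for (let i = 0; i < bits.length; i += 8) {
    bytes.push(parseInt(bits.slice(i, i + 8), 2));
  }
  return Buffer.from(bytes);
}

const buf1 = Buffer.from('this is a tést'); // UTF-8 by default, 15 bytes
const bits = toBits(buf1);
console.log(bits.slice(0, 16)); // 0111010001101000 ('t' = 0x74, 'h' = 0x68)

// Round trip: bits back to bytes, then read the bytes as UTF-8 text.
console.log(fromBits(bits).toString()); // this is a tést

// By contrast, the 'binary' (latin1) encoding stores the character codes of '0' and '1'.
console.log(Buffer.from('01', 'binary')); // <Buffer 30 31>
```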