Hello, I was working with binary files and Buffer, and encountered this problem.
Code:
b = Buffer.from([0xca, 0xc5, 0x0e])
console.log(b.toString('binary'))
console.log(b)
I would expect as output something like:
ÊÅ^N
which is exactly the 3 bytes 0xca, 0xc5, 0x0e
Instead of this I got the equivalent of:
\xc3\x8a\xc3\x85\x0e
Is it the expected behavior? If so, how can I obtain the string "\xca\xc5\x0e" from the Buffer?
I would expect as output something like:
ÊÅ^N
That is the output that you get, though: the string ÊÅ^N, encoded as UTF-8.
which is exactly the 3 bytes 0xca, 0xc5, 0x0e
Yes, that’s ÊÅ^N encoded as ISO-8859-1 (which is known as latin1 in Node v6+, and binary before).
Is it the expected behavior?
Yes. You’re using console.log() to print the string to the terminal, which uses process.stdout.write() in its implementation; and unless you call process.stdout.setDefaultEncoding(…), the default encoding for stdout will be UTF-8 (because, practically speaking, that’s what terminal emulators support).
So, what’s going on is that Node takes your Buffer, decodes it to a string as if it were ISO-8859-1-encoded text (because you said so), then re-encodes that string into a Buffer as UTF-8 (because it has to encode it somehow to write it to stdout, and UTF-8 is the most sensible choice), and then sends that Buffer to the terminal (where it gets decoded as UTF-8 again, in all likelihood).
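The decode and re-encode steps described above can be reproduced by hand. This is only a sketch of the effect, not of how console.log() is implemented internally; the variable names s and written are illustrative.

```javascript
// The three raw bytes from the question.
const b = Buffer.from([0xca, 0xc5, 0x0e]);

// Step 1: decode as latin1 ('binary' is an alias for it).
// Each byte becomes one character with the same numeric value.
const s = b.toString('binary');

// Step 2: roughly what writing the string to stdout does by default:
// encode the string as UTF-8 before sending it out.
const written = Buffer.from(s, 'utf8');

console.log(s.length); // 3
console.log(written);  // <Buffer c3 8a c3 85 0e>
```

The two characters above 0x7f each take two bytes in UTF-8, which is where the five bytes \xc3\x8a\xc3\x85\x0e in the question come from.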
If so, how can I obtain the string "\xca\xc5\x0e" from the Buffer?
b.toString('binary') (or, equivalently, b.toString('latin1')) is actually the right answer. You can verify that by running b.toString('binary') === "\xca\xc5\x0e" on your example, which does return true as expected.
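As a small sketch of that check, plus one way to get the original bytes (rather than their UTF-8 re-encoding) onto stdout: writeRawBytes is a hypothetical helper name, and the call is left commented out because 0x0e is a control character that some terminals act on.

```javascript
const b = Buffer.from([0xca, 0xc5, 0x0e]);
const s = b.toString('latin1');

// The string holds exactly one character per original byte.
console.log(s === '\xca\xc5\x0e'); // true

// Encoding the string back with the same encoding restores the bytes.
const roundTrip = Buffer.from(s, 'latin1');
console.log(roundTrip.equals(b)); // true

// Hypothetical helper: writing a Buffer (not a string) to stdout
// sends its bytes unchanged, with no UTF-8 re-encoding step.
function writeRawBytes(buf) {
  process.stdout.write(buf);
}

// writeRawBytes(b); // would emit the bytes ca c5 0e as-is
```

Whether the terminal then shows ÊÅ depends on the terminal's own encoding; piping the output to a file or a hex dump tool is a more reliable way to inspect it.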
Let me know if this doesn’t help!
Thank you so much for the useful and complete explanation, you saved my day!
Okay, I’m closing this as an answered question then – feel free to ask any follow-up questions, here or on https://github.com/nodejs/help. :)
Hi there! Sorry but I'm still a bit confused.
In the nodejs docs I see this example:
const buf1 = Buffer.from('this is a tést');
const buf2 = Buffer.from('7468697320697320612074c3a97374', 'hex');
console.log(buf1.toString());
// Prints: this is a tést
console.log(buf2.toString());
// Prints: this is a tést
Why can't I interact with a binary representation the same way I can with hex?
For instance:
const buf3 = Buffer.from('11101...0100', 'binary') // '...' just used to truncate
buf3.toString()
// Prints '11101...0100', not 'this is a tést' (which I want!)
Furthermore:
buf1.toString('hex')
// Prints: 7468697320697320612074c3a97374
buf1.toString('binary')
// Prints: 'this is a tést', not '11101...0100' (which I want!)
Thank you very much for your help.
@dustinmichels binary is an alias for latin1: https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings
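Since 'binary' does not mean base 2 here, a string of 0s and 1s has to be built and parsed by hand. A minimal sketch, assuming 8 bits per byte with leading zeros kept; toBitString and fromBitString are hypothetical helper names, not part of the Buffer API.

```javascript
// Hypothetical helpers: Buffer has no built-in base-2 encoding.
function toBitString(buf) {
  // Render every byte as exactly 8 binary digits and concatenate them.
  return Array.from(buf, (byte) => byte.toString(2).padStart(8, '0')).join('');
}

function fromBitString(bits) {
  if (bits.length % 8 !== 0 || /[^01]/.test(bits)) {
    throw new RangeError('expected only 0s and 1s, with a length that is a multiple of 8');
  }
  const bytes = [];
  for (let i = 0; i < bits.length; i += 8) {
    // Parse each group of 8 digits back into one byte value.
    bytes.push(parseInt(bits.slice(i, i + 8), 2));
  }
  return Buffer.from(bytes);
}

const buf1 = Buffer.from('this is a tést');
const bits = toBitString(buf1);

console.log(bits.slice(0, 16));              // 0111010001101000
console.log(fromBitString(bits).toString()); // this is a tést
```

Padding each byte to 8 digits matters: without it the byte boundaries are lost and the string cannot be parsed back unambiguously.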
I see. Thank you.