Node: Buffer.from(str, encoding) silently ignores decoding errors

Created on 17 Sep 2016 · 14Comments · Source: nodejs/node

Version: Node v6.5.0
Platform: OS X 10.11
Subsystem: Buffer

Buffer.from silently ignores decoding errors.

Buffer.from('aGVsbG8=', 'base64') // valid base64 string
// => hello
Buffer.from('aGV     sbG8=', 'base64') // invalid base64 string
// => hello

I know this is now the expected behavior but it would be useful, e.g. when writing parsers for strict protocols, to have a Buffer.safeFrom which throws on decoding errors. It could also be passed as a parameter, e.g.: Buffer.from(str, [encoding, throwOnError]).

buffer doc stalled

Source

olalonde

👍4

Most helpful comment

This issue affects also hex encoding, and documentation is not clear about how it will behave in such cases.

// Node 7.10.0
var x = Buffer.from('a2zza2 even!', 'hex');
var y = x.toString('hex'); // a2

var x = Buffer.from('a2zza2 odd!', 'hex'); // TypeError: Invalid hex string
var y = x.toString('hex');

// Node 8.0.0
var x = Buffer.from('a2zza2 even!', 'hex');
var y = x.toString('hex'); // a2

var x = Buffer.from('a2zza2 odd!', 'hex');
var y = x.toString('hex'); // a2

4miners on 19 Jul 2017

👍13

All 14 comments

@olalonde I think we can use regular expressions to check for the validity of the encoding.

ojss on 17 Sep 2016

I think we can use regular expressions to check for the validity of the encoding.

Well, yeah, the user can do that if it’s necessary (and I’d be curious why it would be necessary), so I’m not sure it’s important to have that extra validation in core.

addaleax on 17 Sep 2016

@addaleax Sorry I should have been more clear. That's what I meant.

On 18-Sep-2016 2:22 AM, "Anna Henningsen" [email protected] wrote:

I think we can use regular expressions to check for the validity of the
encoding.

Well, yeah, the user can do that if it’s necessary (and I’d be curious why
it would be necessary), so I’m not sure it’s important to have that extra
validation in core.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/nodejs/node/issues/8569#issuecomment-247807214, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AKL6ATmkH9rTRdmeofmiH8BIquLXt2vTks5qrFMggaJpZM4J_c_9
.

ojss on 18 Sep 2016

👍1

Whether this belongs core is subjective. Regex would work but even for base64, it's not as straightforward as it may seem, e.g. the regex that https://github.com/kevva/base64-regex/blob/master/index.js uses: '(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)'; or this stackoverflow answer.

One scenario I had in mind was using Buffer.from with the expectation that it will throw if the encoding is invalid, for example when implementing a strict protocol that requires you to reply with an error message if the encoding of some value is invalid. Right now this can't be done with Buffer.from alone.

olalonde on 22 Sep 2016

Buffer.from('aGV sbG8=', 'base64')

Real-world base64 frequently has extra whitespace or missing == trailers.

History factoid: one of my first contributions was a series of base64 decoder performance improvements, followed a few days later by a set of fix-ups because people complained I made it too strict. :-)

bnoordhuis on 23 Sep 2016

😄1

@olalonde It might help if you could lay out why you think this would be useful to have _in Node core_. I realize it can be useful, but what advantage would that have over userland alternatives (which, as you point out, exist)?

addaleax on 23 Sep 2016

Well, the tricky part to validation here is the performance hit. Validating the encoding can be quite expensive if we're expected to do it every time.

jasnell on 23 Sep 2016

I don’t think we really have to worry about that, anything like this would probably have go into its own API anyway?

addaleax on 23 Sep 2016

It could be a a strict option passed to Buffer.from. In that mode, Buffer.from would throw when it sees invalid bytes instead of ignoring/replacing them.

olalonde on 23 Sep 2016

Perhaps yeah. We could, I suppose, add a new option to Buffer.from() that says "Validate this input" but I'm not sure there's enough value in doing so vs. validating when reading the data out of the Buffer instance. For utf8 and utf16 it's trivial to simply validate when reading the data. I guess what I'm trying to say is that a strict option would only really be useful for base64 because the other encodings are easily validated on read... and since it would only be useful on base64, does it make sense to add it to core. I'm really not sure.

jasnell on 23 Sep 2016

Real-world base64 frequently has extra whitespace or missing == trailers.

It's fine to skip whitespace or padding characters — you can always encode such result back to the correct base64, but returning incorrect results as in:

var x = Buffer.from("вот зе фак", "base64") // -> <Buffer d8 1e f9 0f 4f>
var y = x.toString('base64') // -> '2B75D08='

instead of throwing error is unreasonable and will lead to data corruption and vulnerabilities.

https://tools.ietf.org/html/rfc4648#section-3.3

dchest on 13 Mar 2017

👍7

@dchest, I just finished https://github.com/ronomon/base64 as a stop-gap until these and other base64 issues are fixed. The decoder will skip whitespace or missing padding characters but it will detect and throw an exception for corrupt or truncated base64 without additional performance penalty. The decoder is also nearly 2x faster when decoding wrapped base64.

jorangreef on 3 Apr 2017

👍1

This issue affects also hex encoding, and documentation is not clear about how it will behave in such cases.

// Node 7.10.0
var x = Buffer.from('a2zza2 even!', 'hex');
var y = x.toString('hex'); // a2

var x = Buffer.from('a2zza2 odd!', 'hex'); // TypeError: Invalid hex string
var y = x.toString('hex');

// Node 8.0.0
var x = Buffer.from('a2zza2 even!', 'hex');
var y = x.toString('hex'); // a2

var x = Buffer.from('a2zza2 odd!', 'hex');
var y = x.toString('hex'); // a2

4miners on 19 Jul 2017

👍13

This issue is closed, however, the base64 decoding doesn't just "ignore spaces"... in the referenced documentation. It ignores any invalid character. At the very least there should be a "{strict:true}" option to from() that raises exceptions when you : pass base64 stuff to hex decodings, pass an array to base64 decoding, etc.