Node: Buffer.from(str, encoding) silently ignores decoding errors

Created on 17 Sep 2016  Â·  14Comments  Â·  Source: nodejs/node

  • Version: Node v6.5.0
  • Platform: OS X 10.11
  • Subsystem: Buffer

Buffer.from silently ignores decoding errors.

Buffer.from('aGVsbG8=', 'base64') // valid base64 string
// => hello
Buffer.from('aGV     sbG8=', 'base64') // invalid base64 string
// => hello

I know this is now the expected behavior but it would be useful, e.g. when writing parsers for strict protocols, to have a Buffer.safeFrom which throws on decoding errors. It could also be passed as a parameter, e.g.: Buffer.from(str, [encoding, throwOnError]).

buffer doc stalled

Most helpful comment

This issue affects also hex encoding, and documentation is not clear about how it will behave in such cases.

// Node 7.10.0
var x = Buffer.from('a2zza2 even!', 'hex');
var y = x.toString('hex'); // a2

var x = Buffer.from('a2zza2 odd!', 'hex'); // TypeError: Invalid hex string
var y = x.toString('hex');

// Node 8.0.0
var x = Buffer.from('a2zza2 even!', 'hex');
var y = x.toString('hex'); // a2

var x = Buffer.from('a2zza2 odd!', 'hex');
var y = x.toString('hex'); // a2

All 14 comments

@olalonde I think we can use regular expressions to check for the validity of the encoding.

I think we can use regular expressions to check for the validity of the encoding.

Well, yeah, the user can do that if it’s necessary (and I’d be curious why it would be necessary), so I’m not sure it’s important to have that extra validation in core.

@addaleax Sorry I should have been more clear. That's what I meant.

On 18-Sep-2016 2:22 AM, "Anna Henningsen" [email protected] wrote:

I think we can use regular expressions to check for the validity of the
encoding.

Well, yeah, the user can do that if it’s necessary (and I’d be curious why
it would be necessary), so I’m not sure it’s important to have that extra
validation in core.

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/nodejs/node/issues/8569#issuecomment-247807214, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AKL6ATmkH9rTRdmeofmiH8BIquLXt2vTks5qrFMggaJpZM4J_c_9
.

Whether this belongs core is subjective. Regex would work but even for base64, it's not as straightforward as it may seem, e.g. the regex that https://github.com/kevva/base64-regex/blob/master/index.js uses: '(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)'; or this stackoverflow answer.

One scenario I had in mind was using Buffer.from with the expectation that it will throw if the encoding is invalid, for example when implementing a strict protocol that requires you to reply with an error message if the encoding of some value is invalid. Right now this can't be done with Buffer.from alone.

Buffer.from('aGV sbG8=', 'base64')

Real-world base64 frequently has extra whitespace or missing == trailers.

History factoid: one of my first contributions was a series of base64 decoder performance improvements, followed a few days later by a set of fix-ups because people complained I made it too strict. :-)

@olalonde It might help if you could lay out why you think this would be useful to have _in Node core_. I realize it can be useful, but what advantage would that have over userland alternatives (which, as you point out, exist)?

Well, the tricky part to validation here is the performance hit. Validating the encoding can be quite expensive if we're expected to do it every time.

I don’t think we really have to worry about that, anything like this would probably have go into its own API anyway?

It could be a a strict option passed to Buffer.from. In that mode, Buffer.from would throw when it sees invalid bytes instead of ignoring/replacing them.

Perhaps yeah. We could, I suppose, add a new option to Buffer.from() that says "Validate this input" but I'm not sure there's enough value in doing so vs. validating when reading the data out of the Buffer instance. For utf8 and utf16 it's trivial to simply validate when reading the data. I guess what I'm trying to say is that a strict option would only really be useful for base64 because the other encodings are easily validated on read... and since it would only be useful on base64, does it make sense to add it to core. I'm really not sure.

Real-world base64 frequently has extra whitespace or missing == trailers.

It's fine to skip whitespace or padding characters — you can always encode such result back to the correct base64, but returning incorrect results as in:

var x = Buffer.from("вот зе фак", "base64") // -> <Buffer d8 1e f9 0f 4f>
var y = x.toString('base64') // -> '2B75D08='

instead of throwing error is unreasonable and will lead to data corruption and vulnerabilities.

https://tools.ietf.org/html/rfc4648#section-3.3

@dchest, I just finished https://github.com/ronomon/base64 as a stop-gap until these and other base64 issues are fixed. The decoder will skip whitespace or missing padding characters but it will detect and throw an exception for corrupt or truncated base64 without additional performance penalty. The decoder is also nearly 2x faster when decoding wrapped base64.

This issue affects also hex encoding, and documentation is not clear about how it will behave in such cases.

// Node 7.10.0
var x = Buffer.from('a2zza2 even!', 'hex');
var y = x.toString('hex'); // a2

var x = Buffer.from('a2zza2 odd!', 'hex'); // TypeError: Invalid hex string
var y = x.toString('hex');

// Node 8.0.0
var x = Buffer.from('a2zza2 even!', 'hex');
var y = x.toString('hex'); // a2

var x = Buffer.from('a2zza2 odd!', 'hex');
var y = x.toString('hex'); // a2

This issue is closed, however, the base64 decoding doesn't just "ignore spaces"... in the referenced documentation. It ignores any invalid character. At the very least there should be a "{strict:true}" option to from() that raises exceptions when you : pass base64 stuff to hex decodings, pass an array to base64 decoding, etc.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

addaleax picture addaleax  Â·  3Comments

cong88 picture cong88  Â·  3Comments

danialkhansari picture danialkhansari  Â·  3Comments

willnwhite picture willnwhite  Â·  3Comments

Brekmister picture Brekmister  Â·  3Comments