Buffer.from
silently ignores decoding errors.
Buffer.from('aGVsbG8=', 'base64') // valid base64 string
// => hello
Buffer.from('aGV sbG8=', 'base64') // invalid base64 string
// => hello
I know this is now the expected behavior but it would be useful, e.g. when writing parsers for strict protocols, to have a Buffer.safeFrom
which throws on decoding errors. It could also be passed as a parameter, e.g.: Buffer.from(str, [encoding, throwOnError])
.
@olalonde I think we can use regular expressions to check for the validity of the encoding.
I think we can use regular expressions to check for the validity of the encoding.
Well, yeah, the user can do that if it’s necessary (and I’d be curious why it would be necessary), so I’m not sure it’s important to have that extra validation in core.
@addaleax Sorry I should have been more clear. That's what I meant.
On 18-Sep-2016 2:22 AM, "Anna Henningsen" [email protected] wrote:
I think we can use regular expressions to check for the validity of the
encoding.Well, yeah, the user can do that if it’s necessary (and I’d be curious why
it would be necessary), so I’m not sure it’s important to have that extra
validation in core.—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/nodejs/node/issues/8569#issuecomment-247807214, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AKL6ATmkH9rTRdmeofmiH8BIquLXt2vTks5qrFMggaJpZM4J_c_9
.
Whether this belongs core is subjective. Regex would work but even for base64, it's not as straightforward as it may seem, e.g. the regex that https://github.com/kevva/base64-regex/blob/master/index.js uses: '(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)';
or this stackoverflow answer.
One scenario I had in mind was using Buffer.from
with the expectation that it will throw if the encoding is invalid, for example when implementing a strict protocol that requires you to reply with an error message if the encoding of some value is invalid. Right now this can't be done with Buffer.from alone.
Buffer.from('aGV sbG8=', 'base64')
Real-world base64 frequently has extra whitespace or missing == trailers.
History factoid: one of my first contributions was a series of base64 decoder performance improvements, followed a few days later by a set of fix-ups because people complained I made it too strict. :-)
@olalonde It might help if you could lay out why you think this would be useful to have _in Node core_. I realize it can be useful, but what advantage would that have over userland alternatives (which, as you point out, exist)?
Well, the tricky part to validation here is the performance hit. Validating the encoding can be quite expensive if we're expected to do it every time.
I don’t think we really have to worry about that, anything like this would probably have go into its own API anyway?
It could be a a strict
option passed to Buffer.from
. In that mode, Buffer.from would throw when it sees invalid bytes instead of ignoring/replacing them.
Perhaps yeah. We could, I suppose, add a new option to Buffer.from()
that says "Validate this input" but I'm not sure there's enough value in doing so vs. validating when reading the data out of the Buffer
instance. For utf8
and utf16
it's trivial to simply validate when reading the data. I guess what I'm trying to say is that a strict
option would only really be useful for base64
because the other encodings are easily validated on read... and since it would only be useful on base64
, does it make sense to add it to core. I'm really not sure.
Real-world base64 frequently has extra whitespace or missing == trailers.
It's fine to skip whitespace or padding characters — you can always encode such result back to the correct base64, but returning incorrect results as in:
var x = Buffer.from("вот зе фак", "base64") // -> <Buffer d8 1e f9 0f 4f>
var y = x.toString('base64') // -> '2B75D08='
instead of throwing error is unreasonable and will lead to data corruption and vulnerabilities.
@dchest, I just finished https://github.com/ronomon/base64 as a stop-gap until these and other base64 issues are fixed. The decoder will skip whitespace or missing padding characters but it will detect and throw an exception for corrupt or truncated base64 without additional performance penalty. The decoder is also nearly 2x faster when decoding wrapped base64.
This issue affects also hex
encoding, and documentation is not clear about how it will behave in such cases.
// Node 7.10.0
var x = Buffer.from('a2zza2 even!', 'hex');
var y = x.toString('hex'); // a2
var x = Buffer.from('a2zza2 odd!', 'hex'); // TypeError: Invalid hex string
var y = x.toString('hex');
// Node 8.0.0
var x = Buffer.from('a2zza2 even!', 'hex');
var y = x.toString('hex'); // a2
var x = Buffer.from('a2zza2 odd!', 'hex');
var y = x.toString('hex'); // a2
This issue is closed, however, the base64 decoding doesn't just "ignore spaces"... in the referenced documentation. It ignores any invalid character. At the very least there should be a "{strict:true}" option to from() that raises exceptions when you : pass base64 stuff to hex decodings, pass an array to base64 decoding, etc.
Most helpful comment
This issue affects also
hex
encoding, and documentation is not clear about how it will behave in such cases.