Node: Feature request: Crypto APIs should optionally return bigints

Created on 25 Sep 2018 · 25 comments · Source: nodejs/node

Now that bigint support is part of V8, the crypto APIs should be updated to optionally return bigints instead of buffers.

I've done a bit of testing with N-API and get significantly increased performance, something like 3-4x depending on the algorithm involved, even though I'm using OpenSSL just as Node is to do the hashing: https://github.com/no2chem/bigint-hash

It also seems that providing a one-shot hashing implementation (which just hashes a buffer instead of allocating a Hash object) would improve hashing performance as well.

I'm happy to take a look at submitting a PR for this if this is something that makes sense for node.

crypto feature request


All 25 comments

cc @nodejs/crypto

Now that bigint support is part of V8, the crypto APIs should be updated to optionally return bigints instead of buffers.

One problem with bigints is that people might try to use them in order to increase performance by computing hash values as bigints and then comparing them, which will most likely introduce side-channel vulnerabilities since comparing bigints is not timing-safe. How would we compensate for that?

Let's leave performance aside for a moment: Which crypto operations are commonly used to actually produce bigints?

It also seems that providing a one-shot hashing implementation (which just hashes a buffer instead of allocating a Hash object) would improve hashing performance as well.

It obviously will. The current API was designed to allow hashing of arbitrarily large data streams. The same applies to createHash, createSign, createVerify, createCipher, createDecipher, createCipheriv, createDecipheriv and createHmac, but if we decided to add one-shot APIs, we would need to be really careful not to bloat the API surface with lots of functions.

which will most likely introduce side-channel vulnerabilities since comparing bigints is not timing-safe

While true, I think that timing differences in comparisons would stem from bit-width, and the digest (hash) size is fixed by the chosen algorithm, so it will be the same whenever you're comparing hashes. Besides, it's just as timing-unsafe as comparing a Buffer; I really doubt that the timing is value-dependent.

For buffers, there is crypto.timingSafeEqual, but I'm not sure how easy it would be to adapt that to also work with bigints.

I see, I stand corrected. It seems that the implementation in V8 currently just iterates over the digits:

bool BigInt::EqualToBigInt(BigInt* x, BigInt* y) {
  if (x->sign() != y->sign()) return false;
  if (x->length() != y->length()) return false;
  for (int i = 0; i < x->length(); i++) {
    if (x->digit(i) != y->digit(i)) return false;
  }
  return true;
}

Which seems quite vulnerable to timing attacks.

It seems though that this isn't easily measured. I performed some simple benchmarks:

bigint compare small: 169458328±0.53% ops/s 5.90±0.153 ns/op (91 runs)
bigint compare large: 168162980±1.36% ops/s 5.95±0.397 ns/op (93 runs)
bigint compare large 2vals: 169617690±0.28% ops/s 5.90±0.081 ns/op (96 runs)
bigint compare large with fewer bits: 168035082±1.03% ops/s 5.95±0.300 ns/op (92 runs)
bigint compare large with small: 168757732±0.35% ops/s 5.93±0.100 ns/op (92 runs)
bigint compare large with 1: 169567537±0.26% ops/s 5.90±0.075 ns/op (93 runs)
bigint compare small with 0: 168169244±0.82% ops/s 5.95±0.239 ns/op (92 runs)
bigint compare large with 0: 110366905±3.45% ops/s 9.06±1.417 ns/op (79 runs)

I don't know why comparison with 0 is slower for a large bigint (the length check should just return false), but practically speaking, measuring these time differences would be quite difficult compared to string/buffer comparisons, which are much slower.

I guess it wouldn't be that hard to implement a timing-safe comparison using N-API.

Which goes back to your comment, @tniessen -

we would need to be really careful not to bloat the API surface with lots of functions

I'm not really connected to node development, but what's the preference here over implementing an external library?

It seems though that this isn't easily measured.

Yes, timing differences are _usually_ barely noticeable, and are _usually_ difficult to exploit. However, some scenarios (e.g. massive CPU load causing the process to be much slower, scheduler timing, access to actual CPU time / statistics, etc.) make such attacks more likely.

From a (theoretical) cryptographer's point of view, an attacker should not be able to decide whether the first or last bit of a bitstring differs from the expected value with a probability that is significantly better than guessing at random, even if the length of the bitstring approaches infinity. And this is not true for bigints, at least not with the v8 implementation.

I'm not really connected to node development, but what's the preference here over implementing an external library?

I am not sure what you mean, sorry. Could you try to rephrase it?

From a (theoretical) cryptographer's point of view, an attacker should not be able to decide whether the first or last bit of a bitstring differs from the expected value with a probability that is significantly better than guessing at random, even if the length of the bitstring approaches infinity. And this is not true for bigints, at least not with the v8 implementation.

Sure. My point is just that the differences are even more subtle with BigInts, that's all.
I would point out that this is also not true for Buffers; you have to use crypto.timingSafeEqual.

I am not sure what you mean, sorry. Could you try to rephrase it?

I mean that you brought up the question of API bloat. Now that N-API is maturing, I wonder what the cutoff is for having functionality as a feature within Node, versus implemented as an addon. After all, most of crypto is just a wrapper around OpenSSL, and it's possible to build Node without crypto support in the first place!

And back to the previous question:

Let's leave performance aside for a moment: Which crypto operations are commonly used to actually produce bigints?

Digests and signatures seem to be pretty clear candidates here. So I would suggest having Hash, Hmac and Sign output bigints, and Verify take bigints as input. And provide a .timingSafeEqual for bigints.

I mean that you brought up the question of API bloat. Now that N-API is maturing, I wonder what the cutoff is for having functionality as a feature within Node, versus implemented as an addon. After all, most of crypto is just a wrapper around OpenSSL, and it's possible to build Node without crypto support in the first place!

You can definitely implement that as a separate module if you want to, it should be possible using the N-API BigInt API.

Which crypto operations are commonly used to actually produce bigints?

Anything that produces a hash, key or ciphertext? We currently output strings or buffers but they can easily be restated as bigints. Whether that's useful is another question; I'm undecided.

Some of tls.TLSSocket#getPeerCertificate()'s properties are really just numbers even though we encode them as strings or buffers, e.g., .pubkey, .fingerprint, .serialNumber, etc.

There are probably more examples.

will most likely introduce side-channel vulnerabilities since comparing bigints is not timing-safe

Good point. It should be possible to teach crypto.timingSafeEqual() about bigints. We can unbox them to bitstrings with v8::BigInt::ToWordsArray() and then treat them like we do buffers.

Open question: are other relational operators besides equality needed?

Interesting side question: can two different bitstrings encode the same number or is there a canonical encoding? They're trimmed of leading zeroes, so... yes? Needs more research.

@bnoordhuis I agree, the only thing I'm unsure about is the usefulness. Most outputs will be transmitted or stored in some way, at which point most of the performance benefits are gone. It sounds reasonable to accept bigints for certain parameters though.

While some crypto algorithms operate on large numbers, this is mostly opaque to users. Some exceptions exist; the RSA exponent is one. Treating outputs as numbers can also have surprising results, such as when SHA hashes are used for cert serial numbers and some serial numbers end up being negative.

I'd want to see the performance argued in terms of end-to-end. Having a block cipher return a big int quickly, just so the app can perform an extra step to convert it to a buffer to stream it to a file or network socket might not show any perf improvement, or even be net negative.

can two different bitstrings encode the same number or is there a canonical encoding? They're trimmed of leading zeroes, so... yes?

Leading ones are an issue, too, the numbers may or may not be negative. And there are endianness issues. A buffer's mapping to a big int can't be known without having some extra information.

I'd want to see the performance argued in terms of end-to-end. Having a block cipher return a big int quickly, just so the app can perform an extra step to convert it to a buffer to stream it to a file or network socket might not show any perf improvement, or even be net negative.

Exactly!

@sam-github Bigints are mainly useful in the crypto context in that comparisons are much more efficient than buffer compares. I'd expect that to improve as the bigint support in V8 improves, for example AVX-512 instructions could be used to do compares instead of polluting the processor cache. So if your workload is primarily checking hashes, bigints would result in a big win. In our workloads which do lots of in-memory computations and comparisons of hashes (a Merkle tree) this really helps!

Also, I don't understand your signedness argument. If the output is an unsigned bigint, I'm not sure where the signedness matters. What does matter is the size in bits of the output, which is not included with a bigint. To @bnoordhuis's question: two bigints could encode the same number if they came from Buffers with different lengths. Let's use CRC16 and CRC32 as an example: CRC16 could produce 0x1234n, and CRC32 could produce 0x00001234n. Buffer.compare() would have treated these as NOT equal, but these bigints are strictly equal:

> 0x1234n === 0x00001234n
true

I don't view this as a problem for digests (as only digests from the same algorithm should be compared), but for other things such as fingerprints or keys, it could be a problem. Basically, a bigint doesn't store (or have) a maximal number of bits.
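The width-loss point above can be demonstrated directly in Node (the CRC values are the same illustrative constants used above, not actual CRC outputs):

```javascript
// Two buffers of different widths encoding the "same" number.
const crc16 = Buffer.from('1234', 'hex');     // 2 bytes
const crc32 = Buffer.from('00001234', 'hex'); // 4 bytes

// Buffer.compare is width-sensitive: these are NOT equal.
console.log(Buffer.compare(crc16, crc32) === 0); // false

// Converted to bigints, the leading zeroes vanish and they compare equal.
const a = BigInt('0x' + crc16.toString('hex'));
const b = BigInt('0x' + crc32.toString('hex'));
console.log(a === b); // true
```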

Finally,

Having a block cipher return a big int quickly, just so the app can perform an extra step to convert it to a buffer ...

I strongly agree here - I would argue that it'd be important to keep both the buffer and bigint APIs and allow the developer to choose which one to use. If you're going to output a hash (or receive a hash) over a network socket, then probably you want Buffers. If you're doing a bunch of local operations (like computing and comparing hashes locally), then bigints might be much more efficient.

Have you tried converting the Buffers to bigint? If you do lots of comparisons, that might help your particular use case.

The mapping from a SHA-1 digest to an unsigned integer is not clear. Endianness is unknown. I guess the endianness doesn't matter for your case, because even though you would get a big int, you aren't going to do arithmetic with that number; it's only important that the mapping be unique.

I've seen a lot of crypto APIs, but never one that modelled digest output as numbers. But maybe they never had the advantage of being in a language with arbitrary sized numbers.

@sam-github I have: https://github.com/no2chem/bigint-hash#performance has some numbers. Converting the buffers to bigint has two overheads: first, the crypto binding needs to allocate a buffer. This is pretty expensive, as I think (not up to date on how Buffer is implemented) that allocation is done and managed outside the V8 heap. Then you need to do the conversion, which is small (about 100ns, so 2% of the time). In my experiments, generating a bigint seems to be 500ns faster than generating a Buffer, at least via N-API. I speculate that this is because bigints are managed by V8, and perhaps V8 has an efficient allocator for them. Even if you ultimately have to transform that bigint into a buffer (to write to a network packet, for example), that 500ns might be sufficient to make returning a bigint worth it: likely the network packet will have to be assembled anyway, so you'll have to allocate a new buffer (where most of the performance cost is) and insert a bunch of constants into it.

For example, a network packet which "sends a hash" might consist of [request no, client id, hash]. To assemble such a packet, I would allocate (with Buffer.alloc()), and then serialize the request no and client id into it, which might be Numbers. Then if .hash returns a buffer, I will pay that extra +500ns to hash into a buffer and call Buffer.copy() into the packet. But if .hash returns a bigint, I would serialize it directly into the Buffer (which could be done with a lot of shifts). If the cost of serializing the bigint were similar to the Buffer.copy(), then we would save the 500ns on the allocation of the buffer for the hash. And in our tests (a Merkle tree), this savings is significant.
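A sketch of that packet-assembly path with a bigint hash (the layout, field widths, and function name are hypothetical, chosen only to illustrate serializing the bigint with shifts instead of an intermediate Buffer.copy()):

```javascript
// Hypothetical packet layout: [request no (4B) | client id (4B) | hash (32B)],
// all big-endian, 40 bytes total.
function assemblePacket(reqNo, clientId, hashBig) {
  const pkt = Buffer.alloc(40);
  pkt.writeUInt32BE(reqNo, 0);
  pkt.writeUInt32BE(clientId, 4);
  // Serialize the bigint big-endian into the last 32 bytes via shifts,
  // avoiding a separate digest buffer allocation plus Buffer.copy().
  let h = hashBig;
  for (let i = 39; i >= 8; i--) {
    pkt[i] = Number(h & 0xffn);
    h >>= 8n;
  }
  return pkt;
}
```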

As long as the endianness for comparisons is consistent, it doesn't matter for my case, although I could imagine an application which requires taking a digest and adding a constant to it to generate a signing key, for example. So --- you have a good point, perhaps having the choice of the digest interpreted as an unsigned LE or BE makes sense.

With respect to crypto APIs not returning numbers, you are right here. What I'm trying to approximate here is a "register". But this conversation has led me to think that perhaps it's too early in the stages of the proposal to actually have this as a part of Node. It seems that what we actually need is not a BigInt, but a FixedInt. That might be worth taking up with TC39.

What's the next step here? Implement as an external module? Implement a proof-of-concept in Node.js core and open a pull request? Gather more performance metrics? Close this? Something else?

@Trott The current proposal, as far as I remember it, doesn't have too many use cases for anything but hashes, and even there only to some extent. Based on the information so far, I would prefer not to include it in core.

I ran into one at least somewhat valid use case the other day: generating random bigints. It's not that you can't do it right now but it's inconvenient and the naive approach is really slow.

I'll probably publish a module later this week that takes the sting out but it might also be a good candidate for inclusion in core.

I can see three ways of skinning this cat:

  1. Specific: add a crypto.randomBigInt() function, or

  2. Specific: a { bigint: true } option to crypto.randomBytes(), or

  3. Generic: add a Buffer.prototype.toBigInt() method - but then you also need to think about sign and endianness.

1 has the benefit that it could take the size in bits rather than bytes. That avoids a whole class of biased random number bugs.

2 has the benefit that it doesn't introduce a new API method (minor benefit, but I wanted to say something nice about it.)

3 that it's useful beyond random numbers. A Buffer.from() that takes a BigInt would also be a logical complement.
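Option 3 and its complement can be sketched as plain functions today (the names `bufferToBigInt`/`bigIntToBuffer` and the unsigned big-endian convention are assumptions, sidestepping exactly the sign and endianness questions a real Buffer.prototype.toBigInt() would have to answer):

```javascript
// Sketch of option 3: interpret a buffer's bytes as an unsigned
// big-endian integer. An empty buffer maps to 0n.
function bufferToBigInt(buf) {
  return buf.length === 0 ? 0n : BigInt('0x' + buf.toString('hex'));
}

// The logical complement (a Buffer.from() that takes a BigInt):
// serialize a non-negative bigint big-endian, padded to widthBytes.
function bigIntToBuffer(n, widthBytes) {
  return Buffer.from(n.toString(16).padStart(widthBytes * 2, '0'), 'hex');
}
```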

FWIW, I think 3 seems to be a really good option, as you pointed out:

it's useful beyond random numbers

Why not prefer a generalized solution over a more specific one? Not only will it solve a lot of similar BigInt vs Buffer use-cases, but it'd be trivial for userland developers to use the resultant methods alongside standard Buffer APIs to implement 1 or 2, right?

I agree (and I admit to phrasing it as a leading question to some extent :-)) but a crypto.randomBigInt(bits) method will still be useful to avoid the aforementioned bias bugs.

Hmm... I'd generally prefer the crypto.randomBigInt(bits) variant over adding a new method to all Buffers.

At the very least, crypto.randomBigInt(bits) sounds much simpler to pull off logistically (I'd believe there's a lot less scope to get it wrong as compared to Buffer.p.toBigInt()).

Perhaps we could expose crypto.randomBigInt right now and, once Buffer.p.toBigInt() or something like that comes along later, change it to use the latter internally?

I'm thinking that a userland module to convert a buffer (or any TypedArray) to a bigint would be more useful

+1 for crypto.randomBigInt(). People shouldn't confuse cryptographic randomness with other sources of randomness, and this will make it clear.

I'll probably publish a module later this week that takes the sting out

https://www.npmjs.com/package/random-bigint and https://github.com/bnoordhuis/random-bigint for anyone who wants to poke holes at it.

These lines might be interesting to the performance-minded. Using the C++ API will probably be faster still beyond a certain threshold, though.
