I noticed src/Bundle.js:139 uses md5: https://github.com/parcel-bundler/parcel/blob/a1a190bb19a2c625ad3af9f4072fbbc8f342f746/src/Bundle.js#L139
It would be good to switch this from 'md5' to 'sha256' for now, and ideally switch to SHA-512/256 or SHA-3 later.
(Actually, MD5 is used in a few places.)
I don't think that SHA-256 (or any SHA hashing algorithm) makes sense for what we're doing. We're not using hashing for anything that needs to be cryptographically secure, so using the Secure Hash Algorithm is overkill.
Especially since we're going for performance, I think the inherent performance penalty of a cryptographic hashing algorithm is unnecessary for what we are using the hashes for.
I think we should use xxHash, which would give us a huge speed-up in our hash generation time. It's not a cryptographic algorithm, so collision resistance is not guaranteed, but since we don't use the hash for anything cryptographic we don't really care.
What do you guys think?
Given that I / we just went through this, @davidnagli is correct: there is no need to use a crypto hash, and it has an adverse effect on performance.
https://github.com/d3viant0ne/hash-perf
xxHash has plenty of different implementation options and is significantly faster (in a non-JS form) than using a crypto hash.
If your intent is to use pure javascript / node, MD4 is your fastest bet.
If cryptographic strength isn't needed, there's probably no reason to switch from md5. It's built in, so anything else would be an extra dependency.
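For reference, a minimal sketch of what "built in" means here: Node's crypto module takes the algorithm name as a string, so switching between md5 and sha256 is a one-word change with no extra dependency. The `hashContent` helper is a hypothetical name for illustration, not Parcel's actual code.

```javascript
const crypto = require('crypto');

// Hypothetical helper (not Parcel's code): hex digest of `content`
// using any algorithm name that Node's crypto module supports.
function hashContent(content, algorithm = 'md5') {
  return crypto.createHash(algorithm).update(content).digest('hex');
}

console.log(hashContent('hello'));           // 32 hex chars (128-bit md5)
console.log(hashContent('hello', 'sha256')); // 64 hex chars (256-bit sha256)
```

The only visible difference for a caller is the digest length, which matters if the hash ends up in file names.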
That said, if performance is a consideration, here are some hard numbers, based on https://medium.com/@drainingsun/in-search-of-a-good-node-js-hashing-algorithm-8052b6923a3b
https://gist.github.com/shawwn/a2e51592ff9d97bfb06f1f14b5396f21
farmhash-hash64 seems to be the winner for web-sized assets (~250k), although note that 64 bit hashes increase collision probability over large datasets. (See the article above.)
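To put the collision remark in numbers, here is a rough sketch using the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)), for n items and a b-bit hash. It assumes the hash output is uniformly distributed, which is an idealization; `collisionProbability` is a hypothetical helper name.

```javascript
// Birthday-bound estimate of the chance that at least two of `items`
// inputs share a hash, assuming an ideal uniformly distributed
// `bits`-bit hash function.
function collisionProbability(items, bits) {
  const space = Math.pow(2, bits);
  // -expm1(-x) equals 1 - exp(-x) but keeps precision for tiny x.
  return -Math.expm1(-(items * (items - 1)) / (2 * space));
}

for (const bits of [32, 64, 128]) {
  console.log(bits + '-bit, 1e6 items:', collisionProbability(1e6, bits));
}
```

For a bundler hashing thousands of assets, the 64-bit estimate stays tiny; it only becomes a concern at dataset sizes far beyond a single project.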
@shawwn Did you see the link that d3viant0ne posted?
Was busy running benchmarks. :)
Nice link! I'll save that for future reference. I wish it had compared sha256 too.
It's usually a better idea to let profilers tell us where bottlenecks are. I'll close the issue for now. Feel free to re-open it if it's valid.
@d3viant0ne Is there a reason that you specified MD4 rather than MD5??
@davidnagli - If the desire is to use crypto & not add a library with or without bindings, MD4 is the most performant in general.
Something built with C++ is obviously going to be significantly faster. We (Webpack) decided WebAssembly was a nice happy medium between performance & usability, given you can use it natively in Node 8.
Regardless of what algorithm you use, leveraging anything in Node's crypto module is going to be faster than a JavaScript implementation of a non-cryptographic algorithm.
@shawwn - All the different crypto hashes have been performance tested to death, see this repo: https://github.com/hex7c0/nodejs-hash-performance
MD4/5 are on average ever so slightly faster than sha1, and all three are faster than sha256, which is to be expected.
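Anyone who wants to check that ordering on their own machine could use a rough timing sketch like the one below. It is not a rigorous benchmark (no warm-up, single run), the `timeHash` helper and the ~250 KB sample buffer are assumptions made for illustration, and MD4 is left out because newer Node builds may not expose it.

```javascript
const crypto = require('crypto');

// Hypothetical helper: wall-clock milliseconds taken to hash `data`
// `iterations` times with one of Node's built-in algorithms.
function timeHash(algorithm, data, iterations = 200) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) {
    crypto.createHash(algorithm).update(data).digest('hex');
  }
  const elapsedNs = process.hrtime.bigint() - start;
  return Number(elapsedNs) / 1e6;
}

// ~250 KB buffer, assumed here as a stand-in for a web-sized asset.
const sample = Buffer.alloc(250 * 1024, 'a');

for (const algorithm of ['md5', 'sha1', 'sha256']) {
  console.log(algorithm, timeHash(algorithm, sample).toFixed(2), 'ms');
}
```

Results vary with the CPU and the OpenSSL build (hardware SHA instructions can change the ordering), so treat the numbers as indicative only.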
That repo tested the algorithms we were considering, but in the end the low-level implementations offer more performance than you are realistically going to use bundling even an extremely large web application. We opted for more performance while simplifying usability by going with the WebAssembly implementation of the xxHash algorithm.