Some background: I recently read an old article that, among other things, compared the performance of map/reduce against a plain for loop in Node (6 at the time). After seeing a comment about the performance changes in more recent versions of V8, I decided to check whether the results were any different on newer versions of Node. So I set out to reproduce their test across several Node versions, and ended up stumbling upon what appears to be a rather significant performance regression.
The test is fairly simple: take a list of 32 million random numbers and calculate the sum of their squares. The full code for the test I ran can be found here: https://gist.github.com/kbjr/1a551ddb3b59e9685b04bb051daef44c
But the two basic samples are these:

```js
const result = data
  .map((x) => x ** 2)
  .reduce((memo, x) => memo + x, 0);
```

```js
let result = 0;
for (let i = 0; i < data.length; i++) {
  result += data[i] ** 2;
}
```
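(The gist also includes a `runForEach` variant that isn't reproduced here; my guess is it looks roughly like the following, but that is a reconstruction, not the gist's exact code.)

```js
// Reconstructed (not verbatim) forEach variant: same sum of squares via Array#forEach.
let result = 0;
data.forEach((x) => {
  result += x ** 2;
});
```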
The for loop version is faster, but that has always been the case and is not the point of this issue. Here are the results I got from running this test on 10.15.0:
```
runMapReduce        12602.4ms   13285.9ms   12729.4ms
runForEach          2407.41ms   2358.85ms   2432.26ms
runForLoop          69.4525ms   47.4004ms   56.8418ms
runForLoopNoLength  66.4152ms   51.5009ms   55.3084ms
```
Which looks about the same as the original results in the article I had read, so I wasn't really surprised at the outcome. What did surprise me was when I ran the test again on 12.1.0:
```
runMapReduce        126262ms    126607ms    129501ms
runForEach          1451.66ms   1457.29ms   1490.63ms
runForLoop          38.9766ms   45.9452ms   34.7133ms
runForLoopNoLength  38.9530ms   44.0065ms   35.1938ms
```
The other tests actually all got some small speed ups, but the map/reduce slowed down by basically a whole order of magnitude.
iirc fast paths for handling holey arrays (the one you create is holey) were removed while rewriting parts of v8 internals.
@nodejs/v8
> iirc fast paths for handling holey arrays (the one you create is holey) were removed while rewriting parts of v8 internals.

By "holey", you mean "has gaps"? I don't think `data` has gaps in the code provided:
```js
let data = new Array(size);
for (let i = 0; i < size; i++) {
  data[i] = rand();
}
```
Or am I misunderstanding?
@trott it's an internal array kind in V8 describing its memory layout. holey arrays operate more like dictionaries since they could be missing indices. afaik even though the array is eventually filled it remains in that initial type.
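(For anyone who wants to check the elements kind locally, here is a minimal sketch. It assumes the V8 test intrinsics `%HasHoleyElements` and `%HasDoubleElements` are available behind `--allow-natives-syntax` in the V8 version your Node build ships; they are internal helpers and can change between versions.)

```js
// elements-kind-check.js -- run with: node --allow-natives-syntax elements-kind-check.js
// %HasHoleyElements / %HasDoubleElements are V8-internal test helpers, not a stable API.

const preallocated = new Array(4);   // allocated with holes, so a HOLEY elements kind
for (let i = 0; i < 4; i++) {
  preallocated[i] = Math.random();   // filling every index does not change the kind
}

const pushed = [];                   // starts out packed
for (let i = 0; i < 4; i++) {
  pushed.push(Math.random());        // stays packed, transitions SMI -> DOUBLE
}

console.log('preallocated holey?', %HasHoleyElements(preallocated)); // expected: true
console.log('pushed holey?', %HasHoleyElements(pushed));             // expected: false
console.log('double elements?', %HasDoubleElements(preallocated), %HasDoubleElements(pushed));
```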
@devsnek If we assume that is accurate, does that mean that populating the array like

```js
const data = [ ];
for (...) {
  data.push(rand());
}
```

should remove this "holey" status and make the process fast again? If that's the case, I would be happy to rerun the test like this to verify.
@kbjr that would indeed not be a holey array
So, I ran the test again like this, but the results are basically the same:
```
runMapReduce        126969ms    126892ms    126506ms
runForEach          1642.75ms   1642.28ms   1688.50ms
runForLoop          37.5993ms   42.6008ms   34.8042ms
runForLoopNoLength  36.9135ms   45.7981ms   34.7570ms
```
The code change:

```js
// let data = new Array(size);
// for (let i = 0; i < size; i++) {
//   data[i] = rand();
// }

let data = [ ];
for (let i = 0; i < size; i++) {
  data.push(rand());
}
```
i guess that's not the issue then :)
Something contributing to this might be my rewrite of V8's exp operator, which did make it a bit slower (and more accurate), but I doubt that would account for the enormous difference here.
What interests me the most is that there doesn't seem to be an impact on the `forEach` test (in fact, it's actually a bit faster). If it were something to do with the way V8 reads the array (or how the exp operator works), I would expect to see a similar result in that test as well (or in all of them, if it were the exp operator).
My gut instinct is that it's something to do with the `map` function and the way it creates the new array, but that's just a hunch. I'm going to try breaking the test apart into a `map` and a `reduce` and see if one of those steps on its own is causing the majority of the slowdown.
Okay, I ran the `map` and `reduce` separately with the following new tests:
```js
let squares;

const runMap = () => runWithTimer(() => {
  squares = data.map((x) => x ** 2);
});

const runReduce = () => runWithTimer(() => {
  const result = squares.reduce((memo, x) => memo + x, 0);
});
```
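(`runWithTimer` comes from the gist linked at the top and isn't reproduced in this issue; a minimal stand-in along these lines should behave equivalently for timing purposes. The implementation below is my assumption, not the gist's actual code.)

```js
// Hypothetical stand-in for the gist's runWithTimer helper: run the callback once
// and print how long it took, using the high-resolution timer.
const runWithTimer = (fn) => {
  const start = process.hrtime.bigint();
  fn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`took ${elapsedMs.toFixed(4)}ms`);
};
```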
And here are the results (only run once because I'm lazy and this takes a while):
```
runMap     126779ms
runReduce  862.096ms
```
So it is most definitely the `map` call that is slow. I'm going to try `map` without the exponent, just to rule that out as well...
As expected, removing the exponent had basically no effect:

```
runMap     124786ms
runReduce  883.284ms
```

```js
// squares = data.map((x) => x ** 2);
squares = data.map((x) => x);
```
So, it seems the real regression is as simple as this: the `map` function is very slow, even if you do literally nothing in the callback.
So, here's the most basic reproducible case:

```js
console.log('Creating test data');

const data = [ ];
for (let i = 0; i < 32000000; i++) {
  data.push(Math.random() * 100);
}

console.log('Running test');

const start = Date.now();
data.map((x) => x);

console.log(`Done. Duration: ${Date.now() - start}ms`);
```
And the output:
```
λ nvm use 10.15
10.15.0
Now using node v10.15.0 (64-bit)
λ node .map-regression.js
Creating test data
Running test
Done. Duration: 6060ms
λ nvm use 12.3.1
Now using node v12.3.1 (64-bit)
λ node .map-regression.js
Creating test data
Running test
Done. Duration: 125899ms
```
Weeee, OK: I read through the actual source and found that the fast path now only considers PACKED_SMI_ELEMENTS (arrays of 32-bit integers). Indeed, if I use `data.push(Math.floor(Math.random() * 100));`, the time drops to around 500ms.
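(To make that comparison easy to reproduce, here is a small sketch that runs the same no-op `map` once over a double-filled array and once over a small-integer-filled one; the helper names `fill` and `timeMap` are mine, not from the gist.)

```js
// Compare .map() over a packed-double array vs. a packed-SMI array of the same size.
const SIZE = 32000000;

const fill = (makeValue) => {
  const data = [];
  for (let i = 0; i < SIZE; i++) {
    data.push(makeValue());
  }
  return data;
};

const timeMap = (label, data) => {
  const start = Date.now();
  data.map((x) => x);               // no-op callback, same as the repro above
  console.log(`${label}: ${Date.now() - start}ms`);
};

timeMap('doubles (PACKED_DOUBLE_ELEMENTS)', fill(() => Math.random() * 100));
timeMap('integers (PACKED_SMI_ELEMENTS)', fill(() => Math.floor(Math.random() * 100)));
```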
/cc @bmeurer
Can you file a bug at https://crbug.com/v8/new please?
Filed a bug: https://bugs.chromium.org/p/v8/issues/detail?id=9302
Thanks for this.
@devsnek you're indeed absolutely correct when you say that PACKED arrays can become HOLEY, but not the other way 'round.
Basically: once a HOLEY array, always a HOLEY array.
@bmeurer if it's fine by you, I'd love to help out with this.
Short status update: the regression likely comes from a change in V8's GC where we reduced the page size. That moved an odd performance cliff that existed before, in a way that this test case exposes. We are working on this.
Is there an update on this issue? We started adopting Node 12 now that it became LTS and one of our services was hit hard by this performance regression.
Would it make sense to start removing `map` from our code bases, or is there a chance that this is going to be fixed in a future 12.x minor/patch release? As this issue has been open for a while, I fear it's more the first option.
I would love to offer my help, but my C++ knowledge is rusty at best.
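(For anyone who needs an interim workaround on 12.x, a plain for loop produces the same result without going through `Array#map`. This is just a sketch, not an officially recommended fix, and `mapWithLoop` is a made-up name.)

```js
// Hypothetical replacement for data.map(fn) built on a for loop.
// Passes (value, index, array) like Array#map, but ignores thisArg and
// does not skip holes in sparse arrays.
function mapWithLoop(data, fn) {
  const out = new Array(data.length);
  for (let i = 0; i < data.length; i++) {
    out[i] = fn(data[i], i, data);
  }
  return out;
}

// Usage, mirroring the original test:
// const result = mapWithLoop(data, (x) => x ** 2).reduce((memo, x) => memo + x, 0);
```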
Has there been any update on this? Maybe I'm misunderstanding, but this seems like a massive regression, especially if it hits all `map` calls.
> Has there been any update on this? Maybe I'm misunderstanding, but this seems like a massive regression, especially if it hits all `map` calls.
Testing locally, it seems the bug still exists in Node.js 12.14.1 (which was released within the last 24 hours), but was fixed in Node.js 13.2.0.
13.2.0 updated V8 to 7.9. I believe 12.x is still running V8 7.7.
It's possible that the fix to this in 12.x would be to find the right patch to float on V8 7.7, but it may not apply easily/cleanly. /ping @nodejs/v8-update
Tested with v12.16.0, which ships with V8 7.8, but performance is still not at the Node v10 level. Using the script from https://github.com/nodejs/node/issues/27930#issuecomment-496353854:
```
$ nvm use 10
Now using node v10.18.1 (npm v6.13.4)
$ node bench.js
Creating test data
Running test
Done. Duration: 4805ms
$ nvm use v12.15.0
Now using node v12.15.0 (npm v6.13.4)
$ node bench.js
Creating test data
Running test
Done. Duration: 73486ms
$ nvm use v12.16.0
Now using node v12.16.0 (npm v6.13.4)
$ node bench.js
Creating test data
Running test
Done. Duration: 73772ms
```
@targos sorry for the ping, but I couldn't find any meeting notes or other documentation on whether V8 7.9 is planned to be back-ported to v12. https://github.com/nodejs/node/pull/30020 has a `backport-requested-v12.x` label, but no comments on whether this is still planned. Could you provide some insights?
There is no plan to backport V8 7.9 to v12.
> There is no plan to backport V8 7.9 to v12.

Is cherry-picking the V8 fix onto the 12.x branch possible/likely?
> There is no plan to backport V8 7.9 to v12.
>
> Is cherry-picking the V8 fix onto the 12.x branch possible/likely?

Sure, but I'm not aware of which commit fixed it.
I'm not familiar with the V8 code base, but the issue you opened references https://chromium-review.googlesource.com/c/1529006 as the culprit. I guess we could go from the history after this, or revert the change. Or is this thinking too simple?
Just an FYI, v8 has closed this issue as "won't fix"
> Just an FYI, v8 has closed this issue as "won't fix"
If V8 is going to live with this, I think we will have to as well. Should this be closed?
I'm fine either way, but keeping it open might be better for discoverability.
Let's not keep it open; that wouldn't seem to serve much purpose at this point.