Version: v14.12.0
Platform: Darwin Simens-MacBook-Pro.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64 x86_64
Subsystem: vm

See https://github.com/SimenB/vm-script-vs-compile-function and run `node index.js`.
It's just `yarn init -2` to fetch the bundled Yarn v2 source code (to have a large JS file), then loading that into `new Script` and `compileFunction` 100 times each and measuring how long it takes.
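Roughly, the repro boils down to something like this sketch (the file path is a placeholder for the bundled Yarn source):

```js
'use strict';
const fs = require('fs');
const vm = require('vm');

// Any sufficiently large JS file works; the repro uses the Yarn v2
// bundle fetched by `yarn init -2` (the path below is a placeholder).
const source = fs.readFileSync('./path/to/large-file.js', 'utf8');

function bench(label, compile) {
  const start = Date.now();
  for (let i = 0; i < 100; i++) compile();
  console.log(`Running with ${label} took ${Date.now() - start} ms`);
}

bench('Script', () => new vm.Script(source));
bench('compileFunction', () =>
  vm.compileFunction(source, ['module', 'exports', 'require'])
);
```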
Running this prints the following on my machine (MBP 2018):
Running with Script took 50 ms
Running with compileFunction took 4041 ms
The numbers are consistent across runs.
Passing source code into compileFunction should be comparably fast to passing it into new Script.
That's roughly 80x worse performance. The gap grows with the number of loop iterations, while the new Script time stays fairly constant regardless of iteration count, so I assume it's "simply" some cached data that lets new Script avoid the cost of parsing the same code multiple times.
This might be the same as #26229.
As mentioned in https://github.com/nodejs/node/issues/26229#issuecomment-674385686, we see a huge performance improvement in Jest if we switch from compileFunction back to the "old" new Script approach. As also mentioned there, this might be "fixed" (or at least be solvable in userland) by the seemingly stalled #24069?
I added a quick and dirty produceCachedData pass, and the time went down to about 180 ms. That's a great improvement, albeit still several times slower than new Script, and harder to manage (I'd prefer the implicit caching of the new Script API).
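A sketch of what such a produceCachedData pass could look like; it assumes the returned function carries a `cachedData` property when `produceCachedData: true` is passed, which is what Node 14's lib/vm.js attaches:

```js
const vm = require('vm');

let cachedData;

function compileWithCache(source, params) {
  const fn = vm.compileFunction(source, params, {
    cachedData,                                // undefined on the first call
    produceCachedData: cachedData === undefined,
  });
  // Assumed Node 14 behavior: the serialized code cache is attached
  // to the returned function when produceCachedData is set.
  if (cachedData === undefined && fn.cachedData) {
    cachedData = fn.cachedData;
  }
  return fn;
}
```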
cc @nodejs/v8, this might not be something fixable on node's side.
This is about v8::ScriptCompiler::CompileFunctionInContext() not using the code cache, right?
I don't think that's V8's issue to fix. V8 can't know whether it's safe to use the cache because the argument names and context extensions affect how the function is parsed.
Not that it's easy for Node.js either. options.contextExtensions is a free-form list of mutable objects; it's hard to cache on that unless object identity is good enough, but it probably isn't.
What Node.js (or perhaps V8) could optimize for is the presumably common case of no context extensions. TBD how to cache without introducing memory leaks.
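As a userland sketch of that fast path (assuming no contextExtensions are passed), one could key serialized code-cache bytes on the source plus the parameter names; the unbounded Map is exactly the memory-leak concern raised above (`compileCached` is a hypothetical helper, not a Node API):

```js
const crypto = require('crypto');
const vm = require('vm');

const codeCache = new Map(); // grows without bound: the leak concern

function compileCached(source, params) {
  const key =
    crypto.createHash('sha1').update(source).digest('hex') +
    '\0' + params.join(',');
  const fn = vm.compileFunction(source, params, {
    cachedData: codeCache.get(key),
    produceCachedData: !codeCache.has(key),
  });
  if (!codeCache.has(key) && fn.cachedData) {
    codeCache.set(key, fn.cachedData);
  }
  return fn;
}
```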
In Jest's case we'll be passing a different parsingContext every time (Jest uses a separate Context per test file for sandboxing), so if the cached data is tied to the Context, we'd probably still see a painful performance regression versus vm.Script. If it's not possible to cache based on the source code string (and the arguments) alone, we'll probably have to drop compileFunction and stay on vm.Script.
Varying the cache by the args array is no issue for our use case, though: 99% of the time they'll be the same, and when they differ I'd guess it would just create a few more cache entries? That's the same situation we have today with the manual `function(some, args) {USER_SOURCE_CODE_HERE}` wrapper we add, whose source code changes with the arguments we pass in (see the sketch below).
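For illustration, a minimal self-contained sketch of that wrapper approach (`userSource` is a placeholder for the real test-file code): the parameter names are baked into the source string, so a different argument list simply yields a different script, while a single vm.Script is parsed once and then run in a fresh Context per test file:

```js
const vm = require('vm');

// Placeholder user code; in Jest this would be the test file's source.
const userSource = 'module.exports = (x) => x * 2;';

const wrapped = `(function(module, exports, require) {\n${userSource}\n})`;
const script = new vm.Script(wrapped, { filename: 'user-code.js' });

// One Script, parsed once, run in a fresh Context per "test file".
for (let i = 0; i < 3; i++) {
  const sandbox = vm.createContext({ console });
  const wrapperFn = script.runInContext(sandbox); // evaluates to the wrapper
  const module = { exports: {} };
  wrapperFn(module, module.exports, require);
  console.log(module.exports(21)); // 42
}
```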
(That said, improving performance for compileFunction in the case of not passing parsingContext seems like a good thing even if it wouldn't be enough for Jest's use case 🙂)
> If it's not possible to get caching based on the source code string (and the arguments) alone
I don't think that can work, so you're probably better off sticking with vm.Script, yes. Argument names aren't insurmountable but context extensions are.
When you pass in context extensions, what's evaluated looks like this in pseudo code:
```js
// assume a = { y: 42 } and b = { z: 1337 }
with (a)
  with (b)
    function f(x) {
      return x * y * z
    }
```
I say "pseudo code" but in fact it's really close to the runtime representation.
@bnoordhuis reading your comment more closely, I see you're talking about contextExtensions. We don't use that; we use parsingContext. Do you think that has the same limitations?
I think this issue should be upstreamed.