Version: v14.12.0
Platform: Darwin Simens-MacBook-Pro.local 19.6.0 Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64 x86_64
Subsystem: vm

See https://github.com/SimenB/vm-script-vs-compile-function and run `node index.js`.
It's just `yarn init -2` to fetch the bundled Yarn v2 source code (to have a large JS file), then loading that into `new Script` and `compileFunction` 100 times each and measuring how long it takes.
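Roughly, the repro boils down to something like this sketch (the file path is a placeholder for the bundled Yarn source):

```js
'use strict';
const fs = require('fs');
const vm = require('vm');

// Any sufficiently large JS file works; the repro uses the Yarn v2
// bundle fetched by `yarn init -2` (the path below is a placeholder).
const source = fs.readFileSync('./path/to/large-file.js', 'utf8');

function bench(label, compile) {
  const start = Date.now();
  for (let i = 0; i < 100; i++) compile();
  console.log(`Running with ${label} took ${Date.now() - start} ms`);
}

bench('Script', () => new vm.Script(source));
bench('compileFunction', () =>
  vm.compileFunction(source, ['module', 'exports', 'require'])
);
```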
Running this prints the following on my machine (MBP 2018):
Running with Script took 50 ms
Running with compileFunction took 4041 ms
The numbers are consistent across runs.
Passing source code into compileFunction should be comparably fast to passing it into new Script.
That's roughly 80x worse performance. The gap grows with the number of loop iterations, while the new Script time stays fairly constant regardless of iteration count, so I assume it's "simply" some cached data that lets new Script avoid the cost of parsing the same code multiple times.
This might be the same as #26229.
As mentioned in https://github.com/nodejs/node/issues/26229#issuecomment-674385686, we see a huge performance improvement in Jest if we switch from compileFunction back to the "old" new Script approach. As also mentioned there, this might be "fixed" (or at least be solvable in userland) by the seemingly stalled #24069?
I added a quick and dirty produceCachedData pass, and the time went down to about 180 ms. That's a great improvement, albeit still several times slower than new Script, and harder to manage (I'd prefer the implicit caching of the new Script API).
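A sketch of what such a produceCachedData pass could look like; it assumes the returned function carries a `cachedData` property when `produceCachedData: true` is passed, which is what Node 14's lib/vm.js attaches:

```js
const vm = require('vm');

let cachedData;

function compileWithCache(source, params) {
  const fn = vm.compileFunction(source, params, {
    cachedData,                                // undefined on the first call
    produceCachedData: cachedData === undefined,
  });
  // Assumed Node 14 behavior: the serialized code cache is attached
  // to the returned function when produceCachedData is set.
  if (cachedData === undefined && fn.cachedData) {
    cachedData = fn.cachedData;
  }
  return fn;
}
```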
cc @nodejs/v8, this might not be something fixable on node's side.
This is about v8::ScriptCompiler::CompileFunctionInContext() not using the code cache, right?
I don't think that's V8's issue to fix. V8 can't know whether it's safe to use the cache because the argument names and context extensions affect how the function is parsed.
Not that it's easy for Node.js either. options.contextExtensions is a free-form list of mutable objects; it's hard to cache on that unless object identity is good enough, but it probably isn't.
What Node.js (or perhaps V8) could optimize for is the presumably common case of no context extensions. TBD how to cache without introducing memory leaks.
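As a userland sketch of that fast path (assuming no contextExtensions are passed), one could key serialized code-cache bytes on the source plus the parameter names; the unbounded Map is exactly the memory-leak concern raised above (`compileCached` is a hypothetical helper, not a Node API):

```js
const crypto = require('crypto');
const vm = require('vm');

const codeCache = new Map(); // grows without bound: the leak concern

function compileCached(source, params) {
  const key =
    crypto.createHash('sha1').update(source).digest('hex') +
    '\0' + params.join(',');
  const fn = vm.compileFunction(source, params, {
    cachedData: codeCache.get(key),
    produceCachedData: !codeCache.has(key),
  });
  if (!codeCache.has(key) && fn.cachedData) {
    codeCache.set(key, fn.cachedData);
  }
  return fn;
}
```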
In Jest's case we'll be passing a different parsingContext every time (Jest uses a separate Context per test file for sandboxing), so if the cached data is tied to the Context, we'd probably still see a painful performance regression versus vm.Script. If it's not possible to cache based on the source code string (and the arguments) alone, we'll probably have to drop compileFunction and stay on vm.Script.
Varying the cache by the args array is no issue for our use case, though: 99% of the time they'll be the same, and when they differ I'd guess it would just create a few more cache entries? That's the same situation we have today with the manual `function(some, args) {USER_SOURCE_CODE_HERE}` wrapper we add, whose source code changes with the arguments we pass in (see the sketch below).
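For illustration, a minimal self-contained sketch of that wrapper approach (`userSource` is a placeholder for the real test-file code): the parameter names are baked into the source string, so a different argument list simply yields a different script, while a single vm.Script is parsed once and then run in a fresh Context per test file:

```js
const vm = require('vm');

// Placeholder user code; in Jest this would be the test file's source.
const userSource = 'module.exports = (x) => x * 2;';

const wrapped = `(function(module, exports, require) {\n${userSource}\n})`;
const script = new vm.Script(wrapped, { filename: 'user-code.js' });

// One Script, parsed once, run in a fresh Context per "test file".
for (let i = 0; i < 3; i++) {
  const sandbox = vm.createContext({ console });
  const wrapperFn = script.runInContext(sandbox); // evaluates to the wrapper
  const module = { exports: {} };
  wrapperFn(module, module.exports, require);
  console.log(module.exports(21)); // 42
}
```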
(That said, improving performance for compileFunction in the case of not passing parsingContext seems like a good thing even if it wouldn't be enough for Jest's use case 🙂)
> If it's not possible to get caching based on the source code string (and the arguments) alone
I don't think that can work, so you're probably better off sticking with vm.Script, yes. Argument names aren't insurmountable but context extensions are.
When you pass in context extensions, what's evaluated looks like this in pseudo code:
```js
// assume a = { y: 42 } and b = { z: 1337 }
with (a)
  with (b)
    function f(x) {
      return x * y * z
    }
```
I say "pseudo code" but in fact it's really close to the runtime representation.
@bnoordhuis reading your comment more closely, I see you're talking about contextExtensions. We don't use that; we use parsingContext. Do you think that has the same limitations?
I think this issue should be upstreamed.