Is your feature request related to a problem? Please describe.
Node takes around 60 ms to start on a modern Skylake server, processing ~230M instructions to execute a mostly empty function (see below). Looking at the startup timeline view (captured with trace events), during this time it loads and compiles 56 JavaScript core library files, and much of the startup time is due to that.
"events.js"
"internal/trace_events_async_hooks.js"
"async_hooks.js"
"internal/errors.js"
"internal/validators.js"
"internal/async_hooks.js"
"internal/safe_globals.js"
"util.js"
"internal/util/inspect.js"
"internal/util.js"
"internal/util/types.js"
"internal/encoding.js"
"buffer.js"
"internal/buffer.js"
"internal/process/per_thread.js"
"internal/process/main_thread_only.js"
"internal/process/stdio.js"
"assert.js"
"internal/assert.js"
"fs.js"
"path.js"
"internal/constants.js"
"internal/fs/utils.js"
"internal/url.js"
"internal/querystring.js"
"internal/process/warning.js"
"internal/process/next_tick.js"
"internal/process/promises.js"
"internal/fixed_queue.js"
"internal/inspector_async_hook.js"
"timers.js"
"internal/linkedlist.js"
"internal/priority_queue.js"
"internal/timers.js"
"internal/modules/cjs/loader.js"
"vm.js"
"url.js"
"internal/modules/cjs/helpers.js"
"console.js"
"tty.js"
"net.js"
"stream.js"
"internal/streams/pipeline.js"
"internal/streams/end-of-stream.js"
"internal/streams/legacy.js"
"_stream_readable.js"
"internal/streams/buffer_list.js"
"internal/streams/destroy.js"
"internal/streams/state.js"
"_stream_writable.js"
"_stream_duplex.js"
"_stream_transform.js"
"_stream_passthrough.js"
"internal/net.js"
"internal/stream_base_commons.js"
"internal/tty.js"
The test code:
```js
'use strict';

module.exports.empty = (event, context) => {
  let start = new Date().getTime();
  // nothing here
  const response = {
    statusCode: 200,
    headers: {
      'Access-Control-Allow-Origin': '*', // Required for CORS support to work
    },
    body: JSON.stringify({
      startTimestamp: start,
      endTimestamp: new Date().getTime()
    }),
  };
};

const handler = require('./handler');
handler.empty(null, null, function(err, results) {
  console.log(results);
});
```
The timeline view of Node startup for the above empty program. The green parts are compilation; 56 JavaScript files are compiled.
Describe the solution you'd like
The compilation of these core libraries could be done ahead of time and cached, and the entire generated code snapshot could be loaded in one shot. Multiple copies of node on the same machine could share it.
There is also some V8 GC activity that could be postponed to 100 ms or later.
It would be great to get the default startup down to 10-15 ms.
@joyeecheung @hashseed @jasnell @mcollina @addaleax
Describe alternatives you've considered
Using the codecache feature, compilation time is reduced, but the compiled code for each library is still loaded one at a time. Notice the green has shrunk, yet each module is still loaded one by one.
FaaS providers are also starting to keep pre-warmed instances.
`perf stat -e cycles,instructions`

  | Cycles | Instructions
-- | -- | --
empty (default) | 170,637,353 | 230,444,605
empty (codecache) | 101,290,689 | 114,769,349
Duplicate of https://github.com/nodejs/node/issues/17058?
Yup. Ultimately you want to use a startup snapshot instead of the code cache. That way you also won't run into GCs.
This is currently blocked by refactoring startup into a static and a dynamic (depending on environment and command line args) part.
> This is currently blocked by refactoring startup into a static and a dynamic (depending on environment and command line args) part.
I believe I have done most of this part, another part left would be a more organized way to bookkeep the external reference registration. We could um, copy-paste whatever external references created by existing module initializers run during bootstrap for now if we just want to do an initial implementation, but as people keep adding new FunctionTemplates etc., it's not going to be very maintainable (and if it simply crashes when the references are not properly registered without proper hints, it would be quite confusing to contributors who are not familiar with the snapshot process).
Anyway I am working on a new iteration of https://github.com/nodejs/node/issues/17058 and so far it's safe to snapshot before pre-execution. I should be able to have a valid POC soon, the bookkeeping part may or may not block it but that would be much easier to tackle than the bootstrap refactoring.
There might be some short-term low-hanging fruit to reap. There are a lot of intertwined dependencies that are probably not strictly necessary.
An instrumented `node -e ''` prints this for me. Seems kind of excessive for doing nothing.
LookupAndCompile internal/per_context/setup.js
LookupAndCompile internal/per_context/domexception.js
LookupAndCompile internal/bootstrap/primordials.js
LookupAndCompile internal/bootstrap/loaders.js
LookupAndCompile internal/bootstrap/node.js
LookupAndCompile internal/util.js
LookupAndCompile internal/errors.js
LookupAndCompile events.js
LookupAndCompile buffer.js
LookupAndCompile internal/util/types.js
LookupAndCompile internal/util/inspect.js
LookupAndCompile internal/assert.js
LookupAndCompile internal/validators.js
LookupAndCompile internal/buffer.js
LookupAndCompile internal/process/per_thread.js
LookupAndCompile internal/process/main_thread_only.js
LookupAndCompile internal/process/stdio.js
LookupAndCompile internal/async_hooks.js
LookupAndCompile internal/process/task_queues.js
LookupAndCompile internal/process/promises.js
LookupAndCompile internal/fixed_queue.js
LookupAndCompile internal/console/global.js
LookupAndCompile internal/console/constructor.js
LookupAndCompile internal/util/inspector.js
LookupAndCompile internal/url.js
LookupAndCompile internal/querystring.js
LookupAndCompile internal/constants.js
LookupAndCompile path.js
LookupAndCompile internal/encoding.js
LookupAndCompile timers.js
LookupAndCompile internal/linkedlist.js
LookupAndCompile internal/timers.js
LookupAndCompile internal/priority_queue.js
LookupAndCompile internal/process/execution.js
LookupAndCompile internal/process/warning.js
LookupAndCompile internal/main/eval_string.js
LookupAndCompile internal/bootstrap/pre_execution.js
LookupAndCompile internal/options.js
LookupAndCompile internal/modules/cjs/helpers.js
LookupAndCompile url.js
LookupAndCompile internal/idna.js
LookupAndCompile internal/inspector_async_hook.js
LookupAndCompile internal/util/debuglog.js
LookupAndCompile internal/modules/cjs/loader.js
LookupAndCompile vm.js
LookupAndCompile util.js
LookupAndCompile fs.js
LookupAndCompile internal/fs/utils.js
CompileFunction [eval]-wrapper
```diff
diff --git a/src/node_contextify.cc b/src/node_contextify.cc
index 087727d93a..1266a3d338 100644
--- a/src/node_contextify.cc
+++ b/src/node_contextify.cc
@@ -985,6 +985,10 @@ void ContextifyContext::CompileFunction(
   // Argument 2: filename
   CHECK(args[1]->IsString());
   Local<String> filename = args[1].As<String>();
+  {
+    Utf8Value filename_s(args.GetIsolate(), filename);
+    printf("%s %s\n", __func__, *filename_s);
+  }
   // Argument 3: line offset
   CHECK(args[2]->IsNumber());
diff --git a/src/node_native_module.cc b/src/node_native_module.cc
index 08c0ab16e3..62e332574b 100644
--- a/src/node_native_module.cc
+++ b/src/node_native_module.cc
@@ -299,6 +299,7 @@ MaybeLocal<Function> NativeModuleLoader::LookupAndCompile(
   Local<String> source = source_it->second.ToStringChecked(isolate);
   std::string filename_s = id + std::string(".js");
+  printf("%s %s\n", __func__, filename_s.c_str());
   Local<String> filename =
       OneByteString(isolate, filename_s.c_str(), filename_s.size());
   Local<Integer> line_offset = Integer::New(isolate, 0);
```
@bnoordhuis BTW you can get a more complete list (including the bindings loaded) with `process._rawDebug(process.moduleLoadList.join('\n'))`.
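A small sketch of inspecting that list from user code (`process.moduleLoadList` is an internal, undocumented property, so treat its exact entry format as an implementation detail):

```javascript
'use strict';
// process.moduleLoadList records every core JS module and internal binding
// loaded so far, in load order; entries look like "NativeModule fs".
const list = process.moduleLoadList;
const natives = list.filter((entry) => entry.startsWith('NativeModule'));
console.log(`entries: ${list.length}, native modules: ${natives.length}`);
```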
A lot of these dependencies come from the fact that we set up the browser globals by default. We could make them load lazily before the snapshot is implemented and shipped, but after the snapshot is in core we would want them to be included in it - I believe I've managed to clean up most of them so that they are now side-effect-free and can be included in the snapshot.
Not every day you get to see a 2x improvement for `node -e {}`.
Nice work @joyeecheung and other collaborators.
  | nodejs v10.13.0 | nodejs master | v10.13.0 / master
-- | -- | -- | --
cycles | 149,778,245 | 77,046,107 | 1.9
instructions | 210,776,025 | 91,695,996 | 2.3
We will analyze this some more. Let us know where we can help.
The new trace for empty shows it is down to 21 ms. Very impressive. @hashseed there is 1 ms of GC Scavenge around the 15 ms mark. Any suggestions?
There are also some sparse bits of `script_parse_and_compile` green. Is that loading the cached code? It would be nice to batch it and cut some more.
FYI... Enabling the `node.vm` trace event category will cause the trace event file to include the file names of the core modules, so you can see exactly how much time each one adds during bootstrap. Enabling the `node.bootstrap` category will include milestone events, so you can see exactly how long passes between the various bootstrap phases.
Thanks @jasnell. Yes, I did set the trace to include those categories, but the file names are not being included. Maybe I should report that separately. Let me know if it works correctly for you.
```
~/git/node-master/node.latest \
  --trace-event-categories v8,node,node.async_hooks,node.bootstrap,node.vm.script \
  --trace-event-file-pattern '${pid}-${rotation}.log' -e {}
```
It was hard to see the `bootStrapComplete` event in the Chrome view. I looked at the log file after processing it with:

```
jq '.traceEvents' 104695-1.log > 104695.processed.log
```
Bootstrap is down to 17 ms.
start ts | bootstrapComplete ts | bootstrap
-- | -- | --
3,204,644,094,501 | 3,204,644,111,624 | 17,123
Hmm.. will have to test in a bit that it still works as expected.
Other events for bootstrap. If I am reading this right, 1 ms for nodeStart and 4 ms for v8Start. We have to account for the other 12 ms.
nodeStart | v8Start | Environment | v8Start - nodeStart | Environment - v8Start
-- | -- | -- | -- | --
3,204,644,091,814 | 3,204,644,092,877 | 3,204,644,097,060 | 1,063 | 4,183
Do I understand correctly that these data points are from embedding the code cache for internal modules?
In that case, a GC is quite understandable. Every time we deserialize code for a module, there is going to be a quite sizeable allocation.
If we are aiming to eventually use a startup snapshot, then I don't think there is any point in tweaking the GC to squeeze out that 1 ms. You may want to play around with `--initial-old-space-size`, though.
BTW with https://github.com/nodejs/node/pull/27147 you can capture a CPU profile with more detailed call stacks, for instance when running `node -p process.versions`.
(I've bumped the sampling interval to 10 µs to be more accurate.)
As you can see in the screenshot, basically every startup mode consists of three phases (modulo the per-context scripts, which run too early to profile at the moment):
1. `internal/bootstrap/loaders.js` - the first top anonymous function (blue) in the screenshot
2. `internal/bootstrap/node.js` - the second top anonymous function (green)
3. `internal/main/*.js` - the third top anonymous function (pink)

(Note that 1 and 2 can actually be merged, but we still keep them separate for readability.)
The execution of the per-context scripts as well as 1 and 2 above will be eliminated by the snapshot integration. Then we'll need to move as much as possible from 3 to 2, while preserving the invariant that there should be no environment-dependent code in 1 or 2.
As shown above, the majority of the time is spent compiling native modules (with the code cache in my branch) - this cannot be entirely eliminated by the snapshot integration. At the moment, to be safe, we only load modules that are used during bootstrap on demand. We could load and cache more when building the snapshot, but we'll have to refactor and make sure that the modules loaded do not do anything environment-dependent (roughly, they need to be side-effect-free) during their first evaluation - this applies not only to the JS code but also to the initializers of the bindings that they load. This issue is surprisingly common in the code base, and it's a bit difficult to enforce the side-effect constraint at the moment (considering the native modules are all, um... CJS).
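The side-effect constraint above can be illustrated with a small pattern sketch (the names are illustrative, not actual Node internals): instead of reading environment-dependent state when the module is first evaluated, which would bake a machine-specific value into the snapshot, defer the read to first use.

```javascript
'use strict';

// Snapshot-unsafe: evaluating the module would read process state
// immediately, freezing the build machine's value into the snapshot:
// const debugEnv = process.env.NODE_DEBUG;

// Snapshot-safe: keep module evaluation side-effect-free and defer the
// environment read until the first call at runtime.
let debugEnv;
function getDebugEnv() {
  if (debugEnv === undefined) {
    debugEnv = process.env.NODE_DEBUG || '';
  }
  return debugEnv;
}

module.exports = { getDebugEnv };
```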
The conversation here appears to have stalled out. Should this remain open?
Closing as inactive for more than a year; please re-open if there is still outstanding work.
BTW this is what I currently get with `--cpu-prof`.
It has improved since we shipped the embedded snapshot for the bootstrap (steps 1 and 2 are gone compared to https://github.com/nodejs/node/issues/27196#issuecomment-484355185). The remaining work is sketched out in https://docs.google.com/document/d/15bu038I36oILq5t4Qju1sS2nKudVB6NSGWz00oD48Q8/edit, though the bigger goal is to enable userland snapshots for user applications :). Not sure how far we've come compared to what we had in the OP (since I don't have the same machine to benchmark), but I am pretty sure we've improved a lot with the code cache and the snapshot, and we've probably hit the goal of reducing the number to 10-15 ms.
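For readers landing here later: recent Node versions expose an experimental userland snapshot interface; a hedged sketch, assuming `--build-snapshot` and `--snapshot-blob` are available in your Node build (v18.8+):

```shell
# Build a snapshot that has already evaluated the entry script, then
# start subsequent processes from the serialized heap.
echo "globalThis.fromSnapshot = true;" > snapshot-entry.js
node --snapshot-blob snapshot.blob --build-snapshot snapshot-entry.js
node --snapshot-blob snapshot.blob -e "console.log(globalThis.fromSnapshot)"
```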
BTW the new tracking issue for snapshot integration is https://github.com/nodejs/node/issues/35711