I analyzed the performance of a native module that was converted to N-API using a profiling tool, and compared the results to the original module that used V8 APIs. While overall the overhead of N-API is fairly minimal already, I did manage to identify 3 potential optimizations. Each of these can substantially reduce the impact of N-API in certain scenarios. I have tested an early version of these fixes already to confirm that, but I wanted to give a chance for discussion before I submit a PR.
N-API currently offers only one way to create a number value:
napi_status napi_create_number(napi_env env, double value, napi_value* result);
V8 has an optimized representation for 32-bit integer values, but because the value provided to N-API is always a double it always calls v8::Number::New() (never v8::Integer::New), so it does not create the optimal integer representation. Therefore these integer values are slower to create and slower to work with than they could be.
Instead of a single napi_create_number() API, there should probably be one for each of: int32_t, uint32_t, int64_t, double. Note there are already napi_get_value_*() functions for each of those 4 types, so having the same 4 napi_create_*() variants is more natural anyway.
The N-API functions that get integer values do some work to get a v8::Context that is never actually used. The profiler data showed that the call to v8::Isolate::GetCurrentContext() is actually somewhat expensive. (And it is apparently _not_ optimized out by the compiler.)
The implementation of napi_get_value_int32() includes this code:
RETURN_STATUS_IF_FALSE(env, val->IsNumber(), napi_number_expected);
v8::Local<v8::Context> context = isolate->GetCurrentContext();
*result = val->Int32Value(context).FromJust();
But v8::Value::Int32Value() does not use the context argument when the value is a number type (a condition that was already checked above):
Maybe<int32_t> Value::Int32Value(Local<Context> context) const {
auto obj = Utils::OpenHandle(this);
if (obj->IsNumber()) return Just(NumberToInt32(*obj));
PREPARE_FOR_EXECUTION_PRIMITIVE(context, Object, Int32Value, int32_t);
i::Handle<i::Object> num;
has_pending_exception = !i::Object::ToInt32(isolate, obj).ToHandle(&num);
RETURN_ON_FAILED_EXECUTION_PRIMITIVE(int32_t);
return Just(num->IsSmi() ? i::Smi::cast(*num)->value()
: static_cast<int32_t>(num->Number()));
}
I can think of two ways to make this faster:
v8::Value::Int32Value() overload that does not take a context (and does not return a maybe). The problem is it is marked as "to be deprecated soon".v8::Local<v8::Context> value to v8::Value::Int32Value(). This relies on the internal implementation detail that it does not use the context when the value is a number type. But in practice it should be safe, and will be easily caught by tests in the unlikely event V8 ever changes that API behavior.I also considred caching the v8::Context in the napi_env structure, but that probably isn't valid because APIs can be called from different context scopes.
V8 handle scopes are normally stack-allocated. But the current N-API implementation puts them on the heap, which means every entry/exit of a scope involves expensive new and delete operations.
I can think of two ways to make this faster:
napi_env. Track which ones are used/freed, and allocate new handle scopes on the heap only if the pre-allocated ones are all in use.1) SGTM.
2) Indeed, Isolate::GetCurrentContext() is not defined inline in the header file, so it's not possible for the compiler to optimize it out w/o LTO. I guess passing an empty context is the best way to go here.
3) > Pre-allocate memory for some small fixed number of handle scopes (maybe only 1?), attached to the napi_env.
This seems to be the way to go. A handle scope isn't that big memory-wise anyway so a couple more won't hurt either. Though, given v8::HandleScope only has an explicit constructor with an overloaded operator new(size_t size) that calls ABORT(), how do you plan to implement this?
given v8::HandleScope only has an explicit constructor with an overloaded operator new(size_t size) that calls ABORT(), how do you plan to implement this?
N-API already uses a wrapper around v8::HandleScope to enable it to be allocated on the heap. So I think we can just call operator new(size_t, void*) on that wrapper class, and that will construct the HandleScope member.
V8 has an optimized representation for 32-bit integer values, but because the value provided to N-API is always a double it always calls
v8::Number::New()(neverv8::Integer::New), so it does not create the optimal integer representation. Therefore these integer values are slower to create and slower to work with than they could be.
Reading the V8 source code, that doesn鈥檛 seem to be true: https://github.com/nodejs/node/blob/7fdcb68dc3db6e2765f06ad3e43af34c51a6bb70/deps/v8/src/factory.cc#L1332-L1337
Regarding 2: It鈥檚 probably easiest to just add
if (val->IsInt32()) {
*result = val.As<Int32>()->Value();
return napi_clear_last_error(env);
}
even before the val->IsNumber() check. I鈥檒l open a PR shortly.
Yes, you're right v8::Number::New() does eventually create the optimized integer representation. But it's slower than v8::Integer::New().
Somehow I had overlooked v8::Int32::Value() -- thanks!
Two more...
v8::TryCatchAll of the napi_create_*() functions use the NAPI_PREAMBLE macro that sets up a v8::TryCatch at the function scope. However these functions have no possibility of executing JavaScript code so as far as I know there is no reason to pay the cost of the TryCatch. The same is true for napi_wrap() and napi_get_buffer_info() functions. Errors in non-JavaScript V8 API calls are reported either via the fatal error callback (node::OnFatalError()) or as empty Maybe return values, never via JavaScript exceptions catchable with a TryCatch.
The N-API function callback adapters wrap pointers in v8::External values for use with SetInternalField() / GetInternalField(). But it's more efficient to omit the v8::External wrapper and use SetAlignedPointerInInternalField() / GetAlignedPointerFromInternalField() instead. (I think we can assume these pointers will be aligned... anyway there is a check that reports a fatal error if they are not.)
I believe we have addressed all points except 3).
There's not much more we can do, and we've covered most of these points. Closing.
Was 4. Over-use of v8::TryCatch addressed for napi_get_buffer_info?
@jorangreef looks like NAPI_PREAMBLE() (which uses the TryCatch) has now been reduced to CHECK_ENV() which does not 馃憤
@TimothyGu
Indeed,
Isolate::GetCurrentContext()is not defined inline in the header file, so it's not possible for the compiler to optimize it out w/o LTO.
For GCC and Clang, applying __attribute__((pure)) (or [[gnu::pure]]) to the declaration would allow the compiler to optimize away calls to this function if its return value is never used.
Isolate::GetCurrentContext() isn't pure, it has side effects - it creates a new Local<Context> in the active HandleScope.