Node: n-api: Potential N-API performance optimizations

Created on 20 Jul 2017 · 11 comments · Source: nodejs/node

I analyzed the performance of a native module that was converted to N-API using a profiling tool, and compared the results to the original module that used V8 APIs. While the overall overhead of N-API is already fairly minimal, I did manage to identify 3 potential optimizations. Each of these can substantially reduce the impact of N-API in certain scenarios. I have tested an early version of these fixes to confirm that, but I wanted to allow time for discussion before submitting a PR.

1. Creating integer values

N-API currently offers only one way to create a number value:

napi_status napi_create_number(napi_env env, double value, napi_value* result);

V8 has an optimized representation for 32-bit integer values, but because the value provided to N-API is always a double it always calls v8::Number::New() (never v8::Integer::New), so it does not create the optimal integer representation. Therefore these integer values are slower to create and slower to work with than they could be.

Instead of a single napi_create_number() API, there should probably be one for each of: int32_t, uint32_t, int64_t, double. Note there are already napi_get_value_*() functions for each of those 4 types, so having the same 4 napi_create_*() variants is more natural anyway.
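For reference, the split could look like the following sketch of suggested C signatures. These mirror the existing napi_get_value_*() naming convention, but they are a proposal here, not an existing API:

```cpp
napi_status napi_create_int32(napi_env env, int32_t value, napi_value* result);
napi_status napi_create_uint32(napi_env env, uint32_t value, napi_value* result);
napi_status napi_create_int64(napi_env env, int64_t value, napi_value* result);
napi_status napi_create_double(napi_env env, double value, napi_value* result);
```

The int32_t and uint32_t variants could then route to v8::Integer::New() / v8::Integer::NewFromUnsigned(), keeping v8::Number::New() only for the double case.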

2. Getting integer values

The N-API functions that get integer values do some work to get a v8::Context that is never actually used. The profiler data showed that the call to v8::Isolate::GetCurrentContext() is actually somewhat expensive. (And it is apparently _not_ optimized out by the compiler.)

The implementation of napi_get_value_int32() includes this code:

  RETURN_STATUS_IF_FALSE(env, val->IsNumber(), napi_number_expected);
  v8::Local<v8::Context> context = isolate->GetCurrentContext();
  *result = val->Int32Value(context).FromJust();

But v8::Value::Int32Value() does not use the context argument when the value is a number type (a condition that was already checked above):

Maybe<int32_t> Value::Int32Value(Local<Context> context) const {
  auto obj = Utils::OpenHandle(this);
  if (obj->IsNumber()) return Just(NumberToInt32(*obj));
  PREPARE_FOR_EXECUTION_PRIMITIVE(context, Object, Int32Value, int32_t);
  i::Handle<i::Object> num;
  has_pending_exception = !i::Object::ToInt32(isolate, obj).ToHandle(&num);
  RETURN_ON_FAILED_EXECUTION_PRIMITIVE(int32_t);
  return Just(num->IsSmi() ? i::Smi::cast(*num)->value()
                           : static_cast<int32_t>(num->Number()));
}

I can think of two ways to make this faster:

  • Call the v8::Value::Int32Value() overload that does not take a context (and does not return a maybe). The problem is it is marked as "to be deprecated soon".
  • Pass an _empty_ v8::Local<v8::Context> value to v8::Value::Int32Value(). This relies on the internal implementation detail that it does not use the context when the value is a number type. But in practice it should be safe, and will be easily caught by tests in the unlikely event V8 ever changes that API behavior.

I also considered caching the v8::Context in the napi_env structure, but that probably isn't valid because APIs can be called from different context scopes.
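The number fast path really is context-free: ECMAScript's ToInt32 just truncates toward zero and wraps modulo 2^32. A minimal standalone sketch of that arithmetic (my own re-implementation for illustration, not V8's internal NumberToInt32):

```cpp
#include <cmath>
#include <cstdint>

// ECMAScript-style ToInt32: truncate toward zero, wrap modulo 2^32,
// then reinterpret the low 32 bits as a signed value.
int32_t ToInt32(double d) {
  if (!std::isfinite(d)) return 0;  // NaN and +/-Infinity map to 0
  double t = std::trunc(d);
  double m = std::fmod(t, 4294967296.0);  // wrap into (-2^32, 2^32)
  if (m < 0) m += 4294967296.0;           // normalize into [0, 2^32)
  return static_cast<int32_t>(static_cast<uint32_t>(m));
}
```

Since this is pure arithmetic on the double, nothing in the number case can run JavaScript or otherwise need a context.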

3. Allocating handle scopes

V8 handle scopes are normally stack-allocated. But the current N-API implementation puts them on the heap, which means every entry/exit of a scope involves expensive new and delete operations.

I can think of two ways to make this faster:

  • Change the design of all the N-API handle scope APIs so that the caller must pass in a pointer to a (presumably stack-allocated) handle scope structure to be initialized. The problem is that the _size_ of that structure is VM-specific (and must be part of the ABI). While V8 is currently the only JS engine with an N-API implementation that uses handle scopes, defining a V8-specific structure would seem like a leak in the abstraction.
  • Pre-allocate memory for some small fixed number of handle scopes (maybe only 1?), attached to the napi_env. Track which ones are used/freed, and allocate new handle scopes on the heap only if the pre-allocated ones are all in use.
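The second option can be sketched with placement new. All names below (Env, ScopeWrapper, OpenScope, CloseScope) are hypothetical stand-ins for the real N-API types; the point is the single pre-allocated slot with a heap fallback:

```cpp
#include <cstddef>
#include <new>

// Hypothetical stand-in for the N-API wrapper around v8::HandleScope.
struct ScopeWrapper {
  int depth;  // placeholder for the real scope state
  explicit ScopeWrapper(int d) : depth(d) {}
};

// Hypothetical environment with one pre-allocated scope slot.
struct Env {
  alignas(ScopeWrapper) unsigned char slot[sizeof(ScopeWrapper)];
  bool slot_in_use = false;

  ScopeWrapper* OpenScope(int depth) {
    if (!slot_in_use) {
      slot_in_use = true;
      return new (slot) ScopeWrapper(depth);  // placement new: no heap allocation
    }
    return new ScopeWrapper(depth);  // slot taken: fall back to the heap
  }

  void CloseScope(ScopeWrapper* s) {
    if (reinterpret_cast<unsigned char*>(s) == slot) {
      s->~ScopeWrapper();  // destroy in place, keep the memory for reuse
      slot_in_use = false;
    } else {
      delete s;
    }
  }
};
```

Since scopes open and close in strict LIFO order, a single slot already covers the common case of one scope per native callback.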
Labels: n-api, performance

All 11 comments

1) SGTM.

2) Indeed, Isolate::GetCurrentContext() is not defined inline in the header file, so it's not possible for the compiler to optimize it out w/o LTO. I guess passing an empty context is the best way to go here.

3) > Pre-allocate memory for some small fixed number of handle scopes (maybe only 1?), attached to the napi_env.

This seems to be the way to go. A handle scope isn't that big memory-wise anyway so a couple more won't hurt either. Though, given v8::HandleScope only has an explicit constructor with an overloaded operator new(size_t size) that calls ABORT(), how do you plan to implement this?

> given v8::HandleScope only has an explicit constructor with an overloaded operator new(size_t size) that calls ABORT(), how do you plan to implement this?

N-API already uses a wrapper around v8::HandleScope to enable it to be allocated on the heap. So I think we can just call operator new(size_t, void*) on that wrapper class, and that will construct the HandleScope member.

> V8 has an optimized representation for 32-bit integer values, but because the value provided to N-API is always a double it always calls v8::Number::New() (never v8::Integer::New), so it does not create the optimal integer representation. Therefore these integer values are slower to create and slower to work with than they could be.

Reading the V8 source code, that doesn't seem to be true: https://github.com/nodejs/node/blob/7fdcb68dc3db6e2765f06ad3e43af34c51a6bb70/deps/v8/src/factory.cc#L1332-L1337

Regarding 2: It's probably easiest to just add

if (val->IsInt32()) {
  *result = val.As<Int32>()->Value();
  return napi_clear_last_error(env);
}

even before the val->IsNumber() check. I'll open a PR shortly.

Yes, you're right: v8::Number::New() does eventually create the optimized integer representation. But it's slower than v8::Integer::New().

Somehow I had overlooked v8::Int32::Value() -- thanks!

Two more...

4. Over-use of v8::TryCatch

All of the napi_create_*() functions use the NAPI_PREAMBLE macro that sets up a v8::TryCatch at the function scope. However these functions have no possibility of executing JavaScript code so as far as I know there is no reason to pay the cost of the TryCatch. The same is true for napi_wrap() and napi_get_buffer_info() functions. Errors in non-JavaScript V8 API calls are reported either via the fatal error callback (node::OnFatalError()) or as empty Maybe return values, never via JavaScript exceptions catchable with a TryCatch.

5. Inefficient storage of pointers in internal fields

The N-API function callback adapters wrap pointers in v8::External values for use with SetInternalField() / GetInternalField(). But it's more efficient to omit the v8::External wrapper and use SetAlignedPointerInInternalField() / GetAlignedPointerFromInternalField() instead. (I think we can assume these pointers will be aligned... anyway there is a check that reports a fatal error if they are not.)
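As I understand it, the aligned-pointer internal-field APIs require the stored pointer to be at least 2-byte aligned, since V8 reserves the low bit for tagging. The fatal-error check mentioned above can be as simple as this sketch (the helper name is hypothetical):

```cpp
#include <cstdint>

// Aligned-pointer internal fields need the low bit clear, i.e. at least
// 2-byte alignment. Hypothetical helper sketching the validity check.
bool IsAlignedForInternalField(const void* p) {
  return (reinterpret_cast<uintptr_t>(p) & 1u) == 0;
}
```

Any pointer returned by operator new or pointing at a word-sized object satisfies this; only a pointer into the middle of a byte buffer could fail.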

I believe we have addressed all points except 3).

There's not much more we can do, and we've covered most of these points. Closing.

Was 4. Over-use of v8::TryCatch addressed for napi_get_buffer_info?

@jorangreef looks like NAPI_PREAMBLE() (which uses the TryCatch) has now been reduced to CHECK_ENV() which does not 👍

@TimothyGu

Indeed, Isolate::GetCurrentContext() is not defined inline in the header file, so it's not possible for the compiler to optimize it out w/o LTO.

For GCC and Clang, applying __attribute__((pure)) (or [[gnu::pure]]) to the declaration would allow the compiler to optimize away calls to this function if its return value is never used.

Isolate::GetCurrentContext() isn't pure, it has side effects - it creates a new Local<Context> in the active HandleScope.
