`process.stdout.write()` accepts either a string or a Buffer argument (`process.stderr.write()` behaves the same, but let's keep stdout in the spotlight), yet the performance differs considerably: the string version is roughly four times slower.
The test case is as follows:

```JavaScript
// write-buffer.js
const SIZE = 65536;
const buf = Buffer.alloc(SIZE, 'y\n');

function bulk_write() {
  while (true) {
    if (process.stdout.write(buf)) {
      // OK, we're fine
    } else {
      process.stdout.once('drain', bulk_write);
      break;
    }
  }
}

bulk_write();
```
```JavaScript
// write-string.js
const SIZE = 65536;
const str = Buffer.alloc(SIZE, 'y\n').toString();

function bulk_write() {
  while (true) {
    if (process.stdout.write(str)) {
      // OK, we're fine
    } else {
      process.stdout.once('drain', bulk_write);
      break;
    }
  }
}

bulk_write();
```
```shell
$ ./node write-buffer.js | pv > /dev/null
^C.1GiB 0:00:06 [ 8.8GiB/s] [ <=> ]
$ ./node write-string.js | pv > /dev/null
^C37GiB 0:00:05 [2.37GiB/s] [ <=> ]
```
Drilling into the issue, I found that `StreamBase::WriteBuffer()` is much simpler than `StreamBase::WriteString<>()`, and this difference in runtime cost accounts for the gap.
Typical execution time of `StreamBase::WriteBuffer()` is 6.7 microseconds for a 64 KiB buffer on my machine, while typical execution time of `StreamBase::WriteString<>()` is 23.2 microseconds for a 64 KiB string. Breaking the latter down:
- Typical execution time of `StringBytes::Size()` is 7.4 microseconds (note: this alone takes longer than the entire `StreamBase::WriteBuffer()` call)
- Typical execution time of `StringBytes::Write()` is 9.1 microseconds
- The actual write of the buffer still takes 6.7 microseconds.
```C++
// stream_base.cc
int StreamBase::WriteBuffer(const FunctionCallbackInfo<Value>& args) {
  CHECK(args[0]->IsObject());
  Environment* env = Environment::GetCurrent(args);

  if (!args[1]->IsUint8Array()) {
    node::THROW_ERR_INVALID_ARG_TYPE(env, "Second argument must be a buffer");
    return 0;
  }

  Local<Object> req_wrap_obj = args[0].As<Object>();

  uv_buf_t buf;
  buf.base = Buffer::Data(args[1]);
  buf.len = Buffer::Length(args[1]);

  StreamWriteResult res = Write(&buf, 1, nullptr, req_wrap_obj);

  if (res.async)
    req_wrap_obj->Set(env->context(), env->buffer_string(), args[1]).FromJust();
  SetWriteResultPropertiesOnWrapObject(env, req_wrap_obj, res);

  return res.err;
}

// stream_base.cc
template <enum encoding enc>
int StreamBase::WriteString(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);
  CHECK(args[0]->IsObject());
  CHECK(args[1]->IsString());

  Local<Object> req_wrap_obj = args[0].As<Object>();
  Local<String> string = args[1].As<String>();
  Local<Object> send_handle_obj;
  if (args[2]->IsObject())
    send_handle_obj = args[2].As<Object>();

  int err;

  // Compute the size of the storage that the string will be flattened into.
  // For UTF8 strings that are very long, go ahead and take the hit for
  // computing their actual size, rather than tripling the storage.
  size_t storage_size;
  if (enc == UTF8 && string->Length() > 65535)
    storage_size = StringBytes::Size(env->isolate(), string, enc);
  else
    storage_size = StringBytes::StorageSize(env->isolate(), string, enc);

  if (storage_size > INT_MAX)
    return UV_ENOBUFS;

  // Try writing immediately if write size isn't too big
  char stack_storage[16384];  // 16kb
  size_t data_size;
  size_t synchronously_written = 0;
  uv_buf_t buf;

  bool try_write = storage_size <= sizeof(stack_storage) &&
                   (!IsIPCPipe() || send_handle_obj.IsEmpty());
  if (try_write) {
    data_size = StringBytes::Write(env->isolate(),
                                   stack_storage,
                                   storage_size,
                                   string,
                                   enc);
    buf = uv_buf_init(stack_storage, data_size);

    uv_buf_t* bufs = &buf;
    size_t count = 1;
    err = DoTryWrite(&bufs, &count);
    // Keep track of the bytes written here, because we're taking a shortcut
    // by using `DoTryWrite()` directly instead of using the utilities
    // provided by `Write()`.
    synchronously_written = count == 0 ? data_size : data_size - buf.len;
    bytes_written_ += synchronously_written;

    // Immediate failure or success
    if (err != 0 || count == 0) {
      req_wrap_obj->Set(env->context(), env->async(), False(env->isolate()))
          .FromJust();
      req_wrap_obj->Set(env->context(),
                        env->bytes_string(),
                        Integer::NewFromUnsigned(env->isolate(), data_size))
          .FromJust();
      return err;
    }

    // Partial write
    CHECK_EQ(count, 1);
  }

  std::unique_ptr<char[], Free> data;

  if (try_write) {
    // Copy partial data
    data = std::unique_ptr<char[], Free>(Malloc(buf.len));
    memcpy(data.get(), buf.base, buf.len);
    data_size = buf.len;
  } else {
    // Write it
    data = std::unique_ptr<char[], Free>(Malloc(storage_size));
    data_size = StringBytes::Write(env->isolate(),
                                   data.get(),
                                   storage_size,
                                   string,
                                   enc);
  }

  CHECK_LE(data_size, storage_size);

  buf = uv_buf_init(data.get(), data_size);

  uv_stream_t* send_handle = nullptr;

  if (IsIPCPipe() && !send_handle_obj.IsEmpty()) {
    HandleWrap* wrap;
    ASSIGN_OR_RETURN_UNWRAP(&wrap, send_handle_obj, UV_EINVAL);
    send_handle = reinterpret_cast<uv_stream_t*>(wrap->GetHandle());
    // Reference LibuvStreamWrap instance to prevent it from being garbage
    // collected before `AfterWrite` is called.
    req_wrap_obj->Set(env->handle_string(), send_handle_obj);
  }

  StreamWriteResult res = Write(&buf, 1, send_handle, req_wrap_obj);
  res.bytes += synchronously_written;

  SetWriteResultPropertiesOnWrapObject(env, req_wrap_obj, res);

  if (res.wrap != nullptr) {
    res.wrap->SetAllocatedStorage(data.release(), data_size);
  }

  return res.err;
}
```
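As a practical user-land mitigation (my sketch, not something proposed in this thread): when the same text is written repeatedly, it can be encoded to a Buffer once, so the size computation and copy done by `WriteString<>()` are paid a single time and every subsequent write takes the cheaper `WriteBuffer()` path. `writeCached` and the `encoded` map are hypothetical names, not Node.js APIs.

```javascript
// Sketch of a workaround: cache the encoded Buffer for repeated payloads
// so each write hands the stream a Buffer instead of a string.
const encoded = new Map();

function writeCached(stream, text) {
  let buf = encoded.get(text);
  if (buf === undefined) {
    buf = Buffer.from(text, 'utf8');  // encode once
    encoded.set(text, buf);
  }
  return stream.write(buf);           // Buffer fast path on every call
}

// Usage: writeCached(process.stdout, 'y\n'.repeat(32768));
```

The trade-off is that the cache pins the encoded Buffers in memory, so this only makes sense for a bounded set of repeated payloads.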
I've tried various Node.js versions (including master as of May 2018) and the behaviour is similar across all of them.
Is there any way to directly use the internal buffer of a V8 string when the encodings match, so that we can save time on byte counting and copying? (Especially given that computing the length of a large string is slower than the actual I/O of the same string.)
Note: I've tested the `v8::String::Utf8Value` class, and it's slower than `v8::String::WriteUtf8()`, which is what Node.js's `StringBytes` class currently uses.
Note 2: for a text-processing app, throughput matters; on the other hand, reducing the execution time of a single API call is also a key factor in keeping the main event loop fast.
I'm going to close this because the performance of strings vs. buffers is not a bug, as far as I'm concerned. Buffers (and typed arrays) can be used directly; strings have to be decoded and copied out of the JS heap first.
> Is there any way to directly use the internal buffer of a V8 string to save time when the encodings match?

No. V8 has no such API, and strings don't have a single encoding or even a single backing storage. Look up `v8::String::ExternalStringResource` and its one-byte counterpart in v8.h for a possible solution.
If you still have questions, please open an issue over at nodejs/help.
@kenny-y
As @bnoordhuis said above, strings in V8 are rather complex and have more than one backing-storage variant.
V8 does that to keep them optimized, and it is not a bug.
Some examples:
```js
> process.memoryUsage().rss / 2**20
30.16796875
> const x = 'x'.repeat(2**28); x.length
268435456
> process.memoryUsage().rss
33517568
> process.memoryUsage().rss / 2**20
32.21875
```

```js
> process.memoryUsage().rss / 2**20
29.60546875
> const x = new Array(2**25).join('x'); x.length
33554431
> const y = new Array(2**25).join('y'); y.length
33554431
> process.memoryUsage().rss / 2**20
352.79296875
> gc()
undefined
> process.memoryUsage().rss / 2**20
97.0859375
> const xy = x + y; xy.length
67108862
> process.memoryUsage().rss / 2**20
97.0859375
> const yx = y + x; yx.length
67108862
> const xyyyxyx = x + y + y + yx + yx; xyyyxyx.length
234881017
> process.memoryUsage().rss / 2**20
97.0859375
```
A Buffer, on the other hand, is essentially just a chunk of memory and can be used and read directly.
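To illustrate that point with a small example (mine, not from the thread): a Buffer is a `Uint8Array` view over raw bytes, so it can alias an existing `ArrayBuffer` without copying, which is exactly what a V8 string cannot offer.

```javascript
// A Buffer is a view over an ArrayBuffer; creating it from an existing
// ArrayBuffer copies nothing -- both views alias the same memory.
const ab = new ArrayBuffer(4);
const buf = Buffer.from(ab);    // zero-copy view
const u8 = new Uint8Array(ab);  // second view on the same bytes

buf[0] = 42;
console.log(u8[0]);             // 42 -- the write is visible through both views
```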
I'm OK with closing it... and thanks for all the info. I was just raising the question in a "think the unthinkable" style, partially inspired by the spirit of zero-copy in modern software/hardware design (e.g. no buffer copying between a user-space program and a kernel-space driver). I'm glad to hear that this was well considered.
Filed a bug in V8, but nobody has responded yet: https://bugs.chromium.org/p/v8/issues/detail?id=7759