Node: Memory leak when a Promise is stored on an Express request that's added to a domain

Created on 25 Oct 2018 · 22Comments · Source: nodejs/node

Version: v10.12.0
Platform: Darwin zero-gravitas-lan.jetpants.com 18.0.0 Darwin Kernel Version 18.0.0: Wed Aug 22 20:13:40 PDT 2018; root:xnu-4903.201.2~1/RELEASE_X86_64 x86_64
Subsystem: domain

The following code, which is a reduced test case using functionality that's relied upon by the popular Sentry error reporting module, works fine in Node 8.12.0 but consistently leaks memory on each request in Node 10.12.0, 11.0.0, and 9.11.2:

'use strict';

const domain = require('domain');
const app = require('express')();

app.use((req, res, next) => {
  let requestDomain = domain.create();

  requestDomain.add(req);
  requestDomain.on('error', next);

  requestDomain.run(() => {
    next();
  });
});

app.get('/', (req, res) => {
  req.cachedPromise = new Promise(resolve => {
    let tenMebibyteString = 'a'.repeat(10 * 1024 * 1024);
    resolve(tenMebibyteString);
  });

  res.send('hi');
});

app.listen(5000);

I would expect the req object and its req.cachedPromise property to be GCed once the response is finished, since no references to it are being held. This is what happens in Node 8. But in Nodes 9, 10, and 11, req.cachedPromise never seems to be GCed, causing a steady memory leak.

I first noticed this issue when I tried to upgrade a production service using Sentry to Node 10. It likely affects anyone who uses Sentry with Express and caches promises on the request object (or on any descendant of the request object).

The relevant code in Sentry is here: https://github.com/getsentry/sentry-javascript/blob/839326fbb498c669bd8b9c38bd93d39ca15b6266/packages/node/src/handlers.ts#L220-L229

confirmed-bug domain memory regression v10.x

Source

rgrove

👍10

Most helpful comment

I’ve opened a PR with the solution from above in https://github.com/nodejs/node/pull/25993 … still not a great solution, though.

addaleax on 7 Feb 2019

❤4

All 22 comments

/cc @mcollina @addaleax ...

I'm thinking that there's possibly a leak in the revamped domain internals that's using async_hooks?

jasnell on 25 Oct 2018

Ouch. I think the cause here is a circular-ish reference chain…

The IncomingMessage instance (req) has a reference to the cachedPromise
The cachedPromise keeps the requestDomain alive through the global pairing map (async ID → domain)
The requestDomain has a members list
The members list contains IncomingMessage because it was added through .add().

A regular circular reference would be no problem for GC, but the fact that we have this global pairing map makes it impossible for GC to tell that there is a kind of “if X is inaccessible, then Y is going to be purged from the map, becoming inaccessible itself” logic here.

addaleax on 25 Oct 2018

@addaleax
These are all the occurrences of pairing that I was able to locate:

lib/domain.js:51:const pairing = new Map();
lib/domain.js:56:      pairing.set(asyncId, process.domain);
lib/domain.js:61:    const current = pairing.get(asyncId);
lib/domain.js:67:    const current = pairing.get(asyncId);
lib/domain.js:73:    pairing.delete(asyncId); // cleaning up

There seems to be no iteration.
Would just replacing it with WeakMap help, if that's the only place where those refs are retained?

ChALkeR on 25 Oct 2018

@ChALkeR Yes, that’s the map – but it’s… difficult. WeakMap doesn’t work because we use integers as keys – ironically, partly with the intention of avoiding issues with GC…

I don’t have a good idea of how this could be addressed at the moment.

addaleax on 25 Oct 2018

👍1

This is a fundemental design flaw of async_hooks. We use numbers as indexes (so no weakmaps) on a global map to allocate/track state and we deallocate it only when objects are destroyed. In several cases, this are tied to when the garbage collector actually cleans up the object, making it unpredictable.

In the past, I proposed to introduce a currentAsyncResource() API to be able to store state on the resource itself, removing the need for the global map. I’ll resume that work asap.

mcollina on 25 Oct 2018

👍1

So… one thing we could do is to turn the values in the map into weak references… that’s icky, but it solves this particular problem.

It would mean that we rely on the resource.domain = process.domain line in the init hook to keep the domain alive for the lifetime of the resource (which seems like a reasonable assumption?), and it would mean going back to relying on an internal utility, unless we want to expose a public API à la https://www.npmjs.com/package/weak or https://www.npmjs.com/package/weak-napi…

Example diff for v10.x in the fold

diff --git a/lib/domain.js b/lib/domain.js
index e8ae3ff10034..b66fdc3f3a09 100644
--- a/lib/domain.js
+++ b/lib/domain.js
@@ -35,6 +35,9 @@ const {
 } = require('internal/errors').codes;
 const { createHook } = require('async_hooks');

+const kWeak = Symbol('kWeak');
+const { WeakReference } = internalBinding('util');
+
 // overwrite process.domain with a getter/setter that will allow for more
 // effective optimizations
 var _domain = [null];
@@ -53,7 +56,7 @@ const asyncHook = createHook({
   init(asyncId, type, triggerAsyncId, resource) {
     if (process.domain !== null && process.domain !== undefined) {
       // if this operation is created while in a domain, let's mark it
-      pairing.set(asyncId, process.domain);
+      pairing.set(asyncId, process.domain[kWeak]);
       resource.domain = process.domain;
       if (resource.promise !== undefined &&
           resource.promise instanceof Promise) {
@@ -67,13 +70,13 @@ const asyncHook = createHook({
   before(asyncId) {
     const current = pairing.get(asyncId);
     if (current !== undefined) { // enter domain for this cb
-      current.enter();
+      current.get().enter();
     }
   },
   after(asyncId) {
     const current = pairing.get(asyncId);
     if (current !== undefined) { // exit domain for this cb
-      current.exit();
+      current.get().exit();
     }
   },
   destroy(asyncId) {
@@ -174,6 +177,7 @@ class Domain extends EventEmitter {
     super();

     this.members = [];
+    this[kWeak] = new WeakReference(this);
     asyncHook.enable();

     this.on('removeListener', updateExceptionCapture);
diff --git a/src/node_util.cc b/src/node_util.cc
index 5adecf4d9753..c41162dc9a81 100644
--- a/src/node_util.cc
+++ b/src/node_util.cc
@@ -1,5 +1,6 @@
 #include "node_internals.h"
 #include "node_watchdog.h"
+#include "base_object-inl.h"

 namespace node {
 namespace util {
@@ -8,7 +9,9 @@ using v8::ALL_PROPERTIES;
 using v8::Array;
 using v8::Context;
 using v8::FunctionCallbackInfo;
+using v8::FunctionTemplate;
 using v8::Integer;
+using v8::Isolate;
 using v8::Local;
 using v8::Object;
 using v8::ONLY_CONFIGURABLE;
@@ -172,6 +175,36 @@ void SafeGetenv(const FunctionCallbackInfo<Value>& args) {
             v8::NewStringType::kNormal).ToLocalChecked());
 }

+class WeakReference : public BaseObject {
+ public:
+  WeakReference(Environment* env, Local<Object> object, Local<Object> target)
+    : BaseObject(env, object) {
+    MakeWeak();
+    target_.Reset(env->isolate(), target);
+    target_.SetWeak();
+  }
+
+  static void New(const FunctionCallbackInfo<Value>& args) {
+    Environment* env = Environment::GetCurrent(args);
+    CHECK(args.IsConstructCall());
+    CHECK(args[0]->IsObject());
+    new WeakReference(env, args.This(), args[0].As<Object>());
+  }
+
+  static void Get(const FunctionCallbackInfo<Value>& args) {
+    WeakReference* weak_ref = Unwrap<WeakReference>(args.Holder());
+    Isolate* isolate = args.GetIsolate();
+    if (!weak_ref->target_.IsEmpty())
+      args.GetReturnValue().Set(weak_ref->target_.Get(isolate));
+  }
+
+  SET_MEMORY_INFO_NAME(WeakReference)
+  SET_SELF_SIZE(WeakReference)
+  SET_NO_MEMORY_INFO()
+ private:
+  Persistent<Object> target_;
+};
+
 void Initialize(Local<Object> target,
                 Local<Value> unused,
                 Local<Context> context) {
@@ -219,6 +252,16 @@ void Initialize(Local<Object> target,

   env->SetMethod(target, "safeGetenv", SafeGetenv);

+  Local<String> weak_ref_string =
+      FIXED_ONE_BYTE_STRING(env->isolate(), "WeakReference");
+  Local<FunctionTemplate> weak_ref =
+      env->NewFunctionTemplate(WeakReference::New);
+  weak_ref->InstanceTemplate()->SetInternalFieldCount(1);
+  weak_ref->SetClassName(weak_ref_string);
+  env->SetProtoMethod(weak_ref, "get", WeakReference::Get);
+  target->Set(context, weak_ref_string,
+              weak_ref->GetFunction(context).ToLocalChecked()).FromJust();
+
   Local<Object> constants = Object::New(env->isolate());
   NODE_DEFINE_CONSTANT(constants, ALL_PROPERTIES);
   NODE_DEFINE_CONSTANT(constants, ONLY_WRITABLE);

addaleax on 25 Oct 2018

fyi @nodejs/diagnostics

addaleax on 25 Oct 2018

Deleting from the map on promiseResolve seems to help, at least for this specific testcase.
Do we need to retain it after promiseResolve has been fired once?

I.e.:

index 0caeb624b4..af7db89b34 100644
--- a/lib/domain.js
+++ b/lib/domain.js
@@ -71,6 +71,9 @@ const asyncHook = createHook({
   },
   destroy(asyncId) {
     pairing.delete(asyncId); // cleaning up
+  },
+  promiseResolve(asyncId) {
+    pairing.delete(asyncId); // cleaning up
   }
 });

ChALkeR on 25 Oct 2018

@chalker I don’t think this is specific to promises :/

And no, I don’t think we can do something inside promiseResolve…

const async_hooks = require('async_hooks');
async_hooks.createHook({
  init(...args) { process._rawDebug('init', args) },
  before(id) { process._rawDebug('before', id) },
  after(id) { process._rawDebug('after', id) },
  destroy(id) { process._rawDebug('destroy', id) },
  promiseResolve(id) { process._rawDebug('promiseResolve', id) },
}).enable();

Promise.resolve(42).then(() => { process._rawDebug('inside then') });

prints

init [ 5, 'PROMISE', 1, PromiseWrap { isChainedPromise: false } ]
promiseResolve 5
init [ 6, 'PROMISE', 5, PromiseWrap { isChainedPromise: true } ]
before 6
inside then
promiseResolve 6
after 6

addaleax on 25 Oct 2018

@addaleax Oh, I missed that (re: call order), thanks.
Is after always called no later than in the same tick as promiseResolve? If yes, we could just delay.

diff --git a/lib/domain.js b/lib/domain.js
index 0caeb624b4..8070610235 100644
--- a/lib/domain.js
+++ b/lib/domain.js
@@ -71,6 +71,11 @@ const asyncHook = createHook({
   },
   destroy(asyncId) {
     pairing.delete(asyncId); // cleaning up
+  },
+  promiseResolve(asyncId) {
+    process.nextTick(() => {
+      pairing.delete(asyncId);
+    });
   }
 });

Re: not only to promises: can I have a testcase for that? Something timer-related? I might be missing something.

Upd:

const async_hooks = require('async_hooks');
const end = new Set();
async_hooks.createHook({
  init(...args) { process._rawDebug('init', args.join('|')) },
  before(id) { process._rawDebug('before', id) },
  after(id) {
    process._rawDebug('after', id);
    if (end.has(id)) {
      process._rawDebug('wrong execution order')
      process.exit(1)
    }
  },
  destroy(id) { process._rawDebug('destroy', id) },
  promiseResolve(id) {
    process._rawDebug('promiseResolve', id);
    process.nextTick(() => {
      end.add(id);
      process._rawDebug('promiseResolve (delayed)', id)
    })
  },
}).enable();

Not sure if that's reliable though, need to look inside async_hooks.

ChALkeR on 25 Oct 2018

@chalker I think any GC-based resource would have the same issue – I haven’t tried, but an async_hooks.AsyncResource with default options should do the trick just as well?

addaleax on 25 Oct 2018

I am not sure why I missed that issue. Sorry.

@addaleax when you suggest async_hooks.AsyncResource is that the same as @mcollina 's currentAsyncResource()?

vdeturckheim on 5 Nov 2018

@vdeturckheim I’m not sure I understand the question – the two concepts are very, very different; one is a JS class to create custom async resource types, the other is a function that gives you the currently active resource.

addaleax on 6 Nov 2018

@rgrove For the record, I've also encountered the same issue this week. Ended up with the same conclusions. To contain the leak, I'm cleaning the active domain (Sentry is the only one using domains in my stack) calling this function at the end of my request handlers:

import domain from 'domain'

export default function domainCleanup() {
  const activeDomain = domain.active
  if (activeDomain) {
    const members = activeDomain.members.slice()
    members.forEach(member => activeDomain.remove(member))
    const eventNames = activeDomain.eventNames()
    eventNames.forEach(eventName => activeDomain.removeAllListeners(eventName))
    activeDomain.__SENTRY__ = null
    delete activeDomain.__SENTRY__
  }
}

@addaleax The thing is that, using the profiler, I can still see the pairing map referencing each empty domain. I thought that by clearing the members, the request would have been garbaged (which is the case), leading the attached promised to be also collected and, ultimately, clearing the map entry. But that is not what I've observed. Didn't look at lib/domain.js though...

salper on 15 Nov 2018

👍1

Ok, so looking at lib/domain.js, it seems that if any asynchronous resources is not garbaged, it keeps a reference to the domain through the pairing global. So I need to find the resource in question. Using an asyncId, is there a way find an asynchronous resource ?

salper on 15 Nov 2018

Despite what the profiler said, the memory curve stayed flat for 10 hours, so something must collect those domains.

salper on 16 Nov 2018

@addaleax @ChALkeR @mcollina @vdeturckheim Is someone looking into a fix for this? (If not, should we label help wanted? Maybe at a minimum we can get a known_issues test together for this?)

Trott on 18 Nov 2018

@mmarchini is taking up the fix I started around May and trying to taking over the finishing line. This is going to take a while unfortunately. I'm +1 in getting a failing test in known_issues in the meanwhile.

mcollina on 19 Nov 2018

@mcollina Are you talking about currentResource()? If so, yes, that is going take time, and I’d go look into my patch from above again?

addaleax on 20 Nov 2018

@addaleax +1 on that.

mcollina on 20 Nov 2018

Based on my tests, you don't even need to attach request object manually to the domain. Just create a promise inside run. Minimal repro case:

const domain = require("domain");
const app = require("express")();

app.use((req, res, next) => {
  let requestDomain = domain.create();
  requestDomain.on("error", next);
  requestDomain.run(next);
});

app.get("/", (req, res) => {
  req.aPromise = new Promise(resolve => resolve("A".repeat(10 * 1024 * 1024)));
  res.send("hello world");
});

app.listen(5000);

Also not sure why it happens, but requestDomain.on("error", next); line is required in order to trigger this leak. Without it, everything is collected just fine.