The following code, which is a reduced test case using functionality that's relied upon by the popular Sentry error reporting module, works fine in Node 8.12.0 but consistently leaks memory on each request in Node 10.12.0, 11.0.0, and 9.11.2:
'use strict';
const domain = require('domain');
const app = require('express')();
app.use((req, res, next) => {
let requestDomain = domain.create();
requestDomain.add(req);
requestDomain.on('error', next);
requestDomain.run(() => {
next();
});
});
app.get('/', (req, res) => {
req.cachedPromise = new Promise(resolve => {
let tenMebibyteString = 'a'.repeat(10 * 1024 * 1024);
resolve(tenMebibyteString);
});
res.send('hi');
});
app.listen(5000);
I would expect the req object and its req.cachedPromise property to be GCed once the response is finished, since no references to it are being held. This is what happens in Node 8. But in Nodes 9, 10, and 11, req.cachedPromise never seems to be GCed, causing a steady memory leak.
I first noticed this issue when I tried to upgrade a production service using Sentry to Node 10. It likely affects anyone who uses Sentry with Express and caches promises on the request object (or on any descendant of the request object).
The relevant code in Sentry is here: https://github.com/getsentry/sentry-javascript/blob/839326fbb498c669bd8b9c38bd93d39ca15b6266/packages/node/src/handlers.ts#L220-L229
/cc @mcollina @addaleax ...
I'm thinking that there's possibly a leak in the revamped domain internals that's using async_hooks?
Ouch. I think the cause here is a circular-ish reference chain…
IncomingMessage instance (req) has a reference to the cachedPromisecachedPromise keeps the requestDomain alive through the global pairing map (async ID → domain)requestDomain has a members listmembers list contains IncomingMessage because it was added through .add().A regular circular reference would be no problem for GC, but the fact that we have this global pairing map makes it impossible for GC to tell that there is a kind of “if X is inaccessible, then Y is going to be purged from the map, becoming inaccessible itself” logic here.
@addaleax
These are all the occurrences of pairing that I was able to locate:
lib/domain.js:51:const pairing = new Map();
lib/domain.js:56: pairing.set(asyncId, process.domain);
lib/domain.js:61: const current = pairing.get(asyncId);
lib/domain.js:67: const current = pairing.get(asyncId);
lib/domain.js:73: pairing.delete(asyncId); // cleaning up
There seems to be no iteration.
Would just replacing it with WeakMap help, if that's the only place where those refs are retained?
@ChALkeR Yes, that’s the map – but it’s… difficult. WeakMap doesn’t work because we use integers as keys – ironically, partly with the intention of avoiding issues with GC…
I don’t have a good idea of how this could be addressed at the moment.
This is a fundemental design flaw of async_hooks. We use numbers as indexes (so no weakmaps) on a global map to allocate/track state and we deallocate it only when objects are destroyed. In several cases, this are tied to when the garbage collector actually cleans up the object, making it unpredictable.
In the past, I proposed to introduce a currentAsyncResource() API to be able to store state on the resource itself, removing the need for the global map. I’ll resume that work asap.
So… one thing we could do is to turn the values in the map into weak references… that’s icky, but it solves this particular problem.
It would mean that we rely on the resource.domain = process.domain line in the init hook to keep the domain alive for the lifetime of the resource (which seems like a reasonable assumption?), and it would mean going back to relying on an internal utility, unless we want to expose a public API à la https://www.npmjs.com/package/weak or https://www.npmjs.com/package/weak-napi…
Example diff for v10.x in the fold
diff --git a/lib/domain.js b/lib/domain.js
index e8ae3ff10034..b66fdc3f3a09 100644
--- a/lib/domain.js
+++ b/lib/domain.js
@@ -35,6 +35,9 @@ const {
} = require('internal/errors').codes;
const { createHook } = require('async_hooks');
+const kWeak = Symbol('kWeak');
+const { WeakReference } = internalBinding('util');
+
// overwrite process.domain with a getter/setter that will allow for more
// effective optimizations
var _domain = [null];
@@ -53,7 +56,7 @@ const asyncHook = createHook({
init(asyncId, type, triggerAsyncId, resource) {
if (process.domain !== null && process.domain !== undefined) {
// if this operation is created while in a domain, let's mark it
- pairing.set(asyncId, process.domain);
+ pairing.set(asyncId, process.domain[kWeak]);
resource.domain = process.domain;
if (resource.promise !== undefined &&
resource.promise instanceof Promise) {
@@ -67,13 +70,13 @@ const asyncHook = createHook({
before(asyncId) {
const current = pairing.get(asyncId);
if (current !== undefined) { // enter domain for this cb
- current.enter();
+ current.get().enter();
}
},
after(asyncId) {
const current = pairing.get(asyncId);
if (current !== undefined) { // exit domain for this cb
- current.exit();
+ current.get().exit();
}
},
destroy(asyncId) {
@@ -174,6 +177,7 @@ class Domain extends EventEmitter {
super();
this.members = [];
+ this[kWeak] = new WeakReference(this);
asyncHook.enable();
this.on('removeListener', updateExceptionCapture);
diff --git a/src/node_util.cc b/src/node_util.cc
index 5adecf4d9753..c41162dc9a81 100644
--- a/src/node_util.cc
+++ b/src/node_util.cc
@@ -1,5 +1,6 @@
#include "node_internals.h"
#include "node_watchdog.h"
+#include "base_object-inl.h"
namespace node {
namespace util {
@@ -8,7 +9,9 @@ using v8::ALL_PROPERTIES;
using v8::Array;
using v8::Context;
using v8::FunctionCallbackInfo;
+using v8::FunctionTemplate;
using v8::Integer;
+using v8::Isolate;
using v8::Local;
using v8::Object;
using v8::ONLY_CONFIGURABLE;
@@ -172,6 +175,36 @@ void SafeGetenv(const FunctionCallbackInfo<Value>& args) {
v8::NewStringType::kNormal).ToLocalChecked());
}
+class WeakReference : public BaseObject {
+ public:
+ WeakReference(Environment* env, Local<Object> object, Local<Object> target)
+ : BaseObject(env, object) {
+ MakeWeak();
+ target_.Reset(env->isolate(), target);
+ target_.SetWeak();
+ }
+
+ static void New(const FunctionCallbackInfo<Value>& args) {
+ Environment* env = Environment::GetCurrent(args);
+ CHECK(args.IsConstructCall());
+ CHECK(args[0]->IsObject());
+ new WeakReference(env, args.This(), args[0].As<Object>());
+ }
+
+ static void Get(const FunctionCallbackInfo<Value>& args) {
+ WeakReference* weak_ref = Unwrap<WeakReference>(args.Holder());
+ Isolate* isolate = args.GetIsolate();
+ if (!weak_ref->target_.IsEmpty())
+ args.GetReturnValue().Set(weak_ref->target_.Get(isolate));
+ }
+
+ SET_MEMORY_INFO_NAME(WeakReference)
+ SET_SELF_SIZE(WeakReference)
+ SET_NO_MEMORY_INFO()
+ private:
+ Persistent<Object> target_;
+};
+
void Initialize(Local<Object> target,
Local<Value> unused,
Local<Context> context) {
@@ -219,6 +252,16 @@ void Initialize(Local<Object> target,
env->SetMethod(target, "safeGetenv", SafeGetenv);
+ Local<String> weak_ref_string =
+ FIXED_ONE_BYTE_STRING(env->isolate(), "WeakReference");
+ Local<FunctionTemplate> weak_ref =
+ env->NewFunctionTemplate(WeakReference::New);
+ weak_ref->InstanceTemplate()->SetInternalFieldCount(1);
+ weak_ref->SetClassName(weak_ref_string);
+ env->SetProtoMethod(weak_ref, "get", WeakReference::Get);
+ target->Set(context, weak_ref_string,
+ weak_ref->GetFunction(context).ToLocalChecked()).FromJust();
+
Local<Object> constants = Object::New(env->isolate());
NODE_DEFINE_CONSTANT(constants, ALL_PROPERTIES);
NODE_DEFINE_CONSTANT(constants, ONLY_WRITABLE);
fyi @nodejs/diagnostics
Deleting from the map on promiseResolve seems to help, at least for this specific testcase.
Do we need to retain it after promiseResolve has been fired once?
I.e.:
index 0caeb624b4..af7db89b34 100644
--- a/lib/domain.js
+++ b/lib/domain.js
@@ -71,6 +71,9 @@ const asyncHook = createHook({
},
destroy(asyncId) {
pairing.delete(asyncId); // cleaning up
+ },
+ promiseResolve(asyncId) {
+ pairing.delete(asyncId); // cleaning up
}
});
@chalker I don’t think this is specific to promises :/
And no, I don’t think we can do something inside promiseResolve…
const async_hooks = require('async_hooks');
async_hooks.createHook({
init(...args) { process._rawDebug('init', args) },
before(id) { process._rawDebug('before', id) },
after(id) { process._rawDebug('after', id) },
destroy(id) { process._rawDebug('destroy', id) },
promiseResolve(id) { process._rawDebug('promiseResolve', id) },
}).enable();
Promise.resolve(42).then(() => { process._rawDebug('inside then') });
prints
init [ 5, 'PROMISE', 1, PromiseWrap { isChainedPromise: false } ]
promiseResolve 5
init [ 6, 'PROMISE', 5, PromiseWrap { isChainedPromise: true } ]
before 6
inside then
promiseResolve 6
after 6
@addaleax Oh, I missed that (re: call order), thanks.
Is after always called no later than in the same tick as promiseResolve? If yes, we could just delay.
diff --git a/lib/domain.js b/lib/domain.js
index 0caeb624b4..8070610235 100644
--- a/lib/domain.js
+++ b/lib/domain.js
@@ -71,6 +71,11 @@ const asyncHook = createHook({
},
destroy(asyncId) {
pairing.delete(asyncId); // cleaning up
+ },
+ promiseResolve(asyncId) {
+ process.nextTick(() => {
+ pairing.delete(asyncId);
+ });
}
});
Re: not only to promises: can I have a testcase for that? Something timer-related? I might be missing something.
Upd:
const async_hooks = require('async_hooks');
const end = new Set();
async_hooks.createHook({
init(...args) { process._rawDebug('init', args.join('|')) },
before(id) { process._rawDebug('before', id) },
after(id) {
process._rawDebug('after', id);
if (end.has(id)) {
process._rawDebug('wrong execution order')
process.exit(1)
}
},
destroy(id) { process._rawDebug('destroy', id) },
promiseResolve(id) {
process._rawDebug('promiseResolve', id);
process.nextTick(() => {
end.add(id);
process._rawDebug('promiseResolve (delayed)', id)
})
},
}).enable();
Not sure if that's reliable though, need to look inside async_hooks.
@chalker I think any GC-based resource would have the same issue – I haven’t tried, but an async_hooks.AsyncResource with default options should do the trick just as well?
I am not sure why I missed that issue. Sorry.
@addaleax when you suggest async_hooks.AsyncResource is that the same as @mcollina 's currentAsyncResource()?
@vdeturckheim I’m not sure I understand the question – the two concepts are very, very different; one is a JS class to create custom async resource types, the other is a function that gives you the currently active resource.
@rgrove For the record, I've also encountered the same issue this week. Ended up with the same conclusions. To contain the leak, I'm cleaning the active domain (Sentry is the only one using domains in my stack) calling this function at the end of my request handlers:
import domain from 'domain'
export default function domainCleanup() {
const activeDomain = domain.active
if (activeDomain) {
const members = activeDomain.members.slice()
members.forEach(member => activeDomain.remove(member))
const eventNames = activeDomain.eventNames()
eventNames.forEach(eventName => activeDomain.removeAllListeners(eventName))
activeDomain.__SENTRY__ = null
delete activeDomain.__SENTRY__
}
}
@addaleax The thing is that, using the profiler, I can still see the pairing map referencing each empty domain. I thought that by clearing the members, the request would have been garbaged (which is the case), leading the attached promised to be also collected and, ultimately, clearing the map entry. But that is not what I've observed. Didn't look at lib/domain.js though...
Ok, so looking at lib/domain.js, it seems that if any asynchronous resources is not garbaged, it keeps a reference to the domain through the pairing global. So I need to find the resource in question. Using an asyncId, is there a way find an asynchronous resource ?
Despite what the profiler said, the memory curve stayed flat for 10 hours, so something must collect those domains.
@addaleax @ChALkeR @mcollina @vdeturckheim Is someone looking into a fix for this? (If not, should we label help wanted? Maybe at a minimum we can get a known_issues test together for this?)
@mmarchini is taking up the fix I started around May and trying to taking over the finishing line. This is going to take a while unfortunately. I'm +1 in getting a failing test in known_issues in the meanwhile.
@mcollina Are you talking about currentResource()? If so, yes, that is going take time, and I’d go look into my patch from above again?
@addaleax +1 on that.
Based on my tests, you don't even need to attach request object manually to the domain. Just create a promise inside run. Minimal repro case:
const domain = require("domain");
const app = require("express")();
app.use((req, res, next) => {
let requestDomain = domain.create();
requestDomain.on("error", next);
requestDomain.run(next);
});
app.get("/", (req, res) => {
req.aPromise = new Promise(resolve => resolve("A".repeat(10 * 1024 * 1024)));
res.send("hello world");
});
app.listen(5000);
Also not sure why it happens, but requestDomain.on("error", next); line is required in order to trigger this leak. Without it, everything is collected just fine.
I’ve opened a PR with the solution from above in https://github.com/nodejs/node/pull/25993 … still not a great solution, though.
Most helpful comment
I’ve opened a PR with the solution from above in https://github.com/nodejs/node/pull/25993 … still not a great solution, though.