Given the recent PR ( #5020 ) I wanted to see if promises could enable core dump based debugging.
It looks like currently when an unhandledRejection occurs the stack is gone. This is very similar to how the stack is gone when an uncaughtException occurs.
We've added --abort-on-uncaught-exception flag to allow us to abort the process with the correct stack in C++. We need to implement --abort-on-unhandled-rejection flag as well.
See foo.js example
process.on('unhandledRejection', function onError(err) {
throw err;
});
function foo() {
new Promise(function reject(re, rj) { rj(new Error('sad')) });
}
foo();
raynos at raynos-SVS15127PXB ~
$ node foo.js
/home/raynos/foo.js:2
throw err;
^
Error: sad
at reject (/home/raynos/foo.js:6:46)
at foo (/home/raynos/foo.js:6:5)
at Object.<anonymous> (/home/raynos/foo.js:8:1)
at Module._compile (module.js:435:26)
at Object.Module._extensions..js (module.js:442:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:313:12)
at Function.Module.runMain (module.js:467:10)
at startup (node.js:136:18)
at node.js:963:3
Note the error has the correct stacktrace including foo and reject
raynos at raynos-SVS15127PXB ~
$ node --abort-on-uncaught-exception foo.js
Uncaught Error: sad
FROM
process.onError (/home/raynos/foo.js:2:6)
emitTwo (events.js:87:13)
process.emit (events.js:172:7)
emitPendingUnhandledRejections (node.js:500:24)
runMicrotasksCallback (node.js:329:11)
doNTCallback0 (node.js:419:9)
process._tickCallback (node.js:348:13)
Function.Module.runMain (module.js:469:11)
startup (node.js:136:18)
node.js:963:3
Illegal instruction
Note that the actual stack of the program when in the unhandledRejection listener is wrong. This means we cannot just process.abort() in there since that's absolutely useless, I want to stack that mattered.
I considered this a MUSTFIX before we land promises into core, if we cannot debug promises then there is not point in using it. The only good feature nodejs has is extreme debuggability.
cc: @chrisdickinson @misterdjules
isolate->SetPromiseRejectCallback(reinterpret_cast<v8::PromiseRejectCallback>(abort)) is theoretically all you need. The call chain leading up to abort() is the stack trace you're looking for, if I understand you right.
Correct. The stack leading upto abort() should include the line on which the error was thrown.
See below an example of errors and --abort-on-uncaught-exception notice, how the abort stack contains foo function.
raynos at raynos-SVS15127PXB ~
$ node foo.js
/home/raynos/foo.js:6
throw new Error('sad');
^
Error: sad
at foo (/home/raynos/foo.js:6:11)
at Object.<anonymous> (/home/raynos/foo.js:8:1)
at Module._compile (module.js:435:26)
at Object.Module._extensions..js (module.js:442:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:313:12)
at Function.Module.runMain (module.js:467:10)
at startup (node.js:136:18)
at node.js:963:3
raynos at raynos-SVS15127PXB ~
$ node --abort-on-uncaught-exception foo.js
Uncaught Error: sad
FROM
foo (/home/raynos/foo.js:6:5)
Object.<anonymous> (/home/raynos/foo.js:8:1)
Module._compile (module.js:435:26)
Object.Module._extensions..js (module.js:442:10)
Module.load (module.js:356:32)
Function.Module._load (module.js:313:12)
Function.Module.runMain (module.js:467:10)
startup (node.js:136:18)
node.js:963:3
Illegal instruction
raynos at raynos-SVS15127PXB ~
$ cat foo.js
process.on('unhandledException', function onError(err) {
throw err;
});
function foo() {
throw new Error('sad');
}
foo();
Thank you @Raynos for opening this issue! My plan was to give a shot at implementing that to validate/invalidate the concerns I expressed earlier. It shouldn't be too long.
Just to make sure we're on the same page, and that I'm not missing anything, here's the tiny change that I made to node's master:
diff --git a/src/node.cc b/src/node.cc
index ab58627..a91ece3 100644
--- a/src/node.cc
+++ b/src/node.cc
@@ -1011,6 +1011,9 @@ static bool ShouldAbortOnUncaughtException(Isolate* isolate) {
return isEmittingTopLevelDomainError || !DomainsStackHasErrorHandler(env);
}
+static void AbortOnPromiseRejection(PromiseRejectMessage message) {
+ abort();
+}
void SetupDomainUse(const FunctionCallbackInfo<Value>& args) {
Environment* env = Environment::GetCurrent(args);
@@ -1117,7 +1120,6 @@ void SetupPromises(const FunctionCallbackInfo<Value>& args) {
CHECK(args[0]->IsFunction());
- isolate->SetPromiseRejectCallback(PromiseRejectCallback);
env->set_promise_reject_function(args[0].As<Function>());
env->process_object()->Delete(
@@ -4136,6 +4138,8 @@ static void StartNodeInstance(void* arg) {
isolate->SetAbortOnUncaughtExceptionCallback(
ShouldAbortOnUncaughtException);
+ isolate->SetPromiseRejectCallback(
+ AbortOnPromiseRejection);
// Start debug agent when argv has --debug
if (instance_data->use_debug_agent())
and here's what I ran:
[jgilli@dev ~/node]$ cat /tmp/test-promise-throw.js
function bar() {
console.log('boom!');
throw new Error('boom');
}
function foo() {
bar();
}
function boom() {
return new Promise(function (fulfill, reject) {
foo();
});
}
boom();
[jgilli@dev ~/node]$ ./node /tmp/test-promise-throw.js
boom!
Abort (core dumped)
[jgilli@dev ~/node]$
and finally here's the callstack that I get from the core file that was just generated:
[root@00-0c-29-44-bf-9f ~]# mdb /zones/e1a453d7-9b7b-4721-aea6-791d38603cee/cores/core.node.38463
Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
> ::load /root/mdb_v8_amd64.so
mdb_v8 version: 1.1.1 (release, from 28cedf2)
V8 version: 4.8.271.17
Autoconfigured V8 support from target
C++ symbol demangling enabled
> ::jsstack
native: libc.so.1`_lwp_kill+0xa
native: libc.so.1`raise+0x20
native: libc.so.1`abort+0x98
(1 internal frame elided)
native: v8::internal::Isolate::ReportPromiseReject+0x59
native: v8::internal::Runtime_PromiseRejectEvent+0x110
(1 internal frame elided)
js: rejectPromise
js: Promise
(1 internal frame elided)
js: boom
js: <anonymous> (as <anon>)
(1 internal frame elided)
js: <anonymous> (as Module._compile)
js: <anonymous> (as Module._extensions..js)
js: <anonymous> (as Module.load)
js: <anonymous> (as Module._load)
js: <anonymous> (as Module.runMain)
js: startup
js: <anonymous> (as <anon>)
(1 internal frame elided)
(1 internal frame elided)
native: v8::internal::_GLOBAL__N_1::Invoke+0xb3
native: v8::internal::Execution::Call+0x62
native: v8::Function::Call+0xf6
native: v8::Function::Call+0x41
native: node::LoadEnvironment+0x1e8
native: node::Start+0x518
native: _start+0x6c
>
The problem as I see it is that foo and bar should be in that call stack for post-mortem debugging to be useful.
Correct, foo and bar need to be in that call stack.
The reason they need to be in the call stack is far more evident if we use ::jsstack -v
Seeing the arguments to functions on the stack is absolutely critical to debugging what is going on in bar and foo.
Take for example the classic recursion problem, your trying to debug an infinite recursion exception about "Maximum call stack" you really want to see the functions that are being called recursively. Seeing the nearest promise that caught the recursion gives you less information then preferable.
In fact seeing the stack to the nearest promise is probably as good as looking at the err.stack string in terms of debugging ability.
Thanks for doing this!
This touches on something interesting. You're instrumenting the executor, which runs before new Promise is resolved and returned, and catches any thrown exception and turns it into a rejection โ essentially Promise runs function Promise(executor) { try { executor(resolve, reject) } catch (err) { reject(err) }. While I don't think that's going to change (I could be wrong, it might be something to bring up with the V8 team), it's not quite the same thing as a rejection that happens explicitly, or one that happens as the result of a throw in a success handler. The latter is probably the most important to test, since it's the case that stands to become the most common should the promises API land in core.
For reference, that would look something like this:
'use strict'
const fs = require('fs')
function bar () {
throw new Error('unhandled rejection');
}
function foo () {
return bar();
}
const p = new Promise(function (resolve, reject) {
fs.readFile(__filename, (err, data) => { err ? reject(err) : resolve(data) })
}).then(() => {
foo()
});
I am curious: do you think the overlap between the post mortem debugging and promise-using audiences is such that there are a lot of folks that desire this debugging to be perfect? Or can the promise-using audience rely on advances in the Chrome dev tools, while non-promise-users rely on lldb and mdb + core dumps?
@chrisdickinson in terms of overlap and usages of nodejs in companies:
do you think the overlap between the post mortem debugging and promise-using audiences is such that there are a lot of folks that desire this debugging to be perfect
I think the overlap probably exists in enterprises.
Take for example Uber. We probably have about ~5 nodejs engineers that use mdb and we have ~200 nodejs engineers in total. For the five of us to be able to use mdb we have to tell 200 engineers that do not care about mdb that they cannot use promises.
This definitely shows itself when you have 1 or 2 people writing for example, an "rpc" library or a high performance "json api" library that want to be able to do post-mortem debugging on their library which is embedded in >50 services and touched by >200 engineers.
This becomes especially difficult if an application developer uses a database ORM library that has promises and then calls response.send(obj) inside a then() block and complains to a library engineer that implemented response about a bug or a crash.
Our current response would be to try and debug their issue and then tell them that there is not enough information and we have to add a change their application to use libraries and techniques that enable post mortem debugging.
This particular dynamic of application developers and library authors wrt Promises has not happened at Uber yet, but it will be far more likely to happen if promises are in core.
You're instrumenting the executor
I thought this was what the promisify function is currently doing in #5020. So for instance, if fs.readFile throws on an incorrect argument, my understanding is that the current implementation of promisify will catch that error (and thus unwind the stack) and then turn it into an operational error. So it seems to me it's a use case worth considering.
it's not quite the same thing as a rejection that happens explicitly, or one that happens as the result of a throw in a success handler. The latter is probably the most important to test, since it's the case that stands to become the most common should the promises API land in core.
I would think it's another data point and a great test to add to document and try to improve the post-mortem debugging support for promises. EDIT: since it suffers from similar issues as the previous test case.
I am curious: do you think the overlap between the post mortem debugging and promise-using audiences is such that there are a lot of folks that desire this debugging to be perfect?
I don't think it's about "perfect" debugging, it's about being able to debug anything at all using post-mortem debugging. As a lot of applications are built on top of third-party npm modules, it seems very likely (if not certain) that users of post-mortem debuggers _will_ need to debug applications that use the promises-based APIs of node core using post-mortem debugging tools, or they'll need to make code changes to these modules. Making that more likely to happen _before_ thinking about solutions to the issues mentioned in this thread would seem to be going against the work done by the @nodejs/post-mortem WG.
Or can the promise-using audience rely on advances in the Chrome dev tools
What do Chrome dev tools provide in terms of debugging promises? That sounds interesting!
while non-promise-users rely on lldb and mdb + core dumps?
I don't think post-mortem debugging users will have the luxury of not using any module making use of the new promises-based node core APIs.
I don't think post-mortem debugging users will have the luxury of not using any module making use of the new promises-based node core APIs.
Even if post mortem debugging users have that luxury and control over their dependency tree (some of us do). The moment a caller wraps the callsite entry point into their code with a try/catch it hinders the ability to debug any bugs that occur in production using post mortem techniques.
Sincere thanks to you all for taking the time to dig into this. Promises in core with good dumps would of course be incredible.
promisify will catch that error (and thus unwind the stack)
What do you mean by this? Throwing/catching doesn't affect an error's .stack. It's created when you do new Error() (which does Error.captureStackTrace) and can only changed by the user explicitly after that.
@petkaantonov "Unwinding the stack" is a different case than the captured stack trace โ it means that when the process aborts, the abort call is at the top of stack so that the resulting core file can be inspected to find the original stack leading to the exception.
Oh I see. Btw the PromiseRejectMessage used in unhandled promise rejection has a stack_trace_ property (GetStackTrace()) that we are not using. That's probably why the stack trace reported is wrong/not as helpful as possible.
We need to implement
--abort-on-unhandled-rejectionflag as well.
This flag doesn't make any sense with code you don't control. One needs to use a workaround for certain cases to avoid false positives, that cannot be expected from "low quality" npm modules.
I think it's not possible to implement this. Thrown errors are caught no matter what. Explicitly rejecting should work, but it would result in false positives. For example Promise.reject().catch() would abort instantly
@chrisdickinson
the abort call is at the top of stack so that the resulting core file can be inspected to find the original stack leading to the exception.
The abort() call (or whatever we use to make the process dump core) will always be at the top of the stack, or near the top, because that's what makes a process dump core. What we want to make sure of is that _the function that threw the uncaught error_ is at the top of the stack.
@petkaantonov
Oh I see. Btw the PromiseRejectMessage used in unhandled promise rejection has a stack_trace_ property (GetStackTrace()) that we are not using. That's probably why the stack trace reported is wrong/not as helpful as possible.
The first message in this issue illustrates that one cannot use a process' unhandledRejection event listener to create core dumps that are usable for post-mortem debugging, not that the stack trace is incorrectly reported. The stack trace is correctly reported, it's just that at the time the process' unhandledRejection event listener run, the information that would allow one to use a post-mortem debugger to root cause a problem, such as which functions were on the stack and their parameters, is gone.
@petkaantonov
We need to implement --abort-on-unhandled-rejection flag as well.
This flag doesn't make any sense with code you don't control. One needs to use a workaround for certain cases to avoid false positives, that cannot be expected from "low quality" npm modules.
Could you please be more explicit? What "certain cases" do you have in mind? Is it the Promise.reject().catch() that @vkurchatkin mentioned?
If you don't immediately call .catch() on a promise, and it rejects before you do, then that rejection will be falsely reported as "unhandledRejection". That's why there is the event "rejectionHandled".
Right, this is impossible, it's not an artifact and it'll be the same way with any asynchronous error - those just get silenced with callbacks where promises provide a unhandledRejection hook to help spot the vast majority of them so you're not left in the dark.
Of course, you can access the stack inside the unhandledRejection event handler. You can just log the stack there and then throw.
@petkaantonov
If you don't immediately call .catch() on a promise, and it rejects before you do, then that rejection will be falsely reported as "unhandledRejection". That's why there is the event "rejectionHandled".
Thank you, that's helpful to know! I'll definitely add that to the elements to consider when writing the results of my research.
As @misterdjules notes, yes, let's keep this conversation polite & productive.
It looks like there are a couple misunderstandings on both sides, so I'd like to define some terms here:
unhandledRejection: This term refers to an event fired by Node on behalf of V8. Unlike uncaughtException, it does not represent an unrecoverable failure. A promise will trigger this event when it is rejected if there are no handlers for rejection. Other code may still have a reference to this promise when it becomes rejected, and may add handlers at some point later. At this point, the rejection becomes handled and the rejectionHandled event is fired. In a program using promises, it's an accepted practice to track a Set of rejected promises, noting when they resolved. Many promise users use this technique, other promise users prefer to avoid triggering unhandled rejection โ it's possible to use promises in both fashions, and there are trade offs and use cases for both approaches. This is why Node's default unhandledRejection handler does not terminate the program โ the user needs to tell Node which tradeoffs are more acceptable for their program.
"Unwinding the Stack": This is separate from a stack trace โ which gives an indication of "where" the error occurred, but less information about "how" an control passed to that point. When abort is called, the stack up until the abort call will be present in the resulting core dump file. Debuggers like lldb and mdb can inspect this stack, including being able to pull out the arguments that were passed to the frame โ they get more of an idea of "how" the error occurred after the fact than what a trace provides. When an exception is thrown, it unwinds the stack to the nearest catch clause. If an abort is called as a descendent stack frame from the catch clause, despite receiving _trace_ information, the argument information is lost.
I understand post-mortem debugging very well, and I care for its users, which is why I wanted to put some effort in understanding how the project could understand better how to use them together, and as a result help users of promises too.
Promise rejection in general case is not an indication of "mortem".
@vkurchatkin Continuing the discussion from https://github.com/nodejs/node/pull/5020#issuecomment-180563910:
The code sample you pasted uses try/catch explicitly, and thus doesn't abort.
No, it's code of the lib. User doesn't explicitly catch anything. So there is no guarantee that lib doesn't break post mortem even if it doesn't use promises
Right, the point is not necessarily that it's in a library or not, it's about differentiating programmer errors and operational errors. Users of post-mortem debugging handle programmer errors by generating a core file that can be inspected later to root-cause the problem.
The issue with promises is that so far and If I understand them correctly, programmer errors and operational errors are not handled differently, and so it's impossible to differentiate them. Thus it seems promises in their current implementation are not compatible with post-mortem debugging.
Adding a promises-based API in core means that core accepts and supports not being able to differentiate between the two types of errors, which I feel contradicts the work of the @nodejs/post-mortem WG and the current support for post-mortem debugging.
Similar (not identical) issues have been caused by the addition of domains in core. Luckily, some work has been done to mitigate these issues, maybe the same work can be done with promises at least to inform the project on whether or not we want to accept promises-based API in core?
@petkaantonov
Promise rejection in general case is not an indication of "mortem".
Right, and that's not what I suggested. See my response above to @vkurchatkin:
[Post-mortem debugging is] about differentiating programmer errors and operational errors. Users of post-mortem debugging handle programmer errors by generating a core file that can be inspected later to root-cause the problem.
The issue with promises is that so far and If I understand them correctly, programmer errors and operational errors are not handled differently, and so it's impossible to differentiate them.
The issue with promises is that so far and If I understand them correctly, programmer errors and operational errors are not handled differently, and so it's impossible to differentiate them.
You can do different kind of error handling depending on the context, the errors type and properties of the error.
Suppose the program is leaking database handles. You will get handle allocation error, in this case you would abort the process and restart, to free all the leaked handles..
Suppose you get TypeError or ReferenceError or such. There might not be any need to abort, you can drop whatever context was related to this (e.g. one request).
Suppose you get typical operational error like TimeoutError, in this case you would retry. Or if you have already done e.g. 5 retries, you would report it to the user.
@misterdjules:
[Post-mortem debugging is] about differentiating programmer errors and operational errors. Users of post-mortem debugging handle programmer errors by generating a core file that can be inspected later to root-cause the problem.
I sort of disagree with this definition: this describes an aspect of post-mortem debugging, but not post-mortem debugging as whole. I would broaden the definition โ post-mortem debugging is about having the necessary tools and information available to piece back the series of events leading to an unexpected state in the program after the program has terminated. Determining the provenance of an error is a task one should be able to perform, but it's not the whole of the thing.
post-mortem debugging is about having the necessary tools and information available to piece back the series of events leading to an unexpected state in the program after the program has terminated
But even currently you have no such thing since you only have the last event? In that case I'd argue the current promise setups in userland are already doing this much better with long stack traces that you can log in the unhandledRejection handler. This is currently unimplemented by Node.js for native promises, but it's in DevTools so I assume there is API for it.
@petkaantonov
Suppose the program is leaking database handles. You will get handle allocation error, in this case you would abort the process and restart, to free all the leaked handles..
The problem(s) I'm describing is that, when using promises, I haven't found a way so far to call abort() (or any other API that makes a process generate a core dump) in a way that generates a core dump that preserves the call stack _at the exact same state when the error occurred_.
@petkaantonov: I think this is hitting on the crux of the matter. Chrome Developer Tools have been adding very useful debugging tools for Promises. These include long stack traces, settling times, and automatic unhandledRejection/rejectionHandled tracking. I think this points at the need for collaboration between the dev tools team, post-mortem WG, and CTC (which may exist already, but I'm not sure โ @misterdjules could probably speak to that!)
@chrisdickinson
I sort of disagree with this definition
And you're absolutely correct, I wasn't trying to _define_ post-mortem debugging. As you said, I was describing one aspect of post-mortem debugging that seems to be conflicting with promises. My apologies for the misunderstanding, English is not my native language (as you probably already realized :) ).
The problem(s) I'm describing is that, when using promises, I haven't found a way so far to call abort() (or any other API that makes a process generate a core dump) in a way that generates a core dump that preserves the call stack _at the exact same state when the error occurred.
Is this about the C++ stack? For JS stack as I mentioned you can log the stack trace before aborting, and it will potentially not only be for the last event only but the whole chain of events.
@petkaantonov: The stack at process.abort() time will include C++ _and_ JS frames, plus all inspectable arguments to each frame. The trace will not contain this information โ it can only point to "where", now "how."
@misterdjules:
And you're absolutely correct, I wasn't trying to define post-mortem debugging. As you said, I was describing one aspect of post-mortem debugging that seems to be conflicting with promises. My apologies for the misunderstanding, English is not my native language (as you probably already realized :) ).
Ah! My apologies! Sorry for misunderstanding.
@chrisdickinson About arguments I see. It looks like there is no Error.prepareStackTrace anymore either. For C++ frames I imagine those are just noise to users or why would I care about them as application developer?
when using promises, I haven't found a way so far to call abort() (or any other API that makes a process generate a core dump) in a way that generates a core dump that preserves the call stack _at the exact same state when the error occurred.
You can't do that reliably with callbacks either
@misterdjules:
The problem(s) I'm describing is that, when using promises, I haven't found a way so far to call abort() (or any other API that makes a process generate a core dump) in a way that generates a core dump that preserves the call stack _at the exact same state when the error occurred.
This is a tradeoff of the primitive. If you picture the typical Node program as a directed graph, where graph nodes represent asynchronous operations and edges represent the transfer of control, callbacks and promises both provide a way to draw that graph. They make some different tradeoffs with regards to error propagation, though. Callbacks, in particular, make it possible to differentiate between propagating an error โ via cb(err), and panicking, that is, throw err. Panic semantics mean that every node of the async operation graph has an implicit edge to the "uncaught exception" handler. Additionally, propagation of an error can only advance one node away from the current node at a time, and each node must choose what to do with the error.
Promises zig where callbacks zag. In the graph they draw, there is no _implicit_ edge to the unhandledRejection handler โ those edges are _always_ explicit, in that you have to reject before adding a handler in order to reach that point, and once that edge is drawn, it will _always_ be traversed. Further unlike callbacks, operation nodes may "opt in" to error handling. The flip side is that a node may propagate an error along more than one edge at a time โ it flows until a node "pays attention" to errors. If there are multiple outgoing edges from a node, every edge will have to be handled, or else a connection is drawn to the unhandled rejection node. The implication that is most salient to this conversation, though, is that promises have no "panic" โ no way of breaking entirely out of the operation graph from any operation.
@vkurchatkin: mdb (and lldb-v8) can do some pretty amazing things with core dump information โ it's possible to inspect the error that crashed the program from inside a debugger shell running against such a core.
@petkaantonov: The C++ frames can be useful in some cases โ you can hop through those frames in the debugger to figure out specifics about how the event loop was entered, or how native modules may have manipulated data in frames between two JS frames.
I've tried the following in Chrome:
setTimeout(function() {
var obj = {};
Promise.resolve().then(function() {
var badProp = 'stuff' + Math.random();
obj[badProp].foo = 'bar';
}).then(() => {});
}, 1000);
If you turn Pause on exceptions on, it actually does the right thing and allows to inspect local vars (badProp). So I guess there could be a way to do this after all.
Yeah looks like SetDebugEventListener from here is an untapped gold mine https://github.com/nodejs/node/blob/master/deps/v8/include/v8-debug.h
@vkurchatkin: Are you familiar with mdb? If not โ it's pretty cool and worth a look. Its abilities appear to have directly informed the goals of the post-mortem WG. Perhaps @misterdjules has a link to a quick demo? If you're already familiar please ignore this comment!
@chrisdickinson it doesn't work on linux or OS X as far as I know
@vkurchatkin Yeah โ that's the difficult part. You can kind of get an idea for some of its potential through lldb-v8, but it might be better to watch a quick video on what it can do to get an idea of where @misterdjules (and the post-mortem WG) is coming from.
@chrisdickinson I tried it, but it seems out of date. https://github.com/indutny/llnode seems promising but requires compiling lldb. Anyway, I've seen a bunch of videos, so I know what this is about
@vkurchatkin Ah! My apologies. Please disregard!
@vkurchatkin
when using promises, I haven't found a way so far to call abort() (or any other API that makes a process generate a core dump) in a way that generates a core dump that preserves the call stack _at the exact same state when the error occurred.
You can't do that reliably with callbacks either
Yes, that's what's done in Isolate::Throw, and that's how --abort-on-uncaught-exception is implemented.
Yes, that's what's done in Isolate::Throw, and that's how --abort-on-uncaught-exception is implemented.
This doesn't work if someone catches and rethrows an error
You can handle the Exception DebugEvent and manually PrintStackTrace and abort() there
I don't think it's something that can be turned on in production. It looks like it debugger analyses every throw. If it happens in a context of a promise it tries to predict if it will be caught: https://github.com/nodejs/node/blob/master/deps/v8/src/debug/debug.cc#L1799
Yes, that's what's done in Isolate::Throw, and that's how --abort-on-uncaught-exception is implemented.
This doesn't work if someone catches and rethrows an error
Absolutely, which is why users of post-mortem debugging carefully use try/catch blocks and throw statements. The problem that we're describing here is that promises implicitly add that try/catch block, and thus don't give you a choice of what errors are caught or uncaught.
@misterdjules: How feasible do you think it would be for the V8 stack trace capture mechanism to additionally capture parameter values? If feasible, do you think that would make the case where the core contains an error object (but an undesirable stack state) more tenable?
Absolutely, which is why users of post-mortem debugging carefully use try/catch blocks and throw statements
If you need to carefully use try catch and throw anyway, then what is the point of this issue? You dont need to be particularly careful about not using promises.
@chrisdickson @mrdjules @vkurchatin
As an aside, Uber has a hacked up private fork of mdb that kind of works on Linux. Might be worth starting a conversation with post-mortem about Linux support.
@petkaantonov
Absolutely, which is why users of post-mortem debugging carefully use try/catch blocks and throw statements
If you need to carefully use try catch and throw anyway, then what is the point of this issue? You dont need to be particularly careful about not using promises.
Because so far a lot of node's core (domains is a counter example of that) has been written in a way that core APIs throw when an error cannot be dealt with (on a programmer error, on an 'error' event being emitted but no handler being set, etc.). Users of post-mortem debugging rely on that convention to make their process abort and generate a core dump when such an error is thrown, and they _can't handle it themselves with a try/catch block_.
Because promises add an implicit try/catch block around users' code, it seems that convention isn't respected anymore.
@Raynos
As an aside, Uber has a hacked up private fork of mdb that kind of works on Linux. Might be worth starting a conversation with post-mortem about Linux support.
That would be great! Would you mind opening an issue in nodejs/post-mortem?
Because promises add an implicit try/catch block around users' code, it seems that convention isn't respected anymore.
Yes but you need to use promises in order for this to happen. If you dont use promises then you are not affected. So non promise users who are not affected shouldnt block promise progress for thos who actually use them.
@petkaantonov
Because promises add an implicit try/catch block around users' code, it seems that convention isn't respected anymore.
Yes but you need to use promises in order for this to happen. If you dont use promises then you are not affected.
What if I want to use the new promises-based core APIs and post-mortem debugging? How do we communicate that the two are basically incompatible? Like I said before, domains have similar issues with post-mortem debugging. I've fixed some of these issues and had to explain inconsistencies to many users. I think having two incompatible tools in the core runtime brings a lot of confusion and in the long term doesn't serve users and the project. Does it hurt to try to find a way to make them work together? That's all I'm trying to do here.
And what if I can't avoid using the new promises-based core APIs? If core provides a promises-based API for most of its modules, then I would expect at least a significant part of the modules ecosystem to use them, otherwise, why bother doing it? A lot of node programs are built on top of small third-party modules, and post-mortem debugging users like myself unfortunately won't have the time to rewrite code that migrates to using core's promises based APIs.
Of course any of those third-party modules could today wrap all their code in a giant try/catch, thus having a negative impact on post-mortem debugging too, but that's not what most of them are doing, because it's not the convention that has been built so far. By moving to using core's promises based APIs though, it seems that these modules, and thus the applications that use them, would get closer to that. Is it worth it to discuss and investigate how we can understand that better and maybe find solutions if we determine there are problems to consider? I think so.
So non promise users who are not affected shouldnt block promise progress for thos who actually use them.
I am not blocking anything, in fact I _cannot_ block anything because that's not how the project works.
I have raised questions around the impact of #5020 on post-mortem debugging, and I've provided some facts and code to illustrate these questions. We've already made progress in this discussion, and I'm sure we can make a lot more progress in the weeks to come to answers some of these questions. I believe answering these questions only helps the project and its users.
Like @chrisdickinson mentioned before, the worse thing that can happen by doing that is that we understand better the potential problems, and by doing so better serve our users and the project.
@misterdjules Indeed, I'm happy you broached this concern! I'm looking into what information V8 is privy to when generating stack traces, and it _looks_ like it may contain the parameter values for each frame. If this true it opens an interesting venue for investigation: if we can preserve this information on the error object somehow instead of throwing it away when capturing the trace, this might greatly improve post-mortem capabilities in the general case, while resolving many of the concerns post-mortem has with the Promises PR.
Specifically, this would relax the need for a process to abort at a given location on the stack โ if the process aborts anywhere with an Error object on-stack, one could inspect the Error object for the frame information that would previously be discovered via ::jsstack. Aside from parameter information, is there any other per-frame info that is necessary from the point of view of the post-mortem WG?
@chrisdickinson
Aside from parameter information, is there any other per-frame info that is necessary from the point of view of the post-mortem WG?
It's not really that post-mortem debuggers need a specific set of data and we could try to provide that data in different ways, but more that the core files they inspect need to be generated as soon as possible after the error happened, and without loosing _any_ state.
For instance, what if I want to inspect the content of a buffer instance? That content is in the heap, how do I know that that content has not been overwritten by some code that runs after the error has happened and before the process aborts if between these two events node and V8 can basically do a lot of extra processing, including garbage collection, memory allocation/deallocation, etc?
What happens if I want to inspect the content of the stack frames but this content has been overwritten by calls to functions that handle errors after the uncaught error was thrown and before the process aborts?
What if I want to inspect the state of globals objects that may have changed in the meantime too? These are just examples of the same idea that core files should reflect the _exact_ state of the process at the time of the error.
Does that answer your question?
Just for fun, I experimented with a small hack that provides us with the ability to dump a core file at the time an uncaught error is thrown from within a promise's resolver.
This hack currently covers only one specific use case that was mentioned in this issue, but I believe if we determine it's an approach worth considering, it wouldn't be too difficult to try to cover all other use cases.
As seen in node's output when running this script and in mdb's ::jsstack's output, the core file was generated at the time the error was thrown, and thus all the frames that are needed are on the stack.
I don't know yet if this hack can be considered a valid proof of concept, most importantly because contrary to the current behavior of --abort-on-uncaught-exception and throwing outside of promises, this would make throwing from within promises completely different depending on whether the flag is on or not.
At the very least, all other cases where an error can be thrown from a promise (not only from the resolver) would need to be handled, but it can be a starting point for discussing a potential fix for this problem.
@misterdjules Maybe I missed it in the discussion but is there a reason something like this wouldn't work?
diff --git a/src/node.cc b/src/node.cc
index 20be309..dde2be1 100644
--- a/src/node.cc
+++ b/src/node.cc
@@ -145,6 +145,7 @@ static bool use_debug_agent = false;
static bool debug_wait_connect = false;
static int debug_port = 5858;
static bool prof_process = false;
+static bool v8_abort_on_uncaught_exception = false;
static bool v8_is_profiling = false;
static bool node_is_initialized = false;
static node_module* modpending;
@@ -1094,6 +1095,9 @@ void SetupNextTick(const FunctionCallbackInfo<Value>& args) {
}
void PromiseRejectCallback(PromiseRejectMessage message) {
+ if (v8_abort_on_uncaught_exception)
+ abort();
+
Local<Promise> promise = message.GetPromise();
Isolate* isolate = promise->GetIsolate();
Local<Value> value = message.GetValue();
@@ -3864,10 +3868,12 @@ void Init(int* argc,
// behavior but it could also interfere with the user's intentions in ways
// we fail to anticipate. Dillema.
for (int i = 1; i < v8_argc; ++i) {
- if (strncmp(v8_argv[i], "--prof", sizeof("--prof") - 1) == 0) {
- v8_is_profiling = true;
- break;
- }
+ const char* const arg = v8_argv[i];
+#define V(flag, var) if (strncmp(arg, flag, sizeof(flag) - 1) == 0) var = true
+ V("--abort-on-uncaught-exception", v8_abort_on_uncaught_exception);
+ V("--abort_on_uncaught_exception", v8_abort_on_uncaught_exception);
+ V("--prof", v8_is_profiling);
+#undef V
}
#ifdef __POSIX__
The issue I see with your changes to promise.js is that they look to be breaking promise debugging when --abort_on_uncaught_exception is on.
@bnoordhuis That would abort() even when the rejection originated from explicit reject() call and not just from thrown exceptions. Such reject calls are normally coming from "operational errors" where abort() is usually inappropriate.
@bnoordhuis
Maybe I missed it in the discussion but is there a reason something like this wouldn't work?
In addition to what @petkaantonov said, I had experimented with such a solution earlier and described why it isn't suitable.
The issue I see with your changes to promise.js is that they look to be breaking promise debugging when --abort_on_uncaught_exception is on.
OK, I'll look into that when I have more time tomorrow. Thank you for taking a look and giving your feedback!
@misterdjules: Yes โ though it seems to me that in many cases, aborting at the top of stack may not be possible. I _suspect_ that this will be true of most promise-based programs. However, perhaps we can broaden the applicability of post-mortem debugging techniques to include programs that use catch (or promises) by recording more information when generating error objects.
The ephemeral stack data would be preserved and persisted by the error object, and the heap _should_ be kept static for the most part. Only a small gap is introduced between entering the exceptional state and the abort process, while relaxing the need to exit at top of stack somewhat.
In a promise chain like so:
A = promise()
B = A.then(op)
C = B.then(op)
If A rejects, the error will be propagated _on next tick_ to B. If B rejects, the error will be propagated _on next tick_ to C. C, having no handlers, calls the unhandledRejection event handler. By this time, the stack from A has unwound. However, we _are_ left with an Error object, and that Error object _could_ be made to record the additional stack information we desire at the VM level.
(I posted to this effect in the post-mortem WG thread as well.)
V8 fires "unhandled rejection" and "rejection handled" events even when the debugger is not active. When the debugger is active however, V8 collects additional information when Promises are executed. I believe the problems you are talking about are all solved problems if the debugger is active. See https://github.com/v8/v8/tree/master/test/mjsunit/es6/debug-promises
Short explanation: promise rejections are treated the same way as exception events in the debug listener. Rejected promises that have a reject handler are considered caught exceptions, those without are considered uncaught. The exception event is fired at the point of the rejection, so the stack has not been unwound yet. The promise object itself can be retrieved from the event data object.
You can also use the promise mirror to get some reflection: https://github.com/v8/v8/blob/master/test/mjsunit/es6/mirror-promises.js
Another gem you might be interested in: Chrome DevTools offers async stack traces. With that, you can actually see the asynchronous call chain leading up to Promise rejections.
@hashseed
I believe the problems you are talking about are all solved problems if the debugger is active
This issue is about being able to investigate _production_ issues using core dumps. Enabling the debugger _in production_ is, as far as I know, not standard practice. I would imagine that the impact on performance would not be acceptable, but I'd welcome any data that suggests otherwise.
Another gem you might be interested in: Chrome DevTools offers async stack traces. With that, you can actually see the asynchronous call chain leading up to Promise rejections.
Does that imply live debugging? If so that is outside of the scope of this issue, which is about post-mortem debugging in production.
I see. Then the debugger indeed won't be a help. Note however that I don't expect the debugger to impact performance aside from calling to the callback on exception events though. JIT code only run slow if a break point has been set to it.
Closing in favour of https://github.com/nodejs/post-mortem/issues/16. For posterity, issue title is misleading: unhandled rejections in fact DO have the correct stack traces (that is, error.stack properties).
Most helpful comment
@chrisdickinson in terms of overlap and usages of nodejs in companies:
I think the overlap probably exists in enterprises.
Take for example
Uber. We probably have about ~5 nodejs engineers that usemdband we have ~200 nodejs engineers in total. For the five of us to be able to usemdbwe have to tell 200 engineers that do not care aboutmdbthat they cannot use promises.This definitely shows itself when you have 1 or 2 people writing for example, an "rpc" library or a high performance "json api" library that want to be able to do post-mortem debugging on their library which is embedded in >50 services and touched by >200 engineers.
This becomes especially difficult if an application developer uses a database ORM library that has promises and then calls
response.send(obj)inside athen()block and complains to a library engineer that implementedresponseabout a bug or a crash.Our current response would be to try and debug their issue and then tell them that there is not enough information and we have to add a change their application to use libraries and techniques that enable post mortem debugging.
This particular dynamic of application developers and library authors wrt Promises has not happened at Uber yet, but it will be far more likely to happen if promises are in core.