function foo()
    try
        error("error in foo")
    catch e
    end
end
function bar()
    try
        error("error in bar")
    catch e
        foo()
        rethrow(e)
    end
end
bar()
produces the misleading stacktrace:
ERROR: error in bar
Stacktrace:
[1] foo() at ./REPL[1]:3
[2] bar() at ./REPL[2]:5
This just hit me in a really wicked situation: the stacktrace actually led me to a line erroneously throwing the reported exception. I fixed it (to throw the correct exception), but the same error kept showing, leaving me completely puzzled for quite some time...
We should somehow clarify that if you care about the backtrace of an exception, you need to call catch_backtrace immediately.
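A minimal sketch of what "call catch_backtrace immediately" could look like in practice. The names `risky`, `noisy_helper` and `capture` are hypothetical and only illustrate the ordering: the backtrace is copied out on the first line of the catch block, before any other code gets a chance to throw and catch an exception of its own.

```julia
# Hypothetical function that raises the error we care about.
@noinline risky() = error("error in risky")

# Hypothetical helper that throws and catches internally, like `foo()` above.
function noisy_helper()
    try
        error("error in helper")
    catch
    end
    return nothing
end

function capture()
    try
        risky()
    catch e
        bt = catch_backtrace()   # copy the backtrace before running anything else
        noisy_helper()           # nested exceptions from here on cannot change `bt`
        return e, stacktrace(bt)
    end
end

err, frames = capture()
```

Here `frames` is built from the copy taken at the top of the catch block, so it should describe the call into `risky` regardless of what `noisy_helper` does afterwards.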
I've been trying to figure out why the backtraces are mostly corrupted when my Retry.jl package is used. I have always assumed that my naive macro hacking was messing up the line number metadata in the AST somehow. I spent a long time chasing that down, but I now think the root of the problem is this issue.
It seems there are three problems: the backtrace can be empty; the rethrown backtrace can be wrong; or the rethrown error and backtrace can be wrong.
The minimal cases I've boiled it down to follow:
### `test1()` -- works

_Julia Version 0.7.0-DEV.2098 (2017-10-10 11:37 UTC)_
function f()
    throw("Error in f()!")
end
function test1()
    try
        f()
    catch e
        if e == "foo"
            println("Got foo!")
        end
        rethrow(e)
    end
end
ERROR: LoadError: "Error in f()!"
Stacktrace:
[1] f() at /Users/sam/julia/v0.5/Retry/src/extest.jl:2
[2] test1() at /Users/sam/julia/v0.5/Retry/src/extest.jl:8
...
in expression starting at /Users/sam/julia/v0.5/Retry/src/extest.jl:50
catch!
ERROR: LoadError: "Error in f()!"
in expression starting at /Users/sam/julia/v0.5/Retry/src/extest.jl:50
### `test3()` -- Wrong stacktrace
```julia
function isfoo(x)
    try
        x.code == "foo"
    catch e
        false
    end
end
function test3()
    try
        f()
    catch e
        if isfoo(e)
            println("Got foo!")
        end
        rethrow(e)
    end
end
ERROR: LoadError: "Error in f()!"
Stacktrace:
[1] isfoo(::String) at /Users/sam/julia/v0.5/Retry/src/extest.jl:29
[2] test3() at /Users/sam/julia/v0.5/Retry/src/extest.jl:43
...
in expression starting at /Users/sam/julia/v0.5/Retry/src/extest.jl:50
ERROR: LoadError: type String has no field code
Stacktrace:
[1] isfoo(::String) at /Users/sam/julia/v0.5/Retry/src/extest.jl:29
[2] test3() at /Users/sam/julia/v0.5/Retry/src/extest.jl:43
```
@JeffBezanson I'm not sure how to apply your advice that "you need to call catch_backtrace immediately".
I see that the runtime calls catch_backtrace() in _start().
It seems like maybe I need to do tmp = catch_backtrace() at the top of catch and set_global_catch_backtrace(tmp) before rethrow() to undo any backtrace corruption that happens in the catch block, but I can't see anything like set_global_catch_backtrace.
I found that exec_program in repl.c does a manual save and restore of the ptls->bt_data.
Maybe record_backtrace should save the old ptls->bt_data before overwriting it if there is already a current ptls->exception_in_transit ?
Based on reading the issues referenced above and looking at the code, it seems that rethrow only works if the catch block does not:
Is that correct? @vtjnash @yuyichao @Keno @s2maki
related to #20784 ?
I've definitely seen the case for test4 cause the wrong exception to be thrown, and the missing stack trace problem from test2. Our code that exhibits this behavior definitely seems to fit these examples.
I think the fastest way to fix this is to add a method of rethrow that accepts a backtrace; that will at least allow it to be preserved manually via
catch e
bt = catch_backtrace()
...
rethrow(e, bt)
end
Here's a patch for 0.6:
--- a/src/task.c
+++ b/src/task.c
@@ -576,6 +576,16 @@ JL_DLLEXPORT void jl_rethrow_other(jl_value_t *e)
throw_internal(e);
}
+JL_DLLEXPORT void jl_rethrow_other_bt(jl_value_t *e, jl_value_t *bt)
+{
+ size_t btsz = jl_array_len(bt);
+ size_t sz = btsz > JL_MAX_BT_SIZE ? JL_MAX_BT_SIZE : btsz;
+ jl_ptls_t ptls = jl_get_ptls_states();
+ memcpy(ptls->bt_data, jl_array_data(bt), sz * sizeof(void*));
+ ptls->bt_size = sz;
+ throw_internal(e);
+}
+
JL_DLLEXPORT jl_task_t *jl_new_task(jl_function_t *start, size_t ssize)
With wrapper:
rethrow(e, bt::Array{Ptr{Void},1}) = ccall(:jl_rethrow_other_bt, Union{}, (Any,Any), e, bt)
This can be added to 0.7 as well though the patch will be significantly different. In fact we should probably fully deprecate rethrow to throw (allowing 1 or 2 arguments).
Naively it seems possible to fix this and make catch_backtrace/rethrow reliable by saving and restoring the backtrace data appropriately - somewhat similarly to jl_apply_with_saved_exception_state but with various complications.
Is it worth considering this approach or is there some fundamental reason why it won't work?
The problem is that collecting the backtrace is expensive, so you only want to do that if you're actually gonna save it. It is quite possible however to have some sort of magic symbol that the compiler looks for to see if it needs to save the backtrace (which could but need not be syntactically exposed). @JeffBezanson said there was an issue about that at some point, but I couldn't find it.
I spent a few hours poking around in the runtime. The backtrace may be expensive, but it looks like the cost of recording it is unconditionally borne inside jl_throw. The only way to prevent that is to somehow have a flag set in the error handler state. A syntactic hint like this old proposal: https://gist.github.com/vtjnash/9cd53434cc7f7e2f7a29 would allow that.
But that seems like a distraction - what I'm proposing is that if there's a current backtrace, we should record it when entering a try block, and restore it when leaving. Seems like a bit of pointer shuffling may be all that's required in jl_enter_handler/jl_pop_handler? And it should only hurt if you've already thrown an exception.
For context, I'm just mulling over what to do about Jeff's comment at https://github.com/JuliaLang/julia/pull/25370#discussion_r160065706 and wondering if a solution to the underlying problem is within reach.
all that's required
Hah, I should have known better :-) . I started prototyping a solution, but restoring the exception can't happen inside jl_pop_handler / jl_eh_restore_state because that occurs before the code in the catch block (duh!). There's also some careful gc rooting required.
I now think we should generalize the exception_in_transit to be a stack of (exception,backtrace) states in transit. If you throw Exception2 while handling Exception1, you'd like to know about Exception1 as the root cause while further handling Exception2. This would mean jl_throw needs to push the new exception onto the stack. The lowering of catch blocks would need a final step added to pop the exception stack back to the state it was when the try was entered. I think I can see a trick to give JL_CATCH the same behavior.
To address the task switching problems, the current exception stack would need to be saved and restored during context switching which means more copies of bt_data floating around.
If we did all this, I believe catch_backtrace() can be made reliable and there would be no measurable performance penalty outside a catch block.
@c42f a stack of backtraces is what I had in mind https://github.com/JuliaLang/julia/issues/19979#issuecomment-344448380 ...
Maybe record_backtrace should save the old ptls->bt_data before overwriting it if there is already a current ptls->exception_in_transit ?
Yes, a stack of backtraces is a good idea. That would let us keep catch_backtrace and rethrow, and fix finally as well. rethrow() is also more efficient than having the user (or compiler) call catch_backtrace() and pass the result manually, since that requires allocating more julia objects to reflect the data safely.
This reminds me of an old niggling thought: can we do better than the catch + rethrow API?
Can we do it in a way that discourages accidental loss (non-rethrowing) of important errors?
Can it be done in a way that makes requests for type-based catch redundant?
peek e as an alternative to catch e but with no need to rethrow. i.e. the error still propagates up the stack.
try
...
peek e
@warn "my code caused an error" e
end
catch e where automatic rethrow is the default unless delete!(e) is called
try
...
catch e
e isa IOError && e.code == :FOO && delete!(e)
end
@samoconnor that's an interesting idea; I think the key thing you're doing with the peek is capturing state from the scope where the peek occurs, and communicating it to an outer scope.
I think a better way to do this might be to add a new exception to the stack - that is, instead of the peek, to simply have:
try
...
catch
throw(MyException("my code caused an error"))
end
Notably, because this throw occurs before the end of the catch block, the previous exceptions in the stack are preserved. So in the outer scope you have the full context, but you can also capture and transport arbitrary information in MyException.
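A minimal sketch of that pattern, assuming a Julia version where `current_exceptions()` is available (1.7 or later, as far as I know; earlier 1.x releases had the experimental `Base.catch_stack()` instead). `MyException`, `inner` and `outer` are hypothetical names used only for illustration.

```julia
# Hypothetical exception type carrying extra context from the inner scope.
struct MyException <: Exception
    msg::String
end

function inner()
    try
        error("root cause")
    catch
        # Thrown before the catch block ends, so the root cause stays on the stack.
        throw(MyException("my code caused an error"))
    end
end

function outer()
    try
        inner()
    catch
        # Each entry holds an `exception` and its `backtrace`; most recent is last.
        return [entry.exception for entry in current_exceptions()]
    end
end

excs = outer()
```

Under that assumption, `excs` should hold the original `ErrorException` first and the `MyException` last, so the outer scope sees both the root cause and the added context.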
In general I think the pattern of logging an exception and rethrowing it isn't great: Somewhere at an outer scope the exception will also be caught and reported in some way (probably via logging but without rethrowing). Logging and rethrowing just leads to duplicate error information due to several catch/log/rethrows down the stack. This is very confusing in log files. I'd say it's much better to preserve the information and make it available as a full stack of exceptions at outer scope instead.
I now think we should generalize the exception_in_transit to be a stack of (exception,backtrace) states in transit.
Hi @c42f,
Have you been able to make any progress towards this?
I have a protocol layer stack where multiple layers need to catch errors, inspect the error type, clean up layer-local resources, then rethrow. As things stand, the stack traces are almost always corrupted or missing by the time the exceptions reach the application layer, and I often find myself commenting out all the intermediate try/catches to get a clean look at the original stack trace. This is especially frustrating because catch catches all exception types. I have this hierarchy of fine-grained IO exception handlers, but I have to disable it all if I make a typo in some code that leads to an argument error, an undefined var error, or a bounds error.
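For readers unfamiliar with the pattern, here is a rough sketch of one such layer. `protocol_layer` and its `cleanup` callback are hypothetical names, and `EOFError` merely stands in for whatever error type a real layer would handle; the point is that every other exception type has to pass through `rethrow()`.

```julia
# Hypothetical layer: cleans up, handles only EOFError, passes everything else on.
function protocol_layer(f, cleanup)
    try
        return f()
    catch e
        cleanup()                       # release layer-local resources
        e isa EOFError && return :eof   # the one error type this layer handles
        rethrow()                       # anything else propagates to the caller
    end
end

cleanups = Ref(0)

# An error the layer handles itself.
handled = protocol_layer(() -> throw(EOFError()), () -> (cleanups[] += 1))

# An unrelated programming error (out-of-bounds index) that must pass through.
propagated = try
    protocol_layer(() -> [1, 2, 3][10], () -> (cleanups[] += 1))
catch err
    err
end
```

The second call is the case that hurts: the `BoundsError` has nothing to do with IO, yet it travels through every layer's catch block on its way up.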
It would be _really awesome_ to have rethrow working :)
(This reminds me of: https://github.com/JuliaLang/julia/pull/15906)
It's an interesting problem but I haven't made any implementation progress. My original motivation was to solve https://github.com/JuliaLang/julia/pull/25370#discussion_r160065706, but in the end it didn't seem that fixing catch_backtrace() would help much. (Not to detract from the host of other good reasons to do this.)
I had a 24 hour plane journey so I started looking into this in more detail. Here's some notes I made for myself about how exceptions and backtraces are implemented in the runtime (perhaps could become devdocs if there's interest?)
In the runtime, backtraces are recorded by calling the low level function
jl_unw_stepn. jl_unw_stepn can fill in two C arrays:

* ip which is an array of native instruction pointers mixed with interpreter [...]
* sp which is an array of pointers to native stack frames containing the [...]

For use from julia code, jl_unw_stepn is wrapped by jl_backtrace_from_here
in order to return arrays of julia types. For use internally by the runtime,
jl_unw_stepn is called directly from various functions. For instance,
throwing an exception using jl_throw records the backtrace into a raw ip
buffer attached to the thread local storage. This is a bare minimum of raw
information for efficiency and needs to be further converted into julia-level
information when calling catch_backtrace and associated functions.
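From the julia side, the two levels of information can be seen with a short sketch like the following. `here` is a hypothetical function name; `backtrace` and `stacktrace` are the standard Base functions, and the exact element type of the raw array may differ between Julia versions.

```julia
function here()
    raw = backtrace()          # raw instruction pointers (plus interpreter entries)
    frames = stacktrace(raw)   # converted into julia-level StackFrame values
    return raw, frames
end

raw, frames = here()
```

`raw` is cheap to record but not directly readable; `frames` carries function names, files and line numbers, at the cost of the lookup done by `stacktrace`.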
The julia runtime executes programs as a mixture of interpreted and compiled
code. Therefore, a comprehensive instruction pointer array must contain a
mixture of interpreter program counters and native instruction pointers:

* [...] (uintptr_t)-1, followed by a jl_value_t pointer to the MethodInstance being interpreted, and an integer program counter indexing [...]
* [...] void pointer to the native [...]

To get stack traces to work when interpreted and compiled code are mixed
requires a bit of black magic to detect the interpreter frames using the raw
instruction pointer, and extract the interpreter state from the native stack.
This is done by always entering the interpreter via a special thunk
enter_interpreter_frame, whose code address can be known to the runtime. The
stack layout in this hand written assembly function is also carefully
controlled so that the interpreter state can be extracted in a well defined
way.
Julia exception handling is built on top of the C setjmp/longjmp mechanism
with some additional logic in jl_eh_restore_state to restore the state of the
task when a longjmp occurs. This additional logic includes:
The state to be restored must be recorded at the try site and be accessible at
the catch site both in C and in julia. In both cases it's ultimately stored on
the stack in a jl_handler_t which is initialized by jl_enter_handler and
consumed by jl_eh_restore_state.
The C mechanism is as follows:
* JL_TRY macro introduces a local variable of type jl_handler_t, [...] setjmp, and calls jl_eh_restore_state if no [...]
* JL_CATCH adds the else clause for entry when an exception triggers a longjmp. This includes a call to jl_eh_restore_state which uses the jl_handler_t variable to restore task local state.

The julia codegen mechanism essentially mirrors the C version, but is a little
more complicated as lowering, code generation, and an LLVM optimization pass
are involved. Briefly, lowering turns try/catch into
Expr(:enter) and Expr(:leave) in the lowered linearized code. These are
then translated during LLVM codegen into calls to the C runtime functions given
above, by way of an intermediate dummy function @julia.enter and the LLVM
pass LowerExcHandlers. (Note that local jumps caused by break and @goto
also abandon a try block and are also lowered to Expr(:leave).)
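The lowered form can be inspected from julia itself. This is only a sketch: `InTry` and `InCatch` are placeholder names that are never called, and the printed representation of the enter/leave statements is an implementation detail that differs between Julia versions.

```julia
# Quote a try/catch expression; nothing in it is evaluated.
ex = :(try
    InTry()
catch
    InCatch()
end)

# Lower it in the context of module Main and pull out the linearized code.
lowered = Meta.lower(Main, ex)
code = lowered.args[1]   # a Core.CodeInfo holding the lowered statements
```

Displaying `code` at the REPL shows the statements that mark entry to and exit from the try block, alongside the jumps that connect the try and catch bodies.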
When an exception is raised with jl_throw:
* [...] jl_handler_t for the current task is used to get the appropriate jmp_buf which will reset execution to the current exception handler up the [...]

The runtime can raise exceptions using jl_throw; this works the same way as
from julia.
There's also some special circumstances which require extra care:
These are caught by the runtime using signal handlers (for example, stack
overflows are detected by protecting a memory page on the known stack boundary,
and catching the SIGSEGV which results if the stack spills onto this page).
Crucially, all code inside a signal handler must be async-signal-safe - for
example, we cannot allocate memory using malloc or call any function which
does so.
To follow on from the above, a possible implementation plan:
First, fix the task-vs-thread local state bugs in the current implementation. This should give experience going to the next step:
* [...] exception_in_transit from thread local to task local storage in _jl_task_t, or [...]

To tie things off, replace the task-based exception/backtrace with a stack:

* [...] try is entered but no exception occurs?

I've been slowly poking away at this. It looks reasonably easy to preserve the exception and backtrace data during task switching - that's just a few changes to the runtime C code.
The more tricky problem is to insert an exception stack pop intrinsic (which I've currently called Expr(:pop_exc)) into the lowered code and more particularly to get it flowing through the various layers of optimization correctly. The desired lowering is something along the lines of
Before()
try
InTry()
catch
InCatch()
end
After()
lowers to
Before()
enter c1
InTry()
leave 1
goto e1
c1:
leave 1
InCatch()
pop_exc #<- Needs to pop exception stack back to the state at associated `enter`
e1:
After()
The state of the stack at enter can be recorded in the jl_handler_t and will be dynamically available when the associated leave executes. We need a little of this state available in the associated pop_exc.
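To make the bookkeeping concrete, here is a toy model of the idea in plain julia. It is an illustration only: `EXC_STACK`, `HandlerModel`, `enter_model`, `throw_model` and `pop_exc_model` are invented names, not runtime functions, and the real implementation lives in C and in lowering.

```julia
# Toy model only: none of these names exist in the Julia runtime.
const EXC_STACK = Any[]            # stack of (exception, backtrace) pairs

struct HandlerModel
    saved_depth::Int               # exception stack depth recorded at `enter`
end

enter_model() = HandlerModel(length(EXC_STACK))
throw_model(exc, bt) = push!(EXC_STACK, (exc, bt))
pop_exc_model(h::HandlerModel) = resize!(EXC_STACK, h.saved_depth)

# Outer try: Exception1 is thrown and its catch block starts running.
outer = enter_model()
throw_model(:Exception1, "bt1")

# Inside that catch block, a nested try throws Exception2.
inner = enter_model()
throw_model(:Exception2, "bt2")
depth_in_inner_catch = length(EXC_STACK)   # both exceptions are visible here

pop_exc_model(inner)                       # end of the inner catch block
after_inner = copy(EXC_STACK)              # Exception1 and its backtrace survive

pop_exc_model(outer)                       # end of the outer catch block
```

The only state `pop_exc_model` needs from the matching `enter_model` is the saved depth, which is the "little of this state" that has to reach pop_exc.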
I've got three ideas:
1. [...] enter in the pop_exc expression but it ended up feeling slightly unnatural to propagate that label during SSAIR processing, particularly since it's not used as a jump target.
2. [...] enter, and make pop_exc take that token as an argument. As I understand it, this is the kind of thing that the LLVM Token type was introduced for, and would then have a nice lowering to LLVM IR before we get to the LowerExcHandlers pass.
3. [...] LowerExcHandlers, provided the IR isn't reordered by any optimizations. I'm starting to think this would be the easiest as LowerExcHandlers already has similar assumptions.

@keno any thoughts?
It turns out that of the ideas above, producing a token from enter plays extremely nicely with the implicit SSA variables in SSAIR, but also is easy to work with in both the interpreter and in codegen. So I'm going with that for now.
Here's a PR which fixes this: #28878
Fixed in #28878