Crystal: Improve backtraces

Created on 9 Jun 2016  路  18Comments  路  Source: crystal-lang/crystal

While porting some Ruby code to Crystal, I ran some specs and found that the error backtraces were missing huge chunks of code frames. It was very painful to debug the issues due to incomplete backtraces, often missing line numbers. Here's an example:

       cast to Int failed
       [4525793250] *CallStack::unwind:Array(Pointer(Void)) +82
       [4525793153] *CallStack#initialize<CallStack>:Array(Pointer(Void)) +17
       [4525793112] *CallStack::new:CallStack +40
       [4526322281] *Exception@Exception#initialize<TypeCastError, String>:CallStack +41
       [4526322205] *TypeCastError#initialize<TypeCastError, String>:CallStack +29
       [4526322148] *TypeCastError::new<String>:TypeCastError +100
       [4525582152] ~proc(HTTP::Server::Context -> String)@./src/sidekiq/web.cr:88 +2072
       [4525566338] ~proc(HTTP::Server::Context -> String)@./libs/kemal/kemal/route.cr:10 +66
       [4526712301] *Kemal::RouteHandler#process_request<Kemal::RouteHandler, HTTP::Server::Context>:HTTP::Server::Context +125
       [4526712157] *Kemal::RouteHandler#call<Kemal::RouteHandler, HTTP::Server::Context>:HTTP::Server::Context +93
       [4525709936] *get<String, Nil, Nil>:(HTTP::Server::Response::Output | IO::FileDescriptor+ | Int32 | Nil) +192
       [4526809829] */Users/mike/src/sidekiq.cr/spec/web_spec.cr#get<String>:(HTTP::Server::Response::Output | IO::FileDescriptor+ | Int32 | Nil) +21
       [4525728014] ~proc( -> Void)@./spec/web_spec.cr:286 +2718
       [4525503099] *it<String, String, Int32, &( -> Void)>:(Array(Spec::Result) | Bool | Nil) +427
       [4525704007] ~proc( -> Void)@./spec/web_spec.cr:4 +1831
       [4525994230] *Spec::RootContext::describe<String, String, Int32, &( -> Void)>:Spec::Context+ +310
       [4525501977] *describe<String, String, Int32, &( -> Void)>:Spec::Context+ +9
       [4525431833] __crystal_main +31705
       [4525483016] main +40

Note the top element in my code: sidekiq/web.cr:88:

get "/busy" do |x|
  ecr("busy")
end

Great. ECR pushes the entire template into the code. None of it is visible in backtrace. So I throw in some raise "boom" in the template to bisect down to this line:

    <% x.processes.each do |process| %>

This calls Sidekiq::ProcessSet#each:

    def each
      Sidekiq.redis do |conn|
        prcs = conn.smembers("processes")
        procs = [] of String
        prcs.each { |x| procs << x.as(String) }
        procs.sort!

        # We're making a tradeoff here between consuming more memory instead of
        # making more roundtrips to Redis, but if you have hundreds or thousands of workers,
        # you'll be happier this way
        results = conn.pipelined do |ppp|
          procs.each do |key|
            ppp.hmget(key, "info", "busy", "beat", "quiet")
          end
        end.as(Array(Redis::RedisValue))

        results.each do |x|
          packet = x.as(Array(Redis::RedisValue))
          info = packet[0].as(String)
          busy = packet[1].as(Int).to_i64  # <========= ACTUAL BUG
          beat = packet[2].as(String).to_f64
          quiet = packet[3]?.to_s == "true"

          hash = JSON.parse(info).as_h
          hash["busy"] = busy
          hash["beat"] = beat
          hash["quiet"] = quiet
          yield Process.new(hash)
        end
      end

      nil
    end

This is one example of how debugging is painful today due to huge gaps in error backtraces. The equivalent Ruby backtrace would have 4-6 frames between these two lines and make for a 20 second bugfix instead of a 20 minute bughunt.

       [4526322148] *TypeCastError::new<String>:TypeCastError +100
       [4525582152] ~proc(HTTP::Server::Context -> String)@./src/sidekiq/web.cr:88 +2072

Are there any easy wins to improve the situation?

feature compiler

Most helpful comment

Note that gdb or lldb often help to debug. They need a binary compiled with --debug and can even access the file and line numbers (thought the line number is often a little off) for better traces. You may even display the source code with gdb -tui. It won't help with inlined blocks, and support for displaying variables is missing (except for basic primitives), but it may help following what's happening and where. For example:

$ crystal build --debug foo.cr
$ gdb -tui foo
(gdb) break foo.cr:5
(gdb) run
(...)
Breakpoint 1, add () at foo.cr:6
(gdb) bt
#0  add () at foo.cr:6
#1  0x0000000000423424 in __crystal_main () at foo.cr:8
#2  0x0000000000425ba9 in main () at .crystal/macro73205920.cr:11
(gdb) step
main () at .crystal/macro73205920.cr:3
(gdb) 

Note: the fact that gdb can access the file:line information means that we should be capable to do the same for exception backtraces in Crystal (but as @waj said earlier: it's hard).

All 18 comments

In master the "cast failed" exception now includes the file and line where it happens, so that's an improvement :-)

The main issue here is that blocks are always inlined so that's why call like each don't appear in the stack trace: they aren't calls.

I don't know if there's a way to fake the stack trace or similar, when generating code. I remember @waj told me something like that is possible when generating debug info, so in the future compiling with --debug maybe will solve this.

Not inlining block is far from trivial. We'd have to create procs for every block, closure local variables and pass them to those procs. I also think that will make things not just slow, maybe horribly slow (imagine that happening every time you iterate an array, or do 5.times), but of course without trying it's impossible to know.

Another possibility is to let raise receive __FILE__ and __LINE__ and set that to an Exception's information, and show that on error. This is a small improvement but doesn't work well when an exception is raised inside someone else's code, for example doing [1, 2][3] would show the file and line for Array, which isn't very interesting... and forwarding file and line number for every array access doesn't sound good/efficient either.

Another side note: did you try searching .as(Int) to find the bug more easily? Searching in sidekiq.cr for that shows me 4 matches, so adding puts in those locations would have spotted the problem very quickly.

(of course the above is not an excuse to not improve the language 馃槉 )

Well, keep in mind if you make it a non-default switch, I can turn it on for those times when I need precise traces, e.g. when targeting individual spec failures: crystal spec --slow file.cr:line. I couldn't care less about a 10x speed loss if my debugging goes from 20 minutes to 20 seconds.

For this issue, I made up the error. The actual cause was nil data in Redis that didn't cast correctly to a String.

Another thing that might be nice would be to include what it got when it failed to cast. That might be a separate issue though

@paulcsmith That's already fixed too, the error now is "failed to cast String to Int"

@asterite I meant including the actual value. Like failed to cast "my_string" (String) to Int. Sometimes including the actual value received can help debugging. But maybe it already does that?

Note that gdb or lldb often help to debug. They need a binary compiled with --debug and can even access the file and line numbers (thought the line number is often a little off) for better traces. You may even display the source code with gdb -tui. It won't help with inlined blocks, and support for displaying variables is missing (except for basic primitives), but it may help following what's happening and where. For example:

$ crystal build --debug foo.cr
$ gdb -tui foo
(gdb) break foo.cr:5
(gdb) run
(...)
Breakpoint 1, add () at foo.cr:6
(gdb) bt
#0  add () at foo.cr:6
#1  0x0000000000423424 in __crystal_main () at foo.cr:8
#2  0x0000000000425ba9 in main () at .crystal/macro73205920.cr:11
(gdb) step
main () at .crystal/macro73205920.cr:3
(gdb) 

Note: the fact that gdb can access the file:line information means that we should be capable to do the same for exception backtraces in Crystal (but as @waj said earlier: it's hard).

@ysbaddaden that's really handy information. Thank you so much!

OK I was referred here from the mailing list.

Doing some initial experimentation, I don't get back line numbers even from simple code that doesn't use blocks, ex:

def go1
  go2
end

def go2
  raise "boom"
end

go1

output:

./test
boom (Exception)
[4532799234] *CallStack::unwind:Array(Pointer(Void)) +82
[4532799137] *CallStack#initialize:Array(Pointer(Void)) +17
[4532799096] *CallStack::new:CallStack +40
[4532779209] *raise<Exception>:NoReturn +25
[4532779169] *raise<String>:NoReturn +17
[4532791299] *go2:NoReturn +19
[4532791273] *go1:NoReturn +9
[4532777806] __crystal_main +926
[4532790984] main +40

I assume this is the same problem or should I file a different issue here?
Thanks!

@rdp not exactly, this issue is more about inlined blocks. Line numbers in exceptions is coming, as RX14 pointed, in #3303.

Stop spamming @codenoid, you've been reminded at least once already - https://github.com/crystal-lang/crystal/issues/1077#issuecomment-313645715

-_- . this comment written before #1077

yeah, consequences of double-posting in wrong places, well...

I'm not sure whether this is the place where my problem belongs to (I mean "Improve backtraces" is a broad field).

However, I just created an issue on spec2.cr about missing the most relevant part of the callstack and now I saw this issue here and think it might be caused by something of the core part of crystal.
Because spec2.cr isn't doing anything relevant with the callstack and even if I print CallStack.new.printable_backtrace within the expectation method the important line is missing. So it might be something in CallStack (or at least related), right?

I have a minimal example in case it might be helpful.

EDIT: I don't have this issue anymore, since I moved to :mega: Spectator

My particular example is working now, at least I get "most" line numbers, thanks! :)

reduced example (I think):

$ cat 2785.cr 
require "ecr"
class Greeting
  ECR.def_to_s "greeting.ecr"
end
Greeting.new().to_s
$ cat greeting.ecr 
1
2
3
<%= raise "boom" %>
$ crystal 2785.cr 
Unhandled exception: boom (Exception)
  from 2785.cr:4:5 in 'to_s' # it lost the location within greeting.ecr file
  from /usr/share/crystal/src/object.cr:96:7 in 'to_s'
  from 2785.cr:7:1 in '__crystal_main'
  from /usr/share/crystal/src/crystal/main.cr:97:5 in 'main_user_code'
  from /usr/share/crystal/src/crystal/main.cr:86:7 in 'main'
  from /usr/share/crystal/src/crystal/main.cr:106:3 in 'main'
  from __libc_start_main
  from _start
  from ???

I think crystal can generate "expanded" files with all the macros, maybe it could base, or optionally base, the backtrace off the temp expanded file??

Was this page helpful?
0 / 5 - 0 ratings