Julia: Emitting runtime codegen debug symbols to macOS profiling utilities

Created on 26 Jun 2020 · 12 Comments · Source: JuliaLang/julia

On Linux, julia is able to emit debug symbols for the dynamically compiled code generated by LLVM, such that you get nice, correct, beautiful stack traces when running julia in gdb, or looking at perf top on a running instance.

Can we do the same for macOS? It would be amazing to be able to point Instruments at a running julia process to get more information on why it's just spinning and not printing anything.

What would the requirements for this be? Are there fundamental blockers? @vtjnash mentioned that the Linux kernel provides an API we call to register the generated code, which makes this easy, and that macOS provides no similar interface. But maybe there are still ways to get this benefit on macOS?

Thanks much! :)

All 12 comments

I don’t think the kernel does anything. It’s gdb and glibc. Does gdb not use the same API on Mac?

Yeah, it's a GDB interface. No idea what Instruments does here or if it has a similar interface.
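
For reference, the interface being discussed here appears to be the one the GDB manual documents as the "JIT Compilation Interface". Below is a minimal sketch in C, assuming the declarations from that manual. `register_symfile` is a hypothetical helper name used only for illustration, and this is not Julia's actual registration code. Whether lldb or Instruments on macOS picks up entries registered this way is exactly the open question in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Declarations as documented in the GDB manual, "JIT Compilation Interface". */
typedef enum {
    JIT_NOACTION = 0,
    JIT_REGISTER_FN,
    JIT_UNREGISTER_FN
} jit_actions_t;

struct jit_code_entry {
    struct jit_code_entry *next_entry;
    struct jit_code_entry *prev_entry;
    const char *symfile_addr;  /* in-memory object file carrying debug info */
    uint64_t symfile_size;
};

struct jit_descriptor {
    uint32_t version;
    uint32_t action_flag;      /* holds a jit_actions_t value */
    struct jit_code_entry *relevant_entry;
    struct jit_code_entry *first_entry;
};

/* The debugger places a breakpoint on this function, so it must not be
 * inlined or optimized away. */
void __attribute__((noinline)) __jit_debug_register_code(void)
{
    __asm__ volatile("");
}

/* The debugger looks this symbol up by name; version must be 1. */
struct jit_descriptor __jit_debug_descriptor = { 1, JIT_NOACTION, NULL, NULL };

/* Hypothetical helper (not part of the documented interface): push a new
 * entry onto the head of the list and notify an attached debugger. */
void register_symfile(struct jit_code_entry *entry,
                      const char *symfile, uint64_t size)
{
    assert(entry != NULL);
    entry->symfile_addr = symfile;
    entry->symfile_size = size;
    entry->prev_entry = NULL;
    entry->next_entry = __jit_debug_descriptor.first_entry;
    if (entry->next_entry != NULL)
        entry->next_entry->prev_entry = entry;

    __jit_debug_descriptor.first_entry = entry;
    __jit_debug_descriptor.relevant_entry = entry;
    __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
    __jit_debug_register_code();
}
```

A JIT would call `register_symfile` once per emitted object, passing a buffer that holds the object file (with symbols and debug info) for the freshly generated code. Note that nothing here involves the kernel: the debugger reads the descriptor straight out of the process's memory.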

I don't think this works with GDB on the Mac, either? I thought I remembered this being a Linux-specific thing?

It certainly doesn't generate useful symbols in lldb:

julia> for _ in 1:100 peakflops() end
Process 64294 stopped
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff67bf7dd9 libsystem_platform.dylib`_platform_bzero$VARIANT$Haswell + 57
    frame #1: 0x0000000126eb79ec libopenblas64_.0.3.9.dylib`dgemm_beta_HASWELL + 156
    frame #2: 0x00000001243229be libopenblas64_.0.3.9.dylib`inner_thread + 414
    frame #3: 0x0000000124505685 libopenblas64_.0.3.9.dylib`exec_blas + 117
    frame #4: 0x000000012432257f libopenblas64_.0.3.9.dylib`dgemm_thread_nn + 991
    frame #5: 0x00000001241f4df5 libopenblas64_.0.3.9.dylib`dgemm_64_ + 581
    frame #6: 0x000000015440b4e5
    frame #7: 0x000000015440ba13
    frame #8: 0x000000015440c02b
    frame #9: 0x000000015440c4c0
    frame #10: 0x000000015440642e
    frame #11: 0x00000001001c107d libjulia.1.5.dylib`jl_toplevel_eval_flex(m=0x000000012026c330, e=<unavailable>, fast=<unavailable>, expanded=2000) at toplevel.c:790:19 [opt]
    frame #12: 0x00000001001c2085 libjulia.1.5.dylib`jl_toplevel_eval_in [inlined] jl_toplevel_eval(m=0x00000001187fdb50, v=0x00000001207d0110) at toplevel.c:849:12 [opt]

I think Instruments is likely doing everything through llvm / lldb.

I can try with GDB to see if it works there? I would think that since llvm is the one doing the codegen, it should plug in nicely with lldb?

I remember trying to fix this in LLDB years ago, but I don't remember what the problem was. You can probably find the old email threads.

It would be awesome if we could fix this, yeah. It's been an oft-repeated pain point at RelationalAI, and would be a nice thing to have :)

Just noting, though, that I just finished installing and signing gdb (which was quite a convoluted process in itself), and it doesn't work there either, so I do think there's more to this than just a GDB API:

julia> for _ in 1:100 peakflops() end
^C
Thread 4 received signal SIGINT, Interrupt.
[Switching to Thread 0xd0f of process 79346]
0x00007fff67b4933a in ?? ()
(gdb) bt
#0  0x00007fff67b4933a in ?? ()
#1  0x00007fff67c05e60 in ?? ()
#2  0x00000001003fde38 in ?? ()
#3  0x0000000000000000 in ?? ()
(gdb)

That just looks like GDB not understanding the OS X compact unwind info inside OpenBLAS.

But there are no Julia frames in there anywhere? It doesn't look like it has anything to do with OpenBLAS:

julia> f() = while true end
f (generic function with 1 method)

julia> f()
^C[New Thread 0x100f of process 13641]
[New Thread 0x1803 of process 13641]
[New Thread 0x1903 of process 13641]
[New Thread 0x1a03 of process 13641]
[New Thread 0x1b03 of process 13641]
[New Thread 0x2703 of process 13641]
[New Thread 0x2803 of process 13641]

Thread 4 received signal SIGINT, Interrupt.
[Switching to Thread 0x100f of process 13641]
0x00007fff67b4933a in ?? ()
(gdb) bt
#0  0x00007fff67b4933a in ?? ()
#1  0x00007fff67c05e60 in ?? ()
#2  0x00000001003fde38 in ?? ()
#3  0x0000000000000000 in ?? ()
(gdb)

You may be looking at the wrong thread.

Blech, now I can't get gdb to start at all. It's really not well supported on macOS, especially given the latest security settings. I had to create a certificate and sign gdb with it, and yuck.

Anyway, the tl;dr here is that it would be really amazing if we could kickstart work on generating these symbols for lldb and the macOS profiling tools. It would save us a lot of time, to be sure. Thanks for the discussion so far, @Keno <3

We'd happily offer a bounty for this, it would save us so much time to have instant visibility into performance issues.

I'm not sure what the best ways to do code bounties are. I know that this was discussed some here: https://discourse.julialang.org/t/compiler-work-priorities/17623

Some options include:

I think Julia Dynamics had some success with bountysource, per this thread?:
https://discourse.julialang.org/t/great-news-for-dynamics-in-julia/17321
