Crystal: GC related crash using C libraries integration with GTK

Created on 4 May 2020  路  5Comments  路  Source: crystal-lang/crystal

Hi,

I'm experiencing a cash that may be a problem in Crystal or in GTK integration with Crystal GC. The original problem was reported at https://github.com/jhass/crystal-gobject/issues/57, and I tried to reduce the issue to something that could be reproduced with a crystal build command without any dependencies.

The issue

Code bellow just call GTK3 C functions directly, GTK3 doesn't call any Crystal code, but the process crash inside GTK when we call LibGtkSource.gtk_source_buffer_set_language(buffer, lang). A C version of this program works and can be checked out in the original bug report.

Now the interesting part, if we run the program with GC disabled the crash doesn't happen.

There's an stack overflow page about something that seems related to this https://stackoverflow.com/q/43141659/2199687. So is this an issue on Crystal GC? GSourceView4? GLib?

Crash environment

Crash tested on ArchLinux x86_64 with packages:

  • crystal 0.34.0-2
  • gc 8.0.4-3
  • llvm-libs-10.0.0-1
  • gtk3 1:3.24.20-1
  • gtksourceview4 4.6.0-1

Crashing code

@[Link("gtk-3")]
lib LibGtk
  fun gtk_init(argc : Int32*, argv : UInt8***) : Void
  fun gtk_main : Void
  fun gtk_builder_new_from_string(string : UInt8*, length : Int64) : Void*
  fun gtk_builder_connect_signals(this : Void*, user_data : Void*) : Void
  fun gtk_builder_get_object(this : Void*, name : UInt8*) : Void*
  fun gtk_widget_show_all(this : Void*) : Void
  fun gtk_text_view_get_buffer(this : Void*) : Void*
end

@[Link("gtksourceview-4")]
lib LibGtkSource
  fun init = gtk_source_init : Void
  fun gtk_source_language_manager_get_default : Void*
  fun gtk_source_language_manager_guess_language(this : Void*, filename : UInt8*, content_type : UInt8*) : Void*
  fun gtk_source_buffer_set_language(this : Void*, language : Void*) : Void
end

xml = <<-EOF
<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated with glade 3.22.2 -->
<interface>
  <requires lib="gtk+" version="3.20"/>
  <requires lib="gtksourceview" version="4.0"/>
  <object class="GtkApplicationWindow" id="main_window">
    <property name="can_focus">False</property>
    <property name="default_width">800</property>
    <property name="default_height">600</property>
    <signal name="destroy" handler="gtk_main_quit" swapped="no"/>
    <child type="titlebar">
      <placeholder/>
    </child>
    <child>
      <object class="GtkBox">
        <property name="visible">True</property>
        <property name="can_focus">False</property>
        <child>
          <object class="GtkSourceView" id="editor">
            <property name="can_focus">False</property>
          </object>
          <packing>
            <property name="expand">True</property>
            <property name="fill">True</property>
            <property name="position">0</property>
          </packing>
        </child>
      </object>
    </child>
  </object>
</interface>
EOF

LibGtk.gtk_init pointerof(ARGC_UNSAFE), pointerof(ARGV_UNSAFE)
LibGtkSource.init

builder = LibGtk.gtk_builder_new_from_string(xml, -1)
LibGtk.gtk_builder_connect_signals(builder, nil)

# Get editor
editor = LibGtk.gtk_builder_get_object(builder, "editor")
buffer = LibGtk.gtk_text_view_get_buffer(editor)

# Here the problem, seems to be just with markdown syntax.
lang_manager = LibGtkSource.gtk_source_language_manager_get_default
lang = LibGtkSource.gtk_source_language_manager_guess_language(lang_manager, "README.md", nil)
puts "******* Hold on! prepare for a crash!!! ********"
LibGtkSource.gtk_source_buffer_set_language(buffer, lang)

# Show main window
main_window = LibGtk.gtk_builder_get_object(builder, "main_window")
LibGtk.gtk_widget_show_all(main_window)
LibGtk.gtk_main

Stack trace plus GC output

$ GC_PRINT_VERBOSE_STATS=1 ./pure_crash 
Grow heap to 64 KiB after 0 bytes allocated
Number of processors = 8
Started 7 mark helper threads
Initiating full world-stop collection!
0 bytes in heap blacklisted for interior pointers

--> Marking for collection #1 after 0 allocated bytes
Pushed 1 thread stacks
Starting marking for mark phase number 0
Starting mark helper 0
Starting mark helper 1
Starting mark helper 2
Starting mark helper 3
Starting mark helper 4
Starting mark helper 5
Starting mark helper 6
Starting mark helper 7
Finished mark helper 2
Finished mark helper 0
Finished mark helper 4
Finished mark helper 6
Finished mark helper 1
Finished mark helper 7
Finished mark helper 5
Finished mark helper 3
Finished marking for mark phase number 0
GC #1 freed 0 bytes, heap 64 KiB (+ 0 KiB unmapped)
World-stopped marking took 1 msecs (1 in average)
Bytes recovered before sweep - f.l. count = 0
In-use heap: 0% (0 KiB pointers + 0 KiB other)
Immediately reclaimed 0 bytes, heapsize: 65536 bytes (0 unmapped)
0 finalization entries; 0/0 short/long disappearing links alive
0 finalization-ready objects; 0/0 short/long links cleared
Finalize plus initiate sweep took 0 + 0 msecs
Complete collection took 1 msecs
Adding block map for size of 2 granules (32 bytes)
Adding block map for size of 1 granules (16 bytes)
Adding block map for size of 3 granules (48 bytes)
Adding block map for size of 4 granules (64 bytes)
Grew fo table to 1 entries
Adding block map for size of 15 granules (240 bytes)
Grew fo table to 2 entries
Adding block map for size of 11 granules (176 bytes)
Grew fo table to 4 entries
Adding block map for size of 7 granules (112 bytes)
Adding block map for size of 5 granules (80 bytes)
Grew fo table to 8 entries
******* Hold on! prepare for a crash!!! ********
Adding block map for size of 9 granules (144 bytes)
Adding block map for size of 8 granules (128 bytes)
Adding block map for size of 16 granules (256 bytes)
Adding block map for size of 14 granules (224 bytes)
Adding block map for size of 28 granules (448 bytes)
Grow heap to 128 KiB after 9264 bytes allocated
Adding block map for size of 18 granules (288 bytes)
Adding block map for size of 13 granules (208 bytes)
Adding block map for size of 6 granules (96 bytes)
Adding block map for size of 12 granules (192 bytes)
Adding block map for size of 20 granules (320 bytes)
Adding block map for size of 21 granules (336 bytes)
Adding block map for size of 10 granules (160 bytes)
Adding block map for size of 19 granules (304 bytes)
Adding block map for size of 128 granules (2048 bytes)
Adding block map for size of 64 granules (1024 bytes)
Grow heap to 192 KiB after 77680 bytes allocated
Adding block map for size of 0 granules (0 bytes)
Adding block map for size of 42 granules (672 bytes)
Grow heap to 264 KiB after 133456 bytes allocated
Grow heap to 356 KiB after 207840 bytes allocated
Adding block map for size of 84 granules (1344 bytes)
Adding block map for size of 22 granules (352 bytes)
Adding block map for size of 17 granules (272 bytes)
Initiating full world-stop collection!
0 bytes in heap blacklisted for interior pointers

--> Marking for collection #2 after 337872 allocated bytes
Pushed 1 thread stacks
Starting marking for mark phase number 1
Starting mark helper 0
Starting mark helper 1
Starting mark helper 2
Starting mark helper 3
Starting mark helper 4
Starting mark helper 5
Starting mark helper 6
Starting mark helper 7
Finished mark helper 3
Finished mark helper 4
Finished mark helper 6
Finished mark helper 0
Finished mark helper 7
Finished mark helper 1
Finished mark helper 2
Finished mark helper 5
Finished marking for mark phase number 1
GC #2 freed 0 bytes, heap 356 KiB (+ 0 KiB unmapped)
World-stopped marking took 1 msecs (1 in average)
Bytes recovered before sweep - f.l. count = -75136
In-use heap: 4% (15 KiB pointers + 0 KiB other)
Immediately reclaimed 200976 bytes, heapsize: 364544 bytes (0 unmapped)
7 finalization entries; 0/0 short/long disappearing links alive
0 finalization-ready objects; 0/0 short/long links cleared
Finalize plus initiate sweep took 0 + 0 msecs
Complete collection took 1 msecs
Adding block map for size of 50 granules (800 bytes)
Invalid memory access (signal 11) at address 0x0
[0x562d2bf13156] *CallStack::print_backtrace:Int32 +118
[0x562d2bf05cbe] __crystal_sigfault_handler +286
[0x7f1713770800] ???
[0x7f171395a6e4] pcre_exec +3156
[0x7f17139fc31d] g_match_info_next +157
[0x7f17139fd510] g_regex_match_full +128
[0x7f17139fd6b9] g_regex_replace_eval +201
[0x7f171487241d] ???
[0x7f171487eab7] ???
[0x7f171487dc4f] ??? (5 times)
[0x7f17148a8f4d] ???
[0x7f17148c36d6] gtk_source_buffer_set_language +294
[0x562d2bef7bd7] __crystal_main +1367
[0x562d2bf6aa76] *Crystal::main_user_code<Int32, Pointer(Pointer(UInt8))>:Nil +6
[0x562d2bf6a90c] *Crystal::main<Int32, Pointer(Pointer(UInt8))>:Int32 +44
[0x562d2bf031b6] main +6
[0x7f171353c023] __libc_start_main +243
[0x562d2bef75ae] _start +46
[0x0] ???

Expected behavior

Code should show a blank window and not crash.

Most helpful comment

Alright, little update here too.

So, if you wondered where #9248, #9253 and #9255 came from, it was to make this little piece possible https://gist.github.com/jhass/53b56308b1d4013c32cf2325d800e130

What this does is shim all mallocish and pthread_ functions to redirect them to their GC_ variants. This works and shadows the ones from libc and libpthread because the linker starts the search for the symbols in the main binary. And because glibc exposes alternative symbol names for the core malloc functions, which allows to wrap them without dlopen, which in turn calls malloc, thus infinite recursion if you try.

What's that to with this issue? Well it's actually surprising it took us this long to run into something like this. bdwgc actually really doesn't like coexistence with libraries that spawn threads of their own without telling it about them. On Darwin and Windows it can use OS APIs to listen to these threads, on Linux it has header files to shim calls or discouraged and slow compile time options to workaround the issues. I didn't look into BSD, I suppose that's an entire rabbit hole of its own.

How something like Inkscape works with bdwgc is still a mystery to me. The responses I got on their IRC when asking about their integration were a literal, full quote "don't" and secondly, paraphrasing: "I don't think anybody that did that is still working on the project, we really want to move away from it".

So I would conclude three things from this journey:

First if we want to be a binding friendly language, we really really need to get our own GC going, or at least swap out bdwgc.

Second we need to get our story around pure/C ABI function pointers straight. They need their own type, representing them as Proc is just plain wrong. Those three pull requests are just the minimum I needed get stuff working, there's a fourth issue I could workaround (see the comments in the gist above). There's just too many places in codegen, and probably semantics, where you need to clearly discern whether your target and source is a Crystal proc or C ABI function pointer, to properly type and convert or even avoid to convert them, and you just don't have enough information at hand to do it, because it all looks like some "Proc". My pull requests all feel hacks and workarounds this issue rather than a proper solution to them, because of this design issue.

Third, and this plays heavily into the first point, if we want to be a language where low level things are possible we need globals back. Our current global state mechanisms, constants and class variables, both allocate memory for initialization to any value, because they go through __crystal_once. In my gist above I had to include a C file just to declare a thread local global with a default value, which is just silly. I'm in no way arguing to reintroduce globals as a first level feature that a regular Crystal user would ever care about, but we need some allocation free way to declare global state. This could be as simple as allowing to declare global variables in lib sections, instead of just referencing them.

All 5 comments

Additional info: Inkscape [1] uses the same GC as Crystal and GTK, and it works.

[1] https://gitlab.com/inkscape/inkscape

puts "******* Hold on! prepare for a crash!!! ********"
LibGtkSource.gtk_source_buffer_set_language(buffer, lang)

What are the values of buffer and lang here? My guess is that either of those is a null pointer?

In fact Invalid memory access (signal 11) at address 0x0 means a null pointer exception. Maybe check that lang and buffer are not null before setting those? Maybe the code is slightly different from what you are doing in those other languages?

By the way, I just installed all the necessary libraries in mac and ran the code and it worked fine, I could see an empty blank window.

Yes it only crashes on Linux. Please read the issue linked in top :)

My current guess is that GtkTextView (GtkSourceView's parent) spawns threads that allocate memory, which is not seen by bdwgc because those obviously don't use GC_phtread_create and friends. Thus the GC doesn't see the memory allocated there and clobbers it. It's definitely not a simple null pointer :) On Darwin and Windows bdwgc seems to use hooks that don't require redefining pthread_create. Also note the Immediately reclaimed 200976 bytes from the second collection cycle, right before the crashing library call. I suspect this is clobbering up some internal context object in the structures behind the pointer we get in crystal land.

I tried playing a bit with using https://www.gnu.org/software/libc/manual/html_node/Hooks-for-Malloc.html but that only lead to funnier crashes.

Then I tried redefining fun malloc and friends directly, shadowing glibc's malloc, which lead to even funnier crashes. bdwgc spawning its threads as part of its initialization calls malloc internally, so you kinda want to redirect that to the original malloc and do that once per thread, so you need some kind of thread local state which you cannot really access because Crystal has no global variables and it's just too early for the constant/class var initialization code to work (no operational GC yet).

Then I thought about redefining pthread_create and friends in a similar manner. That currently got me stuck on a whole cluster of codegen bugs in the kind of https://carc.in/#/r/90n0 (the "no prelude" LLVM IR to that: https://p.jhass.eu/7x.ll)

Alright, little update here too.

So, if you wondered where #9248, #9253 and #9255 came from, it was to make this little piece possible https://gist.github.com/jhass/53b56308b1d4013c32cf2325d800e130

What this does is shim all mallocish and pthread_ functions to redirect them to their GC_ variants. This works and shadows the ones from libc and libpthread because the linker starts the search for the symbols in the main binary. And because glibc exposes alternative symbol names for the core malloc functions, which allows to wrap them without dlopen, which in turn calls malloc, thus infinite recursion if you try.

What's that to with this issue? Well it's actually surprising it took us this long to run into something like this. bdwgc actually really doesn't like coexistence with libraries that spawn threads of their own without telling it about them. On Darwin and Windows it can use OS APIs to listen to these threads, on Linux it has header files to shim calls or discouraged and slow compile time options to workaround the issues. I didn't look into BSD, I suppose that's an entire rabbit hole of its own.

How something like Inkscape works with bdwgc is still a mystery to me. The responses I got on their IRC when asking about their integration were a literal, full quote "don't" and secondly, paraphrasing: "I don't think anybody that did that is still working on the project, we really want to move away from it".

So I would conclude three things from this journey:

First if we want to be a binding friendly language, we really really need to get our own GC going, or at least swap out bdwgc.

Second we need to get our story around pure/C ABI function pointers straight. They need their own type, representing them as Proc is just plain wrong. Those three pull requests are just the minimum I needed get stuff working, there's a fourth issue I could workaround (see the comments in the gist above). There's just too many places in codegen, and probably semantics, where you need to clearly discern whether your target and source is a Crystal proc or C ABI function pointer, to properly type and convert or even avoid to convert them, and you just don't have enough information at hand to do it, because it all looks like some "Proc". My pull requests all feel hacks and workarounds this issue rather than a proper solution to them, because of this design issue.

Third, and this plays heavily into the first point, if we want to be a language where low level things are possible we need globals back. Our current global state mechanisms, constants and class variables, both allocate memory for initialization to any value, because they go through __crystal_once. In my gist above I had to include a C file just to declare a thread local global with a default value, which is just silly. I'm in no way arguing to reintroduce globals as a first level feature that a regular Crystal user would ever care about, but we need some allocation free way to declare global state. This could be as simple as allowing to declare global variables in lib sections, instead of just referencing them.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

oprypin picture oprypin  路  3Comments

cjgajard picture cjgajard  路  3Comments

ArthurZ picture ArthurZ  路  3Comments

pbrusco picture pbrusco  路  3Comments

will picture will  路  3Comments