Crystal: Concurrent invocations of puts seem to interfere with each other

Created on 20 Jul 2019  Â·  41Comments  Â·  Source: crystal-lang/crystal

Since I am not sure if this is a Crystal issue or Kemal issue, I have reported this to the Kemal team as well (kemalcr/kemal#543).

Description

A simple server responds to get requests with random numbers in JSON format. When the server is running on a Raspberry Pi and ab tool is used to issue sequential requests to the server in quick succession, the output of the server on standard output is garbled.

I am reporting the issue here as I am not sure if this is a crystal issue or a kemal issue.

Steps to Reproduce

  1. Execute the server following the instructions in README.md
  2. Execute ab -n 10 -c 1 http://127.0.0.1:1234/random?num=30

Expected behavior: The following or something similar should be seen in the terminal executing the server.

8.108ms
8.051ms
8.403ms
8.242ms
8.209ms
8.437ms
8.095ms
7.988ms
8.126ms
8.087ms

Actual behavior: The following is seen in the terminal executing the server.

-159659774543452127413503ApacheBench/2.3127.0.0.1:1234/random?num=30��<�pyR�p��%�h2.3
                                                                                     /get/random34
                                                                                                  /get/rams
-1358916334152017894057263��7R뱵��Al`|��34
                                          /get/random096127.0.0.1:41128 127.0.0.1
                                                                                 /get/random2.3ms
321435393.761ms
-0.000ms
-0.000ms
4076574188583769414937274��&��5���Qg����2.3
                                           /get/random34
                                                        /get/random30
                                                                     /get/random127.0.0.1:41152 127.0ms
-902563470739474328704365�^��b�Z'�v�;�ق2ms
-108291519172398526629057��1m�EU]�K�����9�2.3
                                             /get/random34
                                                          /get/randomms
0.000ms
-0.000ms

Reproduces how often: Always

Versions

Crystal 0.29.0 (2019-07-20)

Kemal 0.25.2

LLVM: 6.0.1
Default target: arm-unknown-linux-gnueabihf

Linux master10 4.14.34-hypriotos-v7+ #1 SMP Sun Apr 22 14:57:31 UTC 2018 armv7l GNU/Linux

Additional Information

The object file of the crystal compiler was compiled on Ubuntu/Laptop and then linked to create the executable on Hypriot/RaspberryPi.

All 41 comments

I observed similar garbled output even after isolating puts into a dedicated fiber.

Totally offtopic, but there's no need to do Random.new every time. You can just use the top-level rand method.

I can't reproduce this on Mac OSX.

I don't think Crystal works fine on a Raspberry Pi. At least it's not something we officially support it so it'll be very hard for us to give help on this.

I haven't observed this issue on Linux. Just on Raspberry Pi 3. Since Crystal works on Linux and RaspPi runs Linux, I figured Crystal would work on RaspPi if Crystal could be cross compiled to run on RaspPi (even if this configuration is not supported). Clearly, that is not the case :)

i feel like this is a unique and specific use-case. it also can't be reproduced on WSL or native Debian

I am not sure if this an issue with cross compilation (that is causing the compiled program to output garbage) or the Crystal compiler on Raspberry Pi. Any suggestions to verify this is welcome.

So far, I have tried building Crystal on Raspberry Pi 3B by 1) using the cross-compiled compiler and 2) bootstrapping the compiler (crystal-lang/boostrap-script). First approach failed due to insufficient memory while the second failed due to failure in building ruby in the first stage of bootstrapping.

That said, I am looking for a way to compile Crystal programs to run on Raspberry Pi 3B. Doing so on Raspberry Pi would be ideal.

@rvprasad Do yo know what architecture is Raspberry Pi running on? Or check if this gives true?

{{ puts flag?(:aarch64) }}

If so, #7983 might fix this issue.

Rasp Pi 3B uses Armv7 (v7l) processor. (The target triple is armv6-unknown-linux-gnueabihf.) So, the above test fails.

I did some digging with gdb. For this, I changed

puts "%.3fms" % (elapsed_time.total_nanoseconds / 1e6)

to

tmp1 = "%.3fms" % (elapsed_time.total_nanoseconds / 1e6)
puts tmp1

When a single request was submitted to the server, I found all of the strings and puts output were well-formed.

(gdb) info locals
ret = 0x245be8
elapsed_time = {seconds = 0, nanoseconds = 190677}
tmp1 = 0x260e60
(gdb) print &(tmp1->c)
$5 = (UInt8 *) 0x260e6c "-0.000ms"
(gdb) print &(ret->c)
$9 = (UInt8 *) 0x245bf4 "[680012]"

However, while elapsed_time.total_nanoseconds was 190677, the second string was "-0.000ms". This is wrong.

When two concurrent requests were submitted to the server, ret was well-formed but tmp1 was not.

(gdb) info locals
ret = 0x245990
elapsed_time = {seconds = 0, nanoseconds = 191874}
tmp1 = 0x23de10
(gdb) print &(tmp1->c)
$10 = (UInt8 *) 0x23de1c "-501870678723681322067296"
(gdb) print &(ret->c)
$11 = (UInt8 *) 0x24599c "[926334]"

Changing puts "%.3fms" % (elapsed_time.total_nanoseconds / 1e6) to puts "#{elapsed_time.total_nanoseconds / 1e6}ms" reduces the frequency of garbled output and the amount of garbled output in each instance.

There is an issue with hard/soft floats on ARM 32 targets. Specifying the target CPU while compiling may help LLVM backend to generate proper code (or not).

@ysbaddaden if you are suggesting floating point calculations (not data movements) involving floating point values) pose issues on ARM 32 targets, then I doubt that is the case as I was able to generate garbled output with the below minimal program.

require "kemal"

get "/random" do |env|
  tmp1 = "%fms" % 3.333
  puts tmp1
  tmp2 = "%dms" % 3.333
  puts tmp2
  ""
end

Kemal.config.logging = false
Kemal.config.port = 1234
puts "Serving at 0.0.0.0:1234"
Kemal.run

Submitting a single request to the above server produces the following output:

$ ./server 
Serving at 0.0.0.0:1234
-158972576233719173059H�/getget

get/random/getget

/ws/randomms
3ms

To cross-compile the compiler,

  1. Ran ./bin/crystal build src/compiler/crystal.cr --warnings all --cross-compile --target "armv6-unknown-linux-gnueabihf" -s -D without_openssl -D without_zlib on a Ubuntu box.
  2. Copied crystal.o onto a Raspberry Pi 3B running Raspbian 9.9 (from Hypriot).
  3. Ran sudo cc 'crystal.o' -o '/usr/lib/crystal/bin/crystal' -rdynamic /usr/share/crystal/src/llvm/ext/llvm_ext.o `/usr/bin/llvm-config-6.0 --libs --system-libs --ldflags 2> /dev/null` -lstdc++ -lpcre -lm -lgc /usr/share/crystal/src/ext/libcrystal.a -levent -lrt -ldl -lpthread -L/usr/lib -L/usr/local/lib after copying crystal/src into /usr/share/crystal/src (following the instructions here).

It seems the format specifier for float (%f) is broken.

Well, the stringifier for floats uses float arithmetic and that might be broken as @ysbaddaden mentioned. Why do you doubt that this could be the issue? Your example does actually support that.

For %f we call sprintf, there might be an issue there. But it's hard if we can't reproduce it on Mac and Linux.

@straight-shoota TIL stringifying float values involves floating point calculations/arithmetic operations :) That said, it seems the floating point operations (e.g., the division operation worked fine) in the application code seems to work fine while the ones used to stringifying floats seem to be broken. Nevertheless, I will check this.

Don't use floats, at all. Use integers instead —and don't round a float!

The problem with hard/soft floats is that the caller/called ABI doesn't expect the value in the same registers (cpu integers vs fpu registers), which results in garbage.

The format_buf and float args to the invocation of LibC.snprintfat src/string/formatter.cr:265 are valid but the value filled into temp_buf is invalid. This is in line with @asterite's comment.

@asterite I was able to reproduce the issue in a Raspbian container running on a Linux host. The instructions to use the Docker image to create a compiler is available in docker-files repo. Once you have built the compiler and shards in the container, the issue can be repro-ed by building and executing the example. You will need to install curl in the container (via apt install curl) to repro the issue.

Is the linux host running on armv6 as well or is it an x86 architecture? Raspbian has an x86 variant, so I'm not sure.

The Linux host is a x86 machine while the Raspbian is running on armv7l emulator.

Please read the following:

Likely a duplicate of #6954

All ARM isues are related to a hard/soft float issue. Specifying a CPU when compiling may fix the issue (it did for my qemu raspbian image) or not (fails on scaleway armv7 server, and reportedly rpi3).

Reproducing isn't an issue. It's quite easy to. The problem is understanding why LLVM won't enable hard floats when compiling. Maybe we miss some LLVM attributes on functions? I recently noticed that clang sets a bunch of them, related to FP and soft-float.

Maybe I'm saying something stupid, but for hard float you should specify cpu, fpu and float abi: -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16

@konovod I tried doing this today but I gave up as I could not figure out how to pass these flags to LLVM compiler from the command line interface of crystal compiler.

@ysbaddaden Thanks for the pointer. I read thru some of these while digging around. BTW, what is the best way to pass the compiler options mentioned by @konovod to LLVM via crystals CLI? That would allow more experimentation.

The format_buf and float args to the invocation of LibC.snprintfat src/string/formatter.cr:265 are valid but the value filled into temp_buf is invalid.

What do you mean?

Maybe we just miss the "use-soft-float"="false" attribute on every functions? It should always be false, unless the target's arch is arm and the ABI is *eabi.

Maybe we miss the ability to trigger thumb mode; when -mthumb is specified clang changes the target from armv7-linux-gnueabihf to thumbv7-linux-gnueabihf for example (i.e. replaces arm with thumb).

@asterite I wasn't sure at which program point were the memory addresses getting messed up. So, I dug around and found that format_buf (the format string) and float (the value to be printed) arguments passed to LibC.snprintf at src/string/formatter.cr:265 were indeed the values they should have been. Assuming the other input argument len was valid, the fault resulting in the garbled output is encountered sometime after the LibC.snprintf is invoked at src/string/formatter.cr:265. This kinda aligns with @ysbaddaden suggestion -- ABI mismatch between the app and libc.

The failure occurs only when the foo() is invoked within the context of a Kemal Handler (at call2 and not at call1). This seems to suggest the issue may be specific to Kemal and not related to ABI or hard/soft floats. [I have mentioned this in kemalcr/kemal/issues/543.]

require "kemal"

def foo() 
  tp1 = "%fms" % 3.333
  puts tp1
  tp2 = "%dms" % 3.333
  puts tp2
end 

foo()   # call1

get "/random" do |env|
  foo() # call2
  ""
end

Kemal.config.logging = false
Kemal.config.port = 1234
puts "Serving at 0.0.0.0:1234"
Kemal.run

Output upon requesting http://127.0.0.1:1234/random

root@6f9b4df16a4b:/t/t# ./t
3.333000ms
3ms
Serving at 0.0.0.0:1234
-158972575668831573782�{#/getget

get/random/getget

/ws/randomms
3ms

What happens if you just use http server from the standard library? I don't think kemal is involved here.

The below snippet fails as well.

require "http/server"

def foo() 
  tp1 = "%fms" % 3.333
  puts tp1
  tp2 = "%dms" % 3.333
  puts tp2
end 

foo()   # call1

server = HTTP::Server.new do |ctxt|
  foo() # call2
  ctxt.response.print "Boo"
end

address = server.bind_tcp 1234
puts "Serving at #{address}"
server.listen

Output

3.333000ms
3ms
Serving at 127.0.0.1:1234
-158972575660657402946.

keep-alive

Keep-Alive

ConnectionHoidentity*/*ms
3ms

Cool! Now you could try reducing it to using spawn and Fiber.yield in between calls, and spawning a lot of fibers and see if it still reproduces.

The less code we have that reproduces the problem, the easier to find the cause.

(I'll send such code later today)

@rvprasad Try this:

chan = Channel(Nil).new

1000.times do |i|
  spawn do
    tp1 = "%fms" % 3.333
    puts tp1
    Fiber.yield
    tp2 = "%dms" % 3.333
    puts tp2
    chan.send(nil)
  end
end

1000.times do
  chan.receive
end

@asterite Thanks. I was able to repro the issue in the Docker container.

spawn do
  tp1 = "%fms" % 3.333
  puts tp1
end

sleep(10)

What output you get with that program?

It produces the following output in a container.

root@4b026a742082:/t/t# crystal run -d s.cr 
Using compiled compiler at `.build/crystal'
-158972575643887689414`?� I�l@Z~?6�~΄/t/t//t/t@@ms

But it works fine if you don't do it inside spawn?

Yes.

Also, output is fine when spawn is replaced with 1.times.

@ysbaddaden In the below snippet, foo prints 1.000000 2.000000 when invoked in mydo but it prints 0.000000 0.000000 when invoked in the fiber. Since the same code fragment provides both "correct" and "incorrect" outcomes, I doubt if this is an ABI related issue but...

The float argument to corresponding calls to LibC.snprintf in string/formatter.c in the above invocations are identical, i.e., 1.0 and 2.0. So, it seems the context swapping is interfering with the call to an external library function.

def foo
  i = {1.0, 2.0}
  printf("%f %f\n", *i)
end

def mydo(&x)
  x.call()
end

mydo(&->foo)
spawn foo

sleep(2)

Any ideas on what/where might be the issue?

We save/restore FPU registers on ARM when switching fibers, and it leads to the same hard/soft float codegen bug.

I want to experiment with different LLVM compiler flags during code generation, e.g., related to FPU and NEON. Any ideas where and how these flags can be added/modified/specified?

Was this page helpful?
0 / 5 - 0 ratings