Crystal: Index out of bounds (IndexError) with HTTP::Server

Created on 25 Sep 2019  路  17Comments  路  Source: crystal-lang/crystal

Hi there!

After updating my Kemal web-app on Heroku with Crystal 0.31.0 I started to get a lot of error messages Index out of bounds (IndexError).

Minimal reproducible code:

# main.cr
require "http/server"

server = HTTP::Server.new([
  HTTP::LogHandler.new
]) do |context|
  context.response.content_type = "text/plain"
  context.response.print "Hello world!\n" * 1000 # << HERE
end

puts "Listening on http://0.0.0.0:3000"

server.listen(3000)

It is reproducible only with --release flag and large body in response.

crystal build --release main.cr

Each request causes an error

GET / - Unhandled exception:
Index out of bounds (IndexError)
  from /snap/crystal/118/share/crystal/src/crystal/thread_local_value.cr:26:7 in 'unbuffered_write'
  from /snap/crystal/118/share/crystal/src/http/server/response.cr:217:11 in 'unbuffered_write'
  from /snap/crystal/118/share/crystal/src/http/server/handler.cr:0:3 in 'call'
  from /snap/crystal/118/share/crystal/src/http/server/request_processor.cr:48:11 in '->'
  from /snap/crystal/118/share/crystal/src/fiber.cr:255:3 in 'run'
  from ???

P.S.

crystal -v                         
Crystal 0.31.0 [8d4715a59] (2019-09-24)

LLVM: 8.0.0
Default target: x86_64-unknown-linux-gnu
bug topicconcurrency

Most helpful comment

llvm 8.0.0 or 8.0.1 makes no difference. But the release mode does as @mamantoha said.
The error is also present in llvm 4. I checked since that was updated recently on the linux builds.

All 17 comments

BTW, can't reproduce on macOS.

crystal -v
Crystal 0.31.0 (2019-09-23)

LLVM: 8.0.1
Default target: x86_64-apple-macosx

Notice it's LLVM: 8.0.1 on MacOS but LLVM: 8.0.0 on Linux.

llvm 8.0.0 or 8.0.1 makes no difference. But the release mode does as @mamantoha said.
The error is also present in llvm 4. I checked since that was updated recently on the linux builds.

I don't get it still. My guess is that there is something odd with the state of the Slice somewhere.

  • When adding some dprintf along the way the problems is solved 卢卢. Patch
  • When wrapping the raise in check_size the problem is also solved 卢卢. Patch
  • It does not seems to be an issue of inline vs call. Using @[NoInline] in that method didn't solved the issue.

Any other guesses? Otherwise, maybe the second patch is a workaround for now. It is only affecting the path that will raise an exception...

@bcardiff Is this an error on Crystal's side? It only happens in linux? Is it related to some LLVM version?

I鈥檓 not sure.

I didn鈥檛 bisect on older crystal versions.

I checked llvm-ir (Everything seems reasonable there), but I didn鈥檛 dare to check asm. It already felt as an escape room.

If I'm seeing the stack trace correctly, the error might be triggering here:

https://github.com/crystal-lang/crystal/blob/6128faa77a9f9eb22ea23883870ab1f671dce939/src/io/evented.cr#L75

I would try adding, just before that line:

LibC.printf("slice.size: %d written: %d\n", slice.size, bytes_written.to_i!)

and seeing if there's anything strange there.

(I can't try it because on mac it seems to work fine)

From there it goes to Slice#+ which call check_size.

I could reproduce it on docker.

From there it goes to Slice#+ which call check_size.

And that's where the exception happens, right?

My idea is to put debug info before that call to see if there's something wrong. Maybe bytes_written is something that's not -1 but also something that's not good, I don't know.

Would you mind checking to see if:

# https://github.com/crystal-lang/crystal/blob/master/src/crystal/thread_local_value.cr
struct Crystal::ThreadLocalValue(T)
  def get(&block : -> T)
    th = Thread.current
    if !@values[th]?
      @values[th] = yield
    else
      @values[th]
    end
  end
end

require "http/server"

server = HTTP::Server.new([
  HTTP::LogHandler.new
]) do |context|
  context.response.content_type = "text/plain"
  context.response.print "Hello world!\n" * 1000 # << HERE
end

puts "Listening on http://0.0.0.0:3000"

server.listen(3000)

"fixes" the issue?

@bcardiff Could you give me step by step instructions to be able to reproduce the bug? I'm not super familiar with Docker not with which Dockerfile I should be using. I'd like to try to debug the problem and see if I can find something.

@omarroth I can confirm this "fixes" an issue

@mamantoha Could you also try this? It's slightly different than the other comment (which is not thread-safe).

# https://github.com/crystal-lang/crystal/blob/master/src/crystal/thread_local_value.cr
struct Crystal::ThreadLocalValue(T)
  def get(&block : -> T)
    @mutex.sync do
      th = Thread.current
      @values.fetch(th) do
        @values[th] = yield
      end
    end
  end
end

require "http/server"

server = HTTP::Server.new([
  HTTP::LogHandler.new
]) do |context|
  context.response.content_type = "text/plain"
  context.response.print "Hello world!\n" * 1000 # << HERE
end

puts "Listening on http://0.0.0.0:3000"

server.listen(3000)

@asterite that does not fix the issue. I confirm that would seem to be gone by @omarroth patch but also with the following

struct Crystal::ThreadLocalValue(T)
  def get(&block : -> T)
    th = Thread.current
    @mutex.sync do
      if !@values[th]?
        @values[th] = yield
      else
        @values[th]
      end
    end
  end
end

my guess is that the double lookup in the hash is making things slightly slower and hiding the underlying issue.

@asterite to reproduce it from mac

$ docker run -it --rm --privileged --pid=host -v $(pwd):/src -w /src crystallang/crystal:0.30.1 bash
  > apt-get update && apt-get install -y software-properties-common lsb-release wget curl
  > curl https://apt.llvm.org/llvm.sh -o llvm.ignoreme.sh
  > chmod +x llvm.ignoreme.sh
  > ./llvm.ignoreme.sh 8
  > make clean clean_cache crystal
  > ./bin/crystal run --release http.ignoreme.cr

$ docker exec -it $(docker ps -q) bash
  > curl http://localhost:3000

(sorry I have a global *.ignoreme.* rule)
I can reproduce it directly from 0.31.0 image also.

8239 was updated with yet another workaround that works on my side. If somebody else wants to confirm with a 馃帀 reaction it would be great.

Using git bisect, 19d2a44b40039350567c15def4b2825c02c9bb58 is the first commit where I'm able to produce the issue.

However, it still occurs when reverting the above commit on master, so I don't expect that it actually produced a regression. As @bcardiff suggested the actual issue is likely hidden by various tweaks to performance.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

grosser picture grosser  路  3Comments

nabeelomer picture nabeelomer  路  3Comments

Papierkorb picture Papierkorb  路  3Comments

lbguilherme picture lbguilherme  路  3Comments

oprypin picture oprypin  路  3Comments