caddy+QUIC crash when serving static content

Created on 9 Feb 2017  路  33Comments  路  Source: caddyserver/caddy

1. What version of Caddy are you running (caddy -version)?

Compiled from source, (untracked dev build).

2. What are you trying to do?

Serve web pages with QUIC over lossy and slow network connections.

3. What is your entire Caddyfile?

$hostname:443
root $webRoot
gzip
log $logFolder/caddy.access.log
errors $logFolder/caddy.error.log

tls $certFolder/cert.pem $certFolder/key.pem {
    protocols tls1.2 tls1.2
}

4. How did you run Caddy (give the full command and describe the execution environment)?

caddy -quic -conf="a_folder/config/netem.Caddyfile" -pidfile="a_pidfile"

5. What did you expect to see?

Caddy serving web pages without crashing, as it does with the same settings but using HTTP/1.1 or HTTP/2.

6. What did you see instead (give full error messages and/or log)?

goroutine 2716 [running]:
panic(0x9cd760, 0xc4200120f0)
    /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/lucas-clemente/quic-go.(*stream).AddStreamFrame(0x0, 0xc420edd260, 0x0, 0x0)
    /home/fille/go/src/github.com/lucas-clemente/quic-go/stream.go:238 +0x48
github.com/lucas-clemente/quic-go.(*stream).CloseRemote(0x0, 0x0)
    /home/fille/go/src/github.com/lucas-clemente/quic-go/stream.go:255 +0x79
github.com/lucas-clemente/quic-go/h2quic.(*Server).handleRequest(0xc420225720, 0xd28660, 0xc4209bb860, 0xd2d5c0, 0xc4202ba120, 0xc4206f2f50, 0xc420068270, 0xc420068340, 0x0, 0x0)
    /home/fille/go/src/github.com/lucas-clemente/quic-go/h2quic/server.go:160 +0x762
github.com/lucas-clemente/quic-go/h2quic.(*Server).handleStream.func1(0xc420225720, 0xd28660, 0xc4209bb860, 0xd2d5c0, 0xc4202ba120, 0xc420068270, 0xc420068340)
    /home/fille/go/src/github.com/lucas-clemente/quic-go/h2quic/server.go:108 +0xbf
created by github.com/lucas-clemente/quic-go/h2quic.(*Server).handleStream
    /home/fille/go/src/github.com/lucas-clemente/quic-go/h2quic/server.go:119 +0x116

7. How can someone who is starting from scratch reproduce this behavior as minimally as possible?

Load web pages with QUIC continuously over bad network links (happens on many different web pages). I'm using traffic control within Linux in order to create different network conditions.

I've created a simple proof-of-concept which can spawn a shell with poor network conditions. The problem with caddy is something that possibly only occurs when the client resides on poor networks, but I'll need to figure this out by testing. The tool I've created is only tested on Ubuntu 16.10: https://github.com/hansfilipelo/netem

I'll attach a list of my test domains that I've scraped and are serving as static content with Caddy.
urls.txt

Most helpful comment

@hansfilipelo I was about to write the same about my comment - reading it now it sounds angry to me as well ;) Turns out I didn't read your steps carefully enough, so of course I couldn't reproduce the issue. Sorry for that!

I'm now loading the wgetted version of login.tmall.com via QUIC. So thanks for the detailed steps.

All 33 comments

I've confirmed that this happens on regular network links as well, not just limited links.

How long into a request does it take for this panic to occur?

It differs. Never instantly and sometimes after 20 site loads, sometimes after 150.

Have you tried disabling timeouts? timeouts none

@mholt In my Caddyfile?

Yes :)

@mholt Running tests right now. First of all, thanks for your help.

What does the option do? It never times out a connection from any client?

The option did not solve my issue.

What does the option do? It never times out a connection from any client?

Right.

The option did not solve my issue.

Okay. Is there a particular request that causes this? Can give the request and response?

Also /cc'ing @marten-seemann

Digging through the access logs makes it clear that it is a certain request that seems to trigger this crash. I'll make some more tests and deliver a Chromium trace for when it happens.

Is there anything you would like me to get for you? Can I trace the go-application in some way that would help you out? I can't promise anything (it depends on what you ask for) but I mean - a more stable web server who implements QUIC is better for the sake of both the Internet and mankind.

Here's a chrome://net-internals dump of when it happens.

net-internals-log.json.tar.gz

Which commit of QUIC did you build Caddy with? Can you send me the ID?

Could you also enable QUIC logging (as described in https://github.com/mholt/caddy/wiki/QUIC#troubleshooting) and post the log file here?

Updated to latest right now, ce7d3db1be57b04975922a3988c02df8f993293f, and am still able to reproduce the issue. I'll look deeper into the issue tomorrow and see if I can send you a log or help in any other way.

Here's the end of the output from Caddy, with the log alternative enables, when the crash occurs:

    Estimated RTT: 2ms
    <- &frames.StopWaitingFrame{LeastUnacked: 0x70}
<- Reading packet 0xfe (42 bytes) for connection 3b6845569e5ab6c3 @ 10:57:41.228
    <- &frames.AckFrame{LargestAcked: 0x1ee, LowestAcked: 0xcf, AckRanges: []frames.AckRange(nil), DelayTime: 5碌s}
    Estimated RTT: 2ms
    <- &frames.StopWaitingFrame{LeastUnacked: 0x70}
<- Reading packet 0xff (42 bytes) for connection 3b6845569e5ab6c3 @ 10:57:41.228
    <- &frames.AckFrame{LargestAcked: 0x1f0, LowestAcked: 0xcf, AckRanges: []frames.AckRange(nil), DelayTime: 5碌s}
    Estimated RTT: 2ms
    <- &frames.StopWaitingFrame{LeastUnacked: 0x70}
<- Reading packet 0x100 (42 bytes) for connection 3b6845569e5ab6c3 @ 10:57:41.228
    <- &frames.AckFrame{LargestAcked: 0x1f2, LowestAcked: 0xcf, AckRanges: []frames.AckRange(nil), DelayTime: 7碌s}
    Estimated RTT: 2ms
    <- &frames.StopWaitingFrame{LeastUnacked: 0x70}
<- Reading packet 0x101 (44 bytes) for connection 3b6845569e5ab6c3 @ 10:57:41.228
    <- &frames.AckFrame{LargestAcked: 0x1f5, LowestAcked: 0xf8, AckRanges: []frames.AckRange(nil), DelayTime: 7碌s}
    Estimated RTT: 2ms
    <- &frames.StopWaitingFrame{LeastUnacked: 0x84}
<- Reading packet 0x102 (107 bytes) for connection 3b6845569e5ab6c3 @ 10:57:41.248
    <- &frames.AckFrame{LargestAcked: 0x1fb, LowestAcked: 0x1e9, AckRanges: []frames.AckRange(nil), DelayTime: 19.504ms}
    Estimated RTT: 2ms
    <- &frames.StopWaitingFrame{LeastUnacked: 0xfc}
    <- &frames.StreamFrame{StreamID: 3, FinBit: false, Offset: 0x3a8, Data length: 0x32, Offset + Data length: 0x3da}
<- Reading packet 0x103 (39 bytes) for connection 3b6845569e5ab6c3 @ 10:57:41.248
    <- &frames.RstStreamFrame{StreamID:0x15, ErrorCode:0x6, ByteOffset:0x0}
-> Sending packet 0x1fc (42 bytes) @ 10:57:41.248
    -> &frames.StopWaitingFrame{LeastUnacked: 0x1fc, PacketNumberLen: 0x2}
    -> &frames.AckFrame{LargestAcked: 0x103, LowestAcked: 0xfc, AckRanges: []frames.AckRange(nil), DelayTime: 14.091碌s}
    -> &frames.RstStreamFrame{StreamID:0x15, ErrorCode:0x0, ByteOffset:0x0}
GET web.hfelo.se/login.tmall.com/index.html, on data stream 21
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xf8 pc=0x8af4c8]

goroutine 138 [running]:
panic(0x9cda80, 0xc420012100)
    /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/lucas-clemente/quic-go.(*stream).AddStreamFrame(0x0, 0xc4202f7b30, 0x0, 0x0)
    /home/fille/go/src/github.com/lucas-clemente/quic-go/stream.go:238 +0x48
github.com/lucas-clemente/quic-go.(*stream).CloseRemote(0x0, 0x0)
    /home/fille/go/src/github.com/lucas-clemente/quic-go/stream.go:255 +0x79
github.com/lucas-clemente/quic-go/h2quic.(*Server).handleRequest(0xc42021d520, 0xd2a660, 0xc4202c0b60, 0xd2f5c0, 0xc420438b40, 0xc4201a2000, 0xc4202a6410, 0xc4202a64e0, 0x0, 0x0)
    /home/fille/go/src/github.com/lucas-clemente/quic-go/h2quic/server.go:160 +0x762
github.com/lucas-clemente/quic-go/h2quic.(*Server).handleStream.func1(0xc42021d520, 0xd2a660, 0xc4202c0b60, 0xd2f5c0, 0xc420438b40, 0xc4202a6410, 0xc4202a64e0)
    /home/fille/go/src/github.com/lucas-clemente/quic-go/h2quic/server.go:108 +0xbf
created by github.com/lucas-clemente/quic-go/h2quic.(*Server).handleStream
    /home/fille/go/src/github.com/lucas-clemente/quic-go/h2quic/server.go:119 +0x116

@hansfilipelo Thank you. Can you please provide the complete log file?

@marten-seemann
caddylog.log.gz

@hansfilipelo Thanks for the log file, that was really helpful.

Can you please rebuild Caddy (don't forget to get the QUIC update first) and try again? I'm pretty certain the bug is fixed now, but I'd like to be 100% sure.

@marten-seemann I'm not understanding "get the QUIC update first". Could you specify it more clearly?

I've fetched the latest quic-go in $GOPATH/src/github.com/lucas-clemente/quic-go as well as caddy itself and then rebuilt caddy. I am still able to replicate the issue but I'm possibly doing something wrong.

@hansfilipelo Run go get -u github.com/lucas-clemente/quic-go then rebuild Caddy. Doing a git pull isn't enough because it doesn't rebuild the package.

Running the following:

go get -u github.com/lucas-clemente/quic-go
go get -u github.com/mholt/caddy

I am still able to reproduce the issue using caddy in $GOPATH/bin. Same result if I compile with ./build.sh and use the output file from it directly.

This is probably unrelated, but for Caddy, you have to go get -u github.com/mholt/caddy/caddy because of the import structure.

Try clearing $GOPATH/pkg and then run go install on those two packages (including the corrected caddy package path).

I am impressed by your patience with me. Still having the same issue after running:

rm -r $GOPATH/pkg/*
go get -u github.com/mholt/caddy/caddy
go install github.com/mholt/caddy/caddy

@hansfilipelo Also make sure you're running the correct Caddy binary, the one in $GOPATH/bin. which caddy will tell you which one you're running, assuming you're running Caddy with caddy.

@mholt I am running that binary, yes. I specify it with a hard coded path just to be sure when testing this.

@hansfilipelo: Is there any way I can easily run your test?

@marten-seemann It is reproducible when using a saved static version of the site "login.tmall.com". I'll usually have to try fetching the site ~4-6 times for the error to occur. Steps to reproduce:

  1. Use the following ~/.wgetrc:
header = Accept-Language: en-us,en;q=0.5
header = Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
header = Connection: keep-alive
user_agent = Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0
referer = /
robots = off
  1. Fetch login.tmall.com with:
wget --page-requisites --html-extension --convert-links --span-hosts --tries=2 --timeout=10 login.tmall.com
  1. Serve this static site with caddy over QUIC to a browser with QUIC support (Chromium/Chrome/Opera).

@hansfilipelo Sorry, I can't reproduce the issue. I'm not sure I got the setup right, because this is not a minimal working example. Can you please provide a minimal example, i.e. one where I 1. don't have to download any website first, 2. I don't have to edit my /etc/hosts and 3. I don't have to generate certificates first. Please also post the Caddyfile for this example.

@marten-seemann This is a minimal example.

  1. Can't really fetch this site with Caddy over QUIC without downloading it first. I'm not the developer of that site and have no affiliation with tmall.com and I do not even know what company tmall is - hence I have no idea what they are doing to trigger the caddy crash.

  2. You can serve the downloaded site on any domain, including your own that has a valid certificate (which you will need since you are running QUIC in a Chromium based browser) so you do not need to edit your hosts file. This is thanks to wget rewriting links on the page you download to relative links.

  3. See 2.

Caddyfile is posted in the first post of the issue but here it is again with comments:

$hostname:443 # Here you put your hostname
root $webRoot # This is where you'll store the index of the site you've fetched with wget
gzip
log $logFolder/caddy.access.log
errors $logFolder/caddy.error.log # Log lines probably does not matter

tls $certFolder/cert.pem $certFolder/key.pem { # These could be generated automatically if DNS is set globally
    protocols tls1.2 tls1.2
}

Did not mean to make my comment sound angry - but reading it I can see that it is. I'm sorry for that. The internet does not need more angry comments and I'm not angry - rather grateful for getting help in solving this issue.

Remember that you can serve the site through a sub folder as well (which I'm doing). So you can store it in $webRoot/login.tmall.com for example and then use your browser to fetch yourdomain.tld/login.tmall.com.

One could also start Chromium with the flag --origin-to-force-quic-on=yourdomain.tld:443

@hansfilipelo I was about to write the same about my comment - reading it now it sounds angry to me as well ;) Turns out I didn't read your steps carefully enough, so of course I couldn't reproduce the issue. Sorry for that!

I'm now loading the wgetted version of login.tmall.com via QUIC. So thanks for the detailed steps.

That was a bit tricky!
I think this issue should be fixed now, I don't get any crashes more. Can you please test again, @hansfilipelo? Thanks for that catch!

I am unable to replicate the issue now. Great work @marten-seemann.

Fixed with commit c7b4ad6e in quic-go (https://github.com/lucas-clemente/quic-go/commit/c7b4ad6e80bcc617ad58db5486c279eafa74415a).

Wow, thanks both of you. @marten-seemann I'm continuously impressed with your work!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

xfzka picture xfzka  路  3Comments

ericmdantas picture ericmdantas  路  3Comments

SteffenDE picture SteffenDE  路  3Comments

jgsqware picture jgsqware  路  3Comments

treviser picture treviser  路  3Comments