Caddy: Frequent hangs when using http/2 push

Created on 2 Dec 2020 · 6 Comments · Source: caddyserver/caddy

My team and I use Caddy as a reverse proxy, and we rely heavily on HTTP/2 Push. We started with v2.2.1, and on certain configurations we experienced random hangs.

We thought the problem had been fixed by this pull request: https://github.com/caddyserver/caddy/pull/3875 (more details in the net/http issue: https://github.com/golang/go/issues/42534), because it hadn't occurred in the couple of weeks since it was merged. But today we found another configuration that makes the problem very easy to reproduce again, on any of the recent Caddy versions.

Caddy behavior when hangs happen

  • The web browser hangs indefinitely on some resources while downloading the page's assets
  • When it happens, Caddy doesn't output any error messages
  • During the hang, if I try to download those "pending" assets with a different browser or with curl from the console, it might work for some assets (usually smaller ones), but many of them hang indefinitely. In other words, when the problem happens for one user, it effectively makes the whole website inaccessible to other users too, until that one user hits the "stop" button
  • When I hit the "stop" button in a browser tab with hanging downloads, Caddy outputs lots of errors to the console:
    ERROR http.handlers.reverse_proxy aborting with incomplete response {"error": "http2: stream closed"}

How to reproduce

Whether and how often it occurs depends on the proxied website, the Caddy config, and some random factors, so the frequency differs from one machine to another. The steps are:

  1. Start caddy with the configuration provided below
  2. Open the developer tools Network tab to see the hangs; enabling "Disable cache" helps reproduce the problem faster, but is not necessary
  3. Navigate to the / page in a browser (i.e., https://terem-pro.localhost)
  4. Wait until it fully loads
  5. If it didn't hang in step 4, press F5 and again wait until it fully loads; repeat several times if needed

On our test server, it usually hangs after 2-3 reloads. On some devices it might take 10-15 attempts, but it still hangs at some point.

Caddy version

Reproduces on:

  • v2.2.1
  • v2.3.0-beta.1
  • current head of master (4cff36d731390915649261f0e9c088be0eeafcf1), "caddyauth: Use buffered channel passed to signal.Notify"

Built with:
CADDY_RACE_DETECTOR=1 xcaddy build <revision>

Caddy configuration

https://terem-pro.localhost {
    handle {
        reverse_proxy https://www.terem-pro.ru {
            header_up host {http.reverse_proxy.upstream.host}
        }

        push / {
            /local/components/terem/catalog.list/templates/index.best.seller/style.css
            /local/components/terem/new_services.content/templates/home.banner.lots/style.css
            /local/components/terem/slider.blocks/templates/slider.useful/style.css
            /local/components/terem/standard.blocks/templates/call.action.white/style.css
            /local/components/terem/review.list/templates/carousel.home/style.css
            /local/components/terem/standard.blocks/templates/promo.red.home/style.css
            /local/components/terem/promotion.list/templates/home.slider/style.css
            /local/components/terem/form.form/templates/template.pdf/style.css
            /local/templates/terem/components/bitrix/menu/template.header.menu.top/desktop-menu.css
            /local/components/terem/form.form/templates/template.taxi/style.css
            /assets/resources/css/home.css
            /local/templates/terem/components/bitrix/menu/template.header.menu-mobile/style_menu.css
            /local/components/terem/catalog.type.list/templates/.default/style.css
            /assets/resources/css/styles.css
            /bitrix/cache/css/s1/terem/template_ad73b02503569e1113abf0b013fdbb28/template_ad73b02503569e1113abf0b013fdbb28_v1.css?16067202133580
            /bitrix/cache/css/s1/terem/page_074396ca6d41424fe878cb365c109aa1/page_074396ca6d41424fe878cb365c109aa1_v1.css?160672023225970
        }
    }
}

System environment:

Both the test server and my PC run Ubuntu 20.04.1 LTS (x86_64), with Caddy installed natively (no Docker).

Highlights

  • It allows an easy denial-of-service attack: a single client can make the whole server non-functional
  • There seems to be no timeout, so a single hang might keep the server locked for a long time
  • There's no log message when it hangs. The only message appears when the user hits "stop", and it isn't very useful, because the same message appears when a user simply stops a normal transfer midway. So this or similar situations might be happening on production servers right now, and if they happen rarely enough, it might be hard to catch the problem or to tell it apart from a random network glitch
Labels: bug, upstream

All 6 comments

There's an awful lot going on here... can you simplify it down more?

How can we reproduce it? You haven't provided the site files so I can't use that config to reproduce the behavior.

Does it happen without reverse proxying?

Does it happen without reverse proxying to an external Russian server?

It's likely that this is a bug in the Go x/net libraries, as our code doesn't deal with the details of HTTP/2 streams.

Have you been able to verify this is not a browser bug?

Okay, I took a few minutes and was able to reproduce this in Chrome -- I had mistakenly thought you were trying to push local static resources but then I realized you had no file_server -- you're proxying everything to an external server, including pushes. I noticed that none of the requests stuck with "pending" in the Status column have "push" in the Initiator column -- in other words, it does not appear to be the push that is hanging.

When you close the browser it then says "http2: stream closed" (as you noted) and also "client disconnected" errors. If the response is incomplete, this is most definitely a bug in either:

  • The Go x/net libraries
  • The browser
  • Or the remote/upstream Russian server

Or a combination of all three.

Edit: Enable debug mode for details about what Caddy is pushing:

{
    debug
}
...

Also, given that Chrome intends to remove server push I doubt that there will be much interest in fixing any lingering bugs, even upstream.

I reopened the ticket in Go's repository.

Will be fixed in Go 1.16.

Thanks so much for the time spent in addressing the issue! And thanks for understanding.
