Linkerd2: Not respecting SETTINGS_HEADER_TABLE_SIZE when set to 0

Created on 25 Jul 2019 · 6 comments · Source: linkerd/linkerd2

Bug Report

What is the issue?

When using the nginx Kubernetes Ingress with gRPC, multiple unary calls result in a 502 Bad Gateway response from the nginx ingress. The ingress controller logs show the following error: upstream sent invalid http2 table index: 66, where the upstream is an ASP.NET Core 3.0-Preview7 service fronted by a Linkerd sidecar proxy. Upon further investigation, it seems this is explicitly not supported by nginx, which informs upstream services of the fact by setting SETTINGS_HEADER_TABLE_SIZE to 0 (source).

nginx explicitly announces that it does not support dynamic header compression by sending a SETTINGS_HEADER_TABLE_SIZE value of 0, see here. Any attempt by an upstream server to use indexes from the dynamic range is a bug in the upstream server (note that at least the grpc-go implementation is known to be buggy, see the commit log in 2713b2dbf5bb).

Removing the Linkerd sidecar proxy from the equation resolves the issue, as dynamic HPACK indexing doesn't seem to be used by Kestrel (dotnet) anyway.
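To make the failure mode concrete: RFC 7541 defines 61 static table entries, so any HPACK index above 61 refers to the dynamic table. Once a peer advertises SETTINGS_HEADER_TABLE_SIZE = 0, the encoder must shrink its dynamic table to nothing and stop emitting dynamic indexes; an index of 66 then points at an entry the peer cannot have, which is exactly what nginx rejects. The following is a rough illustrative sketch of that encoder-side rule (the `HpackEncoder` class is hypothetical, not the actual hyper/h2 implementation):

```python
# Illustrative sketch of why HPACK index 66 becomes invalid once the
# peer sets SETTINGS_HEADER_TABLE_SIZE to 0. Not real h2 code.

STATIC_TABLE_SIZE = 61  # RFC 7541 defines 61 static table entries


class HpackEncoder:
    def __init__(self):
        self.max_table_size = 4096  # RFC 7541 default
        self.dynamic_table = []     # newest entry first

    def apply_peer_settings(self, header_table_size):
        """The peer's SETTINGS_HEADER_TABLE_SIZE caps our dynamic table."""
        self.max_table_size = header_table_size
        if header_table_size == 0:
            # Dynamic compression disabled: evict everything.
            self.dynamic_table.clear()

    def encode(self, name, value):
        """Return ('indexed', i) or ('literal', name, value)."""
        if self.max_table_size == 0:
            # Must not reference the dynamic table; fall back to a
            # literal representation (static-table lookups elided).
            return ("literal", name, value)
        self.dynamic_table.insert(0, (name, value))
        # Dynamic indexes start right after the static table (index 62+).
        return ("indexed", STATIC_TABLE_SIZE + 1)


enc = HpackEncoder()
print(enc.encode("x-request-id", "abc"))  # ('indexed', 62)
enc.apply_peer_settings(0)                # nginx announces table size 0
print(enc.encode("x-request-id", "abc"))  # falls back to a literal
```

The bug described in this issue is the inverse of this rule: a proxy that keeps handing out dynamic indexes (such as 66) after the peer has set the table size to 0.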

How can it be reproduced?

  1. Deploy a gRPC service with a unary endpoint.
  2. Create an nginx ingress configured for gRPC. See example
  3. Create a client application, call the unary endpoint multiple times with the same shared client.
  4. A 502 Bad Gateway response is returned.

Environment

  • Kubernetes Version: 1.13.7
  • Cluster Environment: AKS
  • Host OS: Linux (ubuntu 16.04)
  • Linkerd version: 19.6.4

Additional context

A similar issue was resolved in the grpc-go library: go-grpc/1928

Labels: area/proxy, bug

All 6 comments

@kdelorey Thanks for the excellent bug report!

@seanmonstar Do you have time to dig into this on the hyper/h2 side?

An expected fix has been merged to the proxy and should be in the next edge release.

@olix0r @seanmonstar Awesome job! Thanks for such a quick turnaround!

I'll give it a shot when the edge release rolls around, I assume in a week or so?

@kdelorey I'd expect an edge release next Thursday; but I've also just pushed a proxy image so that you can test the fix by setting the pod annotation config.linkerd.io/proxy-version: fix-3141-0 (also via linkerd inject --proxy-version=fix-3141-0).
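For reference, the annotation mentioned above goes on the workload's pod template rather than on the top-level object. A minimal sketch (the Deployment name and structure here are placeholders; verify against the Linkerd docs for your version):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-grpc-service   # placeholder name
spec:
  template:
    metadata:
      annotations:
        # Pin the injected sidecar to the test image:
        config.linkerd.io/proxy-version: fix-3141-0
```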

@olix0r thanks for pushing that image, I gave it a quick test and it seems to have resolved our issue. Closing 👍.

I was doing additional testing of an unrelated setup in which a client within the cluster (no ingress involved) made gRPC unary calls at a high rate; after about 2 minutes, requests started failing with what looked like corrupted headers. I tried the fix-3141-0 proxy image and it seems to have resolved that issue as well. So this fix improves stability not only with ingresses, but also for internal cluster traffic where services talk to each other directly.
