Linkerd2: Not respecting SETTINGS_HEADER_TABLE_SIZE when set to 0

Created on 25 Jul 2019 · 6 Comments · Source: linkerd/linkerd2

Bug Report

What is the issue?

When using the nginx Kubernetes Ingress with gRPC, multiple unary calls result in a 502 Bad Gateway response from the nginx ingress. The ingress controller logs show the following error: upstream sent invalid http2 table index: 66, where the upstream is an ASP.NET Core 3.0-Preview7 service with a Linkerd sidecar proxy in front of it. Upon further investigation, it seems that dynamic header compression is explicitly not supported by nginx, and nginx informs upstream services of this by advertising a SETTINGS_HEADER_TABLE_SIZE of 0 (source).

nginx explicitly announces that it does not support dynamic header compression by sending the SETTINGS_HEADER_TABLE_SIZE value set to 0, see here. Any attempt of an upstream server to use indexes from the dynamic range is a bug in the upstream server (note that at least grpc-go implementation is known to be buggy, see commit log in 2713b2dbf5bb).
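To illustrate why nginx rejects index 66, here is a minimal Python sketch of the check a decoder could apply to an HPACK indexed header field. It assumes only what RFC 7541 specifies: the static table has 61 entries, so any higher index points into the dynamic table. The function names are hypothetical, and the returned message merely mimics the nginx log line; this is not nginx's or Linkerd's actual code.

```python
from typing import Tuple

STATIC_TABLE_LEN = 61  # number of entries in the HPACK static table (RFC 7541, Appendix A)


def decode_prefixed_int(data: bytes, prefix_bits: int) -> Tuple[int, int]:
    """Decode an HPACK prefixed integer (RFC 7541, section 5.1).

    Returns (value, number of bytes consumed).
    """
    mask = (1 << prefix_bits) - 1
    value = data[0] & mask
    if value < mask:
        return value, 1
    # The prefix is saturated: continue with 7-bit groups, least significant first.
    shift = 0
    pos = 1
    while True:
        byte = data[pos]
        value += (byte & 0x7F) << shift
        shift += 7
        pos += 1
        if not byte & 0x80:
            return value, pos


def check_indexed_field(data: bytes, dynamic_entries: int) -> str:
    """Classify an indexed header field given the current dynamic table length.

    Hypothetical helper: a peer that advertised SETTINGS_HEADER_TABLE_SIZE = 0
    always has dynamic_entries == 0.
    """
    if not data[0] & 0x80:
        raise ValueError("not an indexed header field representation")
    index, _ = decode_prefixed_int(data, 7)
    if index == 0:
        return "invalid: index 0 is not used"
    if index <= STATIC_TABLE_LEN:
        return f"static table entry {index}"
    if index - STATIC_TABLE_LEN <= dynamic_entries:
        return f"dynamic table entry {index - STATIC_TABLE_LEN}"
    return f"invalid http2 table index: {index}"


# 0x80 | 66 is the one-byte encoding of "indexed header field, index 66".
print(check_indexed_field(bytes([0x80 | 66]), dynamic_entries=0))
```

With an empty dynamic table, index 66 would refer to the fifth dynamic entry, which cannot exist, so the sketch reports it as invalid.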

Removing the Linkerd sidecar proxy from the equation resolves the issue, as dynamic HPACK indexing doesn't seem to be supported by Kestrel (dotnet) anyway.

How can it be reproduced?

1. Deploy a gRPC service with a unary endpoint.
2. Create an nginx ingress that is configured for gRPC. See example
3. Create a client application and call the unary endpoint multiple times with the same shared client.
4. 502 Bad Gateway is returned.
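One possible reading of why the error needs repeated calls on a shared client: an HPACK encoder sends a header as a literal the first time and only refers to it by dynamic index on later requests. The toy Python model below sketches that decision under stated assumptions. The ToyEncoder class is hypothetical, it ignores the static table and actual byte encoding, and it is not the Linkerd proxy's implementation.

```python
STATIC_TABLE_LEN = 61  # HPACK static table length (RFC 7541, Appendix A)


class ToyEncoder:
    """Toy model of an HPACK encoder's indexing decisions (no bytes, no Huffman)."""

    def __init__(self, peer_table_size: int):
        # Dynamic table size the peer allows, as advertised via
        # SETTINGS_HEADER_TABLE_SIZE.
        self.peer_table_size = peer_table_size
        self.dynamic = []  # newest entry first (RFC 7541, section 2.3.3)

    @staticmethod
    def _entry_size(name: str, value: str) -> int:
        # Entry size is name length + value length + 32 (RFC 7541, section 4.1).
        return len(name) + len(value) + 32

    def encode(self, name: str, value: str):
        entry = (name, value)
        if entry in self.dynamic:
            # Dynamic indices start right after the static table.
            return ("indexed", STATIC_TABLE_LEN + 1 + self.dynamic.index(entry))
        if self._entry_size(name, value) <= self.peer_table_size:
            self.dynamic.insert(0, entry)
            # Evict the oldest entries until the table fits the allowed size.
            while sum(self._entry_size(n, v) for n, v in self.dynamic) > self.peer_table_size:
                self.dynamic.pop()
            return ("literal-with-indexing", name, value)
        # A peer table size of 0 always lands here, so no dynamic index is ever sent.
        return ("literal-without-indexing", name, value)


# An encoder that assumes the 4096-byte default emits a dynamic index on the second call.
default = ToyEncoder(peer_table_size=4096)
print(default.encode("grpc-status", "0"))
print(default.encode("grpc-status", "0"))

# An encoder that respects a peer table size of 0 never does.
respectful = ToyEncoder(peer_table_size=0)
print(respectful.encode("grpc-status", "0"))
print(respectful.encode("grpc-status", "0"))
```

In this model, an encoder that ignores the advertised size of 0 behaves like the first case, which would match nginx seeing an index from the dynamic range.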

Environment

Kubernetes Version: 1.13.7
Cluster Environment: AKS
Host OS: Linux (ubuntu 16.04)
Linkerd version: 19.6.4

Additional context

This was an issue resolved in the grpc-go lib: go-grpc/1928

area/proxy bug

kdelorey

All 6 comments

@kdelorey Thanks for the excellent bug report!

@seanmonstar Do you have time to dig into this on the hyper/h2 side?

olix0r on 25 Jul 2019

👍1

An expected fix has been merged to the proxy and should be in the next edge release.

seanmonstar on 26 Jul 2019

@olix0r @seanmonstar Awesome job! Thanks for such a quick turnaround!

I'll give it a shot when the edge release rolls around, I assume in a week or so?

kdelorey on 26 Jul 2019

@kdelorey I'd expect an edge release next Thursday, but I've also just pushed a proxy image so that you can test the fix by setting the pod annotation config.linkerd.io/proxy-version: fix-3141-0 (also via linkerd inject --proxy-version=fix-3141-0).

olix0r on 28 Jul 2019

@olix0r thanks for pushing that image, I gave it a quick test and it seems to have resolved our issue. Closing 👍.

kdelorey on 28 Jul 2019

I was doing additional testing of something else, where a client within the cluster (no ingress involved) was making gRPC unary calls at a high rate and, after 2 minutes, started seeing errors where the header seemed to be corrupted. I tried the fix-3141-0 version of the Linkerd proxy and it seems to have resolved the issue. So this fix has improved stability not only with ingresses, but also with internal cluster traffic where services are talking to each other directly.

kdelorey on 31 Jul 2019

👍1
