When using the nginx Kubernetes Ingress with gRPC, multiple unary calls will result in a 502 Bad Gateway response from the nginx ingress. The logs from the ingress controller show the following error: upstream sent invalid http2 table index: 66. Here the upstream is an ASP.NET Core 3.0-Preview7 service, but there is a Linkerd sidecar proxy in front of it. Upon further investigation, it seems that dynamic HPACK header compression is explicitly not supported by nginx, and nginx informs upstream services of this by advertising a SETTINGS_HEADER_TABLE_SIZE of 0 (source).
nginx explicitly announces that it does not support dynamic header compression by sending the SETTINGS_HEADER_TABLE_SIZE value set to 0, see here. Any attempt of an upstream server to use indexes from the dynamic range is a bug in the upstream server (note that at least grpc-go implementation is known to be buggy, see commit log in 2713b2dbf5bb).
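To illustrate why index 66 is rejected, here is a minimal sketch based on RFC 7541 (HPACK), not on nginx's actual code. The HPACK static table has 61 entries, so any index of 62 or higher refers to the dynamic table, which cannot hold anything when the peer advertised SETTINGS_HEADER_TABLE_SIZE = 0. The function names and the returned messages are hypothetical and only serve the illustration.

```python
from typing import Tuple

# RFC 7541, Appendix A: the static table has 61 entries (indexes 1..61).
STATIC_TABLE_ENTRIES = 61


def decode_prefixed_int(data: bytes, prefix_bits: int) -> Tuple[int, int]:
    """Decode an HPACK prefixed integer (RFC 7541, section 5.1).

    Returns (value, number_of_bytes_consumed).
    """
    mask = (1 << prefix_bits) - 1
    value = data[0] & mask
    if value < mask:
        return value, 1
    # The prefix is saturated: the value continues in 7-bit groups.
    shift = 0
    pos = 1
    while True:
        byte = data[pos]
        pos += 1
        value += (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def check_indexed_field(data: bytes, peer_table_size: int) -> str:
    """Classify an 'Indexed Header Field' (first bit set, 7-bit prefix).

    Hypothetical helper: peer_table_size is the SETTINGS_HEADER_TABLE_SIZE
    advertised by the receiving side (0 in the nginx case above).
    """
    if not data[0] & 0x80:
        raise ValueError("not an indexed header field")
    index, _ = decode_prefixed_int(data, 7)
    if index == 0:
        return "invalid: index 0 is not used"
    if index <= STATIC_TABLE_ENTRIES:
        return f"ok: static table index {index}"
    if peer_table_size == 0:
        return f"invalid: dynamic table index {index} with table size 0"
    return f"dynamic table index {index} (validity depends on table contents)"


# 0xC2 = 0x80 | 66, i.e. an indexed header field referencing index 66.
print(check_indexed_field(bytes([0xC2]), peer_table_size=0))
```

With a table size of 0, the byte 0xC2 is reported as an invalid dynamic table reference, which matches the index in the nginx log line.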
Removing the Linkerd sidecar proxy from the equation resolves the issue, as HTTP/2 dynamic HPACK doesn't seem to be supported by Kestrel (dotnet) anyway.
This was an issue that was resolved in the go-grpc lib: go-grpc/1928
@kdelorey Thanks for the excellent bug report!
@seanmonstar Do you have time to dig into this on the hyper/h2 side?
An expected fix has been merged to the proxy and should be in the next edge release.
@olix0r @seanmonstar Awesome job! Thanks for such a quick turnaround!
I'll give it a shot when the edge release rolls around, I assume in a week or so?
@kdelorey I'd expect an edge release next Thursday, but I've also just pushed a proxy image so that you can test the fix by setting the pod annotation config.linkerd.io/proxy-version: fix-3141-0 (also via linkerd inject --proxy-version=fix-3141-0).
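For anyone who prefers to patch an existing workload rather than re-inject it, the annotation above can be expressed as a merge patch on the pod template. This is a rough sketch, assuming the workload is a Deployment; only the annotation key and value come from this thread, and the function name is hypothetical.

```python
import json


def proxy_version_patch(version: str) -> str:
    """Build a JSON merge patch that sets the Linkerd proxy-version
    annotation on a Deployment's pod template (assumed workload type)."""
    patch = {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        "config.linkerd.io/proxy-version": version,
                    }
                }
            }
        }
    }
    return json.dumps(patch)


print(proxy_version_patch("fix-3141-0"))
```

The printed JSON could then be passed to something like kubectl patch deployment my-service --type merge -p '<json>', where my-service is a placeholder name.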
@olix0r thanks for pushing that image, I gave it a quick test and it seems to have resolved our issue. Closing 👍.
I was doing additional testing of something else, where a client within the cluster (no ingress involved) was making gRPC unary calls at a high rate and, after about 2 minutes, started hitting errors where the headers seemed to be corrupted. I tried the fix-3141-0 version of the Linkerd proxy and it seems to have resolved that issue as well. So this fix has improved stability not only with ingresses, but also with internal cluster traffic where services talk to each other directly.