Linkerd2: gRPC metadata exchange is inconsistent

Created on 10 May 2019  ·  9 comments  ·  Source: linkerd/linkerd2

Bug Report

What is the issue?

It seems like linkerd-proxy doesn't handle repeated (array-valued) gRPC metadata consistently: values that should repeat under one key come out of the proxy attached to a different key, or get dropped.

How can it be reproduced?

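Have a meshed grpc-node client send metadata containing repeated values to a meshed grpc-node service, then compare the headers the proxy sees with the metadata the app sees. A minimal client-side sketch of that kind of call (the proto file, package, service, and method names here are placeholders, not the real ones):

// repro-client.js -- sketch only; tenant.proto and the service/method
// names are placeholders.
const grpc = require('grpc');
const protoLoader = require('@grpc/proto-loader');

const def = protoLoader.loadSync('tenant.proto');
const proto = grpc.loadPackageDefinition(def).tenant;

const client = new proto.TenantService(
  'tenant-service.default.svc.cluster.local:9090',
  grpc.credentials.createInsecure()
);

// Repeated values under a single metadata key; on the wire these become
// repeated HTTP/2 header fields, as in the proxy-side log below.
const md = new grpc.Metadata();
md.add('actorscopes', 'api');
md.add('actorscopes', 'metaconsole');
md.add('actorscopes', 'console');
md.add('roles', 'role/c3cad570-b112-4273-86b9-89d9fb4dee0b');
md.add('roles', 'role/c3cad570-b112-4273-86b9-89d9fb4dee0b');

client.getTenant({}, md, (err, res) => {
  if (err) console.error(err);
  else console.log(res);
});
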
Logs, error output, etc

The headers that linkerd-proxy receives from a request:

{
  "x-token": "",
  "x-trace-id": "94b5687f-b966-42ee-b52f-7ddbb3639d3a",
  "actorid": "superagent/ef69e4de-be2d-4778-a2e1-068c3d49dd0b",
  "actorrole": "role/c3cad570-b112-4273-86b9-89d9fb4dee0b",
  "actorscopes": "api",
  "actorscopes": "metaconsole",
  "actorscopes": "console",
  "originatorid": "superagent/ef69e4de-be2d-4778-a2e1-068c3d49dd0b",
  "originatorrole": "role/c3cad570-b112-4273-86b9-89d9fb4dee0b",
  "originatorscopes": "api.tenant.get.tenant/075dd0a2-87fc-4406-84c5-32da7454ad26",
  "originatorscopes": "api",
  "originatorscopes": "metaconsole",
  "originatorscopes": "console",
  "supplierid": "supplier/aaaaaaaa-aaaa-1aaa-aaaa-aaaaaaaaaaaa",
  "roles": "role/c3cad570-b112-4273-86b9-89d9fb4dee0b",
  "roles": "role/c3cad570-b112-4273-86b9-89d9fb4dee0b",
  "te": "trailers",
  "content-type": "application/grpc",
  "user-agent": "grpc-node/1.20.3 grpc-c/7.0.0 (linux; chttp2; godric)",
  "grpc-accept-encoding": "identity,deflate,gzip",
  "accept-encoding": "identity,gzip",
  "grpc-timeout": "30S",
  "l5d-dst-canonical": "tenant-service.default.svc.cluster.local:9090"
}

Log of the headers received by the app behind linkerd-proxy:

{
  "x-token": "████████",
  "x-trace-id": "94b5687f-b966-42ee-b52f-7ddbb3639d3a",
  "actorid": "superagent/ef69e4de-be2d-4778-a2e1-068c3d49dd0b",
  "actorrole": "role/c3cad570-b112-4273-86b9-89d9fb4dee0b",
  "actorscopes": [
    "api",
    "metaconsole",
    "console"
  ],
  "originatorid": "superagent/ef69e4de-be2d-4778-a2e1-068c3d49dd0b",
  "originatorrole": "role/c3cad570-b112-4273-86b9-89d9fb4dee0b",
  "originatorscopes": "api.tenant.get.tenant/075dd0a2-87fc-4406-84c5-32da7454ad26",
  "supplierid": [
    "api",
    "metaconsole",
    "console",
    "supplier/aaaaaaaa-aaaa-1aaa-aaaa-aaaaaaaaaaaa"
  ],
  "roles": [
    "role/c3cad570-b112-4273-86b9-89d9fb4dee0b",
    "role/c3cad570-b112-4273-86b9-89d9fb4dee0b"
  ],
  "user-agent": "grpc-node/1.20.3 grpc-c/7.0.0 (linux; chttp2; godric)",
  "l5d-dst-canonical": "tenant-service.default.svc.cluster.local:9090"
}
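
Note how the three repeated originatorscopes values ("api", "metaconsole", "console") arrive under supplierid instead, while the actorscopes array makes it through intact. The app-side log comes from reading the metadata in a grpc-node handler, roughly like this (the handler name is a placeholder):

// Server-side grpc-node handler (sketch).
function getTenant(call, callback) {
  // Metadata.get() returns every value recorded under a key, as an array.
  console.log(call.metadata.get('actorscopes'));
  // Expected: [ 'api', 'metaconsole', 'console' ]
  console.log(call.metadata.get('originatorscopes'));
  // Four values were sent; through the proxy only the first arrives --
  // the other three surface under 'supplierid' instead.
  callback(null, {});
}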

linkerd check output

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ control plane namespace exists
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ no invalid service profiles

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match

Status check results are √

Environment

  • Kubernetes Version: v1.14.1
  • Cluster Environment: GKE (v1.12.7-gke.10)
  • Host OS:
  • Linkerd version: stable-2.3.0 and edge-19.5.1

Possible solution

Additional context

The communication between meshed gRPC services that do not exchange metadata works without any issues.
Running the same command multiple times sometimes gets past the metadata error, but the calls still fail because the headers have been mutated in transit.

Labels: area/proxy, bug

All 9 comments

I'm trying to replicate this in a test case, and haven't yet been able to trigger it. Granted, the client and server in my test are using the same HTTP/2 library, and I don't know if that's part of it. What software is your server app using? Is it grpc-node, like the client?

Hi @seanmonstar, you are correct, our server app is running grpc-node too. I will try to create a simple setup that replicates the issue.

I've since found a bug in the hpack library we use (though the bug caused a crash), which makes me wonder if it's related to CONTINUATION frames that include headers with the same name, when another HEADERS frame causes the table to evict that index. If so, that'd explain why values are appearing with the wrong name. Still investigating.
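
As a deliberately simplified model of that failure class (plain Node.js, nothing like the real h2 internals): if a decoder saves a dynamic-table index while working through a CONTINUATION frame and only resolves it after later insertions have evicted that entry, the lookup lands on a different header entirely.

// Toy HPACK dynamic table -- an illustration of the suspected bug class,
// not the actual hyperium/h2 implementation.
class DynamicTable {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = []; // newest first: entries[0] is index 1
  }
  insert(name, value) {
    this.entries.unshift([name, value]);
    while (this.entries.length > this.maxEntries) {
      this.entries.pop(); // evict the oldest entry
    }
  }
  lookup(index) {
    const entry = this.entries[index - 1];
    if (!entry) throw new Error(`index ${index} was evicted`);
    return entry;
  }
}

const table = new DynamicTable(2);
table.insert('actorscopes', 'api');      // now at index 1
const savedIndex = 1;                    // decoder defers resolving this index
table.insert('supplierid', 'supplier/aaaaaaaa-...'); // slides it to index 2
table.insert('roles', 'role/c3cad570-...');          // 'actorscopes' is evicted

// Resolving the stale index now yields a completely different header,
// which would make values appear under the wrong name:
console.log(table.lookup(savedIndex));   // [ 'roles', 'role/c3cad570-...' ]

In real HPACK the indices shift on every insertion (index 1 is always the newest dynamic entry), so an indexed field has to be resolved against the table state at the moment it is decoded, never later.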

We have a fix in review at https://github.com/hyperium/h2/pull/356. I expect this to be in the edge-19.5.3 release this week, barring any unforeseen delays.

Amazing @olix0r thank you for the update :) .

@calinah edge-19.5.3 includes the aforementioned fix. If you get a chance, we'd love to know whether it resolves the behavior you've been observing.

:; curl -sL https://run.linkerd.io/install-edge |sh -
...
:; linkerd upgrade |kubectl apply -f -
...
# and then roll injected pods to get the new proxy
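
One way to roll them (assumptions: the deployment name is a placeholder, and kubectl is 1.15+, which added rollout restart; if the manifests were injected manually with linkerd inject, re-inject and re-apply instead):

:; kubectl rollout restart deploy/tenant-service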

@olix0r, @seanmonstar edge-19.5.3 seems to solve the header issue, but we're seeing other issues; we're currently troubleshooting to determine whether they are on our side or in Linkerd. If you don't mind, please leave this ticket open for a bit longer while I investigate, and I'll update it again very soon.

@calinah of course! looking forward to learning what you find

@calinah I'm going to close this bug out since it sounds like the header issue is fixed; please open another issue if you're still seeing the problems we observed last week!
