It seems like linkerd-proxy doesn't handle array (repeated-name) metadata consistently.
The headers that linkerd-proxy receives from a request:
{
"x-token": "",
"x-trace-id": "94b5687f-b966-42ee-b52f-7ddbb3639d3a",
"actorid": "superagent/ef69e4de-be2d-4778-a2e1-068c3d49dd0b",
"actorrole": "role/c3cad570-b112-4273-86b9-89d9fb4dee0b",
"actorscopes": "api",
"actorscopes": "metaconsole",
"actorscopes": "console",
"originatorid": "superagent/ef69e4de-be2d-4778-a2e1-068c3d49dd0b",
"originatorrole": "role/c3cad570-b112-4273-86b9-89d9fb4dee0b",
"originatorscopes": "api.tenant.get.tenant/075dd0a2-87fc-4406-84c5-32da7454ad26",
"originatorscopes": "api",
"originatorscopes": "metaconsole",
"originatorscopes": "console",
"supplierid": "supplier/aaaaaaaa-aaaa-1aaa-aaaa-aaaaaaaaaaaa",
"roles": "role/c3cad570-b112-4273-86b9-89d9fb4dee0b",
"roles": "role/c3cad570-b112-4273-86b9-89d9fb4dee0b",
"te": "trailers",
"content-type": "application/grpc",
"user-agent": "grpc-node/1.20.3 grpc-c/7.0.0 (linux; chttp2; godric)",
"grpc-accept-encoding": "identity,deflate,gzip",
"accept-encoding": "identity,gzip",
"grpc-timeout": "30S",
"l5d-dst-canonical": "tenant-service.default.svc.cluster.local:9090"
}
Log of the headers received by the app behind linkerd-proxy:
{
"x-token": "••••••••",
"x-trace-id": "94b5687f-b966-42ee-b52f-7ddbb3639d3a",
"actorid": "superagent/ef69e4de-be2d-4778-a2e1-068c3d49dd0b",
"actorrole": "role/c3cad570-b112-4273-86b9-89d9fb4dee0b",
"actorscopes": [
"api",
"metaconsole",
"console"
],
"originatorid": "superagent/ef69e4de-be2d-4778-a2e1-068c3d49dd0b",
"originatorrole": "role/c3cad570-b112-4273-86b9-89d9fb4dee0b",
"originatorscopes": "api.tenant.get.tenant/075dd0a2-87fc-4406-84c5-32da7454ad26",
"supplierid": [
"api",
"metaconsole",
"console",
"supplier/aaaaaaaa-aaaa-1aaa-aaaa-aaaaaaaaaaaa"
],
"roles": [
"role/c3cad570-b112-4273-86b9-89d9fb4dee0b",
"role/c3cad570-b112-4273-86b9-89d9fb4dee0b"
],
"user-agent": "grpc-node/1.20.3 grpc-c/7.0.0 (linux; chttp2; godric)",
"l5d-dst-canonical": "tenant-service.default.svc.cluster.local:9090"
}
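To make the discrepancy concrete, here is a minimal Python sketch. It groups the sent metadata by name in wire order, which is what the app would be expected to see, and diffs that against what the app logged. The helper names `group_metadata` and `diff_metadata` are hypothetical, and only the scope and supplier entries from the two dumps above are included.

```python
# Metadata entries in wire order, as (name, value) pairs: a subset of the
# request headers above. Repeated names are separate entries on the wire.
SENT = [
    ("actorscopes", "api"),
    ("actorscopes", "metaconsole"),
    ("actorscopes", "console"),
    ("originatorscopes", "api.tenant.get.tenant/075dd0a2-87fc-4406-84c5-32da7454ad26"),
    ("originatorscopes", "api"),
    ("originatorscopes", "metaconsole"),
    ("originatorscopes", "console"),
    ("supplierid", "supplier/aaaaaaaa-aaaa-1aaa-aaaa-aaaaaaaaaaaa"),
]

# What the app logged for the same names (single values normalized to lists).
RECEIVED = {
    "actorscopes": ["api", "metaconsole", "console"],
    "originatorscopes": ["api.tenant.get.tenant/075dd0a2-87fc-4406-84c5-32da7454ad26"],
    "supplierid": ["api", "metaconsole", "console",
                   "supplier/aaaaaaaa-aaaa-1aaa-aaaa-aaaaaaaaaaaa"],
}


def group_metadata(entries):
    """Hypothetical helper: group repeated names into lists, keeping value order."""
    grouped = {}
    for name, value in entries:
        grouped.setdefault(name, []).append(value)
    return grouped


def diff_metadata(sent, received):
    """Return {name: (expected, actual)} for every name whose values differ."""
    expected = group_metadata(sent)
    names = set(expected) | set(received)
    return {
        n: (expected.get(n, []), received.get(n, []))
        for n in sorted(names)
        if expected.get(n, []) != received.get(n, [])
    }


for name, (want, got) in diff_metadata(SENT, RECEIVED).items():
    print(f"{name}: expected {want}, got {got}")
```

Run against these dumps, the diff flags two names: `originatorscopes` arrives with only its first value, and `supplierid` arrives with three extra values (`api`, `metaconsole`, `console`), which are the same three values missing from `originatorscopes`.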
linkerd check output:
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API
kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version
linkerd-existence
-----------------
√ control plane namespace exists
√ controller pod is running
√ can initialize the client
√ can query the control plane API
linkerd-api
-----------
√ control plane pods are ready
√ control plane self-check
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus
√ no invalid service profiles
linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date
control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match
Status check results are √
Communication between meshed gRPC services that don't exchange metadata works without any issues.
Retrying the same command several times gets past the metadata error, but the call then fails because the headers have been mutated.
I'm trying to replicate this in a test case, and haven't yet been able to trigger it. Granted, the client and servers are using the same http2 library, and I don't know if that's part of it. What is your server app software using? Is it grpc-node, like the client?
Hi @seanmonstar, you are correct, our server app is running grpc-node too. I will try and create a simple setup that replicates the issue.
I've since found a bug in the hpack library we use (though the bug caused a crash), which makes me wonder if it's related to CONTINUATION frames that include headers with the same name, when another HEADERS frame causes the table to evict that index. If so, that'd explain why values are appearing with the wrong name. Still investigating.
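The eviction theory above can be illustrated with a simplified model of the HPACK dynamic table (RFC 7541). The newest entry always takes dynamic index 62, right after the 61 static entries, so every insertion shifts older entries up by one; the oldest entries are evicted once the table would exceed its size limit. This is a minimal sketch assuming a decoder that resolves a name index against a table state that has changed since the encoder chose that index. `DynamicTable` and the scenario are hypothetical, not the h2 crate's code, and Huffman coding, wire encoding and table-size updates are omitted.

```python
STATIC_TABLE_LEN = 61  # RFC 7541 static table size; dynamic indices start at 62


class DynamicTable:
    """Hypothetical, simplified HPACK dynamic table (no wire encoding)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = []  # entries[0] is the most recently inserted entry
        self.size = 0

    @staticmethod
    def entry_size(name, value):
        # RFC 7541 section 4.1: name octets + value octets + 32.
        return len(name.encode()) + len(value.encode()) + 32

    def insert(self, name, value):
        needed = self.entry_size(name, value)
        # Section 4.4: evict from the oldest end until the new entry fits;
        # an entry larger than max_size empties the table and is not added.
        while self.entries and self.size + needed > self.max_size:
            old_name, old_value = self.entries.pop()
            self.size -= self.entry_size(old_name, old_value)
        if needed <= self.max_size:
            self.entries.insert(0, (name, value))
            self.size += needed

    def lookup(self, index):
        pos = index - STATIC_TABLE_LEN - 1
        if not 0 <= pos < len(self.entries):
            raise IndexError(f"index {index} is not in the dynamic table")
        return self.entries[pos]


# Index shift: the encoder picked 62 when it meant "actorscopes" ...
table = DynamicTable(max_size=4096)
table.insert("actorscopes", "api")
name_index = 62
# ... but another header block inserts an entry before 62 is resolved.
table.insert("supplierid", "supplier/aaaaaaaa-aaaa-1aaa-aaaa-aaaaaaaaaaaa")
name, _ = table.lookup(name_index)
print((name, "metaconsole"))  # the value is now paired with the wrong name

# Eviction: with a small table, the older entry is dropped entirely.
small = DynamicTable(max_size=100)
small.insert("actorscopes", "api")  # 11 + 3 + 32 = 46 octets
small.insert("supplierid", "supplier/aaaaaaaa-aaaa-1aaa-aaaa-aaaaaaaaaaaa")  # 10 + 45 + 32 = 87
print(len(small.entries))  # only "supplierid" remains
```

The point of the model is that an HPACK index is only meaningful relative to one exact table state; if a decoder applies insertions or evictions in a different order than the encoder did, values can surface under another header's name, which matches the symptom in the dumps above.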
We have a fix in review at https://github.com/hyperium/h2/pull/356. I expect this to be in the edge-19.5.3 release this week, barring any unforeseen delays.
Amazing, @olix0r, thank you for the update :)
@calinah edge-19.5.3 includes the aforementioned fix. If you get a chance, we'd love to know whether it resolves the behavior you've been observing.
:; curl -sL https://run.linkerd.io/install-edge |sh -
...
:; linkerd upgrade |kubectl apply -f -
...
# and then roll injected pods to get the new proxy
@olix0r, @seanmonstar edge-19.5.3 seems to solve the header issue, but we're seeing other issues. I'm currently troubleshooting to decide whether the new issues are on our side or in Linkerd. Would you mind leaving this ticket open a bit longer while I investigate? I'll update it again very soon.
@calinah of course! looking forward to learning what you find
@calinah I'm going to close this bug out since it sounds like the header issue is fixed, but please open another issue if you're still seeing the problems you observed last week!