Hello,
When making HTTP/1.1 requests where the src and dst are the same (a pod sending a request to itself), the proxy responds with a 500 status code. If the request is sent to a different pod in the deployment, everything works fine. Sending the request to the loopback address rather than the service DNS name also works. Is this expected?
$ linkerd version
Client version: v18.8.4
Server version: v18.8.4
Using the default sidecar generated from linkerd inject
Thanks
Well that sounds interesting. Is there anything special with your application? Are you using TLS? Could we see your k8s resource yaml?
So no TLS. It was initially found on a deployment running a Node.js GraphQL application on port 80. I've since reproduced this on every other deployment I've tried.
Here is the simplest reproduction that I've found:
kubectl run nginx --image=nginx --port=80 --replicas=2 -o yaml --dry-run | linkerd inject - | kc apply -f -
kubectl expose deploy nginx --port=80 --target-port=80 --type=ClusterIP
exec into one of the nginx containers and curl -v nginx - every other request should return 500
Those are fantastic replication steps, thank you!
Quick update - just upgraded to v18.9.1 and this problem still persists.
I'd be curious to see what linkerd tap deploy nginx shows while the curl command is run. Also, the output of curl localhost:4191/metrics | grep -e request_total -e response_total might be informative.
Hm, so if the dst is a socket address, the proxy will use it directly, which would explain the loopback succeeding. However, if it's a hostname, then it will either:
Is it possible to collect debug logs from the proxy? Or do we have an environment that I can poke into and enable them myself?
I used my steps from above. After sending four curl -v nginx requests, the tap and Prometheus data look like this:
$ linkerd tap deploy nginx
req id=0:71 proxy=out src=10.0.1.2:34890 dst=10.0.1.2:80 tls=no_identity :method=GET :authority=nginx :path=/
end id=0:71 proxy=out src=10.0.1.2:34890 dst=10.0.1.2:80 tls=no_identity reset-error=6 duration=0µs response-length=0B
req id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity :method=GET :authority=nginx :path=/
req id=0:0 proxy=in src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :method=GET :authority=nginx :path=/
rsp id=0:0 proxy=in src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :status=200 latency=911µs
end id=0:0 proxy=in src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled duration=27µs response-length=612B
rsp id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity :status=200 latency=2562µs
end id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity duration=46µs response-length=612B
req id=0:73 proxy=out src=10.0.1.2:35184 dst=10.0.1.2:80 tls=no_identity :method=GET :authority=nginx :path=/
end id=0:73 proxy=out src=10.0.1.2:35184 dst=10.0.1.2:80 tls=no_identity reset-error=6 duration=0µs response-length=0B
req id=0:74 proxy=out src=10.0.1.2:35350 dst=10.0.2.3:80 tls=no_identity :method=GET :authority=nginx :path=/
req id=0:1 proxy=in src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :method=GET :authority=nginx :path=/
rsp id=0:1 proxy=in src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :status=200 latency=518µs
end id=0:1 proxy=in src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled duration=40µs response-length=612B
rsp id=0:74 proxy=out src=10.0.1.2:35350 dst=10.0.2.3:80 tls=no_identity :status=200 latency=2677µs
end id=0:74 proxy=out src=10.0.1.2:35350 dst=10.0.2.3:80 tls=no_identity duration=68µs response-length=612B
# HELP request_total Total count of HTTP requests.
# TYPE request_total counter
request_total{authority="nginx",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="nginx",dst_namespace="default",dst_pod="nginx-665d5c9995-k6rkx",dst_pod_template_hash="2218175551",dst_service="nginx",tls="no_identity",no_tls_reason="not_provided_by_service_discovery"} 4
request_total{authority="nginx",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="nginx",dst_namespace="default",dst_pod="nginx-665d5c9995-v2rqk",dst_pod_template_hash="2218175551",dst_service="nginx",tls="no_identity",no_tls_reason="not_provided_by_service_discovery"} 2
# HELP response_total Total count of HTTP responses
# TYPE response_total counter
response_total{authority="nginx",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="nginx",dst_namespace="default",dst_pod="nginx-665d5c9995-v2rqk",dst_pod_template_hash="2218175551",dst_service="nginx",tls="no_identity",no_tls_reason="not_provided_by_service_discovery",classification="success",status_code="200"} 2
@olix0r hope that helps!
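For anyone comparing their own tap output, here is a rough sketch (not part of linkerd) that groups tap lines by destination and proxy direction and counts requests, responses, and resets. The `summarize` helper and the `TAP` sample are illustrative names; the sample is the first two requests from the output above, and the parsing assumes the space-separated key=value layout shown there.

```python
from collections import defaultdict

# First two requests from the tap output above, used as sample input.
TAP = """\
req id=0:71 proxy=out src=10.0.1.2:34890 dst=10.0.1.2:80 tls=no_identity :method=GET :authority=nginx :path=/
end id=0:71 proxy=out src=10.0.1.2:34890 dst=10.0.1.2:80 tls=no_identity reset-error=6 duration=0µs response-length=0B
req id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity :method=GET :authority=nginx :path=/
req id=0:0 proxy=in src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :method=GET :authority=nginx :path=/
rsp id=0:0 proxy=in src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :status=200 latency=911µs
end id=0:0 proxy=in src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled duration=27µs response-length=612B
rsp id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity :status=200 latency=2562µs
end id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity duration=46µs response-length=612B
"""


def summarize(tap_text):
    """Count req/rsp/reset events per (dst, proxy direction) pair."""
    summary = defaultdict(lambda: defaultdict(int))
    for line in tap_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        event = parts[0]  # "req", "rsp", or "end"
        # Every remaining token is key=value; split on the first "=" only.
        fields = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
        key = (fields.get("dst"), fields.get("proxy"))
        if event == "req":
            summary[key]["requests"] += 1
        elif event == "rsp":
            summary[key]["responses"] += 1
        elif event == "end" and "reset-error" in fields:
            summary[key]["resets"] += 1
    return {key: dict(counts) for key, counts in summary.items()}


if __name__ == "__main__":
    for (dst, proxy), counts in sorted(summarize(TAP).items()):
        print(dst, proxy, counts)
```

On this sample, the request to 10.0.1.2:80 has an outbound entry with a reset and no inbound entry at all, while the request to 10.0.2.3:80 shows up on both sides with a response.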
Having same issue.
We have a pod that acts as an authorization microservice. This pod can make requests to itself to check other permissions, so the hostname is http://authorization. This was working previously, but after enabling linkerd2 it stopped working. The linkerd-proxy container gives the following error:
ERR! proxy={server=out listen=127.0.0.1:4140 remote=10.16.0.48:60652} linkerd2_proxy::proxy::http::router service error: Error caused by underlying HTTP/2 error: protocol error: frame with invalid size
A few observations from the tap output:
When the request goes to the other pod, there is both an out and an in, so the two proxies were involved.
There is no in when the pod is the same, suggesting to me that the proxy never receives the request it should be sending itself.
reset-error=6 is a FRAME_SIZE_ERROR from HTTP/2, which would be the out proxy making an HTTP/2 request to dst; the bytes it got back are likely not HTTP/2, which triggers that error.
All this makes me wonder if something is preventing the connection from being redirected to the proxy. Perhaps something in the iptables rules that are set up during proxy-init.
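To illustrate why non-HTTP/2 bytes tend to surface as a frame size error, here is a minimal sketch that reads the start of a plain HTTP/1.1 response as if it were an HTTP/2 frame header. The exact bytes the proxy received are not shown in this thread, so the `HTTP/1.1 200 OK` sample is an assumption, and `parse_frame_header` is an illustrative helper rather than anything from the proxy. The header layout, the error code table, and the 16384-byte default maximum frame size are taken from RFC 7540.

```python
# Initial value of SETTINGS_MAX_FRAME_SIZE (2**14) per RFC 7540, section 6.5.2.
DEFAULT_MAX_FRAME_SIZE = 16384

# HTTP/2 error codes from RFC 7540, section 7.
H2_ERROR_CODES = {
    0x0: "NO_ERROR",
    0x1: "PROTOCOL_ERROR",
    0x2: "INTERNAL_ERROR",
    0x3: "FLOW_CONTROL_ERROR",
    0x4: "SETTINGS_TIMEOUT",
    0x5: "STREAM_CLOSED",
    0x6: "FRAME_SIZE_ERROR",
    0x7: "REFUSED_STREAM",
    0x8: "CANCEL",
    0x9: "COMPRESSION_ERROR",
    0xA: "CONNECT_ERROR",
    0xB: "ENHANCE_YOUR_CALM",
    0xC: "INADEQUATE_SECURITY",
    0xD: "HTTP_1_1_REQUIRED",
}


def parse_frame_header(data):
    """Decode the 9-byte HTTP/2 frame header at the start of `data`."""
    if len(data) < 9:
        raise ValueError("an HTTP/2 frame header is 9 bytes long")
    length = int.from_bytes(data[0:3], "big")      # 24-bit payload length
    frame_type = data[3]                           # 8-bit frame type
    flags = data[4]                                # 8-bit flags
    # 1 reserved bit followed by a 31-bit stream identifier.
    stream_id = int.from_bytes(data[5:9], "big") & 0x7FFFFFFF
    return length, frame_type, flags, stream_id


if __name__ == "__main__":
    # Assumed sample: the first bytes of a plain HTTP/1.1 response.
    plaintext = b"HTTP/1.1 200 OK\r\n"
    length, frame_type, flags, stream_id = parse_frame_header(plaintext)
    # "HTT" decodes as a payload length in the millions of bytes.
    print("declared length:", length)
    print("exceeds default max frame size:", length > DEFAULT_MAX_FRAME_SIZE)
    print("reset-error=6 means:", H2_ERROR_CODES[6])
```

The bytes `HTT` read as a 24-bit length give 4740180, far above the default limit, so an HTTP/2 client that receives an HTTP/1.x reply would be expected to reject it as an oversized frame.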
Actually, while there was a proxy change for this, it won't be fixed until the iptables config is changed in this repo also.
Thanks for fixing this!
I'm seeing the same reset-error=6 when trying to load balance gRPC using linkerd2 and nginx ingress.
Steps to recreate here:
https://github.com/glindsell/free-peer/tree/ingress/stream-meshed
@glindsell thanks for putting together a repro and sharing! It's a little hard to tease out a clear problem description from that README, though. Would you mind opening a new issue so that we can make sure we get to the bottom of it?
@olix0r good idea, I've updated the issue which I opened specifically for the purpose of gRPC stream load balancing with this info: