Linkerd2: Requests fail when src & dst are the same

Created on 5 Sep 2018 · 14 Comments · Source: linkerd/linkerd2

Hello,

When making HTTP/1.1 requests where the src and dst are the same (a pod sending a request to itself), the proxy responds with a 500 status code. If the request is sent to a different pod in the deployment, everything works fine. Sending the request to the loopback address rather than the service DNS name also works. Is this expected?

$ linkerd version
Client version: v18.8.4
Server version: v18.8.4

Using the default sidecar generated from linkerd inject

Thanks

bug priorittriage

tanuck

All 14 comments

Well that sounds interesting. Is there anything special with your application? Are you using TLS? Could we see your k8s resource yaml?

grampelberg on 5 Sep 2018

So no TLS. It was initially found on a deployment running a Node.js GraphQL application on port 80. I've since reproduced this on every other deployment I've tried.

Here is the simplest reproduction that I've found:

kubectl run nginx --image=nginx --port=80 --replicas=2 -o yaml --dry-run | linkerd inject - | kubectl apply -f -
kubectl expose deploy nginx --port=80 --target-port=80 --type=ClusterIP
Exec into one of the nginx containers and run curl -v nginx repeatedly; every other request should return 500.

tanuck on 5 Sep 2018

Those are fantastic reproduction steps, thank you!

grampelberg on 5 Sep 2018

Quick update - just upgraded to v18.9.1 and this problem still persists.

tanuck on 13 Sep 2018

I'd be curious to see what linkerd tap deploy nginx shows while the curl command is run. Also, the output of curl localhost:4191/metrics | grep -e request_total -e response_total might be informative.

olix0r on 13 Sep 2018

Hm, so if the dst is a socket address, the proxy will use it directly, which would explain the loopback succeeding. However, if it's a hostname, then it will either:

If it looks like a service in the cluster, ask the controller for the socket address.
Or perform a system DNS lookup, and try to use that.
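
The three paths described above can be sketched roughly as follows. This is only an illustration of the branching, not the proxy's actual code: the function names, the return labels, and in particular the `looks_like_cluster_service` heuristic (single-label names or a `.svc.cluster.local` suffix) are assumptions made for the example.

```python
import ipaddress


def split_host(authority: str) -> str:
    """Strip an optional :port suffix and IPv6 brackets from an authority."""
    try:
        ipaddress.ip_address(authority)  # bare IP literal such as "10.0.1.2" or "::1"
        return authority
    except ValueError:
        pass
    host, sep, port = authority.rpartition(":")
    if sep and port.isdigit():
        authority = host
    return authority.strip("[]")


def is_socket_address(host: str) -> bool:
    """True when the host is already an IP literal, so no lookup is needed."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def looks_like_cluster_service(host: str) -> bool:
    # HYPOTHETICAL heuristic for illustration only; the real rule the proxy
    # uses to recognise in-cluster names is not stated in this thread.
    name = host.rstrip(".").lower()
    return "." not in name or name.endswith(".svc.cluster.local")


def resolution_path(authority: str) -> str:
    """Return which of the three paths a destination would take."""
    host = split_host(authority)
    if is_socket_address(host):
        return "direct"       # use the address as-is (the loopback case)
    if looks_like_cluster_service(host):
        return "controller"   # ask the controller for the socket address
    return "system-dns"       # fall back to a system DNS lookup
```

Under these assumptions, `resolution_path("127.0.0.1:80")` takes the direct path, while `resolution_path("nginx")` goes through the controller, which matches the observation that only the service name fails.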

Is it possible to collect debug logs from the proxy? Or do we have an environment that I can poke into and enable them myself?

seanmonstar on 13 Sep 2018

So I used my steps from above. After sending 4 curl -v nginx requests, the tap and Prometheus data look like this:

$ linkerd tap deploy nginx
req id=0:71 proxy=out src=10.0.1.2:34890 dst=10.0.1.2:80 tls=no_identity :method=GET :authority=nginx :path=/
end id=0:71 proxy=out src=10.0.1.2:34890 dst=10.0.1.2:80 tls=no_identity reset-error=6 duration=0µs response-length=0B


req id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity :method=GET :authority=nginx :path=/
req id=0:0 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :method=GET :authority=nginx :path=/
rsp id=0:0 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :status=200 latency=911µs
end id=0:0 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled duration=27µs response-length=612B
rsp id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity :status=200 latency=2562µs
end id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity duration=46µs response-length=612B


req id=0:73 proxy=out src=10.0.1.2:35184 dst=10.0.1.2:80 tls=no_identity :method=GET :authority=nginx :path=/
end id=0:73 proxy=out src=10.0.1.2:35184 dst=10.0.1.2:80 tls=no_identity reset-error=6 duration=0µs response-length=0B


req id=0:74 proxy=out src=10.0.1.2:35350 dst=10.0.2.3:80 tls=no_identity :method=GET :authority=nginx :path=/
req id=0:1 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :method=GET :authority=nginx :path=/
rsp id=0:1 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :status=200 latency=518µs
end id=0:1 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled duration=40µs response-length=612B
rsp id=0:74 proxy=out src=10.0.1.2:35350 dst=10.0.2.3:80 tls=no_identity :status=200 latency=2677µs
end id=0:74 proxy=out src=10.0.1.2:35350 dst=10.0.2.3:80 tls=no_identity duration=68µs response-length=612B

# HELP request_total Total count of HTTP requests.
# TYPE request_total counter
request_total{authority="nginx",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="nginx",dst_namespace="default",dst_pod="nginx-665d5c9995-k6rkx",dst_pod_template_hash="2218175551",dst_service="nginx",tls="no_identity",no_tls_reason="not_provided_by_service_discovery"} 4
request_total{authority="nginx",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="nginx",dst_namespace="default",dst_pod="nginx-665d5c9995-v2rqk",dst_pod_template_hash="2218175551",dst_service="nginx",tls="no_identity",no_tls_reason="not_provided_by_service_discovery"} 2
# HELP response_total Total count of HTTP responses
# TYPE response_total counter
response_total{authority="nginx",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="nginx",dst_namespace="default",dst_pod="nginx-665d5c9995-v2rqk",dst_pod_template_hash="2218175551",dst_service="nginx",tls="no_identity",no_tls_reason="not_provided_by_service_discovery",classification="success",status_code="200"} 2
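
One way to read the counters above is to compare requests and responses per destination pod. The sketch below is a small helper written for this thread, not part of Linkerd; it assumes the metrics are in the Prometheus text format shown above and that `dst_pod` is the label of interest. The sample input is the output above with most labels trimmed for brevity.

```python
import re
from collections import defaultdict

# Matches lines such as: request_total{a="b",c="d"} 4
LINE_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>.*)\}\s+(?P<value>[0-9.eE+-]+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def totals_by_pod(metrics_text: str) -> dict:
    """Sum request_total and response_total per dst_pod label."""
    totals = defaultdict(lambda: {"request_total": 0.0, "response_total": 0.0})
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None or match["name"] not in ("request_total", "response_total"):
            continue
        labels = dict(LABEL_RE.findall(match["labels"]))
        pod = labels.get("dst_pod", "<unknown>")
        totals[pod][match["name"]] += float(match["value"])
    return {pod: dict(counts) for pod, counts in totals.items()}


# Labels trimmed from the output above; values are unchanged.
SAMPLE = """
# TYPE request_total counter
request_total{direction="outbound",dst_pod="nginx-665d5c9995-k6rkx"} 4
request_total{direction="outbound",dst_pod="nginx-665d5c9995-v2rqk"} 2
# TYPE response_total counter
response_total{direction="outbound",dst_pod="nginx-665d5c9995-v2rqk",status_code="200"} 2
"""

if __name__ == "__main__":
    for pod, counts in totals_by_pod(SAMPLE).items():
        print(pod, counts)
```

With this input, one pod has requests but no recorded responses, while the other has a response for every request.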

@olix0r hope that helps!

tanuck on 19 Sep 2018

Having the same issue.

We have a pod that acts as an authorization microservice. This pod can make requests to itself to check other permissions, so the hostname is http://authorization. This previously worked, but after enabling linkerd2 it stopped working, and the linkerd-proxy container gives the following error:

ERR! proxy={server=out listen=127.0.0.1:4140 remote=10.16.0.48:60652} linkerd2_proxy::proxy::http::router service error: Error caused by underlying HTTP/2 error: protocol error: frame with invalid size

JCMais on 1 Nov 2018

The tap logs show the request to the other pod both from out and in, so the two proxies were involved.
The tap logs don't show the request in when the pod is the same, suggesting to me that the proxy never receives the request it should be sending itself.
The reset-error=6 is a FRAME_SIZE_ERROR from HTTP/2, which would be the out proxy making an HTTP/2 request to dst, and the bytes it got back are likely not HTTP/2, triggering that error.
The proxies will speak HTTP/2 to each other when they know there is a proxy on the other side, so a connection returning bytes that aren't HTTP/2 suggests it's connecting to something else.

All this makes me wonder if something is preventing the connection from being redirected to the proxy. Perhaps something in the iptables rules that are set up during proxy-init.
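
To illustrate why non-HTTP/2 bytes could surface as a frame size error, here is a minimal sketch that reads the start of an HTTP/1.1 response as if it were an HTTP/2 frame header (a 24-bit length, 8-bit type, 8-bit flags, and a 31-bit stream id, per RFC 7540). It assumes the peer answered with plain HTTP/1.1 and that the default maximum frame size of 16,384 bytes is in effect; the function names are made up for the example, and the proxy's real HTTP/2 library may detect the problem differently.

```python
DEFAULT_MAX_FRAME_SIZE = 16_384  # initial SETTINGS_MAX_FRAME_SIZE in RFC 7540
FRAME_SIZE_ERROR = 0x6           # HTTP/2 error code 6, as in reset-error=6


def parse_frame_header(data: bytes):
    """Decode the fixed 9-byte HTTP/2 frame header."""
    if len(data) < 9:
        raise ValueError("need at least 9 bytes for a frame header")
    length = int.from_bytes(data[0:3], "big")                    # 24-bit payload length
    frame_type = data[3]                                         # 8-bit type
    flags = data[4]                                              # 8-bit flags
    stream_id = int.from_bytes(data[5:9], "big") & 0x7FFF_FFFF   # drop reserved bit
    return length, frame_type, flags, stream_id


def frame_error(data: bytes):
    """Return FRAME_SIZE_ERROR if the declared length exceeds the default limit."""
    length, _type, _flags, _stream = parse_frame_header(data)
    return FRAME_SIZE_ERROR if length > DEFAULT_MAX_FRAME_SIZE else None


if __name__ == "__main__":
    # Assumed payload: the first bytes of a plain HTTP/1.1 response.
    http1_bytes = b"HTTP/1.1 200 OK\r\n"
    print(parse_frame_header(http1_bytes))
    print(frame_error(http1_bytes))
```

The ASCII bytes "HTT" decode to a declared length of 0x485454, far above the default limit, so a receiver expecting HTTP/2 would reject the frame as having an invalid size.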

seanmonstar on 7 Nov 2018

Actually, while there was a proxy change for this, it won't be fixed until the iptables config is changed in this repo also.

seanmonstar on 15 Nov 2018

Thanks for fixing this!

JCMais on 16 Nov 2018

I'm seeing the same reset-error=6 when trying to load balance gRPC using linkerd2 and nginx ingress.

Steps to recreate here:

https://github.com/glindsell/free-peer/tree/ingress/stream-meshed

glindsell on 21 Mar 2019

@glindsell thanks for putting together a repro and sharing! It's a little hard to tease out a clear problem description from that README, though. Would you mind opening a new issue so that we can make sure we get to the bottom of it?

olix0r on 21 Mar 2019

@olix0r good idea, I've updated the issue which I opened specifically for the purpose of gRPC stream load balancing with this info:

https://github.com/linkerd/linkerd2/issues/2120

glindsell on 22 Mar 2019
