Linkerd2: Requests fail when src & dst are the same

Created on 5 Sep 2018 · 14 comments · Source: linkerd/linkerd2

Hello,

When making HTTP/1.1 requests where the src and dst are the same (a pod sending a request to itself), the proxy responds with a 500 status code. If the request goes to a different pod in the deployment, everything works fine. Sending the request to the loopback address rather than the service DNS name also works. Is this expected?

$ linkerd version
Client version: v18.8.4
Server version: v18.8.4

Using the default sidecar generated by linkerd inject.

Thanks

Labels: bug, priority/triage

All 14 comments

Well, that sounds interesting. Is there anything special about your application? Are you using TLS? Could we see your Kubernetes resource YAML?

No TLS. It was first found on a deployment running a Node.js GraphQL application on port 80. I've since reproduced it on every other deployment I've tried.

Here is the simplest reproduction that I've found:

  • kubectl run nginx --image=nginx --port=80 --replicas=2 -o yaml --dry-run | linkerd inject - | kubectl apply -f -
  • kubectl expose deploy nginx --port=80 --target-port=80 --type=ClusterIP
  • exec into one of the nginx containers and run curl -v nginx; every other request should return a 500

Those are fantastic reproduction steps, thank you!

Quick update - just upgraded to v18.9.1 and this problem still persists.

I'd be curious to see what linkerd tap deploy nginx shows while the curl command runs. The output of curl localhost:4191/metrics | grep -e request_total -e response_total might also be informative.
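One rough way to read that metrics output is to compare request_total and response_total per destination pod: a pod that receives requests but produces no responses is the one failing. The sketch below is a hypothetical Python helper (missing_responses is not part of Linkerd); it parses Prometheus text-format lines with a simple regex, assumes one sample per line, and ignores label-escaping rules.

```python
import re
from collections import Counter

# name{label="value",...} value  -- a simplified Prometheus text-format sample
SAMPLE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def missing_responses(lines):
    """Return, per dst_pod, outbound request_total minus response_total."""
    requests, responses = Counter(), Counter()
    for line in lines:
        m = SAMPLE.match(line.strip())
        if not m:
            continue  # skip "# HELP" / "# TYPE" comments and blank lines
        name, raw_labels, value = m.groups()
        labels = dict(LABEL.findall(raw_labels))
        if labels.get("direction") != "outbound":
            continue
        pod = labels.get("dst_pod", "?")
        if name == "request_total":
            requests[pod] += float(value)
        elif name == "response_total":
            responses[pod] += float(value)
    # Counter returns 0 for pods that never produced a response.
    return {pod: requests[pod] - responses[pod] for pod in requests}
```

A non-zero value means requests went to that pod without any response being recorded.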

Hm, so if the dst is a socket address, the proxy will use it directly, which would explain why the loopback request succeeds. However, if it's a hostname, then it will either:

  • If it looks like a service in the cluster, ask the controller for the socket address.
  • Otherwise, perform a system DNS lookup and try to use the result.
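The resolution order described above can be sketched roughly as follows. This is a simplified Python model, not the proxy's actual code; the function name resolve_kind and the rule for what "looks like a service in the cluster" (a bare single-label name, or a name ending in .svc.cluster.local) are assumptions made for illustration.

```python
import ipaddress


def resolve_kind(authority: str) -> str:
    """Classify how a (hypothetical) proxy would resolve a request's destination.

    Returns "direct" for an IP literal, "controller" for names that look
    like in-cluster services, and "system-dns" for everything else.
    """
    host, _, _port = authority.rpartition(":")
    if not host:  # no port given, e.g. "nginx"
        host = authority
    try:
        ipaddress.ip_address(host)
        return "direct"  # a socket address is used as-is
    except ValueError:
        pass
    # Assumed heuristic for "looks like a service in the cluster".
    if "." not in host or host.endswith(".svc.cluster.local"):
        return "controller"
    return "system-dns"
```

Under this model, curl 127.0.0.1 takes the "direct" path, while curl nginx goes through the controller, which can hand back the pod's own IP as one of the endpoints.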

Is it possible to collect debug logs from the proxy? Or is there an environment I can poke into and enable them myself?

Using my steps from above, I sent 4 curl -v nginx requests. The tap and Prometheus data then looked like this:

$ linkerd tap deploy nginx
req id=0:71 proxy=out src=10.0.1.2:34890 dst=10.0.1.2:80 tls=no_identity :method=GET :authority=nginx :path=/
end id=0:71 proxy=out src=10.0.1.2:34890 dst=10.0.1.2:80 tls=no_identity reset-error=6 duration=0µs response-length=0B


req id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity :method=GET :authority=nginx :path=/
req id=0:0 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :method=GET :authority=nginx :path=/
rsp id=0:0 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :status=200 latency=911µs
end id=0:0 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled duration=27µs response-length=612B
rsp id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity :status=200 latency=2562µs
end id=0:72 proxy=out src=10.0.1.2:35054 dst=10.0.2.3:80 tls=no_identity duration=46µs response-length=612B


req id=0:73 proxy=out src=10.0.1.2:35184 dst=10.0.1.2:80 tls=no_identity :method=GET :authority=nginx :path=/
end id=0:73 proxy=out src=10.0.1.2:35184 dst=10.0.1.2:80 tls=no_identity reset-error=6 duration=0µs response-length=0B


req id=0:74 proxy=out src=10.0.1.2:35350 dst=10.0.2.3:80 tls=no_identity :method=GET :authority=nginx :path=/
req id=0:1 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :method=GET :authority=nginx :path=/
rsp id=0:1 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled :status=200 latency=518µs
end id=0:1 proxy=in  src=10.0.1.2:52002 dst=10.0.2.3:80 tls=disabled duration=40µs response-length=612B
rsp id=0:74 proxy=out src=10.0.1.2:35350 dst=10.0.2.3:80 tls=no_identity :status=200 latency=2677µs
end id=0:74 proxy=out src=10.0.1.2:35350 dst=10.0.2.3:80 tls=no_identity duration=68µs response-length=612B
# HELP request_total Total count of HTTP requests.
# TYPE request_total counter
request_total{authority="nginx",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="nginx",dst_namespace="default",dst_pod="nginx-665d5c9995-k6rkx",dst_pod_template_hash="2218175551",dst_service="nginx",tls="no_identity",no_tls_reason="not_provided_by_service_discovery"} 4
request_total{authority="nginx",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="nginx",dst_namespace="default",dst_pod="nginx-665d5c9995-v2rqk",dst_pod_template_hash="2218175551",dst_service="nginx",tls="no_identity",no_tls_reason="not_provided_by_service_discovery"} 2
# HELP response_total Total count of HTTP responses
# TYPE response_total counter
response_total{authority="nginx",direction="outbound",dst_control_plane_ns="linkerd",dst_deployment="nginx",dst_namespace="default",dst_pod="nginx-665d5c9995-v2rqk",dst_pod_template_hash="2218175551",dst_service="nginx",tls="no_identity",no_tls_reason="not_provided_by_service_discovery",classification="success",status_code="200"} 2
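One way to confirm the pattern in this tap output is to check whether every failed outbound request is one whose source and destination IPs match. The following is a hypothetical Python helper (parse_tap and outbound_results are not Linkerd tools); it assumes tap lines are an event kind followed by whitespace-separated key=value fields, as shown above.

```python
def parse_tap(line):
    """Split a tap line into its event kind and a dict of key=value fields."""
    kind, *fields = line.split()
    return kind, dict(f.split("=", 1) for f in fields if "=" in f)


def outbound_results(lines):
    """Yield (id, is_self, failed) for each outbound 'end' event."""
    for line in lines:
        if not line.strip():
            continue  # tap output contains blank separator lines
        kind, f = parse_tap(line)
        if kind != "end" or f.get("proxy") != "out":
            continue
        src_ip = f["src"].rsplit(":", 1)[0]
        dst_ip = f["dst"].rsplit(":", 1)[0]
        # A stream reset shows up as a reset-error field instead of a response.
        yield f["id"], src_ip == dst_ip, "reset-error" in f
```

Over the four requests above, "self-addressed" and "failed" coincide exactly: 0:71 and 0:73 go from 10.0.1.2 to 10.0.1.2 and are reset, while 0:72 and 0:74 go to 10.0.2.3 and succeed.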

@olix0r hope that helps!

Having the same issue.

We have a pod that acts as an authorization microservice. This pod can make requests to itself to check other permissions, using the hostname http://authorization. This previously worked, but after enabling linkerd2 it stopped working, and the linkerd-proxy container gives the following error:

ERR! proxy={server=out listen=127.0.0.1:4140 remote=10.16.0.48:60652} linkerd2_proxy::proxy::http::router service error: Error caused by underlying HTTP/2 error: protocol error: frame with invalid size
  • The tap logs show the request to the other pod on both the out and in sides, so both proxies were involved.
  • The tap logs don't show an inbound request when the pod is the same, suggesting that the proxy never receives the request it is sending to itself.
  • reset-error=6 is an HTTP/2 FRAME_SIZE_ERROR. That would be the outbound proxy making an HTTP/2 request to dst, getting back bytes that are likely not HTTP/2, and triggering that error.
  • The proxies speak HTTP/2 to each other when they know there is a proxy on the other side, so a connection returning bytes that aren't HTTP/2 suggests it's connecting to something else.
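To make the FRAME_SIZE_ERROR reasoning concrete: every HTTP/2 frame starts with a 9-byte header whose first 24 bits are the payload length (RFC 7540, section 4.1), and error code 0x6 is FRAME_SIZE_ERROR (section 7). If the peer answers with plain HTTP/1.1 text, those bytes get read as a frame header with an absurd length. The sketch below assumes the peer replies with something starting with "HTTP/1.1 400" (the exact reply nginx sends is an assumption) and the default SETTINGS_MAX_FRAME_SIZE of 16384; the helper names are hypothetical.

```python
import struct

FRAME_SIZE_ERROR = 0x6          # RFC 7540 section 7 error code
DEFAULT_MAX_FRAME_SIZE = 16384  # RFC 7540 initial SETTINGS_MAX_FRAME_SIZE


def decode_frame_header(data: bytes):
    """Decode a 9-byte HTTP/2 frame header: length(24) type(8) flags(8) stream(31)."""
    if len(data) < 9:
        raise ValueError("need at least 9 bytes")
    length = int.from_bytes(data[0:3], "big")
    frame_type, flags = data[3], data[4]
    stream_id = struct.unpack(">I", data[5:9])[0] & 0x7FFFFFFF  # drop reserved bit
    return length, frame_type, flags, stream_id


def check_frame(data: bytes, max_frame_size: int = DEFAULT_MAX_FRAME_SIZE):
    """Return None if the header's length is acceptable, else the HTTP/2 error code."""
    length, *_ = decode_frame_header(data)
    if length > max_frame_size:
        return FRAME_SIZE_ERROR
    return None


# Bytes an HTTP/1.1 server might send back after receiving HTTP/2 traffic.
reply = b"HTTP/1.1 400 Bad Request\r\n"
length, frame_type, flags, stream_id = decode_frame_header(reply)
print(hex(length), length)  # → 0x485454 4740180 ("HTT" read as a length)
print(check_frame(reply))   # → 6
```

A length of roughly 4.7 MB is far above the 16 KB default, which is consistent with the "frame with invalid size" error in the log above.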

All this makes me wonder if something is preventing the connection from being redirected to the proxy. Perhaps something in the iptables rules that are set up during proxy-init.
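The hypothesis above can be expressed as a toy model of where a connection ends up. This is a deliberately simplified sketch, not proxy-init's actual rule set. It assumes that (1) application traffic leaving the pod is redirected to the outbound proxy, (2) connections opened by the proxy itself are exempt from that redirect so they don't loop, and (3) redirection to the inbound proxy only happens for traffic arriving from another host. Conn and destination are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    dst_ip: str
    dst_port: int
    from_proxy: bool  # True if opened by the proxy process itself


def destination(conn: Conn, pod_ip: str) -> str:
    """Return where a connection lands under the simplified model."""
    if not conn.from_proxy:
        return "outbound-proxy"  # assumption (1)
    if conn.dst_ip == pod_ip:
        # Assumptions (2) and (3): exempt from the outbound redirect, and it
        # never arrives from another host, so the inbound proxy is skipped.
        return f"app:{conn.dst_port}"
    return "remote inbound-proxy"  # the remote pod redirects it on arrival


pod_ip = "10.0.1.2"
print(destination(Conn("10.0.2.3", 80, from_proxy=True), pod_ip))  # other pod
print(destination(Conn("10.0.1.2", 80, from_proxy=True), pod_ip))  # same pod
```

In the same-pod case, the outbound proxy (expecting a proxy on the other side) would be speaking HTTP/2 directly to nginx on port 80, which matches the FRAME_SIZE_ERROR reading in the bullets above.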

Actually, while a proxy change has been made for this, it won't be fixed until the iptables configuration in this repo is changed as well.

Thanks for fixing this!

I'm seeing the same reset-error=6 when trying to load balance gRPC using linkerd2 and nginx ingress.

Steps to recreate here:

https://github.com/glindsell/free-peer/tree/ingress/stream-meshed

@glindsell thanks for putting together a repro and sharing! It's a little hard to tease out a clear problem description from that README, though. Would you mind opening a new issue so that we can make sure we get to the bottom of it?

@olix0r good idea. I've updated the issue I opened specifically about gRPC stream load balancing with this info:

https://github.com/linkerd/linkerd2/issues/2120

