I am running the latest linkerd edge release, 19.1.2, and I am getting this error:
WARN admin={bg=resolver} linkerd2_proxy::control::destination::background::destination_set Destination.Get stream errored for NameAddr { name: DnsName(DNSName("cs-ch-domain-manager-v1.content-hub-test.svc.cluster.local.")), port: 8080 }: Grpc(Status { code: Unknown, error_message: "", binary_error_details: b"" })
I just deployed the latest version, nothing more.
Output of linkerd logs --control-plane-component controller:
linkerd linkerd-controller-7bc49fd77f-lwt8q linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-proxy-api.linkerd.svc.cluster.local:8086: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
Output of linkerd logs --control-plane-component controller -c proxy-api:
linkerd linkerd-controller-7bc49fd77f-lwt8q proxy-api time="2019-01-21T13:54:55Z" level=info msg="Stopping watch on endpoint cs-ch-domain-manager-v1.content-hub-test:8080"
linkerd linkerd-controller-7bc49fd77f-lwt8q proxy-api W0121 15:57:34.899318 1 reflector.go:341] k8s.io/client-go/informers/factory.go:130: watch of *v1beta2.ReplicaSet ended with: too old resource version: 3417120 (3420499)
linkerd linkerd-controller-7bc49fd77f-lwt8q proxy-api time="2019-01-21T17:25:43Z" level=info msg="Establishing watch on endpoint cs-ch-domain-manager-v1.content-hub-test:8080"
linkerd linkerd-controller-7bc49fd77f-lwt8q proxy-api time="2019-01-21T17:32:18Z" level=info msg="Stopping watch on endpoint cs-ch-domain-manager-v1.content-hub-test:8080"
linkerd linkerd-controller-7bc49fd77f-lwt8q proxy-api W0121 17:49:54.531144 1 reflector.go:341] k8s.io/client-go/informers/factory.go:130: watch of *v1beta2.ReplicaSet ended with: too old resource version: 3437967 (3439015)
linkerd linkerd-controller-7bc49fd77f-lwt8q proxy-api time="2019-01-21T21:32:21Z" level=info msg="Establishing watch on endpoint linkerd-prometheus.linkerd:9090"
linkerd check output:

kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version

linkerd-existence
-----------------
√ control plane namespace exists
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-api
-----------
√ control plane pods are ready
√ can query the control plane API
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus

linkerd-service-profile
-----------------------
√ no invalid service profiles

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ control plane is up-to-date

Status check results are √
Any chance that you have some minimal YAML manifests that we can use to reproduce the issue? I am not seeing the errors on EKS 1.11.5. Is this a new installation of edge 19.1.2 or an upgrade from an older version of Linkerd2? Also, do you know if this happens on other EKS clusters too, or just one particular cluster?
I am sorry I can't share our code. We have only one EKS cluster with Linkerd.
However, I can do testing on our side.
I have upgraded the linkerd stack at each new edge release. I can delete the linkerd stack and try a fresh installation. Let me know.
I just deleted the linkerd namespace, reinstalled, and I am getting the following errors.
I don't see this 172.20.0.1:443: connect: connection refused exception from other apps.
linkerd linkerd-controller-7bc49fd77f-j7j8l tap time="2019-01-22T11:34:01Z" level=info msg="running version edge-19.1.2"
linkerd linkerd-controller-7bc49fd77f-j7j8l tap time="2019-01-22T11:34:01Z" level=info msg="waiting for caches to sync"
linkerd linkerd-controller-7bc49fd77f-j7j8l tap E0122 11:34:01.556917 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Pod: Get https://172.20.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 172.20.0.1:443: connect: connection refused
linkerd linkerd-controller-7bc49fd77f-j7j8l tap E0122 11:34:01.557836 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1beta2.Deployment: Get https://172.20.0.1:443/apis/apps/v1beta2/deployments?limit=500&resourceVersion=0: dial tcp 172.20.0.1:443: connect: connection refused
linkerd linkerd-controller-7bc49fd77f-j7j8l tap E0122 11:34:01.557934 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Service: Get https://172.20.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 172.20.0.1:443: connect: connection refused
linkerd linkerd-controller-7bc49fd77f-j7j8l tap E0122 11:34:01.558011 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.ReplicationController: Get https://172.20.0.1:443/api/v1/replicationcontrollers?limit=500&resourceVersion=0: dial tcp 172.20.0.1:443: connect: connection refused
linkerd linkerd-controller-7bc49fd77f-j7j8l tap E0122 11:34:01.558411 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1beta2.ReplicaSet: Get https://172.20.0.1:443/apis/apps/v1beta2/replicasets?limit=500&resourceVersion=0: dial tcp 172.20.0.1:443: connect: connection refused
linkerd linkerd-controller-7bc49fd77f-j7j8l tap time="2019-01-22T11:34:02Z" level=info msg="caches synced"
linkerd linkerd-controller-7bc49fd77f-j7j8l tap time="2019-01-22T11:34:02Z" level=info msg="starting gRPC server on 127.0.0.1:8088"
linkerd linkerd-controller-7bc49fd77f-j7j8l tap time="2019-01-22T11:34:02Z" level=info msg="starting admin server on :9998"
linkerd linkerd-controller-7bc49fd77f-j7j8l linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-proxy-api.linkerd.svc.cluster.local:8086: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-7bc49fd77f-j7j8l linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-proxy-api.linkerd.svc.cluster.local:8086: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-7bc49fd77f-j7j8l linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-proxy-api.linkerd.svc.cluster.local:8086: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-7bc49fd77f-j7j8l linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-proxy-api.linkerd.svc.cluster.local:8086: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-7bc49fd77f-j7j8l linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-proxy-api.linkerd.svc.cluster.local:8086: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-7bc49fd77f-j7j8l linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-proxy-api.linkerd.svc.cluster.local:8086: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-7bc49fd77f-j7j8l linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-proxy-api.linkerd.svc.cluster.local:8086: Inner(Upstream(Inner(Inner(Error
So after letting my EKS cluster run overnight, my linkerd proxies in the linkerd and application namespaces are flooded with the WARN admin={bg=resolver} linkerd2_proxy::control::destination::background::destination_set Destination.Get stream errored for NameAddr { name: DnsName(DNSName("linkerd-grafana.linkerd.svc.cluster.local")), port: 3000 }: Grpc(Status { code: Unknown, error_message: "", binary_error_details: b"" }) error. However, my application is still up and running.
FWIW, I was using t3.small workers and I did see an OOM error on one of my nodes, causing the application to be unavailable. But the node recovered. I assume you are using bigger machines than mine?
The connect: connection refused errors you saw might just be startup errors. I saw them on GKE and Minikube too when the control plane was first deployed. And yes, I am definitely seeing many error fetching profile warnings on other platforms too. But I don't think they are related to your problem.
Are all your meshed services unreachable?
Hi,
I was able to reach the service from the ingress, but the service was not able to call an external service. By external, I mean another service running on Kubernetes. However, I was able to call this external service from another service that was not meshed by linkerd.
I will retry later today or tomorrow with a fresh install of linkerd, so we can figure out why we are getting all these errors.
It seems at least possible that this issue is related to https://github.com/linkerd/linkerd2/issues/2111. I think it probably makes sense for us to spend some more time on #2111 first to see if that bug could explain the behavior reported in this issue.
Interesting. We have grpc service also. I will try to mesh only the services that are not using grpc to see if I am still getting the errors
1) I injected linkerd into a service and I am getting the same error in the linkerd-proxy, which is:
WARN admin={bg=resolver} linkerd2_proxy::control::destination::background::destination_set Destination.Get stream errored for NameAddr { name: DnsName(DNSName("cs-ch-domain-manager-v1.content-hub-test.svc.cluster.local.")), port: 8080 }: Grpc(Status { code: Unknown, error_message: "", binary_error_details: b"" })
2) Inside the injected container, I am trying to call another service and I am getting the following error
webmaster@cs-ch-gateway-adapter:/app$ curl http://cs-ch-domain-manager-v1:8080/service/security/clients -v
* Trying 172.20.1.80...
* TCP_NODELAY set
* Connected to cs-ch-domain-manager-v1 (172.20.1.80) port 8080 (#0)
> GET /service/security/clients HTTP/1.1
> Host: cs-ch-domain-manager-v1:8080
> User-Agent: curl/7.62.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< content-length: 0
< date: Wed, 23 Jan 2019 00:07:22 GMT
<
* Connection #0 to host cs-ch-domain-manager-v1 left intact
The 500 error is not returned by the service; it is returned by linkerd-proxy.
3) If I call the same service through the external Route 53 DNS name, I get a response.
4) Other services can communicate with the linkerd-injected service, so only outbound connections are affected.
This does _not_ appear to be related to issue #2111; that issue was caused by a client sending requests that were not standards-compliant.
Ok, let me know if I can help on anything to fix this issue
@jmirc just to clarify, you only began seeing this error _after_ upgrading to edge-19.1.2? What version were you on previously?
Yes. I didn't have this issue with the previous version, which was edge-19.1.1.
I will try reinstalling edge-19.1.1 this evening to see whether I get the issue.
Thanks @jmirc! That narrows the search space considerably.
I can confirm that the warnings in the linkerd-proxy container started appearing as of the edge-19.1.2 release.
If I install the edge-19.1.1 release and open the dashboard, in the controller's linkerd-proxy logs, I see:
$ kubectl -n linkerd logs deploy/linkerd-controller linkerd-proxy --timestamps
2019-01-23T20:43:46.8655514Z INFO linkerd2_proxy::app::main using controller at Some(Name(NameAddr { name: DnsName(DNSName("localhost.")), port: 8086 }))
2019-01-23T20:43:46.8656894Z INFO linkerd2_proxy::app::main routing on V4(127.0.0.1:4140)
2019-01-23T20:43:46.8670232Z INFO linkerd2_proxy::app::main proxying on V4(0.0.0.0:4143) to None
2019-01-23T20:43:46.8670578Z INFO linkerd2_proxy::app::main serving Prometheus metrics on V4(0.0.0.0:4191)
2019-01-23T20:43:46.8670856Z INFO linkerd2_proxy::app::main protocol detection disabled for inbound ports {25, 3306}
2019-01-23T20:43:46.8672108Z INFO linkerd2_proxy::app::main protocol detection disabled for outbound ports {25, 3306}
...
Whereas if I install the edge-19.1.2 release and open the dashboard, in the controller's linkerd-proxy logs, I see:
$ kubectl -n linkerd logs deploy/linkerd-controller linkerd-proxy --timestamps
2019-01-23T21:02:39.7247423Z INFO linkerd2_proxy::app::main using controller at Some(Name(NameAddr { name: DnsName(DNSName("localhost.")), port: 8086 }))
2019-01-23T21:02:39.724812Z INFO linkerd2_proxy::app::main routing on V4(127.0.0.1:4140)
2019-01-23T21:02:39.7248405Z INFO linkerd2_proxy::app::main proxying on V4(0.0.0.0:4143) to None
2019-01-23T21:02:39.7248669Z INFO linkerd2_proxy::app::main serving Prometheus metrics on V4(0.0.0.0:4191)
2019-01-23T21:02:39.7248925Z INFO linkerd2_proxy::app::main protocol detection disabled for inbound ports {25, 3306}
2019-01-23T21:02:39.7249283Z INFO linkerd2_proxy::app::main protocol detection disabled for outbound ports {25, 3306}
...
2019-01-23T21:03:01.3305407Z WARN linkerd2_proxy::app::profiles error fetching profile for localhost:6443: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
2019-01-23T21:03:07.338784Z WARN linkerd2_proxy::app::profiles error fetching profile for localhost:6443: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
2019-01-23T21:03:11.4398961Z WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-prometheus.linkerd.svc.cluster.local:9090: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
2019-01-23T21:03:11.4554226Z WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-proxy-api.linkerd.svc.cluster.local:8086: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
2019-01-23T21:03:12.0799139Z WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-controller-api.linkerd.svc.cluster.local:8085: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
I can confirm too. I just installed the previous version (edge-19.1.1) and everything works.
INFO linkerd2_proxy::app::main using controller at Some(Name(NameAddr { name: DnsName(DNSName("linkerd-proxy-api.linkerd.svc.cluster.local")), port: 8086 }))
INFO linkerd2_proxy::app::main routing on V4(127.0.0.1:4140)
INFO linkerd2_proxy::app::main proxying on V4(0.0.0.0:4143) to None
INFO linkerd2_proxy::app::main serving Prometheus metrics on V4(0.0.0.0:4191)
INFO linkerd2_proxy::app::main protocol detection disabled for inbound ports {25, 3306}
INFO linkerd2_proxy::app::main protocol detection disabled for outbound ports {25, 3306}
Excellent, thanks again for confirming that for me. I have some theories about possible causes of this issue and I'm looking into them.
Interesting, I've been testing with a fresh linkerd install, and I'm thus far only seeing the
error fetching profile for ...: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
log line, not the
Destination.Get stream errored for NameAddr { ... }: Grpc(Status { code: Unknown, error_message: "", binary_error_details: b"" })
error. I believe the "error fetching profile" warning is relatively benign, and only the failure to get destinations should result in the proxy returning 500 errors.
@hawkw To date with the latest edge version, I've only seen the Destination.Get stream errored for NameAddr error in EKS, not GKE or Minikube, after leaving the app running for a few hours. But even with the errors, the emojivoto application seems to work just fine.
@ihcsim _Iiiiiiinteresting!_ I've been testing exclusively in GKE, but I note that @jmirc is also running in EKS. Issue #2115, which AFAICT is just the timeout error, was reported on Azure.
To add, I've also seen the following (in Azure, starting with edge-19.1.2; edge-19.1.1 was fine):
Destination.Get stream errored for NameAddr { ... }: Grpc(Status { code: Unknown, error_message: "", binary_error_details: b"" })
My thought is that the error fetching profile warning is benign, but the ingress controller starts returning 500s once Destination.Get stream errored shows up in the logs. The only way to recover is to restart the ingress controller pods.
To note: we only have HTTP REST services, no gRPC.
Thanks @jon-walton, that confirms what I've been seeing.
The issue isn't specific to gRPC services, as the proxy itself uses gRPC to talk to the control plane's service discovery API.
FWIW, I've installed edge 19.1.2 on an Azure AKS cluster and am seeing the same kinds of errors.
Some notes:
- Installed in the linkerd namespace about a week ago, using the linkerd install | kubectl apply -f - command
Highlights from the controller logs:
- linkerd-proxy: WARN linkerd2_proxy::app::profiles error fetching profile for some-service-endpoint
- public-api: E0124 17:53:25.043216 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Pod (and v1.AnythingElse)
Full result of linkerd logs --control-plane-component controller is in the attached file.
I did a cleanup as thorough as I could think of:
k delete ns linkerd
k delete clusterrolebinding linkerd-linkerd-controller
k delete clusterrolebinding linkerd-linkerd-prometheus
k delete clusterrole linkerd-linkerd-controller
k delete clusterrole linkerd-linkerd-prometheus
k delete crd serviceprofiles.linkerd.io
And installed edge-19.1.2 again (linkerd install | kubectl apply -f -)
Now I'm not seeing the "error fetching profile" errors anymore, just the "Failed to list *v1.Pod/Deployment/..." errors.
I've just tried with stable-2.1.0, and I see the same errors:
k logs linkerd-controller-55cf8bd769-pfbw6 -n linkerd -c public-api
time="2019-01-24T18:35:50Z" level=info msg="running version stable-2.1.0"
time="2019-01-24T18:35:50Z" level=info msg="starting admin server on :9995"
time="2019-01-24T18:35:50Z" level=info msg="starting HTTP server on :8085"
time="2019-01-24T18:35:50Z" level=info msg="waiting for caches to sync"
E0124 18:35:50.653890 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1beta2.Deployment: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/apis/apps/v1beta2/deployments?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:50.653890 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.ReplicationController: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/api/v1/replicationcontrollers?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:50.653958 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Service: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:50.654014 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1beta2.ReplicaSet: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/apis/apps/v1beta2/replicasets?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:50.654092 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Pod: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:51.662692 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1beta2.ReplicaSet: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/apis/apps/v1beta2/replicasets?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:51.662692 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.ReplicationController: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/api/v1/replicationcontrollers?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:51.662747 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Service: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:51.662792 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Pod: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:51.663056 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1beta2.Deployment: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/apis/apps/v1beta2/deployments?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:52.669449 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1beta2.Deployment: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/apis/apps/v1beta2/deployments?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:52.669456 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1beta2.ReplicaSet: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/apis/apps/v1beta2/replicasets?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:52.669506 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Service: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:52.669549 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.ReplicationController: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/api/v1/replicationcontrollers?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:52.669558 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Pod: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:53.677299 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1beta2.Deployment: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/apis/apps/v1beta2/deployments?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:53.677299 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.ReplicationController: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/api/v1/replicationcontrollers?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:53.677385 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Service: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:53.677439 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Pod: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:53.677690 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1beta2.ReplicaSet: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/apis/apps/v1beta2/replicasets?limit=500&resourceVersion=0: dial tcp 52.237.10.158:443: connect: connection refused
E0124 18:35:54.991160 1 reflector.go:205] k8s.io/client-go/informers/factory.go:130: Failed to list *v1.Pod: Get https://studyo-16fb7fc6.hcp.canadacentral.azmk8s.io:443/api/v1/pods?limit=500&resourceVersion=0: EOF
time="2019-01-24T18:35:56Z" level=info msg="caches synced"
@bourquep Thanks for the additional info. I just wanted to chime in and say that those "connection refused" messages that appear prior to the "caches synced" message are (unfortunately) expected. They're a result of the public-api trying to query the kubernetes API before the linkerd-proxy container in the same pod is ready to serve requests. They eventually succeed if you see the "caches synced" message.
For more context, we use k8s.io/client-go to query the kubernetes API, and that package uses glog to log errors when the API is unreachable, before retrying. We would be better off suppressing all of the glog logs, but we have to redirect them to stderr, due to all of the reasons mentioned in kubernetes/kubernetes#61006. Kubernetes recently swapped out glog with its own fork (called klog) that is apparently more configurable. So it's possible that by updating to a more recent version of client-go we could suppress those messages, but we haven't gotten around to it yet.
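As a quick sanity check for this startup race, you can filter a captured controller log for the two relevant markers: if the last match is "caches synced", the earlier "connection refused" lines resolved themselves. This is a hedged sketch against a synthetic sample log (the file path and its contents are illustrative stand-ins for real controller output, not actual linkerd logs):

```shell
# Write a tiny sample log standing in for real controller output
# (contents are illustrative, modeled on the lines quoted above).
cat > /tmp/controller-sample.log <<'EOF'
E0124 18:35:50 reflector.go:205 Failed to list *v1.Pod: connect: connection refused
E0124 18:35:51 reflector.go:205 Failed to list *v1.Service: connect: connection refused
time="2019-01-24T18:35:56Z" level=info msg="caches synced"
EOF

# If the final matching line is "caches synced", the early "connection
# refused" errors were just the startup race and can be ignored.
grep -E 'connection refused|caches synced' /tmp/controller-sample.log | tail -n 1
```

In a real cluster you would run the same grep over the output of linkerd logs --control-plane-component controller instead of the sample file.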
Aahh, thanks for clarifying that. :)
@bourquep Similar to @klingerf's previous reply, those profiles error fetching profile for some-service-endpoint messages coming from linkerd-proxy are also benign. They indicate that a service profile has not been installed for a service, which is default behavior unless you have explicitly added service profiles. We're working on a fix to make those log lines look less like errors.
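For reference, a service profile is just a ServiceProfile custom resource named after the service's FQDN. The sketch below is a minimal, hedged example only: the apiVersion matches the serviceprofiles.linkerd.io CRD of this era, the service name is taken from the warnings quoted in this thread, and the empty routes list is illustrative (linkerd install-sp, used later in this thread, generates fuller profiles for the control-plane components):

```yaml
# Minimal ServiceProfile sketch; applying even an empty profile for a
# service is what stops the "error fetching profile" warning for it.
apiVersion: linkerd.io/v1alpha1
kind: ServiceProfile
metadata:
  # Name must be the service's fully qualified domain name.
  name: linkerd-prometheus.linkerd.svc.cluster.local
  namespace: linkerd
spec:
  routes: []
```

This would be applied with kubectl apply -f, the same way as the output of linkerd install-sp.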
A little more info on the error fetching profile errors. I observed these running locally with edge-19.1.2, and then they went away once I installed service profiles for the control plane...
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-prometheus.linkerd.svc.cluster.local:9090: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-proxy-api.linkerd.svc.cluster.local:8086: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-prometheus-74d66f86f6-cc9zv linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-prometheus.linkerd.svc.cluster.local:9090: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for localhost:6443: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-prometheus.linkerd.svc.cluster.local:9090: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-proxy-api.linkerd.svc.cluster.local:8086: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-prometheus-74d66f86f6-cc9zv linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-prometheus.linkerd.svc.cluster.local:9090: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for localhost:6443: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-prometheus.linkerd.svc.cluster.local:9090: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-proxy-api.linkerd.svc.cluster.local:8086: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-prometheus-74d66f86f6-cc9zv linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-prometheus.linkerd.svc.cluster.local:9090: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for localhost:6443: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-prometheus.linkerd.svc.cluster.local:9090: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-proxy-api.linkerd.svc.cluster.local:8086: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-prometheus-74d66f86f6-cc9zv linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for linkerd-prometheus.linkerd.svc.cluster.local:9090: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd install-sp | kubectl apply -f -
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for localhost:6443: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for localhost:6443: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for localhost:6443: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for localhost:6443: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for localhost:6443: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for localhost:6443: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for localhost:6443: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
linkerd linkerd-controller-b8c4c48c8-x5n4d linkerd-proxy WARN linkerd2_proxy::app::profiles error fetching profile for localhost:6443: Inner(Upstream(Inner(Inner(Error { kind: Timeout(3s) }))))
We've narrowed the _error fetching profile_ warning down to a recent controller change. My understanding is that this warning should not indicate any functional problem for traffic; however, I believe folks have reported seeing communication fail in this situation. Is that true?
The errors related to the Destination.Get stream errored for NameAddr are more concerning, as this implies an error is occurring in an unrelated API path, though.
The edge-19.1.3 release that just went out contains a few changes that should be relevant here: first, we've fixed the issue which was causing the (benign) error fetching profile ... log spam, and second (more importantly), we've improved the diagnostics for gRPC errors. It should hopefully be much easier now to track down where the Destination.Get error is coming from on the new version.
I have 45 minutes before my son's hockey game starts, installing now! :)
thanks @bourquep ❤️
Hey, I'm in the middle of nowhere, nothing else to do. :)
https://www.google.com/maps/@46.5172089,-72.7554464,12z
@bourquep not too far from me. I am in Montreal ;) I am starting to test this version
@jmirc I'm on the South Shore of Montreal, but at a hockey tournament in Shawi :)
I finished my test and everything works as expected. No more errors in the log of linkerd-proxy.
My service is able to connect to the external service.
This new version has fixed all the problems I had previously.
@hawkw sounds like we can close this one out?
@grampelberg I'd prefer to hear back from @bourquep and @jon-walton before closing this, to confirm that the issue has been resolved for everyone affected.
So far so good on my side!
FWIW, I spun up an AKS test cluster with edge-19.1.3 last night, and after 17 hours, I haven't seen any failures either.
Yep, 19.1.3 solved all the issues I was having with 19.1.2, awesome :+1:
That's great to hear! :D