Linkerd2: Getting started demo does not work

Created on 4 Dec 2018  ·  17 Comments  ·  Source: linkerd/linkerd2

Bug Report

What is the issue?

Following the Getting Started demo steps doesn't work as expected on Docker for Mac (edge) with Kubernetes.

How can it be reproduced?

Follow the steps as is from here https://linkerd.io/2/getting-started/

Logs, error output, etc

$ kubectl -n linkerd logs -f prometheus-66fb47b7d6-q6phq prometheus

level=error ts=2018-12-04T21:31:53.943962383Z caller=main.go:234 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:306: Failed to list *v1.Pod: Get https://10.96.0.1:443/api/v1/namespaces/linkerd/pods?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused"

$ kubectl -n linkerd logs -f controller-6b4d8db598-qxpcb public-api
time="2018-12-04T21:59:29Z" level=error msg="Query(max(process_start_time_seconds{}) by (pod, namespace)) failed with: Get http://prometheus.linkerd.svc.cluster.local:9090/api/v1/query?query=max%28process_start_time_seconds%7B%7D%29+by+%28pod%2C+namespace%29: dial tcp 127.0.0.1:9090: connect: connection refused"
time="2018-12-04T22:32:31Z" level=error msg="Query(max(process_start_time_seconds{}) by (pod, namespace)) failed with: Get http://prometheus.linkerd.svc.cluster.local:9090/api/v1/query?query=max%28process_start_time_seconds%7B%7D%29+by+%28pod%2C+namespace%29: dial tcp 127.0.0.1:9090: connect: connection refused"

linkerd check output

kubernetes-api: can initialize the client..................................[ok]
kubernetes-api: can query the Kubernetes API...............................[ok]
kubernetes-api: is running the minimum Kubernetes API version..............[ok]
linkerd-api: control plane namespace exists................................[ok]
linkerd-api: control plane pods are ready..................................[ok]
linkerd-api: can initialize the client.....................................[ok]
linkerd-api: can query the control plane API...............................[ok]
linkerd-api[kubernetes]: control plane can talk to Kubernetes..............[ok]
linkerd-api[prometheus]: control plane can talk to Prometheus..............[FAIL] -- Error calling Prometheus from the control plane: Get http://prometheus.linkerd.svc.cluster.local:9090/api/v1/query?query=max%!p(MISSING)rocess_start_time_seconds%!B(MISSING)%!D(MISSING)%!+(MISSING)by+%!p(MISSING)od%!C(MISSING)+namespace%!:(MISSING) dial tcp 127.0.0.1:9090: connect: connection refused

Status check results are [FAIL]

Environment

  • Kubernetes Version:
➜  ~ kubectl version --short
Client Version: v1.10.3
Server Version: v1.10.3
  • Cluster Environment: (GKE, AKS, kops, ...)
    Docker for mac edge

  • Host OS: Mac OS X - High Sierra 10.13.4

  • Linkerd version:

➜  ~ linkerd version
Client version: stable-2.0.0
Server version: stable-2.0.0

Possible solution

¯\_(ツ)_/¯

Additional context

Looks like Linkerd is having trouble accessing both the Kubernetes HTTP API and Prometheus, either because the Prometheus pods aren't set up right or because the DNS entry is wrong. Most likely the former.

I have tried with a fresh install of Docker edge and I get the same thing each time.

bug needmore

All 17 comments

@asad-awadia any more details? I just tested on a fresh 2.0.0.0-mac82 (29268) (k8s v1.10.3) and everything worked.

There are some connection refused lines, but those are expected as the sidecar proxy starts up.

@grampelberg any other commands you want me to run and paste the output? I don't know what else might be relevant

How are you installing linkerd? What is the output of:

kubectl get componentstatuses,configmaps,endpoints,limitranges,namespaces,nodes,persistentvolumeclaims,persistentvolumes,pods,podtemplates,replicationcontrollers,resourcequotas,secrets,serviceaccounts,services,mutatingwebhookconfigurations,validatingwebhookconfigurations,customresourcedefinitions,apiservices,controllerrevisions,daemonsets,deployments,replicasets,statefulsets,horizontalpodautoscalers,cronjobs,jobs,certificatesigningrequests,stacks,daemonsets,deployments,ingresses,networkpolicies,podsecuritypolicies,replicasets,networkpolicies,poddisruptionbudgets,podsecuritypolicies,clusterrolebindings,clusterroles,rolebindings,roles,storageclasses,volumeattachments --all-namespaces

@grampelberg I am installing it with the steps listed on the getting started page

curl -sL https://run.linkerd.io/install | sh
export PATH=$PATH:$HOME/.linkerd2/bin
linkerd version
linkerd check --pre
linkerd install | kubectl apply -f -
linkerd check

Output of kubectl get ...

That all looks right to me....

What does check look like after you've waited for a while? Can you run a pod in the linkerd namespace and hit the API server (kubectl would be great)?

What does check look like after you've waited for a while?

➜  ~ linkerd check
kubernetes-api: can initialize the client..................................[ok]
kubernetes-api: can query the Kubernetes API...............................[ok]
kubernetes-api: is running the minimum Kubernetes API version..............[ok]
linkerd-api: control plane namespace exists................................[ok]
linkerd-api: control plane pods are ready..................................[ok]
linkerd-api: can initialize the client.....................................[ok]
linkerd-api: can query the control plane API...............................[ok]
linkerd-api[kubernetes]: control plane can talk to Kubernetes..............[ok]
linkerd-api[prometheus]: control plane can talk to Prometheus..............[FAIL] -- Error calling Prometheus from the control plane: Get http://prometheus.linkerd.svc.cluster.local:9090/api/v1/query?query=max%!p(MISSING)rocess_start_time_seconds%!B(MISSING)%!D(MISSING)%!+(MISSING)by+%!p(MISSING)od%!C(MISSING)+namespace%!:(MISSING) dial tcp 127.0.0.1:9090: connect: connection refused

Status check results are [FAIL]

Can you run a pod in the linkerd namespace

Yes

and hit the API server (kubectl would be great)?

Assuming I understand you correctly: no, I cannot.

/usr/src/app # KUBE_TOKEN=$(</var/run/secrets/kubernetes.io/serviceaccount/token)
/usr/src/app # curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" \
>       https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_PORT_443_TCP_PORT/api/v1/namespaces/linkerd/pods/$HOSTNAME

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "pods \"hello-kubernetes-548c65944-d8l5n\" is forbidden: User \"system:anonymous\" cannot get pods in the namespace \"linkerd\"",
  "reason": "Forbidden",
  "details": {
    "name": "hello-kubernetes-548c65944-d8l5n",
    "kind": "pods"
  },
  "code": 403
}
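One possible reading of the system:anonymous user in this response is that no bearer token reached the API server. The $(<file) form is not part of POSIX sh, so some minimal shells may expand it to an empty string. Below is a minimal sketch, not a confirmed diagnosis, that reads the token with cat and fails loudly when it is empty. read_sa_token is a hypothetical helper name, and the default path is the one used in the command above.

```shell
# Hypothetical helper: print the service account token, or fail if it is missing or empty.
# $1 optionally overrides the token path (useful for testing outside a pod).
read_sa_token() {
  _token=$(cat "${1:-/var/run/secrets/kubernetes.io/serviceaccount/token}") || return 1
  if [ -z "$_token" ]; then
    echo "service account token is empty" >&2
    return 1
  fi
  printf '%s' "$_token"
}

# Intended usage inside a pod (left commented out because it needs a cluster):
# KUBE_TOKEN=$(read_sa_token) && curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" \
#   "https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_PORT_443_TCP_PORT/api/v1/namespaces/linkerd/pods/$HOSTNAME"
```

If the token is present and the request is still forbidden, the pod's service account most likely lacks permission to read pods, which is a separate question from the Prometheus failure.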

Have you tried deleting the prometheus and controller pods in the linkerd namespace?

Yes - the pods restart but linkerd check gives the same error

linkerd-api[prometheus]: control plane can talk to Prometheus..............[FAIL] -- Error calling Prometheus from the control plane: Get http://prometheus.linkerd.svc.cluster.local:9090/api/v1/query?query=max%!p(MISSING)rocess_start_time_seconds%!B(MISSING)%!D(MISSING)%!+(MISSING)by+%!p(MISSING)od%!C(MISSING)+namespace%!:(MISSING) dial tcp 127.0.0.1:9090: connect: connection refused

@asad-awadia We released Linkerd stable 2.1 last week. Would you mind trying that out to see if it fixes the issue you were seeing with 2.0 in your environment?

Still happening in version 2.1

@AnujKS I had the same issue, and the root cause in my case was that the cluster domain was not the default. I resolved it by first writing the install output to a file, e.g. linkerd install > l5d-manifest.yml, and then using sed to replace every occurrence of cluster.local with my actual cluster domain: sed -i 's/cluster\.local/mycluster.io/g' l5d-manifest.yml
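The workaround described above can be sketched as a small filter that sits between linkerd install and kubectl apply. rewrite_cluster_domain is a hypothetical helper name, and mycluster.io is the placeholder domain from the comment; substitute the cluster's real DNS suffix.

```shell
# Hypothetical helper: rewrite the default cluster domain in a manifest read from stdin.
# $1 is the cluster's actual DNS suffix, e.g. mycluster.io.
rewrite_cluster_domain() {
  # The escaped dot makes "cluster.local" match literally;
  # the g flag replaces every occurrence on a line, not just the first.
  sed "s/cluster\.local/$1/g"
}

# Intended usage (left commented out because it needs the linkerd and kubectl CLIs):
# linkerd install | rewrite_cluster_domain mycluster.io | kubectl apply -f -
```

Piping through a filter avoids the intermediate file and sidesteps the differences in how sed -i behaves across platforms.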

I've installed the latest edge release -- edge-19.1.3 -- on docker for mac, and it's working as expected. Can folks who are still experiencing this issue please try installing edge-19.1.3, and uploading the output of running linkerd logs?

I've just found my way here and am experiencing the same problem.

Trying edge-19.1.3, I get the following output from linkerd check:

$ linkerd check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version

linkerd-existence
-----------------
√ control plane namespace exists
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-api
-----------
√ control plane pods are ready
√ can query the control plane API
√ [kubernetes] control plane can talk to Kubernetes
× [prometheus] control plane can talk to Prometheus
    Error calling Prometheus from the control plane: Get http://linkerd-prometheus.linkerd.svc.cluster.local:9090/api/v1/query?query=max%!p(MISSING)rocess_start_time_seconds%!B(MISSING)%!D(MISSING)%!+(MISSING)by+%!p(MISSING)od%!C(MISSING)+namespace%!:(MISSING) dial tcp: lookup linkerd-prometheus.linkerd.svc.cluster.local on 10.3.0.10:53: no such host

Status check results are ×

linkerd.log

That's using linkerd install --ha --image-pull-policy=Always --tls=optional --proxy-auto-inject | kubectl apply -f - to install, but I've tried with a vanilla install and get the same results.

Everything looks to be working ok.

$ k -n linkerd get pods                                      
NAME                                     READY   STATUS    RESTARTS   AGE
linkerd-ca-695544fb88-7mnpc              2/2     Running   0          19m
linkerd-controller-987b67747-n64nx       4/4     Running   0          7m29s
linkerd-grafana-8647775547-8thdj         2/2     Running   0          7m28s
linkerd-prometheus-6db7f49dc6-k9sr8      2/2     Running   0          7m28s
linkerd-proxy-injector-cc997fb65-lsprp   2/2     Running   1          19m
linkerd-web-74c8dc89fb-w9xxk             2/2     Running   0          7m29s

@pms1969 can you curl that endpoint from inside a container running in linkerd-controller?

@grampelberg this is what I get:

$ k -n linkerd exec -ti linkerd-controller-987b67747-n64nx -c linkerd-proxy -- sh                                                                   
$ nslookup linkerd-prometheus.linkerd.svc.cluster.local                           
Server:         10.3.0.10
Address:        10.3.0.10#53

** server can't find linkerd-prometheus.linkerd.svc.cluster.local: NXDOMAIN

$ nslookup linkerd-prometheus.linkerd.svc.myco.local       
Server:         10.3.0.10
Address:        10.3.0.10#53

Name:   linkerd-prometheus.linkerd.svc.myco.local
Address: 10.3.79.5

The cluster is using a different DNS suffix than the one Linkerd expects. Is there any way to change that?
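The suffix found by hand with nslookup above can usually also be read from the pod's /etc/resolv.conf. The sketch below assumes the common Kubernetes layout, in which the search line contains an entry of the form svc.<cluster-domain>; cluster_domain is a hypothetical helper name, and pods with a custom DNS policy may not follow this layout.

```shell
# Hypothetical helper: print the cluster DNS suffix found in a resolv.conf file.
# $1 optionally overrides the file path (defaults to /etc/resolv.conf).
cluster_domain() {
  # Take the first search entry that starts with "svc." and strip that prefix.
  awk '$1 == "search" {
         for (i = 2; i <= NF; i++)
           if ($i ~ /^svc\./) { print substr($i, 5); exit }
       }' "${1:-/etc/resolv.conf}"
}

# Intended usage from inside a pod (left commented out because it needs a cluster):
# cluster_domain
```

Given a search line such as "search linkerd.svc.myco.local svc.myco.local myco.local", the function prints myco.local. It prints nothing when no matching entry exists.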

OK. After running the install output through sed to change cluster.local to my cluster suffix and then piping it into kubectl, it now works as expected.

$ linkerd check
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version

linkerd-existence
-----------------
√ control plane namespace exists
√ controller pod is running
√ can initialize the client
√ can query the control plane API

linkerd-api
-----------
√ control plane pods are ready
√ can query the control plane API
√ [kubernetes] control plane can talk to Kubernetes
√ [prometheus] control plane can talk to Prometheus

linkerd-service-profile
-----------------------
√ no invalid service profiles

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ control plane is up-to-date
√ control plane and cli versions match

Status check results are √

That explains it, please see #1720 and #1741. I'm going to close this out for now. Please open up a separate issue if you're still having troubles!
