Dashboard: i/o timeout on heapster metrics

Created on 27 Jan 2017 · 14 comments · Source: kubernetes/dashboard

Issue details

I am deploying kube-dash and heapster/InfluxDB on my on-premise Kubernetes cluster (1 master, 6 nodes). Heapster is deployed with InfluxDB and I can query it just fine (both via the internal cluster endpoint and via the service NodePort routed through kube-proxy). The issue is that the Kubernetes dashboard gets timeouts when trying to access it, which not only makes the dashboard extremely slow but also removes much of its functionality.

Error from kubedash logs:

[2017-01-27T16:54:33Z] Incoming HTTP/1.1 GET /api/v1/pod/kube-system?itemsPerPage=10&page=1 request from 172.16.32.128:56826
Getting list of all pods in the cluster
Getting pod metrics
Skipping Heapster metrics because of error: an error on the server ("Error: 'dial tcp 172.16.40.138:8082: i/o timeout'\nTrying to reach: 'http://172.16.40.138:8082/api/v1/model/namespaces/kube-system/pod-list/centos-deployment-275141233-7yecd,elastickube-mongo-ny0eh,elastickube-server-o8psw,heapster-3in4w,influxdb-grafana-y8isc,kube-dns-v9-0fpvq,kube-dns-v9-4swto,kube-dns-v9-zbupd,kube2consul-m3vcr,kubernetes-dashboard-2929884197-cf349,nginx-deployment-2947857529-ugcst,ubuntu-deployment-4001649334-ugymt/metrics/cpu/usage_rate'") has prevented the request from succeeding (get services heapster)

Proof that the endpoint is actually accessible from within Kubernetes (using a dedicated curl container):

kubectl logs curl-deployment-675076690-s2pac --namespace=kube-system
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
{"items":[{"metrics":[{"timestamp":"2017-01-27T16:54:00Z","value":0},{"timestamp":"2017-01-27T17:01:00Z","value":0},{"timestamp":"2017-01-27T17:03:00Z","value":0},{"timestamp":"2017-01-27T17:04:00Z","value":0}],"latestTimestamp":"2017-01-27T17:04:00Z"},{"metrics":[{"timestamp":"2017-01-27T16:50:00Z","value":5},{"timestamp":"2017-01-27T16:51:00Z","value":6},{"timestamp":"2017-01-27T16:52:00Z","value":5},{"timestamp":"2017-01-27T16:53:00Z","value":5},{"timestamp":"2017-01-27T16:54:00Z","value":5},{"timestamp":"2017-01-27T16:55:00Z","value":6},
...
{"timestamp":"2017-01-27T16:51:00Z","value":0},{"timestamp":"2017-01-27T16:52:00Z","value":0},{"timestamp":"2017-01-27T16:53:00Z","value":0},{"timestamp":"2017-01-27T16:55:00Z","value":0},{"timestamp":"2017-01-27T16:56:00Z","value":0},{"timestamp":"2017-01-27T16:57:00Z","value":0},{"timestamp":"2017-01-27T16:58:00Z","value":0},{"timestamp":"2017-01-27T17:00:00Z","value":0},{"timestamp":"2017-01-27T17:02:00Z","value":0},{"timestamp":"2017-01-27T17:03:00Z","value":0}],"latestTimestamp":"2017-01-27T17:03:00Z"}]}
100 8513 0 8513 0 0 2911k 0 --:--:-- --:--:-- --:--:-- 4156k
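For reference, output like the above presumably came from a one-off curl pod. A minimal sketch of how to reproduce such a check (the image name is an assumption; the ClusterIP 172.16.40.138, port 8082, and pod name heapster-3in4w are taken from the log in this issue, so adjust for your cluster):

```shell
# Start a throwaway pod with curl in the kube-system namespace
# (radial/busyboxplus:curl is one commonly used image; any image with curl works).
kubectl run curl-test --image=radial/busyboxplus:curl -i --tty --namespace=kube-system -- sh

# From inside the pod, hit the same heapster model endpoint the dashboard queries:
curl "http://172.16.40.138:8082/api/v1/model/namespaces/kube-system/pod-list/heapster-3in4w/metrics/cpu/usage_rate"
```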

Environment
Dashboard version: 1.5.1
Kubernetes version: 1.4.8
Operating system: Centos 7.2
Node.js version: n/a
Go version: n/a
Steps to reproduce


Deploy heapster 1.2 with InfluxDB and a NodePort service.
Deploy kubedash 1.5.1.
Open the pods section of the kubedash UI, where metrics would normally be gathered.

Observed result

The dashboard is not able to communicate with the heapster endpoint.

Expected result

The dashboard should be able to reach the internal heapster endpoint.

Comments


All 14 comments

Is the connection attempt made from the kube master, by chance? My master is not part of the cluster's node networking, so it cannot resolve the internal cluster endpoint IP/port. I was just looking at the heapster documentation here:

https://github.com/kubernetes/heapster/blob/master/docs/debugging.md

I can confirm that heapster is reachable from any worker: a curl http://10.40.0.2:8082/metrics on a worker returns some data. The same command on the master runs into a timeout. Any ideas why?

It has to do with which REST client KubeDash uses. It makes that determination based on where heapster is hosted. I don't know how to fix it in KubeDash itself, but I was able to work around it by specifying my heapster host manually.

Example manifest snippet:
spec:
  containers:
  - name: kubernetes-dashboard
    image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.5.1
    imagePullPolicy: Always
    ports:
    - containerPort: 9090
      protocol: TCP
    args:
      # Uncomment the following line to manually specify the Kubernetes API server host.
      # If not specified, Dashboard will attempt to auto-discover the API server and
      # connect to it. Uncomment only if the default does not work.
      - --apiserver-host=http://:8080
      - --heapster-host=http://heapster
    livenessProbe:
      httpGet:
        path: /
        port: 9090
      initialDelaySeconds: 30
      timeoutSeconds: 30

Specifically the line: "- --heapster-host=http://heapster"

See this for the different rest client functions: https://github.com/kubernetes/dashboard/blob/1ee771c19ed387493b372a4e7379201cafb61d93/src/app/backend/client/heapsterclient.go

Great, this worked for me! Thanks.

By default, if heapster-host is not specified, we try to use the in-cluster config. That means the dashboard looks for the heapster service inside the cluster and tries to connect through the service proxy. For that to work, cluster networking has to be configured properly (DNS, possibly a network overlay). The other option is to specify heapster-host manually in the dashboard YAML file.
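That service-proxy path can be exercised by hand. A sketch, assuming kubectl is configured against the cluster (the heapster service name and the kube-system namespace match this issue):

```shell
# Open a local proxy to the API server, then query heapster through the
# same service-proxy path the dashboard's in-cluster client uses.
kubectl proxy --port=8001 &
curl "http://localhost:8001/api/v1/proxy/namespaces/kube-system/services/heapster/api/v1/model/namespaces/kube-system/pods/"
```

If this hangs or times out, the problem is in the proxy path between the API server and heapster, not in the dashboard itself.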

If cluster configuration was the issue, can we close this?

I installed the cluster using kubeadm and the Weave overlay network. Other services work fine, so I'm not sure it is a cluster setup issue. I just followed the instructions, and by default it did not work.

@floreks My heapster host is in fact in-cluster and accessible through normal in-cluster endpoints. I use flanneld as my overlay network and in-cluster DNS (which works). There is something bugged about the in-cluster REST call that I'm not smart enough to figure out.

I was having the same problem and I just solved it; here is my case:

When the dashboard tries to get heapster metrics it calls something like:
http://MASTER_HOST:MASTER_API_PORT/api/v1/proxy/namespaces/kube-system/services/heapster/

So it is the master making the metrics call (that is my guess).

So, it is the master who has to have access to heapster, not the node or the pods.
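Under that guess, one way to test it is to issue the same proxy call directly against the API server from the master. This is a sketch: http://localhost:8080 assumes an insecure local API port, and the pod name is taken from the logs earlier in this issue.

```shell
# Run on the master. If this times out the same way the dashboard does,
# the master itself cannot reach the heapster pod network.
curl --max-time 10 \
  "http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/heapster/api/v1/model/namespaces/kube-system/pod-list/heapster-3in4w/metrics/cpu/usage_rate"
```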

To do that I configured flanneld on the master as well and, VERY IMPORTANT, on the node I added these two iptables rules:

iptables -I FORWARD 1 -i flannel.1 -o docker0 -j ACCEPT
iptables -I FORWARD 1 -i docker0 -o flannel.1 -j ACCEPT

As I was working with just one node for testing purposes, I didn't notice those two rules were missing, but I suppose I would have if I had added more nodes.

This is MY case; yours may be different, but this is what I did to solve it, and now it works perfectly. The dashboard is now really usable in terms of speed. Love it!

@bmhkb4 It worked for me!

@floreks I am also bh016088 (the OP; that is my work account), and yes, you can close. I would simply suggest removing the requirement that the flannel overlay network be hosted on the master. The call from kube-dash to heapster should come from the kube-dash container (which lives inside the overlay network). Thanks, guys!

We do not have any direct dependencies on any overlay network. For the in-cluster heapster configuration we assume that heapster is running inside the cluster and is accessible via the service proxy at api/v1/proxy/namespaces/kube-system/services/heapster/api/v1/....

The only requirement is that cluster networking is configured properly. You should not need to adapt iptables manually; the kube-proxy component should do that for you. I have configured clusters a couple of times and did not have to do that.

@dcodix I was having the same problem and just solved it. Thanks a lot!

Just in case it helps other people: I was having a similar issue with a slow dashboard after deploying heapster.
My problem was that a proxy was set in /etc/kubernetes/manifests/kube-apiserver.yaml. It looks like kubeadm populates it from what is set in /etc/environment.
I removed the proxy settings from kube-apiserver.yaml and everything works fine.
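A quick way to check for this condition (file path as given above; kubeadm layouts vary, so treat this as a sketch):

```shell
# Look for proxy environment variables that may have been copied into the
# apiserver's static pod manifest.
grep -inE 'http_proxy|https_proxy|no_proxy' /etc/kubernetes/manifests/kube-apiserver.yaml
```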

