Kubernetes: GKE 1.6 kubelet /metrics endpoint unauthorized over https

Created on 11 Apr 2017 · 16 comments · Source: kubernetes/kubernetes

BUG REPORT

After upgrading a GKE cluster from 1.5.6 to 1.6.0, Prometheus stopped being able to scrape the node /metrics endpoint due to a 401 Unauthorized error.

This is likely due to RBAC being enabled. In order to give Prometheus access to the node metrics I added the following ClusterRole and ClusterRoleBinding and created a dedicated service account that is used by the pod.

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
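
For completeness, this is roughly how the dedicated service account is created and wired into the deployment (a minimal sketch; the namespace and names match the binding above, but the image, labels and replica count are placeholders, not my exact setup):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus   # mounts this account's token instead of the default one
      containers:
      - name: prometheus
        image: prom/prometheus:v1.6.1  # placeholder image/tag
        ports:
        - containerPort: 9090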

Although the mounted token is now the one for the _prometheus_ service account - verified at https://jwt.io/ - it can't get access to the node metrics (they're served by the kubelet, right?).

If I execute the following command from a container in the deployment, it returns a 401 Unauthorized:

KUBE_TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" https://<node ip>:10250/metrics

Any tips on how to get to the bottom of this and figure out what's needed to make it work? I already looked into the issue with the Prometheus contributors via https://github.com/prometheus/prometheus/issues/2606, but since the curl doesn't work either it's probably not a Prometheus issue.

Kubernetes version

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:36:33Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.0", GitCommit:"fff5156092b56e6bd60fff75aad4dc9de6b6ef37", GitTreeState:"clean", BuildDate:"2017-03-28T16:24:30Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:

clusterIpv4Cidr: 10.248.0.0/14
createTime: '2016-11-14T19:26:49+00:00'
currentMasterVersion: 1.6.0
currentNodeCount: 14
currentNodeVersion: 1.6.0
endpoint: **REDACTED**
initialClusterVersion: 1.4.5
instanceGroupUrls:
- **REDACTED**
locations:
- europe-west1-c
loggingService: logging.googleapis.com
masterAuth:
  clientCertificate: **REDACTED**
  clientKey: **REDACTED**
  clusterCaCertificate: **REDACTED**
  password: **REDACTED**
  username: **REDACTED**
monitoringService: monitoring.googleapis.com
name: development-europe-west1-c
network: development
nodeConfig:
  diskSizeGb: 250
  imageType: COS
  machineType: n1-highmem-8
  oauthScopes:
  - https://www.googleapis.com/auth/compute
  - https://www.googleapis.com/auth/devstorage.read_only
  - https://www.googleapis.com/auth/service.management
  - https://www.googleapis.com/auth/servicecontrol
  - https://www.googleapis.com/auth/logging.write
  - https://www.googleapis.com/auth/monitoring
  serviceAccount: default
nodeIpv4CidrSize: 24

What happened:

Scraping the /metrics endpoint on a node with the service account's token fails with 401 Unauthorized, even with the ClusterRole above configured.

What you expected to happen:

The service account token, bound to the appropriate ClusterRole, should give access to the /metrics endpoint.

How to reproduce it (as minimally and precisely as possible):

  • create a namespace, serviceaccount, clusterrole, clusterrolebinding and a deployment that uses the serviceaccount (a consolidated sketch follows this list)
  • get the IP of one of the nodes
  • run KUBE_TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token) and curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" https://<node ip>:10250/metrics from a container in your deployment
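
A consolidated sketch of those steps (the manifest file names are placeholders for the YAML shown above):

# assumes the ClusterRole/ClusterRoleBinding and the ServiceAccount/Deployment from above
# are saved in prometheus-rbac.yaml and prometheus-deployment.yaml (placeholder names)
kubectl create namespace monitoring
kubectl apply -f prometheus-rbac.yaml
kubectl apply -f prometheus-deployment.yaml

# find a node IP
kubectl get nodes -o wide

# from a container in the deployment, hit the kubelet directly
kubectl -n monitoring exec -it <prometheus pod> -- sh
KUBE_TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" https://<node ip>:10250/metrics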

Anything else we need to know:

This failed with the default service account as well, whereas I had initially thought GKE would still be fairly liberal with its access control settings.

All 16 comments

Querying the same endpoint over _http_ to port _10255_ actually works. Any idea why there's a difference?

Could the cause be similar to https://github.com/coreos/coreos-kubernetes/issues/714 ?

ref: #11816

GKE doesn't enable service account token authentication to the kubelet

cc @mikedanese @cjcullen

the resources and subresources used to authorize access to the kubelet API are documented at https://kubernetes.io/docs/admin/kubelet-authentication-authorization/#kubelet-authorization

to allow all kubelet API requests, you'd need a role like the one kube-up uses:
https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/rbac/kubelet-api-admin-role.yaml
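
For reference, that role grants read access to the node objects plus full access to the kubelet API subresources, roughly like this (paraphrased sketch; see the linked file for the authoritative definition):

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: kubelet-api-admin
rules:
# read-only access to the Node API objects
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
# full access to the kubelet API subresources
- apiGroups: [""]
  resources: ["nodes/proxy", "nodes/log", "nodes/stats", "nodes/spec", "nodes/metrics"]
  verbs: ["*"]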

GKE doesn't enable service account token auth to the kubelet

I'm fairly certain we do...

In your ClusterRole I think

- nodes

should be

- nodes
- nodes/metrics

Like this https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/rbac/kubelet-api-admin-role.yaml#L16

Your nonResourceURLs rule doesn't make sense here; the kubelet authorizes its /metrics endpoint via the nodes/metrics subresource, not as a non-resource URL.
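
Applied to the ClusterRole from the issue description, the rules would look roughly like this (a sketch):

rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics   # the subresource the kubelet checks for GET /metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]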

Thanks, but even when creating the most permissive binding for my prometheus service account, I get a 401 Unauthorized when querying the kubelet /metrics endpoint with the service account token set as the bearer token.

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: permissive-binding-prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring

Running

curl -k --tlsv1.2 -H "Authorization: Bearer <service account token>" -v https://<node ip>:10250/metrics

returns

* About to connect() to **REDACTED** port 10250 (#0)
*   Trying <node ip>...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to **REDACTED** (**REDACTED**) port 10250 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* NSS: client certificate not found (nickname not specified)
* SSL connection using TLS_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*   subject: CN=**REDACTED**
*   start date: Apr 12 17:23:15 2017 GMT
*   expire date: Apr 12 17:23:15 2018 GMT
*   common name: **REDACTED**
*   issuer: CN=**REDACTED**
> GET /metrics HTTP/1.1
> User-Agent: curl/7.29.0
> Host: **REDACTED**:10250
> Accept: */*
> Authorization: Bearer **REDACTED**
> 
< HTTP/1.1 401 Unauthorized
< Date: Thu, 13 Apr 2017 03:05:30 GMT
< Content-Length: 12
< Content-Type: text/plain; charset=utf-8
< 
{ [data not shown]
100    12  100    12    0     0     59      0 --:--:-- --:--:-- --:--:--    59
* Connection #0 to host **REDACTED** left intact

If GKE is using the GCE cluster up scripts, it isn't enabling service account token authentication:

https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/gci/configure-helper.sh#L699

to authenticate to the kubelet with API tokens, these steps would be needed (from https://kubernetes.io/docs/admin/kubelet-authentication-authorization/#kubelet-authentication):

  • ensure the authentication.k8s.io/v1beta1 API group is enabled in the API server
  • start the kubelet with the --authentication-token-webhook, --kubeconfig, and --require-kubeconfig flags (see the sketch after this list)
  • the kubelet calls the TokenReview API on the configured API server to determine user information from bearer tokens
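
A sketch of what those flags look like when spelled out, together with the authorization-side flags from the same doc page (paths and values are illustrative; on GKE you don't control the kubelet invocation yourself):

# illustrative kubelet flags for token-webhook authentication and webhook authorization
# (file paths are assumptions)
kubelet \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --require-kubeconfig=true \
  --authentication-token-webhook=true \
  --authorization-mode=Webhook \
  --client-ca-file=/etc/srv/kubernetes/ca.crt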

A 401 indicates you are not authenticated; you are not even reaching the authorization stage.

By the way, when querying https:///api/v1 I only see nodes, nodes/proxy and nodes/status.

No nodes/metrics, nodes/log, nodes/stats nor nodes/spec.

I'll inspect the kubelet startup params and escalate to Google.

Those virtual subresources are used by the kubelet to perform authorization checks when speaking directly to the kubelet API in order to allow granting access to part of the kubelet's API

Ahhhh, ya that's not going to work. We don't plan on enabling the TokenReview API in GKE. You can either configure Prometheus to pull metrics by hitting the apiserver proxy directly, or you can create a client certificate using the certificates API for Prometheus to use when contacting kubelets.
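
For the client-certificate route, a rough sketch of what requesting a certificate through the certificates API could look like (the subject, resource name and file names are placeholders, and it assumes the kubelet accepts client certificates signed by the cluster CA):

# generate a key and CSR for a prometheus client identity
openssl req -new -newkey rsa:2048 -nodes \
  -keyout prometheus.key -out prometheus.csr -subj "/CN=prometheus"

# submit it through the certificates API (v1beta1 at the time of this issue)
cat <<EOF | kubectl create -f -
apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
  name: prometheus-kubelet-client
spec:
  request: $(base64 prometheus.csr | tr -d '\n')
  usages: ["digital signature", "key encipherment", "client auth"]
EOF

# approve the request and fetch the signed certificate
kubectl certificate approve prometheus-kubelet-client
kubectl get csr prometheus-kubelet-client -o jsonpath='{.status.certificate}' \
  | base64 -d > prometheus.crt

# prometheus.key / prometheus.crt can then be used in the Prometheus tls_config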

@mikedanese is there a particular reason to deviate from the more default Kubernetes setup where the kubelet uses RBAC? Does it provide more security? Is it because Google takes care of the master?

@JorritSalverda the problem is the integration with google oauth. Access tokens need to have UserInfo and GroupInfo scopes in order for us to fill out the full kubernetes UserInfo object. These scopes say that google is allowed to give your email and group info out to people with this token. Generally the tokens that we see in GCP do not have this scope. It's possible that we could enable the token API for some but not all tokens.

I've described how I got this to work for Prometheus by proxying through the API server in GKE at https://github.com/prometheus/prometheus/issues/2606#issuecomment-294869099.

I'll close this ticket, because this provides a nice and future-proof way to get to those metrics. The only drawback is that it will put slightly more load on the API server.
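
For reference, the proxy approach boils down to a Prometheus scrape config along these lines (a sketch; the job name and relabeling details may differ from the linked comment):

- job_name: 'kubernetes-nodes'
  scheme: https
  kubernetes_sd_configs:
  - role: node
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # send requests to the API server instead of the kubelet's own port
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  # rewrite the path to the API server's node proxy endpoint
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics

Note that going through the API server means the service account also needs get access on the nodes/proxy resource, and every scrape passes through the apiserver proxy, hence the extra load mentioned above.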

Take a look at the following parameter in the kubelet exporter:
https://github.com/coreos/prometheus-operator/blob/master/helm/exporter-kubelets/values.yaml#L2
Hope it helps.
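
If that parameter is what I think it is, it toggles whether the exporter targets scrape the kubelet over https on 10250 or over the read-only http port 10255; something along these lines (value name and meaning are an assumption, check the linked file):

# helm value for the exporter-kubelets chart (assumed)
https: true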
