Describe the bug
PrometheusRules don't pass the webhook checks
Version of Helm and Kubernetes:
Client: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
Which chart:
stable/prometheus-operator v6.4.3
What happened:
After upgrading to the latest version of the prometheus-operator chart, any release that contains a PrometheusRule fails when trying to pass the admission control webhooks.
Post https://prometheus-operator-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
What you expected to happen:
The rules are valid, so they should pass validation and the release should not fail.
How to reproduce it (as minimally and precisely as possible):
Run the latest version of the chart, 6.4.3.
Anything else we need to know:
root@my-shell-95cb5df57-f9tnp:/# curl https://prometheus-operator-operator.monitoring.svc:443/admission-prometheusrules/mutate -k
request has no body
The apiserver is the component making the request, so I'm wondering if you have something in your cluster that's preventing this from happening.
Googling around for GKE and admission hooks, I've come across this article indicating a firewall issue between the masters and the regular nodes: https://www.revsys.com/tidbits/jetstackcert-manager-gke-private-clusters/
You can simply disable the admission webhooks: prometheusOperator.admissionWebhooks.enabled=false
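For example (a sketch; prometheus-operator as the release name is an assumption, adjust to your own release):
helm upgrade prometheus-operator stable/prometheus-operator \
  --set prometheusOperator.admissionWebhooks.enabled=false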
I have just done a test with a new GCP cluster and I don't see this behaviour; however, it's possible that this is still a problem on a cluster that was provisioned earlier. This issue appears to be related to what you are seeing, any chance you could validate it?
https://github.com/kubernetes/kubernetes/issues/79739
I suspect the issue is that you're running this in a private GKE cluster:
When Google configure the control plane for private clusters, they automatically configure VPC peering between your Kubernetes cluster's network and a separate Google managed project. In order to restrict what Google are able to access within your cluster, the firewall rules configured restrict access to your Kubernetes pods. This means that in order to use the webhook component with a GKE private cluster, you must configure an additional firewall rule to allow the GKE control plane access to your webhook pod.
You can read more about how to add firewall rules for the GKE control plane nodes in the GKE docs.
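As a quick check (a sketch; the cluster name is a placeholder), you can list the firewall rules GKE created for the cluster and see whether any of them allows the master CIDR range to reach the nodes on tcp:8443:
gcloud compute firewall-rules list --filter="name~^gke-clustername"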
Alternatively, you can disable the hooks by setting prometheusOperator.admissionWebhooks.enabled=false.
Thanks @vsliouniaev, sorry for the late reply but I still haven't had time to test it. Will do it today and let you know.
I'm just updating the issue as I find possible causes. Big thanks to the folks here for pointing me in the right direction: https://github.com/coreos/prometheus-operator/issues/2711
I ran the script present in the doc and I could deploy Prometheus-operator this time. I think that fixed it. I'm going to run some more tests, but I'm pretty confident the solution works.
Just a tiny issue I had with the script: while creating the list of tags it was appending an extra ,,,,, at the end.
I added a quick and simple sed to fix that, other people may find it useful.
#!/bin/bash
# Adjust these to match your cluster.
CLUSTER_NAME=clustername
CLUSTER_REGION=europe-west1

# Look up the cluster's VPC network, the control-plane (master) CIDR block and the node pool
# target tags; the sed strips the run of extra commas appended to the tag list.
VPC_NETWORK=$(gcloud container clusters describe $CLUSTER_NAME --region $CLUSTER_REGION --format='value(network)')
MASTER_IPV4_CIDR_BLOCK=$(gcloud container clusters describe $CLUSTER_NAME --region $CLUSTER_REGION --format='value(privateClusterConfig.masterIpv4CidrBlock)')
NODE_POOLS_TARGET_TAGS=$(gcloud container clusters describe $CLUSTER_NAME --region $CLUSTER_REGION --format='value[terminator=","](nodePools.config.tags)' --flatten='nodePools[].config.tags[]' | sed 's/,\{2,\}//g')

echo $VPC_NETWORK
echo $MASTER_IPV4_CIDR_BLOCK
echo $NODE_POOLS_TARGET_TAGS

# Allow the GKE control plane to reach the admission webhook pods on port 8443.
gcloud compute firewall-rules create "allow-apiserver-to-admission-webhook-8443" \
  --allow tcp:8443 \
  --network="$VPC_NETWORK" \
  --source-ranges="$MASTER_IPV4_CIDR_BLOCK" \
  --target-tags="$NODE_POOLS_TARGET_TAGS" \
  --description="Allow apiserver access to admission webhook pod on port 8443" \
  --direction INGRESS
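Optionally, you can verify the new rule afterwards (not part of the original script):
gcloud compute firewall-rules describe allow-apiserver-to-admission-webhook-8443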
Confirmed, I was able to release applications using prometheus rules.
@vsliouniaev thanks a lot for your help!
Thanks a lot for that @amalucelli! If you could add that to the docs in the chart, that would be awesome!
The error log posted by @amartorelli shows that the call hits the service on port 443. How does enabling 8443 help in this case?
@acondrat because the call is made to the operator Service, which listens on 443 and then forwards to the pod on port 8443.
@allamand Can you please confirm what forward means in this case? Do you mean that the operator Service redirects the API server to call the pod directly on POD_IP:POD_PORT? So the first call goes to the Service on port 443, which is allowed, and the second call goes to the pod on port 8443, which is not allowed and requires an extra firewall rule.
thanks!
Yes, that's it. The pod only listens on 8443, not 443, but the Service listens on 443, not 8443.
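To illustrate the mapping, here is a minimal sketch of the Service (not the chart's exact manifest; the selector label is an assumption):
apiVersion: v1
kind: Service
metadata:
  name: prometheus-operator-operator
  namespace: monitoring
spec:
  selector:
    app: prometheus-operator-operator
  ports:
    - name: https
      port: 443        # the apiserver dials the Service on this port
      targetPort: 8443 # the Service forwards to the operator pod on this port
So even though the webhook configuration points at port 443 of the Service, the firewall rule has to allow the control plane to reach the pods on 8443.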
If you have run the chart before without disabling the webhook, you must manually delete the leftover objects of the following kinds:
kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io   # then delete all listed objects
kubectl get MutatingWebhookConfiguration   # then delete all listed objects
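For example, the deletion step could look like this (a sketch; the resource names are assumptions based on the chart's defaults, so check the output of the get commands above and adjust accordingly):
kubectl delete validatingwebhookconfiguration prometheus-operator-admission
kubectl delete mutatingwebhookconfiguration prometheus-operator-admission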
and after that run:
helm install --name prometheus-operator stable/prometheus-operator \
--set prometheusOperator.admissionWebhooks.enabled=false \
--set prometheusOperator.admissionWebhooks.patch.enabled=false \
--set prometheusOperator.tlsProxy.enabled=false
@BahmaniAlireza this worked in that it made things run, but none of the alert rules were created.
That was due to kube version restrictions in the chart.
Using the following values mostly works with GKE, aside from kube-proxy:
prometheus-operator:
  coreDns:
    enabled: false
  defaultRules:
    create: true
  kubelet:
    enabled: true
    serviceMonitor:
      https: false
  kubeControllerManager:
    enabled: false
  kubeDns:
    enabled: true
  kubeEtcd:
    enabled: false
  kubeScheduler:
    enabled: false
  kubeTargetVersionOverride: "1.15.999"
  prometheusOperator:
    admissionWebhooks:
      enabled: false
      patch:
        enabled: false
    tlsProxy:
      enabled: false
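A hedged usage sketch: the top-level prometheus-operator: key suggests these values feed a parent (umbrella) chart that pulls in stable/prometheus-operator as a dependency; the release name, chart path and values file name below are assumptions:
helm upgrade --install monitoring ./my-umbrella-chart -f gke-values.yaml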
In my case the GKE firewall rule was already added, but I had a deny-all network policy for everything in the namespace. So I granted access and PrometheusRules started working :)
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-admission-webhook-access
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      "app": "prometheus-operator"
  ingress:
    - from: []
      ports:
        - port: 8443
I took the policy example from the Elasticsearch docs; possibly we need to add it to the prometheus-operator documentation:
https://github.com/elastic/cloud-on-k8s/pull/2524/files
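To apply and verify the policy (a sketch; the file name is an assumption):
kubectl apply -f allow-admission-webhook-access.yaml
kubectl -n monitoring describe networkpolicy allow-admission-webhook-access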
Good luck.