Tell us about your request
In some scenarios it is useful for Kubernetes operators to know the health of the EKS control plane. Some applications or pods may overload the control plane, and it helps to be able to see when that happens. Having control plane metrics in CloudWatch such as:
can help customers diagnose slowness or unresponsiveness of the control plane.
Which service(s) is this request for?
EKS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Sometimes, when the control plane is slow, we would like to know whether there has been a spike in requests to the API, a spike in the number of errors, or a spike in new pods.
Are you currently working around this issue?
Scraping the /metrics endpoint on the Kubernetes service
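For anyone using the same workaround, here is a minimal sketch of what can be done with the scraped output. It assumes the text was fetched separately (for example with `kubectl get --raw /metrics`) and that the API server exposes the `apiserver_request_total` counter with a `code` label. The `SAMPLE` text and the helper names `parse_samples` and `request_totals` are made up for illustration.

```python
import re

# Matches a sample line such as: name{label="v",other="w"} 12
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Hypothetical scrape output, shortened for illustration.
SAMPLE = """\
# HELP apiserver_request_total Counter of apiserver requests.
# TYPE apiserver_request_total counter
apiserver_request_total{code="200",verb="GET"} 120
apiserver_request_total{code="201",verb="POST"} 10
apiserver_request_total{code="500",verb="GET"} 3
apiserver_request_total{code="504",verb="LIST"} 2
some_other_metric 7
"""


def parse_samples(text):
    """Yield (name, labels, value) for each sample line in Prometheus text format."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # ignore anything that does not look like a sample
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


def request_totals(text, metric="apiserver_request_total"):
    """Sum all requests and 5xx requests for the given counter."""
    totals = {"all": 0.0, "5xx": 0.0}
    for name, labels, value in parse_samples(text):
        if name != metric:
            continue
        totals["all"] += value
        if labels.get("code", "").startswith("5"):
            totals["5xx"] += value
    return totals


if __name__ == "__main__":
    print(request_totals(SAMPLE))
```

These are cumulative counters, so spotting a spike means taking the difference between two scrapes rather than reading a single value.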
Hey everyone, I'm a Product Manager for CloudWatch. We are looking for people to join our beta program to provide feedback and test Prometheus metric monitoring in CloudWatch. The beta program will allow you to test the collection of the EKS Control Plane Metrics exposed as Prometheus metrics. Email us if interested, [email protected].
Can we include the cluster component status in CloudWatch as well, for example:
These could be used to set up a CloudWatch alarm for when a custom webhook breaks a component, for example a newly installed ValidatingWebhook that breaks the Scheduler's lease renewal calls.
@starchx - https://aws.amazon.com/about-aws/whats-new/2020/09/amazon-cloudwatch-monitors-prometheus-metrics-container-environments/.
You can use the CloudWatch Prometheus agent to support the above use case. In the first phase (already available), we encourage you to configure the agent to consume control plane metrics for EKS and leverage CloudWatch alarms. In the second phase, we will also build an automated, out-of-the-box dashboard for the EKS control plane.
Check out this workshop to learn more: https://observability.workshop.aws/en/containerinsights/eks/_prometheusmonitoring.html
I don't mind scraping the endpoints myself since I use Datadog for monitoring, but not having access to the scheduler's or controller manager's metrics endpoints is tough. For example, without access to the kube scheduler, my team and I are unable to track "time to schedule a pod", which is a key service level indicator for us.
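To make the "time to schedule a pod" indicator concrete: assuming the scheduler reports scheduling latency as a Prometheus histogram (cumulative `le` buckets), a percentile could be estimated roughly as below once the endpoint is reachable. This is only a sketch that mirrors the usual linear interpolation within a bucket. The bucket values are invented, and `histogram_quantile` here is a local helper, not a library call.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs, where the
    last upper bound is +Inf, as in Prometheus "le" buckets.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical scheduling latency buckets in seconds: (le, cumulative count).
EXAMPLE_BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]

if __name__ == "__main__":
    print(histogram_quantile(0.95, EXAMPLE_BUCKETS))
```

The estimate is only as fine-grained as the bucket boundaries, so it works for tracking a trend but not for exact latencies.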