Following https://github.com/linkerd/linkerd2/issues/3305#issuecomment-525033155, we'd like to update the linkerd check command to validate the properties of HA control planes, such as:
- web, grafana and prometheus) should have more than one replica.
- FailurePolicy should be set to Fail to prevent uninjected workloads from entering the service mesh.
- kube-system has the inject skip annotation.
Let's check to make sure the kube-system namespace has the skip annotation as well.
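The replica requirement above could be sketched roughly as follows. This is only an illustration, not Linkerd's implementation: the `Deployment` struct, the `underReplicated` helper, and the component names are hypothetical stand-ins, and a real check would read replica counts from the Kubernetes API.

```go
package main

import (
	"fmt"
	"sort"
)

// Deployment is a hypothetical, simplified stand-in for a control plane
// deployment. It is not a Linkerd or Kubernetes type.
type Deployment struct {
	Name     string
	Replicas int
}

// underReplicated returns the sorted names of deployments that run
// with one replica or fewer, i.e. those that violate the HA requirement.
func underReplicated(deployments []Deployment) []string {
	var names []string
	for _, d := range deployments {
		if d.Replicas <= 1 {
			names = append(names, d.Name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// Hypothetical component names, used for illustration only.
	deployments := []Deployment{
		{Name: "component-a", Replicas: 3},
		{Name: "component-b", Replicas: 1},
	}
	for _, name := range underReplicated(deployments) {
		fmt.Printf("%s should have more than one replica\n", name)
	}
}
```

Components that are exempt from the requirement would simply be left out of the slice passed to `underReplicated`.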
Actually, IIRC, in order to support multi-stage installation, the linkerd check currently doesn't check for cluster-scoped resources, because the user running the command may not have the RBAC permissions. So checking for MWC/VWC's FailurePolicy and kube-system label won't work here.
I think it is fine to emit a warning that says we were unable to check because of RBAC.
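A minimal sketch of that idea, assuming the check can tell an RBAC denial apart from other errors. The `Severity` type, the `errForbidden` sentinel, and `checkFailurePolicy` are hypothetical names for illustration; with client-go one would more likely test the error with `IsForbidden` from `k8s.io/apimachinery/pkg/api/errors`.

```go
package main

import (
	"errors"
	"fmt"
)

// Severity is a hypothetical result level for a single check.
type Severity int

const (
	Pass Severity = iota
	Warning
	Failure
)

// errForbidden is a hypothetical sentinel standing in for an RBAC denial.
var errForbidden = errors.New("forbidden")

// checkFailurePolicy downgrades an RBAC denial to a warning, and fails
// on any other lookup error or on a policy other than "Fail".
func checkFailurePolicy(policy string, lookupErr error) (Severity, string) {
	switch {
	case errors.Is(lookupErr, errForbidden):
		return Warning, "unable to check FailurePolicy because of RBAC"
	case lookupErr != nil:
		return Failure, lookupErr.Error()
	case policy != "Fail":
		return Failure, fmt.Sprintf("FailurePolicy is %q, expected \"Fail\"", policy)
	default:
		return Pass, ""
	}
}

func main() {
	// Simulate a lookup rejected by RBAC, wrapped with extra context.
	denied := fmt.Errorf("get webhook configuration: %w", errForbidden)
	severity, msg := checkFailurePolicy("", denied)
	fmt.Println(severity == Warning, msg)
}
```

The same pattern would apply to the kube-system check: a denied read becomes a warning, so users without cluster-scoped permissions still get a usable report.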
Hi, @grampelberg @ihcsim !
I'd like to work on this issue. Where can I start from?
@mayankshah1607 the list @ihcsim has is pretty good, you'll want to:
--ha.
@grampelberg @ihcsim
This PR, https://github.com/linkerd/linkerd2/pull/3731, implements some parts of this issue: it adds a new section to the checks that detects whether --ha is enabled and, if so, checks that kube-system has the inject skip annotation.
I guess I'd have to wait for https://github.com/linkerd/linkerd2/pull/3731 to get merged to prevent any overlaps. Once merged, I'll open a PR that implements the remaining checks mentioned above. Is that ok?
@mayankshah1607 sgtm; I'll keep you posted.
@ihcsim @grampelberg It seems like point 5 (kube-system annotation) has been fixed in https://github.com/linkerd/linkerd2/pull/3731. Could we remove it from this issue then?
@grampelberg I think we should close this one now that https://github.com/linkerd/linkerd2/pull/3942 has been merged :)