Pipelines: cluster-autoscaler: safe-to-evict Pod annotation

Created on 22 Sep 2020  路  10Comments  路  Source: kubeflow/pipelines

With the new multi-user pipelines feature, more pods are used which have emptyDir (local storage). Local storage causes issues for K8S clusters using the cluster-autoscaler, as it will refuse to evict pods which have local storage unless they have the following annotation:

cluster-autoscaler.kubernetes.io/safe-to-evict: "true"

We should either make this anotations the default in the manifests, or make this a simple config to change for users.

help wanted

Most helpful comment

After further investigation it is almost always the istio-sidecar which is causing this issue.

The following Pipeline Deployment resources have this issue:

  • metadata-writer
  • ml-pipeline
  • ml-pipeline-persistenceagent
  • ml-pipeline-scheduledworkflow
  • ml-pipeline-ui
  • ml-pipeline-viewer-crd
  • ml-pipeline-visualizationserver

These two Deployment resources are more problematic (because a new one is created for each Profile):

  • ml-pipeline-ui-artifact
  • ml-pipeline-visualizationserver

To fully fix this, we need to:

All 10 comments

After further investigation it is almost always the istio-sidecar which is causing this issue.

The following Pipeline Deployment resources have this issue:

  • metadata-writer
  • ml-pipeline
  • ml-pipeline-persistenceagent
  • ml-pipeline-scheduledworkflow
  • ml-pipeline-ui
  • ml-pipeline-viewer-crd
  • ml-pipeline-visualizationserver

These two Deployment resources are more problematic (because a new one is created for each Profile):

  • ml-pipeline-ui-artifact
  • ml-pipeline-visualizationserver

To fully fix this, we need to:

Users can work around this issue (if they are installing istio with the istio helm chart), using the sidecarInjectorWebhook.injectedAnnotations in values.yaml to inject the safe-to-evict annotation.

WARNING: this is only a solution for users who are happy with applying safe-to-evict to all ALL workloads with an istio-sidecar, so we should still fix this in our manifests.

WARNING2: this will also affect Notebook pods, which is probably not the best, as users don't want their Pods randomly evicted from a Node

See here: https://github.com/istio/istio/blob/1.4.10/install/kubernetes/helm/istio/charts/sidecarInjectorWebhook/values.yaml#L59

Totally makes sense to me.

Regarding notebook pods, do we have any other mechanism to prevent them from being evicted?

emptyDir will be provided by ephemeral storage. Having many pod using ephemeral storage is not a good practice. I think applying safe-to-evict is a short term fix. It would be better to understand why all pods need emptyDir. Is it necessary?

Totally makes sense to me.

Regarding notebook pods, do we have any other mechanism to prevent them from being evicted?

@Bobgy I was talking about the workaround, this is NOT an issue if Kubeflow Pipelines updates their manifests

However, I agree that the Notebooks owners should make an effort to support users who apply the workaround (as its probably not that uncommon for Istio users to apply safe-to-evict with sidecarInjectorWebhook.injectedAnnotations

emptyDir will be provided by ephemeral storage. Having many pod using ephemeral storage is not a good practice. I think applying safe-to-evict is a short term fix. It would be better to understand why all pods need emptyDir. Is it necessary?

@Jeffwan the emptyDir is a feature of istio-sidecar, and is not specific to Kubeflow, see here: https://github.com/istio/istio/issues/19395

Until Changes are made in kubeflow manifests, a temporary workaround is to edit the configmap manually.

kubectl edit cm istio-sidecar-injector -n istio-system

Search for injectedAnnotations and add cluster-autoscaler.kubernetes.io/safe-to-evict: "true".

    injectedAnnotations:
      cluster-autoscaler.kubernetes.io/safe-to-evict: "true"

@kubeflow/wg-pipeline-leads is there anyone looking at this issue currently?

Not yet, do you want to submit a PR?

/assign

Was this page helpful?
0 / 5 - 0 ratings