Pipeline: Schedule Pods in resource-constrained environments

Created on 5 Apr 2019 · 4 comments · Source: tektoncd/pipeline

Expected Behavior

In a resource-constrained environment like a namespace with resource limits imposed (or just an insufficiently provisioned cluster), creating a TaskRun (Pod) that exceeds those limits should not fail the TaskRun, but should instead continually try to create the Pod until it either succeeds or times out.

Actual Behavior

Pods fail to start and the TaskRun is marked failed almost immediately.

Steps to Reproduce the Problem

  1. Define a namespace with resource constraints (e.g., 10 CPU, 10 GB RAM)
  2. Create 15 TaskRuns each requesting 1 CPU and 1 GB RAM, running hello world or something simple
  3. ~10 of those will be scheduled and will succeed, the rest will fail due to insufficient resources.

Additional Info

This is similar to how Jobs can handle Pod scheduling failures by retrying until they are successful.

It's unclear whether users would expect TaskRuns waiting for sufficient resources to queue in order of the time they were created, or whether they'd expect the Kubernetes scheduler to do whatever it needs to do to schedule the Pods. As an initial implementation it's probably fine to have Kubernetes schedule Pods, and not have to worry about enforcing FIFO.

Labels: good first issue, help wanted, kind/feature

All 4 comments

/assign @sbwsg

Been working through some of the implementation details in a POC but want to drop current working notes here since I likely won't be able to work on it more until tomorrow.

  1. Catching a pod failure is relatively straightforward; checking the error message produced by the createPod() func in pkg/reconciler/v1alpha1/taskrun/taskrun.go reveals the reason. From here it's quick to parse out the error message and look for e.g. "exceeded quota" in the string. This relies on a somewhat brittle contract though. I'll also need to check for the different messages generated both by LimitRanges as well as ResourceQuotas since it looks like they both enforce resource limits on a pod. I'm currently looking around to see if there's a less brittle approach to this error checking.

  2. Once the resource constraint error is detected the pod then needs to be restarted. In my POC implementation this works by simply Enqueue()ing the TR to be re-assessed on the next reconcile loop. This results in many rapid reruns, though; it would be nicer to see an exponential backoff strategy similar to the one used by k8s' job controller. The job controller uses a particular kind of workqueue to implement this ("ExponentialFailureRateLimiter"), but TaskRun's controller Impl uses the "RateLimitedQueue", which is set up via knative's controller.NewImpl() func. So I'm looking at other alternatives to implement this.

I'm going to move this into a design doc. There are enough variables here to seed some discussion and it'd be good to get broader input before committing to one approach.

I've started the design doc here, including use cases, a draft implementation, some open questions, and possible alternative implementations that I'm still working through.
