Pipeline: Optimize step signalling in entrypoint

Created on 14 Nov 2019 · 10 comments · Source: tektoncd/pipeline

The entrypoint signaling mechanism currently wakes up every second to check for the file that the previous step writes when it finishes. This is simple but, in our experience, slow. We have a synthetic test that runs a 20-step do-nothing task in something like 60s, while a comparable raw Pod without the entrypoint runs in 10s. We should see if we can get those times down.

We could reduce the sleep interval to 500ms, but another option is to use fsnotify to make the signaling immediate. A further option, described in #1569, is to use a sidecar as a signaling hub.
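For concreteness, here is a minimal Go sketch of the polling approach described above: the waiter stats a wait file on a fixed interval, and every miss costs up to one interval, paid once per step. The file path, function name, and the `.err` failure convention are illustrative assumptions, not Tekton's actual entrypoint code.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"time"
)

// waitForFile polls for the named file at the given interval and returns
// once it exists, or an error if a "<path>.err" marker appears first.
// (Hypothetical sketch; path and error convention are assumptions.)
func waitForFile(path string, interval time.Duration) error {
	for {
		if _, err := os.Stat(path + ".err"); err == nil {
			return fmt.Errorf("previous step failed: %s.err exists", path)
		}
		if _, err := os.Stat(path); err == nil {
			return nil
		}
		// Each empty iteration burns up to `interval` of dead time,
		// which is what adds up linearly with the number of steps.
		time.Sleep(interval)
	}
}

func main() {
	// Illustrative wait-file path; the real path is whatever the
	// controller passes to the entrypoint.
	if err := waitForFile("/tekton/tools/0", time.Second); err != nil {
		log.Fatal(err)
	}
}
```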

Labels: area/performance, help wanted, kind/feature, lifecycle/rotten

All 10 comments

/kind feature
/priority important-longterm

/assign

Ok... so first off, my initial numbers were totally incorrect. My imagePullPolicy was just not right, so that accounted for a good part of what I was seeing. Redoing the numbers in my cluster, I see 6s for the vanilla pod case and 17s for the TaskRun case with 20 steps.

So I played with the entrypoint wait time...:
raw pod -- 6s (lower limit...)
1ms -- 12s (burning laptop... power percentage going down in real time)
50ms -- 10-11s
100ms -- 11-12s
200ms -- 11-12s
250ms -- 12-15s (sudden jump here -- not sure why -- might be specific to my test)
300ms -- 14-15s
500ms -- 14-16s
750ms -- 15-16s
1000ms -- 15-17s


The point here is not to pick a magic number like 200ms, but to show that the first big problem in optimizing the entrypoint is that we currently spend a significant chunk of time waiting, and that overhead grows more or less linearly with the number of steps. fsnotify might bring the waiting overhead down to roughly zero, so I'll try that out next.
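A rough sketch of how fsnotify could replace the sleep loop, assuming the same illustrative wait-file convention as the sketch above; this is an outline of the idea, not the actual Tekton implementation.

```go
package main

import (
	"log"
	"os"
	"path/filepath"

	"github.com/fsnotify/fsnotify"
)

// waitForFileNotify blocks until the named file is created, waking up on
// filesystem events instead of a fixed polling interval.
func waitForFileNotify(path string) error {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	defer watcher.Close()

	// Watch the parent directory; create events fire for files inside it.
	if err := watcher.Add(filepath.Dir(path)); err != nil {
		return err
	}

	// Re-check after registering the watch to avoid a race where the file
	// was written before the watch existed.
	if _, err := os.Stat(path); err == nil {
		return nil
	}

	for {
		select {
		case ev := <-watcher.Events:
			if ev.Name == path && ev.Op&fsnotify.Create == fsnotify.Create {
				return nil
			}
		case err := <-watcher.Errors:
			return err
		}
	}
}

func main() {
	// Illustrative wait-file path, as in the polling sketch.
	if err := waitForFileNotify("/tekton/tools/0"); err != nil {
		log.Fatal(err)
	}
}
```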

Later I think it would be good to do a bit of analysis on the initial sync, and maybe on what the init containers are doing...

pod gist
taskrun gist

Thanks for that data Simon! This makes me think we should have a metric for "overhead" time -- time spent between step[n].finish and step[n+1].start. That would let us gather data across a bunch of runs before and after (and during) tweaks to the poll interval and while moving to something better.

This is also something an operator might want to monitor, in case they want to precache popular step images for instance.

Unfortunately, today we don't have a strong signal for when a step actually started executing, due to entrypoint rewriting. Tackling that first could help here and probably in other places as well.
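For illustration, a hedged sketch of how such an "overhead" metric could be computed from the step pod's container statuses. The field names come from the Kubernetes core/v1 API, but the helper itself (`stepOverheads`) is hypothetical, and as noted above, `StartedAt` reflects when the rewritten entrypoint started rather than when the user's command actually ran, so this is only an approximation.

```go
package metrics

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// stepOverheads returns the idle gap between step[n]'s finish and
// step[n+1]'s start, assuming statuses are ordered by step and terminated.
func stepOverheads(statuses []corev1.ContainerStatus) []time.Duration {
	var gaps []time.Duration
	for i := 1; i < len(statuses); i++ {
		prev := statuses[i-1].State.Terminated
		cur := statuses[i].State.Terminated
		if prev == nil || cur == nil {
			continue // step not finished yet; skip
		}
		gaps = append(gaps, cur.StartedAt.Time.Sub(prev.FinishedAt.Time))
	}
	return gaps
}
```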

I've been working with Kata containers a fair bit lately and... inotify does not work there. I guess that means our advanced sleep technology is a really good choice for now.

(remove/re-add labels to check if project automation bot is working, plz ignore)

@skaegi feel free to bring this back to the API WG for discussion if it needs priority attention

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
