Pipeline: Sidecars doesn't get terminated when the binary is in the nop image

Created on 23 Sep 2019 · 11Comments · Source: tektoncd/pipeline

Expected Behavior

Sidecars get terminated along the main container

Actual Behavior

This is a followup to the discussion we had with @sbwsg on this issue :

https://github.com/tektoncd/pipeline/issues/1253#issuecomment-527464092

Since the sidecar tests has been implemented we have seen some issues on our openshift based CI. The test would run waiting for a terminate state and fails waiting. Here is the test :

https://github.com/chmouel/tektoncd-pipeline/blob/chmouel-ci-test-1809/test/sidecar_test.go#L105-L107

We believe that we only just figured this out, It seems that it is because of the base image we are using that are based on a RHEL image called registry.access.redhat.com/ubi8/ubi:latest.

With KO the nop image is by default based according to https://github.com/google/ko#overriding-the-default-base-image on gcr.io/distroless/base:latestwhich has no /bin/sh while the RHEL image has.

We are guessing of what's happening is :

a main container runs with a sidecar container
the main and sidecar containers are both doing /bin/sh scripts like this :
https://github.com/chmouel/tektoncd-pipeline/blob/chmouel-ci-test-1809/test/sidecar_test.go#L49-L50
the main container gets killed, tekton controller sees that there is a sidecar container and replaces the main image name with a nop image and keep the same arguments.
If the nop container is able to run those arguments then it continue running instead of getting to killed state as it should be.

Steps to Reproduce the Problem

override the base image for nop with a container that has /bin/sh

diff --git a/.ko.yaml b/.ko.yaml
index 9b34cc27..27524d01 100644
--- a/.ko.yaml
+++ b/.ko.yaml
@@ -1,4 +1,5 @@
 baseImageOverrides:
+  github.com/tektoncd/pipeline/cmd/nop: google/cloud-sdk:alpine
   # TODO(christiewilson): Use our built base image
   github.com/tektoncd/pipeline/cmd/creds-init: gcr.io/knative-nightly/github.com/knative/build/build-base:latest
   github.com/tektoncd/pipeline/cmd/git-init: gcr.io/knative-nightly/github.com/knative/build/build-base:latest

Run the TestSideCar test :

% go test -failfast -v -count=1 -tags=e2e -ldflags '-X github.com/tektoncd/pipeline/test.missingKoFatal=false' ./test -timeout=20m --kubeconfig $KUBECONFIG -run TestSidecarTaskSupport

Additional Info

We probably want to figure why rewriting the Entrypoint is not possible.

/kind bug
/cc @sbwsg

kinbug

Source

chmouel

All 11 comments

Related Kubernetes issue for a RFE allowing spec.Container.Command to be modified https://github.com/kubernetes/kubernetes/issues/83059

chmouel on 24 Sep 2019

Ah, great work figuring this out @chmouel! I'm wondering - what is the reason for overriding the nop image?

sbwsg on 24 Sep 2019

Oh wait, nevermind, I see that https://github.com/google/ko#overriding-the-default-base-image describes some reasons to do so.

sbwsg on 24 Sep 2019

Our case is a bit different,

As a policy (and our CI enforces it) all our images needs to use our official distro so we have to base the nop image against a minimal RHEL container (called UBI).

chmouel on 24 Sep 2019

Does UBI contain a kill binary? If so then this issue may be resolved by https://github.com/tektoncd/pipeline/issues/1131 since, once implemented, that would no longer perform a nop image swap (and at the same time would introduce a nice simple contract for sidecar containers that want to be stopped "gracefully").

sbwsg on 24 Sep 2019

There is also the ongoing sidecar KEP which is progressively being implemented in Kubernetes. https://github.com/kubernetes/enhancements/issues/753 (and which Tekton may eventually use "under the hood" to run the sidecars).

sbwsg on 24 Sep 2019

@sbwsg ah really nice yes we do have a kill binary :

but if I understand your comment here https://github.com/tektoncd/pipeline/issues/1131#issuecomment-525845852 you want to SIGKILL process 1 in the sidecar container before it gets replaced to nop. That sidecar container is whatever the user choose to be, we are not enforcing our base images on this one, it's only on the images we ship from tekton that we have a RHEL base.

So what I am trying to say is that if we come down to the bullet point number 3. from your comment where we endup doing the swap this would end up the same as what we have here.

Having said that I don't see any alternative and if we implement #1131 things would def be better,

I am still wondering why k8 allows us to change the image name and not the entrypoints,

kubernetes/enhancements#753 is definitively the way to go, glad to see something like this being worked on.

chmouel on 24 Sep 2019

it's only on the images we ship from tekton that we have a RHEL base.

Ah yeah good point!

sbwsg on 24 Sep 2019

At least in the short term i think we should document this. I will do this today.

sbwsg on 24 Oct 2019

👍1

https://github.com/tektoncd/pipeline/pull/1464

sbwsg on 24 Oct 2019

Now that this is documented I'm going to close this issue.

sbwsg on 27 Apr 2020

👍1

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Flaky TestPipelineRun when provisioning pvcs

chmouel · 3Comments

Run CI's YAML tests as a `go test` instead of bash scripts

sbwsg · 3Comments

k8s resource names are not configurable

silverlyra · 4Comments

enable script for conditions

pritidesai · 4Comments

Encorporate k8s documentation style guidelines

bobcatfish · 4Comments