Pipeline: Task fails to run when using runAsNonRoot and runAsUser

Created on 6 Mar 2020 · 19 Comments · Source: tektoncd/pipeline

This doc shows an example of using runAsNonRoot.

The Pod will also run as a non-root user.

Expected Behavior

Tasks run successfully.

Actual Behavior

Tested this on both GKE and OpenShift 4 (CodeReady Containers, actually). I set runAsNonRoot (or runAsUser) in a TaskRun like this:

spec:
  taskRef:
    name: jib-gradle
  ...
  podTemplate:
    securityContext:
      runAsNonRoot: true

runAsUser

  • When using runAsUser: 12345, the tekton-results-folder-writable init container fails with an error. I think the command chmod 777 /tekton/results fails due to a permission issue.
  tekton-results-folder-writable:
    Container ID:  cri-o://bfe047e46148846f24efd90608ec31796c2e3359fff23c2905c5f942213bd168
    Image:         busybox
    Image ID:      docker.io/library/busybox@sha256:6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
    Args:
      -c
      chmod 777 /tekton/results
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 06 Mar 2020 12:18:49 -0500
      Finished:     Fri, 06 Mar 2020 12:18:49 -0500
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /tekton/home from tekton-internal-home (rw)
      /tekton/results from tekton-internal-results (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-khxck (ro)
      /workspace from tekton-internal-workspace (rw)

runAsNonRoot

  • When using runAsNonRoot: true, I saw slightly different messages on GKE and OpenShift, but in both cases the pod hangs indefinitely.

    • GKE

      NAME                           READY   STATUS                            RESTARTS   AGE
      pod/jib-gradle-run-pod-kbpp2   0/4     Init:CreateContainerConfigError   0          21m

      Events:
        Type     Reason     Age                    From                                               Message
        ----     ------     ----                   ----                                               -------
        Normal   Scheduled  9m2s                   default-scheduler                                  Successfully assigned default/jib-gradle-run-pod-kbpp2 to gke-cluster-2-default-pool-12e48c0a-wwmc
        Normal   Pulled     7m56s (x8 over 9m1s)   kubelet, gke-cluster-2-default-pool-12e48c0a-wwmc  Successfully pulled image "busybox"
        Warning  Failed     7m56s (x8 over 9m1s)   kubelet, gke-cluster-2-default-pool-12e48c0a-wwmc  Error: container has runAsNonRoot and image will run as root
        Normal   Pulling    3m54s (x27 over 9m1s)  kubelet, gke-cluster-2-default-pool-12e48c0a-wwmc  Pulling image "busybox"

    • OpenShift

      Events:
        Type     Reason     Age              From                         Message
        ----     ------     ----             ----                         -------
        Normal   Scheduled  <unknown>        default-scheduler            Successfully assigned chanseok/jib-gradle-run-pod-zmzpb to crc-w6th5-master-0
        Normal   Pulled     1s (x3 over 3s)  kubelet, crc-w6th5-master-0  Container image "gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/creds-init@sha256:d1251c017ad8db911f6b459f9cbe94328c962b619b5a6e4ef57524254c1dc30d" already present on machine
        Warning  Failed     1s (x3 over 3s)  kubelet, crc-w6th5-master-0  Error: container has runAsNonRoot and image will run as root

Steps to Reproduce the Problem

  1. Set runAsNonRoot: true or runAsUser: 12345 in PodTemplate.
  2. Run a task.

Additional Info

I saw #1872, but I don't even get to that point.

Tekton version: v0.11.0-rc1

Label: kind/documentation

All 19 comments

Thanks for the report. I believe this is a duplicate of https://github.com/tektoncd/pipeline/issues/2172 and the fix will be rolled into 0.11.0-rc2, which is planned to be released early next week.

Ah, @piyush-garg beat me to it. But I wonder if runAsNonRoot is a different issue. It doesn't fail on chmod but hangs.

Warning  Failed     1s (x3 over 3s)  kubelet, crc-w6th5-master-0  Error: container has runAsNonRoot and image will run as root

Are you able to tell from the describe or get output of kubectl which container this error is referring to?

It's right before running tekton-results-folder-writable (i.e., State: Waiting). The reason is CreateContainerConfigError.

$ kubectl logs pod/jib-gradle-run-pod-kbpp2 -c tekton-results-folder-writable
Error from server (BadRequest): container "tekton-results-folder-writable" in pod "jib-gradle-run-pod-kbpp2" is waiting to start: CreateContainerConfigError
$ kubectl describe pod/jib-gradle-run-pod-kbpp2
Name:           jib-gradle-run-pod-kbpp2
Namespace:      default
Priority:       0
Node:           gke-cluster-2-default-pool-12e48c0a-wwmc/10.128.0.36
Start Time:     Fri, 06 Mar 2020 12:58:47 -0500
Labels:         app.kubernetes.io/managed-by=tekton-pipelines
                tekton.dev/task=jib-gradle
                tekton.dev/taskRun=jib-gradle-run
Annotations:    kubectl.kubernetes.io/last-applied-configuration:
                  {"apiVersion":"tekton.dev/v1alpha1","kind":"Task","metadata":{"annotations":{},"name":"jib-gradle","namespace":"default"},"spec":{"inputs"...
                kubernetes.io/limit-ranger:
                  LimitRanger plugin set: cpu request for init container tekton-results-folder-writable; cpu request for init container place-scripts; cpu r...
                pipeline.tekton.dev/release: devel
Status:         Pending
IP:             10.44.0.5
Controlled By:  TaskRun/jib-gradle-run
Init Containers:
  tekton-results-folder-writable:
    Container ID:
    Image:         busybox
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
    Args:
      -c
      chmod 777 /tekton/results
    State:          Waiting
      Reason:       CreateContainerConfigError
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /tekton/home from tekton-internal-home (rw)
      /tekton/results from tekton-internal-results (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-kj8l9 (ro)

Oh, but I see you removed tekton-results-folder-writable entirely in #2143, so I guess this won't happen in the next release.

hm, CreateContainerConfigError implies something may be wrong in the secrets or configmap mounts. What do the Task and TaskRun specs look like for this?

Still not 100% sure if this is related to the chmod init container or something else.

$ kubectl apply -f https://raw.githubusercontent.com/tektoncd/catalog/master/jib-gradle/jib-gradle.yaml
task.tekton.dev/jib-gradle created

And apply the following TaskRun.

apiVersion: tekton.dev/v1alpha1
kind: TaskRun
metadata:
  name: jib-gradle-run
spec:
  taskRef:
    name: jib-gradle
  inputs:
    resources:
    - name: source
      resourceSpec:
        type: git
        params:
        - name: url
          value: https://github.com/che-samples/console-java-simple
  outputs:
    resources:
    - name: image
      resourceSpec:
        type: image
        params:
        - name: url
          value: doesnotmatter
  podTemplate:
    securityContext:
      runAsNonRoot: true                                                         

Can verify I am seeing the same behavior on a v1.17.3 cluster on AWS. The only difference is that I am setting securityContext on the steps of a Task:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: echo-task
spec:
  steps:
    - name: echo-first
      image: ubuntu
      command:
        - echo
      args:
        - "Executing first"
      securityContext:
        runAsUser: 1000
    - name: echo-second
      image: ubuntu
      command:
        - echo
      args:
        - "Executing second"
      securityContext:
        runAsUser: 1000

kubectl describe pod echo-task-run-std22-pod-g9t85:

Name:         echo-task-run-std22-pod-g9t85
Namespace:    lab-tekton-fundamentals-w01-s001
Priority:     0
Node:         ip-10-0-1-18.us-east-2.compute.internal/10.0.1.18
Start Time:   Tue, 21 Apr 2020 19:10:25 +0000
Labels:       app.kubernetes.io/managed-by=tekton-pipelines
              tekton.dev/task=echo-task
              tekton.dev/taskRun=echo-task-run-std22
Annotations:  cni.projectcalico.org/podIP: 192.168.40.37/32
              kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"tekton.dev/v1beta1","kind":"Task","metadata":{"annotations":{},"name":"echo-task","namespace":"lab-tekton-fundamentals-w01-...
              kubernetes.io/psp: vmware-system-tmc-restricted
              pipeline.tekton.dev/release: devel
              seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status:       Pending
IP:           192.168.40.37
IPs:
  IP:           192.168.40.37
Controlled By:  TaskRun/echo-task-run-std22
Init Containers:
  place-tools:
    Container ID:
    Image:         gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/entrypoint:v0.11.1@sha256:4475ce94722b7a6b4ca0cb4244e4cd6d8781d30ca1527767474334518908c405
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      /ko-app/entrypoint
      /tekton/tools/entrypoint
    State:          Waiting
      Reason:       CreateContainerConfigError
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /tekton/tools from tekton-internal-tools (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-597d9 (ro)
Containers:
  step-echo-first:
    Container ID:
    Image:         ubuntu
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      /tekton/tools/entrypoint
    Args:
      -wait_file
      /tekton/downward/ready
      -wait_file_content
      -post_file
      /tekton/tools/0
      -termination_path
      /tekton/termination
      -entrypoint
      echo
      --
      Executing first
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:                0
      ephemeral-storage:  0
      memory:             0
    Environment:
      HOME:  /tekton/home
    Mounts:
      /tekton/downward from tekton-internal-downward (rw)
      /tekton/home from tekton-internal-home (rw)
      /tekton/results from tekton-internal-results (rw)
      /tekton/tools from tekton-internal-tools (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-597d9 (ro)
      /workspace from tekton-internal-workspace (rw)
  step-echo-second:
    Container ID:
    Image:         ubuntu
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      /tekton/tools/entrypoint
    Args:
      -wait_file
      /tekton/tools/0
      -post_file
      /tekton/tools/1
      -termination_path
      /tekton/termination
      -entrypoint
      echo
      --
      Executing second
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:                0
      ephemeral-storage:  0
      memory:             0
    Environment:
      HOME:  /tekton/home
    Mounts:
      /tekton/home from tekton-internal-home (rw)
      /tekton/results from tekton-internal-results (rw)
      /tekton/tools from tekton-internal-tools (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-597d9 (ro)
      /workspace from tekton-internal-workspace (rw)
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  tekton-internal-workspace:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  tekton-internal-home:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  tekton-internal-results:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  tekton-internal-tools:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  tekton-internal-downward:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations['tekton.dev/ready'] -> ready
  default-token-597d9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-597d9
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From                                              Message
  ----     ------     ----                  ----                                              -------
  Normal   Scheduled  15m                   default-scheduler                                 Successfully assigned lab-tekton-fundamentals-w01-s001/echo-task-run-std22-pod-g9t85 to ip-10-0-1-18.us-east-2.compute.internal
  Warning  Failed     12m (x12 over 15m)    kubelet, ip-10-0-1-18.us-east-2.compute.internal  Error: container has runAsNonRoot and image will run as root
  Normal   Pulled     4m54s (x50 over 15m)  kubelet, ip-10-0-1-18.us-east-2.compute.internal  Container image "gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/entrypoint:v0.11.1@sha256:4475ce94722b7a6b4ca0cb4244e4cd6d8781d30ca1527767474334518908c405" already present on machine

This can be resolved by setting fsGroup and runAsGroup on the Pod template in addition to runAsUser:

securityContext:
  runAsUser: 1000
  runAsGroup: 3000
  fsGroup: 2000

See more here: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod
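Putting that together, a complete podTemplate for the jib-gradle TaskRun from earlier in this thread might look like the sketch below. The specific UID/GID values are arbitrary examples, not required values:

```yaml
apiVersion: tekton.dev/v1alpha1
kind: TaskRun
metadata:
  name: jib-gradle-run
spec:
  taskRef:
    name: jib-gradle
  podTemplate:
    securityContext:
      runAsUser: 1000    # arbitrary non-root UID
      runAsGroup: 3000   # arbitrary GID; makes group ownership of processes explicit
      fsGroup: 2000      # makes the pod's volumes group-writable by this GID
```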

@chanseokoh @danielhelfand at this point should we consider this a bug in Pipelines, a lack of documentation in Pipelines, or just expected behaviour when using runAsUser/runAsNonRoot? Or maybe all three? I'm not totally clear on the actions that need to be taken to resolve this issue.

In the case of steps, I think that this is possibly a bug with pipelines as I cannot get this to work correctly with steps that are part of a Task definition.

For runAsNonRoot, you still need to specify a UID:

securityContext:
      runAsNonRoot: true
      runAsUser: 1001

Any documentation suggesting otherwise is incorrect.

When it comes to using runAsUser, however, runAsGroup and fsGroup must be specified as well. Some Kubernetes distributions handle fsGroup on behalf of users, so it may not always be necessary. At the very least, any documentation showing runAsUser without runAsNonRoot should also include runAsGroup/fsGroup.

So digging further here, Tekton does not account for the securityContext of steps that are defined as part of a Task, such as:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: echo-task
spec:
  steps:
    - name: echo-first
      image: ubuntu
      command:
        - echo
      args:
        - "Executing first"
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001

The only securityContext that is applied is whatever is applied on the TaskRun or PipelineRun pod template:

https://github.com/tektoncd/pipeline/blob/c30d3aa0230faab3be210fccac19f7bb7ddd658f/pkg/pod/pod.go#L211

So this really comes down to two things:

  1. The documentation should be updated for securityContext throughout the project wherever only runAsNonRoot/runAsUser is specified for a securityContext
  2. The securityContext of steps should be applied if specified
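To illustrate point 2, the behavior at the linked pod.go line amounts to roughly the sketch below. The types and function here are simplified stand-ins, not the actual Tekton or k8s.io/api code: only the pod template's securityContext reaches the Pod spec, and per-step contexts never influence it.

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes API types (illustrative only,
// not the real k8s.io/api structs).
type SecurityContext struct {
	RunAsNonRoot *bool
	RunAsUser    *int64
}

type PodTemplate struct {
	SecurityContext *SecurityContext
}

type PodSpec struct {
	SecurityContext *SecurityContext
}

// applyPodTemplate mirrors the behavior described above: the
// TaskRun/PipelineRun pod template's securityContext is copied onto the
// Pod spec; step-level securityContexts are not consulted here at all.
func applyPodTemplate(spec *PodSpec, tmpl *PodTemplate) {
	if tmpl != nil && tmpl.SecurityContext != nil {
		spec.SecurityContext = tmpl.SecurityContext
	}
}

func main() {
	nonRoot := true
	spec := &PodSpec{}
	applyPodTemplate(spec, &PodTemplate{SecurityContext: &SecurityContext{RunAsNonRoot: &nonRoot}})
	fmt.Println(*spec.SecurityContext.RunAsNonRoot) // the template's setting wins
}
```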

For runAsNonRoot, you still need to specify a UID:

securityContext:
      runAsNonRoot: true
      runAsUser: 1001

This may not be the fault of Tekton, but to me, this doesn't sound intuitive. If you have to use runAsUser, then there's no reason for the existence of runAsNonRoot. Is runAsNonRoot deprecated? Maybe I'm missing something.

runAsNonRoot is about validating containers running in the pod, not actually setting the user. See more under the Kubernetes pod documentation:

runAsNonRoot | boolean | Indicates that the container must run as a non-root user. If true, the Kubelet will validate the image at runtime to ensure that it does not run as UID 0 (root) and fail to start the container if it does. If unset or false, no such validation will be performed. May also be set in PodSecurityContext. If set in both SecurityContext and PodSecurityContext, the value specified in SecurityContext takes precedence.

In general, running as a non root user should be built into the container image itself as opposed to relying on a securityContext whenever possible.
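For example, baking a non-root user into the image itself avoids relying on a securityContext entirely. A generic Dockerfile sketch (the base image, user name, and UID are all illustrative):

```dockerfile
FROM ubuntu:20.04
# Create a dedicated unprivileged user; the UID is arbitrary but must be non-zero
RUN useradd --uid 1001 --create-home appuser
USER 1001
# From here on, the build steps and the container's runtime process run as UID 1001,
# so a runAsNonRoot: true check passes without any extra securityContext.
```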

Ah, makes sense. Thanks for the explanation. It would have been better if the name were verifyNonRoot or such, because this is inconsistent with runAsUser. (I would expect that runAsUser: 1001 validates that the container runs with UID 1001, not making the container run with UID 1001.)

Last piece of information on this: To specify a securityContext for a step, the Pod must also specify a securityContext. Basically, a securityContext must be specified at the Pod level, but it can be overridden at the Container level if something is specified there. For example:

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: echo-task
spec:
  steps:
    - name: echo-first
      image: ubuntu
      command:
        - sleep
      args:
        - 120s
      securityContext:
        runAsUser: 2000
        allowPrivilegeEscalation: false
---
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  generateName: echo-task-run-
spec:
  taskRef:
    name: echo-task
  podTemplate:
    securityContext:
      runAsNonRoot: true
      runAsUser: 1001

With the combination above, the Pod runs in an environment that does not allow root-user containers, while each step can still specify its own user to run as (among other properties). So this is not actually an issue with Pipelines itself, but rather a documentation/examples issue that should be better highlighted.
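As a quick way to confirm which UID a step actually ends up with, one can run a throwaway Task whose only step prints the effective user. This is illustrative only (the Task name is made up):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: whoami-task
spec:
  steps:
    - name: print-uid
      image: ubuntu
      command:
        - id    # prints the effective uid/gid, showing which securityContext won
```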

👍 fantastic summary and investigation, thank you very much @danielhelfand ! I've updated labels on the issue to reflect this info.

Sure, happy to help. If no one is working on this, more than happy to pick it up.

/assign
