Test-infra: Prow: statusreconciler pod entering Backoff status after failing

Created on 7 May 2019 · 5 comments · Source: kubernetes/test-infra

I'm trying out Prow to run some end-to-end tests for my project. I've followed the instructions mentioned here, using the tackle-based setup. The bazel setup instructions work and tackle exits cleanly, but the statusreconciler pod fails to deploy.

What happened: The statusreconciler pod fails and enters CrashLoopBackOff.

What you expected to happen: The statusreconciler pod should deploy successfully.

How to reproduce it (as minimally and precisely as possible): Follow the tackle-based setup steps mentioned here. The cluster runs on Google Kubernetes Engine with 3 n1-standard-2 nodes.
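For reference, the tackle run itself is the bazel target from the getting-started guide (target path as of that version of the docs; it may have moved since):

```console
$ bazel run //prow/cmd/tackle
```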

Please provide links to example occurrences, if any:

Anything else we need to know?:
Here are a few logs:

```console
$ kubectl logs statusreconciler-68bb79c9c9-dsrwb
{"component":"status-reconciler","error":"stat /etc/job-config: no such file or directory","level":"fatal","msg":"Error starting config agent.","time":"2019-05-07T13:57:49Z"}
```






```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu request for container statusreconciler'
  creationTimestamp: "2019-05-07T13:46:46Z"
  generateName: statusreconciler-68bb79c9c9-
  labels:
    app: statusreconciler
    pod-template-hash: "2466357575"
  name: statusreconciler-68bb79c9c9-dsrwb
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: statusreconciler-68bb79c9c9
    uid: 8e3fd37b-70ce-11e9-a7e0-42010aa00003
  resourceVersion: "31575"
  selfLink: /api/v1/namespaces/default/pods/statusreconciler-68bb79c9c9-dsrwb
  uid: 8e451404-70ce-11e9-a7e0-42010aa00003
spec:
  containers:
  - args:
    - --dry-run=false
    - --continue-on-error=true
    - --plugin-config=/etc/plugins/plugins.yaml
    - --config-path=/etc/config/config.yaml
    - --github-token-path=/etc/github/oauth
    - --job-config-path=/etc/job-config
    image: gcr.io/k8s-prow/status-reconciler:v20190503-f018ebad7
    imagePullPolicy: IfNotPresent
    name: statusreconciler
    resources:
      requests:
        cpu: 100m
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/github
      name: oauth
      readOnly: true
    - mountPath: /etc/config
      name: config
      readOnly: true
    - mountPath: /etc/plugins
      name: plugins
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: statusreconciler-token-lzfll
      readOnly: true
  dnsPolicy: ClusterFirst
  nodeName: gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: statusreconciler
  serviceAccountName: statusreconciler
  terminationGracePeriodSeconds: 180
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: oauth
    secret:
      defaultMode: 420
      secretName: oauth-token
  - configMap:
      defaultMode: 420
      name: config
    name: config
  - configMap:
      defaultMode: 420
      name: plugins
    name: plugins
  - name: statusreconciler-token-lzfll
    secret:
      defaultMode: 420
      secretName: statusreconciler-token-lzfll
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-05-07T13:46:46Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-05-07T13:46:46Z"
    message: 'containers with unready status: [statusreconciler]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    message: 'containers with unready status: [statusreconciler]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-05-07T13:46:46Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://7397fe32f93ecd1de8849484f0e91e7a9394a0a079783542ee3d28ee391bab08
    image: gcr.io/k8s-prow/status-reconciler:v20190503-f018ebad7
    imageID: docker-pullable://gcr.io/k8s-prow/status-reconciler@sha256:5c4d8250fe0f211337fd140c1c2aabcb3a9a849d99c8749e5c3c28159ca0b890
    lastState:
      terminated:
        containerID: docker://7397fe32f93ecd1de8849484f0e91e7a9394a0a079783542ee3d28ee391bab08
        exitCode: 1
        finishedAt: "2019-05-07T13:57:49Z"
        reason: Error
        startedAt: "2019-05-07T13:57:49Z"
    name: statusreconciler
    ready: false
    restartCount: 7
    state:
      waiting:
        message: Back-off 5m0s restarting failed container=statusreconciler pod=statusreconciler-68bb79c9c9-dsrwb_default(8e451404-70ce-11e9-a7e0-42010aa00003)
        reason: CrashLoopBackOff
  hostIP: 10.160.15.198
  phase: Running
  podIP: 10.44.2.15
  qosClass: Burstable
  startTime: "2019-05-07T13:46:46Z"
```
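The spec above passes `--job-config-path=/etc/job-config` but only mounts `/etc/github`, `/etc/config`, and `/etc/plugins`, so the fatal `stat /etc/job-config` error is expected. A quick way to confirm which paths are mounted (pod name taken from the output above, single-container pod assumed) is a jsonpath query:

```console
$ kubectl get pod statusreconciler-68bb79c9c9-dsrwb \
    -o jsonpath='{.spec.containers[0].volumeMounts[*].mountPath}'
/etc/github /etc/config /etc/plugins /var/run/secrets/kubernetes.io/serviceaccount
```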



```console
$ kubectl describe pod statusreconciler-68bb79c9c9-dsrwb
Name:               statusreconciler-68bb79c9c9-dsrwb
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d/10.160.15.198
Start Time:         Tue, 07 May 2019 19:16:46 +0530
Labels:             app=statusreconciler
                    pod-template-hash=2466357575
Annotations:        kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container statusreconciler
Status:             Running
IP:                 10.44.2.15
Controlled By:      ReplicaSet/statusreconciler-68bb79c9c9
Containers:
  statusreconciler:
    Container ID:  docker://d24eca1e061bb24f44a7c977d0e0c239ad8456a3c6950a5f5a9becada2b06350
    Image:         gcr.io/k8s-prow/status-reconciler:v20190503-f018ebad7
    Image ID:      docker-pullable://gcr.io/k8s-prow/status-reconciler@sha256:5c4d8250fe0f211337fd140c1c2aabcb3a9a849d99c8749e5c3c28159ca0b890
    Port:          <none>
    Host Port:     <none>
    Args:
      --dry-run=false
      --continue-on-error=true
      --plugin-config=/etc/plugins/plugins.yaml
      --config-path=/etc/config/config.yaml
      --github-token-path=/etc/github/oauth
      --job-config-path=/etc/job-config
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 07 May 2019 19:18:24 +0530
      Finished:     Tue, 07 May 2019 19:18:24 +0530
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 07 May 2019 19:17:33 +0530
      Finished:     Tue, 07 May 2019 19:17:33 +0530
    Ready:          False
    Restart Count:  4
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /etc/config from config (ro)
      /etc/github from oauth (ro)
      /etc/plugins from plugins (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from statusreconciler-token-lzfll (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  oauth:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  oauth-token
    Optional:    false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      config
    Optional:  false
  plugins:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      plugins
    Optional:  false
  statusreconciler-token-lzfll:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  statusreconciler-token-lzfll
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                From                                                         Message
  ----     ------     ----               ----                                                         -------
  Normal   Scheduled  106s               default-scheduler                                            Successfully assigned default/statusreconciler-68bb79c9c9-dsrwb to gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d
  Normal   Pulling    105s               kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  pulling image "gcr.io/k8s-prow/status-reconciler:v20190503-f018ebad7"
  Normal   Pulled     102s               kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  Successfully pulled image "gcr.io/k8s-prow/status-reconciler:v20190503-f018ebad7"
  Normal   Created    8s (x5 over 102s)  kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  Created container
  Normal   Started    8s (x5 over 102s)  kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  Started container
  Normal   Pulled     8s (x4 over 101s)  kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  Container image "gcr.io/k8s-prow/status-reconciler:v20190503-f018ebad7" already present on machine
  Warning  BackOff    7s (x9 over 100s)  kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  Back-off restarting failed container
```

kind/bug lifecycle/stale

All 5 comments

Hey @krishnadurai, the `/etc/job-config` path is an arg to a bunch of components. You can either remove it from your deployment by deleting `- --job-config-path=/etc/job-config` from the starter yaml, or, if you have jobs defined in a separate job config, point that flag at its location. I also ran into this error when I started; I'll make a PR to the documentation clarifying this.
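For the second option, a minimal sketch of the extra pieces the statusreconciler pod spec would need (the ConfigMap name `job-config` and the `kubectl create configmap` source file are assumptions; adapt them to wherever your jobs live):

```yaml
# Sketch only: mount a separately maintained job config where the flag expects it.
# Assumes a ConfigMap created with something like:
#   kubectl create configmap job-config --from-file=jobs.yaml=<path-to-your-jobs-file>
spec:
  containers:
  - name: statusreconciler
    args:
    - --job-config-path=/etc/job-config
    volumeMounts:
    - name: job-config
      mountPath: /etc/job-config
      readOnly: true
  volumes:
  - name: job-config
    configMap:
      name: job-config
```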

Thanks @yashbhutwala for suggesting this change.
Looks like we still have to make this change in the bazel tackle version of Prow's deployment.

I'll comment here once I've verified that statusreconciler works with this change.
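For anyone following along, one way to check after editing and re-applying the starter yaml (the file path and label selector here are assumptions based on the deployment above):

```console
$ kubectl apply -f prow/cluster/starter.yaml
$ kubectl get pods -l app=statusreconciler
$ kubectl logs -l app=statusreconciler
```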

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/close

@krishnadurai: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

