Test-infra: Prow: statusreconciler pod entering Backoff status after failing

Created on 7 May 2019 · 5 comments · Source: kubernetes/test-infra

I'm trying out Prow to run some end-to-end tests for my project. I've followed the instructions mentioned here, using the tackle-based setup. The bazel setup instructions work and tackle exits cleanly, but the statusreconciler pod fails to deploy.

What happened: The statusreconciler pod fails and enters CrashLoopBackOff.

What you expected to happen: The statusreconciler pod should deploy successfully.

How to reproduce it (as minimally and precisely as possible): Follow the tackle-based setup steps mentioned here. The cluster runs on Google Kubernetes Engine with 3 n1-standard-2 nodes.
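For reference, the tackle run itself is the bazel target from the getting-started guide (target path as of that version of the docs; it may have moved since):

```console
$ bazel run //prow/cmd/tackle
```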

Please provide links to example occurrences, if any:

Anything else we need to know?:
Here are a few logs:

```console
$ kubectl logs statusreconciler-68bb79c9c9-dsrwb
{"component":"status-reconciler","error":"stat /etc/job-config: no such file or directory","level":"fatal","msg":"Error starting config agent.","time":"2019-05-07T13:57:49Z"}
```






```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu request for container statusreconciler'
  creationTimestamp: "2019-05-07T13:46:46Z"
  generateName: statusreconciler-68bb79c9c9-
  labels:
    app: statusreconciler
    pod-template-hash: "2466357575"
  name: statusreconciler-68bb79c9c9-dsrwb
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: statusreconciler-68bb79c9c9
    uid: 8e3fd37b-70ce-11e9-a7e0-42010aa00003
  resourceVersion: "31575"
  selfLink: /api/v1/namespaces/default/pods/statusreconciler-68bb79c9c9-dsrwb
  uid: 8e451404-70ce-11e9-a7e0-42010aa00003
spec:
  containers:
  - args:
    - --dry-run=false
    - --continue-on-error=true
    - --plugin-config=/etc/plugins/plugins.yaml
    - --config-path=/etc/config/config.yaml
    - --github-token-path=/etc/github/oauth
    - --job-config-path=/etc/job-config
    image: gcr.io/k8s-prow/status-reconciler:v20190503-f018ebad7
    imagePullPolicy: IfNotPresent
    name: statusreconciler
    resources:
      requests:
        cpu: 100m
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/github
      name: oauth
      readOnly: true
    - mountPath: /etc/config
      name: config
      readOnly: true
    - mountPath: /etc/plugins
      name: plugins
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: statusreconciler-token-lzfll
      readOnly: true
  dnsPolicy: ClusterFirst
  nodeName: gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: statusreconciler
  serviceAccountName: statusreconciler
  terminationGracePeriodSeconds: 180
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: oauth
    secret:
      defaultMode: 420
      secretName: oauth-token
  - configMap:
      defaultMode: 420
      name: config
    name: config
  - configMap:
      defaultMode: 420
      name: plugins
    name: plugins
  - name: statusreconciler-token-lzfll
    secret:
      defaultMode: 420
      secretName: statusreconciler-token-lzfll
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-05-07T13:46:46Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-05-07T13:46:46Z"
    message: 'containers with unready status: [statusreconciler]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    message: 'containers with unready status: [statusreconciler]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-05-07T13:46:46Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://7397fe32f93ecd1de8849484f0e91e7a9394a0a079783542ee3d28ee391bab08
    image: gcr.io/k8s-prow/status-reconciler:v20190503-f018ebad7
    imageID: docker-pullable://gcr.io/k8s-prow/status-reconciler@sha256:5c4d8250fe0f211337fd140c1c2aabcb3a9a849d99c8749e5c3c28159ca0b890
    lastState:
      terminated:
        containerID: docker://7397fe32f93ecd1de8849484f0e91e7a9394a0a079783542ee3d28ee391bab08
        exitCode: 1
        finishedAt: "2019-05-07T13:57:49Z"
        reason: Error
        startedAt: "2019-05-07T13:57:49Z"
    name: statusreconciler
    ready: false
    restartCount: 7
    state:
      waiting:
        message: Back-off 5m0s restarting failed container=statusreconciler pod=statusreconciler-68bb79c9c9-dsrwb_default(8e451404-70ce-11e9-a7e0-42010aa00003)
        reason: CrashLoopBackOff
  hostIP: 10.160.15.198
  phase: Running
  podIP: 10.44.2.15
  qosClass: Burstable
  startTime: "2019-05-07T13:46:46Z"
```
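The spec above passes `--job-config-path=/etc/job-config` but only mounts `/etc/github`, `/etc/config`, and `/etc/plugins`, so the fatal `stat /etc/job-config` error is expected. A quick way to confirm which paths are mounted (pod name taken from the output above, single-container pod assumed) is a jsonpath query:

```console
$ kubectl get pod statusreconciler-68bb79c9c9-dsrwb \
    -o jsonpath='{.spec.containers[0].volumeMounts[*].mountPath}'
/etc/github /etc/config /etc/plugins /var/run/secrets/kubernetes.io/serviceaccount
```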



```console
$ kubectl describe pod statusreconciler-68bb79c9c9-dsrwb
Name:               statusreconciler-68bb79c9c9-dsrwb
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d/10.160.15.198
Start Time:         Tue, 07 May 2019 19:16:46 +0530
Labels:             app=statusreconciler
                    pod-template-hash=2466357575
Annotations:        kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container statusreconciler
Status:             Running
IP:                 10.44.2.15
Controlled By:      ReplicaSet/statusreconciler-68bb79c9c9
Containers:
  statusreconciler:
    Container ID:  docker://d24eca1e061bb24f44a7c977d0e0c239ad8456a3c6950a5f5a9becada2b06350
    Image:         gcr.io/k8s-prow/status-reconciler:v20190503-f018ebad7
    Image ID:      docker-pullable://gcr.io/k8s-prow/status-reconciler@sha256:5c4d8250fe0f211337fd140c1c2aabcb3a9a849d99c8749e5c3c28159ca0b890
    Port:          <none>
    Host Port:     <none>
    Args:
      --dry-run=false
      --continue-on-error=true
      --plugin-config=/etc/plugins/plugins.yaml
      --config-path=/etc/config/config.yaml
      --github-token-path=/etc/github/oauth
      --job-config-path=/etc/job-config
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 07 May 2019 19:18:24 +0530
      Finished:     Tue, 07 May 2019 19:18:24 +0530
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 07 May 2019 19:17:33 +0530
      Finished:     Tue, 07 May 2019 19:17:33 +0530
    Ready:          False
    Restart Count:  4
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /etc/config from config (ro)
      /etc/github from oauth (ro)
      /etc/plugins from plugins (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from statusreconciler-token-lzfll (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  oauth:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  oauth-token
    Optional:    false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      config
    Optional:  false
  plugins:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      plugins
    Optional:  false
  statusreconciler-token-lzfll:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  statusreconciler-token-lzfll
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                From                                                         Message
  ----     ------     ----               ----                                                         -------
  Normal   Scheduled  106s               default-scheduler                                            Successfully assigned default/statusreconciler-68bb79c9c9-dsrwb to gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d
  Normal   Pulling    105s               kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  pulling image "gcr.io/k8s-prow/status-reconciler:v20190503-f018ebad7"
  Normal   Pulled     102s               kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  Successfully pulled image "gcr.io/k8s-prow/status-reconciler:v20190503-f018ebad7"
  Normal   Created    8s (x5 over 102s)  kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  Created container
  Normal   Started    8s (x5 over 102s)  kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  Started container
  Normal   Pulled     8s (x4 over 101s)  kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  Container image "gcr.io/k8s-prow/status-reconciler:v20190503-f018ebad7" already present on machine
  Warning  BackOff    7s (x9 over 100s)  kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  Back-off restarting failed container
```

kind/bug lifecycle/stale

All 5 comments

Hey @krishnadurai, the `/etc/job-config` path is an arg to a bunch of components. You can either remove it from your deployment by deleting `- --job-config-path=/etc/job-config` from the starter yaml, or, if you have jobs defined in a separate job config, point that flag at its location. I also ran into this error when I started; I'll make a PR to the documentation clarifying this.
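For the second option, a minimal sketch of the extra pieces the statusreconciler pod spec would need (the ConfigMap name `job-config` and the `kubectl create configmap` source file are assumptions; adapt them to wherever your jobs live):

```yaml
# Sketch only: mount a separately maintained job config where the flag expects it.
# Assumes a ConfigMap created with something like:
#   kubectl create configmap job-config --from-file=jobs.yaml=<path-to-your-jobs-file>
spec:
  containers:
  - name: statusreconciler
    args:
    - --job-config-path=/etc/job-config
    volumeMounts:
    - name: job-config
      mountPath: /etc/job-config
      readOnly: true
  volumes:
  - name: job-config
    configMap:
      name: job-config
```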

Thanks @yashbhutwala for suggesting this change.
Looks like we still have to make this change in the bazel tackle version of Prow's deployment.

I'll comment here once I've verified that statusreconciler works with this change.
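For anyone following along, one way to check after editing and re-applying the starter yaml (the file path and label selector here are assumptions based on the deployment above):

```console
$ kubectl apply -f prow/cluster/starter.yaml
$ kubectl get pods -l app=statusreconciler
$ kubectl logs -l app=statusreconciler
```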

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/close

@krishnadurai: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

