I'm trying out Prow for running some end-to-end tests for my project. I've followed the instructions mentioned here, using the tackle-based setup. The bazel setup instructions complete and `tackle` exits cleanly, but the statusreconciler pod fails to deploy.
What happened: the statusreconciler pod keeps failing and enters Back-off status.
What you expected to happen: the statusreconciler pod deploys successfully.
How to reproduce it (as minimally and precisely as possible): follow the tackle-based setup steps mentioned here. The cluster is Google Kubernetes Engine with 3 nodes of n1-standard-2.
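For reference, a cluster matching that description can be created along these lines (the cluster name and zone below are placeholders, not necessarily what I used):

```console
$ gcloud container clusters create prow-test \
    --zone us-central1-a \
    --num-nodes 3 \
    --machine-type n1-standard-2
```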
Please provide links to example occurrences, if any:
Anything else we need to know?:
Here are a few logs:
```console
$ kubectl logs statusreconciler-68bb79c9c9-dsrwb
{"component":"status-reconciler","error":"stat /etc/job-config: no such file or directory","level":"fatal","msg":"Error starting config agent.","time":"2019-05-07T13:57:49Z"}
```
```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu request for container
      statusreconciler'
  creationTimestamp: "2019-05-07T13:46:46Z"
  generateName: statusreconciler-68bb79c9c9-
  labels:
    app: statusreconciler
    pod-template-hash: "2466357575"
  name: statusreconciler-68bb79c9c9-dsrwb
  namespace: default
  ownerReferences:
```
```console
$ kubectl describe pod statusreconciler-68bb79c9c9-dsrwb
Name:               statusreconciler-68bb79c9c9-dsrwb
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d/10.160.15.198
Start Time:         Tue, 07 May 2019 19:16:46 +0530
Labels:             app=statusreconciler
                    pod-template-hash=2466357575
Annotations:        kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container statusreconciler
Status:             Running
IP:                 10.44.2.15
Controlled By:      ReplicaSet/statusreconciler-68bb79c9c9
Containers:
  statusreconciler:
    Container ID:  docker://d24eca1e061bb24f44a7c977d0e0c239ad8456a3c6950a5f5a9becada2b06350
    Image:         gcr.io/k8s-prow/status-reconciler:v20190503-f018ebad7
    Image ID:      docker-pullable://gcr.io/k8s-prow/status-reconciler@sha256:5c4d8250fe0f211337fd140c1c2aabcb3a9a849d99c8749e5c3c28159ca0b890
    Port:          <none>
    Host Port:     <none>
    Args:
      --dry-run=false
      --continue-on-error=true
      --plugin-config=/etc/plugins/plugins.yaml
      --config-path=/etc/config/config.yaml
      --github-token-path=/etc/github/oauth
      --job-config-path=/etc/job-config
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 07 May 2019 19:18:24 +0530
      Finished:     Tue, 07 May 2019 19:18:24 +0530
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 07 May 2019 19:17:33 +0530
      Finished:     Tue, 07 May 2019 19:17:33 +0530
    Ready:          False
    Restart Count:  4
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /etc/config from config (ro)
      /etc/github from oauth (ro)
      /etc/plugins from plugins (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from statusreconciler-token-lzfll (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  oauth:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  oauth-token
    Optional:    false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      config
    Optional:  false
  plugins:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      plugins
    Optional:  false
  statusreconciler-token-lzfll:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  statusreconciler-token-lzfll
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                From                                                         Message
  ----     ------     ----               ----                                                         -------
  Normal   Scheduled  106s               default-scheduler                                            Successfully assigned default/statusreconciler-68bb79c9c9-dsrwb to gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d
  Normal   Pulling    105s               kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  pulling image "gcr.io/k8s-prow/status-reconciler:v20190503-f018ebad7"
  Normal   Pulled     102s               kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  Successfully pulled image "gcr.io/k8s-prow/status-reconciler:v20190503-f018ebad7"
  Normal   Created    8s (x5 over 102s)  kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  Created container
  Normal   Started    8s (x5 over 102s)  kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  Started container
  Normal   Pulled     8s (x4 over 101s)  kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  Container image "gcr.io/k8s-prow/status-reconciler:v20190503-f018ebad7" already present on machine
  Warning  BackOff    7s (x9 over 100s)  kubelet, gke-krishna-prow-test-1-default-pool-6c3bb17d-f77d  Back-off restarting failed container
```
Hey @krishnadurai, `/etc/job-config` is an arg to a bunch of components. You can either remove it from your deployment by deleting `- --job-config-path=/etc/job-config` from the starter YAML, or, if you have jobs defined in a separate job config, point that flag at its location. I also ran into this error when I started; I'll make a PR to the documentation clarifying this.
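To make that concrete, here is a rough sketch of the first option: the statusreconciler container args from the starter deployment, with the offending flag dropped (abbreviated, not the full manifest):

```yaml
args:
- --dry-run=false
- --continue-on-error=true
- --plugin-config=/etc/plugins/plugins.yaml
- --config-path=/etc/config/config.yaml
- --github-token-path=/etc/github/oauth
# - --job-config-path=/etc/job-config   # removed: no separate job config exists yet
```

If you keep the flag instead, something has to actually put your job configs at that path, e.g. a ConfigMap mounted at `/etc/job-config` (the `job-config` ConfigMap name here is an assumption; use whatever you load your job YAML into):

```yaml
volumeMounts:
- name: job-config
  mountPath: /etc/job-config
  readOnly: true
volumes:
- name: job-config
  configMap:
    name: job-config
```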
Thanks @yashbhutwala for suggesting this change.
Looks like we still have to make this change in the bazel tackle version of Prow's deployment.
I'll comment here once I verify if statusreconciler works with this change.
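For reference, the verification I plan to run after editing the deployment (names taken from the describe output above):

```console
$ kubectl edit deployment statusreconciler        # drop the --job-config-path arg
$ kubectl get pods -l app=statusreconciler        # expect Running, restarts settling at 0
$ kubectl logs -l app=statusreconciler --tail=20  # no more 'stat /etc/job-config' errors
```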
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale
/close
@krishnadurai: Closing this issue.
In response to this:
> /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.