Kubectl: The documentation of kubectl wait is a bit misleading

Created on 3 Nov 2019 · 14 Comments · Source: kubernetes/kubectl

The --timeout option for kubectl wait says:

"The length of time to wait before giving up".

I understood this to be the timeout for the entire command, but after experimenting with it a bit I realised that the value of this option applies per resource. For example, if I wait for a set of 10 pods to become Ready with --timeout=60s, I might end up waiting up to 10 minutes before the command exits, not the 1 minute I assumed.
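To make the arithmetic concrete, here is a minimal sketch of the worst case, assuming the timeout is spent in full on each resource in turn, as described above. The function name is made up for illustration and is not part of kubectl.

```python
def worst_case_wait_seconds(resource_count: int, per_resource_timeout_s: float) -> float:
    """Upper bound on wall-clock time when the timeout applies to each resource in turn."""
    return resource_count * per_resource_timeout_s


# The example above: 10 pods, --timeout=60s
print(worst_case_wait_seconds(10, 60))  # 600 seconds, i.e. 10 minutes
```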

So as I see it, there are two possible solutions here:

  1. Improve the documentation.
  2. Change the implementation so that the --timeout value applies to the entire command duration (the preferable solution, in my opinion).
area/kubectl kind/bug lifecycle/rotten priority/backlog sig/cli

Most helpful comment

/remove-lifecycle rotten

I just experienced exactly the issue described in the OP, and found it very confusing. I noticed these log messages from my e2e test:

+ kubectl wait --for=condition=ready pod -l prepull-test-images=e2e --timeout 30m
timed out waiting for the condition on pods/prepull-test-containers-gxjhh
timed out waiting for the condition on pods/prepull-test-containers-mqf4j
timed out waiting for the condition on pods/prepull-test-containers-r96tp
+ kubectl get pods -o wide
NAME                            READY   STATUS             RESTARTS   AGE   IP          NODE                                           NOMINATED NODE   READINESS GATES
prepull-test-containers-gxjhh   14/17   CrashLoopBackOff   85         92m   10.64.2.3   e2e-726b009feb-d872a-windows-node-group-ltb5   <none>           <none>
prepull-test-containers-mqf4j   14/17   CrashLoopBackOff   84         92m   10.64.1.3   e2e-726b009feb-d872a-windows-node-group-0qdq   <none>           <none>
prepull-test-containers-r96tp   14/17   CrashLoopBackOff   82         92m   10.64.3.3   e2e-726b009feb-d872a-windows-node-group-lc19   <none>           <none>

The pods are about 90 minutes old rather than the 30 minutes I expected, because kubectl wait applies the timeout to each selected pod sequentially (3 pods × 30m). I find this very unintuitive. If the default behavior can't be changed, perhaps a flag could be added to enforce the timeout across all selected pods/conditions.

All 14 comments

Actually, kubectl wait issues a separate watch request for every single resource, and it makes those requests one by one:

GET https://10.6.192.3:6443/api/v1/namespaces/default/pods?fieldSelector=metadata.name%3Dmy-busybox&resourceVersion=5594455&watch=true

But a watch request is an open-ended connection, so each request gets its own timeout (the default in wait is 30s).

So the only way to implement your second solution would be to divide the timeout evenly among the resources, and that doesn't seem like a good approach.
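To illustrate the trade-off, here is a small simulation sketch, not kubectl's actual code. It assumes resources are waited on strictly one after another, and that each resource becomes ready at a known time measured from the start of the command. The strategy names are made up: "per_resource" models the behavior described in this thread, "even_split" models dividing the timeout evenly, and "shared_deadline" is one possible reading of the OP's second solution, where each wait gets whatever time is left.

```python
from typing import List, Tuple


def simulate_sequential_wait(
    ready_at: List[float], timeout: float, strategy: str
) -> Tuple[float, List[int]]:
    """Return (elapsed_seconds, indices_that_timed_out) for waiting on each resource in order."""
    now = 0.0
    timed_out: List[int] = []
    for i, t_ready in enumerate(ready_at):
        if strategy == "per_resource":
            budget = timeout                      # full timeout for every resource
        elif strategy == "even_split":
            budget = timeout / len(ready_at)      # timeout divided evenly
        elif strategy == "shared_deadline":
            budget = max(0.0, timeout - now)      # whatever is left of one overall budget
        else:
            raise ValueError(f"unknown strategy: {strategy}")

        if t_ready <= now + budget:
            now = max(now, t_ready)               # resource became ready within its budget
        else:
            now += budget                         # budget exhausted, move on
            timed_out.append(i)
    return now, timed_out


# Three pods: ready after 40s, ready after 5s, never ready. Timeout is 60s.
pods = [40.0, 5.0, float("inf")]
for name in ("per_resource", "even_split", "shared_deadline"):
    print(name, simulate_sequential_wait(pods, 60.0, name))
```

In this toy scenario the even split gives up on the first pod after 20s even though it would have been ready at 40s, which is one way to see why averaging looks unattractive.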

I implemented sort of a wrapper for internal use with the following logic:

  1. run kubectl wait <resources> --timeout=0
  2. parse the output from step 1 to get a list of unready_pods
  3. while (command_elapsed_time < command_timeout and len(unready_pods) > 0)
    a. run kubectl wait <unready_pods> --timeout=0
    b. parse the output from step 3.a to get a list of unready_pods
  4. if len(unready_pods) > 0
    a. raise a command_timeout exception.
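The steps above could be sketched in Python roughly as follows. This is an illustration under assumptions, not the wrapper actually used: the function names are made up, the poll interval is an addition to avoid a busy loop, and the parsing relies on the "timed out waiting for the condition on ..." message seen in the logs in this thread, which may differ between kubectl versions. Other kubectl failures (for example a resource that no longer exists) are not detected here.

```python
import re
import subprocess
import time
from typing import Callable, List

# Message format taken from the logs in this thread; treat it as an assumption.
TIMED_OUT = re.compile(r"timed out waiting for the condition on (\S+)")


def kubectl_wait_once(resources: List[str], condition: str = "condition=Ready") -> str:
    """Run a single check (--timeout=0 means check once) and return combined output."""
    result = subprocess.run(
        ["kubectl", "wait", f"--for={condition}", "--timeout=0", *resources],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.stdout


def parse_unready(output: str) -> List[str]:
    """Extract resource names (e.g. 'pods/foo') that did not meet the condition."""
    return TIMED_OUT.findall(output)


def wait_all(
    resources: List[str],
    command_timeout_s: float,
    run_wait: Callable[[List[str]], str] = kubectl_wait_once,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Re-check the unready resources until all are ready or the overall timeout expires."""
    deadline = clock() + command_timeout_s
    unready = parse_unready(run_wait(resources))        # steps 1 and 2
    while unready and clock() < deadline:               # step 3
        sleep(poll_interval_s)
        unready = parse_unready(run_wait(unready))      # steps 3.a and 3.b
    if unready:                                         # step 4
        raise TimeoutError(f"still not ready: {', '.join(unready)}")
```

A call might look like `wait_all(["pods", "-l", "app=example"], command_timeout_s=60)`, where the label selector is a placeholder. The `run_wait`, `clock`, and `sleep` parameters exist only so the loop can be exercised without a cluster.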

IMHO, this logic is a better fit for the use cases Kubernetes users encounter in their day-to-day work, but again, that's only my opinion. I'm not a Go developer, so I can't submit a PR with this logic, but I would definitely like to hear more opinions about it.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

/kind bug

/area kubectl
/sig cli

/priority backlog

/assign

This is easy to see with a small timeout if you have more than one resource.

For example, with a 2s timeout it takes approximately (number of pods × 2s) to time out, not 2s.

Also note that there is no output until all of them have timed out, which adds to the confusion.

$ kubectl get pods -l release=myrelease -o name | wc -l
      17

$ time bash -c "kubectl wait --for=delete pods -l release=myrelease --timeout=2s 2>&1 | ts '[%Y-%m-%d %H:%M:%S]'"
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-c66pr
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-8tgq6
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-gx5mn
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-6xb9b
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-s9cbv
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-qtwx7
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-qk842
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-l6wq2
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-zv25k
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-diss0
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-mh8h2
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-cb2zn
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-q879p
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-skggg
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-4mklb
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-pn94r
[2020-09-28 20:40:15] timed out waiting for the condition on pods/myrelease-r2226
kubectl wait --for=delete pods -l release=myrelease --timeout=2s  0.23s user 0.09s system 0% cpu 40.267 total

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
