Kubebuilder: watch terminating pods

Created on 25 Mar 2019  ·  25 Comments  ·  Source: kubernetes-sigs/kubebuilder

I want to observe the pods the controller creates so I can update the CRD's status.

For example, it should show how many pods are running, and so on, when I run kubectl describe my-crd.

To do that, I added a watch on v1.Pod. It seems to work while pods are being created, but it stops working once a pod enters the terminating state 🤔

I tried both this:

if err := c.Watch(&source.Kind{Type: &v1.Pod{}}, &handler.EnqueueRequestForObject{}); err != nil {
    return err
}

and this:

if err := c.Watch(&source.Kind{Type: &v1.Pod{}}, &handler.EnqueueRequestForOwner{
    IsController: true,
    OwnerType:    &appsv1beta1.App{},
}); err != nil {
    return err
}

In both cases, once the pod shows as "Terminating" in kubectl get pods, it starts being ignored.

Since the pod phase is actually still Running while the pod is terminating, I end up with a wrong status...

Am I doing something wrong?

All 25 comments

Terminating is just a soft delete. A pod enters Terminating when its deletionTimestamp is set, and it will eventually be deleted. You should be able to tell whether the pod is terminating by checking the deletion timestamp. Once the pod is finally deleted, it won't show up in the cache any more, but you'll still get a reconcile request.

See https://github.com/kubernetes/kubernetes/blob/b7394102d6ef778017f2ca4046abbaa23b88c290/pkg/kubectl/describe/versioned/describe.go#L676-L681 for how kubectl does it.

Alternatively, here's an example in controller form:


main.go

/*
Copyright 2018 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package main

import (
    "context"
    "os"

    corev1 "k8s.io/api/core/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/log/zap"
)

var (
    initLog = ctrl.Log.WithName("setup")
    ctrlLog = ctrl.Log.WithName("controller")
)

func ignoreNotFound(err error) error {
    if apierrors.IsNotFound(err) {
        return nil
    }
    return err
}

type podLogger struct {
    client.Client
}

func (r *podLogger) Reconcile(req ctrl.Request) (ctrl.Result, error) {
    ctx := context.Background()
    log := ctrlLog.WithValues("pod", req.NamespacedName)

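    var pod corev1.Pod
    if err := r.Get(ctx, req.NamespacedName, &pod); err != nil {
        // A NotFound error here means the pod has been fully deleted;
        // it is logged but not returned, so the request is not retried.
        log.Error(err, "unable to fetch pod")
        return ctrl.Result{}, ignoreNotFound(err)
    }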

    if pod.DeletionTimestamp != nil {
        log.Info("observed pod", "phase", "Terminating")
    } else {
        log.Info("observed pod", "phase", pod.Status.Phase)
    }
    return ctrl.Result{}, nil
}

func main() {
    ctrl.SetLogger(zap.Logger(true))

    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
    if err != nil {
        initLog.Error(err, "unable to initialize manager")
        os.Exit(1)
    }

    err = ctrl.NewControllerManagedBy(mgr).
        For(&corev1.Pod{}).
        Complete(&podLogger{
            Client: mgr.GetClient(),
        })
    if err != nil {
        initLog.Error(err, "unable to add controller")
        os.Exit(1)
    }

    initLog.Info("running manager")
    stopCh := ctrl.SetupSignalHandler()
    if err := mgr.Start(stopCh); err != nil {
        initLog.Error(err, "unable to continue running manager")
        os.Exit(1)
    }
}

(closing the issue for now, feel free to comment if this doesn't answer your question)

hmm, got it. Thanks! :)

It also happens with other states, though.

E.g. the pod is in ImagePullBackOff but my status still says ContainerCreating... any ideas?

Maybe because it's actually a change to the pod's status and not to the pod itself? 🤔

Or maybe because it's a pod created from a Job object?

I did a really ugly hack for this particular use case, and it more or less works:

if instance.Status.Status == "ContainerCreating" {
    time.Sleep(time.Second)
    return r.Reconcile(request)
}
return reconcile.Result{}, err

I put that at the end of my Reconcile func. If you have better ideas, I'm all ears hehe

You shouldn't sleep in a reconcile loop -- you'll block things. If you really need to do something like that, set RequeueAfter on the result, but generally you'll probably just want to return immediately and wait for the request to be queued again on the next update.

> wait for another queue on the update.

That's the thing, no new updates arrive in the queue based on the pod events...

BTW I didn't know about RequeueAfter, will check that out! Thanks :)

> that's the thing, no new updates arrive in the queue based on the pod events...

aaah, sorry, doing triage, read too quickly ;-)

Will try to repro tomorrow. Can you post a minimal reproducer -- it would make things much easier

yeah let me do that real quick

https://github.com/caarlos0/test-operator

there.

Just install and run it, then apply the example; it should be reproducible. I can reproduce this on GKE, on Docker for Mac's Kubernetes, and also on k3s.

❯ kubebuilder version
Version: version.Version{KubeBuilderVersion:"1.0.8", KubernetesVendor:"1.13.1", GitCommit:"1adf50ed107f5042d7472ba5ab50d5e1d357169d", BuildDate:"2019-01-25T23:14:29Z", GoOs:"unknown", GoArch:"unknown"}

Thanks

Ah, so for a start (sorry I missed this earlier): if you want to reconcile on an indirect dependency (e.g. foo --> job --> pod), you can't use the owner-based enqueue handler (EnqueueRequestForOwner), because there's no direct ownership path -- the job owns the pod. I've been meaning to add a helper for this case (since it's fairly common), but for now you'd have to trace the pod back to the job, and then trace the job back to the foo. You can add an index to make that easier.

not sure I get how to do the trace thing... you mean when reconciling Foo, when I create the Job, search for pods created by the Job and call SetControllerReference on it?

If not, do you have an example of what you mean?

BTW a helper function would be awesome yeah 🚀

> not sure I get how to do the trace thing... you mean when reconciling Foo, when I create the Job, search for pods created by the Job and call SetControllerReference on it?

That can't be it, as it errors with: Object default/image-dont-exist-pnlqz is already owned by another Job controller image-dont-exist

from docs:

// Watch for Pod events, and enqueue a reconcile.Request for the ReplicaSet in the OwnerReferences
err := c.Watch(
    &source.Kind{Type: &corev1.Pod{}},
    &handler.EnqueueRequestForOwner{
        IsController: true,
        OwnerType:    &appsv1.ReplicaSet{}})
if err != nil {
    return err
}

this leads to the understanding that it should work, but in this case the replicaset will be the owner of the pods too right? If so, it also won't work, right? 🤔

yeah, that says "try to enqueue the replicaset owning the pod", but that only works if the replicaset is the thing that you're reconciling. The output of the handler always needs to be the thing that the controller is reconciling.

Basically, in your watch, you need to do EnqueueRequestFromReconcileFunc, and use mgr.GetClient and the GetControllerRef function to first fetch the job that owns the pod, and then fetch the Foo that owns the job, so the output is a Foo. Does that make sense?

> EnqueueRequestFromReconcileFunc

I haven't found any reference to this in kubebuilder's code...
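The handler meant above is most likely controller-runtime's handler.EnqueueRequestsFromMapFunc, and the owner lookup is metav1.GetControllerOf; the exact way the map function is passed differs between controller-runtime releases, so check the version you vendor. The sketch below only models the tracing logic in plain Go: OwnerRef, Object, Request, controllerOf, mapPodToFoo and getJob are all hypothetical stand-ins, with getJob playing the part of a client lookup.

```go
package main

import "fmt"

// OwnerRef, Object and Request are hypothetical stand-ins for
// metav1.OwnerReference, a namespaced API object and reconcile.Request.
type OwnerRef struct {
	Kind       string
	Name       string
	Controller bool
}

type Object struct {
	Namespace string
	Name      string
	Owners    []OwnerRef
}

type Request struct {
	Namespace string
	Name      string
}

// controllerOf returns the owner reference marked as controller, if any.
func controllerOf(o Object) (OwnerRef, bool) {
	for _, ref := range o.Owners {
		if ref.Controller {
			return ref, true
		}
	}
	return OwnerRef{}, false
}

// mapPodToFoo walks pod -> Job -> Foo and returns a request for the Foo,
// so the handler output names the type the controller reconciles.
func mapPodToFoo(pod Object, getJob func(namespace, name string) (Object, bool)) []Request {
	jobRef, ok := controllerOf(pod)
	if !ok || jobRef.Kind != "Job" {
		return nil
	}
	job, found := getJob(pod.Namespace, jobRef.Name)
	if !found {
		return nil
	}
	fooRef, ok := controllerOf(job)
	if !ok || fooRef.Kind != "Foo" {
		return nil
	}
	return []Request{{Namespace: pod.Namespace, Name: fooRef.Name}}
}

func main() {
	jobs := map[string]Object{
		"default/image-dont-exist": {
			Namespace: "default",
			Name:      "image-dont-exist",
			Owners:    []OwnerRef{{Kind: "Foo", Name: "image-dont-exist", Controller: true}},
		},
	}
	getJob := func(namespace, name string) (Object, bool) {
		job, ok := jobs[namespace+"/"+name]
		return job, ok
	}
	pod := Object{
		Namespace: "default",
		Name:      "image-dont-exist-r2vl4",
		Owners:    []OwnerRef{{Kind: "Job", Name: "image-dont-exist", Controller: true}},
	}
	fmt.Println(mapPodToFoo(pod, getJob))
}
```

Pods that are not controlled by a Job, or whose Job is not controlled by a Foo, map to no request at all, which keeps unrelated pods from triggering reconciles.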

@DirectXMan12 ah, ok, so, here's what I tried: https://github.com/caarlos0/test-operator/pull/5

still seems like events won't come for status changes 🤔

The logs I got:

{"level":"info","ts":1559654949.062195,"logger":"controller","msg":"Creating Job","namespace":"default","name":"should-be-fine"}
{"level":"info","ts":1559654949.070393,"logger":"controller","msg":"Will try to update Foo status based on job and pod","namespace":"default","name":"should-be-fine"}
{"level":"info","ts":1559654949.070427,"logger":"controller","msg":"got foo","name":"should-be-fine"}
{"level":"info","ts":1559654949.070472,"logger":"controller","msg":"Will try to update Foo status based on job and pod","namespace":"default","name":"should-be-fine"}
{"level":"info","ts":1559654949.0725062,"logger":"controller","msg":"Creating Job","namespace":"default","name":"image-dont-exist"}
{"level":"info","ts":1559654949.07882,"logger":"controller","msg":"Will try to update Foo status based on job and pod","namespace":"default","name":"image-dont-exist"}
{"level":"info","ts":1559654949.0788832,"logger":"controller","msg":"got foo","name":"image-dont-exist"}
{"level":"info","ts":1559654949.078937,"logger":"controller","msg":"Will try to update Foo status based on job and pod","namespace":"default","name":"image-dont-exist"}
{"level":"info","ts":1559654949.1032488,"logger":"controller","msg":"got foo","name":"should-be-fine"}
{"level":"info","ts":1559654949.103328,"logger":"controller","msg":"watching pod","name":"image-dont-exist-r2vl4"}
{"level":"info","ts":1559654949.103331,"logger":"controller","msg":"Will try to update Foo status based on job and pod","namespace":"default","name":"should-be-fine"}
{"level":"info","ts":1559654949.1034071,"logger":"controller","msg":"Updating Job Status","namespace":"default","name":"should-be-fine","status":{"status":"Pending"}}
{"level":"info","ts":1559654949.1033401,"logger":"controller","msg":"watching pod","name":"should-be-fine-9zwv6"}
{"level":"info","ts":1559654949.103486,"logger":"controller","msg":"got foo","name":"should-be-fine"}
{"level":"info","ts":1559654949.103532,"logger":"controller","msg":"watching pod","name":"image-dont-exist-r2vl4"}
{"level":"info","ts":1559654949.103542,"logger":"controller","msg":"watching pod","name":"should-be-fine-9zwv6"}
{"level":"info","ts":1559654949.1122088,"logger":"controller","msg":"got foo","name":"image-dont-exist"}
{"level":"info","ts":1559654949.112279,"logger":"controller","msg":"Will try to update Foo status based on job and pod","namespace":"default","name":"should-be-fine"}
{"level":"info","ts":1559654949.1122859,"logger":"controller","msg":"watching pod","name":"image-dont-exist-r2vl4"}
{"level":"info","ts":1559654949.112318,"logger":"controller","msg":"watching pod","name":"should-be-fine-9zwv6"}
{"level":"info","ts":1559654949.112333,"logger":"controller","msg":"got foo","name":"image-dont-exist"}
{"level":"info","ts":1559654949.112351,"logger":"controller","msg":"Will try to update Foo status based on job and pod","namespace":"default","name":"image-dont-exist"}
{"level":"info","ts":1559654949.1123588,"logger":"controller","msg":"watching pod","name":"image-dont-exist-r2vl4"}
{"level":"info","ts":1559654949.1124082,"logger":"controller","msg":"Updating Job Status","namespace":"default","name":"image-dont-exist","status":{"status":"Pending"}}
{"level":"info","ts":1559654949.11242,"logger":"controller","msg":"watching pod","name":"should-be-fine-9zwv6"}
{"level":"info","ts":1559654949.117622,"logger":"controller","msg":"Will try to update Foo status based on job and pod","namespace":"default","name":"should-be-fine"}
{"level":"info","ts":1559654949.118119,"logger":"controller","msg":"Will try to update Foo status based on job and pod","namespace":"default","name":"image-dont-exist"}
{"level":"info","ts":1559654952.483846,"logger":"controller","msg":"got foo","name":"should-be-fine"}
{"level":"info","ts":1559654952.4839308,"logger":"controller","msg":"watching pod","name":"image-dont-exist-r2vl4"}
{"level":"info","ts":1559654952.483949,"logger":"controller","msg":"watching pod","name":"should-be-fine-9zwv6"}
{"level":"info","ts":1559654952.4839292,"logger":"controller","msg":"Will try to update Foo status based on job and pod","namespace":"default","name":"should-be-fine"}
{"level":"info","ts":1559654952.483975,"logger":"controller","msg":"got foo","name":"should-be-fine"}
{"level":"info","ts":1559654952.4840121,"logger":"controller","msg":"watching pod","name":"image-dont-exist-r2vl4"}
{"level":"info","ts":1559654952.48402,"logger":"controller","msg":"watching pod","name":"should-be-fine-9zwv6"}

I'm probably doing something wrong, just can't figure out what...

thanks for the help!

added a comment to your PR in the link. I think you have things slightly off. Remember, the result of the map func should be the name of the thing that you're reconciling (so ships in your case).

awesome, it works now!

Another question: is there a way to get all pods from a given job? Right now I use labels to get the pods and use them to set the status of my object, but that's not very precise...

you can add an index on the owner reference. That's the best way right now. We need to add a helper to make that easier, but for now there's an example of that here: https://book.kubebuilder.io/cronjob-tutorial/controller-implementation.html#setup, and then use it like this https://book.kubebuilder.io/cronjob-tutorial/controller-implementation.html#2-list-all-active-jobs-and-update-the-status.
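To show the idea behind an owner index without pulling in controller-runtime, here is a small sketch in plain Go. Pod, ownerIndex and buildOwnerIndex are hypothetical names for this illustration; in a real controller the index would be registered through the manager's field indexer and queried when listing, as in the tutorial pages linked above. Namespaces are left out for brevity.

```go
package main

import "fmt"

// Pod is a hypothetical stand-in carrying only what the index needs.
type Pod struct {
	Name     string
	OwnerJob string // name of the controlling Job, "" if there is none
}

// ownerIndex maps a Job name to the names of the pods it controls.
type ownerIndex map[string][]string

// buildOwnerIndex plays the role of the index function: for every pod it
// extracts the controlling Job's name and files the pod under that key.
func buildOwnerIndex(pods []Pod) ownerIndex {
	idx := make(ownerIndex)
	for _, p := range pods {
		if p.OwnerJob == "" {
			continue // pods without a controlling Job are not indexed
		}
		idx[p.OwnerJob] = append(idx[p.OwnerJob], p.Name)
	}
	return idx
}

func main() {
	pods := []Pod{
		{Name: "should-be-fine-9zwv6", OwnerJob: "should-be-fine"},
		{Name: "image-dont-exist-r2vl4", OwnerJob: "image-dont-exist"},
		{Name: "unrelated"},
	}
	idx := buildOwnerIndex(pods)
	// Look up exactly the pods controlled by one Job, without relying on labels.
	fmt.Println(idx["image-dont-exist"])
}
```

Compared with a label selector, the lookup key comes from the owner reference itself, so pods that merely share labels with the Job's pods are not picked up.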

Closing for now, happy to re-open if you have further questions.

Awesome, thank you so much for all the help and your work! 💯

@DirectXMan12 Hey!

How do I do this in v2?

I'm looking into migrating, but it seems that things have changed quite a bit, and the Watch method does not exist anymore. There is a Watches method, though. Are there any examples or docs for it? (I couldn't find any.)

Thanks!

I dug up this old issue so that the context is hopefully easier to follow; if not, let me know and I'll open a new one.
