/kind bug
What steps did you take and what happened:
What did you expect to happen:
Anything else you would like to add:
I hit this return statement: https://github.com/kubernetes-sigs/cluster-api/blob/b37cf11c05acebee2ffd0c8a845c2bc3e1c1f7a0/pkg/controller/noderef/noderef_controller.go#L169
which returns a nil error. But it crashed at https://github.com/kubernetes-sigs/cluster-api/blob/b37cf11c05acebee2ffd0c8a845c2bc3e1c1f7a0/pkg/controller/noderef/noderef_controller.go#L143
because it didn't check to see if the reconcile needed to be requeued, and machine.status.nodeRef was nil.
Environment:
Once fixed, we'll need to backport to release-0.1.
Fixing the panic seems to be as easy as not swallowing the error when returning: https://github.com/kubernetes-sigs/cluster-api/blob/b37cf11c05acebee2ffd0c8a845c2bc3e1c1f7a0/pkg/controller/noderef/noderef_controller.go#L169
Now, I'm not sure this is enough because the code (or its comments) is not clear whether Reconcile is expected to be called again. ~Assuming it should be, we also need to set Requeue to true as it defaults to false.~ edited sigs.k8s.io/controller-runtime/pkg/reconcile.Result defines that setting RequeueAfter implicitly sets Requeue to true.
The end result would be:
return reconcile.Result{RequeueAfter: 10 * time.Second}, err
If this makes sense to you, I think the next step would be to figure out tests that prove the fix.
@pires That seems like a good approach, we can return both
I'll be opening a PR to address this today.
I just hit this and have a patch. I do not se lifecycle active so I will take it.
/assign
/lifecycle active
@chuckha check with @pires ?
ah ok, actually this is a good chance for someone else to get a contribution
/remove-lifecycle active
/unassign
Sorry for the noise @pires, you should for sure work on this