Cluster-api: Panic in NodeRef controller

Created on 2 Jul 2019  路  7Comments  路  Source: kubernetes-sigs/cluster-api

/kind bug

What steps did you take and what happened:

  • Create a cluster using CAPI + CAPA in AWS
  • Add a machine
  • NodeRef controller panicked

What did you expect to happen:

  • No panic

Anything else you would like to add:
I hit this return statement: https://github.com/kubernetes-sigs/cluster-api/blob/b37cf11c05acebee2ffd0c8a845c2bc3e1c1f7a0/pkg/controller/noderef/noderef_controller.go#L169

which returns a nil error. But it crashed at https://github.com/kubernetes-sigs/cluster-api/blob/b37cf11c05acebee2ffd0c8a845c2bc3e1c1f7a0/pkg/controller/noderef/noderef_controller.go#L143

because it didn't check to see if the reconcile needed to be requeued, and machine.status.nodeRef was nil.

Environment:

  • Cluster-api version: master + v0.1.4
good first issue kinbug prioritimportant-soon

All 7 comments

Once fixed, we'll need to backport to release-0.1.

Fixing the panic seems to be as easy as not swallowing the error when returning: https://github.com/kubernetes-sigs/cluster-api/blob/b37cf11c05acebee2ffd0c8a845c2bc3e1c1f7a0/pkg/controller/noderef/noderef_controller.go#L169

Now, I'm not sure this is enough because the code (or its comments) is not clear whether Reconcile is expected to be called again. ~Assuming it should be, we also need to set Requeue to true as it defaults to false.~ edited sigs.k8s.io/controller-runtime/pkg/reconcile.Result defines that setting RequeueAfter implicitly sets Requeue to true.

The end result would be:

return reconcile.Result{RequeueAfter: 10 * time.Second}, err

If this makes sense to you, I think the next step would be to figure out tests that prove the fix.

@pires That seems like a good approach, we can return both

I'll be opening a PR to address this today.

I just hit this and have a patch. I do not se lifecycle active so I will take it.

/assign
/lifecycle active

@chuckha check with @pires ?

ah ok, actually this is a good chance for someone else to get a contribution

/remove-lifecycle active
/unassign

Sorry for the noise @pires, you should for sure work on this

Was this page helpful?
0 / 5 - 0 ratings

Related issues

chuckha picture chuckha  路  4Comments

dlipovetsky picture dlipovetsky  路  5Comments

chaosaffe picture chaosaffe  路  6Comments

invidian picture invidian  路  5Comments

vincepri picture vincepri  路  4Comments