Cluster-api: CAPI controller logging endless "Starting EventSource" messages

Created on 11 Jun 2020 · 13 comments · Source: kubernetes-sigs/cluster-api

What steps did you take and what happened:

  • Create a MachineHealthCheck for my cluster
  • See that you get an endless stream of these log lines
[capi-controller-manager-789f85c5fb-2trbn manager] I0611 20:37:07.646250     182 controller.go:138] controller-runtime/controller "msg"="Starting EventSource"  "controller"="machinehealthcheck" "source"={}

What did you expect to happen:
Not to see that log line repeated endlessly.

Anything else you would like to add:
I did a git bisect already and it was introduced by https://github.com/kubernetes-sigs/cluster-api/commit/f31f0cea32d03b158b381ffb330c5b61b87cd82d / #3129.

I'm not sure if there are any bugs here other than the overwhelming amount of logs.

Environment:

  • Cluster-api version: master
  • Minikube/KIND version: 0.8.1
  • Kubernetes version: (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.3", GitCommit:"2e7996e3e2712684bc73f0dec0200d64eec7fe40", GitTreeState:"clean", BuildDate:"2020-05-21T20:58:41Z", GoVersion:"go1.13.11", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-30T20:19:45Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • OS (e.g. from /etc/os-release): Ubuntu

/kind bug
/area health

Labels: area/health, kind/bug, lifecycle/active

All 13 comments

@benmoss: The label(s) area/ cannot be applied, because the repository doesn't have them

In response to this:

(quote of the issue body above omitted)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

cc @JoelSpeed

I'll take a look at this next week

@benmoss I've just tried to reproduce this (though accidentally on a slightly older kind). Having built the images from master and deployed to AWS, I've only seen the "Starting EventSource" message 4 times for the MachineHealthCheck controller, all 4 being valid cases. I'm gonna tear down my cluster and update kind to reproduce with identical versions, but I suspect that shouldn't make a difference.

Is there anything more you can tell me about the cluster that you built that might help narrow down the problem?

/priority awaiting-more-evidence
/milestone Next

Ya, let me try to reproduce this again and give some more details

Steps:

  • kind up (kind 0.8.1)
  • clusterctl init --infrastructure aws to install aws provider
  • tilt up to build from master (71a6fd702bf0b3ebce0920539f39ec4d21f96588 just now)
  • clusterctl config cluster ben --kubernetes-version v1.17.3 --control-plane-machine-count=1 | k apply -f- (clusterctl 0.3.6)
  • wait for cluster to come up / stabilize and then k apply -f mhc.yml
  • kubetail -n capi-system (or similar) and watch the stream of logs; it stops when you delete the MHC.

https://gist.github.com/benmoss/1210db23c361cd86b63842ad7e56bedb has the result of clusterctl config cluster and the mhc.yml

I think I found the problem: I believe watchExists isn't working because the event handler and kind are being created as _separate_ pointers on each cycle through the controller.

I've updated the test to reflect this and it now fails, which I believe is what is causing this hot loop https://github.com/JoelSpeed/cluster-api/commit/26bbc140c18880e590797a02568be41dcf4a080f

We need to change the matching so that equality can account for this somehow.

@JoelSpeed are you able to put a PR up in this milestone? Otherwise, I can try to take care of this today given that this might be a milestone blocker

I'm working on a fix. I initially tried reflect.DeepEqual() over the watchInfo; that passed the test but failed on the cluster. For some reason the eventHandler that's being passed through doesn't match on reflect.DeepEqual, so I need to work out why and come up with a way to compare those. Will likely have a PR up tomorrow.

Thanks @JoelSpeed, reach out if you wanna chat over the solution on slack

/assign
/lifecycle active

/milestone v0.3.7
