What steps did you take and what happened:
- Create a MachineHealthCheck for my cluster
- See that you get an endless stream of these log lines:

[capi-controller-manager-789f85c5fb-2trbn manager] I0611 20:37:07.646250 182 controller.go:138] controller-runtime/controller "msg"="Starting EventSource" "controller"="machinehealthcheck" "source"={}

What did you expect to happen:
Not see that.

Anything else you would like to add:
I did a git bisect already and it was introduced by https://github.com/kubernetes-sigs/cluster-api/commit/f31f0cea32d03b158b381ffb330c5b61b87cd82d / #3129.
I'm not sure if there are any bugs here other than the overwhelming amount of logs.

Environment:
- Cluster-api version: master
- Minikube/KIND version: 0.8.1
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.3", GitCommit:"2e7996e3e2712684bc73f0dec0200d64eec7fe40", GitTreeState:"clean", BuildDate:"2020-05-21T20:58:41Z", GoVersion:"go1.13.11", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-30T20:19:45Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
- OS (e.g. from /etc/os-release): Ubuntu
/kind bug
/area health
@benmoss: The label(s) area/ cannot be applied, because the repository doesn't have them
cc @JoelSpeed
I'll take a look at this next week
@benmoss I've just tried to reproduce this (though accidentally on a slightly older kind), having built the images from master and deployed to AWS. I've only seen the Starting EventSource message 4 times for the machine health check controller, all 4 as valid cases. I'm gonna tear down my cluster and update kind to repro with identical versions, but I suspect this shouldn't make a difference.
Is there anything more you can tell me about the cluster that you built that might help narrow down the problem?
/priority awaiting-more-evidence
/milestone Next
Ya, let me try to reproduce this again and give some more details
Steps:
- kind up (kind 0.8.1)
- clusterctl init --infrastructure aws to install the aws provider
- tilt up to build from master (71a6fd702bf0b3ebce0920539f39ec4d21f96588 just now)
- clusterctl config cluster ben --kubernetes-version v1.17.3 --control-plane-machine-count=1 | k apply -f- (clusterctl 0.3.6)
- k apply -f mhc.yml
- kubetail -n capi-system or whatever, see the stream of logs; it stops when you delete the mhc.

https://gist.github.com/benmoss/1210db23c361cd86b63842ad7e56bedb has the result of clusterctl config cluster and the mhc.yml
I think I found the problem: I believe the watchExists check isn't working because the event handler and kind are being created as _separate_ pointers on each cycle through the controller.
I've updated the test to reflect this and it now fails, which I believe is what is causing this hot loop: https://github.com/JoelSpeed/cluster-api/commit/26bbc140c18880e590797a02568be41dcf4a080f
We need to change the matching so that equality can account for this somehow.
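To illustrate the pointer problem, here's a minimal standalone sketch (made-up types, not cluster-api's actual ones): two allocations of the same handler type are never equal by pointer identity, so an identity-based watchExists check concludes no watch exists and re-registers it on every reconcile.

```go
package main

import "fmt"

// eventHandler and watchInfo are hypothetical stand-ins for the real
// controller-runtime/cluster-api types, just to show the comparison.
type eventHandler struct{ name string }

type watchInfo struct {
	handler *eventHandler
}

func main() {
	// First reconcile: the watch is registered and remembered.
	existing := watchInfo{handler: &eventHandler{name: "mhc"}}

	// Next reconcile: the handler is allocated afresh, so it is a
	// different pointer even though it is logically the same handler.
	incoming := watchInfo{handler: &eventHandler{name: "mhc"}}

	// Identity comparison sees two distinct allocations, so the dedup
	// check fails and the controller adds the source again, logging
	// "Starting EventSource" every cycle.
	fmt.Println(existing.handler == incoming.handler) // false
}
```

One possible direction (speculation on my part, not the final fix) would be matching on the handler's concrete type via reflect.TypeOf rather than on its identity, though that may be too coarse if the same handler type backs multiple watches.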
@JoelSpeed are you able to put a PR up in this milestone? Otherwise, I can try to take care of this today given that this might be a milestone blocker
I'm working on a fix. I initially tried reflect.DeepEqual() over the watchInfo, which passed the test but failed on the cluster. For some reason the eventHandler that's being passed through doesn't match on reflect.DeepEqual, so I need to work out why and come up with a way to compare those. Will likely have a PR up tomorrow.
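For what it's worth, one plausible reason (my assumption, not confirmed from the code) is that reflect.DeepEqual treats any two non-nil func values as unequal, so a handler wrapping a map function can never DeepEqual even a copy of itself. A standalone sketch with made-up types:

```go
package main

import (
	"fmt"
	"reflect"
)

// handlerWithFunc is a hypothetical stand-in for an event handler that
// wraps a mapping function (in the spirit of controller-runtime's
// EnqueueRequestsFromMapFunc).
type handlerWithFunc struct {
	mapFn func(obj string) []string
}

func main() {
	mapper := func(obj string) []string { return []string{obj} }

	a := handlerWithFunc{mapFn: mapper}
	b := handlerWithFunc{mapFn: mapper}

	// reflect.DeepEqual only considers func values equal when both are
	// nil; here both are non-nil (and even refer to the same function),
	// so this prints false and a DeepEqual-based dedup check still fails.
	fmt.Println(reflect.DeepEqual(a, b)) // false
}
```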
Thanks @JoelSpeed, reach out if you wanna chat over the solution on slack
/assign
/lifecycle active
/milestone v0.3.7