Cluster-api: CAPI controller logging endless "Starting EventSource" messages

Created on 11 Jun 2020 · 13 comments · Source: kubernetes-sigs/cluster-api

What steps did you take and what happened:

  • Create a MachineHealthCheck for my cluster
  • See that you get an endless stream of these log lines
[capi-controller-manager-789f85c5fb-2trbn manager] I0611 20:37:07.646250     182 controller.go:138] controller-runtime/controller "msg"="Starting EventSource"  "controller"="machinehealthcheck" "source"={}

What did you expect to happen:
Not to see that log line repeated endlessly.

Anything else you would like to add:
I did a git bisect already and it was introduced by https://github.com/kubernetes-sigs/cluster-api/commit/f31f0cea32d03b158b381ffb330c5b61b87cd82d / #3129.

I'm not sure if there are any bugs here other than the overwhelming amount of logs.

Environment:

  • Cluster-api version: master
  • Minikube/KIND version: 0.8.1
  • Kubernetes version: (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.3", GitCommit:"2e7996e3e2712684bc73f0dec0200d64eec7fe40", GitTreeState:"clean", BuildDate:"2020-05-21T20:58:41Z", GoVersion:"go1.13.11", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-30T20:19:45Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • OS (e.g. from /etc/os-release): Ubuntu

/kind bug
/area health

Labels: area/health, kind/bug, lifecycle/active

All 13 comments

@benmoss: The label(s) area/ cannot be applied, because the repository doesn't have them

In response to this:

(quote of the issue body above omitted)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

cc @JoelSpeed

I'll take a look at this next week

@benmoss I've just tried to reproduce this (though accidentally on a slightly older kind). Having built the images from master and deployed to AWS, I've only seen the "Starting EventSource" message 4 times for the MachineHealthCheck controller, all 4 being valid cases. I'm gonna tear down my cluster and update kind to reproduce with identical versions, but I suspect that shouldn't make a difference.

Is there anything more you can tell me about the cluster that you built that might help narrow down the problem?

/priority awaiting-more-evidence
/milestone Next

Ya, let me try to reproduce this again and give some more details

Steps:

  • kind up (kind 0.8.1)
  • clusterctl init --infrastructure aws to install aws provider
  • tilt up to build from master (71a6fd702bf0b3ebce0920539f39ec4d21f96588 just now)
  • clusterctl config cluster ben --kubernetes-version v1.17.3 --control-plane-machine-count=1 | k apply -f- (clusterctl 0.3.6)
  • wait for cluster to come up / stabilize and then k apply -f mhc.yml
  • kubetail -n capi-system (or similar) and watch the stream of logs; it stops when you delete the MHC.

https://gist.github.com/benmoss/1210db23c361cd86b63842ad7e56bedb has the result of clusterctl config cluster and the mhc.yml

I think I found the problem: I believe watchExists isn't working because the event handler and kind are being created as _separate_ pointers on each cycle through the controller.

I've updated the test to reflect this and it now fails, which I believe is what is causing this hot loop https://github.com/JoelSpeed/cluster-api/commit/26bbc140c18880e590797a02568be41dcf4a080f

We need to change the matching so that equality can account for this somehow.

@JoelSpeed are you able to put a PR up in this milestone? Otherwise, I can try to take care of this today given that this might be a milestone blocker

I'm working on a fix. I initially tried reflect.DeepEqual() over the watchInfo; that passed the test but failed on the cluster. For some reason the eventHandler that's being passed through doesn't match on reflect.DeepEqual, so I need to work out why and come up with a way to compare those. Will likely have a PR up tomorrow.

Thanks @JoelSpeed, reach out if you wanna chat over the solution on slack

/assign
/lifecycle active

/milestone v0.3.7
