Cluster-api: Flakes in the ClusterResourceSet unit tests

Created on 14 Jan 2021 · 9 Comments · Source: kubernetes-sigs/cluster-api

What steps did you take and what happened:
While investigating some unit test failures, I saw this error happen occasionally:


------------------------------
• Failure [10.167 seconds]
ClusterResourceSet Reconciler
/home/prow/go/src/sigs.k8s.io/cluster-api/exp/addons/controllers/clusterresourceset_controller_test.go:41
  Should reconcile a ClusterResourceSet when a resource is created that is part of ClusterResourceSet resources [It]
  /home/prow/go/src/sigs.k8s.io/cluster-api/exp/addons/controllers/clusterresourceset_controller_test.go:240
  Timed out after 10.001s.

  Expected
      <bool>: false
  to be true
  /home/prow/go/src/sigs.k8s.io/cluster-api/exp/addons/controllers/clusterresourceset_controller_test.go:308
------------------------------
....
Summarizing 1 Failure:
[Fail] ClusterResourceSet Reconciler [It] Should reconcile a ClusterResourceSet when a resource is created that is part of ClusterResourceSet resources 
/home/prow/go/src/sigs.k8s.io/cluster-api/exp/addons/controllers/clusterresourceset_controller_test.go:308
Ran 5 of 5 Specs in 29.157 seconds
FAIL! -- 4 Passed | 1 Failed | 0 Pending | 0 Skipped
--- FAIL: TestAPIs (29.16s)
FAIL
FAIL    sigs.k8s.io/cluster-api/exp/addons/controllers  29.280s

Environment:

  • Cluster-api version: Main

/kind bug
/area testing

area/testing kind/bug lifecycle/active

All 9 comments

/milestone v0.4.0

After some investigation, I found that this error happens rarely, apparently when the test env gets stuck for a few seconds.

I think it can be easily fixed by increasing the timeout to 20 seconds, so we account for a temporary glitch of the testenv.

https://github.com/kubernetes-sigs/cluster-api/blob/e73e3d9021e5df61ea2861be6f73698a031a63c3/exp/addons/controllers/clusterresourceset_controller_test.go#L37

/help
/good-first-issue

@fabriziopandini:
This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.


@fabriziopandini I can send a quick patch for it.

/assign

FYI - I am still seeing this failure after the bump to 20 seconds in https://github.com/kubernetes-sigs/cluster-api/pull/4069. The test passes when I run ./scripts/ci-test.sh, but I was able to reproduce it occasionally when running via my IDE. Looking into it a bit more now.

/reopen
According to @jsturtevant, this requires further investigation 😞

@fabriziopandini: Reopened this issue.


I am now able to get a fairly consistent reproduction and found that increasing the timeout didn't help. Still digging into it.

  Should reconcile a ClusterResourceSet when a resource is created that is part of ClusterResourceSet resources [It]
  /home/jstur/projects/cluster-api/exp/addons/controllers/clusterresourceset_controller_test.go:240

  Timed out after 400.000s.

/assign
/lifecycle active
