Test-infra: end DNS flakes

Created on 25 Sep 2018 · 14 Comments · Source: kubernetes/test-infra

Tests flake when connecting to external services. We should minimize how much we depend on these, but we should also deflake DNS. The first step was using FQDNs for in-cluster services: https://github.com/kubernetes/test-infra/pull/9547

TODO:

/area prow
/area jobs
/kind bug
/priority important-soon

/assign
/assign @MrHohn

area/deflake area/jobs area/prow kind/bug lifecycle/rotten priority/important-soon

All 14 comments

Now that we are on 1.10 for prow, we can start to leverage dnsConfig on some pods :-)

DNS changes in https://github.com/kubernetes/test-infra/pull/9556 look very promising so far -- no discernible network flakes. Will roll out to critical k/k jobs tomorrow.

We may need a better mechanism to set this on all jobs, perhaps extending presets ...? @cjwagner
@krzyzacy any ideas? Basically we will want to set dnsConfig on ~all agent: kubernetes job specs.

Kubernetes has no mechanism for this currently ... cluster-level DNS config is only about name servers, which we do not need or want to change. We _could_ just add this to all ~800 jobs, but ideally we'd have a preset or some other defaulting mechanism for this.
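
As a rough illustration of the defaulting idea discussed above, the Python sketch below fills in a `dnsConfig` on every job whose agent is `kubernetes`, unless the job already sets one. Everything in it is an assumption made for illustration: the function name `apply_default_dns_config`, the simplified job dictionaries, and the `ndots: 1` option, since the thread does not say which DNS settings were actually used. It is not how Prow presets or decoration configs are implemented.

```python
import copy

# Assumed DNS settings for illustration only; the thread does not state
# the exact values that were rolled out.
DEFAULT_DNS_CONFIG = {"options": [{"name": "ndots", "value": "1"}]}


def apply_default_dns_config(jobs, dns_config=DEFAULT_DNS_CONFIG):
    """Return a copy of jobs with dns_config defaulted on kubernetes-agent jobs.

    Jobs with a different agent, jobs without a pod spec, and jobs that
    already define spec.dnsConfig are left unchanged.
    """
    patched = copy.deepcopy(jobs)
    for job in patched:
        if job.get("agent") != "kubernetes":
            continue
        spec = job.get("spec")
        if spec is None:
            continue
        # setdefault keeps any dnsConfig the job author set explicitly.
        spec.setdefault("dnsConfig", copy.deepcopy(dns_config))
    return patched


# Hypothetical, heavily simplified job definitions.
jobs = [
    {"name": "job-a", "agent": "kubernetes", "spec": {"containers": []}},
    {"name": "job-b", "agent": "jenkins"},
    {"name": "job-c", "agent": "kubernetes",
     "spec": {"containers": [], "dnsConfig": {"options": []}}},
]
patched = apply_default_dns_config(jobs)
```

Because an explicit `dnsConfig` is never overwritten, an individual job could still opt out of the default under this scheme.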

@krzyzacy it's probably not preferable to hard code this for _all_ prow users.

Side note: there has been _one_ flake since.

Or, once we move to podutils, we can put this in our default decoration config.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale
/area deflake
@BenTheElder what remains to be done here?

It's unclear whether our current attempts made a significant difference. We probably can't do much more ourselves beyond the dnsConfig, which unfortunately also makes the jobs more verbose :(

NodeLocal DNS Cache might help with this as well :)
cc @prameshj

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

