Tests flake on connecting to external services. We should minimize how much we depend on these, but we should also deflake DNS. First step was using FQDN for in-cluster services https://github.com/kubernetes/test-infra/pull/9547
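For illustration, a hedged sketch of what the FQDN approach looks like in a job's container spec (the env var and service name here are hypothetical placeholders, not necessarily what #9547 changed): a trailing dot marks the name as fully qualified, so the resolver skips resolv.conf search-path expansion entirely.

```yaml
# Hypothetical container env pointing at an in-cluster service by FQDN.
# The trailing dot makes the name absolute, so no search domains are tried.
env:
  - name: BOSKOS_URL                                    # hypothetical variable
    value: "http://boskos.test-pods.svc.cluster.local." # note the trailing dot
```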
TODO:
- ndots using dnsConfig https://github.com/kubernetes/test-infra/pull/9556
/area prow
/area jobs
/kind bug
/priority important-soon
/assign
/assign @MrHohn
now that we are on 1.10 for prow we can start to leverage dnsConfig on some pods :-)
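For reference, a minimal sketch of the kind of pod-level override this enables (`dnsConfig` is a standard PodSpec field as of Kubernetes 1.10; the exact ndots value used in #9556 may differ):

```yaml
spec:
  dnsConfig:
    options:
      # Lower ndots so names containing a dot are tried as absolute first,
      # instead of cycling through the cluster search domains.
      - name: ndots
        value: "1"
```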
DNS changes in https://github.com/kubernetes/test-infra/pull/9556 look very promising so far -- no discernible network flakes. Will roll out to critical k/k jobs tomorrow.
We may need a better mechanism to set this on all jobs, perhaps extending presets ...? @cjwagner
@krzyzacy any ideas? Basically we will want to set dnsConfig on ~all agent: kubernetes job specs.
Kubernetes has no mechanism for this currently ... cluster-level DNS config is only about name servers, which we do not need or want to change. We _could_ just add this to all ~800 jobs, but ideally we'd have a preset or some other defaulting mechanism for this.
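To make the trade-off concrete, here is a hedged sketch of what the per-job approach would mean, with the same stanza repeated in every `agent: kubernetes` job (job name, interval, and image are placeholders):

```yaml
periodics:
  - name: example-periodic                # placeholder name
    interval: 1h                          # placeholder schedule
    agent: kubernetes
    spec:
      containers:
        - image: gcr.io/example/test-image  # placeholder image
      # This block would have to be copy-pasted into every job spec.
      dnsConfig:
        options:
          - name: ndots
            value: "1"
```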
add to defaults, like https://github.com/kubernetes/test-infra/blob/master/prow/config/config.go#L1134-L1169?
@krzyzacy it's probably not preferable to hard code this for _all_ prow users.
Side note: there has been _one_ flake since.
or once we move to podutils, we can put this in our default decoration config
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
/area deflake
@BenTheElder what remains to be done here?
it's unclear whether our current attempts made a significant difference. we probably can't do much more ourselves beyond the dnsConfig change, which unfortunately also makes the job configs more verbose :(
NodeLocal DNS Cache might help with this as well :)
cc @prameshj
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.