After deploying a new cluster we ran into a strange problem: the kube-router pod on some nodes gets stuck in CrashLoopBackOff.
The kube-router log shows a timeout connecting to the API server:
E0521 09:25:04.217633 1733 reflector.go:205] github.com/cloudnativelabs/kube-router/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Node: Get https://localhost:6443/api/v1/nodes?resourceVersion=0: dial tcp: i/o timeout
An strace of kube-router reveals that it tries to resolve localhost by querying nodelocaldns at its IP address, and the query times out.
The nodelocaldns logs in turn show it trying to reach the DNS service IPs advertised by kube-router.
The problem goes away if 127.0.0.1 is specified instead of localhost in the kube-router kubeconfig, set in the inventory like this:
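For contrast, on a typical glibc host the name-service switch puts files before dns, so localhost is answered from /etc/hosts without any DNS query ever being made. A quick sanity check (assuming getent is available):

```shell
# Show the hosts lookup order; on glibc systems 'files' normally
# precedes 'dns'. Alpine-based images may not have this file at all.
cat /etc/nsswitch.conf 2>/dev/null | grep '^hosts:' || echo 'no nsswitch.conf (e.g. Alpine)'
# localhost resolves from /etc/hosts; no DNS server is involved
getent hosts localhost
```

If getent answers for localhost while the in-pod resolution times out, the problem is lookup order inside the container, not the cluster DNS itself.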
kube_apiserver_endpoint: https://127.0.0.1:6443
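A sketch of where such an override could live in a Kubespray inventory (the group_vars path is illustrative; the file is written to /tmp here only to show its contents):

```shell
# In a real inventory this would be e.g.
# inventory/mycluster/group_vars/all/all.yml
cat > /tmp/all.yml <<'EOF'
# Pin the kubeconfig endpoint to the loopback IP so no name
# resolution is needed to reach the local apiserver endpoint
kube_apiserver_endpoint: https://127.0.0.1:6443
EOF
cat /tmp/all.yml
```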
Environment:
OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
Linux 4.18.0-147.8.1.el8_1.x86_64 x86_64
NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"
Version of Ansible (ansible --version):
Version of Python (python --version):
Kubespray version (commit) (git rev-parse --short HEAD):
2.13.0
01dbc909be34c9c8b34cb9d5e88a4f0e74affcbc
Network plugin used:
kube-router
Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):
Command used to invoke ansible:
Output of ansible run:
Anything else do we need to know:
I ran into this issue as well with version 2.13.0 on Ubuntu 18.04 nodes.
I got the same issue as well
This is because kube-router uses Alpine as its base image, which does not include /etc/nsswitch.conf; as a result, localhost cannot be resolved from /etc/hosts. I submitted https://github.com/cloudnativelabs/kube-router/pull/957 to work around this issue.
My kube-router PR has been merged and released in v1.0.1, and https://github.com/kubernetes-sigs/kubespray/pull/6479 has updated kube-router to v1.0.1 in Kubespray, so this issue should now be fixed.
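The workaround amounts to shipping a minimal /etc/nsswitch.conf in the image so that /etc/hosts is consulted before DNS. An equivalent sketch (the 'hosts: files dns' line is the standard glibc-style ordering; this is not the exact PR diff):

```shell
# In the container image this file would be /etc/nsswitch.conf;
# it is written to /tmp here for illustration.
echo 'hosts: files dns' > /tmp/nsswitch.conf
# With this line present, resolvers that honor nsswitch.conf check
# /etc/hosts first, so 'localhost' never hits the cluster DNS.
cat /tmp/nsswitch.conf
```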
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten