Kubespray: api-server setup does not work

Created on 3 Jun 2017  ·  15 Comments  ·  Source: kubernetes-sigs/kubespray

BUG REPORT
I changed kube_apiserver_insecure_port to 8443, but it does not seem to work; the apiserver still uses the default port 8080.

kube_apiserver_insecure_port: 8443 # (http)

fatal: [node2]: FAILED! => {"attempts": 20, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://localhost:8080/healthz"}

I also set system_namespace to kube-system, but that does not seem to work either.

[root@node1 kargo]# cat /etc/kubernetes/kube-system-ns.yml 
apiVersion: v1
kind: Namespace
metadata:
  name: "{{system_namespace}}"

Did I do anything wrong?
Thanks in advance!
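
A quick way to confirm which port the apiserver actually listens on (a diagnostic sketch, assuming shell access on the master node):

# Show the listening sockets of kube-apiserver
ss -tlnp | grep kube-apiserver

# Probe the default and the configured insecure ports
curl -sS http://localhost:8080/healthz
curl -sS http://localhost:8443/healthz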

Environment:

  • Cloud provider or hardware configuration:

  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
[root@node1 ~]# cat /etc/os-release  
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

[root@node1 ~]# uname 
Linux
[root@node1 ~]# 
  • Version of Ansible (ansible --version):
[root@node1 ~]# ansible --version
ansible 2.3.0.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = Default w/o overrides
  python version = 2.7.5 (default, Nov  6 2016, 00:28:07) [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)]

Kargo version (commit) (git rev-parse --short HEAD):

[root@node1 kargo]# git rev-parse --short HEAD
acae0fe

Network plugin used:
calico

Copy of your inventory file:

[root@node1 kargo]# cat inventory/inventory.cfg 
[all]
node1    ansible_host=192.168.0.69 ip=192.168.0.69
node2    ansible_host=192.168.0.191 ip=192.168.0.191

[kube-master]
node1    
node2    

[kube-node]
node1    
node2    

[etcd]
node1    
node2    

[k8s-cluster:children]
kube-node        
kube-master      

[calico-rr]

Command used to invoke ansible:

nohup ansible-playbook -i inventory/inventory.cfg cluster.yml -b -v --private-key=~/.ssh/id_rsa > `pwd`/../install.log 2>&1 &

Output of ansible run:

fatal: [node2]: FAILED! => {"attempts": 20, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 111] 拒绝连接>", "redirected": false, "status": -1, "url": "http://localhost:8080/healthz"}

Anything else do we need to know:
here is my full config k8s-cluster.yml

# Valid bootstrap options (required): ubuntu, coreos, centos, none
bootstrap_os: centos

# Directory where etcd data is stored
etcd_data_dir: /var/lib/etcd

# Directory where the binaries will be installed
bin_dir: /usr/local/bin

# Kubernetes configuration dirs and system namespace.
# Those are where all the additional config stuff goes
# that Kubernetes normally puts in /srv/kubernetes.
# This puts them in a sane location and namespace.
# Editing those values will almost surely break something.
kube_config_dir: /etc/kubernetes
kube_script_dir: "{{ bin_dir }}/kubernetes-scripts"
kube_manifest_dir: "{{ kube_config_dir }}/manifests"
system_namespace: kube-system

# Logging directory (sysvinit systems)
kube_log_dir: "/var/log/kubernetes"

# This is where all the cert scripts and certs will be located
kube_cert_dir: "{{ kube_config_dir }}/ssl"

# This is where all of the bearer tokens will be stored
kube_token_dir: "{{ kube_config_dir }}/tokens"

# This is where to save basic auth file
kube_users_dir: "{{ kube_config_dir }}/users"

kube_api_anonymous_auth: false

## Change this to use another Kubernetes version, e.g. a current beta release
kube_version: v1.6.1

# Where the binaries will be downloaded.
# Note: ensure that you have enough disk space (about 1G)
local_release_dir: "/tmp/releases"
# Random shifts for retrying failed ops like pushing/downloading
retry_stagger: 5

# This is the group that the cert creation scripts chgrp the
# cert files to. Not really changeable...
kube_cert_group: kube-cert

# Cluster Loglevel configuration
kube_log_level: 2

# Users to create for basic auth in Kubernetes API via HTTP
kube_api_pwd: "abcdefg"
kube_users:
  kube:
    pass: "{{kube_api_pwd}}"
    role: admin
  root:
    pass: "{{kube_api_pwd}}"
    role: admin



## It is possible to activate / deactivate selected authentication methods (basic auth, static token auth)
#kube_oidc_auth: false
#kube_basic_auth: false
#kube_token_auth: false


## Variables for OpenID Connect Configuration https://kubernetes.io/docs/admin/authentication/
## To use OpenID you additionally have to deploy an OpenID Provider (e.g. Dex, Keycloak, ...)

# kube_oidc_url: https:// ...
# kube_oidc_client_id: kubernetes
## Optional settings for OIDC
# kube_oidc_ca_file: {{ kube_cert_dir }}/ca.pem
# kube_oidc_username_claim: sub
# kube_oidc_groups_claim: groups


# Choose network plugin (calico, weave or flannel)
# Can also be set to 'cloud', which lets the cloud provider set up appropriate routing
kube_network_plugin: calico

# Enable kubernetes network policies
enable_network_policy: true

# Kubernetes internal network for services, unused block of space.
kube_service_addresses: 10.233.0.0/18

# internal network. When used, it will assign IP
# addresses from this range to individual pods.
# This network must be unused in your network infrastructure!
kube_pods_subnet: 10.233.64.0/18

# internal network node size allocation (optional). This is the size allocated
# to each node on your network.  With these defaults you should have
# room for 4096 nodes with 254 pods per node.
kube_network_node_prefix: 24

# The port the API Server will be listening on.
kube_apiserver_ip: "{{ kube_service_addresses|ipaddr('net')|ipaddr(1)|ipaddr('address') }}"
kube_apiserver_port: 6443 # (https)
kube_apiserver_insecure_port: 8443 # (http)

# DNS configuration.
# Kubernetes cluster name, also will be used as DNS domain
cluster_name: cluster.local
# Subdomains of DNS domain to be resolved via /etc/resolv.conf for hostnet pods
ndots: 2
# Can be dnsmasq_kubedns, kubedns or none
dns_mode: dnsmasq_kubedns
# Can be docker_dns, host_resolvconf or none
resolvconf_mode: docker_dns
# Deploy the netchecker app to verify DNS resolution (runs as an HTTP service)
deploy_netchecker: true
# Ip address of the kubernetes skydns service
skydns_server: "{{ kube_service_addresses|ipaddr('net')|ipaddr(3)|ipaddr('address') }}"
dns_server: "{{ kube_service_addresses|ipaddr('net')|ipaddr(2)|ipaddr('address') }}"
dns_domain: "{{ cluster_name }}"

# Path used to store Docker data
docker_daemon_graph: "/opt/docker"

## A string of extra options to pass to the docker daemon.
## This string should be exactly as you wish it to appear.
## An obvious use case is allowing insecure-registry access
## to self hosted registries like so:
docker_options: "--insecure-registry={{ kube_service_addresses }} --graph={{ docker_daemon_graph }}"
docker_bin_dir: "/usr/bin"

# Settings for containerized control plane (etcd/kubelet/secrets)
etcd_deployment_type: docker
kubelet_deployment_type: docker
cert_management: script
vault_deployment_type: docker

# K8s image pull policy (imagePullPolicy)
k8s_image_pull_policy: IfNotPresent

# Monitoring apps for k8s
efk_enabled: true

# Helm deployment
helm_enabled: false


All 15 comments

I found the problem: kargo still uses port 8080 for the health check on the api-server. The configuration file is here:

https://github.com/kubernetes-incubator/kargo/blob/master/roles/kubernetes/master/handlers/main.yml#L42

- name: Master | wait for the apiserver to be running
  uri:
    url: http://localhost:8080/healthz
  register: result
  until: result.status == 200
  retries: 20
  delay: 6

I tried to change it to the configured port, like this:

- name: Master | wait for the apiserver to be running
  uri:
    url: http://localhost:{{ kube_apiserver_insecure_port }}/healthz
  register: result
  until: result.status == 200
  retries: 20
  delay: 6

But the next error message left me completely confused:

FAILED - RETRYING: Create kube system namespace (4 retries left).
FAILED - RETRYING: Create kube system namespace (3 retries left).
FAILED - RETRYING: Create kube system namespace (2 retries left).
FAILED - RETRYING: Create kube system namespace (1 retries left)
fatal: [node1]: FAILED! => {"attempts": 4, "changed": false, "cmd": ["/usr/local/bin/kubectl", "create", "-f", "/etc/kubernetes/kube-system-ns.yml"], "delta": "0:00:00.140787", "end": "2017-06-05 22:57:08.321912", "failed": true, "rc": 1, "start": "2017-06-05 22:57:08.181125", "stderr": "The connection to the server localhost:8080 was refused - did you specify the right host or port?", "stderr_lines": ["The connection to the server localhost:8080 was refused - did you specify the right host or port?"], "stdout": "", "stdout_lines": []}

I found the reason: kargo tries to use kubectl to create the namespace (and other resources), but at this point the kubectl configuration has not been written yet, so kubectl still connects to the default port 8080.

https://github.com/kubernetes-incubator/kargo/blob/19bb97d24d9f9cd287482944c365c7ab51b5ef2a/roles/kubernetes/master/tasks/main.yml#L53

- name: Create kube system namespace
  command: "{{ bin_dir }}/kubectl create -f {{kube_config_dir}}/{{system_namespace}}-ns.yml"
  retries: 4
  delay: "{{ retry_stagger | random + 3 }}"
  register: create_system_ns
  until: create_system_ns.rc == 0
  changed_when: False
  when: kubesystem|failed and inventory_hostname == groups['kube-master'][0]
  tags: apps
......

I am sorry, but I am not familiar with Ansible, so I can't complete the modification myself.
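
One possible workaround until the kubeconfig is written (a sketch only; kubectl's --server flag is standard, but the port value 8443 assumes the kube_apiserver_insecure_port configured above):

# Point kubectl at the configured insecure port explicitly,
# instead of letting it fall back to the default localhost:8080
/usr/local/bin/kubectl --server=http://localhost:8443 create -f /etc/kubernetes/kube-system-ns.yml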

The static port 8080 is set in 3 different files, so when you change the kube_apiserver_insecure_port variable, you also need to change it in those 3 files.
I had the same error a while ago and created a pull request that makes the playbooks use the port defined in the kube_apiserver_insecure_port variable.
Check out PR #1332 to try it out.
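
For reference, a quick way to locate the remaining hardcoded references in a checkout of the repo (a diagnostic sketch; the exact set of matching files differs between kargo versions):

# Find tasks and templates that still hardcode the insecure port
grep -rn 'localhost:8080\|127\.0\.0\.1:8080' roles/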

@gstorme thanks, it works, but a new error comes up. It seems the cluster has not started yet.

TASK [kubernetes/master : Create kube system namespace] *********************************************************************************************************************************************************************************
task path: /data/k8s/kargo/roles/kubernetes/master/tasks/main.yml:53
Thursday 08 June 2017  10:03:06 +0800 (0:00:01.548)       0:08:45.960 ********* 
FAILED - RETRYING: Create kube system namespace (4 retries left).
FAILED - RETRYING: Create kube system namespace (3 retries left).
FAILED - RETRYING: Create kube system namespace (2 retries left).
FAILED - RETRYING: Create kube system namespace (1 retries left).
fatal: [node1]: FAILED! => {"attempts": 4, "changed": false, "cmd": ["/usr/local/bin/kubectl", "create", "-f", "/etc/kubernetes/kube-system-ns.yml"], "delta": "0:00:00.236391", "end": "2017-06-08 10:03:26.157691", "failed": true, "rc": 1, "start": "2017-06-08 10:03:25.921300", "stderr": "error: error validating \"/etc/kubernetes/kube-system-ns.yml\": error validating data: unexpected end of JSON input; if you choose to ignore these errors, turn validation off with --validate=false", "stderr_lines": ["error: error validating \"/etc/kubernetes/kube-system-ns.yml\": error validating data: unexpected end of JSON input; if you choose to ignore these errors, turn validation off with --validate=false"], "stdout": "", "stdout_lines": []}
        to retry, use: --limit @/data/k8s/kargo/cluster.retry

here is the kube-system-ns.yml

[root@node1 ~]# cat /etc/kubernetes/kube-system-ns.yml 
apiVersion: v1
kind: Namespace
metadata:
  name: "kube-system"

@kaybinwong Because kubectl is not configured, it still uses port 8080 to access the apiserver.

@mritd kubectl is not configured yet? Any way to fix this?

@kaybinwong If kargo wrote the kubectl configuration before creating the namespace, this problem would be resolved, but I have not changed that order myself.

@mritd I have read the source, but I still cannot find it. In which section or module can I find the related sources?

@kaybinwong I also could not find the source that creates the kubectl configuration file, but from the failure state it can be seen that the kubectl configuration file has not been created yet.

Any fix for this issue?

I am getting the following:

FAILED - RETRYING: Master | wait for the apiserver to be running (1 retries left).
fatal: [ache1]: FAILED! => {"attempts": 20, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://127.0.0.1:8080/healthz"}

Looks like http://127.0.0.1:8080/healthz is not accessible, but I can curl http://0.0.0.0:8080/healthz.

Any configuration change needed?
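
To see which address the apiserver is actually bound to (a diagnostic sketch; ss ships with iproute2 on CentOS 7):

# Show the listen address for port 8080
ss -tlnp | grep ':8080'

# Inspect the --insecure-bind-address flag on the running process
ps aux | grep '[k]ube-apiserver'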

I had a running cluster and was trying to reprovision and got hit by this issue.

This is what the logs look like:

TASK [kubernetes-apps/cluster_roles : Kubernetes Apps | Wait for kube-apiserver] ***************************************************************************************************
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (10 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (9 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (8 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (7 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (6 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (5 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (4 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (3 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (2 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (1 retries left).
fatal: [k8s-master-1]: FAILED! => {"attempts": 10, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "http://127.0.0.1:8080/healthz"}

Do we know of a fix?

It was a swap problem; I solved it:

swapoff -a
vim roles/download/tasks/download_container.yml

- name: Stop if swap enabled
  assert:
    that: ansible_swaptotal_mb == 0
  when: kubelet_fail_swap_on|default(false)
PLAY RECAP **********************************************************************************************************************************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0   
node1                      : ok=274  changed=22   unreachable=0    failed=0   
node2                      : ok=257  changed=18   unreachable=0    failed=0   
node3                      : ok=257  changed=18   unreachable=0    failed=0   
node4                      : ok=234  changed=10   unreachable=0    failed=0   
node5                      : ok=207  changed=5    unreachable=0    failed=0   

Wednesday 27 December 2017  23:07:54 +0800 (0:00:00.039)       0:10:24.688 **** 
=============================================================================== 
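
Note that swapoff -a only lasts until the next reboot; to make the fix persistent, the swap entry in /etc/fstab must be removed or commented out as well (a sketch; sed keeps a .bak backup of fstab before editing):

# Disable swap now and prevent it from coming back after a reboot
swapoff -a
sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab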

Please open a new issue if you still experience this problem.

Hello,
I am a beginner in k8s and I am deploying a kargo cluster with 4 nodes (2 masters and 2 workers).

Unfortunately I still have the aforementioned problem:

TASK [kubernetes-apps/cluster_roles : Kubernetes Apps | Wait for kube-apiserver] *****
Monday 28 January 2019 14:44:49 +0100 (0:00:00.056) 0:20:46.903 **
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (10 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (9 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (8 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (7 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (6 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (5 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (4 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (3 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (2 retries left).
FAILED - RETRYING: Kubernetes Apps | Wait for kube-apiserver (1 retries left).

fatal: [master1]: FAILED! => {"attempts": 10, "changed": false, "content": "", "msg": "Status code was -1 and not [200]: Request failed: ", "redirected": false, "status": -1, "url": "https://127.0.0.1:6443/healthz"}

Do you have any suggestions, please? I have been blocked on this problem for days.

Has the aforementioned problem been fixed? I think the reported 8080 problem was resolved and committed, yet I still get the issue on port 6443 for the apiserver. In my configuration the insecure port is disabled.
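
With the insecure port disabled, the health check goes over TLS, so a manual probe can show whether the apiserver is reachable at all (a diagnostic sketch; -k skips certificate verification):

# Probe the secure port directly
curl -k https://127.0.0.1:6443/healthz

# If the connection is refused, check whether the apiserver
# container is running and inspect the kubelet logs
docker ps | grep apiserver
journalctl -u kubelet --no-pager | tail -n 50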
