Environment:
OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
Linux 4.15.0-88-generic x86_64
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
Version of Ansible (ansible --version):
ansible 2.9.6
Version of Python (python --version):
Python 3.6.9
Kubespray version (commit) (git rev-parse --short HEAD):
8f3d8206
Network plugin used:
Calico
Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):
Command used to invoke ansible: ansible-playbook -i inventory/demo2node/hosts.yml -vvvv -b -u ubuntu --private-key=~/.ssh/k8s_rsa cluster.yml --flush-cache
Output of ansible run:
TASK [kubernetes-apps/ansible : Kubernetes Apps | Wait for kube-apiserver] ********************
task path: /home/jtaylor/work/ansible/kubespray/roles/kubernetes-apps/ansible/tasks/main.yml:2
fatal: [node1]: FAILED! => {
"attempts": 20,
"changed": false,
"content": "",
"elapsed": 0,
"invocation": {
"module_args": {
"attributes": null,
"backup": null,
"body": null,
"body_format": "raw",
"client_cert": "/etc/kubernetes/ssl/ca.crt",
"client_key": "/etc/kubernetes/ssl/ca.key",
"content": null,
"creates": null,
"delimiter": null,
"dest": null,
"directory_mode": null,
"follow": false,
"follow_redirects": "safe",
"force": false,
"force_basic_auth": false,
"group": null,
"headers": {},
"http_agent": "ansible-httpget",
"method": "GET",
"mode": null,
"owner": null,
"regexp": null,
"remote_src": null,
"removes": null,
"return_content": false,
"selevel": null,
"serole": null,
"setype": null,
"seuser": null,
"src": null,
"status_code": [
200
],
"timeout": 30,
"unix_socket": null,
"unsafe_writes": null,
"url": "https://127.0.0.1:6443/healthz",
"url_password": null,
"url_username": null,
"use_proxy": true,
"validate_certs": false
}
},
"msg": "Status code was -1 and not [200]: Request failed: <urlopen error Tunnel connection failed: 403 Access violation>",
"redirected": false,
"status": -1,
"url": "https://127.0.0.1:6443/healthz"
}
Anything else we need to know:
healthz is fine via curl on the node:
curl --cacert /etc/kubernetes/ssl/ca.crt https://127.0.0.1:6443/healthz
ok
From the command being run, we can see that the proxy is set but no_proxy is not:
<10.74.23.96> SSH: EXEC ssh -vvv -o ControlMaster=auto -o ControlPersist=30m -o ConnectionAttempts=100 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o 'IdentityFile="/home/jtaylor/.ssh/k8s_rsa"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="ubuntu"' -o ConnectTimeout=10 -o ControlPath=/home/jtaylor/.ansible/cp/83ecd4c0ff -tt 10.74.23.96 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-jobamzrcjodcuxkifinyfxnbjspmyhea ; http_proxy=http://16.100.210.81:8888 HTTP_PROXY=http://16.100.210.81:8888 https_proxy=http://16.100.210.81:8888 HTTPS_PROXY=http://16.100.210.81:8888 no_proxy='"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"''"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"' NO_PROXY='"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"''"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"' /usr/bin/python /home/ubuntu/.ansible/tmp/ansible-tmp-1586521304.0805278-210095447508559/AnsiballZ_uri.py'"'"'"'"'"'"'"'"' && sleep 0'"'"''
Explicitly setting no_proxy in inventory/demo2node/group_vars/all/all.yml will override the bad value.
## Refer to roles/kubespray-defaults/defaults/main.yml before modifying no_proxy
no_proxy: "127.0.0.1,localhost"
Looks like roles/kubespray-defaults/defaults/main.yaml would normally set no_proxy to a reasonable value for deployment when http_proxy/https_proxy are set.
It's not because of the proxy. I've tried curl -I -k --cert /etc/kubernetes/ssl/ca.crt --key /etc/kubernetes/ssl/ca.key https://127.0.0.1:6443/healthz on the same master node and got the same result (403), but if I run curl -I -k --cert /etc/kubernetes/ssl/apiserver-kubelet-client.crt --key /etc/kubernetes/ssl/apiserver-kubelet-client.key https://localhost:6443/healthz it returns 200.
I'm having the same issue with the Calico | wait for etcd task too.
Duplicate of #5891.
I'm hitting exactly the same issue as @jasonltaylor: the no_proxy variable is empty, and the playbook breaks on kubernetes-apps/ansible : Kubernetes Apps | Wait for kube-apiserver for me.
The problem disappears when no_proxy is set manually beforehand as described here: https://github.com/kubernetes-sigs/kubespray/issues/5935#issuecomment-614287455. However, the autogenerated sane default configuration and the entries from additional_no_proxy are then missing.
Reverted #5896 locally and problem seems to be gone.
this is my no_proxy
no_proxy: >-
10.0.0.0/8,
127.0.0.0/8,
172.16.0.0/12,
192.168.0.0/16,
localhost
and I'm still having issues
Maybe @alexkross has some input on this?
Happy to see a PR if a patch is required.
@Miouge1 Was just getting ready to revert #5896 and retest. Is that worth trying, or would I have to reset to the prior commit instead? Just retried the latest mainline and I'm seeing the same issue.
As @przemeklal mentioned, reverting #5896 is a workaround.
What are the minimum vars to set to reproduce this problem?
On the latest master, vagrant up with all defaults works just fine, so I suppose one needs to set some vars to see this problem?
An array of exclusions for proxy is defined here: https://github.com/kubernetes-sigs/kubespray/blob/910a821d0bd5c29dd227a38a91e82546ca70116b/roles/kubespray-defaults/defaults/main.yaml#L419-L438
When prettified and shortened for readability's sake, this Jinja2 template looks like:
if http[s]_proxy is defined
  if loadbalancer_apiserver is defined
    apiserver_loadbalancer_domain_name, loadbalancer_apiserver.address,
  endif
  for item in (groups['k8s-cluster'] + groups['etcd'] + groups['calico-rr'])
    hostvars[item]['access_ip'] | default(hostvars[item]['ip'] | default(fallback_ips[item])),
    if item != hostvars[item].get('ansible_hostname', '')
      hostvars[item]['ansible_hostname'],
      hostvars[item]['ansible_hostname'].dns_domain,
    endif
    item, item.dns_domain,
  endfor
  ...
  127.0.0.1,localhost,kube_service_addresses,kube_pods_subnet
endif
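The template logic above can be sketched in plain Python (a simplified illustration, not Kubespray code: the inventory data is hypothetical, dns_domain is assumed to be "cluster.local", and the fallback_ips branch is reduced to a plain get chain):

```python
# Plain-Python sketch of the no_proxy generation logic described above.
def build_no_proxy(hostvars, groups, http_proxy=None,
                   lb_domain=None, lb_address=None,
                   dns_domain="cluster.local",
                   kube_service_addresses="10.233.0.0/18",
                   kube_pods_subnet="10.233.64.0/18"):
    if not http_proxy:
        return ""
    entries = []
    if lb_domain and lb_address:
        entries += [lb_domain, lb_address]
    hosts = (groups.get("k8s-cluster", []) + groups.get("etcd", [])
             + groups.get("calico-rr", []))
    for item in dict.fromkeys(hosts):  # de-duplicate, keep order
        hv = hostvars[item]
        entries.append(hv.get("access_ip") or hv.get("ip"))
        if item != hv.get("ansible_hostname", ""):
            entries.append(hv["ansible_hostname"])
            entries.append(hv["ansible_hostname"] + "." + dns_domain)
        entries += [item, item + "." + dns_domain]
    entries += ["127.0.0.1", "localhost",
                kube_service_addresses, kube_pods_subnet]
    return ",".join(entries)

# Hypothetical single-node inventory:
hostvars = {"node1": {"ip": "192.168.220.100", "ansible_hostname": "node1"}}
groups = {"k8s-cluster": ["node1"], "etcd": ["node1"]}
print(build_no_proxy(hostvars, groups, http_proxy="http://10.0.3.195:3128"))
# 192.168.220.100,node1,node1.cluster.local,127.0.0.1,localhost,10.233.0.0/18,10.233.64.0/18
```

The shape of the result matches the correct no_proxy value seen in the working run further below; the bug is that this generated value never made it into the proxy_env fact.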
For some reason the "for" loop was executed over the list of hosts, but none of the vars were expanded into values, producing weird output strings like no_proxy='"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"''"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'
I hope that #5957 has fixed the issue.
this is my no_proxy:
no_proxy: >-
  10.0.0.0/8,
  127.0.0.0/8,
  172.16.0.0/12,
  192.168.0.0/16,
  localhost
and I'm still having issues
no_proxy entries in CIDR notation are known to be ignored by Python modules: https://github.com/ansible/ansible/issues/52705
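This can be checked with the standard library. The check below uses proxy_bypass_environment, a CPython urllib.request internal (not part of the documented API) that implements the suffix-based no_proxy matching the uri module ultimately relies on:

```python
# Demonstrate that urllib's no_proxy handling does suffix/exact matching only:
# CIDR entries such as 10.0.0.0/8 never match a plain IP address.
from urllib.request import proxy_bypass_environment

proxies = {"no": "10.0.0.0/8,127.0.0.0/8,localhost"}

# Hostname matching works...
print(proxy_bypass_environment("localhost", proxies))   # True
# ...but an IP inside 10.0.0.0/8 is NOT recognized as covered by the CIDR.
print(proxy_bypass_environment("10.1.2.3", proxies))    # False
# An exact-string entry is what actually exempts an IP.
print(proxy_bypass_environment("127.0.0.1", {"no": "127.0.0.1"}))  # True
```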
In the VM that I am provisioning with kubespray, the Ansible python process has no_proxy='' passed explicitly to its environment, which explains why the TASK [kubernetes-apps/ansible : Kubernetes Apps | Wait for kube-apiserver] fails with the proxy error (the kube apiserver URL is queried through the proxy while it should be directly queried).
TASK [kubernetes-apps/ansible : Kubernetes Apps | Wait for kube-apiserver] ****************************************************************************************************************************************
fatal: [kapitan]: FAILED! => {"attempts": 20, "changed": false, "content": "", "elapsed": 0, "msg": "Status code was -1 and not [200]: Request failed: <urlopen error Tunnel connection failed: 503 Service Unavailable>", "redirected": false, "status": -1, "url": "https://127.0.0.1:6443/healthz"}
Here is the Ansible python process cmdline, captured on the target while Ansible is running:
[root@kapitan ~]# ps aux | grep python | grep no_proxy
root 3978 0.0 0.0 113184 1216 ? Ss 15:54 0:00 /bin/sh -c no_proxy='' https_proxy=http://10.0.3.195:3128 NO_PROXY='' http_proxy=http://10.0.3.195:3128 HTTPS_PROXY=http://10.0.3.195:3128 HTTP_PROXY=http://10.0.3.195:3128 /usr/bin/python && sleep 0
It shows clearly that no_proxy=''.
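The effect of no_proxy='' can be reproduced with the standard library (again using CPython urllib.request internals, as a demonstration rather than Kubespray code): an empty variable is dropped entirely, so nothing, not even loopback, is exempted from the proxy.

```python
import os
from urllib.request import getproxies_environment, proxy_bypass_environment

# Simulate the environment Ansible passed to the module.
os.environ["https_proxy"] = "http://10.0.3.195:3128"
os.environ["no_proxy"] = ""  # the bug: empty instead of the generated list

proxies = getproxies_environment()
# Empty env vars are ignored, so there is no 'no' entry at all...
print("no" in proxies)                                  # False
# ...and therefore even loopback traffic is routed through the proxy.
print(proxy_bypass_environment("127.0.0.1", proxies))   # False

# With the kind of value no_proxy.yml would have generated, loopback is exempt.
os.environ["no_proxy"] = "127.0.0.1,localhost,10.233.0.0/18"
print(proxy_bypass_environment("127.0.0.1", getproxies_environment()))  # True
```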
As a workaround, I altered cluster.yml to ensure that { role: kubespray-defaults } runs before the proxy_env fact is set, like this:
diff --git a/cluster.yml b/cluster.yml
index ca828206..ace8e0f3 100644
--- a/cluster.yml
+++ b/cluster.yml
@@ -4,6 +4,8 @@
- hosts: all
gather_facts: false
+ roles:
+ - { role: kubespray-defaults }
tasks:
- name: "Set up proxy environment"
set_fact:
diff --git a/scale.yml b/scale.yml
index 65fecae0..5c310bfb 100644
--- a/scale.yml
+++ b/scale.yml
@@ -4,6 +4,8 @@
- hosts: all
gather_facts: false
+ roles:
+ - { role: kubespray-defaults }
tasks:
- name: "Set up proxy environment"
set_fact:
diff --git a/upgrade-cluster.yml b/upgrade-cluster.yml
index 70c3943f..39af72e8 100644
--- a/upgrade-cluster.yml
+++ b/upgrade-cluster.yml
@@ -4,6 +4,8 @@
- hosts: all
gather_facts: false
+ roles:
+ - { role: kubespray-defaults }
tasks:
- name: "Set up proxy environment"
set_fact:
From my limited understanding, this ensures that the no_proxy value is calculated by roles/kubespray-defaults/tasks/no_proxy.yml before the value is copied into the proxy_env fact.
After applying the above hack, the target successfully provisions.
Here is the Ansible python process cmdline post hack, captured on the target while Ansible is running:
[root@kapitan ~]# ps aux | grep -y python | grep no_proxy
root 16437 0.0 0.0 113184 1216 ? Ss 16:46 0:00 /bin/sh -c no_proxy=192.168.220.100,kapitan,kapitan.cluster.local,127.0.0.1,localhost,10.233.0.0/18,10.233.64.0/18 https_proxy=http://10.0.3.195:3128 NO_PROXY=192.168.220.100,kapitan,kapitan.cluster.local,127.0.0.1,localhost,10.233.0.0/18,10.233.64.0/18 http_proxy=http://10.0.3.195:3128 HTTPS_PROXY=http://10.0.3.195:3128 HTTP_PROXY=http://10.0.3.195:3128 /usr/bin/python && sleep 0
Notice that the no_proxy environment variable is correct this time.
OK, indeed: if I understand correctly, you don't get no_proxy from your env variables but are using the one generated by no_proxy.yml.
That would explain it.
@Miouge1 I think the minimum requirement is just to set http_proxy and https_proxy. From the documentation, it looks like you are generally supposed to avoid setting no_proxy (which gets generated for you) and instead set additional_no_proxy.
@jasonltaylor I'm with you on this. A topology with nodes separated by an HTTP[S] proxy is weird.
FYI I managed to reproduce that in CI with PR #6039 (commit d234ee0)
It fails at the task TASK [kubernetes-apps/ansible : Kubernetes Apps | Wait for kube-apiserver] *****
With
fatal: [instance-1]: FAILED! => {"attempts": 20, "changed": false, "content": "", "elapsed": 0, "msg": "Status code was -1 and not [200]: Request failed: <urlopen error Tunnel connection failed: 500 Unable to connect>", "redirected": false, "status": -1, "url": "https://127.0.0.1:6443/healthz"}
See full Ansible logs here
Proxy logs show:
CONNECT Apr 28 18:43:46 [15]: Request (file descriptor 9): CONNECT 127.0.0.1:6443 HTTP/1.0
INFO Apr 28 18:43:46 [15]: No upstream proxy for 127.0.0.1
INFO Apr 28 18:43:46 [15]: opensock: opening connection to 127.0.0.1:6443
INFO Apr 28 18:43:46 [15]: opensock: getaddrinfo returned for 127.0.0.1:6443
ERROR Apr 28 18:43:46 [15]: opensock: Could not establish a connection to 127.0.0.1
So exactly the same situation as @jperville
Is there an agreement on the best way to resolve this?
@jasonltaylor did you try to include 16.100.210.81 in the no_proxy values?
@electrocucaracha I did not put the proxy IP itself in no_proxy. I simply set no_proxy similar to how it would normally be set by no_proxy.yml (localhost, etc.) to see whether the no_proxy variable itself would get set during the deployment if made explicit.
Thanks everyone for the info. Further investigation showed that proxy_env.no_proxy defaults to an empty string; the no_proxy fact gets generated later on, but proxy_env.no_proxy is never updated, which means it was always empty.
This is fixed by PR #6039
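The actual change is in PR #6039; conceptually, the fix amounts to refreshing proxy_env once the no_proxy fact has been computed, along these lines (a sketch of the idea, not the merged patch):

```yaml
# Sketch only: re-set proxy_env after kubespray-defaults has computed no_proxy,
# so the generated value replaces the empty-string default.
- name: Update proxy_env with the generated no_proxy
  set_fact:
    proxy_env: "{{ proxy_env | combine({'no_proxy': no_proxy, 'NO_PROXY': no_proxy}) }}"
  when: no_proxy is defined
```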