Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT
Environment:
OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
Ubuntu 18.04
Version of Ansible (ansible --version):
2.6.3
Kubespray version (commit) (git rev-parse --short HEAD):
b79dd602
Network plugin used:
calico
Copy of your inventory file:
[all]
kube00 ansible_host=kube00.REDACTED ip=REDACTED etcd_member_name=etcd1
kube01 ansible_host=kube01.REDACTED ip=REDACTED etcd_member_name=etcd2
kube02 ansible_host=kube02.REDACTED ip=REDACTED etcd_member_name=etcd3
kube03 ansible_host=kube03.REDACTED ip=REDACTED
kube04 ansible_host=kube04.REDACTED ip=REDACTED
[kube-master]
kube00
kube01
[kube-node]
kube02
kube03
kube04
[etcd]
kube00
kube01
kube02
[k8s-cluster:children]
kube-node
kube-master
Command used to invoke ansible:
pipenv run ansible-playbook --become --inventory inventories/kubespray/hosts.ini kubespray/cluster.yml
Output of ansible run:
Anything else do we need to know:
After a "fresh install" of netchecker, curl http://localhost:31081/api/v1/connectivity_check produces the following error:
Error occurred while checking the agents. Details: unknown (get agents.network-checker.ext netchecker-agent-xxxxx)
The netchecker-server log has this repeating:
E0910 17:15:25.308402 1 storer_k8s.go:110] unknown (get agents.network-checker.ext netchecker-agent-hostnet-2b4hm)
I0910 17:15:25.310800 1 storer_k8s.go:129] Updated agent netchecker-agent-hostnet-2b4hm unknown (put agents.network-checker.ext netchecker-agent-hostnet-2b4hm)
E0910 17:15:25.310846 1 storer_k8s.go:133] unknown (put agents.network-checker.ext netchecker-agent-hostnet-2b4hm)
[negroni] 2018-09-10T17:15:25Z | 0 | 5.088171ms | netchecker-service:8081 | POST /api/v1/agents/netchecker-agent-hostnet-2b4hm
[negroni] 2018-09-10T17:15:25Z | 0 | 20.881µs | netchecker-service:8081 | GET /api/v1/ping
Is this a netchecker bug, a kubespray bug, or a problem with my environment/config?
I've been facing the same issue since the switch from the l23network/k8s-netchecker images to the Mirantis/k8s-netchecker-* images.
@mirwan I suspect a netchecker bug; I've submitted an issue on their repo here
Change variables:
# netchecker
agent_img: "quay.io/l23network/k8s-netchecker-agent:v1.0"
server_img: "quay.io/l23network/k8s-netchecker-server:v1.0"
Apply changes:
ansible-playbook -i inventory/mycluster/hosts.ini -bvv cluster.yml --tags netchecker
Test it (ssh root@node1):
root@node1:~# curl http://localhost:31081/api/v1/connectivity_check
{"Message":"All 6 pods successfully reported back to the server","Absent":null,"Outdated":null}
Also consistently getting this on VM-ish deployments (LXCs, actually);
perfectly fixed by @pahaz's workaround, thanks! ->
ansible-playbook -i inventory/mycluster/hosts.ini cluster.yml --tags netchecker \
--extra-vars '{
deploy_netchecker: True,
netcheck_agent_img_repo: "quay.io/l23network/k8s-netchecker-agent",
netcheck_server_img_repo: "quay.io/l23network/k8s-netchecker-server",
netcheck_agent_tag: "v1.0",
netcheck_server_tag: "v1.0"
}'
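Instead of passing these on every run, the same overrides can be kept in the inventory's group_vars so that subsequent plays pick them up automatically. The variable names are the ones used in the extra-vars above; the file path is a typical kubespray inventory layout and may differ in your checkout:

```yaml
# inventory/mycluster/group_vars/k8s-cluster.yml  (path is an assumption)
# Pin netchecker back to the l23network images:
deploy_netchecker: true
netcheck_agent_img_repo: "quay.io/l23network/k8s-netchecker-agent"
netcheck_server_img_repo: "quay.io/l23network/k8s-netchecker-server"
netcheck_agent_tag: "v1.0"
netcheck_server_tag: "v1.0"
```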