Kubespray: domains inside pods don't resolve

Created on 14 Feb 2017 · 9 comments · Source: kubernetes-sigs/kubespray

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
This is a DNS issue, I think.
Domains can't be resolved inside pods:

ping google.com
ping gluster-svc

ping: unknown host

my dns config

# Can be dnsmasq_kubedns, kubedns or none
dns_mode: kubedns

# Can be docker_dns, host_resolvconf or none
resolvconf_mode: docker_dns
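With this configuration, a quick way to narrow the problem down is to check which resolver the pods actually see. A minimal debugging sketch (assumes kubectl access to the cluster; the busybox image and the pod name dns-test are just convenient choices, not from the original report):

```shell
# Start a throwaway pod with basic network tools and open a shell in it
kubectl run dns-test --rm -it --image=busybox --restart=Never -- sh

# Inside the pod:
cat /etc/resolv.conf                            # which nameservers did the pod get?
nslookup kubernetes.default.svc.cluster.local   # a cluster-internal name
nslookup google.com                             # an external name
```

With resolvconf_mode: docker_dns, the pod's resolv.conf should point at the kubedns service IP (10.233.0.3 in the dnsmasq log later in this thread); if it shows the host's own resolver instead, queries never reach kubedns at all.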

Environment:

  • Cloud provider or hardware configuration:
    digitalocean
  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
    centos

  • Version of Ansible (ansible --version):
    2.2.1

Kargo version (commit) (git rev-parse --short HEAD):

commit e877cd2874da1625bf41c332989a52054ab2aa22
Merge: 203ddfc 732ae69
Author: Antoine Legrand antoine.legrand@coreos.com
Date: Mon Feb 13 17:53:57 2017 +0100

Merge pull request #1024 from holser/bug/961

Install pip on Ubuntu

Network plugin used:
flannel

Copy of your inventory file:

## Configure 'ip' variable to bind kubernetes services on a
## different ip than the default iface
node1 ansible_ssh_host=pro.1 ip=10.135.1.221 ansible_user=root
node2 ansible_ssh_host=pro.2 ip=10.135.38.61 ansible_user=root
node3 ansible_ssh_host=pro.3 ip=10.135.46.221 ansible_user=root

## configure a bastion host if your nodes are not directly reachable

bastion ansible_ssh_host=x.x.x.x

[kube-master]
node1

[etcd]
node1
node2
node3

[kube-node]
node2
node3
node4
node5
node6

[k8s-cluster:children]
kube-node
kube-master

Command used to invoke ansible:
ansible-playbook -i inventory/inventory cluster.yml -b

Output of ansible run:

Anything else we need to know:
UPDATE:
When I use the private IPs this issue appears; with the public IPs everything works like a charm!

Most helpful comment

@alirezaDavid I had the same issue and it was iptables.
What I did was to remove the --iptables=false option:

vim /etc/systemd/system/docker.service.d/docker-options.conf

systemctl daemon-reload
systemctl restart docker
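For reference, the fix amounts to deleting that single flag from the Docker drop-in and restarting the daemon. A sketch (the sed invocation is illustrative; double-check the resulting file by hand before restarting):

```shell
# Drop-in file path as reported in this thread
CONF=/etc/systemd/system/docker.service.d/docker-options.conf

# Remove the --iptables=false flag so Docker manages the NAT rules
# that flannel-routed pod traffic depends on
sed -i 's/ *--iptables=false//' "$CONF"

# Apply the change
systemctl daemon-reload
systemctl restart docker
```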

All 9 comments

I was about to file my own issue but then noticed this one. I am having what looks like a related issue, but with:

dns_mode: dnsmasq_kubedns
resolvconf_mode: docker_dns

Cluster service name resolution used to work for me, but now it seems broken. When I looked at logs of dnsmasq pods I saw this on all my nodes:

$ kubectl --namespace=kube-system logs po/dnsmasq-6c7t1 | uniq -c
      1 dnsmasq[1]: started, version 2.72 cachesize 1000
      1 dnsmasq[1]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect
      1 dnsmasq[1]: using nameserver 127.0.1.1#53
      1 dnsmasq[1]: using nameserver 10.0.1.1#53
      1 dnsmasq[1]: using local addresses only for domain com.svc.cluster.local
      1 dnsmasq[1]: using local addresses only for domain svc.cluster.local.svc.cluster.local
      1 dnsmasq[1]: using local addresses only for domain cluster.local.svc.cluster.local
      1 dnsmasq[1]: using local addresses only for domain com.default.svc.cluster.local
      1 dnsmasq[1]: using local addresses only for domain default.svc.cluster.local.default.svc.cluster.local
      1 dnsmasq[1]: using local addresses only for domain cluster.local.default.svc.cluster.local
      1 dnsmasq[1]: using nameserver 10.233.0.3#53 for domain cluster.local
      2 dnsmasq[1]: read /etc/hosts - 7 addresses
  53523 dnsmasq[1]: Maximum number of concurrent DNS queries reached (max: 150)

NOTE: dnsmasq[1]: Maximum number of concurrent DNS queries reached (max: 150) is repeated over 50,000 times, on each node's pod. I am not sure how to go about further debug...
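Two things stand out in that log. First, search suffixes appear to be re-appended (e.g. svc.cluster.local.svc.cluster.local), which suggests a resolv.conf search-path loop that can multiply every query. Second, the 150-query ceiling is dnsmasq's default for its dns-forward-max option; raising it would only be a mitigation, not a fix for the loop. A sketch for sizing the problem (the pod name is taken from the log above; run it against each node's pod):

```shell
# How often has the limit been hit on this node's dnsmasq pod?
kubectl --namespace=kube-system logs po/dnsmasq-6c7t1 \
  | grep -c 'Maximum number of concurrent DNS queries'

# dnsmasq accepts a higher ceiling via its configuration, e.g.:
#   dns-forward-max=500   # default is 150
```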

OS ansible host: Darwin 16.4.0 x86_64
nodes (Ubuntu installs)

Linux 4.4.0-63-generic x86_64
NAME="Ubuntu"
VERSION="16.04.2 LTS (Xenial Xerus)"
...

Version of Ansible : ansible 2.2.1.0
Kargo version (commit): 410438a

I reset the cluster and brought it back up, and so far I am not seeing those errors in the dnsmasq pod logs. Crossing fingers...

This is probably fixed by @holser's tuning for kubelet monitoring and my patch for etcd heartbeat interval. If these aren't flapping, kubedns behaves much better.

One difference I just noticed was this output line in the old "non-working" cluster:
dnsmasq[1]: using nameserver 127.0.1.1#53

It's no longer in the configuration/output in the "new" cluster (just the svc/kubedns IP and the one on the external router/gateway remain).

Since OP had dns_mode: kubedns, while my setup involves dnsmasq (dns_mode: dnsmasq_kubedns), I'll stop commenting on this issue...

@mattymo
I tried with kubedns, but hosts and services still can't be resolved inside pods.

@alirezaDavid I had the same issue and it was iptables.
What I did was to remove the --iptables=false option:

vim /etc/systemd/system/docker.service.d/docker-options.conf

systemctl daemon-reload
systemctl restart docker

Had the same problem and solved it like @dberuben.
Looks like there is a problem with flannel and --iptables=false.
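This interaction can be checked directly on a node (a diagnostic sketch; assumes a root shell and that the daemon binary is named dockerd on this Docker version):

```shell
# Was Docker started with --iptables=false?
ps -o args= -C dockerd | tr ' ' '\n' | grep -x -- '--iptables=false' \
  && echo "iptables disabled in Docker" || echo "flag not set"

# Pod traffic routed by flannel needs NAT on egress; MASQUERADE rules
# should be present when Docker is allowed to manage iptables:
iptables -t nat -S POSTROUTING | grep -i masquerade
```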

@dberuben Are you using the ip variable in your inventory file (the private IP of the DigitalOcean droplet)?
I tried your solution, but the issue still appears.
Can you please share your k8s-cluster file?

I'm running into the same issue:
I0812 17:19:17.507710 1 nanny.go:108] dnsmasq[13]: Maximum number of concurrent DNS queries reached (max: 150)

/etc/systemd/system/docker.service.d/docker-options.conf doesn't exist... any ideas?
