UPDATE as of Feb 7th, 2018, by request of @bboreham: I've edited the title so as not to mislead people looking for an unrelated issue.
When I deploy some demo application, I get the same message as above (Error syncing pod, skipping: failed to "SetupNetwork").
When I check the logs of the proxy pod (kubectl logs kube-proxy-g7qh1 --namespace=kube-system), I get the following info: proxier.go:254] clusterCIDR not specified, unable to distinguish between internal and external traffic
@damaspi I have opened this issue and provided a fix. Waiting on feedback!
Also, moving to userspace mode brings quite a performance penalty.
Sorry, I commented in the wrong issue.
Thanks for the fix. I won't be able to test it soon though (I was working on this during the holidays, and I'm back at work now), and I was only using the official stable version (so I don't have the environment to build it).
I copied it here now and deleted it in the other...
I worked around this temporarily by configuring proxy-mode to userspace, but any advice is welcome...
(inspired by this issue )
kubectl -n kube-system get ds -l "component=kube-proxy" -o json | jq ".items[0].spec.template.spec.containers[0].command |= .+ [\"--proxy-mode=userspace\"]" | kubectl apply -f - && kubectl -n kube-system delete pods -l "component=kube-proxy"
Again, @damaspi
Also, moving to userspace mode brings quite a performance penalty.
I had the same issue.
My Kube-Proxy would not install the Service related rules, making any service unavailable from the pods.
My fix was to modify the kubeadm DaemonSet for kube-proxy and explicitly add the --cluster-cidr= option.
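For reference, a minimal sketch of that kind of change (the DaemonSet name and the CIDR value below are assumptions; substitute whatever pod network CIDR your cluster actually uses):
kubectl -n kube-system edit ds kube-proxy
# then add a flag like the following to the kube-proxy container's command:
#   --cluster-cidr=10.244.0.0/16   (example value only)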
/cc @luxas
@spxtr you are closing a bunch of issues in this repo
@mikedanese PRs are being merged, and there was a PR merged that fixed the lack of the --cluster-cidr flag in controller-manager.
@pires, the merge of the PR in the main repo is not what closed this PR. It was the merge in @spxtr's branch. That's what concerns me.
Ah I've seen it before indeed.
I have seen this on 1.5.2. I am manually building a cluster (to learn). I am unclear what the fix is, as there is mention of controller-manager and a daemon set; that implies to me that people are launching kube-proxy via a DaemonSet. Just to clarify, the actual fix is to add the flag (--cluster-cidr) to kube-proxy, correct? Just trying to make sure I am not missing something. Also, to refresh my memory, didn't kube-proxy use to get this from the kube-apiserver? Was it always needed? I can't remember. If it doesn't, can someone clarify the difference between --service-cluster-ip-range=10.0.0.0/16 (api) and --cluster-cidr (proxy)? Thanks. (Sorry to add here, not sure where else to ask about this issue.)
Where did the API server expose the cluster pod CIDR? This was a misconception on my side as well.
Hi @pires, I thought --service-cluster-ip-range=10.0.0.0/16 on the api-server set it all up, since the proxies would talk to the k8s server to get that information. Maybe --cluster-cidr was meant to be a subset of --service-cluster-ip-range; otherwise it seems redundant, or there is a use case that I am unclear about (or I just don't know what I am talking about, which could be true!)
Service CIDR is the subnet used for virtual IPs (used by kube-proxy). The problem is that kube-proxy doesn't know about the pod network CIDR, which is different from the service CIDR.
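To make the distinction concrete, here is an illustrative pairing of the two ranges (the values are examples only; the ranges must not overlap, and the pod CIDR depends on your CNI choice):
--service-cluster-ip-range=10.96.0.0/12   # kube-apiserver: virtual IPs for Services (service CIDR)
--cluster-cidr=10.32.0.0/12               # kube-proxy: pod network CIDR (e.g. Weave Net's default)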
Ah, so would that be the overlay?
Would this issue cause communication problems between a pod and the api-server? For example, if I run the curl command from a kube pod to the apiserver, curl https://10.96.0.1:443/api, the result is: curl: (7) Failed to connect to 10.96.0.1 port 443: Connection timed out...
@bamb00, yes that symptom is caused by this bug.
For those interested, what's happening is that without knowledge of the cluster subnet, kube-proxy can't generate iptables conditions to match external traffic. Without those conditions, the traffic doesn't get marked for SNAT and gets put on the wire with the correct destination address but incorrect source.
Demonstration of the missing rules:
--- /root/ipt.old 2017-02-22 09:26:48.666151853 +0000
+++ /root/ipt.new 2017-02-22 09:25:52.010151853 +0000
@@ -27,8 +27,11 @@
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SEP-EHDRCCD3XO3VA5ZU -s 192.168.1.4/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-EHDRCCD3XO3VA5ZU -p tcp -m comment --comment "default/kubernetes:https" -m recent --set --name KUBE-SEP-EHDRCCD3XO3VA5ZU --mask 255.255.255.255 --rsource -m tcp -j DNAT --to-destination 192.168.1.4:6443
+-A KUBE-SERVICES ! -s 10.32.0.0/12 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
+-A KUBE-SERVICES ! -s 10.32.0.0/12 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
+-A KUBE-SERVICES ! -s 10.32.0.0/12 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-EHDRCCD3XO3VA5ZU --mask 255.255.255.255 --rsource -j KUBE-SEP-EHDRCCD3XO3VA5ZU
@@ -72,4 +75,6 @@
-A WEAVE-NPC -d 224.0.0.0/4 -j ACCEPT
-A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-DEFAULT
-A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-INGRESS
+-A WEAVE-NPC-DEFAULT -m set --match-set weave-k?Z;25^M}|1s7P3|H9i;*;MhG dst -j ACCEPT
+-A WEAVE-NPC-DEFAULT -m set --match-set weave-iuZcey(5DeXbzgRFs8Szo]<@p dst -j ACCEPT
COMMIT
This can be fixed at runtime by modifying @damaspi's command from above, replacing --proxy-mode=userspace with --cluster-cidr=your_cidr
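For example, the adapted one-liner would look something like this (your_cidr is a placeholder for your actual pod network CIDR):
kubectl -n kube-system get ds -l "component=kube-proxy" -o json | jq ".items[0].spec.template.spec.containers[0].command |= .+ [\"--cluster-cidr=your_cidr\"]" | kubectl apply -f - && kubectl -n kube-system delete pods -l "component=kube-proxy"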
Currently building kubeadm with the merged patch; I will re-bootstrap with that and report back on its success.
@predakanga, thanks for responding and for the explanation. I'm struggling to understand a connection timing out to the apiserver from a pod. What's puzzling to me is that the timeout error occurs for a pod running on a non-master node (AWS), while a pod running on the master node does not time out. I want to apply the suggested workaround but have a question: how do I get the value of your_cidr for --cluster-cidr?
Workaround:
kubectl -n kube-system get ds -l "component=kube-proxy" -o json | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--cluster-cidr=your_cidr"]' | kubectl apply -f - && kubectl -n kube-system delete pods -l "component=kube-proxy"
Here is the timed out log:
2017-02-22T16:23:44.200770003Z 2017-02-22 16:23:44 +0000 [info]: starting fluentd-0.12.31
2017-02-22T16:23:44.281836006Z 2017-02-22 16:23:44 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '1.9.2'
2017-02-22T16:23:44.281862309Z 2017-02-22 16:23:44 +0000 [info]: gem 'fluent-plugin-journal-parser' version '0.1.0'
2017-02-22T16:23:44.281867643Z 2017-02-22 16:23:44 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '0.26.2'
2017-02-22T16:23:44.281873256Z 2017-02-22 16:23:44 +0000 [info]: gem 'fluent-plugin-record-reformer' version '0.8.3'
2017-02-22T16:23:44.281876742Z 2017-02-22 16:23:44 +0000 [info]: gem 'fluentd' version '0.12.31'
2017-02-22T16:23:44.281976520Z 2017-02-22 16:23:44 +0000 [info]: adding filter pattern="kubernetes." type="kubernetes_metadata"
2017-02-22T16:24:44.639919409Z 2017-02-22 16:24:44 +0000 [error]: config error file="/fluentd/etc/fluent.conf" error="Invalid Kubernetes API v1 endpoint https://10.96.0.1:443/api: Timed out connecting to server"
2017-02-22T16:24:44.641926923Z 2017-02-22 16:24:44 +0000 [info]: process finished code=256
2017-02-22T16:24:44.641936546Z 2017-02-22 16:24:44 +0000 [error]: fluentd main process died unexpectedly. restarting.
As you can see, the timeout is pointing to https://10.96.0.1:443/api, but according to the kubernetes service the apiserver endpoint is 10.43.0.20:6443. From what I understand from your explanation, the timeout error is because kube-proxy can't generate iptables conditions to match external traffic.
Why is the connection going through 10.96.0.1:443 and not the endpoint 10.43.0.20:6443?
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Selector:
Type: ClusterIP
IP: 10.96.0.1
Port: https 443/TCP
Endpoints: 10.43.0.20:6443
Session Affinity: ClientIP
No events.
Update: Applying the workaround fixes the "clusterCIDR not specified, unable to distinguish between internal and external traffic" warning.
Thanks.
Sorry for the newbie question, but how do I apply the fix permanently?
We're using kube-proxy v1.5.3 (gcr.io/google_containers/kube-proxy-amd64:v1.5.3) but still seeing the error "clusterCIDR not specified, unable to distinguish between internal and external traffic".
According to the URL below, the fix is in for kube-proxy v1.5:
https://github.com/dchen1107/kubernetes-1/commit/9dedf92d42028e1bbb4d6aae66b353697afaa55b
Is this correct?
@bamb00 this is not a kube-proxy fix but a kubeadm change that sets a flag in the kube-proxy pod manifest. kubeadm doesn't follow (yet!) the kubernetes release process, so there's no kubeadm 1.5.3. There will be a 1.6.
I deployed 1.5.2 and see this warning: proxy: clusterCIDR not specified, unable to distinguish between internal and external traffic
@timchenxiaoyu With v1.6 you can pass the --pod-network-cidr flag to set that.
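For example (the value is only an illustration; use the range your pod network expects, e.g. Weave Net's default):
kubeadm init --pod-network-cidr=10.32.0.0/12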
Hi, I work on Weave Net, and I cannot understand why a change would be necessary as described at https://github.com/kubernetes/kubeadm/issues/102#issuecomment-281617189
Weave Net installs its own masquerading rules on exit from the pod network, so should not need kube-proxy to do it too.
@bboreham I can't speak to the why, but from memory the issue was that the weave-net daemonset pods couldn't talk to each other.
I'll re-up my environment so I can give you more details, but that may take a while (Australian internet)
Can't see why this issue should be closed - could a maintainer please re-open it?
/cc @luxas
I've just checked a working Kubernetes+WeaveNet cluster, and it has the same message in kube-proxy's logs
W0404 12:24:17.175005 1 proxier.go:309] clusterCIDR not specified, unable to distinguish between internal and external traffic
So I would conclude that the warning is unnecessarily scaring people.
the issue was that the weave-net daemonset pods couldn't talk to each other.
@predakanga the Weave Net implementation runs in the host network namespace; it definitely shouldn't be impacted by (or going anywhere near) kube-proxy rules when the pods talk to each other.
@bboreham Ah, I was misremembering, my apologies.
I think I know what the actual issue was, but I'll hold off until two of these nodes finish bootstrapping so I can confirm
@bboreham The issue is that the weave pod is trying to reach the API server through its VIP.
I've just bootstrapped a two-node cluster with no special options and https://git.io/weave-kube-1.6 applied, and reached an error condition.
Syslog output:
Apr 4 13:48:45 frontend kubelet[17891]: I0404 13:48:45.775425 17891 operation_generator.go:597] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/a96a9cfb-193b-11e7-b8e0-02fc636bbb90-weave-net-token-p3qrx" (spec.Name: "weave-net-token-p3qrx") pod "a96a9cfb-193b-11e7-b8e0-02fc636bbb90" (UID: "a96a9cfb-193b-11e7-b8e0-02fc636bbb90").
Apr 4 13:48:45 frontend kubelet[17891]: I0404 13:48:45.984975 17891 kuberuntime_manager.go:458] Container {Name:weave Image:weaveworks/weave-kube:1.9.4 Command:[/home/weave/launch.sh] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:10 scale:-3} d:{Dec:<nil>} s:10m Format:DecimalSI}]} VolumeMounts:[{Name:weavedb ReadOnly:false MountPath:/weavedb SubPath:} {Name:cni-bin ReadOnly:false MountPath:/host/opt SubPath:} {Name:cni-bin2 ReadOnly:false MountPath:/host/home SubPath:} {Name:cni-conf ReadOnly:false MountPath:/host/etc SubPath:} {Name:dbus ReadOnly:false MountPath:/host/var/lib/dbus SubPath:} {Name:lib-modules ReadOnly:false MountPath:/lib/modules SubPath:} {Name:weave-net-token-p3qrx ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/status,Port:6784,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:30,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Apr 4 13:48:45 frontend kubelet[17891]: I0404 13:48:45.986470 17891 kuberuntime_manager.go:742] checking backoff for container "weave" in pod "weave-net-x18xm_kube-system(a96a9cfb-193b-11e7-b8e0-02fc636bbb90)"
Apr 4 13:48:45 frontend kubelet[17891]: I0404 13:48:45.987392 17891 kuberuntime_manager.go:752] Back-off 5m0s restarting failed container=weave pod=weave-net-x18xm_kube-system(a96a9cfb-193b-11e7-b8e0-02fc636bbb90)
Apr 4 13:48:45 frontend kubelet[17891]: E0404 13:48:45.987440 17891 pod_workers.go:182] Error syncing pod a96a9cfb-193b-11e7-b8e0-02fc636bbb90 ("weave-net-x18xm_kube-system(a96a9cfb-193b-11e7-b8e0-02fc636bbb90)"), skipping: failed to "StartContainer" for "weave" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=weave pod=weave-net-x18xm_kube-system(a96a9cfb-193b-11e7-b8e0-02fc636bbb90)"
Apr 4 13:48:50 frontend kubelet[17891]: W0404 13:48:50.545383 17891 cni.go:157] Unable to update cni config: No networks found in /etc/cni/net.d
Apr 4 13:48:50 frontend kubelet[17891]: E0404 13:48:50.546074 17891 kubelet.go:2067] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Weave container logs:
2017/04/04 13:45:24 error contacting APIServer: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: i/o timeout; trying with fallback: http://localhost:8080
2017/04/04 13:45:24 Could not get peers: Get http://localhost:8080/api/v1/nodes: dial tcp 127.0.0.1:8080: getsockopt: connection refused
Failed to get peers
And the node never reaches "Ready" state
The issue is that the weave pod is trying to reach the API server through it's VIP
what is the evidence for that?
Get http://localhost:8080/api/v1/nodes
This is the weave pod trying to reach the api-server on an unsecured local address. This is not the configuration you get from current kubeadm with no options.
My bad again - my formatting cut off the first line of each log. I've amended them both.
ok, I have also just set up a cluster with no special options and have no trouble contacting tcp://10.96.0.1:443
However my comment about Weave Net setting up masquerading rules is not relevant. Let me see if I can figure it out.
For comparison, I've just re-run the kubeadm bootstrap with the addition of --pod-network-cidr 10.32.0.0/12, and the weave pod starts properly and the node transitions to Ready.
I suspect that you're not experiencing it because it only applies to certain network configurations - in my case I'm using Vagrant machines with the kube cluster established over secondary private-only interfaces.
In a separate conversation, I have seen this happening:
There are two network interfaces, eth0 and eth1: eth0 has the default route, but we want all traffic to kubernetes to go via eth1.
- A process, such as Weave Net, opens a connection to the service address 10.96.0.1.
- The destination address is re-mapped to the master's eth1 address 192.168.10.90 (the re-mapping is done by iptables rules created by kube-proxy).
- The packet is sent on the node's eth1 interface.
- However, Linux has already picked the eth0 source address for this packet, based on the original destination matching the default route.
- At the destination it is dropped as coming from the wrong place.
Adding the --pod-network-cidr causes an extra iptables rule to rewrite the source address, so it will now go out over the eth1 interface. [EDIT: I do not recommend this, because it's essentially an accident that it makes it work]
Another way to get it to work is to add a route telling Linux that all kubernetes service addresses are to go via eth1, like this:
ip route add 10.96.0.0/16 dev eth1 src 192.168.10.100
Personally I find the route more attractive since it makes the right decision earlier. But I'm looking for other voices to comment on whether this is valid.
@bboreham thanks for debugging with me, and thanks for updating with the findings here!
Will test the fixes in my environment.
Interesting - I can confirm that this route approach works on a single network segment, but would it cause problems across broadcast domains?
Lachlan
@bboreham I can confirm that I now have a working ansible setup by adding the routes... :-)
@predakanga all I am trying to do with the route is get Linux to pick a better source address; since we expect all service IPs to get DNATted we don't expect to actually use that route. However I can see that if underlying addresses went out on two different network adapters then my suggestion wouldn't be good.
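One way to see which source address Linux picks is ip route get; the output below is only a sketch using the addresses from the earlier example, before and after adding the route:
$ ip route get 10.96.0.1
# before the extra route: the default route wins, so something like
#   10.96.0.1 via <default-gateway> dev eth0 src <eth0-address>
# after "ip route add 10.96.0.0/16 dev eth1 src 192.168.10.100":
#   10.96.0.1 dev eth1 src 192.168.10.100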
@thockin I would value your input on my analysis at https://github.com/kubernetes/kubeadm/issues/102#issuecomment-291532883 and the two suggestions to configure SNAT (for connections originating in the host network namespace) or add a route for the service IP range.
@bboreham could you document these findings somehow and somewhere, please, so that more users know about it?
I only finished debugging @obnoxxx's system a couple of hours ago!
I'm documenting it here so someone can say "no! no! you've completely misunderstood".
If that doesn't happen, I'll happily elevate it to proper documentation :slightly_smiling_face:
In a multi-NIC multi-path case, yeah, I think you'd need a route like you suggest. Not sure how to automatically figure that out...
One more thought came to mind: this has nothing to do with the pod network (Weave Net or otherwise), because the thing that is failing is a process in a node's host namespace trying to talk to the api-server on the master. So the finding that setting the clusterCIDR makes it work must be accidental.
@bboreham @thockin Anything we can do here or can I go ahead and close this?
It's possible to set --cluster-cidr on kube-proxy by passing --pod-network-cidr to kubeadm init
As I have already described, setting --cluster-cidr is not a valid response to the issue originally reported in a comment on #74. (Although it happens to make the problem go away).
The title here is unhelpful; it relates to a warning message that has absolutely nothing to do with the underlying problem.
I don't really know what kubeadm could do, since the solution seems to relate to the underlying network. Maybe add options to inform your desired "public interface" and "private interface" and have kubeadm recommend network config changes?
I just had a look at the clusterCIDR logic in kube-proxy, and I agree that is a weird corner case.
I agree the static route is appropriate for the 2nd interface, but it's unfortunate. It feels like the kernel should be smarter than that.
I'm running v1.6.1 and thought the error "clusterCIDR not specified, unable to distinguish between internal and external traffic" would be addressed.
2017-06-06T17:49:17.113224501Z I0606 17:49:17.112870 1 server.go:225] Using iptables Proxier.
2017-06-06T17:49:17.139584294Z W0606 17:49:17.139190 1 proxier.go:309] clusterCIDR not specified, unable to distinguish between internal and external traffic
2017-06-06T17:49:17.139607413Z I0606 17:49:17.139223 1 server.go:249] Tearing down userspace rules.
2017-06-06T17:49:17.251412491Z I0606 17:49:17.251115 1 conntrack.go:81] Set sysctl 'net/netfilter/nf_conntrack_max' to 524288
2017-06-06T17:49:17.252499164Z I0606 17:49:17.252359 1 conntrack.go:66] Setting conntrack hashsize to 131072
2017-06-06T17:49:17.253220249Z I0606 17:49:17.253057 1 conntrack.go:81] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
2017-06-06T17:49:17.253246216Z I0606 17:49:17.253124 1 conntrack.go:81] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
How are internal and external traffic defined?
This error specifically refers to anything outside the cluster's pod IPs.
I've seen this problem too. Adding a route for the pod network via the second NIC resolved the issue for me. Feels a little fragile though...
Hi,
I'm running Kubernetes v1.6.6 and v1.7.0 kube-proxy and getting the same error.
kube-proxy:
W0914 00:15:41.627710 1 proxier.go:298] clusterCIDR not specified, unable to distinguish between internal and external traffic
Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.6", GitCommit:"7fa1c1756d8bc963f1a389f4a6937dc71f08ada2", GitTreeState:"clean", BuildDate:"2017-06-16T18:34:20Z", GoVersion:"go1.7.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.6", GitCommit:"7fa1c1756d8bc963f1a389f4a6937dc71f08ada2", GitTreeState:"clean", BuildDate:"2017-06-16T18:21:54Z", GoVersion:"go1.7.6", Compiler:"gc", Platform:"linux/amd64"}
I tried the workaround from @damaspi but it failed in v1.6.6 and v1.7.0; it used to work in v1.5.4.
# kubectl -n kube-system get ds -l "component=kube-proxy" -o json | jq '.items[0].spec.template.spec.containers[0].command |= .+ ["--cluster-cidr=10.96.0.0/12"]' | kubectl apply -f - && kubectl -n kube-system delete pods -l "component=kube-proxy"
error: error validating "STDIN": error validating data: items[0].apiVersion not set; if you choose to ignore these errors, turn validation off with --validate=false
Need guidance to resolve in v1.6.6 & v1.7.0. Thanks.
@bboreham
I don't really know what kubeadm could do, since the solution seems to relate to the underlying network. Maybe add options to inform your desired "public interface" and "private interface" and have kubeadm recommend network config changes?
I don't think kubeadm should be spitting out OS or distro-specific configuration instructions for host networking. I think it's the responsibility of the operator to configure their host appropriately because otherwise it becomes a rabbit hole. We can certainly make it a requirement, though.
What should kubeadm expect for things to work? That if the user wants to use a non-default NIC, they need to add a static route in Linux? Is this a general enough use-case for us to add it as a system requirement?
@bboreham Any ideas on how we can improve our documentation here? Otherwise I'm in favour of closing this because:
[Aside: it bugs me I have to read up and down and through other issues to page the context back in. The problem people wanted resolved is absolutely nothing to do with the title of this issue]
In the setup docs you could say "if you have more than one network adapter, and your Kubernetes components are not reachable on the default route, we recommend you add IP route(s) so Kubernetes cluster addresses go via the appropriate adapter".
[Aside: it bugs me I have to read up and down and through other issues to page the context back in. The problem people wanted resolved is absolutely nothing to do with the title of this issue]
You are not the only one! 😅
In the setup docs you could say "if you have more than one network adapter, and your Kubernetes components are not reachable on the default route, we recommend you add IP route(s) so Kubernetes cluster addresses go via the appropriate adapter".
Cool, I'll try to submit a docs PR for this tomorrow and close this out.
This is now documented in https://github.com/kubernetes/website/pull/6265, so I'm going to close.
This issue seems to track a few different problems at once, so if you're still running into a potential bug, please open a new issue so we can better target the root cause.
FWIW, if you use kubeadm to start the cluster and you specify --pod-network-cidr, that gets passed to kube-proxy when it starts as --cluster-cidr. For example, weave defaults to using 10.32.0.0/12, so I used kubeadm init --kubernetes-version=v1.8.4 --pod-network-cidr=10.32.0.0/12, which started kube-proxy with cluster-cidr=10.32.0.0/12.
@bboreham I'm new to this...Would there be an example on how to implement your suggestion "add IP route(s) so Kubernetes cluster addresses go via the appropriate adapter"?
@bamb00 scroll up; there is an example at https://github.com/kubernetes/kubeadm/issues/102#issuecomment-291532883
Caution: if you make a wrong step it may result in your machine becoming inaccessible. Generally this will come back after a reboot, unless you configured the bad route to be there on startup.
I do not know an easy way to learn Linux network configuration.
@mindscratch do note this issue has nothing to do with "cluster-cidr"; that was a red herring eliminated around seven months ago. Please open a new issue if you are having new problems.
Semi-serious suggestion for fixing this specific case without requiring kube-proxy to use ! -s $podCIDR to distinguish host source addresses:
$ sudo ip ro add local 10.96.0.0/12 table local dev lo
$ sudo iptables -t nat -I KUBE-SERVICES -s 10.96.0.0/12 -d 10.96.0.0/12 -j KUBE-MARK-MASQ
(or possibly some variation with an explicit ... src 10.96.0.0 on the local route... the table local is probably also unnecessary and a bad idea)
$ ip ro get 10.96.0.1
local 10.96.0.1 dev lo src 10.96.0.1
cache <local>
$ curl -vk https://10.96.0.1
...
* Connected to 10.96.0.1 (10.96.0.1) port 443 (#0)
11:32:20.671085 0c:c4:7a:54:0a:e6 > 44:aa:50:04:3d:00, ethertype IPv4 (0x0800), length 74: 10.80.4.149.59334 > 10.80.4.147.6443: Flags [S], seq 2286812584, win 43690, options [mss 65495,sackOK,TS val 209450 ecr 0,nop,wscale 8], length 0
11:32:20.671239 44:aa:50:04:3d:00 > 0c:c4:7a:54:0a:e6, ethertype IPv4 (0x0800), length 74: 10.80.4.147.6443 > 10.80.4.149.59334: Flags [S.], seq 1684666695, ack 2286812585, win 28960, options [mss 1460,sackOK,TS val 208877 ecr 209450,nop,wscale 8], length 0
11:32:20.671315 0c:c4:7a:54:0a:e6 > 44:aa:50:04:3d:00, ethertype IPv4 (0x0800), length 66: 10.80.4.149.59334 > 10.80.4.147.6443: Flags [.], ack 1, win 171, options [nop,nop,TS val 209450 ecr 208877], length 0
However, I have no idea if that covers all of the expected behaviors of those source-specific kube-proxy MASQ rules...
EDIT: this also has all kinds of side-effects for connections to unconfigured service VIPs... they will end up connecting to any matching host network namespace services.
EDIT2: However, even that is probably better than the current behavior of leaking connections to unconfigured 10.96.X.Y service VIPs out via the default route... which is vaguely unsettling