Agent connects to the k3s server via the server host's WAN IP, when the server runs in Docker
ali-vm1, the server's host WAN IP:port: 47.98.xxx.xxx:7441
Running with docker-compose, the container's IP:
[root@ali-vm1 v070-t]# dcp exec server bash
[root@k3-server /]#
[root@k3-server /]# ip a |grep inet
inet 127.0.0.1/8 scope host lo
inet 2.3.1.2/24 brd 2.3.1.255 scope global eth0
On hw-vm1:
[root@hw-vm1 v070]# dcp -f node.yml up
Recreating v070_node_1 ... done
Attaching to v070_node_1
node_1 | time="2019-08-09T14:00:52.000382372+08:00" level=info msg="Starting k3s agent v0.7.0 (61bdd852)"
node_1 | time="2019-08-09T14:00:54.104057009+08:00" level=info msg="Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log"
node_1 | time="2019-08-09T14:00:54.104348928+08:00" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd"
node_1 | time="2019-08-09T14:00:54.104841131+08:00" level=info msg="Waiting for containerd startup: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: no such file or directory\""
node_1 | time="2019-08-09T14:00:55.107259392+08:00" level=info msg="module br_netfilter was already loaded"
node_1 | time="2019-08-09T14:00:55.107335659+08:00" level=info msg="module overlay was already loaded"
node_1 | time="2019-08-09T14:00:55.107344421+08:00" level=info msg="module nf_conntrack was already loaded"
node_1 | time="2019-08-09T14:00:55.236427480+08:00" level=info msg="Connecting to proxy" url="wss://2.3.1.2:6443/v1-k3s/connect"
Got this and got stuck: the agent connects to the server using the container's IP (2.3.1.2), not the server host's WAN IP:
msg="Connecting to proxy" url="wss://2.3.1.2:6443/v1-k3s/connect"
[root@ali-vm1 v070-t]# cat docker-compose.yml
version: '2'
services:
server:
image: reg.xx.com/k-spe/att-k3s:v070
command: server --disable-agent --cluster-cidr=7.0.0.0/16 --service-cidr=6.7.8.0/23 --cluster-domain=t2.k3s --tls-san=47.98.xxx.xxx --kube-apiserver-arg log-file=/tmp/kubeapi.log --kube-apiserver-arg bind-address=0.0.0.0 --no-deploy=traefik --no-deploy=servicelb
#...
privileged: true
ports:
- "7441:6443"
[root@hw-vm1 v070]# cat node.yml
version: '2'
services:
node:
image: reg.xx.com/k-spe/att-k3s:v070
command: agent --kubelet-arg="address=0.0.0.0"
privileged: true
network_mode: "host"
environment:
- K3S_URL=https://47.98.xxx.xxx:7441
- K3S_CLUSTER_SECRET=somethingtotallyrandom
This works fine with k3s v0.4.0.
pkg/agent/tunnel/tunnel.go line 74:
addresses := []string{config.ServerAddress}
endpoint, _ := client.CoreV1().Endpoints("default").Get("kubernetes", metav1.GetOptions{})
if endpoint != nil {
	addresses = getAddresses(endpoint)
	if onChange != nil {
		onChange(addresses)
	}
}
I've located the code here. Could there be a flag to skip the addresses from the k8s endpoints when not in HA mode?
The address from k8s differs from the actually reachable one when the k3s server runs in Docker or behind NAT and the k3s agent connects from outside the server's LAN.
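For illustration only, the guard being asked for might look like this (DisableEndpointLookup is a hypothetical flag, not an existing k3s option):

addresses := []string{config.ServerAddress}
// Hypothetical flag: skip the endpoint lookup so the agent keeps
// dialing the configured (NAT-reachable) server address.
if !config.DisableEndpointLookup {
	endpoint, _ := client.CoreV1().Endpoints("default").Get("kubernetes", metav1.GetOptions{})
	if endpoint != nil {
		addresses = getAddresses(endpoint)
		if onChange != nil {
			onChange(addresses)
		}
	}
}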
@erikwilson
We can probably add a flag to disable the load-balancer, but there are a couple concerns I have.
The load balancer should fail over to the original server url, so it should eventually connect if the endpoints are not routable.
The endpoints should be routable though. I am guessing you are using the flags you are for a reason, but I suspect there is a larger configuration issue.
For rapid validation, I've just commented this out:
pkg/agent/tunnel/tunnel.go line 74:
addresses := []string{config.ServerAddress}
/*endpoint, _ := client.CoreV1().Endpoints("default").Get("kubernetes", metav1.GetOptions{})
if endpoint != nil {
addresses = getAddresses(endpoint)
if onChange != nil {
onChange(addresses)
}
}*/
The NAT'ed cluster is working now:
[root@(⎈ |default:default) ~]$ kc get node
NAME STATUS ROLES AGE VERSION
ali-vm1 Ready worker 7d22h v1.14.4-k3s.1
hw-vm1 Ready worker 77s v1.14.5-k3s.1
[root@(⎈ |default:default) ~]$ kc get pod -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cattle-system cattle-cluster-agent-679b8c965d-sxl5r 1/1 Running 0 6d13h 7.0.0.17 ali-vm1 <none> <none>
cattle-system cattle-node-agent-9t7rn 1/1 Running 0 87s 192.168.0.105 hw-vm1 <none> <none>
cattle-system cattle-node-agent-v46c2 1/1 Running 0 6d13h 172.16.168.255 ali-vm1 <none> <none>
However, the tunnel endpoint watch still fires, pushing the container's address back into the load balancer, and the agent times out dialing it:
node_1 | I0810 09:17:52.061338 6 iptables.go:155] Adding iptables rule: ! -s 7.0.0.0/16 -d 7.0.1.0/24 -j RETURN
node_1 | I0810 09:17:52.063115 6 iptables.go:155] Adding iptables rule: ! -s 7.0.0.0/16 -d 7.0.0.0/16 -j MASQUERADE --random-fully
node_1 | time="2019-08-10T09:17:53.625946781+08:00" level=info msg="Tunnel endpoint watch event: [2.3.0.2:6443]"
node_1 | time="2019-08-10T09:17:53.625970645+08:00" level=info msg="Updating load balancer server addresses -> [2.3.0.2:6443 47.98.xxx.xxx:7442]"
node_1 | time="2019-08-10T09:17:53.626179200+08:00" level=info msg="Stopped tunnel to 127.0.0.1:22104"
node_1 | time="2019-08-10T09:17:53.626202055+08:00" level=info msg="Connecting to proxy" url="wss://2.3.0.2:6443/v1-k3s/connect"
node_1 | time="2019-08-10T09:17:53.626327829+08:00" level=info msg="Proxy done" err="context canceled" url="wss://127.0.0.1:22104/v1-k3s/connect"
node_1 | time="2019-08-10T09:20:00.945914731+08:00" level=error msg="Failed to connect to proxy" error="dial tcp 2.3.0.2:6443: connect: connection timed out"
node_1 | time="2019-08-10T09:20:00.946800772+08:00" level=error msg="Remotedialer proxy error" error="dial tcp 2.3.0.2:6443: connect: connection timed out"
node_1 | time="2019-08-10T09:20:05.946919710+08:00" level=info msg="Connecting to proxy" url="wss://2.3.0.2:6443/v1-k3s/connect"
node_1 | time="2019-08-10T09:22:13.288332093+08:00" level=error msg="Failed to connect to proxy" error="dial tcp 2.3.0.2:6443: connect: connection timed out"
node_1 | time="2019-08-10T09:22:13.288366468+08:00" level=error msg="Remotedialer proxy error" error="dial tcp 2.3.0.2:6443: connect: connection timed out"
node_1 | time="2019-08-10T09:22:18.288474629+08:00" level=info msg="Connecting to proxy" url="wss://2.3.0.2:6443/v1-k3s/connect"
node_1 | W0810 09:22:48.816723 6 info.go:52] Couldn't collect info from any of the files in "/etc/machine-id,/var/lib/dbus/machine-id"
node_1 | time="2019-08-10T09:24:25.512354666+08:00" level=error msg="Failed to connect to proxy" error="dial tcp 2.3.0.2:6443: connect: connection timed out"
node_1 | time="2019-08-10T09:24:25.512389515+08:00" level=error msg="Remotedialer proxy error" error="dial tcp 2.3.0.2:6443: connect: connection timed out"
node_1 | time="2019-08-10T09:24:30.512490123+08:00" level=info msg="Connecting to proxy" url="wss://2.3.0.2:6443/v1-k3s/connect"
node_1 | time="2019-08-10T09:26:37.736346722+08:00" level=error msg="Failed to connect to proxy" error="dial tcp 2.3.0.2:6443: connect: connection timed out"
node_1 | time="2019-08-10T09:26:37.736381292+08:00" level=error msg="Remotedialer proxy error" error="dial tcp 2.3.0.2:6443: connect: connection timed out"
node_1 | time="2019-08-10T09:26:42.736474014+08:00" level=info msg="Connecting to proxy" url="wss://2.3.0.2:6443/v1-k3s/connect"
So for validation I also commented out the endpoint handling in the watch loop, keeping only the configured server address:
/*endpoint, ok := ev.Object.(*v1.Endpoints)
if !ok {
logrus.Errorf("Tunnel could not case event object to endpoint: %v", ev)
continue watching
}*/
//newAddresses := getAddresses(endpoint)
newAddresses := []string{config.ServerAddress}
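For context, the surrounding watch loop looks roughly like this (a paraphrase of k3s v0.7.0 pkg/agent/tunnel/tunnel.go, not verbatim; watchChan and the comparison details are approximations):

watching:
	for ev := range watchChan {
		endpoint, ok := ev.Object.(*v1.Endpoints)
		if !ok {
			logrus.Errorf("Tunnel could not case event object to endpoint: %v", ev)
			continue watching
		}
		// Unpatched: dial targets are re-derived from the Endpoints
		// object on every event, replacing the configured server URL.
		newAddresses := getAddresses(endpoint)
		if reflect.DeepEqual(newAddresses, addresses) {
			continue watching
		}
		addresses = newAddresses
		logrus.Infof("Tunnel endpoint watch event: %v", addresses)
		if onChange != nil {
			onChange(addresses) // feeds the agent load balancer
		}
	}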
With the watch also patched, the agent comes up cleanly:
node_1 | I0810 10:15:33.218430 6 conntrack.go:52] Setting nf_conntrack_max to 131072
node_1 | I0810 10:15:33.218815 6 config.go:202] Starting service config controller
node_1 | I0810 10:15:33.218834 6 controller_utils.go:1027] Waiting for caches to sync for service config controller
node_1 | I0810 10:15:33.218843 6 config.go:102] Starting endpoints config controller
node_1 | I0810 10:15:33.218850 6 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
node_1 | I0810 10:15:33.295689 6 kuberuntime_manager.go:950] updating runtime config through cri with podcidr 7.0.1.0/24
node_1 | I0810 10:15:33.296182 6 kubelet_network.go:69] Setting Pod CIDR: -> 7.0.1.0/24
node_1 | I0810 10:15:33.300242 6 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "k8s-ssl" (UniqueName: "kubernetes.io/host-path/d68fb52a-bb0c-11e9-9aa5-024202030002-k8s-ssl") pod "cattle-node-agent-9t7rn" (UID: "d68fb52a-bb0c-11e9-9aa5-024202030002")
node_1 | I0810 10:15:33.300279 6 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "var-run" (UniqueName: "kubernetes.io/host-path/d68fb52a-bb0c-11e9-9aa5-024202030002-var-run") pod "cattle-node-agent-9t7rn" (UID: "d68fb52a-bb0c-11e9-9aa5-024202030002")
node_1 | I0810 10:15:33.300306 6 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "run" (UniqueName: "kubernetes.io/host-path/d68fb52a-bb0c-11e9-9aa5-024202030002-run") pod "cattle-node-agent-9t7rn" (UID: "d68fb52a-bb0c-11e9-9aa5-024202030002")
node_1 | I0810 10:15:33.300324 6 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "cattle-credentials" (UniqueName: "kubernetes.io/secret/d68fb52a-bb0c-11e9-9aa5-024202030002-cattle-credentials") pod "cattle-node-agent-9t7rn" (UID: "d68fb52a-bb0c-11e9-9aa5-024202030002")
node_1 | I0810 10:15:33.300343 6 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "cattle-token-2f5xk" (UniqueName: "kubernetes.io/secret/d68fb52a-bb0c-11e9-9aa5-024202030002-cattle-token-2f5xk") pod "cattle-node-agent-9t7rn" (UID: "d68fb52a-bb0c-11e9-9aa5-024202030002")
node_1 | I0810 10:15:33.300350 6 reconciler.go:154] Reconciler: start to sync state
node_1 | I0810 10:15:33.320760 6 controller_utils.go:1034] Caches are synced for endpoints config controller
node_1 | I0810 10:15:33.392095 6 kubelet_node_status.go:112] Node hw-vm1 was previously registered
node_1 | I0810 10:15:33.392114 6 kubelet_node_status.go:73] Successfully registered node hw-vm1
node_1 | I0810 10:15:33.418924 6 controller_utils.go:1034] Caches are synced for service config controller
node_1 | I0810 10:15:34.169997 6 kube.go:134] Node controller sync successful
node_1 | I0810 10:15:34.171148 6 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
node_1 | I0810 10:15:34.174177 6 flannel.go:75] Wrote subnet file to /run/flannel/subnet.env
node_1 | I0810 10:15:34.174185 6 flannel.go:79] Running backend.
node_1 | I0810 10:15:34.174192 6 vxlan_network.go:60] watching for new subnet leases
Yah, the reverse tunnel uses that also, so not really a load-balancer issue. Sounds like your endpoints should be routable.
> Yah, the reverse tunnel uses that also, so not really a load-balancer issue. Sounds like your endpoints should be routable.
Thanks for the work and the reply.
Consider this scenario:
For convenience, the server runs in Docker (not network_mode: host), exposes the 6443 port outside the host machine, and uses HA mode.
Could we have a static configuration of the HA master nodes, supplied via a config file or config parameters? This loses the dynamic tracking of the masters' IPs, but those IPs do not change often.
Or any better ideas?
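Illustratively, such a static configuration on the agent side might look like this (K3S_SERVER_ADDRESSES is hypothetical, not an existing k3s option):

environment:
  - K3S_URL=https://47.98.xxx.xxx:7441
  # hypothetical: a fixed fail-over list used instead of the watched endpoints
  - K3S_SERVER_ADDRESSES=https://47.98.xxx.xxx:7441,https://<second-master>:7441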
I am curious, what is the purpose of setting --kube-apiserver-arg bind-address=0.0.0.0 and --kubelet-arg="address=0.0.0.0"? What network devices are available?
> I am curious, what is the purpose of setting --kube-apiserver-arg bind-address=0.0.0.0 and --kubelet-arg="address=0.0.0.0"? What network devices are available?
My current architecture:
running in Docker without network_mode: host; just exposing the inner 6443 port.
[root@hw-vm1 ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8f676fb76ea3 reg.xxx.com/k-xxx/att-k3s-prs "/entry.sh agent -..." 2 minutes ago Up 2 minutes v070_node_1
3326ae640eb6 rancher/rancher:v2.2.6 "entrypoint.sh" 6 days ago Up 6 days 0.0.0.0:8880->80/tcp, 0.0.0.0:8443->443/tcp rancher
[root@hw-vm1 ~]# docker exec -it v070_node_1 bash
[root@hw-vm1 /]# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 10:15 ? 00:00:00 bash /entry.sh agent --kubelet-arg=address=0.0.0.0 --pause-image=registry.cn-hangzhou.aliyuncs.
rpc 5 1 0 10:15 ? 00:00:00 rpcbind -f
root 6 1 1 10:15 ? 00:00:02 k3s agent --kubelet-arg=address=0.0.0.0 --pause-image=registry.cn-hangzhou.aliyuncs.com/google_
root 16 6 0 10:15 ? 00:00:00 containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/cont
root 77 16 0 10:15 ? 00:00:00 containerd-shim -namespace k8s.io -workdir /var/lib/rancher/k3s/agent/containerd/io.containerd.
root 94 77 0 10:15 ? 00:00:00 /pause
root 125 16 0 10:15 ? 00:00:00 containerd-shim -namespace k8s.io -workdir /var/lib/rancher/k3s/agent/containerd/io.containerd.
root 141 125 0 10:15 ? 00:00:00 agent
root 320 0 0 10:17 ? 00:00:00 bash
root 361 320 0 10:17 ? 00:00:00 ps -ef
[root@hw-vm1 /]# pstree
bash-+-k3s-agent---containerd-+-containerd-shim---pause
| `-containerd-shim---agent
`-rpcbind
[root@hw-vm1 /]# k3s crictl ps
CONTAINER ID IMAGE CREATED STATE NAME ATTEMPT POD ID
432a39da4d6e8 ce6bb2c8f5c81 2 minutes ago Running agent 1 d6aa890dc8212
I started using k3s at version v0.3.x; setting --kube-apiserver-arg bind-address=0.0.0.0 and --kubelet-arg="address=0.0.0.0" just carried over from previous experience.
I have not yet tested the result of dropping these two flags on my current deployment architecture.
After the above change, there are just a few warnings about failing to read the machine-id:
node_1 | W0810 10:20:33.124208 6 info.go:52] Couldn't collect info from any of the files in "/etc/machine-id,/var/lib/dbus/machine-id"
node_1 | W0810 10:25:33.110251 6 info.go:52] Couldn't collect info from any of the files in "/etc/machine-id,/var/lib/dbus/machine-id"
node_1 | W0810 10:30:33.110257 6 info.go:52] Couldn't collect info from any of the files in "/etc/machine-id,/var/lib/dbus/machine-id"
Awesome, looks like it is working! The kube-apiserver flag may have been needed at one point for metrics server to work, but is probably causing problems with the current configuration.
Hopefully that helps, might be worth checking out https://github.com/rancher/k3d also. If there is any more info I can give please let me know.
> Hopefully that helps, might be worth checking out https://github.com/rancher/k3d also. If there is any more info I can give please let me know.
I will, thx~
> Awesome, looks like it is working! The kube-apiserver flag may have been needed at one point for metrics server to work, but is probably causing problems with the current configuration.
Yes, truly it is. In this mode you can only run some standalone pods on the LAN node, or you need to add routes to the Kubernetes cluster from the other nodes (as my hw-vm1 and ali-vm1 are both VPC-mode VMs, the node's IP is the VM's LAN IP, not the WAN IP):
[root@(⎈ |default:kube-system) ~]$ kc logs -f metrics-server-f5896c776-xwgsc
I0810 04:01:54.980525 1 serving.go:273] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
W0810 04:01:56.352480 1 authentication.go:245] Unable to get configmap/extension-apiserver-authentication in kube-system. Usually fixed by 'kubectl create rolebinding -n kube-system ROLE_NAME --role=extension-apiserver-authentication-reader --serviceaccount=YOUR_NS:YOUR_SA'
Error: Get https://6.7.8.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 6.7.8.1:443: connect: no route to host
panic: Get https://6.7.8.1:443/api/v1/namespaces/kube-system/configmaps/extension-apiserver-authentication: dial tcp 6.7.8.1:443: connect: no route to host
goroutine 1 [running]:
main.main()
/go/src/github.com/kubernetes-incubator/metrics-server/cmd/metrics-server/metrics-server.go:39 +0x13b
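For illustration, making the service CIDR (6.7.8.0/23, per the server flags above) reachable from such a node would mean adding a route by hand; an untested sketch, where the via address is a placeholder for a host that can actually reach the cluster network:

ip route add 6.7.8.0/23 via <lan-ip-of-a-routable-cluster-node>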
[root@(⎈ |default:default) ~]$ kc describe svc kubernetes
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: 6.7.8.1
Port: https 443/TCP
TargetPort: 6443/TCP
Endpoints: 2.3.0.2:6443
Session Affinity: None
Events: <none>
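As a quick check, the address agents will learn can be read straight from the Endpoints object (using the kc alias for kubectl as above); here it would print the unroutable 2.3.0.2:

kc get endpoints kubernetes -n default -o jsonpath='{.subsets[*].addresses[*].ip}'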
My former cluster used k3s v0.4.0 (no HA feature):
[root@(⎈ |default:default) ~]$ kc describe svc kubernetes
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: 6.7.8.1
Port: https 443/TCP
TargetPort: 6445/TCP
Endpoints: 127.0.0.1:6445
Session Affinity: None
Events: <none>
We can see the kubernetes Endpoints change:
1. the IP changed from the former localhost to the Docker container's IP;
2. the port changed from the former 6445 to 6443.
Updates:
kilo to keep NAT nodes' connections: squat/kilo#12 (with network_mode: "host" too).
is there any k3s-agnent cli flags can read node public ip and use it?
thanks
@huapox hi, I use Aliyun VMs and met this problem too. Did you solve it by using kilo in the end?
Are there any k3s-agent CLI flags that can read the node's public IP and use it?
Thanks