What kind of request is this (question/bug/enhancement/feature request): Bug
Steps to reproduce (fewest steps possible):
Upgrade from v2.2.3 to v2.2.4
Result:
Rancher Server fails to start with the following error messages:
E0606 07:39:20.296926 8 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v3.Cluster: json: cannot unmarshal number into Go value of type string
Other details that may be helpful:
After downgrading to v2.2.3, Rancher runs fine again.
Environment information
rancher/rancher image tag (or shown bottom left in the UI): v2.2.4
Ran into the same issue myself trying to update, although mine is an HA setup with my own certs.
@ptrunk do you also have your own certs as @DJAyth mentioned in their setup?
I am not able to reproduce the issue. I am using self-signed certificates.
@davidnuzik yes, I use Traefik in front of Rancher with a Let's Encrypt certificate, so my (single) Rancher instance is running with --no-cacerts.
Thanks @ptrunk I or someone on my team will investigate further.
@ptrunk can you share the output of kubectl get cluster --all-namespaces -o json (with the sensitive info obfuscated)?
@ptrunk @DJAyth Do you have an OpenStack LB configured?
Yes, as a matter of fact I do. The Rancher setup sits behind an OpenStack LB, but the root cluster that gets created during the Rancher setup knows nothing of OpenStack, to my knowledge.
Also, my SSL setup terminates at Rancher itself on the nginx-ingress, which is different from @ptrunk's setup, I believe.
@DJAyth our theory is that the bug can be caused by https://github.com/rancher/types/commit/17ad82ec47246e37d734cf9e87ee6590ea60315d - a fix for https://github.com/rancher/rancher/issues/14577. If your user cluster objects have OpenStack with an LB provider configured, and the monitorDelay/monitorTimeout parameters set, those values would have to be changed from int to string. We are working on reproducing the issue and figuring out steps to mitigate; will update the bug shortly.
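For illustration, the stricter decoding can be mimicked with a minimal sketch (Python, hypothetical helper names, not Rancher code): the v2.2.4 schema declares these fields as strings, so a cluster object persisted by v2.2.3 with numeric values fails to decode, analogous to Go's json.Unmarshal error above.

```python
import json

# Fields whose type changed from int to string in the v2.2.4 schema
# (hypothetical subset, mirroring the rancher/types commit linked above).
STRING_FIELDS = ("monitor-delay", "monitor-timeout")

def validate_lb_opts(raw: str) -> dict:
    """Decode loadBalancer options, rejecting numeric values for fields
    the new schema declares as strings - analogous to Go's json.Unmarshal
    failing with 'cannot unmarshal number into Go value of type string'."""
    opts = json.loads(raw)
    for field in STRING_FIELDS:
        if field in opts and not isinstance(opts[field], str):
            raise TypeError(
                f"cannot unmarshal number into value of type string: {field}")
    return opts

# An object written by v2.2.3 (ints) fails; a migrated one (strings) passes.
old = '{"monitor-delay": 0, "monitor-timeout": 0}'
new = '{"monitor-delay": "", "monitor-timeout": ""}'
```

Decoding `new` succeeds while `old` raises, which matches the observed behavior of upgrading with stale int values still stored in the cluster objects.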
@mrajashree Yes, my cluster, which is managed by Rancher, is using the OpenStack cloud provider with the OpenStack LB.
@alena1108
root@18d92dabdd1a:/var/lib/rancher# kubectl get cluster --all-namespaces -o json
{
"apiVersion": "v1",
"items": [
{
"apiVersion": "management.cattle.io/v3",
"kind": "Cluster",
"metadata": {
"annotations": {
"authz.management.cattle.io/creator-role-bindings": "{\"created\":[\"cluster-owner\"],\"required\":[\"cluster-owner\"]}",
"field.cattle.io/creatorId": "user-bkxx5",
"field.cattle.io/overwriteAppAnswers": "{\"answers\":{\"exporter-node.enabled\":\"true\",\"exporter-node.ports.metrics.port\":\"9796\",\"exporter-kubelets.https\":\"true\",\"exporter-node.resources.limits.cpu\":\"200m\",\"exporter-node.resources.limits.memory\":\"200Mi\",\"prometheus.retention\":\"24h\",\"grafana.persistence.enabled\":\"true\",\"prometheus.persistence.enabled\":\"true\",\"prometheus.persistence.storageClass\":\"default\",\"grafana.persistence.storageClass\":\"default\",\"grafana.persistence.size\":\"1Gi\",\"prometheus.persistence.size\":\"10Gi\",\"prometheus.resources.core.requests.cpu\":\"750m\",\"prometheus.resources.core.limits.cpu\":\"1000m\",\"prometheus.resources.core.requests.memory\":\"750Mi\",\"prometheus.resources.core.limits.memory\":\"1000Mi\"}}",
"lifecycle.cattle.io/create.cluster-agent-controller-cleanup": "true",
"lifecycle.cattle.io/create.cluster-provisioner-controller": "true",
"lifecycle.cattle.io/create.cluster-scoped-gc": "true",
"lifecycle.cattle.io/create.mgmt-cluster-rbac-remove": "true",
"networking.management.cattle.io/enable-network-policy": "false",
"provisioner.cattle.io/ke-driver-update": "updated"
},
"creationTimestamp": "2019-05-16T13:48:41Z",
"finalizers": [
"controller.cattle.io/cluster-agent-controller-cleanup",
"controller.cattle.io/cluster-scoped-gc",
"controller.cattle.io/cluster-provisioner-controller",
"controller.cattle.io/mgmt-cluster-rbac-remove"
],
"generation": 1,
"labels": {
"cattle.io/creator": "norman"
},
"name": "c-jp966",
"resourceVersion": "2054723",
"selfLink": "/apis/management.cattle.io/v3/clusters/c-jp966",
"uid": "504dbedc-77e1-11e9-a9ac-02420a000140"
},
"spec": {
"description": "Dev Cluster",
"desiredAgentImage": "",
"desiredAuthImage": "",
"displayName": "dev",
"dockerRootDir": "/var/lib/docker",
"enableClusterAlerting": false,
"enableClusterMonitoring": true,
"enableNetworkPolicy": false,
"internal": false,
"localClusterAuthEndpoint": {
"enabled": false
},
"rancherKubernetesEngineConfig": {
"addonJobTimeout": 30,
"authentication": {
"strategy": "x509"
},
"authorization": {},
"bastionHost": {},
"cloudProvider": {
"name": "openstack",
"openstackCloudProvider": {
"blockStorage": {
"bs-version": "",
"ignore-volume-az": false,
"trust-device-path": false
},
"global": {
"auth-url": "https://keystone.url",
"ca-file": "",
"domain-id": "xxx",
"domain-name": "",
"password": "xxx",
"region": "",
"tenant-id": "xxx",
"tenant-name": "",
"trust-id": "",
"user-id": "",
"username": "xxx"
},
"loadBalancer": {
"create-monitor": false,
"floating-network-id": "",
"lb-method": "",
"lb-provider": "",
"lb-version": "",
"manage-security-groups": false,
"monitor-delay": 0,
"monitor-max-retries": 0,
"monitor-timeout": 0,
"subnet-id": "64cf703f-9504-4b32-baa3-741905e8b43a",
"use-octavia": true
},
"metadata": {
"request-timeout": 0,
"search-order": ""
},
"route": {
"router-id": ""
}
}
},
"dns": {
"provider": "coredns",
"upstreamnameservers": [
"1.1.1.1",
"9.9.9.9"
]
},
"ignoreDockerVersion": true,
"ingress": {
"provider": "nginx"
},
"kubernetesVersion": "v1.14.1-rancher1-1",
"monitoring": {
"provider": "metrics-server"
},
"network": {
"plugin": "canal"
},
"restore": {},
"services": {
"etcd": {
"backupConfig": {
"enabled": true,
"intervalHours": 12,
"retention": 6,
"s3BackupConfig": null
},
"creation": "12h",
"extraArgs": {
"election-timeout": "5000",
"heartbeat-interval": "500"
},
"retention": "72h",
"snapshot": false
},
"kubeApi": {
"serviceNodePortRange": "30000-32767"
},
"kubeController": {},
"kubelet": {},
"kubeproxy": {},
"scheduler": {}
},
"sshAgentAuth": false,
"systemImages": {}
}
},
"status": {
"agentImage": "rancher/rancher-agent:v2.2.3",
"allocatable": {
"cpu": "8",
"memory": "16121976Ki",
"pods": "220"
},
"apiEndpoint": "https://192.168.101.4:6443",
"appliedEnableNetworkPolicy": false,
"appliedPodSecurityPolicyTemplateId": "",
"appliedSpec": {
"description": "Dev Cluster",
"desiredAgentImage": "",
"desiredAuthImage": "",
"displayName": "dev",
"dockerRootDir": "/var/lib/docker",
"enableClusterAlerting": false,
"enableClusterMonitoring": true,
"enableNetworkPolicy": false,
"internal": false,
"localClusterAuthEndpoint": {
"enabled": false
},
"rancherKubernetesEngineConfig": {
"addonJobTimeout": 30,
"authentication": {
"strategy": "x509"
},
"authorization": {},
"bastionHost": {},
"cloudProvider": {
"name": "openstack",
"openstackCloudProvider": {
"blockStorage": {
"bs-version": "",
"ignore-volume-az": false,
"trust-device-path": false
},
"global": {
"auth-url": "https://keystone.url",
"ca-file": "",
"domain-id": "xxx",
"domain-name": "",
"password": "xxx",
"region": "",
"tenant-id": "xxx",
"tenant-name": "",
"trust-id": "",
"user-id": "",
"username": "xxx"
},
"loadBalancer": {
"create-monitor": false,
"floating-network-id": "",
"lb-method": "",
"lb-provider": "",
"lb-version": "",
"manage-security-groups": false,
"monitor-delay": 0,
"monitor-max-retries": 0,
"monitor-timeout": 0,
"subnet-id": "64cf703f-9504-4b32-baa3-741905e8b43a",
"use-octavia": true
},
"metadata": {
"request-timeout": 0,
"search-order": ""
},
"route": {
"router-id": ""
}
}
},
"dns": {
"provider": "coredns",
"upstreamnameservers": [
"1.1.1.1",
"9.9.9.9"
]
},
"ignoreDockerVersion": true,
"ingress": {
"provider": "nginx"
},
"kubernetesVersion": "v1.14.1-rancher1-1",
"monitoring": {
"provider": "metrics-server"
},
"network": {
"plugin": "canal"
},
"nodes": [
{
"address": "192.168.101.13",
"hostnameOverride": "dkw02",
"nodeName": "c-jp966:m-0f8b84fb7c0c",
"port": "22",
"role": [
"worker"
],
"user": "root"
},
{
"address": "192.168.101.4",
"hostnameOverride": "dkm01",
"nodeName": "c-jp966:m-449da24cb51f",
"port": "22",
"role": [
"etcd",
"controlplane"
],
"user": "root"
},
{
"address": "192.168.101.7",
"hostnameOverride": "dkw01",
"nodeName": "c-jp966:m-9be6efc52982",
"port": "22",
"role": [
"worker"
],
"user": "root"
}
],
"restore": {},
"services": {
"etcd": {
"backupConfig": {
"enabled": true,
"intervalHours": 12,
"retention": 6,
"s3BackupConfig": null
},
"creation": "12h",
"extraArgs": {
"election-timeout": "5000",
"heartbeat-interval": "500"
},
"retention": "72h",
"snapshot": false
},
"kubeApi": {
"serviceNodePortRange": "30000-32767"
},
"kubeController": {},
"kubelet": {},
"kubeproxy": {},
"scheduler": {}
},
"sshAgentAuth": false,
"systemImages": {
"alpine": "rancher/rke-tools:v0.1.28",
"calicoCni": "rancher/calico-cni:v3.4.0",
"calicoCtl": "rancher/calico-ctl:v2.0.0",
"calicoNode": "rancher/calico-node:v3.4.0",
"canalCni": "rancher/calico-cni:v3.4.0",
"canalFlannel": "rancher/coreos-flannel:v0.10.0",
"canalNode": "rancher/calico-node:v3.4.0",
"certDownloader": "rancher/rke-tools:v0.1.28",
"coredns": "coredns/coredns:1.3.1",
"corednsAutoscaler": "rancher/cluster-proportional-autoscaler:1.3.0",
"dnsmasq": "rancher/k8s-dns-dnsmasq-nanny:1.15.0",
"etcd": "rancher/coreos-etcd:v3.3.10-rancher1",
"flannel": "rancher/coreos-flannel:v0.10.0-rancher1",
"flannelCni": "rancher/flannel-cni:v0.3.0-rancher1",
"ingress": "rancher/nginx-ingress-controller:0.21.0-rancher3",
"ingressBackend": "rancher/nginx-ingress-controller-defaultbackend:1.5-rancher1",
"kubedns": "rancher/k8s-dns-kube-dns:1.15.0",
"kubednsAutoscaler": "rancher/cluster-proportional-autoscaler:1.3.0",
"kubednsSidecar": "rancher/k8s-dns-sidecar:1.15.0",
"kubernetes": "rancher/hyperkube:v1.14.1-rancher1",
"kubernetesServicesSidecar": "rancher/rke-tools:v0.1.28",
"metricsServer": "rancher/metrics-server:v0.3.1",
"nginxProxy": "rancher/rke-tools:v0.1.28",
"podInfraContainer": "rancher/pause:3.1",
"weaveCni": "weaveworks/weave-npc:2.5.0",
"weaveNode": "weaveworks/weave-kube:2.5.0"
}
}
},
"authImage": "",
"caCert": "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN3akNDQWFxZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFTTVJBd0RnWURWUVFERXdkcmRXSmwKTFdOaE1CNFhEVEU1TURVeE5qRXpORGcxTWxvWERUSTVNRFV4TXpFek5EZzFNbG93RWpFUU1BNEdBMVVFQXhNSAphM1ZpWlMxallUQ0NBU0l3RFFZSktvWklodmNOQVFFQkJRQURnZ0VQQURDQ0FRb0NnZ0VCQU95c1BRdE9RcUhnCldhSGZJalo3b2NUSFZ2NzFoVGFmUmVkelhRS212WkJhaGM5N2lVYXdWT3dQN1NOQmFqUlNiRlhFSmpIanZ0cUUKdkk5THcyZFE0YUh3dmpqTzY4REVaSVlob0ozS2Nxb3VjYUZBQnd1ZDVCNGFRRi83dThkbTFRZlpBM1BPNmhhawo1UTRUMkE2eDNkZFB6eXlVaTBjUGoxUk1zTW15Z3gyUklnakRjZUZmVGY5Z2JtTG5Ec2Y1bVZGdjE4b1JpMmdjCjZrbk1tYm1lZnAwbGovc29mNTAxVHo1ckYrRUpZZUVJUHRGQm1ZRUg2cDkySjVzK2FhQWRYS29rKzVkaCswQlMKa3gzc2FKZWJud3RMd25kcUVMdmd1K3NRS051LzM1akluKzFQVzZ4dll2RGRiMHpudms1cTVhZkZJWmNDZnhaawpVZzV5bzFYWlg5RUNBd0VBQWFNak1DRXdEZ1lEVlIwUEFRSC9CQVFEQWdLa01BOEdBMVVkRXdFQi93UUZNQU1CCkFmOHdEUVlKS29aSWh2Y05BUUVMQlFBRGdnRUJBR1dhTFdaRVd3b2pHMGw1KzMwUmNlUWlPZ0d0MStvcUx3ZHoKazlGUkdvZW5Gc0VpNWY0Q1RVWVVodkg5Zm1mTDR3eHBTTkNhY1Z3alY3a3NJTitHK2ErTzM4anhWT0dyKzdpQgoydmd6VnU0bFR4ZTk0djliZVk2Y0JhN0l2MWdWdFk2NTBySkwxMlFMS29OYUgrT3ltVmptMVIxNWxTNmhYOEQ0Cms1NWZSQk1vaCtrR0tmbXBDSkxsUzVtbEo4VVgwVzVtWThWNndSM2JLWmtwVEJQcm9walZGeWRHTEEvQXF0RWYKWjVuU2J6US9UazRXQS9BMjErZUFtZDNHcTdZQkIyRkZFV2pDREcyYk1IVStVVm9xallyN1BKL3QyaGhaU3M4MAp1azdHbTdqTXBWRy8zOWtpWWpvV3dqRHdoWnQ2eVc5eTdsNXA5Tzk3N09zZCs4OVFub289Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K",
"capabilities": {
"ingressCapabilities": [
{
"ingressProvider": "Nginx"
}
],
"loadBalancerCapabilities": {
"enabled": false
},
"nodePortRange": "30000-32767"
},
"capacity": {
"cpu": "8",
"memory": "16326776Ki",
"pods": "220"
},
"componentStatuses": [
{
"conditions": [
{
"message": "ok",
"status": "True",
"type": "Healthy"
}
],
"name": "controller-manager"
},
{
"conditions": [
{
"message": "{\"health\":\"true\"}",
"status": "True",
"type": "Healthy"
}
],
"name": "etcd-0"
},
{
"conditions": [
{
"message": "ok",
"status": "True",
"type": "Healthy"
}
],
"name": "scheduler"
}
],
"conditions": [
{
"status": "True",
"type": "Pending"
},
{
"lastUpdateTime": "2019-05-16T13:50:47Z",
"status": "True",
"type": "Provisioned"
},
{
"lastUpdateTime": "2019-05-16T13:51:02Z",
"status": "True",
"type": "Waiting"
},
{
"lastUpdateTime": "2019-05-16T13:48:41Z",
"status": "True",
"type": "BackingNamespaceCreated"
},
{
"lastUpdateTime": "2019-05-16T13:48:41Z",
"status": "True",
"type": "DefaultProjectCreated"
},
{
"lastUpdateTime": "2019-05-16T13:48:41Z",
"status": "True",
"type": "SystemProjectCreated"
},
{
"lastUpdateTime": "2019-05-16T13:48:41Z",
"status": "True",
"type": "InitialRolesPopulated"
},
{
"lastUpdateTime": "2019-05-16T13:48:41Z",
"status": "True",
"type": "CreatorMadeOwner"
},
{
"lastUpdateTime": "2019-05-16T13:48:41Z",
"status": "True",
"type": "NoDiskPressure"
},
{
"lastUpdateTime": "2019-05-16T13:48:41Z",
"status": "True",
"type": "NoMemoryPressure"
},
{
"lastUpdateTime": "2019-05-16T13:50:47Z",
"status": "True",
"type": "GlobalAdminsSynced"
},
{
"lastUpdateTime": "2019-05-16T13:50:48Z",
"status": "False",
"type": "AlertingEnabled"
},
{
"lastUpdateTime": "2019-05-16T13:50:49Z",
"status": "True",
"type": "SystemAccountCreated"
},
{
"lastUpdateTime": "2019-05-16T13:50:49Z",
"status": "True",
"type": "AgentDeployed"
},
{
"lastUpdateTime": "2019-05-17T22:40:45Z",
"status": "True",
"type": "PrometheusOperatorDeployed"
},
{
"lastUpdateTime": "2019-05-20T15:58:07Z",
"status": "True",
"type": "Updated"
},
{
"lastUpdateTime": "2019-05-16T13:51:00Z",
"status": "True",
"type": "ServiceAccountMigrated"
},
{
"lastUpdateTime": "2019-06-06T08:17:40Z",
"status": "True",
"type": "Ready"
},
{
"lastUpdateTime": "2019-05-31T15:56:21Z",
"status": "True",
"type": "MonitoringEnabled"
}
],
"driver": "rancherKubernetesEngine",
"limits": {
"cpu": "2760m",
"memory": "2565Mi",
"pods": "0"
},
"monitoringStatus": {
"conditions": [
{
"lastUpdateTime": "2019-05-17T22:41:21Z",
"status": "True",
"type": "MetricExpressionDeployed"
}
],
"grafanaEndpoint": "/k8s/clusters/c-jp966/api/v1/namespaces/cattle-prometheus/services/http:access-grafana:80/proxy/"
},
"requested": {
"cpu": "2590m",
"memory": "1817Mi",
"pods": "25"
},
"serviceAccountToken": "xxx",
"version": {
"buildDate": "2019-04-08T17:02:58Z",
"compiler": "gc",
"gitCommit": "b7394102d6ef778017f2ca4046abbaa23b88c290",
"gitTreeState": "clean",
"gitVersion": "v1.14.1",
"goVersion": "go1.12.1",
"major": "1",
"minor": "14",
"platform": "linux/amd64"
}
}
}
],
"kind": "List",
"metadata": {
"resourceVersion": "",
"selfLink": ""
}
}
@alena1108 Thank you very much! You are absolutely right. Manually changing the values of monitor-delay and monitor-timeout in the cluster spec (kubectl edit cluster c-xxx) has fixed the problem (at least for me).
Steps to mitigate:
* Run kubectl on a Rancher management-plane node and find the affected cluster.
* kubectl edit cluster <clustername> --namespace <clusternamespace>
* Find the monitorDelay/monitorTimeout params in the clusterSpec and clusterStatus, quote each value and add a unit. Example: if it was 30, change it to "30s"; if it was 0, change it to "".
* Save the file and exit the editor.
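The value rewrite in the steps above can be sketched as a small helper (a hypothetical illustration in Python, not a Rancher tool): 0 becomes the empty string, and any other integer is quoted with a seconds unit.

```python
def migrate_monitor_value(value):
    """Convert a v2.2.3 integer monitor-delay/monitor-timeout value to
    the string form the v2.2.4 schema expects: 0 -> "", 30 -> "30s"."""
    if isinstance(value, str):  # already migrated, leave untouched
        return value
    return "" if value == 0 else f"{value}s"

def migrate_lb_opts(opts: dict) -> dict:
    """Return a copy of the loadBalancer options with the two affected
    fields rewritten; all other keys are left as-is."""
    fixed = dict(opts)
    for field in ("monitor-delay", "monitor-timeout"):
        if field in fixed:
            fixed[field] = migrate_monitor_value(fixed[field])
    return fixed
```

Applied to the loadBalancer section of the cluster JSON above, this would turn "monitor-delay": 0 into "monitor-delay": "", which is exactly the manual edit performed with kubectl edit.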
When making the fix that caused this issue, the assumption was that MonitorDelay and MonitorTimeout would never be set on user clusters, because with them set the kubelet never starts. But the Rancher API framework defaults an unset int value to 0, and that causes issues on upgrade.
I'll give that a shot tomorrow morning. Editing the cluster via kubectl will work, but a problem still exists if you modify the cluster in the Rancher UI: I can set the values to "" with kubectl, but if I then edit the cluster in the UI it still interprets them as 0, and saving in the UI sets them back to 0.
As mentioned, I will test this out tomorrow.
@DJAyth
but a problem still exists if you try to modify the cluster in the Rancher UI. I can set it to "" with kubectl, but if I edit the cluster in the UI it interprets those as 0 still.
Rancher 2.2.4's UI will not set it to 0 anymore once you fix your existing clusters via kubectl. On every subsequent cluster upgrade in 2.2.4, or for newly created clusters, the value will be defaulted to "", as the type in the schema was changed to string.
Update: the steps resetting the value from 0 to "" should be performed after the Rancher upgrade is attempted, not prior to it, as 2.2.3 understands int, not string, and the change would break it.
Perfect. I'll give this a shot. Many thanks all.
Hi,
we have the same problem upgrading to 2.2.4, with OpenStack LBaaS enabled in 2 clusters. I tried the workaround: edited the clusters and changed the values for monitor-delay & monitor-timeout.
After that I ran helm upgrade rancher rancher-latest/rancher --reuse-values
But the upgrade fails again:
kubectl --kubeconfig=rancherserver/kube_config_rancher-cluster.yml describe pod -n cattle-system rancher-b95ffcd98-zggn8
Name: rancher-b95ffcd98-zggn8
Namespace: cattle-system
Priority: 0
PriorityClassName: <none>
Node: rancherserver2.XXXXX/10.250.0.116
Start Time: Thu, 06 Jun 2019 21:26:29 +0200
Labels: app=rancher
pod-template-hash=b95ffcd98
release=rancher
Annotations: cni.projectcalico.org/podIP: 192.168.2.28/32
Status: Running
IP: 192.168.2.28
Controlled By: ReplicaSet/rancher-b95ffcd98
Containers:
rancher:
Container ID: docker://9612e698932db42a29468fccc37f987541b00c7a9023ef5c293cfc89bd8197b0
Image: rancher/rancher:v2.2.4
Image ID: docker-pullable://rancher/rancher@sha256:b22d5e239b8750edb639d0fe181c528fccacea67fb23412dff3c5f66e57e44a2
Port: 80/TCP
Host Port: 0/TCP
Args:
--no-cacerts
--http-listen-port=80
--https-listen-port=443
--add-local=auto
State: Running
Started: Thu, 06 Jun 2019 21:28:33 +0200
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 06 Jun 2019 21:26:31 +0200
Finished: Thu, 06 Jun 2019 21:28:32 +0200
Ready: False
Restart Count: 1
Liveness: http-get http://:80/healthz delay=60s timeout=1s period=30s #success=1 #failure=3
Readiness: http-get http://:80/healthz delay=5s timeout=1s period=30s #success=1 #failure=3
Environment:
CATTLE_NAMESPACE: cattle-system
CATTLE_PEER_SERVICE: rancher
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from rancher-token-m7mqf (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
rancher-token-m7mqf:
Type: Secret (a volume populated by a Secret)
SecretName: rancher-token-m7mqf
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m33s default-scheduler Successfully assigned cattle-system/rancher-b95ffcd98-zggn8 to rancherserver2.XXXXX
Normal Pulled 2m30s (x2 over 4m32s) kubelet, rancherserver2.XXXXX Container image "rancher/rancher:v2.2.4" already present on machine
Normal Killing 2m30s kubelet, rancherserver2.XXXXX Killing container with id docker://rancher:Container failed liveness probe.. Container will be killed and recreated.
Normal Created 2m29s (x2 over 4m31s) kubelet, rancherserver2.XXXXX Created container
Normal Started 2m29s (x2 over 4m30s) kubelet, rancherserver2.XXXXX Started container
Warning Unhealthy 12s (x9 over 4m12s) kubelet, rancherserver2.XXXXX Readiness probe failed: Get http://192.168.2.28:80/healthz: dial tcp 192.168.2.28:80: connect: connection refused
Warning Unhealthy 0s (x6 over 3m30s) kubelet, rancherserver2.XXXXXX Liveness probe failed: Get http://192.168.2.28:80/healthz: dial tcp 192.168.2.28:80: connect: connection refused
@Redbierbaron are you seeing this same error?
Failed to list *v3.Cluster: json: cannot unmarshal number into Go value of type string
Also, you don't need to upgrade Rancher again after editing the clusters, since you seem to have already upgraded to 2.2.4.
@Redbierbaron what did you edit the values of monitor-delay & monitor-timeout to?
I just tested it out on an HA setup and it works for me.
@mrajashree I did the same steps and still get:
Failed to list *v3.Cluster: json: cannot unmarshal number into Go value of type string
Log output from the rancher 2.2.4 container:
2019/06/07 06:13:58 [INFO] Adding peer wss://192.168.0.22/v3/connect, 192.168.0.22
2019/06/07 06:13:58 [INFO] Adding peer wss://192.168.1.28/v3/connect, 192.168.1.28
2019/06/07 06:13:58 [INFO] Adding peer wss://192.168.2.23/v3/connect, 192.168.2.23
2019/06/07 06:13:58 [INFO] Listening on :443
2019/06/07 06:13:58 [INFO] Listening on :80
I0607 06:13:58.194517 10 http.go:110] HTTP2 has been explicitly disabled
2019/06/07 06:13:58 [ERROR] Failed to connect to peer wss://192.168.2.23/v3/connect [local ID=192.168.2.30]: websocket: bad handshake
2019/06/07 06:13:58 [ERROR] Failed to connect to peer wss://192.168.0.22/v3/connect [local ID=192.168.2.30]: websocket: bad handshake
2019/06/07 06:13:58 [ERROR] Failed to connect to peer wss://192.168.1.28/v3/connect [local ID=192.168.2.30]: websocket: bad handshake
2019/06/07 06:13:58 [INFO] Stopping cluster agent for c-7cbb2
I0607 06:13:58.693483 10 http.go:110] HTTP2 has been explicitly disabled
2019/06/07 06:13:59 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:13:59 [INFO] Starting cluster controllers for local
I0607 06:13:59.359300 10 leaderelection.go:187] attempting to acquire leader lease kube-system/cattle-controllers...
2019/06/07 06:13:59 [INFO] Starting cluster agent for local [owner=false]
2019/06/07 06:13:59 [ERROR] ClusterController c-7cbb2 failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:13:59 [INFO] Stopping cluster agent for c-7cbb2
2019/06/07 06:13:59 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:13:59 [ERROR] ClusterController c-7cbb2 failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:13:59 [INFO] Stopping cluster agent for c-7cbb2
2019/06/07 06:13:59 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:13:59 [ERROR] Failed syncing peers [{192.168.2.30 [192.168.0.22 192.168.1.28 192.168.2.23] false false}]: failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:13:59 [INFO] Stopping cluster agent for c-7cbb2
2019/06/07 06:13:59 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:13:59 [ERROR] ClusterController c-c8g4g failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:14:00 [INFO] Stopping cluster agent for c-7cbb2
2019/06/07 06:14:00 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:14:00 [ERROR] ClusterController local failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:14:00 [INFO] Stopping cluster agent for c-7cbb2
2019/06/07 06:14:00 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:14:00 [ERROR] ClusterController c-c8g4g failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:14:00 [INFO] Stopping cluster agent for c-7cbb2
2019/06/07 06:14:00 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:14:00 [ERROR] ClusterController c-7cbb2 failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:14:00 [INFO] Stopping cluster agent for c-7cbb2
2019/06/07 06:14:00 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:14:00 [ERROR] ClusterController c-c8g4g failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:14:00 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:14:00 [INFO] Stopping cluster agent for c-7cbb2
2019/06/07 06:14:00 [ERROR] ClusterController local failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:14:01 [INFO] Stopping cluster agent for c-7cbb2
2019/06/07 06:14:02 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:14:02 [ERROR] ClusterController local failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:14:02 [INFO] Stopping cluster agent for c-7cbb2
2019/06/07 06:14:02 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:14:02 [ERROR] ClusterController c-7cbb2 failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:14:02 [INFO] Stopping cluster agent for c-7cbb2
2019/06/07 06:14:02 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:14:02 [ERROR] ClusterController c-c8g4g failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:14:03 [ERROR] Failed to connect to peer wss://192.168.0.22/v3/connect [local ID=192.168.2.30]: websocket: bad handshake
2019/06/07 06:14:03 [ERROR] Failed to connect to peer wss://192.168.2.23/v3/connect [local ID=192.168.2.30]: websocket: bad handshake
2019/06/07 06:14:03 [ERROR] Failed to connect to peer wss://192.168.1.28/v3/connect [local ID=192.168.2.30]: websocket: bad handshake
2019/06/07 06:14:04 [INFO] Stopping cluster agent for c-7cbb2
2019/06/07 06:14:04 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:14:04 [ERROR] ClusterController local failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:14:06 [INFO] Stopping cluster agent for c-7cbb2
2019/06/07 06:14:06 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:14:06 [ERROR] ClusterController c-7cbb2 failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:14:06 [INFO] Stopping cluster agent for c-7cbb2
2019/06/07 06:14:06 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:14:06 [ERROR] ClusterController c-c8g4g failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:14:08 [INFO] Stopping cluster agent for c-7cbb2
2019/06/07 06:14:08 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:14:08 [ERROR] ClusterController local failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:14:08 [ERROR] Failed to connect to peer wss://192.168.2.23/v3/connect [local ID=192.168.2.30]: websocket: bad handshake
2019/06/07 06:14:08 [ERROR] Failed to connect to peer wss://192.168.0.22/v3/connect [local ID=192.168.2.30]: websocket: bad handshake
2019/06/07 06:14:08 [ERROR] Failed to connect to peer wss://192.168.1.28/v3/connect [local ID=192.168.2.30]: websocket: bad handshake
2019/06/07 06:14:13 [ERROR] Failed to connect to peer wss://192.168.2.23/v3/connect [local ID=192.168.2.30]: websocket: bad handshake
2019/06/07 06:14:13 [ERROR] Failed to connect to peer wss://192.168.0.22/v3/connect [local ID=192.168.2.30]: websocket: bad handshake
2019/06/07 06:14:13 [ERROR] Failed to connect to peer wss://192.168.1.28/v3/connect [local ID=192.168.2.30]: websocket: bad handshake
2019/06/07 06:14:14 [INFO] Stopping cluster agent for c-7cbb2
2019/06/07 06:14:14 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:14:14 [ERROR] ClusterController c-7cbb2 failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:14:14 [INFO] Stopping cluster agent for c-7cbb2
2019/06/07 06:14:15 [INFO] Handling backend connection request [192.168.0.22]
2019/06/07 06:14:15 [INFO] Handling backend connection request [192.168.2.23]
2019/06/07 06:14:15 [INFO] Handling backend connection request [192.168.1.28]
2019/06/07 06:14:15 [INFO] Removing peer 192.168.0.22
2019/06/07 06:14:15 [INFO] Stopping cluster agent for c-c8g4g
2019/06/07 06:14:15 [ERROR] ClusterController c-c8g4g failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect, failed to start user controllers for cluster c-c8g4g: failed to contact server: Get https://10.250.2.116:6443/version?timeout=30s: waiting for cluster agent to connect
2019/06/07 06:14:15 [INFO] Starting cluster controllers for c-7cbb2
2019/06/07 06:14:15 [INFO] error in remotedialer server [400]: websocket: close 1006 unexpected EOF
2019/06/07 06:14:15 [INFO] Starting cluster agent for c-7cbb2 [owner=false]
2019/06/07 06:14:15 [INFO] Starting cluster controllers for c-c8g4g
2019/06/07 06:14:15 [INFO] Starting cluster agent for c-c8g4g [owner=false]
E0607 06:14:25.173896 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:25.174632 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:25.174706 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:25.174853 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:25.175074 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:25.175233 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:25.175298 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:25.175366 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:25.178985 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:25.179154 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:25.179203 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:25.179236 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:25.179298 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:25.179347 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:25.179551 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.ClusterRoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/watch/clusterrolebindings?resourceVersion=13896140&timeout=30m0s&timeoutSeconds=420: waiting for cluster agent to connect
E0607 06:14:25.179708 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.Namespace: Get https://10.250.1.123:6443/api/v1/watch/namespaces?resourceVersion=13896167&timeout=30m0s&timeoutSeconds=489: waiting for cluster agent to connect
E0607 06:14:25.179833 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.RoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/watch/rolebindings?resourceVersion=13896141&timeout=30m0s&timeoutSeconds=391: waiting for cluster agent to connect
E0607 06:14:25.179990 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.Service: Get https://10.250.1.123:6443/api/v1/watch/services?resourceVersion=13967675&timeout=30m0s&timeoutSeconds=335: waiting for cluster agent to connect
E0607 06:14:25.180059 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.Endpoints: Get https://10.250.1.123:6443/api/v1/watch/endpoints?resourceVersion=14103482&timeout=30m0s&timeoutSeconds=431: waiting for cluster agent to connect
E0607 06:14:25.180392 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.ClusterRole: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/watch/clusterroles?resourceVersion=13898514&timeout=30m0s&timeoutSeconds=439: waiting for cluster agent to connect
E0607 06:14:25.180720 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.Role: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/watch/roles?resourceVersion=11190268&timeout=30m0s&timeoutSeconds=524: waiting for cluster agent to connect
2019/06/07 06:14:25 [INFO] Handling backend connection request [c-c8g4g:m-frrh5]
2019/06/07 06:14:25 [INFO] Handling backend connection request [c-c8g4g:m-6z5r7]
E0607 06:14:25.792229 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.Endpoints: Get https://10.250.2.116:6443/api/v1/watch/endpoints?resourceVersion=10020221&timeout=30m0s&timeoutSeconds=314: tunnel disconnect
E0607 06:14:26.180090 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ClusterRoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:26.181013 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Namespace: Get https://10.250.1.123:6443/api/v1/namespaces?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:26.182038 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.RoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/rolebindings?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:26.184885 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Service: Get https://10.250.1.123:6443/api/v1/services?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:26.186042 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Endpoints: Get https://10.250.1.123:6443/api/v1/endpoints?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:26.187371 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ClusterRole: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/clusterroles?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:26.188219 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Role: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/roles?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
2019/06/07 06:14:26 [ERROR] Failed syncing peers [{192.168.2.30 [192.168.2.23 192.168.1.28] true false}]: failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: waiting for cluster agent to connect
E0607 06:14:26.980279 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.Namespace: Get https://10.250.2.116:6443/api/v1/watch/namespaces?resourceVersion=8232163&timeout=30m0s&timeoutSeconds=513: tunnel disconnect
E0607 06:14:26.980651 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.ClusterRole: Get https://10.250.2.116:6443/apis/rbac.authorization.k8s.io/v1/watch/clusterroles?resourceVersion=7152695&timeout=30m0s&timeoutSeconds=325: tunnel disconnect
E0607 06:14:26.980885 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.ClusterRoleBinding: Get https://10.250.2.116:6443/apis/rbac.authorization.k8s.io/v1/watch/clusterrolebindings?resourceVersion=9652537&timeout=30m0s&timeoutSeconds=583: tunnel disconnect
E0607 06:14:26.980964 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.Role: Get https://10.250.2.116:6443/apis/rbac.authorization.k8s.io/v1/watch/roles?resourceVersion=478963&timeout=30m0s&timeoutSeconds=484: tunnel disconnect
E0607 06:14:27.180609 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ClusterRoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:27.181530 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Namespace: Get https://10.250.1.123:6443/api/v1/namespaces?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:27.182508 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.RoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/rolebindings?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:27.187077 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Endpoints: Get https://10.250.1.123:6443/api/v1/endpoints?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:27.188270 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Service: Get https://10.250.1.123:6443/api/v1/services?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:27.188641 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ClusterRole: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/clusterroles?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:27.189620 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Role: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/roles?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:28.181354 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ClusterRoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:28.182074 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Namespace: Get https://10.250.1.123:6443/api/v1/namespaces?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:28.183638 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.RoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/rolebindings?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:28.187594 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Endpoints: Get https://10.250.1.123:6443/api/v1/endpoints?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:28.188606 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Service: Get https://10.250.1.123:6443/api/v1/services?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:28.189748 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ClusterRole: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/clusterroles?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:28.191001 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Role: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/roles?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
2019/06/07 06:14:35 [INFO] Handling backend connection request [c-7cbb2:m-flcgl]
2019/06/07 06:14:35 [INFO] Handling backend connection request [local]
2019/06/07 06:14:36 [INFO] Handling backend connection request [c-7cbb2:m-t7bb9]
2019/06/07 06:14:38 [ERROR] ClusterController c-7cbb2 [user-controllers-controller] failed with : failed to start user controllers for cluster c-7cbb2: failed to contact server: Get https://10.250.1.123:6443/version?timeout=30s: tunnel disconnect
E0607 06:14:38.505823 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ClusterRole: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/clusterroles?limit=500&resourceVersion=0&timeout=30s: tunnel disconnect
E0607 06:14:38.505891 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Namespace: Get https://10.250.1.123:6443/api/v1/namespaces?limit=500&resourceVersion=0&timeout=30s: tunnel disconnect
E0607 06:14:38.505953 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Service: Get https://10.250.1.123:6443/api/v1/services?limit=500&resourceVersion=0&timeout=30s: tunnel disconnect
E0607 06:14:38.506027 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.RoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/rolebindings?limit=500&resourceVersion=0&timeout=30s: tunnel disconnect
E0607 06:14:38.506079 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Role: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/roles?limit=500&resourceVersion=0&timeout=30s: tunnel disconnect
E0607 06:14:38.506174 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Endpoints: Get https://10.250.1.123:6443/api/v1/endpoints?limit=500&resourceVersion=0&timeout=30s: tunnel disconnect
E0607 06:14:38.506265 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ClusterRoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?limit=500&resourceVersion=0&timeout=30s: tunnel disconnect
2019/06/07 06:14:42 [INFO] Handling backend connection request [c-c8g4g:m-wg8fx]
2019/06/07 06:14:45 [INFO] error in remotedialer server [400]: read tcp 192.168.2.30:80->192.168.0.0:53314: i/o timeout
2019/06/07 06:14:46 [INFO] error in remotedialer server [400]: read tcp 192.168.2.30:80->10.250.0.116:47922: i/o timeout
2019/06/07 06:14:47 [INFO] Adding peer wss://192.168.0.24/v3/connect, 192.168.0.24
2019/06/07 06:14:48 [INFO] error in remotedialer server [400]: websocket: close 1006 unexpected EOF
2019/06/07 06:14:48 [ERROR] Failed to serve peer connection 192.168.1.28: websocket: close 1006 unexpected EOF
2019/06/07 06:14:48 [INFO] Handling backend connection request [192.168.0.24]
2019/06/07 06:14:48 [INFO] Removing peer 192.168.1.28
2019/06/07 06:14:52 [INFO] error in remotedialer server [400]: read tcp 192.168.2.30:80->192.168.0.0:53344: i/o timeout
2019/06/07 06:14:58 [INFO] error in remotedialer server [400]: websocket: close 1006 unexpected EOF
E0607 06:14:58.195928 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:58.196023 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:58.205583 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.ClusterRoleBinding: Get https://10.250.2.116:6443/apis/rbac.authorization.k8s.io/v1/watch/clusterrolebindings?resourceVersion=9652537&timeout=30m0s&timeoutSeconds=555: tunnel disconnect
2019/06/07 06:14:58 [INFO] error in remotedialer server [400]: websocket: close 1006 unexpected EOF
2019/06/07 06:14:58 [INFO] error in remotedialer server [400]: websocket: close 1006 unexpected EOF
E0607 06:14:58.208680 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:58.209320 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:58.209366 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:58.209392 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
W0607 06:14:58.209866 10 reflector.go:270] github.com/rancher/norman/controller/generic_controller.go:175: watch of *v1.ClusterRole ended with: very short watch: github.com/rancher/norman/controller/generic_controller.go:175: Unexpected watch close - watch lasted less than a second and no items received
E0607 06:14:58.519873 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:58.520117 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:58.520230 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:58.520150 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:58.520183 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
W0607 06:14:58.520516 10 reflector.go:270] github.com/rancher/norman/controller/generic_controller.go:175: watch of *v1.RoleBinding ended with: very short watch: github.com/rancher/norman/controller/generic_controller.go:175: Unexpected watch close - watch lasted less than a second and no items received
W0607 06:14:58.520570 10 reflector.go:270] github.com/rancher/norman/controller/generic_controller.go:175: watch of *v1.Service ended with: very short watch: github.com/rancher/norman/controller/generic_controller.go:175: Unexpected watch close - watch lasted less than a second and no items received
W0607 06:14:58.520635 10 reflector.go:270] github.com/rancher/norman/controller/generic_controller.go:175: watch of *v1.Endpoints ended with: very short watch: github.com/rancher/norman/controller/generic_controller.go:175: Unexpected watch close - watch lasted less than a second and no items received
E0607 06:14:58.521904 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:58.522233 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:58.522385 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:58.523129 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.Service: Get https://10.250.1.123:6443/api/v1/watch/services?resourceVersion=13967675&timeout=30m0s&timeoutSeconds=571: waiting for cluster agent to connect
E0607 06:14:58.523337 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.Namespace: Get https://10.250.1.123:6443/api/v1/watch/namespaces?resourceVersion=13896167&timeout=30m0s&timeoutSeconds=503: waiting for cluster agent to connect
E0607 06:14:58.526138 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:58.526534 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:58.527062 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.ClusterRoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/watch/clusterrolebindings?resourceVersion=13896140&timeout=30m0s&timeoutSeconds=407: waiting for cluster agent to connect
E0607 06:14:58.526590 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:58.527442 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.RoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/watch/rolebindings?resourceVersion=13896141&timeout=30m0s&timeoutSeconds=512: waiting for cluster agent to connect
E0607 06:14:58.526633 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:14:58.527903 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.Endpoints: Get https://10.250.1.123:6443/api/v1/watch/endpoints?resourceVersion=14103594&timeout=30m0s&timeoutSeconds=339: waiting for cluster agent to connect
E0607 06:14:58.528008 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.Role: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/watch/roles?resourceVersion=12424649&timeout=30m0s&timeoutSeconds=394: waiting for cluster agent to connect
E0607 06:14:58.529388 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.ClusterRole: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/watch/clusterroles?resourceVersion=13898514&timeout=30m0s&timeoutSeconds=480: waiting for cluster agent to connect
E0607 06:14:59.523617 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Service: Get https://10.250.1.123:6443/api/v1/services?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:59.524723 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Namespace: Get https://10.250.1.123:6443/api/v1/namespaces?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:59.527991 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ClusterRoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:59.528519 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.RoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/rolebindings?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:59.529667 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Endpoints: Get https://10.250.1.123:6443/api/v1/endpoints?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:59.533158 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ClusterRole: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/clusterroles?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:14:59.534011 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Role: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/roles?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:15:00.524110 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Service: Get https://10.250.1.123:6443/api/v1/services?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:15:00.525285 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Namespace: Get https://10.250.1.123:6443/api/v1/namespaces?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:15:00.528585 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ClusterRoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:15:00.529406 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.RoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/rolebindings?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:15:00.530517 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Endpoints: Get https://10.250.1.123:6443/api/v1/endpoints?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:15:00.533672 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ClusterRole: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/clusterroles?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:15:00.534630 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Role: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/roles?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:15:01.524610 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Service: Get https://10.250.1.123:6443/api/v1/services?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:15:01.525750 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Namespace: Get https://10.250.1.123:6443/api/v1/namespaces?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:15:01.528951 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ClusterRoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:15:01.530376 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.RoleBinding: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/rolebindings?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:15:01.531547 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Endpoints: Get https://10.250.1.123:6443/api/v1/endpoints?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:15:01.534060 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ClusterRole: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/clusterroles?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
E0607 06:15:01.535079 10 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.Role: Get https://10.250.1.123:6443/apis/rbac.authorization.k8s.io/v1/roles?limit=500&resourceVersion=0&timeout=30s: waiting for cluster agent to connect
2019/06/07 06:15:02 [INFO] Handling backend connection request [c-7cbb2:m-jxc89]
2019/06/07 06:15:03 [INFO] Handling backend connection request [c-c8g4g:m-jhfl2]
2019/06/07 06:15:08 [INFO] Handling backend connection request [c-7cbb2:m-ktxln]
2019/06/07 06:15:08 [INFO] Handling backend connection request [c-c8g4g:m-6z5r7]
2019/06/07 06:15:08 [INFO] Handling backend connection request [local]
2019/06/07 06:15:08 [INFO] Handling backend connection request [c-7cbb2:m-jxc89]
2019/06/07 06:15:13 [INFO] Handling backend connection request [c-7cbb2:m-t7bb9]
2019/06/07 06:15:13 [INFO] error in remotedialer server [400]: read tcp 192.168.2.30:80->192.168.0.0:53798: i/o timeout
2019/06/07 06:15:16 [INFO] Handling backend connection request [c-7cbb2:m-4tlrs]
2019/06/07 06:15:18 [INFO] error in remotedialer server [400]: read tcp 192.168.2.30:80->192.168.1.0:47712: i/o timeout
2019/06/07 06:15:18 [INFO] error in remotedialer server [400]: read tcp 192.168.2.30:80->192.168.1.0:47718: i/o timeout
2019/06/07 06:15:26 [INFO] error in remotedialer server [400]: websocket: close 1006 unexpected EOF
E0607 06:15:26.082295 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:26.082345 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:26.082448 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:26.082471 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:26.082780 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:26.082802 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:26.082831 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
W0607 06:15:26.087924 10 reflector.go:270] github.com/rancher/norman/controller/generic_controller.go:175: watch of *v1.Role ended with: too old resource version: 11190268 (12424649)
2019/06/07 06:15:26 [INFO] error in remotedialer server [400]: read tcp 192.168.2.30:80->192.168.0.0:53898: i/o timeout
W0607 06:15:31.028242 10 reflector.go:270] github.com/rancher/norman/controller/generic_controller.go:175: watch of *v1.Role ended with: too old resource version: 477283 (478963)
2019/06/07 06:15:34 [INFO] Handling backend connection request [c-c8g4g:m-sswhn]
E0607 06:15:37.079819 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
2019/06/07 06:15:40 [INFO] Handling backend connection request [c-c8g4g:m-wg8fx]
2019/06/07 06:15:41 [INFO] Handling backend connection request [c-c8g4g:m-6z5r7]
2019/06/07 06:15:46 [INFO] Adding peer wss://192.168.1.30/v3/connect, 192.168.1.30
2019/06/07 06:15:47 [ERROR] Failed to serve peer connection 192.168.2.23: websocket: close 1006 unexpected EOF
2019/06/07 06:15:47 [INFO] error in remotedialer server [400]: websocket: close 1006 unexpected EOF
E0607 06:15:47.078038 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:47.078047 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:47.078131 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:47.078545 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:47.078609 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:47.079481 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:47.079575 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
2019/06/07 06:15:47 [INFO] Removing peer 192.168.2.23
2019/06/07 06:15:49 [INFO] Handling backend connection request [192.168.1.30]
2019/06/07 06:15:50 [INFO] error in remotedialer server [400]: read tcp 192.168.2.30:80->192.168.0.0:53968: i/o timeout
2019/06/07 06:15:55 [INFO] Handling backend connection request [c-7cbb2:m-4tlrs]
E0607 06:15:56.372190 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:56.372436 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:56.372474 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:56.372454 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:56.372519 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:56.372944 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:56.372972 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:56.393681 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.Role: Get https://10.250.2.116:6443/apis/rbac.authorization.k8s.io/v1/watch/roles?resourceVersion=478963&timeout=30m0s&timeoutSeconds=476: tunnel disconnect
2019/06/07 06:15:56 [INFO] error in remotedialer server [400]: websocket: close 1006 unexpected EOF
2019/06/07 06:15:56 [INFO] error in remotedialer server [400]: read tcp 192.168.2.30:80->10.250.0.116:48140: read: connection reset by peer
E0607 06:15:56.403368 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:56.403475 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:56.404067 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:56.404323 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:56.405687 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.Namespace: Get https://10.250.2.116:6443/api/v1/watch/namespaces?resourceVersion=8232163&timeout=30m0s&timeoutSeconds=376: tunnel disconnect
E0607 06:15:56.405843 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.Service: Get https://10.250.2.116:6443/api/v1/watch/services?resourceVersion=9657754&timeout=30m0s&timeoutSeconds=568: tunnel disconnect
E0607 06:15:56.406710 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.RoleBinding: Get https://10.250.2.116:6443/apis/rbac.authorization.k8s.io/v1/watch/rolebindings?resourceVersion=8193185&timeout=30m0s&timeoutSeconds=581: tunnel disconnect
E0607 06:15:56.406866 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.ClusterRole: Get https://10.250.2.116:6443/apis/rbac.authorization.k8s.io/v1/watch/clusterroles?resourceVersion=7152695&timeout=30m0s&timeoutSeconds=437: tunnel disconnect
2019/06/07 06:15:56 [INFO] error in remotedialer server [400]: websocket: close 1006 unexpected EOF
2019/06/07 06:15:56 [INFO] error in remotedialer server [400]: websocket: close 1006 unexpected EOF
E0607 06:15:56.924283 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
W0607 06:15:56.924341 10 reflector.go:270] github.com/rancher/norman/controller/generic_controller.go:175: watch of *v1.Endpoints ended with: very short watch: github.com/rancher/norman/controller/generic_controller.go:175: Unexpected watch close - watch lasted less than a second and no items received
2019/06/07 06:15:56 [INFO] error in remotedialer server [400]: websocket: close 1006 unexpected EOF
E0607 06:15:56.924657 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
W0607 06:15:56.924677 10 reflector.go:270] github.com/rancher/norman/controller/generic_controller.go:175: watch of *v1.ClusterRoleBinding ended with: very short watch: github.com/rancher/norman/controller/generic_controller.go:175: Unexpected watch close - watch lasted less than a second and no items received
E0607 06:15:56.924715 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:56.924732 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
W0607 06:15:56.924736 10 reflector.go:270] github.com/rancher/norman/controller/generic_controller.go:175: watch of *v1.RoleBinding ended with: very short watch: github.com/rancher/norman/controller/generic_controller.go:175: Unexpected watch close - watch lasted less than a second and no items received
E0607 06:15:56.929528 10 reflector.go:251] github.com/rancher/norman/controller/generic_controller.go:175: Failed to watch *v1.Service: Get https://10.250.1.123:6443/api/v1/watch/services?resourceVersion=13967675&timeout=30m0s&timeoutSeconds=495: failed to find Session for client c-7cbb2:m-z7nq8
E0607 06:15:56.929605 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
E0607 06:15:56.929685 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
W0607 06:15:56.929781 10 reflector.go:270] github.com/rancher/norman/controller/generic_controller.go:175: watch of *v1.Role ended with: very short watch: github.com/rancher/norman/controller/generic_controller.go:175: Unexpected watch close - watch lasted less than a second and no items received
E0607 06:15:56.929941 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
2019/06/07 06:15:57 [INFO] Handling backend connection request [local:machine-wzbv6]
2019/06/07 06:15:57 [INFO] Handling backend connection request [c-7cbb2]
2019/06/07 06:15:57 [INFO] Handling backend connection request [local:machine-4hpdd]
E0607 06:15:59.735036 10 streamwatcher.go:109] Unable to decode an event from the watch stream: tunnel disconnect
W0607 06:15:59.740022 10 reflector.go:270] github.com/rancher/norman/controller/generic_controller.go:175: watch of *v1.Role ended with: too old resource version: 477283 (478963)
After 10 minutes of waiting, hundreds of errors, and restarts of the rancher containers, the upgrade finishes. The errors in the log correspond to issue https://github.com/rancher/rancher/issues/20265 (apparently nobody on the Rancher team is interested in it...)
@Redbierbaron looks like a different issue, as the logs don't show cannot unmarshal number into Go value of type string anymore, and instead you see:
2019/06/07 06:13:58 [ERROR] Failed to connect to peer wss://192.168.2.23/v3/connect [local ID=192.168.2.30]: websocket: bad handshake
2019/06/07 06:13:58 [ERROR] Failed to connect to peer wss://192.168.0.22/v3/connect [local ID=192.168.2.30]: websocket: bad handshake
2019/06/07 06:13:58 [ERROR] Failed to connect to peer wss://192.168.1.28/v3/connect [local ID=192.168.2.30]: websocket: bad handshake
2019/06/07 06:13:58 [INFO] Stopping cluster agent for c-7cbb2
So yeah, if this could just work that would be great. ;)
This morning I edited the cluster via kubectl, changing the values for the OpenStack LB to quoted empty strings instead of the 0 they were. However, I still see the exact same behaviour: the update fails with the same error out of the pod. To top it off, this time my second cluster is missing from the Rancher UI! It still shows with kubectl, so it's there, but the UI doesn't see it at all.
Went back through the entire cluster and reset the values back to what they were and the cluster is back in the UI now.
@DJAyth can you try the steps mentioned by @kinarashah - she specifically tested it for HA use case:
In 2.2.3, those values were 0
I upgraded to 2.2.4, upgrade fails
I edited clusters to make them ""
pods start running and I can access the login page
If you change the value while still on 2.2.3, the 2.2.3 UI will fail to load as it doesn't understand strings. I've updated my initial comment to reflect this info. Please let us know if that works for you.
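The edit in those steps boils down to retyping two fields in the cluster spec. As a rough sketch (the helper is hypothetical, not a Rancher tool; the field names are taken from the kubectl output discussed in this thread), the transformation looks like:

```python
# Hypothetical helper mirroring the manual workaround: retype the two
# OpenStack LB monitor fields from int (2.2.3 schema) to string (2.2.4 schema).
def retype_monitor_fields(load_balancer):
    fixed = dict(load_balancer)
    for key in ("monitor-delay", "monitor-timeout"):
        value = fixed.get(key)
        if isinstance(value, int):
            # 0 becomes "" per the steps above; any other int is stringified,
            # since per this thread any acceptable value works as long as it
            # is a string.
            fixed[key] = "" if value == 0 else str(value)
    return fixed

print(retype_monitor_fields({"lb-provider": "", "monitor-delay": 0, "monitor-timeout": 0}))
```

In practice the same change is made by hand inside `kubectl edit cluster c-xxx`; this is only to make the before/after types explicit.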
This happens on OpenStack even when not using the load balancer. Those fields were still populated even though I did not specify the loadBalancer block at all when creating the cluster.
The relevant subsection of "kubectl get cluster --all-namespaces -o json" looks like:
"loadBalancer": {
"create-monitor": false,
"floating-network-id": "",
"lb-method": "",
"lb-provider": "",
"lb-version": "",
"manage-security-groups": false,
"monitor-delay": 0,
"monitor-max-retries": 0,
"monitor-timeout": 0,
"subnet-id": "",
"use-octavia": false
},
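To see at a glance which cluster objects in that dump still carry int-typed values (rather than eyeballing the full JSON), a small walk over the parsed kubectl output can help. This is only a sketch, assuming you have parsed the `kubectl get cluster --all-namespaces -o json` output into a dict:

```python
# Sketch: scan a parsed `kubectl get cluster --all-namespaces -o json` dump
# for monitor-delay/monitor-timeout values that are still ints. It walks the
# whole object, so nested or repeated loadBalancer blocks are all covered.
def find_int_monitor_fields(node, path=""):
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key in ("monitor-delay", "monitor-timeout") and isinstance(value, int):
                hits.append((path + "/" + key, value))
            hits.extend(find_int_monitor_fields(value, path + "/" + key))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            hits.extend(find_int_monitor_fields(value, path + "[" + str(i) + "]"))
    return hits

# Minimal fragment shaped like the subsection above:
dump = {"items": [{"spec": {"loadBalancer": {"monitor-delay": 0, "monitor-timeout": 0}}}]}
print(find_int_monitor_fields(dump))
```

An empty result would mean every remaining monitor field is already a string.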
I also found the workaround steps somewhat confusing, in that I was trying to find and edit those settings (monitor-delay and monitor-timeout) on the "local" cluster, because that is the one to which I am re-deploying rancher server, and obviously not finding them, because the "local" cluster is "imported" and won't show these values. In fact, it was the subordinate Custom cluster that I needed to edit: the values on that one were preventing the rancher container from starting in the main "local" cluster. Just FYI for anyone struggling.
To summarize:
We had the same issue trying to update from 2.2.2 to 2.2.4, followed the instructions, and finally had Rancher running with 2.2.4.
BUT, with every change to a client cluster (editing the cluster.yaml), the cluster is seen as unhealthy by Rancher.
This cluster is currently Updating.
[etcd] Failed to bring up Etcd Plane: [etcd] Etcd Cluster is not healthy
Enabling DEBUG shows the following error on the rancher application container:
etcd in the client clusters is running with 3 healthy members.
That looks like known issues with rke (e.g. https://github.com/rancher/rke/issues/1244), where a missing rke cluster state file lets rke create new certificates without recognizing that there is an existing cluster.
I never thought something like this could happen with Rancher, given that the rke state files are kept as a config map in every client cluster. I verified that the configmap full-cluster-state still contains the original certificates, but the state files under /var/lib/rancher/management-state/rke in the rancher application container contain new certificates generated after the update.
Now it seems that all our client clusters are unusable: they are either already in a faulty state, or will be with the next change to cluster.yaml.
Anybody with the same experience? Any suggestion how to solve this? How do we force Rancher to use the full-cluster-state configmap? How do we avoid having new certificates created, which makes (production) client clusters unusable?
@galal-hussein has this been seen before, after an upgrade to 2.2.4?
I verified that the configmap full-cluster-state still contains the original certificates, but the state files under /var/lib/rancher/management-state/rke in the rancher application container contain new certificates generated after the update.
@galal-hussein, @kinarashah
I opened a separate issue for the above problem: https://github.com/rancher/rancher/issues/20909
Some more analysis showed that the rancher master is constantly creating new certificates for the faulty client cluster, but with these certificates the (unchanged, but working) etcds of the client cluster cannot be reached.
@surco1971 the issue you are seeing is unrelated to the original issue described in the first comment. Can you file it as a separate bug?
@alena1108, that is what I did; see the comment above: https://github.com/rancher/rancher/issues/20909. And I mentioned it here because the origin of all this was trying to update to 2.2.4. So it is related in a way, at least for users who could end up with faulty client clusters.
I am seeing the same issue. My steps:
???
I got nothing. Help?
@tibers ideally, there's no need to do step 5.
Could you confirm step 3 was applied correctly by doing kubectl get cluster --all-namespaces -o json and checking values of monitor-delay & monitor-timeout?
Since we have a mix of openstack clusters and all sorts of other fun stuff in rancher, what I did was:
for h in `kubectl get clusters | grep -v NAME | cut -f 1 -d ' '`
do
kubectl edit cluster $h
done
Which edits EVERYTHING, and then while I was in vi I changed monitor-delay and monitor-timeout to just monitor-delay: "" and monitor-timeout: "". Then trying to get 2.2.4 installed still throws the error. Is there possibly another value which has to be set to ""?
@tibers you can set it to any acceptable value for delay and timeout based on your use case, just needs to be in string format
I went back and changed both those to "1s" and have the same problem:
reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v3.Cluster: json: cannot unmarshal number into Go value of type string
EDIT: I win the "learn to read" prize today: there were multiple loadBalancer: fields in there, so there were multiple instances of each monitor line. Changing all of them fixed the issue.
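The gotcha above, multiple loadBalancer: blocks per cluster object, means a single targeted edit can miss occurrences. A recursive retype over the whole object (a hypothetical helper, shown only to illustrate the idea) converts every occurrence in one pass:

```python
# Sketch: retype every int-valued monitor-delay/monitor-timeout field in a
# cluster object to a string, regardless of how many loadBalancer blocks exist.
# 0 becomes "", other ints are stringified; all other fields pass through.
def retype_all_monitor_fields(node):
    if isinstance(node, dict):
        return {
            key: ("" if value == 0 else str(value))
            if key in ("monitor-delay", "monitor-timeout") and isinstance(value, int)
            else retype_all_monitor_fields(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [retype_all_monitor_fields(value) for value in node]
    return node
```

Running this over the parsed kubectl dump and applying the result back would be equivalent to editing each occurrence by hand in vi; again, this is an illustration rather than a tested migration script.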
I saw the problem in client v2.2.0:
cannot unmarshal string into Go struct field LoadBalancerOpenstackOpts.monitor-delay of type int64
But downloading the latest RC for client v2.3 instead solved it for me.
Rancher upgrade from 2.2.3 to 2.2.9 went well with the workaround (manually editing those cluster values from 0 to "" with kubectl).
However now that I try to upgrade k8s from v1.13.5-rancher1-2 to v1.14.9-rancher1-1 in one of those clusters I'm getting:
[ERROR] ClusterController c-xxxxx [certificate-expiration] failed with : json: cannot unmarshal number into Go struct field LoadBalancerOpenstackOpts.monitor-delay of type string
I'm not really sure where Rancher could be getting the int-typed value, since the values for monitor-delay and monitor-timeout are strings in the cluster configuration.
I tried including both parameters as strings in the yaml file when editing the cluster from the rancher web interface, but I get the same results.
As a result the cluster is permanently stuck in an Updating state, showing CA Certificate or Key is empty.
Extra info: our clusters don't use the OpenStack LoadBalancer, and they were provisioned with a Rancher version older than 2.2.0.
Has anyone tried this? @kinarashah do you maybe have an idea?
Thanks in advance for any help!
@pameladelgado could you post your cluster yaml?
@kinarashah sure
cluster.txt
I can also create a separate issue.
For anyone else seeing the same issue as @pameladelgado, we fixed it using: https://github.com/rancher/rancher/issues/20909#issuecomment-521352246
Cert rotate helped to fix this issue
rke cert rotate --config cluster.yml
@alena1108 Thank you very much! You are absolutely right. Changing the values of monitor-delay and monitor-timeout manually by editing the cluster specs (kubectl edit cluster c-xxx) has fixed the problem (at least for me).