When cluster is not changed, established connection count from each calico-node pod to the apiserver should remain (more or less) constant.
Established connection count from calico-node to apiserver increases overtime. (added 60 connections over 2 days). No leaking when using ETCD datastore driver.
Not sure
It accumulates in large amount of unclosed TCP connections on the apiservers, which we have to regularly kill all calico-node to reset them.
Posting the yaml file used to deploy our Calico (slightly modified from https://docs.projectcalico.org/v2.6/getting-started/kubernetes/installation/hosted/kubernetes-datastore/) Mainly:
# Calico Version v2.6.2
# https://docs.projectcalico.org/v2.6/releases#v2.6.2
# This manifest includes the following component versions:
# calico/node:v2.6.2
# calico/cni:v1.11.0
# This ConfigMap is used to configure a self-hosted Calico installation.
kind: ConfigMap
apiVersion: v1
metadata:
name: calico-config
namespace: kube-system
data:
# The CNI network configuration to install on each node.
cni_network_config: |-
{
"name": "k8s-pod-network",
"cniVersion": "0.1.0",
"type": "calico",
"log_level": "info",
"datastore_type": "kubernetes",
"nodename": "__KUBERNETES_NODE_NAME__",
"mtu": 1500,
"ipam": {
"type": "host-local",
"subnet": "usePodCidr"
},
"policy": {
"type": "k8s",
"k8s_auth_token": "__SERVICEACCOUNT_TOKEN__"
},
"kubernetes": {
"k8s_api_root": "https://__KUBERNETES_SERVICE_HOST__:__KUBERNETES_SERVICE_PORT__",
"kubeconfig": "__KUBECONFIG_FILEPATH__"
}
}
---
# HACK: This is the temporary patch for fixing the log spam from confd
# Seems fixed in a later beta version, we will keep it here until a new stable version is released.
# https://github.com/projectcalico/calico/issues/985
kind: ConfigMap
apiVersion: v1
metadata:
name: calico-confd-patch
namespace: kube-system
data:
run: |-
#!/bin/sh
exec 2>&1
if [ "$DATASTORE_TYPE" = "kubernetes" ]
then
exec confd -confdir=/etc/calico/confd -interval=5 -backend=k8s
else
ETCD_NODE=${ETCD_ENDPOINTS:=${ETCD_SCHEME:=http}://${ETCD_AUTHORITY}}
ETCD_ENDPOINTS_CONFD=`echo "-node=$ETCD_NODE" | sed -e 's/,/ -node=/g'`
exec confd -confdir=/etc/calico/confd -interval=5 -watch \
$ETCD_ENDPOINTS_CONFD -client-key=${ETCD_KEY_FILE} \
-client-cert=${ETCD_CERT_FILE} -client-ca-keys=${ETCD_CA_CERT_FILE}
fi
---
# This manifest installs the calico/node container, as well
# as the Calico CNI plugins and network config on
# each master and worker node in a Kubernetes cluster.
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
name: calico-node
namespace: kube-system
labels:
k8s-app: calico-node
spec:
selector:
matchLabels:
k8s-app: calico-node
template:
metadata:
labels:
k8s-app: calico-node
annotations:
# This, along with the CriticalAddonsOnly toleration below,
# marks the pod as a critical add-on, ensuring it gets
# priority scheduling and that its resources are reserved
# if it ever gets evicted.
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
hostNetwork: true
serviceAccountName: calico-node
tolerations:
# Allow the pod to run on the master. This is required for
# the master to communicate with pods.
- key: dedicated
operator: "Exists"
effect: NoSchedule
# Mark the pod as a critical add-on for rescheduling.
- key: "CriticalAddonsOnly"
operator: "Exists"
# Minimize downtime during a rolling upgrade or deletion; tell Kubernetes to do a "force
# deletion": https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods.
terminationGracePeriodSeconds: 0
containers:
# Runs calico/node container on each Kubernetes node. This
# container programs network policy and routes on each
# host.
- name: calico-node
image: calico/node:v2.6.2
env:
# Use Kubernetes API as the backing datastore.
- name: DATASTORE_TYPE
value: "kubernetes"
# Enable felix info logging.
- name: FELIX_LOGSEVERITYSCREEN
value: "info"
# Cluster type to identify the deployment type
- name: CLUSTER_TYPE
value: "k8s,bgp"
# Disable file logging so `kubectl logs` works.
- name: CALICO_DISABLE_FILE_LOGGING
value: "true"
# Set Felix endpoint to host default action to ACCEPT.
- name: FELIX_DEFAULTENDPOINTTOHOSTACTION
value: "ACCEPT"
# Disable IPV6 on Kubernetes.
- name: FELIX_IPV6SUPPORT
value: "false"
# Set MTU for tunnel device used if ipip is enabled
- name: FELIX_IPINIPMTU
value: "1440"
# Wait for the datastore.
- name: WAIT_FOR_DATASTORE
value: "true"
# The Calico IPv4 pool to use. This should match `--cluster-cidr`
- name: CALICO_IPV4POOL_CIDR
value: "100.200.0.0/16"
# Enable IPIP
- name: CALICO_IPV4POOL_IPIP
value: "off"
# Enable IP-in-IP within Felix.
- name: FELIX_IPINIPENABLED
value: "true"
# Set based on the k8s node name.
- name: NODENAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
# Make sure IP is always autodetected even if already exists in the node resource configuration
# https://docs.projectcalico.org/v2.4/reference/node/configuration#ip-autodetection-methods
- name: IP
value: "autodetect"
- name: FELIX_HEALTHENABLED
value: "true"
securityContext:
privileged: true
resources:
requests:
cpu: 250m
livenessProbe:
httpGet:
path: /liveness
port: 9099
periodSeconds: 10
initialDelaySeconds: 10
failureThreshold: 6
readinessProbe:
httpGet:
path: /readiness
port: 9099
periodSeconds: 10
volumeMounts:
- mountPath: /lib/modules
name: lib-modules
readOnly: true
- mountPath: /var/run/calico
name: var-run-calico
readOnly: false
# HACK: temporary fix for confd log spam
# https://github.com/projectcalico/calico/issues/985
- mountPath: /etc/service/enabled/confd/run
name: confd-patch
subPath: run
readOnly: true
# This container installs the Calico CNI binaries
# and CNI network config file on each node.
- name: install-cni
image: calico/cni:v1.11.0
command: ["/install-cni.sh"]
env:
# The CNI network config to install on each node.
- name: CNI_NETWORK_CONFIG
valueFrom:
configMapKeyRef:
name: calico-config
key: cni_network_config
# Set the hostname based on the k8s node name.
- name: KUBERNETES_NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
volumeMounts:
- mountPath: /host/opt/cni/bin
name: cni-bin-dir
- mountPath: /host/etc/cni/net.d
name: cni-net-dir
volumes:
# Used by calico/node.
- name: lib-modules
hostPath:
path: /lib/modules
- name: var-run-calico
hostPath:
path: /var/run/calico
# Used to install CNI.
- name: cni-bin-dir
hostPath:
path: /opt/cni/bin
- name: cni-net-dir
hostPath:
path: /etc/cni/net.d
# HACK: temporary fix for confd log spam
# https://github.com/projectcalico/calico/issues/985
- name: confd-patch
configMap:
name: calico-confd-patch
defaultMode: 0755
# Create all the CustomResourceDefinitions needed for
# Calico policy and networking mode.
---
apiVersion: apiextensions.k8s.io/v1beta1
description: Calico Global Felix Configuration
kind: CustomResourceDefinition
metadata:
name: globalfelixconfigs.crd.projectcalico.org
spec:
scope: Cluster
group: crd.projectcalico.org
version: v1
names:
kind: GlobalFelixConfig
plural: globalfelixconfigs
singular: globalfelixconfig
---
apiVersion: apiextensions.k8s.io/v1beta1
description: Calico BGP Peers
kind: CustomResourceDefinition
metadata:
name: bgppeers.crd.projectcalico.org
spec:
scope: Cluster
group: crd.projectcalico.org
version: v1
names:
kind: BGPPeer
plural: bgppeers
singular: bgppeer
---
apiVersion: apiextensions.k8s.io/v1beta1
description: Calico Global BGP Configuration
kind: CustomResourceDefinition
metadata:
name: globalbgpconfigs.crd.projectcalico.org
spec:
scope: Cluster
group: crd.projectcalico.org
version: v1
names:
kind: GlobalBGPConfig
plural: globalbgpconfigs
singular: globalbgpconfig
---
apiVersion: apiextensions.k8s.io/v1beta1
description: Calico IP Pools
kind: CustomResourceDefinition
metadata:
name: ippools.crd.projectcalico.org
spec:
scope: Cluster
group: crd.projectcalico.org
version: v1
names:
kind: IPPool
plural: ippools
singular: ippool
---
apiVersion: apiextensions.k8s.io/v1beta1
description: Calico Global Network Policies
kind: CustomResourceDefinition
metadata:
name: globalnetworkpolicies.crd.projectcalico.org
spec:
scope: Cluster
group: crd.projectcalico.org
version: v1
names:
kind: GlobalNetworkPolicy
plural: globalnetworkpolicies
singular: globalnetworkpolicy
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: calico-node
namespace: kube-system
---
# Calico Version v2.6.2
# https://docs.projectcalico.org/v2.6/releases#v2.6.2
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: calico-node
rules:
- apiGroups: [""]
resources:
- namespaces
verbs:
- get
- list
- watch
- apiGroups: [""]
resources:
- pods/status
verbs:
- update
- apiGroups: [""]
resources:
- pods
verbs:
- get
- list
- watch
- apiGroups: [""]
resources:
- nodes
verbs:
- get
- list
- update
- watch
- apiGroups: ["extensions"]
resources:
- networkpolicies
verbs:
- get
- list
- watch
- apiGroups: ["crd.projectcalico.org"]
resources:
- globalfelixconfigs
- bgppeers
- globalbgpconfigs
- ippools
- globalnetworkpolicies
verbs:
- create
- get
- list
- update
- watch
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: calico-node
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: calico-node
subjects:
- kind: ServiceAccount
name: calico-node
namespace: kube-system
@javefang Please can you try with Calico v2.6.3? We made some fixes in that area that may have resolved this.
Awesome! I will give it a go tomorrow and report back in a few days (to see if the conncetion still leaks).
OK running 2.6.3 (and CNI 1.11.1) now. 2 hours passed, looking good so far! Will report back in 24h
With this result I'm convinced that the connection leak issue has been fixed by 2.6.3. Thanks for the great work guys!
(2.6.3 deployed at around 9am on 7 Dec, and no connection count increase since then)

Awesome, thank you for reporting your results back. :+1: