If you use the EKS setup and follow the instructions on how to configure an overlay network, you end up with a crashing cilium-operator. The issue is that the helm options those instructions give you deploy cilium-operator-generic, which doesn't support the ipam=eni config option.
The operator fails with this log:
level=fatal msg="eni allocator is not supported by this version of cilium-operator-generic" subsys=cilium-operator-generic
General Information
How to reproduce the issue
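The manifest below was rendered with a helm template invocation along the lines of the following (adapted from the command quoted later in this thread; the output file name is an assumption):

helm template cilium cilium/cilium --version 1.8.2 \
  --namespace kube-system \
  --set config.ipam=eni \
  --set global.egressMasqueradeInterfaces=eth0 \
  --set global.nodeinit.enabled=true > deploy.yaml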
deploy.yaml (rendered by helm)
---
# Source: cilium/charts/agent/templates/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: cilium
namespace: kube-system
---
# Source: cilium/charts/operator/templates/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: cilium-operator
namespace: kube-system
---
# Source: cilium/charts/config/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: cilium-config
namespace: kube-system
data:
# Identity allocation mode selects how identities are shared between cilium
# nodes by setting how they are stored. The options are "crd" or "kvstore".
# - "crd" stores identities in kubernetes as CRDs (custom resource definition).
# These can be queried with:
# kubectl get ciliumid
# - "kvstore" stores identities in a kvstore, etcd or consul, that is
# configured below. Cilium versions before 1.6 supported only the kvstore
# backend. Upgrades from these older cilium versions should continue using
# the kvstore by commenting out the identity-allocation-mode below, or
# setting it to "kvstore".
identity-allocation-mode: crd
# If you want to run cilium in debug mode change this value to true
debug: "false"
# Enable IPv4 addressing. If enabled, all endpoints are allocated an IPv4
# address.
enable-ipv4: "true"
# Enable IPv6 addressing. If enabled, all endpoints are allocated an IPv6
# address.
enable-ipv6: "false"
enable-bpf-clock-probe: "true"
# If you want cilium monitor to aggregate tracing for packets, set this level
# to "low", "medium", or "maximum". The higher the level, the fewer packets
# will be seen in monitor output.
monitor-aggregation: medium
# The monitor aggregation interval governs the typical time between monitor
# notification events for each allowed connection.
#
# Only effective when monitor aggregation is set to "medium" or higher.
monitor-aggregation-interval: 5s
# The monitor aggregation flags determine which TCP flags, upon the
# first observation, cause monitor notifications to be generated.
#
# Only effective when monitor aggregation is set to "medium" or higher.
monitor-aggregation-flags: all
# bpf-policy-map-max specifies the maximum number of entries in the endpoint
# policy map (per endpoint)
bpf-policy-map-max: "16384"
# Specifies the ratio (0.0-1.0) of total system memory to use for dynamic
# sizing of the TCP CT, non-TCP CT, NAT and policy BPF maps.
bpf-map-dynamic-size-ratio: "0.0025"
# Pre-allocation of map entries allows per-packet latency to be reduced, at
# the expense of up-front memory allocation for the entries in the maps. The
# default value below will minimize memory usage in the default installation;
# users who are sensitive to latency may consider setting this to "true".
#
# This option was introduced in Cilium 1.4. Cilium 1.3 and earlier ignore
# this option and behave as though it is set to "true".
#
# If this value is modified, then during the next Cilium startup the restore
# of existing endpoints and tracking of ongoing connections may be disrupted.
# This may lead to policy drops or a change in loadbalancing decisions for a
# connection for some time. Endpoints may need to be recreated to restore
# connectivity.
#
# If this option is set to "false" during an upgrade from 1.3 or earlier to
# 1.4 or later, then it may cause one-time disruptions during the upgrade.
preallocate-bpf-maps: "false"
# Regular expression matching compatible Istio sidecar istio-proxy
# container image names
sidecar-istio-proxy-image: "cilium/istio_proxy"
# Encapsulation mode for communication between nodes
# Possible values:
# - disabled
# - vxlan (default)
# - geneve
tunnel: vxlan
# Name of the cluster. Only relevant when building a mesh of clusters.
cluster-name: default
# wait-bpf-mount makes the init container wait until the bpf filesystem is mounted
wait-bpf-mount: "false"
masquerade: "true"
enable-bpf-masquerade: "true"
egress-masquerade-interfaces: eth0
enable-xt-socket-fallback: "true"
install-iptables-rules: "true"
auto-direct-node-routes: "false"
kube-proxy-replacement: "probe"
enable-health-check-nodeport: "true"
node-port-bind-protection: "true"
enable-auto-protect-node-port-range: "true"
enable-session-affinity: "true"
enable-endpoint-health-checking: "true"
enable-well-known-identities: "false"
enable-remote-node-identity: "true"
operator-api-serve-addr: "127.0.0.1:9234"
ipam: "eni"
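# NOTE: this is the setting the generic operator rejects at startup; per the
# fatal log above, the ENI allocator is not supported by cilium-operator-generic.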
disable-cnp-status-updates: "true"
---
# Source: cilium/charts/agent/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: cilium
rules:
- apiGroups:
- networking.k8s.io
resources:
- networkpolicies
verbs:
- get
- list
- watch
- apiGroups:
- discovery.k8s.io
resources:
- endpointslices
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- namespaces
- services
- nodes
- endpoints
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- pods
- nodes
verbs:
- get
- list
- watch
- update
- apiGroups:
- ""
resources:
- nodes
- nodes/status
verbs:
- patch
- apiGroups:
- apiextensions.k8s.io
resources:
- customresourcedefinitions
verbs:
- create
- get
- list
- watch
- update
- apiGroups:
- cilium.io
resources:
- ciliumnetworkpolicies
- ciliumnetworkpolicies/status
- ciliumclusterwidenetworkpolicies
- ciliumclusterwidenetworkpolicies/status
- ciliumendpoints
- ciliumendpoints/status
- ciliumnodes
- ciliumnodes/status
- ciliumidentities
verbs:
- '*'
---
# Source: cilium/charts/operator/templates/clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: cilium-operator
rules:
- apiGroups:
- ""
resources:
# to automatically delete [core|kube]dns pods so that they start being
# managed by Cilium
- pods
verbs:
- get
- list
- watch
- delete
- apiGroups:
- discovery.k8s.io
resources:
- endpointslices
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
# to perform the translation of a CNP that contains `ToGroup` to its endpoints
- services
- endpoints
# to check apiserver connectivity
- namespaces
verbs:
- get
- list
- watch
- apiGroups:
- cilium.io
resources:
- ciliumnetworkpolicies
- ciliumnetworkpolicies/status
- ciliumclusterwidenetworkpolicies
- ciliumclusterwidenetworkpolicies/status
- ciliumendpoints
- ciliumendpoints/status
- ciliumnodes
- ciliumnodes/status
- ciliumidentities
- ciliumidentities/status
verbs:
- '*'
- apiGroups:
- apiextensions.k8s.io
resources:
- customresourcedefinitions
verbs:
- get
- list
- watch
---
# Source: cilium/charts/agent/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: cilium
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cilium
subjects:
- kind: ServiceAccount
name: cilium
namespace: kube-system
---
# Source: cilium/charts/operator/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: cilium-operator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cilium-operator
subjects:
- kind: ServiceAccount
name: cilium-operator
namespace: kube-system
---
# Source: cilium/charts/agent/templates/daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
labels:
k8s-app: cilium
name: cilium
namespace: kube-system
spec:
selector:
matchLabels:
k8s-app: cilium
template:
metadata:
annotations:
# This annotation, together with the CriticalAddonsOnly toleration, marks
# cilium as a critical pod in the cluster, which ensures cilium
# gets priority scheduling.
# https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
scheduler.alpha.kubernetes.io/critical-pod: ""
labels:
k8s-app: cilium
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: k8s-app
operator: In
values:
- cilium
topologyKey: kubernetes.io/hostname
containers:
- args:
- --config-dir=/tmp/cilium/config-map
command:
- cilium-agent
livenessProbe:
httpGet:
host: '127.0.0.1'
path: /healthz
port: 9876
scheme: HTTP
httpHeaders:
- name: "brief"
value: "true"
failureThreshold: 10
# The initial delay for the liveness probe is intentionally large to
# avoid an endless kill & restart cycle in the event that the initial
# bootstrapping takes longer than expected.
initialDelaySeconds: 120
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 5
readinessProbe:
httpGet:
host: '127.0.0.1'
path: /healthz
port: 9876
scheme: HTTP
httpHeaders:
- name: "brief"
value: "true"
failureThreshold: 3
initialDelaySeconds: 5
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 5
env:
- name: K8S_NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: CILIUM_K8S_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: CILIUM_FLANNEL_MASTER_DEVICE
valueFrom:
configMapKeyRef:
key: flannel-master-device
name: cilium-config
optional: true
- name: CILIUM_FLANNEL_UNINSTALL_ON_EXIT
valueFrom:
configMapKeyRef:
key: flannel-uninstall-on-exit
name: cilium-config
optional: true
- name: CILIUM_CLUSTERMESH_CONFIG
value: /var/lib/cilium/clustermesh/
- name: CILIUM_CNI_CHAINING_MODE
valueFrom:
configMapKeyRef:
key: cni-chaining-mode
name: cilium-config
optional: true
- name: CILIUM_CUSTOM_CNI_CONF
valueFrom:
configMapKeyRef:
key: custom-cni-conf
name: cilium-config
optional: true
image: "docker.io/cilium/cilium:v1.8.2"
imagePullPolicy: IfNotPresent
lifecycle:
postStart:
exec:
command:
- "/cni-install.sh"
- "--enable-debug=false"
preStop:
exec:
command:
- /cni-uninstall.sh
name: cilium-agent
securityContext:
capabilities:
add:
- NET_ADMIN
- SYS_MODULE
privileged: true
volumeMounts:
- mountPath: /sys/fs/bpf
name: bpf-maps
- mountPath: /var/run/cilium
name: cilium-run
- mountPath: /host/opt/cni/bin
name: cni-path
- mountPath: /host/etc/cni/net.d
name: etc-cni-netd
- mountPath: /var/lib/cilium/clustermesh
name: clustermesh-secrets
readOnly: true
- mountPath: /tmp/cilium/config-map
name: cilium-config-path
readOnly: true
# Needed to be able to load kernel modules
- mountPath: /lib/modules
name: lib-modules
readOnly: true
- mountPath: /run/xtables.lock
name: xtables-lock
hostNetwork: true
initContainers:
- name: wait-for-node-init
command: ['sh', '-c', 'until stat /tmp/cilium-bootstrap-time > /dev/null 2>&1; do echo "Waiting on node-init to run..."; sleep 1; done']
image: "docker.io/cilium/cilium:v1.8.2"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /tmp/cilium-bootstrap-time
name: cilium-bootstrap-file
- command:
- /init-container.sh
env:
- name: CILIUM_ALL_STATE
valueFrom:
configMapKeyRef:
key: clean-cilium-state
name: cilium-config
optional: true
- name: CILIUM_BPF_STATE
valueFrom:
configMapKeyRef:
key: clean-cilium-bpf-state
name: cilium-config
optional: true
- name: CILIUM_WAIT_BPF_MOUNT
valueFrom:
configMapKeyRef:
key: wait-bpf-mount
name: cilium-config
optional: true
image: "docker.io/cilium/cilium:v1.8.2"
imagePullPolicy: IfNotPresent
name: clean-cilium-state
securityContext:
capabilities:
add:
- NET_ADMIN
privileged: true
volumeMounts:
- mountPath: /sys/fs/bpf
name: bpf-maps
mountPropagation: HostToContainer
- mountPath: /var/run/cilium
name: cilium-run
resources:
requests:
cpu: 100m
memory: 100Mi
restartPolicy: Always
priorityClassName: system-node-critical
serviceAccount: cilium
serviceAccountName: cilium
terminationGracePeriodSeconds: 1
tolerations:
- operator: Exists
volumes:
# To keep state between restarts / upgrades
- hostPath:
path: /var/run/cilium
type: DirectoryOrCreate
name: cilium-run
# To keep state between restarts / upgrades for bpf maps
- hostPath:
path: /sys/fs/bpf
type: DirectoryOrCreate
name: bpf-maps
# To install cilium cni plugin in the host
- hostPath:
path: /opt/cni/bin
type: DirectoryOrCreate
name: cni-path
# To install cilium cni configuration in the host
- hostPath:
path: /etc/cni/net.d
type: DirectoryOrCreate
name: etc-cni-netd
# To be able to load kernel modules
- hostPath:
path: /lib/modules
name: lib-modules
# To access iptables concurrently with other processes (e.g. kube-proxy)
- hostPath:
path: /run/xtables.lock
type: FileOrCreate
name: xtables-lock
- hostPath:
path: /tmp/cilium-bootstrap-time
type: FileOrCreate
name: cilium-bootstrap-file
# To read the clustermesh configuration
- name: clustermesh-secrets
secret:
defaultMode: 420
optional: true
secretName: cilium-clustermesh
# To read the configuration from the config map
- configMap:
name: cilium-config
name: cilium-config-path
updateStrategy:
rollingUpdate:
maxUnavailable: 2
type: RollingUpdate
---
# Source: cilium/charts/nodeinit/templates/daemonset.yaml
kind: DaemonSet
apiVersion: apps/v1
metadata:
name: cilium-node-init
namespace: kube-system
labels:
app: cilium-node-init
spec:
selector:
matchLabels:
app: cilium-node-init
template:
metadata:
labels:
app: cilium-node-init
spec:
tolerations:
- operator: Exists
hostPID: true
hostNetwork: true
priorityClassName: system-node-critical
containers:
- name: node-init
image: "docker.io/cilium/startup-script:af2a99046eca96c0138551393b21a5c044c7fe79"
imagePullPolicy: IfNotPresent
securityContext:
privileged: true
env:
- name: CHECKPOINT_PATH
value: /tmp/node-init.cilium.io
# STARTUP_SCRIPT is the script run on node bootstrap. Node
# bootstrapping can be customized in this script.
- name: STARTUP_SCRIPT
value: |
#!/bin/bash
set -o errexit
set -o pipefail
set -o nounset
mount | grep "/sys/fs/bpf type bpf" || {
# Mount the filesystem until next reboot
echo "Mounting BPF filesystem..."
mount bpffs /sys/fs/bpf -t bpf
# Configure systemd to mount after next boot
echo "Installing BPF filesystem mount"
cat >/tmp/sys-fs-bpf.mount <<EOF
[Unit]
Description=Mount BPF filesystem (Cilium)
Documentation=http://docs.cilium.io/
DefaultDependencies=no
Before=local-fs.target umount.target
After=swap.target
[Mount]
What=bpffs
Where=/sys/fs/bpf
Type=bpf
Options=rw,nosuid,nodev,noexec,relatime,mode=700
[Install]
WantedBy=multi-user.target
EOF
if [ -d "/etc/systemd/system/" ]; then
mv /tmp/sys-fs-bpf.mount /etc/systemd/system/
echo "Installed sys-fs-bpf.mount to /etc/systemd/system/"
elif [ -d "/lib/systemd/system/" ]; then
mv /tmp/sys-fs-bpf.mount /lib/systemd/system/
echo "Installed sys-fs-bpf.mount to /lib/systemd/system/"
fi
# Ensure that filesystem gets mounted on next reboot
systemctl enable sys-fs-bpf.mount
systemctl start sys-fs-bpf.mount
}
echo "Link information:"
ip link
echo "Routing table:"
ip route
echo "Addressing:"
ip -4 a
ip -6 a
date > /tmp/cilium-bootstrap-time
echo "Node initialization complete"
---
# Source: cilium/charts/operator/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
io.cilium/app: operator
name: cilium-operator
name: cilium-operator
namespace: kube-system
spec:
replicas: 1
selector:
matchLabels:
io.cilium/app: operator
name: cilium-operator
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
annotations:
labels:
io.cilium/app: operator
name: cilium-operator
spec:
containers:
- args:
- --config-dir=/tmp/cilium/config-map
- --debug=$(CILIUM_DEBUG)
command:
- cilium-operator-generic
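# NOTE: the chart rendered the generic operator binary here even though the
# ConfigMap above sets ipam: "eni"; the AWS flavour (cilium-operator-aws) is
# the one that implements the ENI allocator.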
env:
- name: K8S_NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: CILIUM_K8S_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: CILIUM_DEBUG
valueFrom:
configMapKeyRef:
key: debug
name: cilium-config
optional: true
- name: AWS_ACCESS_KEY_ID
valueFrom:
secretKeyRef:
key: AWS_ACCESS_KEY_ID
name: cilium-aws
optional: true
- name: AWS_SECRET_ACCESS_KEY
valueFrom:
secretKeyRef:
key: AWS_SECRET_ACCESS_KEY
name: cilium-aws
optional: true
- name: AWS_DEFAULT_REGION
valueFrom:
secretKeyRef:
key: AWS_DEFAULT_REGION
name: cilium-aws
optional: true
image: "docker.io/cilium/operator-generic:v1.8.2"
imagePullPolicy: IfNotPresent
name: cilium-operator
livenessProbe:
httpGet:
host: '127.0.0.1'
path: /healthz
port: 9234
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 10
timeoutSeconds: 3
volumeMounts:
- mountPath: /tmp/cilium/config-map
name: cilium-config-path
readOnly: true
hostNetwork: true
restartPolicy: Always
priorityClassName: system-cluster-critical
serviceAccount: cilium-operator
serviceAccountName: cilium-operator
volumes:
# To read the configuration from the config map
- configMap:
name: cilium-config
name: cilium-config-path
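To see the failure from the cluster side, the operator log can be pulled with kubectl (deployment name taken from the manifest above):

$ kubectl -n kube-system logs deployment/cilium-operator
level=fatal msg="eni allocator is not supported by this version of cilium-operator-generic" subsys=cilium-operator-generic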
@thejosephstevens thanks for reporting this issue. I just took a look at the helm chart; I want to make sure you have the --set global.eni=true option.
For me the operator image comes out with the AWS flavour, as per below.
$ helm template cilium cilium/cilium --version 1.8.2 \
--namespace kube-system \
--set global.eni=true \
--set config.ipam=eni \
--set global.egressMasqueradeInterfaces=eth0 \
--set global.tunnel=disabled \
--set global.nodeinit.enabled=true > template.yaml
$ grep -is "operator-" template.yaml
operator-api-serve-addr: "127.0.0.1:9234"
- cilium-operator-aws
image: "docker.io/cilium/operator-aws:v1.8.2"
Sorry, I should have been more precise: the way the instructions are written, they actually direct you to do this if you want to use tunnels for the overlay:
⎇ helm template cilium cilium/cilium --version 1.8.2 \
--namespace kube-system \
--set config.ipam=eni \
--set global.egressMasqueradeInterfaces=eth0 \
--set global.nodeinit.enabled=true > template.yaml
⎇ grep -is "operator-" template.yaml
operator-api-serve-addr: "127.0.0.1:9234"
- cilium-operator-generic
image: "docker.io/cilium/operator-generic:v1.8.2"
And this results in an invalid template that fails at runtime. What the instructions should do is ask you to also remove the --set config.ipam=eni line, because that does produce a valid configuration. The idea @tgraf mentioned in Slack of having helm fail on this combination would also be a nice quality-of-life improvement.
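For reference, dropping only that flag from the command above yields a template where the operator flavour and the ConfigMap agree; a sketch of the corrected invocation:

⎇ helm template cilium cilium/cilium --version 1.8.2 \
--namespace kube-system \
--set global.egressMasqueradeInterfaces=eth0 \
--set global.nodeinit.enabled=true > template.yaml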