In environments where a user must configure --hostname-override for the kubelet (such as AWS), kube-proxy is currently deployed in a degraded state. Specifically, Services of type NodePort and LoadBalancer with externalTrafficPolicy: Local are affected.
Because kube-proxy is deployed as a DaemonSet, the only options available are to override the command arguments using the downward API or to use an init container to mutate the config. This is further complicated because the kube-proxy command-line options are marked as deprecated in favor of the component config.
related kube-proxy issue: https://github.com/kubernetes/kubernetes/issues/57518
@luxas This is a nasty config issue; we need to chat with the @kubernetes/sig-network-bugs folks about this because the UX and workarounds are really ugly.
The workaround is documented here: https://kubernetes.io/docs/setup/independent/troubleshooting-kubeadm/#services-with-externaltrafficpolicy-local-are-not-reachable
Punting this work item to 1.12.
I tried to apply the workaround, but ran into two issues.
First, the patch appears to have a typo. After patching, I found that the NODE_NAME variable was not expanded in the command line:
# pgrep -laf kube-proxy
8463 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf --hostname-override=${NODE_NAME}
I fixed this by changing the curly braces ${NODE_NAME} to parentheses $(NODE_NAME), as per the docs:
Note: The environment variable appears in parentheses, "$(VAR)". This is required for the variable to be expanded in the command or args field.
-- https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#use-environment-variables-to-define-arguments
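To make the difference concrete, here is a simplified Go sketch of the expansion rule quoted above. The function expand is hypothetical and is not the kubelet's implementation: it handles only $(NAME) and the $$ escape, which is enough to show why ${NODE_NAME} reached kube-proxy verbatim.

```go
package main

import (
	"fmt"
	"strings"
)

// expand is a simplified, hypothetical sketch of the $(VAR) syntax that
// Kubernetes applies to a container's command and args. It handles only
// $(NAME) and the $$ escape; everything else, including ${NAME}, is copied
// through unchanged.
func expand(s string, env map[string]string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] != '$' || i+1 >= len(s) {
			b.WriteByte(s[i])
			continue
		}
		switch s[i+1] {
		case '$':
			// "$$" escapes to a literal "$".
			b.WriteByte('$')
			i++
		case '(':
			end := strings.IndexByte(s[i+2:], ')')
			if end < 0 {
				// Unterminated "$(": copy the "$" and keep scanning.
				b.WriteByte('$')
				continue
			}
			name := s[i+2 : i+2+end]
			if v, ok := env[name]; ok {
				b.WriteString(v)
			} else {
				// Undefined variables are left exactly as written.
				b.WriteString(s[i : i+3+end])
			}
			i += 2 + end // skip to the closing ')'
		default:
			// "${NAME}" and any other "$" sequence pass through verbatim.
			b.WriteByte('$')
		}
	}
	return b.String()
}

func main() {
	env := map[string]string{"NODE_NAME": "192.0.2.24"}
	for _, arg := range []string{
		"--hostname-override=${NODE_NAME}", // curly braces: not expanded
		"--hostname-override=$(NODE_NAME)", // parentheses: expanded
	} {
		fmt.Println(expand(arg, env))
	}
}
```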
Second, kube-proxy does not appear to use the value of the --hostname-override flag when it gets its Node from the API server; it uses the hostname reported by the OS instead. I edited the kube-proxy DaemonSet to increase the log verbosity and found that kube-proxy does read the --hostname-override flag:
I0728 02:09:00.385519 1 flags.go:27] FLAG: --alsologtostderr="false"
I0728 02:09:00.385634 1 flags.go:27] FLAG: --bind-address="0.0.0.0"
I0728 02:09:00.385644 1 flags.go:27] FLAG: --cleanup="false"
I0728 02:09:00.385658 1 flags.go:27] FLAG: --cleanup-iptables="false"
I0728 02:09:00.385665 1 flags.go:27] FLAG: --cleanup-ipvs="true"
I0728 02:09:00.385670 1 flags.go:27] FLAG: --cluster-cidr=""
I0728 02:09:00.385678 1 flags.go:27] FLAG: --config="/var/lib/kube-proxy/config.conf"
I0728 02:09:00.385685 1 flags.go:27] FLAG: --config-sync-period="15m0s"
I0728 02:09:00.385698 1 flags.go:27] FLAG: --conntrack-max="0"
I0728 02:09:00.385705 1 flags.go:27] FLAG: --conntrack-max-per-core="32768"
I0728 02:09:00.385712 1 flags.go:27] FLAG: --conntrack-min="131072"
I0728 02:09:00.385718 1 flags.go:27] FLAG: --conntrack-tcp-timeout-close-wait="1h0m0s"
I0728 02:09:00.385724 1 flags.go:27] FLAG: --conntrack-tcp-timeout-established="24h0m0s"
I0728 02:09:00.385734 1 flags.go:27] FLAG: --feature-gates=""
I0728 02:09:00.385744 1 flags.go:27] FLAG: --healthz-bind-address="0.0.0.0:10256"
I0728 02:09:00.385750 1 flags.go:27] FLAG: --healthz-port="10256"
I0728 02:09:00.385756 1 flags.go:27] FLAG: --help="false"
I0728 02:09:00.385762 1 flags.go:27] FLAG: --hostname-override="192.0.2.24"
I0728 02:09:00.385779 1 flags.go:27] FLAG: --iptables-masquerade-bit="14"
I0728 02:09:00.385787 1 flags.go:27] FLAG: --iptables-min-sync-period="0s"
I0728 02:09:00.385794 1 flags.go:27] FLAG: --iptables-sync-period="30s"
I0728 02:09:00.385802 1 flags.go:27] FLAG: --ipvs-exclude-cidrs="[]"
I0728 02:09:00.385821 1 flags.go:27] FLAG: --ipvs-min-sync-period="0s"
I0728 02:09:00.385832 1 flags.go:27] FLAG: --ipvs-scheduler=""
I0728 02:09:00.385838 1 flags.go:27] FLAG: --ipvs-sync-period="30s"
I0728 02:09:00.385844 1 flags.go:27] FLAG: --kube-api-burst="10"
I0728 02:09:00.385850 1 flags.go:27] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
I0728 02:09:00.385857 1 flags.go:27] FLAG: --kube-api-qps="5"
I0728 02:09:00.385870 1 flags.go:27] FLAG: --kubeconfig=""
I0728 02:09:00.385876 1 flags.go:27] FLAG: --log-backtrace-at=":0"
I0728 02:09:00.385885 1 flags.go:27] FLAG: --log-dir=""
I0728 02:09:00.385891 1 flags.go:27] FLAG: --log-flush-frequency="5s"
I0728 02:09:00.385897 1 flags.go:27] FLAG: --logtostderr="true"
I0728 02:09:00.385907 1 flags.go:27] FLAG: --masquerade-all="false"
I0728 02:09:00.385913 1 flags.go:27] FLAG: --master=""
I0728 02:09:00.385918 1 flags.go:27] FLAG: --metrics-bind-address="127.0.0.1:10249"
I0728 02:09:00.385924 1 flags.go:27] FLAG: --nodeport-addresses="[]"
I0728 02:09:00.385931 1 flags.go:27] FLAG: --oom-score-adj="-999"
I0728 02:09:00.385941 1 flags.go:27] FLAG: --profiling="false"
I0728 02:09:00.385947 1 flags.go:27] FLAG: --proxy-mode=""
I0728 02:09:00.385955 1 flags.go:27] FLAG: --proxy-port-range=""
I0728 02:09:00.385962 1 flags.go:27] FLAG: --resource-container="/kube-proxy"
I0728 02:09:00.385968 1 flags.go:27] FLAG: --stderrthreshold="2"
I0728 02:09:00.385978 1 flags.go:27] FLAG: --udp-timeout="250ms"
I0728 02:09:00.385984 1 flags.go:27] FLAG: --v="4"
I0728 02:09:00.385990 1 flags.go:27] FLAG: --version="false"
I0728 02:09:00.385998 1 flags.go:27] FLAG: --vmodule=""
I0728 02:09:00.386005 1 flags.go:27] FLAG: --write-config-to=""
I0728 02:09:00.389945 1 feature_gate.go:230] feature gates: &{map[]}
I0728 02:09:00.412872 1 iptables.go:603] couldn't get iptables-restore version; assuming it doesn't support --wait
I0728 02:09:00.413128 1 iptables.go:200] Could not connect to D-Bus system bus: dial unix /var/run/dbus/system_bus_socket: connect: no such file or directory
W0728 02:09:00.477508 1 server_others.go:287] Flag proxy-mode="" unknown, assuming iptables proxy
I0728 02:09:00.479201 1 server_others.go:140] Using iptables Proxier.
W0728 02:09:00.496071 1 server.go:605] Failed to retrieve node info: nodes "daniel-ubuntu16" not found
W0728 02:09:00.496570 1 proxier.go:306] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
Note that kube-proxy tries to get the node named daniel-ubuntu16, which happens to be the hostname of the node where the kube-proxy Pod is scheduled, but the node is registered as 192.0.2.24. The string daniel-ubuntu16 does not appear in the config file (/var/lib/kube-proxy/config.conf), so kube-proxy must be getting this value from the OS.
I'll file a PR for the first issue, but I'm still tracking down the root cause for the second.
I was able to work around the second by changing the hostnameOverride field in the kube-proxy ConfigMap to 192.0.2.24, but of course that's not a correct configuration because it's consumed by kube-proxy Pods scheduled on all nodes.
I believe the root of the second issue is that the hostnameOverride value in the kube-proxy config file overrides the flag value. I edited the kube-proxy DaemonSet and removed the flag --config="/var/lib/kube-proxy/config.conf". After this change, kube-proxy respected the --hostname-override flag.
The config file defines hostnameOverride: "". However, removing the field from the config file is _not_ a workaround. The field is a string, so its zero value is the empty string, and this causes kube-proxy to ask the OS for the hostname.
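The fallback can be sketched in Go. proxyConfig, loadConfig and resolveNodeName are hypothetical, simplified stand-ins for kube-proxy's component config and hostname lookup, not its real code; the real config file is YAML, and JSON is used here only because Go's standard library decodes it. The point is that an absent string field and an explicit "" decode to the same zero value, and both fall through to the OS hostname.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// proxyConfig is a hypothetical, trimmed-down stand-in for the kube-proxy
// component config; only the field discussed above is modeled.
type proxyConfig struct {
	HostnameOverride string `json:"hostnameOverride"`
}

// loadConfig decodes a JSON config document. A missing field is left at the
// string zero value "", exactly like an explicit empty string.
func loadConfig(doc string) (proxyConfig, error) {
	var c proxyConfig
	err := json.Unmarshal([]byte(doc), &c)
	return c, err
}

// resolveNodeName sketches the fallback: an empty override means "ask the
// OS". The OS lookup is injected so the behavior can be shown without
// depending on the machine this runs on.
func resolveNodeName(override string, osHostname func() (string, error)) (string, error) {
	name := override
	if name == "" {
		h, err := osHostname()
		if err != nil {
			return "", err
		}
		name = h
	}
	return strings.ToLower(strings.TrimSpace(name)), nil
}

func main() {
	for _, doc := range []string{`{"hostnameOverride": ""}`, `{}`} {
		c, err := loadConfig(doc)
		if err != nil {
			panic(err)
		}
		name, err := resolveNodeName(c.HostnameOverride, os.Hostname)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-28s -> %q\n", doc, name)
	}
}
```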
Looking at the kube-proxy code, I find that the flags update a config struct, but later the configuration file is unmarshalled into a new config struct, and the pointer to the original config struct is overwritten, effectively discarding all the flag values. (Maybe this is by design, since the flags are deprecated.)
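A minimal Go sketch of that pattern, assuming a hypothetical one-field Config and an in-memory map standing in for the filesystem (neither is kube-proxy's actual code): the flag writes into the first struct, and the pointer swap after decoding the config file throws that value away.

```go
package main

import (
	"encoding/json"
	"flag"
	"fmt"
)

// Config is a hypothetical one-field stand-in for the kube-proxy component config.
type Config struct {
	HostnameOverride string `json:"hostnameOverride"`
}

// effectiveOverride parses the flags into cfg and then, if --config is given,
// decodes the file into a fresh struct and replaces cfg with it. Any value
// that came from --hostname-override is lost at that point.
func effectiveOverride(args []string, files map[string]string) (string, error) {
	cfg := &Config{}
	fs := flag.NewFlagSet("proxy-sketch", flag.ContinueOnError)
	fs.StringVar(&cfg.HostnameOverride, "hostname-override", "", "node name to use")
	configPath := fs.String("config", "", "path to the config file")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *configPath != "" {
		doc, ok := files[*configPath]
		if !ok {
			return "", fmt.Errorf("config file %q not found", *configPath)
		}
		fromFile := &Config{}
		if err := json.Unmarshal([]byte(doc), fromFile); err != nil {
			return "", err
		}
		cfg = fromFile // the flag-populated struct is discarded here
	}
	return cfg.HostnameOverride, nil
}

func main() {
	files := map[string]string{"/var/lib/kube-proxy/config.conf": `{"hostnameOverride": ""}`}
	withConfig, err := effectiveOverride([]string{
		"--config=/var/lib/kube-proxy/config.conf",
		"--hostname-override=192.0.2.24",
	}, files)
	if err != nil {
		panic(err)
	}
	flagOnly, err := effectiveOverride([]string{"--hostname-override=192.0.2.24"}, files)
	if err != nil {
		panic(err)
	}
	fmt.Printf("with --config: %q, without --config: %q\n", withConfig, flagOnly)
}
```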
I was able to set hostnameOverride by modifying the config in an init container. I updated the patch in the docs. Please see the above PR.
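The init-container approach amounts to a per-node text edit of the mounted config. The function below is only an illustrative Go equivalent of that edit, not the patch from the docs; it assumes the node name arrives in a NODE_NAME environment variable populated from spec.nodeName, and that hostnameOverride sits on its own top-level line as in the config described above. setHostnameOverride is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// setHostnameOverride replaces the top-level "hostnameOverride:" line of a
// YAML kube-proxy config with the given node name and reports whether such a
// line was found. This is plain line editing, not a YAML parser.
func setHostnameOverride(config, nodeName string) (string, bool) {
	lines := strings.Split(config, "\n")
	for i, line := range lines {
		if strings.HasPrefix(line, "hostnameOverride:") {
			// %q yields a double-quoted string; for node names (DNS names
			// or IP addresses) that is also a valid YAML scalar.
			lines[i] = fmt.Sprintf("hostnameOverride: %q", nodeName)
			return strings.Join(lines, "\n"), true
		}
	}
	return config, false
}

func main() {
	// Hypothetical input; a real init container would read the ConfigMap
	// mount and write the result to a location the kube-proxy container uses.
	config := "apiVersion: kubeproxy.config.k8s.io/v1alpha1\nhostnameOverride: \"\"\nkind: KubeProxyConfiguration\n"
	nodeName := os.Getenv("NODE_NAME")
	if nodeName == "" {
		nodeName = "192.0.2.24" // demo value when run outside a Pod
	}
	patched, ok := setHostnameOverride(config, nodeName)
	if !ok {
		fmt.Fprintln(os.Stderr, "no hostnameOverride line found")
		os.Exit(1)
	}
	fmt.Print(patched)
}
```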
Sadly, we are way too late in the cycle to make this change.
@timothysc
How can this new flag be set via kubeadm?
I'm re-opening because we will need to update docs and other details on our side.
I will pick this up.
@Klaven thanks
/kind documentation
^ adding this kind too.
I think kubernetes/kubernetes#69340 makes the --hostname-override flag "undeprecated" again. This makes this bug super-easy to fix from kubeadm's pov:
we should _always_ set kube-proxy's --hostname-override to spec.nodeName (via downward API)
i.e., we should drop "in some environments" from this bug description. On older versions (before 69340 gets rolled out), the override flag is ignored in the presence of --config, but whenever it is not ignored, this is _always_ the correct value to set.
It would also be nice if it were possible to set the node name during _kubeadm join_ with less hassle. There's the --node-name flag, but _kubeadm join_ ignores it silently if also supplied with a --config flag. (@SataQiu further documented this behavior in the code.)
At present, in order to set the node name, I have to patch the configuration file supplied to _kubeadm join_ on each host, which is fragile with YAML. Fortunately, _kubeadm init_ and _kubeadm join_ will happily read JSON as well, so patching a configuration file in JSON with _jq_ is a little easier than the _sed_ script I'm currently using against the YAML. Still, given that we have the --node-name flag in place, I was wondering whether this patching is the desired user experience or whether ignoring the flag value is an accident. Ignoring that flag _silently_ is cruel.
Today I noticed the following comment from @luxas, so there is hope:
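For reference, the JSON edit described above (what a jq assignment to .nodeRegistration.name does) looks like this in Go. setNodeName is a hypothetical helper; it assumes the kubeadm configuration is a single JSON document and that the node name lives under nodeRegistration.name, as in JoinConfiguration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// setNodeName sets nodeRegistration.name in a JSON kubeadm config document,
// creating (or replacing) the nodeRegistration object if it is missing or
// not an object. Other fields are kept, but key order is not:
// encoding/json sorts map keys when marshaling.
func setNodeName(doc []byte, name string) ([]byte, error) {
	var cfg map[string]any
	if err := json.Unmarshal(doc, &cfg); err != nil {
		return nil, err
	}
	if cfg == nil {
		return nil, fmt.Errorf("config document is null")
	}
	reg, ok := cfg["nodeRegistration"].(map[string]any)
	if !ok {
		reg = map[string]any{}
		cfg["nodeRegistration"] = reg
	}
	reg["name"] = name
	return json.Marshal(cfg)
}

func main() {
	in := []byte(`{"apiVersion":"kubeadm.k8s.io/v1beta1","kind":"JoinConfiguration"}`)
	out, err := setNodeName(in, "192.0.2.24")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```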
Nb. --config overrides command line flags, TODO: fix this
Is there an open issue for that problem?
@seh that's a valid observation for a UX problem.
we should have node-name overriding the config, but I don't think we can do that for 1.13.
there is an issue in the k/kubeadm repo (don't remember the name) about the "meta-problem" of flag overrides, but we don't have good universal solutions yet.
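One common way to get "an explicit flag beats the config file" semantics in Go is to load the file first and then re-apply only the flags the user actually set, which flag.FlagSet.Visit reports. This is a generic sketch with hypothetical types mirroring nodeRegistration.name; it is not kubeadm's implementation.

```go
package main

import (
	"encoding/json"
	"flag"
	"fmt"
)

// Hypothetical, trimmed-down mirror of the relevant part of JoinConfiguration.
type NodeRegistration struct {
	Name string `json:"name"`
}

type JoinConfiguration struct {
	NodeRegistration NodeRegistration `json:"nodeRegistration"`
}

// resolve decodes the config document (if any) and then applies only the
// flags that were explicitly passed. Visit skips flags left at their
// defaults, so an unset --node-name never clobbers the file's value.
func resolve(args []string, configDoc string) (JoinConfiguration, error) {
	var cfg JoinConfiguration
	fs := flag.NewFlagSet("join-sketch", flag.ContinueOnError)
	nodeName := fs.String("node-name", "", "name to register the node as")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	if configDoc != "" {
		if err := json.Unmarshal([]byte(configDoc), &cfg); err != nil {
			return cfg, err
		}
	}
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "node-name" {
			cfg.NodeRegistration.Name = *nodeName
		}
	})
	return cfg, nil
}

func main() {
	doc := `{"nodeRegistration":{"name":"from-file"}}`
	for _, args := range [][]string{nil, {"--node-name=192.0.2.24"}} {
		cfg, err := resolve(args, doc)
		if err != nil {
			panic(err)
		}
		fmt.Printf("args=%v -> %s\n", args, cfg.NodeRegistration.Name)
	}
}
```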
Understood. Thank you for acknowledging the pain. The current situation can be handled with shell scripting, but it would have been much easier to resolve had it been more obvious how the flag and the presence of a configuration file interact. Documenting the current behavior, even if it is undesirable in the long term, would save a lot of confusion and frustration.
we should have node-name overriding the config
@seh, @neolit123, @timothysc here is a possible fix for this: https://github.com/kubernetes/kubernetes/pull/71270
Please review.
It looks like _kubeadm init_ still ignores the --node-name flag when using a configuration file. So close...
I was able to work around this issue as follows:
```yaml
- name: kube-proxy
  command:
  - sh
  - -c
  - |
    hostname $HOSTNAME
    /usr/local/bin/kube-proxy --config=/etc/kubernetes/kube-proxy.conf
  env:
  - name: HOSTNAME
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: spec.nodeName
```