Kind: Running kind delete cluster fails to delete linked containers

Created on 20 Nov 2019 · 11 comments · Source: kubernetes-sigs/kind

What happened:

Running kind delete cluster fails with the following output:

ERROR: failed to delete cluster: failed to delete nodes: command "docker rm -f -v kind-control-plane2 kind-control-plane,ingress-proxy-80/target,ingress-proxy-443/target kind-external-load-balancer kind-control-plane3 kind-worker2 kind-worker kind-worker3" failed with error: exit status 1

What you expected to happen:

For the cluster to get deleted without error

How to reproduce it (as minimally and precisely as possible):

  1. Create the cluster:
kind create cluster --config=config.yaml

The contents of config.yaml:

kind: Cluster
apiVersion: kind.sigs.k8s.io/v1alpha3
networking:
  disableDefaultCNI: True
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
- role: worker
- role: worker
kubeadmConfigPatches:
- |
  apiVersion: kubeadm.k8s.io/v1beta2
  kind: ClusterConfiguration
  metadata:
    name: config
  networking:
    serviceSubnet: "172.30.0.0/16"
    podSubnet: "10.254.0.0/16"
  2. Run the workaround to make Calico work:
docker exec -ti kind-control-plane sysctl -w net.ipv4.conf.all.rp_filter=0
docker exec -ti kind-control-plane2 sysctl -w net.ipv4.conf.all.rp_filter=0
docker exec -ti kind-control-plane3 sysctl -w net.ipv4.conf.all.rp_filter=0
docker exec -ti kind-worker sysctl -w net.ipv4.conf.all.rp_filter=0
docker exec -ti kind-worker2 sysctl -w net.ipv4.conf.all.rp_filter=0
docker exec -ti kind-worker3 sysctl -w net.ipv4.conf.all.rp_filter=0
  3. Install Calico:
curl -so manifests/calico.yaml https://docs.projectcalico.org/v3.8/manifests/calico.yaml
sed -i 's/192\.168/10\.254/g' manifests/calico.yaml
kubectl apply -f manifests/calico.yaml
  4. Label the workers:
for worker in kind-worker kind-worker{2..3}
do
  kubectl label node ${worker} node-role.kubernetes.io/worker=''
done
  5. Install nginx-ingress using Helm:
kubectl create ns ingress-nginx

helm repo add stable https://kubernetes-charts.storage.googleapis.com/
helm repo update 

helm install ingress-nginx stable/nginx-ingress --namespace ingress-nginx \
--set rbac.create=true --set controller.image.pullPolicy="Always" --set controller.extraArgs.enable-ssl-passthrough="" \
--set controller.stats.enabled=true --set controller.service.type="NodePort"
  6. Port-forward 80/443 on the local host to the NodePort:
for port in 80 443
    do
        node_port=$(kubectl get svc -n ingress-nginx ingress-nginx-nginx-ingress-controller -o=jsonpath="{.spec.ports[?(@.port == ${port})].nodePort}")

        docker run -d --name ingress-proxy-${port} \
          --publish 127.0.0.1:${port}:${port} \
          --link kind-control-plane:target \
          alpine/socat -dd \
          tcp-listen:${port},fork,reuseaddr tcp-connect:target:${node_port}
    done

Anything else we need to know?:

I need to run the following in order to successfully run kind delete cluster:

docker rm --link /ingress-proxy-80/target
docker rm --link /ingress-proxy-443/target

Another thing is that I never had to do this in v0.5.0.

Environment:

  • kind version:
kind v0.6.0 go1.13.4 linux/amd64
  • Kubernetes version: (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:18:23Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version:
Containers: 3
 Running: 3
 Paused: 0
 Stopped: 0
Images: 10
Server Version: 1.13.1
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: journald
Cgroup Driver: systemd
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Authorization: rhel-push-plugin
Swarm: inactive
Runtimes: oci runc
Default Runtime: oci
Init Binary: /usr/libexec/docker/docker-init-current
containerd version:  (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: N/A (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
init version: N/A (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  WARNING: You're not using the default seccomp profile
  Profile: /etc/docker/seccomp.json
 selinux
Kernel Version: 5.3.11-300.fc31.x86_64
Operating System: Fedora 31 (Workstation Edition)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 12
Total Memory: 62.47 GiB
Name: laptop
ID: EPXO:U3CU:UTC3:2OMV:HEXZ:KE2X:X4BW:M35G:IUY2:UTMY:LKOK:NJF4
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 default-route-openshift-image-registry.apps.ocp4.cloud.chx
 default-route-openshift-image-registry.apps.openshift4.cloud.chx
 127.0.0.0/8
Live Restore Enabled: true
Registries: docker.io (secure), registry.fedoraproject.org (secure), quay.io (secure), registry.access.redhat.com (secure), registry.centos.org (secure), docker.io (secure)
  • OS:
NAME=Fedora
VERSION="31 (Workstation Edition)"
ID=fedora
VERSION_ID=31
VERSION_CODENAME=""
PLATFORM_ID="platform:f31"
PRETTY_NAME="Fedora 31 (Workstation Edition)"
ANSI_COLOR="0;34"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:31"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f31/system-administrators-guide/"
SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=31
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=31
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="Workstation Edition"
VARIANT_ID=workstation
Labels: kind/bug, lifecycle/active, priority/important-longterm

All 11 comments

More verbose output

$ kind delete cluster -v 9
Deleting cluster "kind" ...
ERROR: failed to delete cluster: failed to delete nodes: command "docker rm -f -v kind-control-plane,ingress-proxy-80/target,ingress-proxy-443/target kind-control-plane2 kind-worker kind-control-plane3 kind-worker3 kind-worker2 kind-external-load-balancer" failed with error: exit status 1

Output:
kind-control-plane2
kind-worker
kind-control-plane3
kind-worker3
kind-worker2
kind-external-load-balancer
Error response from daemon: No such container: kind-control-plane,ingress-proxy-80/target,ingress-proxy-443/target

Stack Trace: 
sigs.k8s.io/kind/pkg/errors.WithStack
    /src/pkg/errors/errors.go:51
sigs.k8s.io/kind/pkg/exec.(*LocalCmd).Run
    /src/pkg/exec/local.go:116
sigs.k8s.io/kind/pkg/internal/cluster/providers/docker.(*Provider).DeleteNodes
    /src/pkg/internal/cluster/providers/docker/provider.go:130
sigs.k8s.io/kind/pkg/internal/cluster/delete.Cluster
    /src/pkg/internal/cluster/delete/delete.go:42
sigs.k8s.io/kind/pkg/cluster.(*Provider).Delete
    /src/pkg/cluster/provider.go:105
sigs.k8s.io/kind/pkg/cmd/kind/delete/cluster.runE
    /src/pkg/cmd/kind/delete/cluster/deletecluster.go:58
sigs.k8s.io/kind/pkg/cmd/kind/delete/cluster.NewCommand.func1
    /src/pkg/cmd/kind/delete/cluster/deletecluster.go:44
github.com/spf13/cobra.(*Command).execute
    /go/pkg/mod/github.com/spf13/[email protected]/command.go:826
github.com/spf13/cobra.(*Command).ExecuteC
    /go/pkg/mod/github.com/spf13/[email protected]/command.go:914
github.com/spf13/cobra.(*Command).Execute
    /go/pkg/mod/github.com/spf13/[email protected]/command.go:864
sigs.k8s.io/kind/cmd/kind/app.Run
    /src/cmd/kind/app/main.go:53
sigs.k8s.io/kind/cmd/kind/app.Main
    /src/cmd/kind/app/main.go:35
main.main
    /src/main.go:25
runtime.main
    /usr/local/go/src/runtime/proc.go:203
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1357

i'm going to assume that if you do kind get nodes it also lists your ingress containers?
ideally the filter should not catch these.

Indeed it does

$ kind get nodes
kind-control-plane3
kind-worker2
kind-external-load-balancer
kind-worker
kind-control-plane,ingress-proxy-80/target,ingress-proxy-443/target
kind-control-plane2
kind-worker3

@christianh814
in this function you can see the docker command that is used for retrieving the list of nodes:
https://github.com/kubernetes-sigs/kind/blob/226c290cdd1f8f39a948d436ac9c96deeb1ae1ef/pkg/internal/cluster/providers/docker/provider.go#L92

deprecatedClusterLabelKey is defined here:
https://github.com/kubernetes-sigs/kind/blob/383279e348eed288a3ad68a984b355ddd4a7254a/pkg/internal/cluster/providers/docker/constants.go#L24

one potential fix is to always split after the first "," on a line that the command returns. [1]

the problem here seems to be the linking.
i don't know if there is a way to exclude this from the output of the docker command, but you can play with that.

the alternative is to go with [1], assuming "," must not be part of the node name.
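
For illustration, here's what [1] amounts to at the shell level. This is only a sketch, not kind's actual code; the comma-joined string below is the Names value from the error output above, and the label key is a placeholder:

# the provider lists node containers roughly like
#   docker ps -a --filter label=<kind cluster label> --format '{{.Names}}'
# and a node created with --link comes back with its link aliases appended.
# splitting after the first comma recovers the real node name:
echo 'kind-control-plane,ingress-proxy-80/target,ingress-proxy-443/target' | cut -d, -f1
# prints: kind-control-plane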

the alternative is to go with [1], assuming "," must not be part of the node name.

looks like cluster names already forbid ",".

cluster names must match ^[a-zA-Z0-9_.-]+$
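
Since "," falls outside that character class (and node names are derived from the cluster name), splitting on the first comma can never truncate a legitimate name. A quick illustrative check in bash:

[[ 'kind-control-plane,ingress-proxy-80/target' =~ ^[a-zA-Z0-9_.-]+$ ]] && echo valid || echo invalid
# prints: invalid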

potential fix here:
https://github.com/kubernetes-sigs/kind/pull/1117

/priority important-longterm

  2. Run the workaround to make Calico work:
docker exec -ti kind-control-plane sysctl -w net.ipv4.conf.all.rp_filter=0
docker exec -ti kind-control-plane2 sysctl -w net.ipv4.conf.all.rp_filter=0
docker exec -ti kind-control-plane3 sysctl -w net.ipv4.conf.all.rp_filter=0
docker exec -ti kind-worker sysctl -w net.ipv4.conf.all.rp_filter=0
docker exec -ti kind-worker2 sysctl -w net.ipv4.conf.all.rp_filter=0
docker exec -ti kind-worker3 sysctl -w net.ipv4.conf.all.rp_filter=0

@christianh814
this is no longer needed in 0.6 https://github.com/kubernetes-sigs/kind/pull/897

Nice!

So:

Warning: The --link flag is a legacy feature of Docker. It may eventually be removed. Unless you absolutely need to continue using it, we recommend that you use user-defined networks to facilitate communication between two containers instead of using --link. One feature that user-defined networks do not support that you can do with --link is sharing environment variables between containers. However, you can use other mechanisms such as volumes to share environment variables between containers in a more controlled way.

https://docs.docker.com/network/links/

I'm not sure we should invest in this. In addition, @neolit123's patch doesn't work in all cases: the output is not sorted and the format is not documented :(

+1 i will close my PR.
--link might have to be added to "known issues".

Given that this is the only instance I've heard of someone trying to use this with kind and docker upstream strongly discourages using it with bold red text, I think we can even leave out the "known issues" and focus on the user defined networks solution.

https://github.com/kubernetes-sigs/kind/issues/1124 takes priority but we definitely need to look at user defined networks more.

I think you can accomplish this kind of ingress stuff without using --link as well.
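
For reference, a sketch of the same proxy without --link, using a user-defined network instead (the network name "kind-proxy" and ${node_port} are illustrative; containers on a user-defined bridge resolve each other by container name):

docker network create kind-proxy
docker network connect kind-proxy kind-control-plane
docker run -d --name ingress-proxy-80 \
  --network kind-proxy \
  --publish 127.0.0.1:80:80 \
  alpine/socat -dd \
  tcp-listen:80,fork,reuseaddr tcp-connect:kind-control-plane:${node_port}

Since this creates no link alias on the node container, kind's node listing (and kind delete cluster) should be unaffected.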

FWIW if we need a more complete fix in the future, I think the answer is to walk the list of names and keep the one without a / in it. I'd rather not add unnecessary complexity for a feature that is going away though.
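
A sketch of that walk at the shell level, assuming link aliases always contain a "/" while real node names never do:

echo 'kind-control-plane,ingress-proxy-80/target,ingress-proxy-443/target' \
  | tr ',' '\n' | grep -v /
# prints: kind-control-plane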
