What happened:
When I create a v1.18 cluster (offline), it fails.
(MoeLove) ➜ ~ kind create cluster --image=kindest/node:v1.18.0@sha256:0e20578828edd939d25eb98496a685c76c98d54084932f76069f886ec315d694 --name=v1.18
Creating cluster "v1.18" ...
 ✓ Ensuring node image (kindest/node:v1.18.0) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✗ Starting control-plane 🕹️
ERROR: failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged v1.18-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1
Debug:
(MoeLove) ➜ kubernetes git:(master) kind create cluster --image=kindest/node:v1.18.0@sha256:0e20578828edd939d25eb98496a685c76c98d54084932f76069f886ec315d694 --name=v1.18 --retain
Creating cluster "v1.18" ...
 ✓ Ensuring node image (kindest/node:v1.18.0) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✗ Starting control-plane 🕹️
ERROR: failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged v1.18-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1
(MoeLove) ➜ kubernetes git:(master) docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
01fad0389680 kindest/node:v1.18.0 "/usr/local/bin/entrβ¦" 6 minutes ago Up 6 minutes 127.0.0.1:39295->6443/tcp v1.18-control-plane
(MoeLove) ➜ kubernetes git:(master) docker exec -it 01fad0389680 bash
root@v1:/# systemctl status containerd
● containerd.service - containerd container runtime
Loaded: loaded (/etc/systemd/system/containerd.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2020-04-12 00:51:13 UTC; 10min ago
Docs: https://containerd.io
Main PID: 126 (containerd)
Tasks: 15
Memory: 26.6M
CGroup: /system.slice/docker-01fad03896801f36a8de368f3265dad862bb66cf02d52a1a2f341d062a180578.scope/system.slice/containerd.service
└─126 /usr/local/bin/containerd
Apr 12 01:01:16 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:16.024975301Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-scheduler-v1.18-control-plane,Uid:2208005057033f6461474a4b1eaeb34f,Namespace:kube-system,Attempt:0,} failed, error" error="failed to get sandbox image \"k8s.gcr.io/pause:3.1\": failed to pull image \"k8s.gcr.io/pause:3.1\": failed to pull and unpack image \"k8s.gcr.io/pause:3.1\": failed to resolve reference \"k8s.gcr.io/pause:3.1\": failed to do request: Head https://k8s.gcr.io/v2/pause/manifests/3.1: dial tcp 74.125.203.82:443: i/o timeout"
Apr 12 01:01:19 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:19.345354143Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
Apr 12 01:01:20 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:20.024485908Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:etcd-v1.18-control-plane,Uid:c9a0fb45a6d0163b4056d67af760b788,Namespace:kube-system,Attempt:0,} failed, error" error="failed to get sandbox image \"k8s.gcr.io/pause:3.1\": failed to pull image \"k8s.gcr.io/pause:3.1\": failed to pull and unpack image \"k8s.gcr.io/pause:3.1\": failed to resolve reference \"k8s.gcr.io/pause:3.1\": failed to do request: Head https://k8s.gcr.io/v2/pause/manifests/3.1: dial tcp 74.125.203.82:443: i/o timeout"
Apr 12 01:01:22 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:22.023952642Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-controller-manager-v1.18-control-plane,Uid:b1c39986355aaa05d871c42958815492,Namespace:kube-system,Attempt:0,} failed, error" error="failed to get sandbox image \"k8s.gcr.io/pause:3.1\": failed to pull image \"k8s.gcr.io/pause:3.1\": failed to pull and unpack image \"k8s.gcr.io/pause:3.1\": failed to resolve reference \"k8s.gcr.io/pause:3.1\": failed to do request: Head https://k8s.gcr.io/v2/pause/manifests/3.1: dial tcp 74.125.203.82:443: i/o timeout"
Apr 12 01:01:24 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:24.347973630Z" level=error msg="Failed to load cni configura
root@v1:/# journalctl -u containerd
-- Logs begin at Sun 2020-04-12 00:51:13 UTC. --
Apr 12 01:01:31 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:31.239271331Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-scheduler-v1.18-control-plane,Uid:2208005057033f6461474a4b1eaeb34f,Namespace:kube-system,Attempt:0,} failed, error" error="rpc error: code = Canceled desc = failed to get sandbox image \"k8s.gcr.io/pause:3.1\": failed to pull image \"k8s.gcr.io/pause:3.1\": failed to pull and unpack image \"k8s.gcr.io/pause:3.1\": failed to resolve reference \"k8s.gcr.io/pause:3.1\": failed to do request: context canceled"
Apr 12 01:01:31 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:31.241440158Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-apiserver-v1.18-control-plane,Uid:49aefb7d9ab550220c600e6b2d8245f9,Namespace:kube-system,Attempt:0,} failed, error" error="rpc error: code = Canceled desc = failed to get sandbox image \"k8s.gcr.io/pause:3.1\": failed to pull image \"k8s.gcr.io/pause:3.1\": failed to pull and unpack image \"k8s.gcr.io/pause:3.1\": failed to resolve reference \"k8s.gcr.io/pause:3.1\": failed to do request: context canceled"
Apr 12 01:01:42 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:42.646801294Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
Apr 12 01:01:42 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:42.715350898Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
Apr 12 01:01:43 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:43.077874875Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:etcd-v1.18-control-plane,Uid:c9a0fb45a6d0163b4056d67af760b788,Namespace:kube-system,Attempt:0,}"
Apr 12 01:01:43 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:43.081317436Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:kube-apiserver-v1.18-control-plane,Uid:49aefb7d9ab550220c600e6b2d8245f9,Namespace:kube-system,Attempt:0,}"
Apr 12 01:01:43 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:43.085858677Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:kube-controller-manager-v1.18-control-plane,Uid:b1c39986355aaa05d871c42958815492,Namespace:kube-system,Attempt:0,}"
containerd image list:
root@v1:/# ctr --namespace=k8s.io i ls
REF TYPE DIGEST SIZE PLATFORMS LABELS
docker.io/kindest/kindnetd:0.5.4 application/vnd.oci.image.manifest.v1+json sha256:f7dbcdbc1e1cfda232bf13225de69fcdeeb64a81fd496e3c25414e6347ce374d 108.0 MiB linux/amd64 io.cri-containerd.image=managed
docker.io/rancher/local-path-provisioner:v0.0.12 application/vnd.oci.image.manifest.v1+json sha256:dd36600950cf353e88107d524031334abd32c8cc2982e331d2b5f6e200af7913 40.0 MiB linux/amd64 io.cri-containerd.image=managed
k8s.gcr.io/coredns:1.6.7 application/vnd.oci.image.manifest.v1+json sha256:5dfcb0bdbe73888a8a8a8fad86b8a1943579e3ea482148225fc505c80f32757b 41.9 MiB linux/amd64 io.cri-containerd.image=managed
k8s.gcr.io/debian-base:v2.0.0 application/vnd.oci.image.manifest.v1+json sha256:810d45197dc61cee861b30e6311e9a14a36050f758b47bc278ae8dfb578e4404 51.4 MiB linux/amd64 io.cri-containerd.image=managed
k8s.gcr.io/etcd:3.4.4-0 application/vnd.oci.image.manifest.v1+json sha256:8cf466d7ca35c35198f4ff270a9e5ae0ab9ad52e5c8d986ab5b3887568359a39 324.9 MiB linux/amd64 io.cri-containerd.image=managed
k8s.gcr.io/kube-apiserver:v1.19.0-alpha.1.512_ee6b88ddf904b4 application/vnd.oci.image.manifest.v1+json sha256:0d3e92bfe4e4df2e38a85276040b252cae53370feadf3cbc23e6ab124ca800e9 140.0 MiB linux/amd64 io.cri-containerd.image=managed
k8s.gcr.io/kube-controller-manager:v1.19.0-alpha.1.512_ee6b88ddf904b4 application/vnd.oci.image.manifest.v1+json sha256:3f72e5726c3605fb0a8e11c39b5c5f02e78d312cc6b6ac5173fec6fbe5fbc99e 127.0 MiB linux/amd64 io.cri-containerd.image=managed
k8s.gcr.io/kube-proxy:v1.19.0-alpha.1.512_ee6b88ddf904b4 application/vnd.oci.image.manifest.v1+json sha256:b1c21981b1f730269df234ec5dab759f3bed01e7250345771428dc60365d559e 127.3 MiB linux/amd64 io.cri-containerd.image=managed
k8s.gcr.io/kube-scheduler:v1.19.0-alpha.1.512_ee6b88ddf904b4 application/vnd.oci.image.manifest.v1+json sha256:33348ce6e79f45fb4f399133fbfabbee5de2c2dc7ad5e04b1ce764b3c42b81d3 108.2 MiB linux/amd64 io.cri-containerd.image=managed
k8s.gcr.io/pause:3.2 application/vnd.oci.image.manifest.v1+json sha256:61e45779fc594fcc1062bb9ed2cf5745b19c7ba70f0c93eceae04ffb5e402269 669.7 KiB linux/amd64 io.cri-containerd.image=managed
sha256:0e8d7e76ed346ae63c1eb2f17047b3c727bc5783fa6b51d3ee12f89cea964dbc application/vnd.oci.image.manifest.v1+json sha256:0d3e92bfe4e4df2e38a85276040b252cae53370feadf3cbc23e6ab124ca800e9 140.0 MiB linux/amd64 io.cri-containerd.image=managed
sha256:12f992c4835e95e8e820cabd88d3ee5e55c2cb456e45b358bf9631e78814de2b application/vnd.oci.image.manifest.v1+json sha256:3f72e5726c3605fb0a8e11c39b5c5f02e78d312cc6b6ac5173fec6fbe5fbc99e 127.0 MiB linux/amd64 io.cri-containerd.image=managed
sha256:2186a1a396deb58f1ea5eaf20193a518ca05049b46ccd754ec83366b5c8c13d5 application/vnd.oci.image.manifest.v1+json sha256:f7dbcdbc1e1cfda232bf13225de69fcdeeb64a81fd496e3c25414e6347ce374d 108.0 MiB linux/amd64 io.cri-containerd.image=managed
sha256:30f347e5200f5451133fd7b8966c2403d94c3336600b756cd865bd8c40c7c314 application/vnd.oci.image.manifest.v1+json sha256:b1c21981b1f730269df234ec5dab759f3bed01e7250345771428dc60365d559e 127.3 MiB linux/amd64 io.cri-containerd.image=managed
sha256:67da37a9a360e600e74464da48437257b00a754c77c40f60c65e4cb327c34bd5 application/vnd.oci.image.manifest.v1+json sha256:5dfcb0bdbe73888a8a8a8fad86b8a1943579e3ea482148225fc505c80f32757b 41.9 MiB linux/amd64 io.cri-containerd.image=managed
sha256:6fab4b32ce98a757fa14abc91d504d992a972844326b9fcd70080397343403a5 application/vnd.oci.image.manifest.v1+json sha256:8cf466d7ca35c35198f4ff270a9e5ae0ab9ad52e5c8d986ab5b3887568359a39 324.9 MiB linux/amd64 io.cri-containerd.image=managed
sha256:80d28bedfe5dec59da9ebf8e6260224ac9008ab5c11dbbe16ee3ba3e4439ac2c application/vnd.oci.image.manifest.v1+json sha256:61e45779fc594fcc1062bb9ed2cf5745b19c7ba70f0c93eceae04ffb5e402269 669.7 KiB linux/amd64 io.cri-containerd.image=managed
sha256:9bd6154724425e6083550fd85a91952fa2f79ef0b9844f0d009c37a72d075757 application/vnd.oci.image.manifest.v1+json sha256:810d45197dc61cee861b30e6311e9a14a36050f758b47bc278ae8dfb578e4404 51.4 MiB linux/amd64 io.cri-containerd.image=managed
sha256:c5161a19f4e358a6b4df024b355aefe04e1afb1b9be0a9c1224414b75037dc2c application/vnd.oci.image.manifest.v1+json sha256:33348ce6e79f45fb4f399133fbfabbee5de2c2dc7ad5e04b1ce764b3c42b81d3 108.2 MiB linux/amd64 io.cri-containerd.image=managed
sha256:db10073a6f829f72cc09655e92fbc3c74410c647c626b431ecd5257d1f6b59c1 application/vnd.oci.image.manifest.v1+json sha256:dd36600950cf353e88107d524031334abd32c8cc2982e331d2b5f6e200af7913 40.0 MiB linux/amd64 io.cri-containerd.image=managed
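For reference, the mismatch is easy to confirm from inside the retained node. A diagnostic sketch, assuming the control-plane container from the --retain run above is still running and that the node keeps its containerd config at /etc/containerd/config.toml:
# what kubeadm expects for this Kubernetes version (and what kind preloads at node build time);
# pinning --kubernetes-version avoids a network lookup when offline
docker exec v1.18-control-plane sh -c 'kubeadm config images list --kubernetes-version "$(kubeadm version -o short)" | grep pause'
# what containerd's CRI plugin will actually request; if no sandbox_image is set here,
# containerd falls back to its built-in default, which per the errors above is pause:3.1
docker exec v1.18-control-plane grep sandbox_image /etc/containerd/config.toml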
What you expected to happen:
The cluster is created successfully.
How to reproduce it (as minimally and precisely as possible):
In offline:
kind create cluster --image=kindest/node:v1.18.0@sha256:0e20578828edd939d25eb98496a685c76c98d54084932f76069f886ec315d694 --name=v1.18
Anything else we need to know?:
Environment:
kind version: kind v0.8.0-alpha+c68a1cf537d680 go1.14.2 linux/amd64
kubectl version --client: Client Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.0-rc.1", GitCommit:"dbbed7806681109f541264ab37284f9a51c87fcc", GitTreeState:"clean", BuildDate:"2020-03-17T17:16:42Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
We need to update the CRI config to use the new pause:3.2.
We really need kind & containerd to be in sync here. Older Kubernetes versions will not use pause:3.2.
We can probably configure CRI & kubeadm ourselves and auto-preload it ourselves.
> We can probably configure CRI & kubeadm ourselves and auto-preload it ourselves.
Do you mean that the user uses the pod-infra-container-image configuration item in the configuration file (see the sketch below)?
Edit: --pod-infra-container-image is only supported for Docker.
Or do we directly preload both pause:3.1 and pause:3.2? The pause image is small.
I think users may build the node image themselves, but not necessarily the base image.
k8s.gcr.io/pause 3.2 80d28bedfe5d 8 weeks ago 683kB
k8s.gcr.io/pause 3.1 da86e6ba6ca1 2 years ago 742kB
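A minimal sketch of passing that item through kind's kubeadmConfigPatches, assuming a kind release that accepts the v1alpha4 config (kind-config.yaml is just an example file name); note that, per the edit above, the flag is only honoured by the dockershim, so with containerd it would not change which sandbox image the CRI plugin requests:
# hypothetical kind config wiring the kubelet flag via a kubeadm InitConfiguration patch
cat <<EOF > kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
- |
  kind: InitConfiguration
  nodeRegistration:
    kubeletExtraArgs:
      pod-infra-container-image: k8s.gcr.io/pause:3.2
EOF
kind create cluster --name=v1.18 --config=kind-config.yaml \
  --image=kindest/node:v1.18.0@sha256:0e20578828edd939d25eb98496a685c76c98d54084932f76069f886ec315d694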
No we should not load both.
The root problem here is that kubeadm and CRI are not aligned on this.
We should align them to specify the same image.
I think the best approach is probably to teach containerd's CRI config to use the image from kubeadm at node build time.
This is going to be brittle though; we can't query that directly AFAIK :/
A different approach would be always setting everything to a particular pause image regardless of Kubernetes version, but that's suboptimal from an upstream perspective because now we're not testing the default, which we generally do when a default exists.
Just upgrading containerd is a partial implementation of the latter approach.
This one is going to be frustrating. There is no non-brittle way to detect this.
Maybe we can change containerd's config file?
The containerd CRI plugin has a sandbox_image configuration item.
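A minimal sketch of using that knob against the retained node above, assuming the node keeps its containerd config at /etc/containerd/config.toml and that the file already carries a sandbox_image line (older containerd config versions name the section [plugins.cri] rather than [plugins."io.containerd.grpc.v1.cri"]):
# point the CRI plugin at the pause image that is actually preloaded (3.2 here),
# then restart containerd so the new value takes effect
docker exec v1.18-control-plane \
  sed -i 's|sandbox_image = ".*"|sandbox_image = "k8s.gcr.io/pause:3.2"|' /etc/containerd/config.toml
docker exec v1.18-control-plane systemctl restart containerd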
Yes, that's known. The problem is knowing which image to use in a non-brittle fashion.
Also, kubeadm does not have an option to specify this.
Frankly, kubeadm should not be pulling images in kind, and the preflights are useless and time-wasting in this context.
If we finally drop 1.11 and 1.12 k8s we can just skip the preflight and ship whatever image we tell containerd to use.
We'd wind up back at square one for detecting etcd vs pause though.
I think we're going to have to do the awful substring match that totally won't ever bite us.
/lifecycle active
AFAICT kubeadm config images list at least does not respect kubeletExtraArgs containing pod-infra-container-image.
Per https://github.com/kubernetes/kubeadm/issues/2020 there's no config field for this and they're currently not inclined to add one.
We can do the kubeadm config images list | grep pause => inject into the containerd config at node build time, but that's pretty brittle and we really should actually just be using the preferred pod sandbox image for the CRI.
There's no reason for this to be tied to the Kubernetes version.
We could work around this for Kubernetes 1.13+ by skipping preflight entirely (and thus no pulling; we don't care if kubeadm is aware of this, we don't really have much use for preflight anyhow), but we'd still need to ignore it in the kubeadm config images list output at node build time and have some workaround for 1.11 / 1.12 (unless we drop those entirely).
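Roughly, the grep => inject step would look like the sketch below, run inside the node image at build time (illustrative only; it assumes the containerd config already has a sandbox_image line, which is exactly the kind of brittleness being pointed out):
# ask kubeadm which pause image this Kubernetes version expects...
PAUSE_IMAGE="$(kubeadm config images list --kubernetes-version "$(kubeadm version -o short)" | grep pause)"
# ...and point containerd's CRI plugin at it
sed -i "s|sandbox_image = \".*\"|sandbox_image = \"${PAUSE_IMAGE}\"|" /etc/containerd/config.toml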
> We can do the kubeadm config images list | grep pause => inject into the containerd config at node build time, but that's pretty brittle and we really should actually just be using the preferred pod sandbox image for the CRI.
+1
I'm thinking that, for this problem, it is easier to load both images directly than to maintain brittle logic (although this is a temporary solution).
Then we could consider publishing pre-built base images for different Kubernetes versions,
for example kindest/base:v1.12, kindest/base:v1.13, etc. (just like cri-tools: https://github.com/kubernetes-sigs/cri-tools#current-status).
WDYT?
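For the offline failure reported here, the "just preload it" idea amounts to something like the sketch below (image tag and container name are taken from the failure above; the tar file name is illustrative):
# on a machine with network access: pull and export the pause image containerd is asking for
docker pull k8s.gcr.io/pause:3.1
docker save k8s.gcr.io/pause:3.1 -o pause-3.1.tar
# copy it into the kind node and import it into containerd's k8s.io namespace
docker cp pause-3.1.tar v1.18-control-plane:/pause-3.1.tar
docker exec v1.18-control-plane ctr --namespace=k8s.io images import /pause-3.1.tar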
> I'm thinking that, for this problem, it is easier to load both images directly than to maintain brittle logic (although this is a temporary solution).
That's a ~1MB cop out already though, and we'll eventually need 3 or more.
> Then we could consider publishing pre-built base images for different Kubernetes versions, for example kindest/base:v1.12, kindest/base:v1.13, etc. (just like cri-tools: https://github.com/kubernetes-sigs/cri-tools#current-status).
CRI tools is backwards compatible; it adds functionality. Those versions are just new releases over time.
Adding multiple versions makes custom base images less manageable (e.g. for another architecture), and then we are still incorrectly managing the pause image; it should be related only to the pod implementation.
(Also, the pause image is loaded at node image build time, so multiple base images are unnecessary.)
I started writing something after thinking about this morning's discussion, but the node build code is such a mess and I don't want to just insert yet another hack here.
I'll get a PR up soon.
Should be fixed by https://github.com/kubernetes-sigs/kind/pull/1505.
We may modify the approach in the future, but at least this is good enough for now, I hope.
Thanks!