See: https://github.com/kubernetes-sigs/kind/issues/1634 for prior context.
In short: on gLinux, upgrading containerd to the latest 1.4.X development nightlies breaks cluster bringup. Containers can't come up because runc fails to set the hugetlb limit; gLinux does not appear to have the hugetlb cgroup controller on the host. The same nightlies work fine under Docker for Mac and in our CI.
Using https://github.com/kind-ci/containerd-nightlies, I'm searching for the break (NOTE: the commit is really the only relevant part; the "1.3.0" binaries here are not necessarily from the 1.3 branch).
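For reference, a minimal sketch (mine, not part of the original report) of checking whether a host exposes the hugetlb cgroup controller at all:

```go
// Check /proc/cgroups for the hugetlb controller; the columns are:
// subsys_name  hierarchy  num_cgroups  enabled
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/proc/cgroups")
	if err != nil {
		fmt.Println("cannot read /proc/cgroups:", err)
		return
	}
	defer f.Close()

	found := false
	s := bufio.NewScanner(f)
	for s.Scan() {
		fields := strings.Fields(s.Text())
		if len(fields) >= 4 && fields[0] == "hugetlb" {
			found = true
			fmt.Println("hugetlb controller present, enabled =", fields[3])
		}
	}
	if !found {
		fmt.Println("hugetlb controller not present")
	}
}
```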
| version | works? |
| ------- | ------ |
| v1.3.0-293-gd3b42574 | ✅ |
| v1.3.0-326-g3fe22817 | ✅ |
| v1.3.0-350-g669f516b | ✅ |
| v1.3.0-374-g97ca1be0 | ✅ |
| v1.3.0-383-g7a5fcf61 | ❌ |
| v1.3.0-400-gc6851ace | ❌ |
| v1.3.0-434-ge5fc9910 | ❌ |
| v1.4.0-beta.0-2-g6312b52d | ❌ |
Quicker testing using this Dockerfile:
```Dockerfile
FROM kindest/node:v1.18.2
ARG CONTAINERD_VERSION="v1.4.0-beta.0-2-g6312b52d"
RUN echo "Installing containerd ..." \
    && export ARCH=$(dpkg --print-architecture | sed 's/ppc64el/ppc64le/' | sed 's/armhf/arm/') \
    && export CONTAINERD_BASE_URL="https://github.com/kind-ci/containerd-nightlies/releases/download/containerd-${CONTAINERD_VERSION#v}" \
    && curl -sSL --retry 5 --output /tmp/containerd.tgz "${CONTAINERD_BASE_URL}/containerd-${CONTAINERD_VERSION#v}.linux-${ARCH}.tar.gz" \
    && tar -C /usr/local -xzvf /tmp/containerd.tgz \
    && rm -rf /tmp/containerd.tgz \
    && rm -f /usr/local/bin/containerd-stress /usr/local/bin/containerd-shim-runc-v1 \
    && curl -sSL --retry 5 --output /usr/local/sbin/runc "${CONTAINERD_BASE_URL}/runc.${ARCH}" \
    && chmod 755 /usr/local/sbin/runc \
    && containerd --version
```
```sh
docker build -t kindest/node:latest --build-arg CONTAINERD_VERSION=v1.4.0-beta.0-2-g6312b52d .
```
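Then `kind create cluster --image kindest/node:latest` exercises bringup against the chosen nightly.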
Narrowed it to something in containerd/containerd 97ca1be0..7a5fcf61; we don't appear to have any nightly builds between those commits.
Noting references to hugetlb in the diff ... https://github.com/containerd/containerd/compare/97ca1be0..7a5fcf61#diff-205a6ca18f070382cf15fbec12476bc5R558
https://github.com/containerd/containerd/compare/97ca1be0..7a5fcf61#diff-85b069277b32de3389720de87247f54eR451
It seems like perhaps the CRI change there started to require hugetlb ...
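If so, the shape of that change would be roughly this (my paraphrase under assumptions, not the actual diff): hugepage limits from the CRI request get copied into the OCI runtime spec unconditionally, and runc then fails to apply them on hosts without the controller:

```go
package opts

import (
	specs "github.com/opencontainers/runtime-spec/specs-go"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1alpha2"
)

// addHugepageLimits paraphrases the suspected change: CRI hugepage
// limits are translated into OCI spec cgroup resources with no check
// for whether the hugetlb controller exists on the host.
func addHugepageLimits(spec *specs.Spec, res *runtimeapi.LinuxContainerResources) {
	if spec.Linux == nil || spec.Linux.Resources == nil || res == nil {
		return // nil guards, not part of the point being made
	}
	for _, l := range res.GetHugepageLimits() {
		spec.Linux.Resources.HugepageLimits = append(
			spec.Linux.Resources.HugepageLimits,
			specs.LinuxHugepageLimit{
				Pagesize: l.GetPageSize(),
				Limit:    l.GetLimit(),
			})
	}
}
```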
cc @dims @mikebrow @bg-chun -- is it expected that systems without hugetlb cgroup cannot create containers / k8s pods now?
Failure looks like this: https://github.com/kubernetes-sigs/kind/issues/1634#issuecomment-636261856
```
May 30 02:11:51 kind-control-plane containerd[128]: time="2020-05-30T02:11:51.145632472Z" level=error msg="StartContainer for "3f4c33b71d247428d5c03c831ac2de4adc5c33c2b32c1b81e8c679363013f6e0" failed" error="failed to create containerd task: OCI runtime create failed: container_linux.go:353: starting container process caused: process_linux.go:437: container init caused: process_linux.go:403: setting cgroup config for procHooks process caused: cannot set hugetlb limit: container could not join or create cgroup: unknown"
```
@dims suggested trying a different runc, so using this kindest/node:latest:
```Dockerfile
FROM kindest/node:v1.18.2
ARG CONTAINERD_VERSION="v1.4.0-beta.0-2-g6312b52d"
RUN echo "Installing runc ..." \
    && export ARCH=$(dpkg --print-architecture | sed 's/ppc64el/ppc64le/' | sed 's/armhf/arm/') \
    && export CONTAINERD_BASE_URL="https://github.com/kind-ci/containerd-nightlies/releases/download/containerd-${CONTAINERD_VERSION#v}" \
    && curl -sSL --retry 5 --output /usr/local/sbin/runc "${CONTAINERD_BASE_URL}/runc.${ARCH}" \
    && chmod 755 /usr/local/sbin/runc \
    && runc --version
```
Running with:
```
containerd github.com/containerd/containerd v1.3.4-12-g1e902b2d 1e902b2d7e2cdb21e88b11cdf16e267b500d15a8
runc version 1.0.0-rc10+dev
spec: 1.0.2
```
(i.e. the runc that containerd HEAD specifies, but with containerd 1.3.X) shows no issues.
Same issue when using runc rc9, rc10, and runc from HEAD (e664e732d5e9d580a8d162219030169f2508393e) with any containerd version after 97ca1be0 (may not be that exact commit; the break is somewhere between that and 7a5fcf61).
@bg-chun i believe this broke after containerd/cri was updated with https://github.com/containerd/cri/pull/1332/files#diff-97925bda0bb83f5a74a5c8152bc190ceR451-R456
Can you please take a look?
@dims my read of the code: kubelet sent containerd-CRI a runtime.CreateContainerRequest that contained an array of HugepageLimit. containerd-CRI returned the error because the system does not support the request. It worked before only because we were ignoring the array of HugepageLimit. We'll need a switch in containerd/cri telling containerd to ignore the array of HugepageLimit in container requests (create and update). Or kubelet could hold the switch, ignoring them and filtering before sending.
Probably best to skip hugepage tests on platforms without support, or somehow configure them to expect failure for said platform(s)?
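A hedged sketch of the containerd/cri-side switch described above (the function and helper names are illustrative, not the eventual implementation; `hugetlbSupported` is a hypothetical detector):

```go
package opts

import (
	"log"

	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1alpha2"
)

// hugetlbSupported is a hypothetical stand-in for real hugetlb
// cgroup controller detection (see the detection sketches below).
func hugetlbSupported() bool { return false }

// filterHugepageLimits drops HugepageLimits from the CRI resources
// when the host has no hugetlb controller, logging a warning instead
// of letting runc fail at cgroup setup. Illustrative only.
func filterHugepageLimits(res *runtimeapi.LinuxContainerResources) {
	if res == nil || hugetlbSupported() {
		return
	}
	if n := len(res.HugepageLimits); n > 0 {
		log.Printf("host lacks the hugetlb cgroup controller; ignoring %d hugepage limit(s)", n)
		res.HugepageLimits = nil
	}
}
```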
@mikebrow ack. Please see below:
So it looks like we need to detect if the cgroups controller for hugetlb is present and active in two places
- #1 in kubelet : https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_container_linux.go#L100-L105
- #2 in containerd/cri : https://github.com/containerd/cri/blob/master/pkg/containerd/opts/spec_unix.go#L451-L455
#1 is definitely the longer-term fix: kubelet should not be asking containerd for hugetlb limits if we already know the controller does not exist.
#2 is the shorter-term fix (maybe log a warning for now and a hard error later!), since it will help existing k8s versions work with newer containerd; the change was introduced recently in containerd/cri@c02c248#diff-97925bda0bb83f5a74a5c8152bc190ceR451-R456
Looks like we can borrow code from cri-o:
https://github.com/cri-o/cri-o/blob/master/server/container_create_linux.go#L60-L87
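Something in this vein (a rough approximation of the linked cri-o check, assuming cgroup v1, not a copy of its code):

```go
package opts

import "os"

// hugetlbControllerPresent approximates the detection we'd borrow:
// on cgroup v1, a usable hugetlb controller is mounted as a directory
// under /sys/fs/cgroup. Sketch only; cri-o's actual logic differs.
func hugetlbControllerPresent() bool {
	info, err := os.Stat("/sys/fs/cgroup/hugetlb")
	return err == nil && info.IsDir()
}
```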
@dims
For #2... maybe a CRI flag: IgnoreHugePageLimitsIfControllerNotPresent (if set to false, error on create/update container requests with hugepage limits; if set to true, log a warning, filter out the hugepage limit requests, and continue)
I'm torn on what the default should be, have to think it over.
I see cri-o is silently ignoring.. tyranny of the first to ignore... will also ignore by default.
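For concreteness, a sketch of how that switch might look in the containerd/cri config (the field name comes from the comment above; the toml key and where it lives are assumptions, not the merged change):

```go
package criconfig

// HugepageConfig is a sketch of the proposed switch; it is not the
// merged implementation.
type HugepageConfig struct {
	// When true, hugepage limits in create/update container requests
	// are logged and dropped on hosts without the hugetlb controller;
	// when false, such requests fail as they do today.
	IgnoreHugePageLimitsIfControllerNotPresent bool `toml:"ignore_hugepage_limits_if_controller_not_present"`
}
```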
@mikebrow i put up a PR for @BenTheElder to try when he gets a chance: https://github.com/containerd/cri/pull/1501. i can't really test it myself as i don't have access to the environment (gLinux)
@BenTheElder can you please try https://github.com/containerd/cri/pull/1501 and let us know?
Sorry for the delay, AFAICT https://github.com/containerd/cri/pull/1501 does the trick.
Using this to test:
```Dockerfile
FROM kindest/node:v1.18.2
COPY runc /usr/local/sbin/runc
COPY ctr containerd* /usr/local/bin/
RUN echo "Installing containerd ..." \
    && containerd --version \
    && runc --version
```
Binaries come from `make binaries` in containerd at 62dd1411; runc is e664e732d5e9d580a8d162219030169f2508393e.
This fails, but when I swap the containerd binary for the `make containerd` binary built from https://github.com/containerd/cri/pull/1501, I get a cluster up.
```console
$ docker exec kind-control-plane containerd --version
time="2020-06-02T05:02:33.919837002Z" level=warning msg="This customized containerd is only for CI test, DO NOT use it for distribution."
containerd github.com/containerd/containerd d7ce093d-TEST
```
To further sanity-check: the `make containerd` binary from containerd/cri at 8898550e (master) fails.
thanks @BenTheElder
update for those following along: https://github.com/containerd/cri/pull/1501 is in and seems to mitigate the issue. We'll need to get it into containerd/containerd next, and then it perhaps makes sense for kubelet to mitigate as well.
Well, this fix is nearly in, but sadly we now can't update: CI became much less stable at deleting containers somewhere between v1.3.3-14-g449e9269 and v1.3.4-12-g1e902b2d, so we've rolled back to v1.3.3-14-g449e9269 for now.
We've been reproducing / bisecting this in GCE kube-up.sh-based (not KIND) CI and currently suspect https://github.com/containerd/containerd/commit/b26e6abe00d17541647f78c37bbf091cbf797fc3
FYI @howardjohn
For reproduction discussion see:
https://github.com/kind-ci/containerd-nightlies/pull/13 vendor.conf format changed, breaking the runc commit lookup. @dims figured it out and fixed it :pray:
testing again once we re-cut all the versions we were looking at.
we are back up to HEAD