1. What kops version are you running? The command kops version will display this information.
Version 1.13.0
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
Version 1.13.0
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
Adding a node to the cluster causes nodeup to try downloading "http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm", which no longer exists since the CentOS 7.7 release.
5. What happened after the commands executed?
kops tries to bootstrap the node, but nodeup fails because it points to a nonexistent package
6. What did you expect to happen?
New node bootstrapped and joined to the cluster
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
Sep 17 19:58:57 nodeup: I0917 19:58:57.667661 3560 executor.go:145] No progress made, sleeping before retrying 1 failed task(s)
Sep 17 19:59:07 nodeup: I0917 19:59:07.667801 3560 executor.go:103] Tasks: 40 done / 48 total; 1 can run
Sep 17 19:59:07 nodeup: I0917 19:59:07.667844 3560 executor.go:178] Executing task "Package/docker-ce": Package: docker-ce
Sep 17 19:59:07 nodeup: I0917 19:59:07.667883 3560 package.go:206] Listing installed packages: /usr/bin/rpm -q docker-ce --queryformat %{NAME} %{VERSION}
Sep 17 19:59:07 nodeup: I0917 19:59:07.693153 3560 package.go:267] Installing package "docker-ce" (dependencies: [Package: container-selinux])
Sep 17 19:59:07 nodeup: I0917 19:59:07.747296 3560 files.go:100] Hash matched for "/var/cache/nodeup/packages/docker-ce": sha1:5369602f88406d4fb9159dc1d3fd44e76fb4cab8
Sep 17 19:59:07 nodeup: I0917 19:59:07.747368 3560 files.go:103] Hash did not match for "/var/cache/nodeup/packages/container-selinux": actual=sha1:93fdc15d22645b17bb1b2cc652f5bf51924d00a7 vs expected=sha1:d9f87f7f4f2e8e611f556d873a17b8c0c580fec0
Sep 17 19:59:07 nodeup: I0917 19:59:07.747458 3560 http.go:77] Downloading "http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm"
Sep 17 19:59:07 nodeup: I0917 19:59:07.891339 3560 files.go:103] Hash did not match for "/var/cache/nodeup/packages/container-selinux": actual=sha1:93fdc15d22645b17bb1b2cc652f5bf51924d00a7 vs expected=sha1:d9f87f7f4f2e8e611f556d873a17b8c0c580fec0
Sep 17 19:59:07 nodeup: W0917 19:59:07.891385 3560 executor.go:130] error running task "Package/docker-ce" (2m20s remaining to succeed): downloaded from "http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm" but hash did not match expected "sha1:d9f87f7f4f2e8e611f556d873a17b8c0c580fec0"
I'm seeing this as well
We are seeing this issue as well.
Looks like this package was removed from the CentOS repo and now returns a 404:
wget http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
--2019-09-17 15:10:16-- http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
Resolving mirror.centos.org (mirror.centos.org)... 23.254.0.226
Connecting to mirror.centos.org (mirror.centos.org)|23.254.0.226|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-09-17 15:10:17 ERROR 404: Not Found.
This causes a major issue with autoscaling (cluster-autoscaler), which takes down nodes while the new ones never manage to join the cluster.
Ideally, for resiliency, kops should not resolve artifacts required for nodeup/bootstrapping from public repos at node runtime. Not sure if this is the way to go, but possibly consider placing such critical RPMs/binaries in the state store during init and fetching them from there at runtime?
Also, if the package is already installed (some may choose to bake it into their AMI), nodeup should skip trying to fetch it (not sure if this is already the current behavior).
Experiencing this in a production cluster as well. Is there any way to fast track this?
Added a PR
https://github.com/kubernetes/kops/pull/7612
A manual workaround is downloading the following file from a working node
/var/cache/nodeup/packages/container-selinux
and uploading it to the new node.
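A minimal sketch of that copy, assuming SSH access to both nodes (the hostnames and user are placeholders):
# run from a machine that can reach both nodes
scp admin@working-node:/var/cache/nodeup/packages/container-selinux ./container-selinux
scp ./container-selinux admin@new-node:/tmp/container-selinux
# move the file into the nodeup cache on the new node
ssh admin@new-node 'sudo mkdir -p /var/cache/nodeup/packages && sudo mv /tmp/container-selinux /var/cache/nodeup/packages/container-selinux'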
Some CentOS mirror sites might still have the old RPM file; see: https://mirror-status.centos.org/
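For example, a quick way to check whether a particular mirror still serves the old RPM (the mirror host is a placeholder):
# prints "HTTP/1.1 200 OK" if the mirror still has the file, a 404 line otherwise
curl -sI http://SOME-MIRROR/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm | head -n 1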
This has just bitten us as well; #7609 should resolve it, however.
@rdjy Thanks for the answer, it did the trick for us.
Now that #7609 is merged, how would I be able to leverage this change? Do I have to wait for a new kops release, or how is nodeup released?
We're working on getting a 1.13/1.14 cut with these fixes asap.
You'll either need to build and deploy your own version of kops (including protokube and kubeup), use a workaround as suggested above (you can probably use a hook to automate it: https://github.com/kubernetes/kops/blob/master/docs/cluster_spec.md#hooks), or wait for a release, which we're actively working on getting out asap!
Hi,
I had no luck using a hook to curl the correct file, as hooks seem to run AFTER nodeup. All I can think of is to build a custom AMI instead of vanilla Amazon Linux 2.
Indeed, hooks won't work. We figured that out at the exact same time as @alexinthesky 😂
Then we switched to the Debian AMI to avoid further damage from dying spot instances:
kope.io/k8s-1.14-debian-stretch-amd64-hvm-ebs-2019-08-16
+1 Seeing the same
It's a bit involved, but we found a workaround until a new release is cut (especially for people having this issue in production).
Bottom line: tar up /var/cache/nodeup from a working node (it's around 200mb), upload the archive somewhere reachable, and pull it down early in boot:
curl https://yourBucket/var_cache_nodeup.tgz | tar -C / -xzf -
This way the cache is there before nodeup is run.
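Roughly, the staging side could look like this (the bucket name is a placeholder; any HTTPS-reachable store works):
# on a working node: archive the nodeup cache with paths relative to /
tar -czf var_cache_nodeup.tgz -C / var/cache/nodeup
# upload it where new nodes can fetch it, e.g. S3
aws s3 cp var_cache_nodeup.tgz s3://yourBucket/var_cache_nodeup.tgz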
Below is an improved workaround, inspired by previous comments and pull requests. Kops supports arbitrary userdata. The snippet needs to be added to each instance group spec.
spec:
  additionalUserData:
  - content: |
      bootcmd:
      - mkdir -p /var/cache/nodeup/packages
      - curl --proxy http://my.proxy:3128 -o /var/cache/nodeup/packages/container-selinux http://mirror.centos.org/centos/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
    name: workaround-container-selinux
    type: text/cloud-config
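For example, assuming an instance group named nodes, the snippet can be added and rolled out with:
kops edit ig nodes
kops update cluster --yes
kops rolling-update cluster --yes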
Hi,
I just faced the same issue recreating one of the master nodes.
I connected to the node via SSH and downloaded the package from another URL:
curl http://mirror.centos.org/centos/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm -o /var/cache/nodeup/packages/container-selinux
I was able to work around the issue by running the commands below on both masters and nodes:
curl http://mirror.centos.org/centos/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm -o /var/cache/nodeup/packages/container-selinux
yum install -y selinux-policy selinux-policy-base selinux-policy-targeted
This workaround no longer works. As of today, http://mirror.centos.org/centos/7.6.1810/ has been deprecated. This also breaks the fix that went into kops 1.13.1: https://github.com/kubernetes/kops/pull/7609
As a workaround you can use http://vault.centos.org/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
But really, container-selinux needs to be updated to http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.107-3.el7.noarch.rpm along with its associated dependencies.
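The earlier curl workaround should still apply with the vault URL swapped in, a sketch:
# same target path as before, only the source URL changes
curl http://vault.centos.org/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm -o /var/cache/nodeup/packages/container-selinux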
OK, so it looks like we'll be doing 1.13.2 this morning. I'd also really prefer to move away from OS packaging (towards "tar.gz" installation), as it seems to be introducing more problems than it solves.
For 2.68.1 -> 2.107.3: we try not to make potentially breaking changes once we have released the 1.x.0 of kops, but we do so for security fixes etc. So we can look at getting it into 1.14.0 (which hasn't _quite_ released yet). But is it a security fix (in which case we would get it into 1.13.0)?
Here's the changelog. It looks like there's no strict security-fix vs. feature distinction, so we probably shouldn't introduce the new version in kops 1.13:
* Fri Aug 02 2019 Jindrich Novy <[email protected]> - 2:2.107-3
- use 2.107 in RHEL7u7
- add build.sh script
* Thu Jul 11 2019 Lokesh Mandvekar <[email protected]> - 2:2.107-2
- Resolves: #1626215
* Mon Jun 24 2019 Lokesh Mandvekar <[email protected]> - 2:2.107-1
- bump to v2.107
* Tue Apr 23 2019 Lokesh Mandvekar <[email protected]> - 2:2.99-1
- built commit b13d03b
* Tue Apr 02 2019 Frantisek Kluknavsky <[email protected]> - 2:2.95-2
- rebase
* Thu Feb 28 2019 Frantisek Kluknavsky <[email protected]> - 2:2.84-2
- rebase
* Tue Jan 08 2019 Frantisek Kluknavsky <[email protected]> - 2.77-1
- backported fixes from upstream
* Mon Nov 12 2018 Dan Walsh <[email protected]> - 2.76-1
- Allow containers to use fuse file systems by default
- Allow containers to sendto dgram socket of container runtimes
- Needed to run container runtimes in notify socket unit files.
* Fri Oct 19 2018 Dan Walsh <[email protected]> - 2.74-1
- Allow containers to setexec themselves
* Tue Sep 18 2018 Frantisek Kluknavsky <[email protected]> - 2:2.73-3
- tweak macro for fedora - applies to rhel8 as well
* Mon Sep 17 2018 Frantisek Kluknavsky <[email protected]> - 2:2.73-2
- moved changelog entries:
- Define spc_t as a container_domain, so that container_runtime will transition
to spc_t even when setup with nosuid.
- Allow container_runtimes to setattr on callers fifo_files
- Fix restorecon to not error on missing directory
* Thu Sep 06 2018 Dan Walsh <[email protected]> - 2.69-3
- Make sure we pull in the latest selinux-policy
* Wed Jul 25 2018 Dan Walsh <[email protected]> - 2.69-2
- Add map support to container-selinux for RHEL 7.5
- Dontudit attempts to write to kernel_sysctl_t
Can the packages be externalized into a YAML/JSON file that nodeup reads instead of being compiled into the binary? That would enable people to source the RPM and store it locally (S3, cloud storage, etc.).
I've opted to save the RPM in S3 and then pull it in with this in the instance group specs:
spec:
  additionalUserData:
  - content: |
      bootcmd:
      - mkdir -p /var/cache/nodeup/packages
      - aws s3 cp s3://<my-s3-bucket>/container-selinux /var/cache/nodeup/packages/container-selinux
    name: workaround-container-selinux
    type: text/cloud-config
Then you just need to sort out the bucket policy and IAM privileges for kops to read from the bucket. This is in an AWS environment, obviously; I'm sure there are similar approaches on the other cloud platforms.
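A sketch of the one-time staging step, assuming the vault URL mentioned above and the placeholder bucket name from the snippet:
# download the RPM once and stage it in the bucket the userdata pulls from
curl -fsSL -o container-selinux http://vault.centos.org/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
aws s3 cp container-selinux s3://<my-s3-bucket>/container-selinux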
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.