1. What kops version are you running? The command kops version will display this information.
Version 1.13.0
2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
Version 1.13.0
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
Adding a node to the cluster causes nodeup to try downloading "http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm", which no longer exists since the CentOS 7.7 release.
5. What happened after the commands executed?
kops tries to bootstrap the node, but nodeup fails because it points to a nonexistent package
6. What did you expect to happen?
New node bootstrapped and joined to the cluster
8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
Sep 17 19:58:57 nodeup: I0917 19:58:57.667661 3560 executor.go:145] No progress made, sleeping before retrying 1 failed task(s)
Sep 17 19:59:07 nodeup: I0917 19:59:07.667801 3560 executor.go:103] Tasks: 40 done / 48 total; 1 can run
Sep 17 19:59:07 nodeup: I0917 19:59:07.667844 3560 executor.go:178] Executing task "Package/docker-ce": Package: docker-ce
Sep 17 19:59:07 nodeup: I0917 19:59:07.667883 3560 package.go:206] Listing installed packages: /usr/bin/rpm -q docker-ce --queryformat %{NAME} %{VERSION}
Sep 17 19:59:07 nodeup: I0917 19:59:07.693153 3560 package.go:267] Installing package "docker-ce" (dependencies: [Package: container-selinux])
Sep 17 19:59:07 nodeup: I0917 19:59:07.747296 3560 files.go:100] Hash matched for "/var/cache/nodeup/packages/docker-ce": sha1:5369602f88406d4fb9159dc1d3fd44e76fb4cab8
Sep 17 19:59:07 nodeup: I0917 19:59:07.747368 3560 files.go:103] Hash did not match for "/var/cache/nodeup/packages/container-selinux": actual=sha1:93fdc15d22645b17bb1b2cc652f5bf51924d00a7 vs expected=sha1:d9f87f7f4f2e8e611f556d873a17b8c0c580fec0
Sep 17 19:59:07 nodeup: I0917 19:59:07.747458 3560 http.go:77] Downloading "http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm"
Sep 17 19:59:07 nodeup: I0917 19:59:07.891339 3560 files.go:103] Hash did not match for "/var/cache/nodeup/packages/container-selinux": actual=sha1:93fdc15d22645b17bb1b2cc652f5bf51924d00a7 vs expected=sha1:d9f87f7f4f2e8e611f556d873a17b8c0c580fec0
Sep 17 19:59:07 nodeup: W0917 19:59:07.891385 3560 executor.go:130] error running task "Package/docker-ce" (2m20s remaining to succeed): downloaded from "http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm" but hash did not match expected "sha1:d9f87f7f4f2e8e611f556d873a17b8c0c580fec0"
I'm seeing this as well
We are seeing this issue as well.
Looks like this package was removed from the CentOS repo and now returns a 404:
wget http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
--2019-09-17 15:10:16-- http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
Resolving mirror.centos.org (mirror.centos.org)... 23.254.0.226
Connecting to mirror.centos.org (mirror.centos.org)|23.254.0.226|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-09-17 15:10:17 ERROR 404: Not Found.
This causes a major issue with autoscaling (cluster-autoscaler), which takes down nodes while the new ones never manage to join the cluster.
Ideally, for resiliency, kops should not resolve artifacts required for nodeup/bootstrapping from public repos at node runtime. Not sure if this is the way to go, but possibly consider placing such critical RPMs/binaries in the state store during init and fetching them from there at runtime?
Also, if the package is already installed (some may choose to bake it into their AMI), nodeup should skip trying to fetch it (not sure if this is already the current behavior).
Experiencing this in a production cluster as well. Is there any way to fast track this?
Added a PR
https://github.com/kubernetes/kops/pull/7612
A manual workaround is downloading the following file from a working node
/var/cache/nodeup/packages/container-selinux
and uploading it to the new node.
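A minimal sketch of that copy, assuming SSH access to both nodes (the hostnames and user are placeholders):
# run from a machine that can reach both nodes
scp admin@working-node:/var/cache/nodeup/packages/container-selinux ./container-selinux
scp ./container-selinux admin@new-node:/tmp/container-selinux
# move the file into the nodeup cache on the new node
ssh admin@new-node 'sudo mkdir -p /var/cache/nodeup/packages && sudo mv /tmp/container-selinux /var/cache/nodeup/packages/container-selinux'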
Some CentOS mirror sites might still have the old RPM file; see: https://mirror-status.centos.org/
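For example, a quick way to check whether a particular mirror still serves the old RPM (the mirror host is a placeholder):
# prints "HTTP/1.1 200 OK" if the mirror still has the file, a 404 line otherwise
curl -sI http://SOME-MIRROR/centos/7/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm | head -n 1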
This has just bitten us as well; #7609 should resolve it, however.
@rdjy Thanks for the answer, it did the trick for us.
Now that #7609 is merged, how would I be able to leverage this change? Do I have to wait for a new kops release, or how is nodeup released?
We're working on getting a 1.13/1.14 cut with these fixes asap.
You'll either need to build and deploy your own version of kops (including protokube and kubeup), use a workaround as suggested above (you can probably use a hook to automate it: https://github.com/kubernetes/kops/blob/master/docs/cluster_spec.md#hooks), or wait for a release, which we're actively working on getting out asap!
Hi,
I had no luck using a hook to curl the correct file, as hooks seem to run AFTER nodeup. All I can think of is to build a custom AMI instead of vanilla Amazon Linux 2.
Indeed, hooks won't work. We figured that out at the exact same time as @alexinthesky 😂
Then we switched to the Debian AMI to avoid further damage from dying spot instances:
kope.io/k8s-1.14-debian-stretch-amd64-hvm-ebs-2019-08-16
+1 Seeing the same
It's a bit involved, but we found a workaround until a new release is cut (especially for people having this issue in production).
Bottom line: tar up /var/cache/nodeup from a working node (it's around 200mb), upload the archive somewhere reachable, and pull it down early in boot:
curl https://yourBucket/var_cache_nodeup.tgz | tar -C / -xzf -
This way the cache is there before nodeup is run.
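Roughly, the staging side could look like this (the bucket name is a placeholder; any HTTPS-reachable store works):
# on a working node: archive the nodeup cache with paths relative to /
tar -czf var_cache_nodeup.tgz -C / var/cache/nodeup
# upload it where new nodes can fetch it, e.g. S3
aws s3 cp var_cache_nodeup.tgz s3://yourBucket/var_cache_nodeup.tgz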
Below is an improved workaround, inspired by previous comments and pull requests. Kops supports arbitrary userdata. The snippet needs to be added to each instance group spec.
spec:
  additionalUserData:
  - content: |
      bootcmd:
      - mkdir -p /var/cache/nodeup/packages
      - curl --proxy http://my.proxy:3128 -o /var/cache/nodeup/packages/container-selinux http://mirror.centos.org/centos/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
    name: workaround-container-selinux
    type: text/cloud-config
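For example, assuming an instance group named nodes, the snippet can be added and rolled out with:
kops edit ig nodes
kops update cluster --yes
kops rolling-update cluster --yes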
Hi,
I just faced the same issue recreating one of the master nodes.
I connected to the node via SSH and downloaded the package from another URL:
curl http://mirror.centos.org/centos/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm -o /var/cache/nodeup/packages/container-selinux
I was able to work around the issue by running the commands below on both masters and nodes:
curl http://mirror.centos.org/centos/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm -o /var/cache/nodeup/packages/container-selinux
yum install -y selinux-policy selinux-policy-base selinux-policy-targeted
This workaround no longer works. As of today, http://mirror.centos.org/centos/7.6.1810/ has been deprecated. This also breaks the fix that went into kops 1.13.1: https://github.com/kubernetes/kops/pull/7609
As a workaround you can use http://vault.centos.org/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
But really, container-selinux needs to be updated to http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.107-3.el7.noarch.rpm along with its associated dependencies.
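The earlier curl workaround should still apply with the vault URL swapped in, a sketch:
# same target path as before, only the source URL changes
curl http://vault.centos.org/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm -o /var/cache/nodeup/packages/container-selinux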
OK, so it looks like we'll be doing 1.13.2 this morning. I'd also really prefer to move away from OS packaging (towards "tar.gz" installation), as it seems to be introducing more problems than it solves.
For 2.68.1 -> 2.107.3: we try not to make potentially breaking changes once we have released the 1.x.0 of kops, but we do so for security fixes etc. So we can look at getting it into 1.14.0 (which hasn't _quite_ released yet). But is it a security fix (in which case we would get it into 1.13.0)?
Here's the changelog. It looks like there's no strict security-fix vs. feature distinction, so we probably shouldn't introduce the new version in kops 1.13:
* Fri Aug 02 2019 Jindrich Novy <[email protected]> - 2:2.107-3
- use 2.107 in RHEL7u7
- add build.sh script
* Thu Jul 11 2019 Lokesh Mandvekar <[email protected]> - 2:2.107-2
- Resolves: #1626215
* Mon Jun 24 2019 Lokesh Mandvekar <[email protected]> - 2:2.107-1
- bump to v2.107
* Tue Apr 23 2019 Lokesh Mandvekar <[email protected]> - 2:2.99-1
- built commit b13d03b
* Tue Apr 02 2019 Frantisek Kluknavsky <[email protected]> - 2:2.95-2
- rebase
* Thu Feb 28 2019 Frantisek Kluknavsky <[email protected]> - 2:2.84-2
- rebase
* Tue Jan 08 2019 Frantisek Kluknavsky <[email protected]> - 2.77-1
- backported fixes from upstream
* Mon Nov 12 2018 Dan Walsh <[email protected]> - 2.76-1
- Allow containers to use fuse file systems by default
- Allow containers to sendto dgram socket of container runtimes
- Needed to run container runtimes in notify socket unit files.
* Fri Oct 19 2018 Dan Walsh <[email protected]> - 2.74-1
- Allow containers to setexec themselves
* Tue Sep 18 2018 Frantisek Kluknavsky <[email protected]> - 2:2.73-3
- tweak macro for fedora - applies to rhel8 as well
* Mon Sep 17 2018 Frantisek Kluknavsky <[email protected]> - 2:2.73-2
- moved changelog entries:
- Define spc_t as a container_domain, so that container_runtime will transition
to spc_t even when setup with nosuid.
- Allow container_runtimes to setattr on callers fifo_files
- Fix restorecon to not error on missing directory
* Thu Sep 06 2018 Dan Walsh <[email protected]> - 2.69-3
- Make sure we pull in the latest selinux-policy
* Wed Jul 25 2018 Dan Walsh <[email protected]> - 2.69-2
- Add map support to container-selinux for RHEL 7.5
- Dontudit attempts to write to kernel_sysctl_t
Can the packages be externalized into a YAML/JSON file that nodeup reads instead of being compiled into the binary? That would enable people to source the RPM and store it locally (S3, cloud storage, etc.).
I've opted to save the RPM in S3 and then pull it in with this in the instance group specs:
spec:
  additionalUserData:
  - content: |
      bootcmd:
      - mkdir -p /var/cache/nodeup/packages
      - aws s3 cp s3://<my-s3-bucket>/container-selinux /var/cache/nodeup/packages/container-selinux
    name: workaround-container-selinux
    type: text/cloud-config
Then you just need to sort out the bucket policy and IAM privileges for kops to read from the bucket. This is in an AWS environment, obviously; I'm sure there are similar approaches on the other cloud platforms.
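A sketch of the one-time staging step, assuming the vault URL mentioned above and the placeholder bucket name from the snippet:
# download the RPM once and stage it in the bucket the userdata pulls from
curl -fsSL -o container-selinux http://vault.centos.org/7.6.1810/extras/x86_64/Packages/container-selinux-2.68-1.el7.noarch.rpm
aws s3 cp container-selinux s3://<my-s3-bucket>/container-selinux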
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.