Version 1.9.0-alpha.1 (git-f799036a3)
Cloud provider: AWS
Upgrade from k8s 1.8.8 to 1.9.3; the 1.8.8 upgrade was done with kops 1.8.1.
nodeup fails with:
Mar 01 20:18:07 ip-10-88-1-7 nodeup[936]: I0301 20:18:07.218152 936 s3fs.go:198] Reading file "s3://kops-k8s-state-store/e2e.us-east-1.aws.k8s/pki/issued/ca/keyset.yaml"
Mar 01 20:18:07 ip-10-88-1-7 nodeup[936]: W0301 20:18:07.225106 936 main.go:141] got error running nodeup (will retry in 30s): error building loader: CA certificate "ca" not found
It appears that 1.9.0-alpha.1 isn't creating the keyset.yaml for upgrades.
cc @justinsb ... I believe this fixed it? https://github.com/kubernetes/kops/pull/4375
That was for 1.8.1 to ignore keyset.yaml files created by a 1.9.0 install; this is a 1.9.0 _upgrade_ not creating the keyset.yaml file properly.
We need kops, during an update, to convert the existing certs into a keyset and then write the keyset to the state store.
This is where the write happens.
Any way to mitigate this? Can I generate this keyset.yaml file manually?
Update 1
I found the structure of keyset.yaml here: https://github.com/kubernetes/kops/blob/1c75f475101110383ff077f515756ba060a1b997/upup/pkg/fi/vfs_castore_test.go#L109
Trying to recreate that using the *.crt file in the pki/issued/ca folder...
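To poke at this, you can pull the existing cert material straight out of the state store with the AWS CLI (bucket and cluster names here are the ones from the log above; swap in your own):
$ aws s3 ls s3://kops-k8s-state-store/e2e.us-east-1.aws.k8s/pki/issued/ca/
$ aws s3 cp s3://kops-k8s-state-store/e2e.us-east-1.aws.k8s/pki/issued/ca/ ./ca/ --recursive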
Update 2
So in order to convert a .crt to keyset.yaml, you need to add a keyset.yaml file in the same folder containing the following:
apiVersion: kops/v1alpha2
kind: Keyset
metadata:
  creationTimestamp: null
  name: ca
spec:
  keys:
  - id: "[name of the .crt file without the .crt extension]"
    publicMaterial: [base64-encoded string containing the .crt file]
  type: Keypair
This has to be done for all the folders and subfolders under /pki in S3; I'm trying to write a script to do that (a rough sketch below).
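Here's roughly what such a script could look like (bash; assumes the AWS CLI and GNU base64 are available, one cert per keyset folder, and the bucket/cluster names from the log above, so adjust for your own state store). The private-key side under pki/private presumably needs the analogous treatment with privateMaterial instead of publicMaterial; check the test file linked in Update 1 or the gist in Update 3 before trusting this:

#!/bin/bash
# Sketch: for every <id>.crt under pki/issued/ in the kops state store,
# write a keyset.yaml alongside it in the format shown above.
# Assumptions: AWS CLI + GNU base64 installed, one cert per keyset folder.
set -euo pipefail

BUCKET="kops-k8s-state-store"        # your state store bucket
CLUSTER="e2e.us-east-1.aws.k8s"      # your cluster name

aws s3 ls --recursive "s3://${BUCKET}/${CLUSTER}/pki/issued/" \
  | awk '{print $4}' | grep '\.crt$' | while read -r key; do
    dir=$(dirname "$key")            # e.g. <cluster>/pki/issued/ca
    name=$(basename "$dir")          # keyset name, e.g. "ca"
    id=$(basename "$key" .crt)       # key id = the filename without .crt

    aws s3 cp "s3://${BUCKET}/${key}" /tmp/cert.crt
    material=$(base64 -w0 /tmp/cert.crt)   # single-line base64 of the PEM

    cat > /tmp/keyset.yaml <<EOF
apiVersion: kops/v1alpha2
kind: Keyset
metadata:
  creationTimestamp: null
  name: ${name}
spec:
  keys:
  - id: "${id}"
    publicMaterial: ${material}
  type: Keypair
EOF

    aws s3 cp /tmp/keyset.yaml "s3://${BUCKET}/${dir}/keyset.yaml"
    echo "wrote s3://${BUCKET}/${dir}/keyset.yaml"
done

After running it, re-check with aws s3 ls that each pki/issued/<name>/ (and pki/private/<name>/) folder has a keyset.yaml before letting nodeup retry.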
Update 3
I generated keysets for both .crt and .key files and after that I got the cluster to boot! 🎉
Here's my shitty script: https://gist.github.com/fredsted/034dbf2f8b1117c37a4add0efeea7029 :)
@fredsted I have a PR in today - please feel free to comment and test. I am getting a new private key, which we will probably need to work out.
Could someone please add a note about this issue on the release page for 1.9.0-alpha.1? I would not have attempted the upgrade if I had known about this, but now I have a cluster with no control plane and no easy way of rolling back to 1.8.7.
It would appear that downgrading the Kubernetes version doesn't also roll back the version of the kubeup utils:
/var/cache/kubernetes-install $ cat kube_env.yaml
Assets:
- 0f3a59e4c0aae8c2b2a0924d8ace010ebf39f48e@https://storage.googleapis.com/kubernetes-release/release/v1.8.7/bin/linux/amd64/kubelet
- 36340bb4bb158357fe36ffd545d8295774f55ed9@https://storage.googleapis.com/kubernetes-release/release/v1.8.7/bin/linux/amd64/kubectl
- 1d9788b0f5420e1a219aad2cb8681823fc515e7c@https://storage.googleapis.com/kubernetes-release/network-plugins/cni-0799f5732f2a11b329d9e3d51b9c8f2e3759f2ff.tar.gz
- 0dc1b84eac8bd859e4ba266f7f40111cda26bb30@https://kubeupv2.s3.amazonaws.com/kops/1.9.0-alpha.1/linux/amd64/utils.tar.gz
ClusterName: platform.k8s.local
ConfigBase: s3://cluster-state-store/platform.k8s.local
InstanceGroupName: master-eu-west-1a
Tags:
- _automatic_upgrades
- _aws
- _kubernetes_master
- _networking_cni
channels:
- s3://cluster-state-store/platform.k8s.local/addons/bootstrap-channel.yaml
protokubeImage:
  hash: 7dfd9493043ffa8ea7437e188613b470947b37f7
  name: protokube:1.9.0-alpha.1
  source: https://kubeupv2.s3.amazonaws.com/kops/1.9.0-alpha.1/images/protokube.tar.gz
Leaving the command list I used to roll back a broken k8s cluster, in case it helps somebody.
# change kubernetesVersion. e.g. 1.8.4
$ kops edit cluster your_cluster
# change image. e.g. kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02
$ kops edit ig master-us-west-2a --name your_cluster
$ kops edit ig nodes --name your_cluster
$ kops update cluster your_cluster
$ kops update cluster your_cluster --yes
$ kops rolling-update cluster your_cluster
$ kops rolling-update cluster your_cluster --yes --cloudonly --fail-on-validate-error="false"
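For reference, these are the fields the two edits above change in the cluster and instance group specs (values are just the examples from the comments; use whatever you were running before the upgrade):

# kops edit cluster your_cluster
spec:
  kubernetesVersion: 1.8.4

# kops edit ig master-us-west-2a / nodes --name your_cluster
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02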
What I had to do:
Change /var/cache/kubernetes-install/kube_env.yaml to:
Assets:
- 0f3a59e4c0aae8c2b2a0924d8ace010ebf39f48e@https://storage.googleapis.com/kubernetes-release/release/v1.8.7/bin/linux/amd64/kubelet
- 36340bb4bb158357fe36ffd545d8295774f55ed9@https://storage.googleapis.com/kubernetes-release/release/v1.8.7/bin/linux/amd64/kubectl
- 1d9788b0f5420e1a219aad2cb8681823fc515e7c@https://storage.googleapis.com/kubernetes-release/network-plugins/cni-0799f5732f2a11b329d9e3d51b9c8f2e3759f2ff.tar.gz
- f62360d3351bed837ae3ffcdee65e9d57511695a@https://kubeupv2.s3.amazonaws.com/kops/1.8.0/linux/amd64/utils.tar.gz
ClusterName: platform.k8s.local
ConfigBase: s3://cluster-state-store/platform.k8s.local
InstanceGroupName: master-eu-west-1a
Tags:
- _automatic_upgrades
- _aws
- _kubernetes_master
- _networking_cni
channels:
- s3://cluster-state-store/platform.k8s.local/addons/bootstrap-channel.yaml
protokubeImage:
  hash: 1b972e92520b3cafd576893ae3daeafdd1bc9ffd
  name: protokube:1.8.0
  source: https://kubeupv2.s3.amazonaws.com/kops/1.8.0/images/protokube.tar.gz
Run:
$ wget https://kubeupv2.s3.amazonaws.com/kops/1.8.0/linux/amd64/nodeup
$ chmod +x nodeup
$ sudo mv nodeup /var/cache/kubernetes-install/
$ sudo systemctl restart kops-configuration # otherwise it keeps using the old version of nodeup
$ sudo docker ps -a
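Assuming the node comes back at all, a quick sanity check that it actually picked up the downgraded binaries:
$ systemctl status kops-configuration
$ kubelet --version    # should report v1.8.7 after the rollback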
The above didn't help: I was able to bring up the masters, but couldn't get the nodes to come back up. I brought down the semi-production cluster and redeployed.
@fredsted Thanks for the "shitty" script. It fixed the master node for me.