Version 1.9.0-alpha.1 (git-f799036a3)
Cloud provider: AWS
Upgrade from k8s 1.8.8 to 1.9.3; the 1.8.8 upgrade was done with kops 1.8.1.
nodeup fails with:
Mar 01 20:18:07 ip-10-88-1-7 nodeup[936]: I0301 20:18:07.218152 936 s3fs.go:198] Reading file "s3://kops-k8s-state-store/e2e.us-east-1.aws.k8s/pki/issued/ca/keyset.yaml"
Mar 01 20:18:07 ip-10-88-1-7 nodeup[936]: W0301 20:18:07.225106 936 main.go:141] got error running nodeup (will retry in 30s): error building loader: CA certificate "ca" not found
It appears that 1.9.0-alpha.1 isn't creating the keyset.yaml for upgrades.
cc @justinsb ... I believe this fixed it? https://github.com/kubernetes/kops/pull/4375
That was for 1.8.1 to ignore keyset.yaml files created by a 1.9.0 install; this is a 1.9.0 _upgrade_ not creating the keyset.yaml file properly.
We need kops, during an update, to convert the existing certs into a keyset and then write the keyset to the state store.
This is where the write happens.
Any way to mitigate this? Can I generate this keyset.yaml file manually?
Update 1
I found the structure of keyset.yaml here: https://github.com/kubernetes/kops/blob/1c75f475101110383ff077f515756ba060a1b997/upup/pkg/fi/vfs_castore_test.go#L109
Trying to recreate that using the *.crt file in the pki/issued/ca folder...
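To poke at this, you can pull the existing cert material straight out of the state store with the AWS CLI (bucket and cluster names here are the ones from the log above; swap in your own):
$ aws s3 ls s3://kops-k8s-state-store/e2e.us-east-1.aws.k8s/pki/issued/ca/
$ aws s3 cp s3://kops-k8s-state-store/e2e.us-east-1.aws.k8s/pki/issued/ca/ ./ca/ --recursive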
Update 2
So in order to convert a .crt to keyset.yaml, you need to add a keyset.yaml file in the same folder containing the following:
apiVersion: kops/v1alpha2
kind: Keyset
metadata:
  creationTimestamp: null
  name: ca
spec:
  keys:
  - id: "[name of the .crt file without the .crt extension]"
    publicMaterial: [base64-encoded string containing the .crt file]
  type: Keypair
This has to be done for all the folders and subfolders under /pki in S3; I'm trying to write a script to do that (a rough sketch below).
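Here's roughly what such a script could look like (bash; assumes the AWS CLI and GNU base64 are available, one cert per keyset folder, and the bucket/cluster names from the log above, so adjust for your own state store). The private-key side under pki/private presumably needs the analogous treatment with privateMaterial instead of publicMaterial; check the test file linked in Update 1 or the gist in Update 3 before trusting this:

#!/bin/bash
# Sketch: for every <id>.crt under pki/issued/ in the kops state store,
# write a keyset.yaml alongside it in the format shown above.
# Assumptions: AWS CLI + GNU base64 installed, one cert per keyset folder.
set -euo pipefail

BUCKET="kops-k8s-state-store"        # your state store bucket
CLUSTER="e2e.us-east-1.aws.k8s"      # your cluster name

aws s3 ls --recursive "s3://${BUCKET}/${CLUSTER}/pki/issued/" \
  | awk '{print $4}' | grep '\.crt$' | while read -r key; do
    dir=$(dirname "$key")            # e.g. <cluster>/pki/issued/ca
    name=$(basename "$dir")          # keyset name, e.g. "ca"
    id=$(basename "$key" .crt)       # key id = the filename without .crt

    aws s3 cp "s3://${BUCKET}/${key}" /tmp/cert.crt
    material=$(base64 -w0 /tmp/cert.crt)   # single-line base64 of the PEM

    cat > /tmp/keyset.yaml <<EOF
apiVersion: kops/v1alpha2
kind: Keyset
metadata:
  creationTimestamp: null
  name: ${name}
spec:
  keys:
  - id: "${id}"
    publicMaterial: ${material}
  type: Keypair
EOF

    aws s3 cp /tmp/keyset.yaml "s3://${BUCKET}/${dir}/keyset.yaml"
    echo "wrote s3://${BUCKET}/${dir}/keyset.yaml"
done

After running it, re-check with aws s3 ls that each pki/issued/<name>/ (and pki/private/<name>/) folder has a keyset.yaml before letting nodeup retry.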
Update 3
I generated keysets for both .crt and .key files and after that I got the cluster to boot! 🎉
Here's my shitty script: https://gist.github.com/fredsted/034dbf2f8b1117c37a4add0efeea7029 :)
@fredsted I have a PR in today - please feel free to comment and test. I am getting a new private key, which we will probably need to work out.
Could someone please add a note about this issue on the release page for 1.9.0-alpha.1? I would not have attempted the upgrade if I had known about this, but now I have a cluster with no control plane and no easy way of rolling back to 1.8.7.
It would appear that downgrading the Kubernetes version doesn't also roll back the version of the kubeup utils:
/var/cache/kubernetes-install $ cat kube_env.yaml
Assets:
- 0f3a59e4c0aae8c2b2a0924d8ace010ebf39f48e@https://storage.googleapis.com/kubernetes-release/release/v1.8.7/bin/linux/amd64/kubelet
- 36340bb4bb158357fe36ffd545d8295774f55ed9@https://storage.googleapis.com/kubernetes-release/release/v1.8.7/bin/linux/amd64/kubectl
- 1d9788b0f5420e1a219aad2cb8681823fc515e7c@https://storage.googleapis.com/kubernetes-release/network-plugins/cni-0799f5732f2a11b329d9e3d51b9c8f2e3759f2ff.tar.gz
- 0dc1b84eac8bd859e4ba266f7f40111cda26bb30@https://kubeupv2.s3.amazonaws.com/kops/1.9.0-alpha.1/linux/amd64/utils.tar.gz
ClusterName: platform.k8s.local
ConfigBase: s3://cluster-state-store/platform.k8s.local
InstanceGroupName: master-eu-west-1a
Tags:
- _automatic_upgrades
- _aws
- _kubernetes_master
- _networking_cni
channels:
- s3://cluster-state-store/platform.k8s.local/addons/bootstrap-channel.yaml
protokubeImage:
  hash: 7dfd9493043ffa8ea7437e188613b470947b37f7
  name: protokube:1.9.0-alpha.1
  source: https://kubeupv2.s3.amazonaws.com/kops/1.9.0-alpha.1/images/protokube.tar.gz
Leaving the command list I used to roll back a broken k8s cluster, in case it helps somebody.
# change kubernetesVersion. e.g. 1.8.4
$ kops edit cluster your_cluster
# change image. e.g. kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02
$ kops edit ig master-us-west-2a --name your_cluster
$ kops edit ig nodes --name your_cluster
$ kops update cluster your_cluster
$ kops update cluster your_cluster --yes
$ kops rolling-update cluster your_cluster
$ kops rolling-update cluster your_cluster --yes --cloudonly --fail-on-validate-error="false"
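For reference, these are the fields the two edits above change in the cluster and instance group specs (values are just the examples from the comments; use whatever you were running before the upgrade):

# kops edit cluster your_cluster
spec:
  kubernetesVersion: 1.8.4

# kops edit ig master-us-west-2a / nodes --name your_cluster
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02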
What I had to do:
Change /var/cache/kubernetes-install/kube_env.yaml to:
Assets:
- 0f3a59e4c0aae8c2b2a0924d8ace010ebf39f48e@https://storage.googleapis.com/kubernetes-release/release/v1.8.7/bin/linux/amd64/kubelet
- 36340bb4bb158357fe36ffd545d8295774f55ed9@https://storage.googleapis.com/kubernetes-release/release/v1.8.7/bin/linux/amd64/kubectl
- 1d9788b0f5420e1a219aad2cb8681823fc515e7c@https://storage.googleapis.com/kubernetes-release/network-plugins/cni-0799f5732f2a11b329d9e3d51b9c8f2e3759f2ff.tar.gz
- f62360d3351bed837ae3ffcdee65e9d57511695a@https://kubeupv2.s3.amazonaws.com/kops/1.8.0/linux/amd64/utils.tar.gz
ClusterName: platform.k8s.local
ConfigBase: s3://cluster-state-store/platform.k8s.local
InstanceGroupName: master-eu-west-1a
Tags:
- _automatic_upgrades
- _aws
- _kubernetes_master
- _networking_cni
channels:
- s3://cluster-state-store/platform.k8s.local/addons/bootstrap-channel.yaml
protokubeImage:
  hash: 1b972e92520b3cafd576893ae3daeafdd1bc9ffd
  name: protokube:1.8.0
  source: https://kubeupv2.s3.amazonaws.com/kops/1.8.0/images/protokube.tar.gz
Run:
$ wget https://kubeupv2.s3.amazonaws.com/kops/1.8.0/linux/amd64/nodeup
$ chmod +x nodeup
$ sudo mv nodeup /var/cache/kubernetes-install/
$ sudo systemctl restart kops-configuration # otherwise it keeps using the old version of nodeup
$ sudo docker ps -a
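Assuming the node comes back at all, a quick sanity check that it actually picked up the downgraded binaries:
$ systemctl status kops-configuration
$ kubelet --version    # should report v1.8.7 after the rollback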
The above didn't help: I was able to bring up the masters, but couldn't get the nodes to come back up. I brought down the semi-production cluster and redeployed.
@fredsted Thanks for the "shitty" script. It fixed the master node for me.