Version 1.18.1 (git-453d7d96be)
AWS
```sh
kops create cluster \
--cloud=aws \
--zones=us-west-2a \
--name=k8s-cs.domain.net \
--state=s3://domaininfrastructurestate/kops/cs/ \
--dns-zone=k8s-cs.domain.net \
--node-size=t3a.small \
--node-tenancy=default \
--node-volume-size=50 \
--node-count=3 \
--master-size=t3a.small \
--master-volume-size=50 \
--master-zones=us-west-2a \
--master-count=3 \
--out=. \
--target=terraform \
--yes \
--image=ami-028e52edfeb33adb2
```
```
W0924 16:33:48.893149 30386 create_cluster.go:771] Running with masters in the same AZs; redundancy will be reduced
I0924 16:33:50.377267 30386 subnets.go:184] Assigned CIDR 172.20.32.0/19 to subnet us-west-2a
I0924 16:33:54.964514 30386 create_cluster.go:1537] Using SSH public key: /home/demi4/.ssh/id_rsa.pub
I0924 16:34:08.385316 30386 executor.go:103] Tasks: 0 done / 95 total; 47 can run
I0924 16:34:08.397555 30386 dnszone.go:242] Check for existing route53 zone to re-use with name "k8s-cs.domain.net"
I0924 16:34:09.249425 30386 dnszone.go:249] Existing zone "k8s-cs.domain.net." found; will configure TF to reuse
I0924 16:34:10.792010 30386 vfs_castore.go:590] Issuing new certificate: "apiserver-aggregator-ca"
I0924 16:34:10.831807 30386 vfs_castore.go:590] Issuing new certificate: "etcd-clients-ca"
I0924 16:34:10.900833 30386 vfs_castore.go:590] Issuing new certificate: "ca"
I0924 16:34:10.919639 30386 vfs_castore.go:590] Issuing new certificate: "etcd-manager-ca-main"
I0924 16:34:11.035820 30386 vfs_castore.go:590] Issuing new certificate: "etcd-peers-ca-events"
I0924 16:34:11.106013 30386 vfs_castore.go:590] Issuing new certificate: "etcd-peers-ca-main"
I0924 16:34:11.226926 30386 vfs_castore.go:590] Issuing new certificate: "etcd-manager-ca-events"
I0924 16:34:16.248941 30386 executor.go:103] Tasks: 47 done / 95 total; 26 can run
I0924 16:34:18.305484 30386 vfs_castore.go:590] Issuing new certificate: "master"
I0924 16:34:18.365115 30386 vfs_castore.go:590] Issuing new certificate: "apiserver-aggregator"
I0924 16:34:18.463993 30386 vfs_castore.go:590] Issuing new certificate: "kops"
I0924 16:34:18.493709 30386 vfs_castore.go:590] Issuing new certificate: "kubelet"
I0924 16:34:18.506919 30386 vfs_castore.go:590] Issuing new certificate: "kube-controller-manager"
I0924 16:34:18.517012 30386 vfs_castore.go:590] Issuing new certificate: "kube-scheduler"
I0924 16:34:18.523661 30386 vfs_castore.go:590] Issuing new certificate: "kubecfg"
I0924 16:34:18.533129 30386 vfs_castore.go:590] Issuing new certificate: "kubelet-api"
I0924 16:34:18.602225 30386 vfs_castore.go:590] Issuing new certificate: "apiserver-proxy-client"
I0924 16:34:18.682028 30386 vfs_castore.go:590] Issuing new certificate: "kube-proxy"
I0924 16:34:22.575969 30386 executor.go:103] Tasks: 73 done / 95 total; 18 can run
I0924 16:34:24.249151 30386 executor.go:103] Tasks: 91 done / 95 total; 4 can run
I0924 16:34:24.249770 30386 executor.go:103] Tasks: 95 done / 95 total; 0 can run
panic: Terraform resource names cannot start with a digit. This is a bug in Kops, please report this in a GitHub Issue. Name: 1.etcd-events.k8s-cs.domain.net
goroutine 1 [running]:
k8s.io/kops/upup/pkg/fi/cloudup/terraform.tfSanitize(0xc000bf2570, 0x23, 0x3d027e5, 0xe)
/go/src/k8s.io/kops/upup/pkg/fi/cloudup/terraform/target.go:104 +0x302
k8s.io/kops/upup/pkg/fi/cloudup/terraform.(*TerraformTarget).finish012(0xc00056e500, 0xc000a077d0, 0x0, 0x34d2ae0)
/go/src/k8s.io/kops/upup/pkg/fi/cloudup/terraform/target_0_12.go:61 +0x6b5
k8s.io/kops/upup/pkg/fi/cloudup/terraform.(*TerraformTarget).Finish(0xc00056e500, 0xc000a077d0, 0xa, 0xc000625700)
/go/src/k8s.io/kops/upup/pkg/fi/cloudup/terraform/target.go:200 +0x52f
k8s.io/kops/upup/pkg/fi/cloudup.(*ApplyClusterCmd).Run(0xc000868000, 0x43b59a0, 0xc000052108, 0x0, 0x0)
/go/src/k8s.io/kops/upup/pkg/fi/cloudup/apply_cluster.go:938 +0x26c1
main.RunUpdateCluster(0x43b59a0, 0xc000052108, 0xc0003e1b60, 0x7ffca7e78ee0, 0x15, 0x4356d20, 0xc00000e018, 0xc000a41e60, 0x0, 0x0, ...)
/go/src/k8s.io/kops/cmd/kops/update_cluster.go:274 +0x9ba
main.RunCreateCluster(0x43b59a0, 0xc000052108, 0xc0003e1b60, 0x4356d20, 0xc00000e018, 0xc00036fc00, 0xc0000e3800, 0xc000845d20)
/go/src/k8s.io/kops/cmd/kops/create_cluster.go:1357 +0x3720
main.NewCmdCreateCluster.func1(0xc00088b400, 0xc0001fbc20, 0x0, 0x11)
/go/src/k8s.io/kops/cmd/kops/create_cluster.go:274 +0x188
github.com/spf13/cobra.(*Command).execute(0xc00088b400, 0xc0001fbb00, 0x11, 0x12, 0xc00088b400, 0xc0001fbb00)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:830 +0x2aa
github.com/spf13/cobra.(*Command).ExecuteC(0x61e3f80, 0x6220128, 0x0, 0x0)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:914 +0x2fb
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:864
main.Execute()
/go/src/k8s.io/kops/cmd/kops/root.go:96 +0x8f
main.main()
/go/src/k8s.io/kops/cmd/kops/main.go:25 +0x20
```
**What did you expect to happen?**
**Please provide your cluster manifest. Execute `kops get --name my.example.com -o yaml` to display your cluster manifest. You may want to remove your cluster name and other sensitive information.**
```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2020-09-24T13:33:51Z"
  name: k8s-cs.domain.net
spec:
  api:
    dns: {}
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://domaininfrastructurestate/kops/cs/k8s-cs.domain.net
  containerRuntime: docker
  dnsZone: k8s-cs.domain.net
  etcdClusters:
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-09-24T13:33:52Z"
  labels:
    kops.k8s.io/cluster: k8s-cs.domain.net
  name: master-us-west-2a-1
spec:
  image: ami-028e52edfeb33adb2
  machineType: t3a.small
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-west-2a-1
  role: Master
  rootVolumeSize: 50
  subnets:
  - us-west-2a
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-09-24T13:33:52Z"
  labels:
    kops.k8s.io/cluster: k8s-cs.domain.net
  name: master-us-west-2a-2
spec:
  image: ami-028e52edfeb33adb2
  machineType: t3a.small
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-west-2a-2
  role: Master
  rootVolumeSize: 50
  subnets:
  - us-west-2a
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-09-24T13:33:53Z"
  labels:
    kops.k8s.io/cluster: k8s-cs.domain.net
  name: master-us-west-2a-3
spec:
  image: ami-028e52edfeb33adb2
  machineType: t3a.small
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-west-2a-3
  role: Master
  rootVolumeSize: 50
  subnets:
  - us-west-2a
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-09-24T13:33:53Z"
  labels:
    kops.k8s.io/cluster: k8s-cs.domain.net
  name: nodes
spec:
  image: ami-028e52edfeb33adb2
  machineType: t3a.small
  maxSize: 3
  minSize: 3
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  rootVolumeSize: 50
  subnets:
  - us-west-2a
```
Thanks for reporting this! This is definitely a bug in Kops. The problem is in the generated Terraform code for the EBS volumes used by the etcd clusters: the Terraform resource names for the volumes begin with the etcdMember name from the ClusterSpec. Terraform 0.12 no longer allows resource names to begin with a digit, but in your case the etcdMember names are 1, 2, and 3.
Kops could handle this in one of two ways:
- We could prefix the Terraform resource name with something like `vol` when the etcdMember name starts with a digit (see the sketch after this list).
- We could also print a warning during generation that the Terraform code will fail and the user needs to change the etcdMember names.
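A minimal sketch of what the first option could look like, assuming a hypothetical `tfSafeName` helper (this is not the actual kops code; the real sanitization lives in `upup/pkg/fi/cloudup/terraform/target.go`):

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tfSafeName is a hypothetical helper: Terraform 0.12 identifiers must not
// start with a digit, so prefix such names instead of panicking.
func tfSafeName(name string) string {
	// Replace characters that are invalid in Terraform identifiers.
	sanitized := strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) || r == '-' || r == '_' {
			return r
		}
		return '-'
	}, name)
	// Option 1 from the list above: prefix names that start with a digit.
	if sanitized != "" && unicode.IsDigit(rune(sanitized[0])) {
		return "vol-" + sanitized
	}
	return sanitized
}

func main() {
	// The name from the panic above becomes a valid Terraform identifier:
	// vol-1-etcd-events-k8s-cs-domain-net
	fmt.Println(tfSafeName("1.etcd-events.k8s-cs.domain.net"))
}
```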
As a side note, I don't know what exactly is involved in changing the member names and how seamless that is. If it's straightforward and without downtime, maybe that's what we suggest to Terraform users.
/kind bug
> As a side note, I don't know what exactly is involved in changing the member names and how seamless that is. If it's straightforward and without downtime, maybe that's what we suggest to Terraform users.
I had the same issue as @dimitrez with etcd member names starting with digits, and I tried renaming them.
"Straightforward" is not the term I would use, as you hit etcd certificate name mismatches during the rollout, for example.
I ended up purging the master volumes and restoring from an etcd backup, but we're testing whether it could work by scaling up (adding new members with correct names) and then scaling down the old masters.
We also ran into this. Renaming the Terraform resource would be less of a risk than trying to rename etcd members. I think the suggestion of "if the etcdMember name starts with a digit, prefix the Terraform resource name with something like vol" is a good solution, and is in line with other 1.18 changes for Terraform 0.12.
Is there a way to bypass this issue if I create a new cluster?
@jtbonhomme if you specify etcd members with a letter (or a letter prefix) in a brand-new cluster, it will work. Like this:
```yaml
etcdMembers:
- instanceGroup: master-eu-west-3a-1
  name: "a1"
```
@jtbonhomme do you modify the cluster config in any way after creating it?
@hakman no, I don't
@nfillot I am not sure I follow your point. I tried to export a cluster in YAML format (`kops get --name sn-dev.k8s.local -o yaml > cluster.yaml`), then I changed the etcdMember names to prefix them with a letter (the letter of the AZ), but how do I generate a Terraform manifest from a cluster description file?
@nfillot I tried to create a brand-new cluster with a new name, then update it with the --target terraform flag.
It works, thank you!
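For anyone following along, the flow described above looks roughly like this (cluster name, state bucket, and zone are placeholders):

```sh
# Create the cluster spec in the state store only (no --yes, so no cloud resources yet).
kops create cluster \
  --cloud=aws \
  --zones=eu-west-3a \
  --name=new-cluster.example.com \
  --state=s3://my-state-bucket/kops/

# Optionally export the spec, edit the etcdMember names, and re-apply it:
#   kops get --name new-cluster.example.com --state=s3://my-state-bucket/kops/ -o yaml > cluster.yaml
#   kops replace -f cluster.yaml --state=s3://my-state-bucket/kops/

# Generate the Terraform manifest from the stored cluster spec.
kops update cluster \
  --name=new-cluster.example.com \
  --state=s3://my-state-bucket/kops/ \
  --target=terraform \
  --out=.
```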
Thanks @jtbonhomme. I think I found the issue; it should be fixed in the next 1.18 release.
This will work only if you are creating the cluster in multiple AZs, in case it helps.
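In other words, spreading the masters across zones typically gives the etcd members zone-derived names such as a, b, and c, which sidesteps the digit problem. A multi-AZ variant of the original command would look roughly like this (zone list is illustrative):

```sh
kops create cluster \
  --zones=us-west-2a,us-west-2b,us-west-2c \
  --master-zones=us-west-2a,us-west-2b,us-west-2c \
  --master-count=3 \
  --name=k8s-cs.domain.net \
  --state=s3://domaininfrastructurestate/kops/cs/ \
  --target=terraform \
  --out=.
```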
OK @hakman, congrats on your investigation and on finding the root cause.
Thank you for your help!