------------- BUG REPORT TEMPLATE --------------------
What kops version are you running? The command kops version will display this information.
1.8.1
What Kubernetes version are you running? kubectl version will print the version if a cluster is running, or provide the Kubernetes version specified as a kops flag.
1.8.6
AWS
I provisioned a cluster using kops-generated Terraform that I modified somewhat to work with the rest of my infrastructure. The cluster had been running for weeks without issue.
Yesterday, I tore down the cluster and rebuilt it (terraform destroy/apply). The cluster will not come back up. Instead, protokube hangs, waiting for the etcd volumes to attach. Here is the log output. The "waiting for volume to be attached" message repeats endlessly. I have confirmed via AWS CLI and console that the EBS volume is attached to the EC2 instance.
Mar 30 22:30:37 ip-172-31-1-13 systemd[1]: Starting Kubernetes Protokube Service...
Mar 30 22:30:37 ip-172-31-1-13 systemd[1]: Started Kubernetes Protokube Service.
Mar 30 22:30:37 ip-172-31-1-13 docker[1546]: protokube version 0.1
Mar 30 22:30:37 ip-172-31-1-13 docker[1546]: I0330 22:30:37.674785 1576 aws_volume.go:72] AWS API Request: ec2metadata/GetMetadata
Mar 30 22:30:37 ip-172-31-1-13 docker[1546]: I0330 22:30:37.676202 1576 aws_volume.go:72] AWS API Request: ec2metadata/GetMetadata
Mar 30 22:30:37 ip-172-31-1-13 docker[1546]: I0330 22:30:37.676962 1576 aws_volume.go:72] AWS API Request: ec2metadata/GetMetadata
Mar 30 22:30:37 ip-172-31-1-13 docker[1546]: I0330 22:30:37.678841 1576 aws_volume.go:72] AWS API Request: ec2/DescribeInstances
Mar 30 22:30:37 ip-172-31-1-13 docker[1546]: I0330 22:30:37.781307 1576 aws_volume.go:72] AWS API Request: ec2/DescribeVolumes
Mar 30 22:30:37 ip-172-31-1-13 docker[1546]: I0330 22:30:37.781410 1576 dnscontroller.go:101] starting DNS controller
Mar 30 22:30:37 ip-172-31-1-13 docker[1546]: I0330 22:30:37.781437 1576 dnscache.go:75] querying all DNS zones (no cached results)
Mar 30 22:30:37 ip-172-31-1-13 docker[1546]: I0330 22:30:37.782471 1576 route53.go:50] AWS request: route53 ListHostedZones
Mar 30 22:30:37 ip-172-31-1-13 docker[1546]: I0330 22:30:37.911045 1576 volume_mounter.go:254] Trying to mount master volume: "vol-0f1a04a36c6baaaae"
Mar 30 22:30:37 ip-172-31-1-13 docker[1546]: I0330 22:30:37.911510 1576 aws_volume.go:72] AWS API Request: ec2/AttachVolume
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: I0330 22:30:38.184418 1576 aws_volume.go:396] AttachVolume request returned {
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: AttachTime: 2018-03-30 22:30:38.164 +0000 UTC,
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: Device: "/dev/xvdu",
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: InstanceId: "i-0d46fbb1317501ac0",
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: State: "attaching",
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: VolumeId: "vol-0f1a04a36c6baaaae"
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: }
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: I0330 22:30:38.184693 1576 aws_volume.go:72] AWS API Request: ec2/DescribeVolumes
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: I0330 22:30:38.268938 1576 volume_mounter.go:254] Trying to mount master volume: "vol-020d90a464b55678f"
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: I0330 22:30:38.269163 1576 aws_volume.go:72] AWS API Request: ec2/AttachVolume
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: I0330 22:30:38.560245 1576 aws_volume.go:396] AttachVolume request returned {
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: AttachTime: 2018-03-30 22:30:38.543 +0000 UTC,
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: Device: "/dev/xvdv",
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: InstanceId: "i-0d46fbb1317501ac0",
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: State: "attaching",
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: VolumeId: "vol-020d90a464b55678f"
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: }
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: I0330 22:30:38.560654 1576 aws_volume.go:72] AWS API Request: ec2/DescribeVolumes
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: I0330 22:30:38.658480 1576 volume_mounter.go:273] Currently attached volumes: [{"ID":"vol-0f1a04a36c6baaaae","LocalDevice":"/dev/xvdu","AttachedTo":"","Mountpoint":"","Status":"available","Info":{"Description":"vol-0f1a04a36c6baaaae","EtcdClusters":[{"clusterKey":"main","nodeName":"b","nodeNames":["a","b","c"]}]}} {"ID":"vol-020d90a464b55678f","LocalDevice":"/dev/xvdv","AttachedTo":"","Mountpoint":"","Status":"available","Info":{"Description":"vol-020d90a464b55678f","EtcdClusters":[{"clusterKey":"events","nodeName":"b","nodeNames":["a","b","c"]}]}}]
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: I0330 22:30:38.658815 1576 volume_mounter.go:58] Master volume "vol-0f1a04a36c6baaaae" is attached at "/dev/xvdu"
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: I0330 22:30:38.659340 1576 volume_mounter.go:72] Doing safe-format-and-mount of /dev/xvdu to /mnt/master-vol-0f1a04a36c6baaaae
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: I0330 22:30:38.659365 1576 aws_volume.go:318] nvme path not found "/rootfs/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0f1a04a36c6baaaae"
Mar 30 22:30:38 ip-172-31-1-13 docker[1546]: I0330 22:30:38.659373 1576 volume_mounter.go:107] Waiting for volume "vol-0f1a04a36c6baaaae" to be attached
Mar 30 22:30:39 ip-172-31-1-13 docker[1546]: I0330 22:30:39.659499 1576 aws_volume.go:318] nvme path not found "/rootfs/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0f1a04a36c6baaaae"
Mar 30 22:30:39 ip-172-31-1-13 docker[1546]: I0330 22:30:39.659519 1576 volume_mounter.go:107] Waiting for volume "vol-0f1a04a36c6baaaae" to be attached
Mar 30 22:30:40 ip-172-31-1-13 docker[1546]: I0330 22:30:40.659641 1576 aws_volume.go:318] nvme path not found "/rootfs/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0f1a04a36c6baaaae"
Mar 30 22:30:40 ip-172-31-1-13 docker[1546]: I0330 22:30:40.659660 1576 volume_mounter.go:107] Waiting for volume "vol-0f1a04a36c6baaaae" to be attached
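The attachment state AWS reports can be cross-checked from the CLI, and the device naming can be inspected on the instance itself. A rough sketch, reusing the volume ID from the log above (in hindsight, given the resolution further down, the relevant detail is that M5/Nitro instances expose EBS volumes as NVMe devices, so the /dev/xvdu device and the by-id symlink protokube is polling for never appear on the Jessie image):

# Sketch only; the volume ID is taken from the log above.
# What does AWS think the attachment state and device name are?
aws ec2 describe-volumes \
  --volume-ids vol-0f1a04a36c6baaaae \
  --query 'Volumes[0].Attachments[0].[InstanceId,Device,State]' \
  --output table

# On the instance: is the volume visible as /dev/xvdu, or only as an NVMe device?
ls -l /dev/disk/by-id/
ls /dev/xvd* /dev/nvme* 2>/dev/null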
The cluster should start normally.
Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest.
metadata:
creationTimestamp: 2018-02-09T00:29:28Z
name: redacted
spec:
api:
loadBalancer:
type: Public
authorization:
rbac: {}
channel: stable
cloudProvider: aws
clusterDNSDomain: cluster.local
configBase: s3://redacted
configStore: s3://redacted
dnsZone: redacted
docker:
bridge: ""
ipMasq: false
ipTables: false
logDriver: json-file
logLevel: warn
logOpt:
- max-size=10m
- max-file=5
storage: overlay,aufs
version: 1.13.1
etcdClusters:
- etcdMembers:
- encryptedVolume: true
instanceGroup: master-us-west-2a
name: a
- encryptedVolume: true
instanceGroup: master-us-west-2b
name: b
- encryptedVolume: true
instanceGroup: master-us-west-2c
name: c
name: main
version: 2.2.1
- etcdMembers:
- encryptedVolume: true
instanceGroup: master-us-west-2a
name: a
- encryptedVolume: true
instanceGroup: master-us-west-2b
name: b
- encryptedVolume: true
instanceGroup: master-us-west-2c
name: c
name: events
version: 2.2.1
iam:
allowContainerRegistry: true
legacy: false
keyStore: s3://redacted/pki
kubeAPIServer:
address: 127.0.0.1
admissionControl:
- Initializers
- NamespaceLifecycle
- LimitRanger
- ServiceAccount
- PersistentVolumeLabel
- DefaultStorageClass
- DefaultTolerationSeconds
- NodeRestriction
- Priority
- ResourceQuota
allowPrivileged: true
anonymousAuth: false
apiServerCount: 3
authorizationMode: RBAC
cloudProvider: aws
etcdServers:
- http://127.0.0.1:4001
etcdServersOverrides:
- /events#http://127.0.0.1:4002
image: gcr.io/google_containers/kube-apiserver:v1.8.6
insecurePort: 8080
kubeletPreferredAddressTypes:
- InternalIP
- Hostname
- ExternalIP
logLevel: 2
requestheaderAllowedNames:
- aggregator
requestheaderExtraHeaderPrefixes:
- X-Remote-Extra-
requestheaderGroupHeaders:
- X-Remote-Group
requestheaderUsernameHeaders:
- X-Remote-User
securePort: 443
serviceClusterIPRange: 100.64.0.0/13
storageBackend: etcd2
kubeControllerManager:
allocateNodeCIDRs: true
attachDetachReconcileSyncPeriod: 1m0s
cloudProvider: aws
clusterCIDR: 100.96.0.0/11
clusterName: redacted
configureCloudRoutes: false
image: gcr.io/google_containers/kube-controller-manager:v1.8.6
leaderElection:
leaderElect: true
logLevel: 2
useServiceAccountCredentials: true
kubeDNS:
domain: cluster.local
replicas: 2
serverIP: 100.64.0.10
kubeProxy:
clusterCIDR: 100.96.0.0/11
cpuRequest: 100m
featureGates: null
hostnameOverride: '@aws'
image: gcr.io/google_containers/kube-proxy:v1.8.6
logLevel: 2
kubeScheduler:
image: gcr.io/google_containers/kube-scheduler:v1.8.6
leaderElection:
leaderElect: true
logLevel: 2
kubelet:
allowPrivileged: true
cgroupRoot: /
cloudProvider: aws
clusterDNS: 100.64.0.10
clusterDomain: cluster.local
enableDebuggingHandlers: true
evictionHard: memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5%
featureGates:
ExperimentalCriticalPodAnnotation: "true"
hostnameOverride: '@aws'
kubeconfigPath: /var/lib/kubelet/kubeconfig
logLevel: 2
networkPluginMTU: 9001
networkPluginName: kubenet
nonMasqueradeCIDR: 100.64.0.0/10
podInfraContainerImage: gcr.io/google_containers/pause-amd64:3.0
podManifestPath: /etc/kubernetes/manifests
requireKubeconfig: true
kubernetesApiAccess:
- 0.0.0.0/0
kubernetesVersion: 1.8.6
masterInternalName: api.internal.redacted
masterKubelet:
allowPrivileged: true
cgroupRoot: /
cloudProvider: aws
clusterDNS: 100.64.0.10
clusterDomain: cluster.local
enableDebuggingHandlers: true
evictionHard: memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5%
featureGates:
ExperimentalCriticalPodAnnotation: "true"
hostnameOverride: '@aws'
kubeconfigPath: /var/lib/kubelet/kubeconfig
logLevel: 2
networkPluginMTU: 9001
networkPluginName: kubenet
nonMasqueradeCIDR: 100.64.0.0/10
podInfraContainerImage: gcr.io/google_containers/pause-amd64:3.0
podManifestPath: /etc/kubernetes/manifests
registerSchedulable: false
requireKubeconfig: true
masterPublicName: api.redacted
networkCIDR: 172.31.0.0/22
networking:
kopeio: {}
nonMasqueradeCIDR: 100.64.0.0/10
secretStore: s3://redacted/secrets
serviceClusterIPRange: 100.64.0.0/13
sshAccess:
- 0.0.0.0/0
subnets:
- id: subnet-e89740a3
name: us-west-2a
type: Private
zone: us-west-2a
- id: subnet-5967d220
name: us-west-2b
type: Private
zone: us-west-2b
- id: subnet-4c23b616
name: us-west-2c
type: Private
zone: us-west-2c
- id: subnet-e99740a2
name: utility-us-west-2a
type: Utility
zone: us-west-2a
- id: subnet-4460d53d
name: utility-us-west-2b
type: Utility
zone: us-west-2b
- id: subnet-c822b792
name: utility-us-west-2c
type: Utility
zone: us-west-2c
topology:
bastion:
bastionPublicName: bastion.redacted
dns:
type: Public
masters: private
nodes: private
Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
Anything else do we need to know?
I solved the issue. Apparently, the new M5 AWS instance types are not supported in kops 1.8.1, as it does not yet support NVMe for EBS volumes. Changing the instance type to M4 resolved the issue. There should be a warning when attempting to use unsupported instance types when provisioning a kops cluster.
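For anyone hitting the same thing, a minimal sketch of that workaround with a kops-plus-Terraform workflow like the one described above; the instance group name and machineType values are illustrative rather than taken from this cluster, and $CLUSTER_NAME is a placeholder:

# Sketch: move an affected instance group off the M5 family.
# Instance group name and machine types below are examples only.
kops edit ig master-us-west-2a --name "$CLUSTER_NAME"
# In the editor, change:
#   spec:
#     machineType: m5.large
# to:
#     machineType: m4.large

# Regenerate and apply the Terraform, matching the workflow in this report:
kops update cluster --name "$CLUSTER_NAME" --target=terraform --out=.
terraform plan
terraform apply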
Still the same issue with M5 on kops 1.9.0.
Are you using Debian as your base image? If so, I assume Jessie? If you upgrade to Stretch, that resolves the issue and allows you to use M5 with 1.9.0.
@thereverendtom I have kops 1.9.0 installed, and kops upgrade has nothing to change because the cluster is already on 1.9.3 from the stable channel. The image is still Jessie, and from what I can tell from the channels (stable, alpha) there is no Stretch option. So as kops 1.9.0 stands, deploying 1.9.3 does not support M5 without some manual changes, as far as I can tell: the upgrade does not switch the instance group image to Stretch, so you have to update the instance groups yourself.
Yeah, I think you may have to make a manual edit to the cluster config to specify Stretch instead of Jessie. I have kops generate Terraform and then apply that, so I just made the manual changes before applying the Terraform.
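The Stretch route is the same kind of manual edit, applied to each instance group's image before regenerating the Terraform. A sketch, assuming a kops-published Debian Stretch image exists for your kops/Kubernetes version; the image string below is illustrative, so check what kops currently publishes for your region:

# Sketch: point an instance group at a Debian Stretch image.
# The image string is an example only - verify the current Stretch image
# for your kops release before using it. $CLUSTER_NAME is a placeholder.
kops edit ig nodes --name "$CLUSTER_NAME"
# In the editor, set something like:
#   spec:
#     image: kope.io/k8s-1.9-debian-stretch-amd64-hvm-ebs-2018-03-11
kops update cluster --name "$CLUSTER_NAME" --target=terraform --out=.
terraform apply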