kops version: Version 1.8.0 (git-5099bc5)
kubectl version: v1.8.7
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.7", GitCommit:"b30876a5539f09684ff9fde266fda10b37738c9c", GitTreeState:"clean", BuildDate:"2018-01-16T21:59:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.7", GitCommit:"b30876a5539f09684ff9fde266fda10b37738c9c", GitTreeState:"clean", BuildDate:"2018-01-16T21:52:38Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
cloud provider: AWS
What commands did you run? What is the simplest way to reproduce this issue?
Created an instance group with the following configuration.
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: example.com
  name: spot-nodes
spec:
  additionalSecurityGroups:
  - sg-1e310660
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02
  machineType: m5.4xlarge
  maxPrice: "0.4"
  maxSize: 4
  minSize: 4
  nodeLabels:
    spot: "true"
  role: Node
  rootVolumeSize: 100
  subnets:
  - us-east-1a
  taints:
  - spot=true:NoSchedule
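For reference, the group was created and applied along these lines (a sketch of the standard kops workflow, not the reporter's verbatim commands; the file name is illustrative):

$ kops create -f spot-nodes.yaml
$ kops update cluster example.com --yes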
~$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 10M 0 10M 0% /dev
tmpfs 13G 5.0M 13G 1% /run
/dev/nvme0n1p1 7.4G 5.6G 1.5G 80% /
tmpfs 31G 0 31G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 31G 0 31G 0% /sys/fs/cgroup
~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 100G 0 disk
└─nvme0n1p1 259:1 0 8G 0 part /
What did you expect to happen?
The node should use the full 100GB capacity of the root volume.
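As a stopgap, the partition and filesystem can be grown by hand on an affected node (a sketch, assuming growpart from the cloud-guest-utils package is present and the root filesystem is ext4):

~$ sudo growpart /dev/nvme0n1 1    # extend partition 1 to the end of the disk
~$ sudo resize2fs /dev/nvme0n1p1   # grow the ext4 filesystem online to fill the partition
~$ df -h /                         # root filesystem should now report ~100G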
cluster manifest:
$ kops get --name $NAME -oyaml
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-01-17T11:04:19Z
  name: example.com
spec:
  api:
    dns: {}
  authorization:
    alwaysAllow: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://my-bucket/example.com
  dnsZone: example.com
  etcdClusters:
  - enableEtcdTLS: true
    etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    name: main
    version: 3.0.17
  - enableEtcdTLS: true
    etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    name: events
    version: 3.0.17
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    runtimeConfig:
      admissionregistration.k8s.io/v1alpha1: "true"
  kubernetesApiAccess:
  - 10.2.0.0/16
  kubernetesVersion: 1.8.7
  masterPublicName: api.example.com
  networkCIDR: 10.2.0.0/16
  networkID: vpc-xxx
  networking:
    kubenet: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 10.2.0.0/16
  sshKeyName: xxx
  subnets:
  - cidr: 10.2.48.0/20
    name: us-east-1a
    type: Public
    zone: us-east-1a
  topology:
    dns:
      type: Public
    masters: public
    nodes: public
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-01-17T11:04:19Z
  labels:
    kops.k8s.io/cluster: example.com
  name: master-us-east-1a
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02
  machineType: t2.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1a
  role: Master
  subnets:
  - us-east-1a
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-01-17T11:04:19Z
  labels:
    kops.k8s.io/cluster: example.com
  name: nodes
spec:
  additionalSecurityGroups:
  - sg-1e310660
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02
  machineType: m4.4xlarge
  maxSize: 10
  minSize: 10
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  rootVolumeSize: 200
  subnets:
  - us-east-1a
---
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-01-23T09:06:04Z
  labels:
    kops.k8s.io/cluster: example.com
  name: spot-nodes
spec:
  additionalSecurityGroups:
  - sg-1e310660
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02
  machineType: m5.4xlarge
  maxPrice: "0.4"
  maxSize: 4
  minSize: 4
  nodeLabels:
    spot: "true"
  role: Node
  rootVolumeSize: 100
  subnets:
  - us-east-1a
  taints:
  - spot=true:NoSchedule
Please run the commands with the most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.
Anything else we need to know?
The other node instance group has a partition the same size as the volume.
~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 200G 0 disk
└─xvda1 202:1 0 200G 0 part /
I just came across the exact same issue: the disk is created with the correct size, but the root partition defaults to 8GB instead. Pretty much the same config as above.
I will go ahead and assume it is a Debian Jessie issue: https://github.com/kubernetes/kops/blob/master/docs/releases/1.8-NOTES.md
New AWS instance types: P3, C5, M5, H1. Please note that NVME volumes are not supported on the default jessie image, so masters will not boot on M5 and C5 instance types unless a stretch image is chosen (change stretch to jessie in the image name). Also note that kubernetes will not support mounting persistent volumes on NVME instances until Kubernetes 1.9.
This should affect masters and nodes.
From the statement in the release notes, I assumed only masters are affected, since it says:
...NVME volumes are not supported on the default jessie image, so masters will not boot on M5 and C5 instance types...
Also seeing this issue on our new M5 Nodes.
@ApsOps does it work with stretch?
@chrislovecnm yeah, looks like it works with the stretch image. There's an extra 1MB partition (of unusable space, I think) though.
~$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 31G 0 31G 0% /dev
tmpfs 6.2G 4.5M 6.2G 1% /run
/dev/nvme0n1p2 94G 3.1G 87G 4% /
tmpfs 31G 0 31G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 31G 0 31G 0% /sys/fs/cgroup
~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 100G 0 disk
├─nvme0n1p1 259:1 0 1007.5K 0 part
└─nvme0n1p2 259:2 0 100G 0 part /
We ran into the same issue (the cluster worked fine until we started a few pods containing large images; then the images would not deploy because of "No space left on device").
Switching from m5.xlarge to m4.xlarge instances seems to fix it for now, but since older instance types are getting more expensive, I hope this issue can be fixed soon.
Any news on this?
@fredsted It works with the stretch image.
Ah, sorry. I replaced the image name in my ig and it seems to work.
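For anyone else, the switch is a one-line image change in the instance group followed by a rolling update (a sketch of the standard kops workflow; adjust the ig name and image date to yours):

$ kops edit ig spot-nodes --name example.com
# change the image line from
#   image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02
# to
#   image: kope.io/k8s-1.8-debian-stretch-amd64-hvm-ebs-2018-02-08
$ kops update cluster example.com --yes
$ kops rolling-update cluster example.com --yes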
Closing since NVMe volumes (C5, M5, etc. instances) are supported in the Debian Stretch image. Please reopen if you still face any issues.
Which stretch image are you using?
I'm using kope.io/k8s-1.8-debian-stretch-amd64-hvm-ebs-2018-02-08 with m5.xlarge instances.
This is happening for me as well.
Is there any resolution to this? I'm having this issue too.
@cheynewallace check the previous comments. It works with debian-stretch images.
I found this: https://github.com/kubernetes/kops/blob/master/channels/stable
Does this mean I need to be on kubernetesVersion: ">=1.11.0" before switching to the image that supports NVMe (stretch)?
How did you find the compatible image name? Not sure I'm looking in the right place.
@richstokes For AWS, I found a complete list here: https://eu-central-1.console.aws.amazon.com/ec2/v2/home?region=eu-central-1#Images:visibility=public-images;ownerAlias=383156758163;sort=name
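Equivalently, the same list can be pulled with the AWS CLI (a sketch, assuming the CLI is configured; 383156758163 is the kope.io owner ID from the console URL above):

$ aws ec2 describe-images --owners 383156758163 \
    --filters "Name=name,Values=k8s-1.8-debian-stretch*" \
    --query "Images[].Name" --output text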