Kops: rootVolumeSize value not respected for spot node instance group with M5 instance

Created on 24 Jan 2018 · 17 comments · Source: kubernetes/kops

  1. kops version: Version 1.8.0 (git-5099bc5)

  2. kubectl version: v1.8.7

Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.7", GitCommit:"b30876a5539f09684ff9fde266fda10b37738c9c", GitTreeState:"clean", BuildDate:"2018-01-16T21:59:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.7", GitCommit:"b30876a5539f09684ff9fde266fda10b37738c9c", GitTreeState:"clean", BuildDate:"2018-01-16T21:52:38Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
  3. cloud provider: AWS

  4. What commands did you run? What is the simplest way to reproduce this issue?
    Created an instance group with the following configuration (applied as sketched after the manifest).

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: example.com
  name: spot-nodes
spec:
  additionalSecurityGroups:
  - sg-1e310660
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02
  machineType: m5.4xlarge
  maxPrice: "0.4"
  maxSize: 4
  minSize: 4
  nodeLabels:
    spot: "true"
  role: Node
  rootVolumeSize: 100
  subnets:
  - us-east-1a
  taints:
  - spot=true:NoSchedule
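
A sketch of the commands used to apply it, assuming the manifest above was saved as spot-nodes.yaml (the file name is illustrative):

# Register the instance group from the manifest
kops create -f spot-nodes.yaml

# Preview and apply the change to AWS
kops update cluster --name example.com --yes
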
  5. What happened after the commands executed?
    The ASG was created with a 100 GB root volume, but only an 8 GB root partition is used:
~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             10M     0   10M   0% /dev
tmpfs            13G  5.0M   13G   1% /run
/dev/nvme0n1p1  7.4G  5.6G  1.5G  80% /
tmpfs            31G     0   31G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            31G     0   31G   0% /sys/fs/cgroup

~$ lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1     259:0    0  100G  0 disk
└─nvme0n1p1 259:1    0    8G  0 part /
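
For reference, the unused space can be reclaimed in place on an affected node. A minimal sketch, assuming an ext4 root filesystem and that growpart (from cloud-guest-utils) is available on the image:

# Grow partition 1 of the NVMe root disk to fill the 100G device
sudo growpart /dev/nvme0n1 1

# Resize the mounted ext4 filesystem online to fill the partition
sudo resize2fs /dev/nvme0n1p1

# Verify the root filesystem now reports ~100G
df -h /
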
  6. What did you expect to happen?
    The node using the full 100 GB capacity of the root volume.

  7. cluster manifest:

$ kops get --name $NAME -oyaml
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-01-17T11:04:19Z
  name: example.com
spec:
  api:
    dns: {}
  authorization:
    alwaysAllow: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://my-bucket/example.com
  dnsZone: example.com
  etcdClusters:
  - enableEtcdTLS: true
    etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    name: main
    version: 3.0.17
  - enableEtcdTLS: true
    etcdMembers:
    - instanceGroup: master-us-east-1a
      name: a
    name: events
    version: 3.0.17
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    runtimeConfig:
      admissionregistration.k8s.io/v1alpha1: "true"
  kubernetesApiAccess:
  - 10.2.0.0/16
  kubernetesVersion: 1.8.7
  masterPublicName: api.example.com
  networkCIDR: 10.2.0.0/16
  networkID: vpc-xxx
  networking:
    kubenet: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 10.2.0.0/16
  sshKeyName: xxx
  subnets:
  - cidr: 10.2.48.0/20
    name: us-east-1a
    type: Public
    zone: us-east-1a
  topology:
    dns:
      type: Public
    masters: public
    nodes: public

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-01-17T11:04:19Z
  labels:
    kops.k8s.io/cluster: example.com
  name: master-us-east-1a
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02
  machineType: t2.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-us-east-1a
  role: Master
  subnets:
  - us-east-1a

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-01-17T11:04:19Z
  labels:
    kops.k8s.io/cluster: example.com
  name: nodes
spec:
  additionalSecurityGroups:
  - sg-1e310660
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02
  machineType: m4.4xlarge
  maxSize: 10
  minSize: 10
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  rootVolumeSize: 200
  subnets:
  - us-east-1a

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-01-23T09:06:04Z
  labels:
    kops.k8s.io/cluster: example.com
  name: spot-nodes
spec:
  additionalSecurityGroups:
  - sg-1e310660
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-12-02
  machineType: m5.4xlarge
  maxPrice: "0.4"
  maxSize: 4
  minSize: 4
  nodeLabels:
    spot: "true"
  role: Node
  rootVolumeSize: 100
  subnets:
  - us-east-1a
  taints:
  - spot=true:NoSchedule
  8. Please run the commands with the most verbose logging by adding the -v 10 flag.
    Paste the logs into this report, or in a gist and provide the gist link here.

  9. Anything else do we need to know?
    The other node instance group has a root partition the same size as its volume:

~$ lsblk
NAME    MAJ:MIN   RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0      0  200G  0 disk
└─xvda1 202:1      0  200G  0 part /

All 17 comments

I just came across the exact same issue: the disk is created with the correct size, but the root partition defaults to 8 GB instead. Pretty much the same config as above.

I will go ahead and assume it is a Debian jessie issue: https://github.com/kubernetes/kops/blob/master/docs/releases/1.8-NOTES.md

New AWS instance types: P3, C5, M5, H1. Please note that NVME volumes are not supported on the default jessie image, so masters will not boot on M5 and C5 instance types unless a stretch image is chosen (change stretch to jessie in the image name). Also note that kubernetes will not support mounting persistent volumes on NVME instances until Kubernetes 1.9.

This should affect masters and nodes.

From the statement in the release notes, I assumed only masters were affected, since it says:

...NVME volumes are not supported on the default jessie image, so masters will not boot on M5 and C5 instance types...

Also seeing this issue on our new M5 nodes.

@ApsOps does it work with stretch?

@chrislovecnm yeah, looks like it works with the stretch image. There's an extra 1 MB partition (of unusable space, I think), though.

~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             31G     0   31G   0% /dev
tmpfs           6.2G  4.5M  6.2G   1% /run
/dev/nvme0n1p2   94G  3.1G   87G   4% /
tmpfs            31G     0   31G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            31G     0   31G   0% /sys/fs/cgroup

~$ lsblk
NAME        MAJ:MIN RM    SIZE RO TYPE MOUNTPOINT
nvme0n1     259:0    0    100G  0 disk
├─nvme0n1p1 259:1    0 1007.5K  0 part
└─nvme0n1p2 259:2    0    100G  0 part /

We ran into the same issue: the cluster worked fine until we started a few pods containing large images, which then failed to deploy with "No space left on device" errors.
Switching from m5.xlarge to m4.xlarge instances seems to fix it for now, but since older instance types are getting more expensive, I hope this issue can be fixed soon.

Any news on this?

@fredsted It works with the stretch image.

Ah, sorry. I replaced the image name in my ig and it seems to work.

Closing, since NVMe volumes (C5, M5, etc. instance types) are supported in the Debian stretch image. Please reopen if you still face any issues.

Which stretch image are you using?

I'm using kope.io/k8s-1.8-debian-stretch-amd64-hvm-ebs-2018-02-08 with m5.xlarge instances.
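
For anyone hitting this, a sketch of that image swap, assuming the instance group and cluster names from this report:

# Edit the instance group and change the image line, e.g. to
#   image: kope.io/k8s-1.8-debian-stretch-amd64-hvm-ebs-2018-02-08
kops edit ig spot-nodes --name example.com

# Apply the change and roll the instances onto the new image
kops update cluster --name example.com --yes
kops rolling-update cluster --name example.com --instance-group spot-nodes --yes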

This is happening for me as well.

Is there any resolution to this? I'm having this issue too.

@cheynewallace check the previous comments. It works with debian-stretch images.

I found this: https://github.com/kubernetes/kops/blob/master/channels/stable

Does this mean I need to be on kubernetesVersion: ">=1.11.0" before switching to the image that supports NVMe (stretch)?

How did you find the compatible image name? Not sure I'm looking in the right place.

@richstokes For AWS, I found a complete list here: https://eu-central-1.console.aws.amazon.com/ec2/v2/home?region=eu-central-1#Images:visibility=public-images;ownerAlias=383156758163;sort=name
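
The same list can also be queried from the AWS CLI; a sketch, assuming a configured CLI (383156758163 is the kope.io owner ID from the console link above):

# List public kope.io Debian stretch images for Kubernetes 1.8, oldest first
aws ec2 describe-images \
  --owners 383156758163 \
  --filters "Name=name,Values=k8s-1.8-debian-stretch*" \
  --query 'sort_by(Images, &CreationDate)[].Name' \
  --output text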
