Kops: Need to increase root partition size

Created on 4 Jul 2018 · 14 comments · Source: kubernetes/kops

Thanks for submitting an issue! Please fill in as much of the template below as
you can.

------------- BUG REPORT TEMPLATE --------------------

  1. What kops version are you running? The command kops version will display
    this information.
    Version 1.8.0 (git-5099bc5)

  2. What Kubernetes version are you running? kubectl version will print the
    version if a cluster is running or provide the Kubernetes version specified as
    a kops flag.
    Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.1", GitCommit:"f38e43b221d08850172a9a4ea785a86a3ffa3b3a", GitTreeState:"clean", BuildDate:"2017-10-11T23:16:41Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

  3. What cloud provider are you using?
    AWS

  4. What commands did you run? What is the simplest way to reproduce this issue?
    kops create cluster
  5. What happened after the commands executed?
    I can see that the root ("/") partition size of the node (not the EBS volume) is only 8 GB, while the volume size is 128 GB. How do I grow the partition? I have live traffic on this node (a restart is fine, but no data should be lost).
  6. What did you expect to happen?
    I want a 100+ GB root partition after a kops rolling update or upgrade, without losing data (a restart is fine, but pods, ConfigMaps, Secrets, etc. must be retained).
  7. Please provide your cluster manifest. Execute
    kops get --name my.example.com -o yaml to display your cluster manifest.
    You may want to remove your cluster name and other sensitive information.

apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-06-12T07:38:11Z
  name: prod.example.com
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    alwaysAllow: {}
  channel: stable
  cloudLabels:
    Environment: Prod
    Provisioner: kops
    Role: node
    Type: k8s
  cloudProvider: aws
  configBase: s3://k8s-example-clusters/prod.example.com
  dnsZone: example.com
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-ap-south-1a
      name: a
    name: main
  - etcdMembers:
    - instanceGroup: master-ap-south-1a
      name: a
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.8.1
  masterPublicName: api.prod.example.com
  networkCIDR: 172.20.0.0/16
  networking:
    calico: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  - cidr: 172.10.32.0/19
    name: ap-south-1a
    type: Private
    zone: ap-south-1a
  - cidr: 172.10.64.0/19
    name: ap-south-1b
    type: Private
    zone: ap-south-1b
  - cidr: 172.10.0.0/22
    name: utility-ap-south-1a
    type: Utility
    zone: ap-south-1a
  - cidr: 172.10.4.0/22
    name: utility-ap-south-1b
    type: Utility
    zone: ap-south-1b
  topology:
    bastion:
      bastionPublicName: bastion.prod.example.com
    dns:
      type: Public
    masters: private
    nodes: private

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-06-12T07:38:12Z
  labels:
    kops.k8s.io/cluster: prod.example.com
  name: bastions
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: t2.micro
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: bastions
  role: Bastion
  subnets:
  - utility-ap-south-1a
  - utility-ap-south-1b

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-06-12T07:38:11Z
  labels:
    kops.k8s.io/cluster: prod.example.com
  name: master-ap-south-1a
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: t2.small
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-ap-south-1a
  role: Master
  subnets:
  - ap-south-1a

---

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-06-12T07:38:12Z
  labels:
    kops.k8s.io/cluster: prod.example.com
  name: nodes
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: m5.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - ap-south-1a
  - ap-south-1b

  8. Please run the commands with the most verbose logging by adding the -v 10 flag.
    Paste the logs into this report, or into a gist and provide the gist link here.

  9. Anything else do we need to know?
    We deployed two clusters with the same script; one has a 120 GB / partition and the other only 8 GB. A quick way to check which state a node is in is sketched below.
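
On an affected node, the mismatch is easy to confirm (a hedged sketch; the nvme device naming applies to m5/r5/c5-style instances, while older families show xvda):

lsblk       # the block device (e.g. nvme0n1) shows the full EBS volume size; the partition (nvme0n1p1) shows 8G
df -h /     # the root filesystem matches the partition size, not the volume size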


Labels: lifecycle/rotten, scheduled-to-close

Most helpful comment

We managed to resize NVME root partition with this hook:

spec:
  hooks:
  - name: resize-nvme-rootfs
    roles:
    - Node
    manifest: |
      Type=oneshot
      ExecStart=/bin/sh -c 'test -b /dev/nvme0n1p1 && growpart-workaround /dev/nvme0n1 1 && resize2fs /dev/nvme0n1p1 || true'

All 14 comments

The corresponding docs for increasing the disk size are here:
https://github.com/kubernetes/kops/blob/master/docs/instance_groups.md#changing-the-root-volume-size-or-type
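
Per that page, the size is set on the InstanceGroup spec; a minimal sketch (rootVolumeSize is in GB; rootVolumeType is optional):

spec:
  machineType: m4.large
  rootVolumeSize: 128   # root EBS volume size in GB
  rootVolumeType: gp2   # optional volume type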

But the main problem behind your small disk size is your instance type: machineType m5.large currently has problems with disk size. This issue should be helpful for you: https://github.com/kubernetes/kops/issues/3991

I would recommend changing the instance type from m5.large to m4.large and doing a rolling update. Helpful docs: https://github.com/kubernetes/kops/blob/master/docs/instance_groups.md#change-the-instance-type-in-an-instance-group
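
A hedged sketch of that procedure, assuming the instance group is named nodes as in the manifest above:

kops edit ig nodes --name prod.example.com          # change machineType (and rootVolumeSize if desired)
kops update cluster prod.example.com --yes          # apply the configuration change
kops rolling-update cluster prod.example.com --yes  # replace instances so they pick up the new settings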

We managed to resize NVME root partition with this hook:

spec:
  hooks:
  - name: resize-nvme-rootfs
    roles:
    - Node
    manifest: |
      Type=oneshot
      ExecStart=/bin/sh -c 'test -b /dev/nvme0n1p1 && growpart-workaround /dev/nvme0n1 1 && resize2fs /dev/nvme0n1p1 || true'
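
kops installs each hook as a systemd unit on the node, so you can verify the hook actually ran (assuming the unit takes the hook's name):

systemctl status resize-nvme-rootfs.service
journalctl -u resize-nvme-rootfs.service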

@tsupertramp I also have problems with resizing the root volume, but with the t2 family.

kops version: Version 1.10.0 (git-3b783df3b) (forked by spotinst.com)
kubectl version:

Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:17:28Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:05:37Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

instance group manifest:

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-08-27T15:25:17Z
  labels:
    kops.k8s.io/cluster: frank***s.com
  name: es740_nodes
spec:
  image: kope.io/k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: t2.xlarge,t2.2xlarge
  maxSize: 5
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: ***_nodes
  role: Node
  rootVolumeSize: 150
  subnets:
  - eu-central-1a
  - eu-central-1b
  - eu-central-1c
  5. What happened after the commands executed?
    I can see that the root ("/") partition size of the node (not the EBS volume) is only 8 GB, while the volume size is 128 GB. How do I grow the partition? I have live traffic on this node (a restart is fine, but no data should be lost).

We have this exact same issue. We're trying to use r5 instances for our nodes and we end up having a root device of only 8GB.

Also, with those instances the device is called /dev/nvme..., not /dev/xvda... as it was with r4 instances, even if the instance's EBS volume is configured to be exposed as xvda in the AWS Console.

We had to go back to r4, which fixed the issue for now, until we can move to r5 instances again (a bit cheaper than r4 price-wise but much better in RAM and CPU).

Same here.

We managed to resize NVME root partition with this hook:

spec:
  hooks:
  - name: resize-nvme-rootfs
    roles:
    - Node
    manifest: |
      Type=oneshot
      ExecStart=/bin/sh -c 'test -b /dev/nvme0n1p1 && growpart-workaround /dev/nvme0n1 1 && resize2fs /dev/nvme0n1p1 || true'

@stanvit What AMI and instance type are you using? I am trying to launch an r5.4xlarge with AMI k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11 (ami-dbd611a6) and I can't seem to run the growpart command successfully. Here is the output I get:

FAILED: failed to get CHS from /dev/nvme0n1p1
root@ip-10-20-234-228:/home/admin# growpart /dev/nvme0n1 1
attempt to resize /dev/nvme0n1 failed. sfdisk output below:
|
| Disk /dev/nvme0n1: 16709 cylinders, 255 heads, 63 sectors/track
| Old situation:
| Units: cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
|
|    Device Boot Start     End   #cyls    #blocks   Id  System
| /dev/nvme0n1p1   *      0+   1044-   1045-   8386560   83  Linux
| /dev/nvme0n1p2          0       -       0          0    0  Empty
| /dev/nvme0n1p3          0       -       0          0    0  Empty
| /dev/nvme0n1p4          0       -       0          0    0  Empty
| New situation:
| Units: sectors of 512 bytes, counting from 0
|
|    Device Boot    Start       End   #sectors  Id  System
| /dev/nvme0n1p1   *      4096 268430084  268425989  83  Linux
| /dev/nvme0n1p2             0         -          0   0  Empty
| /dev/nvme0n1p3             0         -          0   0  Empty
| /dev/nvme0n1p4             0         -          0   0  Empty
| Successfully wrote the new partition table
|
| sfdisk: BLKRRPART: Device or resource busy
| sfdisk: The command to re-read the partition table failed.
| Run partprobe(8), kpartx(8) or reboot your system now,
| before using mkfs
| sfdisk: If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
| to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
| (See fdisk(8).)
| Re-reading the partition table ...
FAILED: failed to resize
***** WARNING: Resize failed, attempting to revert ******
Re-reading the partition table ...
sfdisk: BLKRRPART: Device or resource busy
sfdisk: The command to re-read the partition table failed.
Run partprobe(8), kpartx(8) or reboot your system now,
before using mkfs
***** Appears to have gone OK ****

If I run these commands:
/bin/sh -c 'test -b /dev/nvme0n1p1 && growpart-workaround /dev/nvme0n1 1 && resize2fs /dev/nvme0n1p1 || true'

I get:
NOCHANGE: partition 1 is size 268425989. it cannot be grown

@pric, we are basing our AMIs on the same base image, k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11, but encrypt them with KMS (that shouldn't affect the operations in any way, though). Our instance type is m5.large.

growpart never worked for us either, failing with the same error you just posted. The output from your growpart-workaround invocation suggests that the partition had already been resized earlier. What does fdisk -l show? Did you try running resize2fs /dev/nvme0n1p1?
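
For anyone stuck in the same NOCHANGE state (partition already grown, filesystem still small), the only remaining step should be the filesystem resize; a hedged sketch for an ext4 root:

lsblk /dev/nvme0n1         # nvme0n1p1 should already span the whole volume
df -h /                    # the filesystem still reports the old size
resize2fs /dev/nvme0n1p1   # grow ext4 online to fill the partition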

@stanvit thanks for the help. Actually I fell back on the stretch AMI and everything is properly sized now. According to Geojaz on this issue (https://github.com/kubernetes/kops/issues/3901), stretch is now safe to use.

I resized my nodes from t2.large to c5.2xlarge and had the same issue. @stanvit's solution worked perfectly. Thanks so much!

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Experienced the same issue with machineType t3.large and kops v1.8.
The workaround provided by @stanvit worked for me.

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
