Thanks for submitting an issue! Please fill in as much of the template below as
you can.
------------- BUG REPORT TEMPLATE --------------------
What kops version are you running? The command kops version will display
this information.
Version 1.8.0 (git-5099bc5)
What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.1", GitCommit:"f38e43b221d08850172a9a4ea785a86a3ffa3b3a", GitTreeState:"clean", BuildDate:"2017-10-11T23:16:41Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
What cloud provider are you using?
AWS
kops get --name my.example.com -o yaml to display your cluster manifest.
apiVersion: kops/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: 2018-06-12T07:38:11Z
  name: prod.example.com
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    alwaysAllow: {}
  channel: stable
  cloudLabels:
    Environment: Prod
    Provisioner: kops
    Role: node
    Type: k8s
  cloudProvider: aws
  configBase: s3://k8s-example-clusters/prod.example.com
  dnsZone: example.com
  etcdClusters:
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-06-12T07:38:12Z
  labels:
    kops.k8s.io/cluster: prod.example.com
  name: bastions
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: t2.micro
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: bastions
  role: Bastion
  subnets:
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-06-12T07:38:11Z
  labels:
    kops.k8s.io/cluster: prod.example.com
  name: master-ap-south-1a
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: t2.small
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-ap-south-1a
  role: Master
  subnets:
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-06-12T07:38:12Z
  labels:
    kops.k8s.io/cluster: prod.example.com
  name: nodes
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: m5.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
------------- FEATURE REQUEST TEMPLATE --------------------
Describe IN DETAIL the feature/behavior/change you would like to see.
Feel free to provide a design supporting your feature request.
The corresponding docs for increasing the root disk size are here:
https://github.com/kubernetes/kops/blob/master/docs/instance_groups.md#changing-the-root-volume-size-or-type
But the main cause of your small disk size is the instance type: machineType: m5.large currently has problems with the root volume size. This issue should be helpful for you: https://github.com/kubernetes/kops/issues/3991
I would recommend changing the instance type from m5.large to m4.large and doing a rolling update, as sketched below. Helpful docs: https://github.com/kubernetes/kops/blob/master/docs/instance_groups.md#change-the-instance-type-in-an-instance-group
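A rough sketch of that workflow, assuming your instance group is called nodes and using the cluster name from the manifest above (adjust both to your setup):
# switch the node instance group to a non-NVMe instance type, then roll the nodes
kops edit ig nodes --name prod.example.com
#   change spec.machineType from m5.large to m4.large
#   (spec.rootVolumeSize can also be set here, per the docs linked above)
kops update cluster prod.example.com --yes
kops rolling-update cluster prod.example.com --yes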
We managed to resize NVME root partition with this hook:
spec:
  hooks:
  - name: resize-nvme-rootfs
    roles:
    - Node
    manifest: |
      Type=oneshot
      ExecStart=/bin/sh -c 'test -b /dev/nvme0n1p1 && growpart-workaround /dev/nvme0n1 1 && resize2fs /dev/nvme0n1p1 || true'
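Roughly how we apply it, for reference (the instance group name here is just an example):
kops edit ig nodes                                         # paste the hook under the instance group spec
kops update cluster --yes
kops rolling-update cluster --instance-group nodes --yes   # only the affected nodes need to be replaced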
@tsupertramp I also have problems with resizing the root volume, but with the t2 family.
kops version: Version 1.10.0 (git-3b783df3b) (forked by spotinst.com)
kubectl version:
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:17:28Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:05:37Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
instance group manifest:
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-08-27T15:25:17Z
  labels:
    kops.k8s.io/cluster: frank***s.com
  name: es740_nodes
spec:
  image: kope.io/k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: t2.xlarge,t2.2xlarge
  maxSize: 5
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: ***_nodes
  role: Node
  rootVolumeSize: 150
  subnets:
  - eu-central-1a
  - eu-central-1b
  - eu-central-1c
- What happened after the commands executed?
I can see that the root ("/") partition size on the node is only 8GB (the partition, not the volume on AWS EBS), while the volume size of the node is 128GB. How do I increase the partition? I have live traffic on this node (a restart is acceptable, but data must not be lost).
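If I read the hook posted above correctly, the manual equivalent on the node would be roughly the following (untested on my side; the device names depend on the instance family, typically xvda/xvda1 on t2 and nvme0n1/nvme0n1p1 on m5/c5/r5, and growpart is usually packaged as cloud-guest-utils on Debian):
lsblk                 # confirm the disk and partition names and their sizes
df -h /               # filesystem size the node actually sees
growpart /dev/xvda 1  # grow partition 1 to fill the EBS volume
resize2fs /dev/xvda1  # grow the ext4 filesystem online; existing data is preserved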
We have this exact same issue. We're trying to use r5 instances for our nodes and we end up with a root partition of only 8GB.
Also, the device is now called /dev/nvme..., not /dev/xvda..., on those instances, unlike with r4 instances, even when the instance's EBS volume is configured to be exposed as xvda in the AWS console.
We had to go back to r4, which fixed the issue for now, until we can move to r5 instances again (a bit cheaper price-wise than r4 but much better in RAM and CPU).
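For what it's worth, the naming change is expected on Nitro-based instances (m5, c5, r5, t3): EBS volumes are attached through the NVMe driver, so inside the guest they show up as /dev/nvme* even when the console shows the attachment point as xvda. A quick way to inspect this on a node:
lsblk                     # root disk appears as nvme0n1 with partition nvme0n1p1
ls -l /dev/disk/by-id/    # the nvme symlink names normally include the EBS volume id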
Same here.
We managed to resize NVME root partition with this hook:
spec:
  hooks:
  - name: resize-nvme-rootfs
    roles:
    - Node
    manifest: |
      Type=oneshot
      ExecStart=/bin/sh -c 'test -b /dev/nvme0n1p1 && growpart-workaround /dev/nvme0n1 1 && resize2fs /dev/nvme0n1p1 || true'
@stanvit What AMI and instance type are you using? I am trying to launch an r5.4xlarge with AMI k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11 (ami-dbd611a6) and I can't seem to run the growpart command successfully. Here is the output I get:
FAILED: failed to get CHS from /dev/nvme0n1p1
root@ip-10-20-234-228:/home/admin# growpart /dev/nvme0n1 1
attempt to resize /dev/nvme0n1 failed. sfdisk output below:
|
| Disk /dev/nvme0n1: 16709 cylinders, 255 heads, 63 sectors/track
| Old situation:
| Units: cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
|
| Device Boot Start End #cyls #blocks Id System
| /dev/nvme0n1p1 * 0+ 1044- 1045- 8386560 83 Linux
| /dev/nvme0n1p2 0 - 0 0 0 Empty
| /dev/nvme0n1p3 0 - 0 0 0 Empty
| /dev/nvme0n1p4 0 - 0 0 0 Empty
| New situation:
| Units: sectors of 512 bytes, counting from 0
|
| Device Boot Start End #sectors Id System
| /dev/nvme0n1p1 * 4096 268430084 268425989 83 Linux
| /dev/nvme0n1p2 0 - 0 0 Empty
| /dev/nvme0n1p3 0 - 0 0 Empty
| /dev/nvme0n1p4 0 - 0 0 Empty
| Successfully wrote the new partition table
|
| sfdisk: BLKRRPART: Device or resource busy
| sfdisk: The command to re-read the partition table failed.
| Run partprobe(8), kpartx(8) or reboot your system now,
| before using mkfs
| sfdisk: If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
| to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
| (See fdisk(8).)
| Re-reading the partition table ...
FAILED: failed to resize
***** WARNING: Resize failed, attempting to revert ******
Re-reading the partition table ...
sfdisk: BLKRRPART: Device or resource busy
sfdisk: The command to re-read the partition table failed.
Run partprobe(8), kpartx(8) or reboot your system now,
before using mkfs
***** Appears to have gone OK ****
If I run this command:
/bin/sh -c 'test -b /dev/nvme0n1p1 && growpart-workaround /dev/nvme0n1 1 && resize2fs /dev/nvme0n1p1 || true'
I get:
NOCHANGE: partition 1 is size 268425989. it cannot be grown
@pric, we are basing our AMIs on the same base image, k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11, but encrypt them with KMS (that should not affect the operations in any way, though). Our instance type is m5.large.
growpart never worked for us either, failing with the same error as you just posted. The output from your growpart-workaround invocation suggests that the partition had already been resized earlier. What does your fdisk -l show? Did you try to run resize2fs /dev/nvme0n1p1?
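Concretely, what I would check on that node (a rough checklist, device names as in your output):
lsblk /dev/nvme0n1         # compare disk size vs partition size
fdisk -l /dev/nvme0n1      # partition table as the tools see it
df -h /                    # filesystem size actually in use
resize2fs /dev/nvme0n1p1   # if the partition is already full-size, this grows the ext4 filesystem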
@stanvit thanks for the help. Actually I fell back on the stretch AMI and everything is properly sized now. According to Geojaz on this issue (https://github.com/kubernetes/kops/issues/3901), stretch is now safe to use.
I resized my nodes from t2.large to c5.2xlarge and had the same issue. @stanvit's solution worked perfectly. Thanks so much!
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Experienced the same issue with machineType t3.large and kops v1.8
The workaround provided by @stanvit worked for me.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
@fejta-bot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.