We've had a number of problems with ephemeral storage on EC2, not least that newer instance types don't include them (e.g. kubernetes/kubernetes#23787). Also symlinking /mnt/ephemeral seems to confuse the garbage collector.
We should figure out how to ensure that we have a big enough root disk, maybe how to re-enable btrfs, and then if there is anything we can do with the instance storage if we're otherwise not going to use it (maybe hostVolumes? Or some sort of caching service?)
Are we doing aufs or btrfs?
For the docker instances we're doing aufs or overlay. We should revisit that as other approaches get more testing.
For using instance storage, we should use whatever is appropriate for whatever we decide to use it for :-)
Then what should we use?
Is this still on the roadmap?
Adding a +1 on the need for exposing instance storage - the new AWS i3 instances bench at 18GB/sec+ on instance storage (NVMe-based), which is substantially higher than EBS.
We would also like to expose the instance storage, likewise for i3 instances. I'm not sure I agree that AWS is moving away from instance storage -- they are just moving it to a new style of instance.
Not only do the i3s have amazing iops performance, the d2 instance class has the most cost-efficient storage available on AWS... 6TB for $150 a month is almost as cheap as s3.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Prevent issues from auto-closing with an /lifecycle frozen comment.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale
+1 for supporting instance storage. The i3's performance is great.
/lifecycle frozen
/remove-lifecycle stale
I realise it's probably implicit, but for reliability, having the kubelet working space (logs, tmp) on a different volume from the pod storage and the container writable layers is really important. Whatever is done here, please do preserve that (at least as an option).
Would _really_ like to see this incorporated into kops sooner rather than later, especially now that local PVs are beta in k8s 1.10.
Are there any workarounds for using the instance storage for pods that would benefit from the extra speed of the storage-optimized instances?
Would love to see this issue get some love. The NVMe instance storage on i3 instances is so fast and useful. I think that many of us would see a big jump in the utility of our instances if this was available for emptyDir and Docker image storage.
I'm interested in helping out if someone could get me pointed in the right direction. I'm pretty new to the kops codebase.
@justinsb asked on Slack for a use case for this issue, so here's mine:
We are doing CI on Kubernetes, running our software builds in pods that leverage emptyDir scratch directories for code fetches and compiles. It's very I/O intensive, so we chose i3.large instances. Unfortunately, without access to the NVMe disk, these builds are slow as molasses. Without NVMe access, there's no reason to use i3 instances with kops/Kubernetes.
We really need these volumes and I'm willing to take a stab at implementing this but I need someone to point me in the right direction because I'm not very familiar with the kops codebase.
Thanks.
Another use case:
We have a Kafka cluster running on kubernetes. Kafka takes care of data replication. We stream large amounts of data onto this kafka cluster. The bottleneck is disk bandwidth.
--> We want i3 instances with NVMe to maximize our performance.
Our use case is similar to Hermain's in that we are running a pod-based Cassandra cluster and also want to maximize disk performance by using the locally attached storage rather than ebs volumes.
So, would /var/lib/kubelet/pods be the place to mount this drive? or /var/lib, so that it hosts both /var/lib/docker and /var/lib/kubelet/pods (where emptyDir volumes live).
My use case is just removing EBS from my cluster -- I'd like to just use instance store as the root filesystem since there is no need for EBS, nothing useful is persisted only there.
Excited bandwagon joiner here: I'm running Spark on Kubernetes and am trying to remove my swapfile read/write as a bottleneck with the new r5d instances. I see that support for the instance type was added to the master branch in a fairly cursory way, but unless I'm missing something it doesn't seem to affect anything other than performing the block mapping. I'd really love to see these disks get mounted in a way that my pods will automatically use them as scratch disk, and ideally I'd also like to get those disks set up in RAID 0.
@chrissnell I think /var/lib/kubelet/pods would do what we need, since any application running in Kubernetes can take advantage of it. What reasons are there to mount /var/lib instead? I understand @scopej, who doesn't want any EBS at all. But if you have an EBS volume, you might as well use it for everything except /var/lib/kubelet/pods, right?
Joining the chorus here.
Running stateless apps on a Kubernetes cluster. We have no need to store state in said apps. For the ones where we do need it (Grafana, Prometheus, etc.), we're using StatefulSets to do the trick. That pretty much renders our need for EBS close to zero.
Our cluster today runs on Container Linux by CoreOS 1800.6.0 (Rhyolite), but we would gladly change it to Debian stretch if support comes for it first.
In addition to the kubelet reliability issue https://github.com/kubernetes/kops/issues/429#issuecomment-378774435 which accounts for maybe 10% of our node failures, there is a second use case that hasn't been presented thus far, which is a variation on the local-storage-fast use case.
And that's running glusterfs/ceph on the kubelet nodes.
We're considering running such a thing, and I'm speculating as we haven't got into deep design yet, but something like:
- EBS for the data volume on a given storage node
- local NVMe for a write-through cache and possibly a hot object read cache (in the event that local NVMe storage size exceeds main memory)
Why would we do this? Why does it make sense given EBS's brilliant performance?
Availability: EBS volumes in EC2 fail - we've had multiple outages for stateful singleton components due to EC2 hypervisor failures, just in the last year. (When a hypervisor fails, the EBS volumes on it cannot be remounted elsewhere for some time - we've seen 45 minutes) - my understanding is that EBS needs to fence the hypervisor to be sure no writes will be submitted from the hypervisor's EBS driver back to the EBS volumes assigned to it, and until that fencing is confirmed, the EBS volume cannot be reattached elsewhere, even with the force feature in the EBS API.
As such, being able to tolerate such failures means either retooling these singletons (which is sometimes a very big job :)) or having a storage driver that doesn't require the hypervisor to be fenced.
AWS NFS is also an option there of course, and yes, we're reviewing it :)
Hoping that @justinsb can chime in here and point me and the others in the right direction. This issue is super critical for us and is blocking a big project; I want to give it a shot but I'm not sure where to start in the codebase.
So, just jumping back in here with what I talked about in office hours earlier. At least for the r5d.4xlarge instances, adding this to the top of my user_data file works:
```
sudo apt-get -y install mdadm
sudo mdadm --create --verbose /dev/md0 --level=0 --name=empty_dir --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1
sudo mkfs.ext4 -L empty_dir /dev/md0
sudo mkdir /var/lib/docker/overlay
sudo mount LABEL=empty_dir /var/lib/docker/overlay
```
A few notes here: you need to map the ephemeral volumes correctly for your instance to pick them up. In Terraform, that looks like this:
```
resource "aws_launch_configuration" "default-spark-cluster" {
  name_prefix                 = "default.spark.cluster-"
  image_id                    = "ami-050a5ee88521c50e4"
  instance_type               = "r5d.4xlarge"
  key_name                    = "${aws_key_pair.kubernetes-spark-cluster-4e533df7fa5cd8b4500a7bb98d719b11.id}"
  iam_instance_profile        = "${aws_iam_instance_profile.nodes-spark-cluster.id}"
  security_groups             = ["${aws_security_group.nodes-spark-cluster.id}"]
  associate_public_ip_address = false
  user_data                   = "${file("${path.module}/data/aws_launch_configuration_default.spark.cluster_user_data")}"

  ephemeral_block_device {
    virtual_name = "ephemeral0"
    device_name  = "/dev/sdb"
  }

  ephemeral_block_device {
    virtual_name = "ephemeral1"
    device_name  = "/dev/sdc"
  }

  ephemeral_block_device {
    virtual_name = "ephemeral2"
    device_name  = "/dev/sdd"
  }

  lifecycle = {
    create_before_destroy = true
  }

  enable_monitoring = false
}
```
Where the first ephemeral block device is the 8 GB root device and the second two are ~280 GB drives, all NVMe. This is working with the latest k8s-1.8 stretch image, k8s-1.8-debian-stretch-amd64-hvm-ebs-2018-08-17 (ami-050a5ee88521c50e4), because in stretch the default debconf frontend is some form of non-interactive that accepts -y.
This actually isn't even ideal, though. I'd love for the large block devices to be mounted as the root volume, but it's unclear to me how to do that; I'm not sure if it would have to be baked into a custom AMI. Additionally, I couldn't mount it any lower than /var/lib/docker/overlay (I would've been happier with /var), but I wasn't able to work through the issues of blowing up necessary files and directories on mounting (I tried moving them before the mount and moving them back afterwards, but couldn't get that to work consistently either; I may have run into a race condition).
Additionally, this obviously would need to be abstracted to support machines with different numbers of ephemeral disks, and a more general solution to catch all the desired storage would be good (I figured out that my Spark jobs weren't using /var/lib/kubelet/pods as storage; everything was in /var/lib/docker/overlay).
cc @justinsb
I'm still pretty new to kops development so I'm hoping that someone can set me straight here.
It appears that the proper place to do the management of the devices is in nodeup, specifically the AWS-specific code here: https://github.com/kubernetes/kops/tree/master/upup/pkg/fi/cloudup/awsup
The instance types and their ephemeral storage (if any) are defined here: https://github.com/kubernetes/kops/blob/master/upup/pkg/fi/cloudup/awsup/machine_types.go
It feels like nodeup should detect the presence of ephemeral disks and issue the mkfs.ext4 commands on the volumes. I'm not so sure how they would be mounted. Would this be defined in the kops InstanceGroup resource spec? Perhaps we could have something like this:
```
apiVersion: kops/v1alpha2
kind: InstanceGroup
[...]
spec:
  machineType: m5d.4xlarge
  ephemeralDisks:
  - name: ephemeral0
    mountPoint: /var/lib
  - name: ephemeral1
    mountPoint: /scratch
```
Other thoughts....
The use of these ephemeral disks for system directories like /var/lib is super tricky and the implementation varies from system to system, usually involving use of chroot. It's my opinion that this should be out of scope for the initial implementation of ephemeral functionality. Using /var/lib/kubelet/pods or /var/lib/docker, however, should be supported. For /var/lib/kubelet, the mount would have to happen before kubelet is started. For /var/lib/docker, we should probably be stopping docker before the mount and starting it again afterwards, especially for systems that enable docker by default (CoreOS).
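To make that ordering concrete, here's a minimal sketch of what nodeup (or a hook) would effectively need to run; the device name, label, and single-disk layout are assumptions for illustration, not what nodeup actually does today:
```
# Illustrative ordering only; /dev/nvme1n1 and the label are assumed, not detected.
systemctl stop docker                    # nothing may be writing to /var/lib/docker during the switch
mkfs.ext4 -L ephemeral0 /dev/nvme1n1     # format the ephemeral disk (destroys any existing contents)
mkdir -p /var/lib/docker
mount LABEL=ephemeral0 /var/lib/docker   # the mount must be in place before docker starts again
systemctl start docker
# For /var/lib/kubelet the equivalent mount simply has to happen before kubelet.service starts.
```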
I think that software RAID like @thejosephstevens is doing should be out of scope for the initial implementation. I think that having this capability is a great idea but significantly complicates the first implementation.
I'm also wondering how to set up Ceph in my kops-managed k8s cluster. It seems like these i3 instances would be great, but I wouldn't want kops to format them for me, since Ceph would want to take over the device itself.
Also, I've seen that MongoDB recommends XFS as the underlying filesystem rather than ext4.
So, I think it would probably be something that should be configurable. Ideally, instead of having any default/automatic behavior here, we'd add a configuration section to the instance group that specifies what to do with extra volumes, e.g. whether or not to format them, what filesystem to use if so, and what path to mount them at, if any. On startup the instance would examine this configuration and format/mount the disks as specified.
At this point, though, perhaps instead of actual new configuration options, a simpler solution might just be to add some examples to the docs showing how to use hooks to set up these volumes to your taste; you just have to run a few commands to format and mount the disks at the location of your choice, right?
I think volume setup shouldn't be a part of kops unless bringing the node into the cluster actually requires the setup part. If you want to use ephemeral storage for the docker directory then that should be part of kops. But for using the ephemeral storage as a ceph node, it should not be in kops. For applications like ceph or mongo, you should probably just run a daemonset which mounts a hostPath and formats it directly, then exposes it. It's a more generic and higher level way to configure your hosts.
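For the ceph/mongo style of consumer, a rough sketch of that DaemonSet pattern might look like the following; the device path, image, and "format only if empty" behaviour are assumptions of mine, not something kops would provide:
```
# Hypothetical sketch: a privileged DaemonSet that claims the raw ephemeral device on each node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ephemeral-disk-prepare
spec:
  selector:
    matchLabels:
      app: ephemeral-disk-prepare
  template:
    metadata:
      labels:
        app: ephemeral-disk-prepare
    spec:
      containers:
      - name: prepare
        image: alpine:3.9                # placeholder image
        securityContext:
          privileged: true               # required to format the raw block device
        command:
        - sh
        - -c
        # Format only if no filesystem is detected, then stay up so the DaemonSet keeps running.
        - "apk add --no-cache e2fsprogs && (blkid /dev/nvme1n1 || mkfs.ext4 /dev/nvme1n1) && tail -f /dev/null"
        volumeMounts:
        - name: dev
          mountPath: /dev                # expose the host's device nodes to the container
      volumes:
      - name: dev
        hostPath:
          path: /dev
```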
As a workaround, we used additionalUserData in the IG spec to instruct cloud-init to place the ephemeral node storage of a c3.large instance on a given path, as in this example:
```
spec:
  additionalUserData:
  - name: local-storage.txt
    type: text/cloud-config
    content: |
      #cloud-config
      mounts:
      - [ xvdc, /var/local/mnt/lv00, "auto", "defaults,nofail", "0", "0" ]
      - [ xvdd, /var/local/mnt/lv01, "auto", "defaults,nofail", "0", "0" ]
```
Then we used the local storage provisioner (https://github.com/kubernetes-incubator/external-storage/tree/master/local-volume), specifying the parent /var/local/mnt path as the discovery directory, making the ephemeral storage available to pods:
```
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS    REASON   AGE
local-pv-1526796f   14Gi       RWO            Delete           Available           local-storage            1m
local-pv-30334a4a   14Gi       RWO            Delete           Available           local-storage            1m
local-pv-38820a0b   14Gi       RWO            Delete           Available           local-storage            1m
local-pv-73d1578a   14Gi       RWO            Delete           Available           local-storage            1m
...
```
Although the Node still gets an EBS volume as its root from kops, at least the fast local storage can be used for I/O intensive workloads, satisfying our use-case.
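For reference, the consuming side of that setup is just a no-provisioner StorageClass plus ordinary PVCs. A minimal sketch, assuming the local-storage class name shown above (the claim name and size are illustrative):
```
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # local PVs are pre-created by the provisioner daemon, not dynamically
volumeBindingMode: WaitForFirstConsumer     # delay binding until the pod is scheduled onto a node
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: scratch-claim                       # illustrative name
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 14Gi                         # matches the PVs listed above
```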
Just a note on the complications around software RAID that I ran into: it was fairly trivial on the newest k8s stretch AMIs. I did run into issues on the jessie images because of a debconf setting calling for UI interaction in post-install hooks, which meant mdadm splashed a blue config screen the first time I tried this manually. I was actually unable to successfully change this configuration in bootstrap prior to installing mdadm (although I'm sure I was missing something). Outside of unusual install hooks, though, if your node is fully ephemeral and the storage is fully ephemeral, you don't need to consider your reboot configuration settings, which was the only other option there seemed to be. There's obviously testing to be done on how this would operate consistently across various OSes (I didn't need to solve for Ubuntu, CoreOS, Amazon Linux, etc.), but the process itself was pretty trivial. I haven't run into any cases where the RAIDing process has failed. I have been using this for about a month and a half now and it really just works.
@thejosephstevens did you try setting DEBIAN_FRONTEND=noninteractive in your test?
I've also had to set UCF_FORCE_CONFFNEW=YES, and at least once had to set a zillion apt-get flags to say really, really, NO REALLY, use non-interactive setup. These were on an Ubuntu base, so YMMV.
```
apt-get --no-install-recommends --fix-broken --fix-missing --assume-yes --auto-remove --quiet -o DPkg::options::="--force-confdef" -o DPkg::options::="--force-confnew" install ...
```
Yeah, tried that to no success. It ended up being a non-issue though once I moved to the most recent kops-1.10 stretch image (although I normally wouldn't advocate changing AMIs just to get different OS default settings). Caveat to my earlier posts though, software raid in one of my environments started freaking out (I ran into the md127 bug), so I ended up de-RAIDing my worker nodes. Without drilling further into that bug (not a current priority for me), I can't recommend my RAID setup from above. The non-RAIDed local drives are still working great though, and I'd be perfectly happy if kops built support for a mapping of local drives to directory paths and a file-system choice (or just default ext4). I think the main trick there is navigating the bootstrap priority so you don't get any races and blow out system data anywhere.
The suggestions in this post worked for me (the md127 bug): create an array entry in /etc/mdadm/mdadm.conf and run update-initramfs -u.
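In shell form, that fix is roughly the following (assuming the array is /dev/md0, as in the earlier snippets):
```
# Persist the array definition so it keeps its name across reboots instead of coming back as /dev/md127.
mdadm --detail --brief /dev/md0 >> /etc/mdadm/mdadm.conf
update-initramfs -u
```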
This is what I'm using; not sure it's the most elegant way, but it's working:
```
spec:
  additionalUserData:
  - content: |
      #cloud-config
      repo_update: true
      packages:
      - mdadm
      runcmd:
      - sudo mdadm --create --verbose /dev/md0 --level=0 --name=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
      - sudo mkfs.ext4 -L 0 /dev/md0
      - sudo mkdir /data-1
      - sudo mount LABEL=0 /data-1
      - [ sudo, sh, -c, 'mdadm -Db /dev/md0 >> /etc/mdadm/mdadm.conf' ]
      - echo "ARRAY /dev/md0 $(grep -oE 'UUID=[0-9a-z]+:[0-9a-z]+:[0-9a-z]+:[0-9a-z]+' /etc/mdadm/mdadm.conf)" > /tmp/uuid
      - [ sudo, sh, -c, "echo $(cat /tmp/uuid) >> /etc/mdadm/mdadm.conf" ]
      - [ sudo, sh, -c, "sed '/name/d' /etc/mdadm/mdadm.conf > /tmp/uuid" ]
      - [ sudo, sh, -c, "cat /tmp/uuid | tee /etc/mdadm/mdadm.conf > /dev/null" ]
      - sudo update-initramfs -u
    name: local-storage.txt
    type: text/cloud-config
```
I used ideas from this thread to get it working. This is for a single-volume NVMe drive as found on an AWS EC2 m5d.xlarge instance:
```
apiVersion: kops/v1alpha2
kind: InstanceGroup
spec:
  additionalUserData:
  - name: 00-prep-local-storage.sh
    type: text/x-shellscript
    content: |
      #!/bin/sh
      /sbin/mkfs.ext4 /dev/nvme1n1
  - name: 02-mount-disks.sh
    type: text/x-shellscript
    content: |
      #!/bin/sh
      mkdir /scratch
      /bin/mount /dev/nvme1n1 /scratch
      mkdir /scratch/pods
      mkdir /scratch/docker
      mkdir /var/lib/kubelet
      systemctl stop docker
      rm -rf /var/lib/docker
      ln -s /scratch/pods /var/lib/kubelet/
      ln -s /scratch/docker /var/lib/
      systemctl start docker
```
The downside of this approach is that mkfs(8) is slow and adds a considerable amount of time to instance launch, at least 3-4 minutes.
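If the time is going into the full-device discard pass that mkfs.ext4 runs by default on SSD/NVMe devices, skipping it might help; this is an untested guess on my part, not something verified on these instances:
```
# -E nodiscard skips the initial TRIM/discard of the whole device, which can dominate format time on large NVMe drives.
/sbin/mkfs.ext4 -E nodiscard /dev/nvme1n1
```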
FWIW, we've moved to a systemd-based solution now to avoid messing with docker and kube's storage after they start running, just got it running today.
```
hooks:
- name: volume-mount
  roles:
  - Node
  before:
  - kubelet.service
  - docker.service
  - logrotate.timer
  - docker-healthcheck.timer
  - kubernetes-iptables-setup.service
  - docker-healthcheck.service
  - logrotate.service
  manifest: |
    User=root
    Type=oneshot
    ExecStartPre=/bin/bash -c 'mkfs.ext4 -L docker_dir /dev/nvme1n1'
    ExecStartPre=/bin/bash -c 'mkdir -p /var/lib/docker/overlay'
    ExecStartPre=/bin/bash -c 'mount LABEL=docker_dir /var/lib/docker/overlay'
    ExecStartPre=/bin/bash -c 'mkfs.ext4 -L empty_dir /dev/nvme2n1'
    ExecStartPre=/bin/bash -c 'mkdir -p /var/lib/kubelet/pods'
    ExecStart=/bin/bash -c 'mount LABEL=empty_dir /var/lib/kubelet/pods'
```
There's absolutely more work to be done on this. I'd like better conditionality so it could just be applied to all nodes without resulting in a failing systemd unit on differently configured machines, but I think there may be a model here to extend that doesn't require as much finagling around processes that depend on the potential mount points. I'm pretty sure this doesn't handle system restart, though, so I wouldn't buy it wholesale.
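One possible way to get that conditionality without anything kops-specific would be to collapse the commands into a single guarded script, so the unit exits cleanly on nodes that don't have the devices. A sketch that keeps the name/roles/before stanza from the hook above and only changes the manifest, reusing the same device names and labels (not verified on every image):
```
  manifest: |
    User=root
    Type=oneshot
    # Exit 0 (rather than fail) when the ephemeral devices are absent, so the same hook can ship to every node.
    ExecStart=/bin/bash -c 'set -e; \
      [ -b /dev/nvme1n1 ] && [ -b /dev/nvme2n1 ] || exit 0; \
      mkfs.ext4 -L docker_dir /dev/nvme1n1; \
      mkdir -p /var/lib/docker/overlay; \
      mount LABEL=docker_dir /var/lib/docker/overlay; \
      mkfs.ext4 -L empty_dir /dev/nvme2n1; \
      mkdir -p /var/lib/kubelet/pods; \
      mount LABEL=empty_dir /var/lib/kubelet/pods'
```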
I am using a user data solution to mount /var/lib/docker on a c5d ephemeral volume, like what is posted above (thank you @chrissnell). I skipped mounting /var/lib/kubelet/pods because kubelet cannot delete the container directories (cannot delete directory /var/lib/kubelet/pods/ed35ba56-1595-11e9-a73a-0210bcc711f2: it is a mount point). Docker is running fine on the ephemeral volume and the node is in service; however, persistent volume claims are not mounting:
```
Warning FailedMount 1m (x4 over 8m) kubelet, ip-172-24-105-243.us-west-2.compute.internal Unable to mount volumes for pod "volume-writer-7758ddcbcf-2554f_pvc-test(ba9f0953-1836-11e9-aeff-061c5ed62f3e)": timeout expired waiting for volumes to attach or mount for pod "pvc-test"/"volume-writer-7758ddcbcf-2554f". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-7nhh6]
```
PVCs work on c5-type instances. I thought it was because of the device name mismatch between the AWS API and the Linux instance:
```
# /var/log/daemon.log
Jan 14 21:09:35 ip-172-24-105-243 kubelet[1869]: I0114 21:09:35.719693 1869 operation_generator.go:486] MountVolume.WaitForAttach entering for volume "pvc-ba8767a4-1836-11e9-8700-028bb95a9f1c" (UniqueName: "kubernetes.io/aws-ebs/aws://us-west-2c/vol-00744073e67ea96fd") pod "volume-writer-7758ddcbcf-2554f" (UID: "ba9f0953-1836-11e9-aeff-061c5ed62f3e") DevicePath "/dev/xvdbc"

root@ip-172-24-105-243:/home/admin# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme0n1     259:1    0   128G  0 disk
└─nvme0n1p1 259:2    0     8G  0 part /
nvme1n1     259:0    0 186.3G  0 disk /scratch
nvme2n1     259:3    0     1G  0 disk
```
But devices have the nvme names on c5s too, and PVCs work there. Not sure what's going on with that.
This is k8s version 1.10.7
Update: PVCs are working on c5d's in another cluster where I'm running a newer kops Debian AMI.
PVCs work on c5d with AMI kope.io/k8s-1.11-debian-stretch-amd64-hvm-ebs-2018-08-17,
but not with kope.io/k8s-1.10-debian-stretch-amd64-hvm-ebs-2018-08-17.
It seems mkfs.ext4 hangs on the default image for kops 1.12.1. It might be related to https://github.com/bargees/barge-os/issues/76, but I'm not quite sure.
Is there any plan to address this?
Is there a good step-by-step guide available for using NVMe as PVCs in pods?
I've read the discussion 5 times and I'm still unsure what to do.
If you're talking about the ephemeral on-host disks like in AWS, I wouldn't recommend them for anything other than scratch disk. The way I did it in my example above was to mount the disks at the paths in the OS that Docker uses for basic container storage (/var/lib/docker/overlay) and for emptyDir volumes (/var/lib/kubelet/pods). Once you have storage mapped at those locations, you can access it by adding this to your deploy template:
```
volumeMounts:
- name: scratch
  mountPath: /tmp
volumes:
- name: scratch
  emptyDir: {}
```
Just be aware that all the contents of these disks will be lost if you lose the machine, so don't use them for anything you want to persist (Prometheus metrics, logs, whatever).
Given my experiences with managing these disks in AWS, it's not clear to me that it was at all worth the effort. We spent a good amount of time debugging issues at runtime (see my mention of md127 further up) and just debugging the basic bootstrap process and I'm not sure that we got a meaningful performance bump on read/write speeds (our tests of disk perf seemed to indicate that these disks did not perform up to the spec of NVMe drives we were familiar with).
Hi,
we are using the NVMe drive provided by AWS with some instances. For now I use the following kops hook to mount the NVMe and to assign pods and containers onto it:
```
hooks:
- name: nvme
  roles:
  - Node
  before:
  - kubelet.service
  - docker.service
  - logrotate.timer
  - docker-healthcheck.timer
  - kubernetes-iptables-setup.service
  - docker-healthcheck.service
  - logrotate.service
  manifest: |
    User=root
    Type=oneshot
    ExecStartPre=/bin/bash -c 'mkfs -t xfs /dev/nvme1n1'
    ExecStartPre=/bin/bash -c 'mkdir /scratch'
    ExecStartPre=/bin/bash -c 'mount /dev/nvme1n1 /scratch'
    ExecStartPre=/bin/bash -c 'mkdir /scratch/pods'
    ExecStartPre=/bin/bash -c 'mkdir /scratch/docker'
    ExecStartPre=/bin/bash -c 'rm -rf /var/lib/docker'
    ExecStartPre=/bin/bash -c 'ln -s /scratch/pods /var/lib/kubelet/'
    ExecStart=/bin/bash -c 'ln -s /scratch/docker /var/lib/'
```
This does work, and we saw an improvement in our performance thanks to this local NVMe.
Anyway, we are now facing another issue regarding disk pressure.
Indeed, kubelet looks for disk space on its own filesystem (see https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/), so it looks for space in / and not in /scratch (the NVMe mount point).
Basically, if the / partition goes over 90% disk usage, then all pods are evicted from the node even if the NVMe partition still has a lot of space.
Does anyone know if it's possible to fully move kubelet and Docker onto the NVMe, to avoid kubelet polling disk space from /?
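One approach that should address this, though I haven't verified it end to end: mount the NVMe filesystem directly at the directories kubelet and Docker use as their roots, rather than symlinking subdirectories into /scratch, since the eviction manager's nodefs/imagefs checks look at the filesystem backing /var/lib/kubelet and the Docker data root. A rough sketch, with both daemons stopped first and the same device as in the hook above:
```
# Sketch: back kubelet's nodefs and docker's imagefs with the NVMe device itself.
systemctl stop kubelet docker
mkfs -t xfs /dev/nvme1n1
mkdir -p /scratch
mount /dev/nvme1n1 /scratch
mkdir -p /scratch/kubelet /scratch/docker /var/lib/kubelet /var/lib/docker
mount --bind /scratch/kubelet /var/lib/kubelet   # real mounts, not symlinks, so statfs() reports the NVMe filesystem
mount --bind /scratch/docker /var/lib/docker
systemctl start docker kubelet
```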
I have some containers that need fast temporary storage (around 100 GB). We were using gp2-type AWS EBS volumes; however, they would quickly run out of burst balance. Local instance storage seemed like the perfect replacement, as it would reduce the spend on slow EBS volumes and provide fast temporary storage. However, I quickly found that Kubernetes doesn't seem to have quite implemented a way to use the local instance storage yet.
I wanted to use emptyDir volumes on my containers that needed fast local temporary storage so I tried moving /var/lib/kubelet to local instance storage by specifying it in the instance group configuration:
```
volumeMounts:
- device: /dev/nvme1n1
  filesystem: ext4
  path: /var/lib/kubelet
```
However, like previous posters have mentioned, I started seeing issues with disk pressure and pods being evicted even though the local instance storage had only used 35% of its capacity.
Instead, we have now switched to using a hostPath volume with an initContainer to set the correct permissions on the host directory. Our kops instance group now looks like this:
```
volumeMounts:
- device: /dev/nvme1n1
  filesystem: ext4
  path: /mnt/localssd
```
Relevant container configuration:
```
initContainers:
- name: fix-tmp-perms
  image: busybox
  securityContext:
    runAsUser: 0
  command: ["sh", "-c", "chown -R 201:201 /tmp/worker-temp; chmod 1777 /tmp/worker-temp; rm -rf /tmp/worker-temp/*"]
  volumeMounts:
  - name: worker-temp
    mountPath: /tmp/worker-temp
volumes:
- name: worker-temp
  hostPath:
    path: /mnt/localssd/worker-temp
    type: DirectoryOrCreate
```
What would be nice is to be able to specify in kops that the root volume should use the local instance storage rather than having to be backed by EBS. I think this makes sense, as the EBS volume is only used for temporary storage and is deleted when the instance is deleted.
> However, like previous posters have mentioned, I started seeing issues with disk pressure and pods being evicted even though the local instance storage had only used 35% of its capacity.
@kxesd most likely you will have to wait for Kubernetes & kops 1.19. The root cause is a bug in cAdvisor that was fixed only recently; it made Kubernetes incorrectly detect the imagefs partition and, with it, report usage of the wrong partition.
For more info, check https://github.com/google/cadvisor/pull/2586.