đź‘‹
Unclear, as I am filing this issue on behalf of a user running Nomad on DigitalOcean infrastructure.
OS: most likely Linux
Environment: DigitalOcean VMs (aka Droplets)
A user of DigitalOcean’s CSI driver (which I happen to maintain) opened a support ticket with DigitalOcean saying that they could not use volumes managed by a self-hosted Nomad cluster running on DigitalOcean infrastructure. They used DigitalOcean’s CSI driver in version v1.3.0 and saw the following error when our NodePublishVolume implementation was invoked (pardon the output format -- we employ structured logging in our driver):
time="2020-06-24T16:59:54Z"
level=info
msg="mounting the volume"
host_id=<REDACTED>
method=node_publish_volume
mount_options="[bind]"
region=fra1
source_path=/dev/sda
staging_target_path=/csi/staging/redis/rw-block-device-single-node-writer
target_path=/csi/per-alloc/7dd9b6f1-1626-cbf1-a917-a505ec7af1b2/redis/rw-block-device-single-node-writer
version=latest
volume_id=<REDACTED>
volume_mode=block
time="2020-06-24T16:59:54Z"
level=error
msg="method failed"
error="rpc error: code = Internal desc = failed to create target file for raw block bind mount: open /csi/per-alloc/7dd9b6f1-1626-cbf1-a917-a505ec7af1b2/redis/rw-block-device-single-node-writer: is a directory"
host_id=197333792
method=/csi.v1.Node/NodePublishVolume
region=fra1
version=latest
The most important part is the error message:
rpc error: code = Internal desc = failed to create target file for raw block bind mount: open /csi/per-alloc/7dd9b6f1-1626-cbf1-a917-a505ec7af1b2/redis/rw-block-device-single-node-writer: is a directory
What is happening here is that the driver was asked to use a volume in raw block mode. The way this is implemented in our driver is that we create a file under the target path and afterwards bind-mount the device into it. The former step fails with Nomad as the CO (container orchestrator) because the target path already exists as a directory.
Here’s what the spec has to say about the target_path parameter on NodePublishVolume:
// The path to which the volume will be published. It MUST be an
// absolute path in the root filesystem of the process serving this
// request. The CO SHALL ensure uniqueness of target_path per volume.
// The CO SHALL ensure that the parent directory of this path exists
// and that the process serving the request has `read` and `write`
// permissions to that parent directory.
// For volumes with an access type of block, the SP SHALL place the
// block device at target_path.
// For volumes with an access type of mount, the SP SHALL place the
// mounted directory at target_path.
// Creation of target_path is the responsibility of the SP.
// This is a REQUIRED field.
string target_path = 4;
Note these specific parts of the description:
// The CO SHALL ensure that the parent directory of this path exists [...].
// Creation of target_path is the responsibility of the SP.
I reached out to the SIG Storage folks on the Kubernetes Slack (who are involved with the spec work), and they confirmed that the spec in this regard should be interpreted as _“the CO MUST NOT create the target_path”_. So it looks like Nomad might be doing too much here.
I briefly considered changing our CSI driver to address this scenario by deleting and recreating the target path. However, I felt somewhat hesitant about this move as it seems a bit invasive / disruptive to me and could mess with the expectations that the CO may have. Overall, I am more inclined to resolve the matter in the fashion that is most compliant with the spec.
That said, I'm more than happy to discuss options if you feel this should be tackled in any particular way.
Thanks!
Unfortunately I'm not familiar enough with Nomad to reproduce things on my own. My guess is that the problem should be reproducible though by creating a Nomad cluster on DigitalOcean infrastructure and trying to use a volume in block access mode.
Thanks for opening this @timoreimann !
I briefly considered changing our CSI driver to address this scenario by deleting and recreating the target path. However, I felt somewhat hesitant about this move as it seems a bit invasive / disruptive to me and could mess with the expectations that the CO may have. Overall, I am more inclined to resolve the matter in the fashion that is most compliant with the spec.
Totally agreed. Honestly, this makes life a little bit easier on us anyways as this has been a source of difficulty when handling errors. It's easy enough to remove, but I'll want to verify that the other common SPs that we test with are also doing the right thing per the spec as the DO provider does.
@tgross I am testing the OpenStack Cinder CSI plugin right now and just ran into this exact same issue.
E0707 21:20:45.750781 1 utils.go:83] GRPC error: rpc error: code = Internal desc = Error in making file open /csi/per-alloc/d4983594-027f-458f-f421-4442c3ddf318/gitlab_cinder/rw-block-device-single-node-writer: is a directory
W0707 21:20:47.890594 1 mount_helper_common.go:65] Warning: "/csi/per-alloc/d4983594-027f-458f-f421-4442c3ddf318/gitlab_cinder/rw-block-device-single-node-writer" is not a mountpoint, deleting
W0707 21:20:48.014727 1 mount_helper_common.go:33] Warning: Unmount skipped because path does not exist: /csi/staging/gitlab_cinder/rw-block-device-single-node-writer
I'd say not creating the mount points for block volumes is the correct course of action here. If needed, I can provide steps for reproducing the issue using the cloud-provider-openstack Cinder CSI driver.
Yup, agreed. It's definitely on our stack of CSI bug fixes before we can mark CSI as GA.
Hey guys,
is there any PR/patch to test, or any workaround that I can use?
CSI appears to be nonfunctional on multiple platforms. I have tried Cinder, Longhorn, Rook, and MooseFS, all of which produce errors similar to @rigrassm's. Experimentation with CSI is not possible in the current state; is there a workaround coming soon?
It's on our stack of CSI bug fixes before we can mark CSI as GA. I can't give you an exact timeline other than that it's a high priority for the team and for me personally.
Had a chance to take a first pass at this today. The code in question is (volumeManager) publishVolume. Here's what I've got so far: the per-usage portion of the path we build encodes the access mode (read-only vs read-write), which we shouldn't be doing, but in practice it hasn't mattered because of scheduler constraints.

So there are two bugs here:

- For NodePublishVolume, we either shouldn't be creating the per-usage portion of the path, or we should be extending it before passing it into the plugin. My suspicion for why we haven't hit this previously is that we don't have good coverage of raw block devices, so that code path isn't getting tested. That had been a somewhat intentional decision (see https://github.com/hashicorp/nomad/issues/7139) but we'll correct that deficiency.
- For NodeStageVolume, we should drop the usage portion of the path to belt-and-suspenders our scheduler checks.

The bug fixes themselves are pretty trivial, so most of the work is testing to make sure there are no regressions. I'll tackle that tomorrow.
I've opened https://github.com/hashicorp/nomad/pull/8505 with the patch but need to complete regression testing.
@tgross awesome! I'll get a build with the fix running tomorrow and test out the cinder CSI driver.
@tgross
The fix in #8505 looks good for OpenStack CSI:
└──╼ # mount | grep csi
devtmpfs on /tmp/NomadClient324171302/csi/node/cinder-csi/per-alloc/0c72d91f-b700-282b-dd5a-da5a4c07808d/influxdata/rw-block-device-single-node-writer type devtmpfs (rw,nosuid,size=4078792k,nr_inodes=1019698,mode=755)
Looking good on DO as well:
# mount | grep csi
udev on /tmp/nomad-csi/client/csi/monolith/digitalocean/per-alloc/bb966fa2-5970-2c60-1635-4f81dbb5a2ab/csi-test-volume/rw-block-device-single-node-writer type devtmpfs (rw,nosuid,relatime,size=4071264k,nr_inodes=1017816,mode=755)
tmpfs on /tmp/nomad-csi/alloc/791c7aa6-252e-aac6-08d1-067d5bd9e51f/plugin/secrets type tmpfs (rw,noexec,relatime,size=1024k)
/dev/sda on /mnt/csi_test_volume type ext4 (rw,noatime,discard,data=ordered)
udev on /tmp/nomad-csi/client/csi/monolith/digitalocean/per-alloc/f901e5c7-760c-ffd5-4561-2d66bbbe9125/csi-test-volume/rw-block-device-single-node-writer type devtmpfs (rw,nosuid,relatime,size=4071264k,nr_inodes=1017816,mode=755)
tmpfs on /tmp/nomad-csi/alloc/f901e5c7-760c-ffd5-4561-2d66bbbe9125/redis/secrets type tmpfs (rw,noexec,relatime,size=1024k)
Ok, I've merged that patch and it'll ship in 0.12.2. (Just missed 0.12.1 that went out today, sorry folks!)
@tgross No worries, I'll apply the patch on top of 0.12.1 and wait. Thanks for the fix.