Nomad: CSI implement secrets support

Created on 9 Apr 2020 · 36 comments · Source: hashicorp/nomad

The MVP for CSI in the Nomad 0.11.0 release did not include secrets support (ref: https://github.com/container-storage-interface/spec/blob/master/spec.md#secrets-requirements).
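For illustration, what is being requested is a secrets block in the volume registration whose key/value pairs are passed through to the plugin's CSI RPCs. A minimal sketch of such a registration (all values are placeholders; the userID/userKey key names follow the ceph-csi examples that appear later in this thread, where the same shape is used once support shipped):

type            = "csi"
id              = "myvol"
name            = "myvol"
external_id     = "<volume id known to the storage backend>"
plugin_id       = "<csi plugin id>"
access_mode     = "single-node-writer"
attachment_mode = "file-system"

# Placeholder values; the key names are whatever the CSI plugin expects.
secrets {
  userID  = "<cephx user>"
  userKey = "<cephx key>"
}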

Labels: theme/storage, type/enhancement

Most helpful comment

Below are the logs of the alloc. In the log line where the gRPC request is received, the request contains the secrets stanza but no parameters stanza, while the volume HCL I used does contain one.

plugin-csi-rbd logs

I0609 09:11:55.484819       1 utils.go:159] ID: 9474 Req-ID: vol-0b756b75620d63af5 GRPC call: /csi.v1.Node/NodeStageVolume
I0609 09:11:55.484852       1 utils.go:160] ID: 9474 Req-ID: vol-0b756b75620d63af5 GRPC request: {"secrets":"***stripped***","staging_target_path":"/csi/staging/mysql0/rw-file-system-single-node-writer","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4","mount_flags":["rw"]}},"access_mode":{"mode":1}},"volume_id":"vol-0b756b75620d63af5"}
E0609 09:11:55.486134       1 utils.go:163] ID: 9474 Req-ID: vol-0b756b75620d63af5 GRPC error: rpc error: code = Internal desc = missing required parameter pool
I0609 09:11:59.725259       1 utils.go:159] ID: 9475 Req-ID: vol-0b756b75620d63af5 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0609 09:11:59.725287       1 utils.go:160] ID: 9475 Req-ID: vol-0b756b75620d63af5 GRPC request: {"target_path":"/csi/per-alloc/61402fcc-1084-6437-571a-ea4c282c4a27/mysql0/rw-file-system-single-node-writer","volume_id":"vol-0b756b75620d63af5"}
I0609 09:11:59.725909       1 nodeserver.go:501] ID: 9475 Req-ID: vol-0b756b75620d63af5 targetPath: /csi/per-alloc/61402fcc-1084-6437-571a-ea4c282c4a27/mysql0/rw-file-system-single-node-writer has already been deleted
I0609 09:11:59.725915       1 utils.go:165] ID: 9475 Req-ID: vol-0b756b75620d63af5 GRPC response: {}
I0609 09:11:59.726794       1 utils.go:159] ID: 9476 Req-ID: vol-0b756b75620d63af5 GRPC call: /csi.v1.Node/NodeUnstageVolume
I0609 09:11:59.726806       1 utils.go:160] ID: 9476 Req-ID: vol-0b756b75620d63af5 GRPC request: {"staging_target_path":"/csi/staging/mysql0/rw-file-system-single-node-writer","volume_id":"vol-0b756b75620d63af5"}
I0609 09:11:59.727382       1 nodeserver.go:586] ID: 9476 Req-ID: vol-0b756b75620d63af5 failed to find image metadata: open /csi/staging/mysql0/rw-file-system-single-node-writer/image-meta.json: no such file or directory
I0609 09:11:59.727396       1 utils.go:165] ID: 9476 Req-ID: vol-0b756b75620d63af5 GRPC response: {}


volume HCL

type = "csi"
id = "mysql0"
name = "mysql0"
external_id = "vol-0b756b75620d63af5"
access_mode = "single-node-writer"
attachment_mode = "file-system"
plugin_id = "plugin-csi-rbd"
mount_options {
   fs_type = "ext4"
   mount_flags = ["rw"]
}

secrets {
  userID = "***stripped***"
  userKey = "***stripped***"
}
parameters {
  pool = "volumes"
}

All 36 comments

Just ran into this too when evaluating the ceph-rbd-csi plugin, and it seems to be the only missing piece needed to get it working, since the plugin expects the cephx auth info to be passed to it this way.

When will secrets support be implemented?

When will secrets support be implemented?

Hi folks! I can't give you an exact timeline but we have enough requests for this specific feature that it's going to be in the first set of post-MVP features we're implementing for CSI.

Like tomiles and AdrianRibao (see here), we want to run Nomad with ceph-csi, but we need this feature for that to work.

Hello, I'm also posting here what I wrote in the forum, in case it helps:

We are running a Ceph cluster and are evaluating moving from Kubernetes to Nomad.

I've set up a Nomad cluster, and I'm trying to use the ceph-csi driver.

I tried to follow the documentation https://learn.hashicorp.com/nomad/stateful-workloads/csi-volumes.

I've created this job:

jb "ceph-csi-nodes" {
  datacenters = ["dc1"]

  # you can run node plugins as service jobs as well, but this ensures
  # that all nodes in the DC have a copy.
  type = "system"

  group "nodes" {
    task "plugin" {
      driver = "docker"

      config {
        image = "quay.io/cephcsi/cephcsi:v2.1.0"

        args = [
            "--nodeid=${node.unique.name}",
            "--type=rbd",
            "--nodeserver=true",
            "--endpoint=unix://csi/csi.sock",
            "--v=5",
            "--drivername=rbd.csi.ceph.com",
        ]

        privileged = true
      }

      env {
        "ClusterID" = "<my_cluster_id>"
        "pool" = "SSDPool"
      }

      csi_plugin {
        id        = "ceph-rdb"
        type      = "node"
        mount_dir = "/csi"
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

Output from nomad job status:

ID              Type     Priority  Status   Submit Date
ceph-csi-nodes  system   50        running  2020-04-22T09:13:47+02:00

Output from nomad plugin status ceph-rdb:

ID                   = ceph-rdb
Provider             = rbd.csi.ceph.com
Version              = v2.1.0
Controllers Healthy  = 0
Controllers Expected = 0
Nodes Healthy        = 1
Nodes Expected       = 1

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
480b6015  61b2567b  nodes       3        run      running  56m36s ago  22s ago

I've created a volume configuration like this one:

id = "ssd-volume"
name = "ssd volume"
type = "csi"
external_id = "<ceph_cluster_id>"
plugin_id = "ceph-rdb"
access_mode = "single-node-writer"
attachment_mode = "file-system"
mount_options {
   fs_type = "ext4"
   mount_flags = ["rw"]
}

Note: <ceph_cluster_id> is the value obtained from running ceph fsid.

Output from nomad volume status:

Container Storage Interface
ID        Name        Plugin ID  Schedulable  Access Mode
ssd-volu  ssd volume  ceph-rdb   true         single-node-writer

If I try to create a job running that volume I get this error:

failed to setup alloc: pre-run hook "csi_hook" failed: rpc error: code = InvalidArgument desc = stage secrets cannot be nil or empty

I guess this is because I need to point the plugin to the Ceph monitors and configure the cluster details so it knows how to interact with Ceph.

In Kubernetes this is done with a ConfigMap. Check this example.

We need secrets support to pass values to the plugin.

Thanks!

+1

This is also needed for many on-prem storage solutions that provide CSI interfaces, such as NetApp and Dell EMC.

https://github.com/hashicorp/nomad/pull/7923 has been merged and this will ship in 0.11.2

Hello, I tried this with version 0.11.2:

job "ceph-csi-nodes" {
  datacenters = ["dc1"]

  # you can run node plugins as service jobs as well, but this ensures
  # that all nodes in the DC have a copy.
  type = "system"

  group "nodes" {
    task "plugin" {
      driver = "docker"

      config {
        image = "quay.io/cephcsi/cephcsi:v2.1.1"

        args = [
            "--nodeid=${node.unique.name}",
            "--type=rbd",
            "--nodeserver=true",
            "--endpoint=unix://csi/csi.sock",
            "--v=5",
            "--drivername=rbd.csi.ceph.com",
        ]
        mounts =[
          {
            type = "tmpfs"
            target = "/tmp/csi/keys"
            readonly = false
            tmpfs_options {
              size = 1000000 # size in bytes
            }
          },
        ]

        privileged = true
      }

      env {
        "clusterID" = "85a3220c-9487-11ea-92b7-ecb1d777f070"
        "pool" = "data"
      }

      csi_plugin {
        id        = "ceph-rdb"
        type      = "node"
        mount_dir = "/csi"
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

Volume desc:

id = "es1"
name = "ES1 volume"
type = "csi"
external_id = "85a3220c-9487-11ea-92b7-ecb1d777f070"
plugin_id = "ceph-rdb"
access_mode = "single-node-writer"
attachment_mode = "file-system"
mount_options {
   fs_type = "ext4"
   mount_flags = ["rw"]
}

secrets {
  userID = "admin"
  userKey = "ATBF+rpepn2wNBCA1gn7mmEetFR3+Y4sd8rWiA=="
}

But now I get this message:

failed to setup alloc: pre-run hook "csi_hook" failed: rpc error: code = Internal desc = missing required parameter pool

@moodymob We have a similar issue, and I think it must be related to this one. For the Ceph CSI plugin to work, I think you must specify a whole topology in the volume description, like the example here.

missing required parameter pool

I admittedly don't have any experience with Ceph so I can't say for certain, but this error message may also suggest you need a parameters stanza in your volume spec.

@sbouts if you have more details on topology issues, if you could add them to https://github.com/hashicorp/nomad/issues/7669 that would be helpful for getting resources allocated to that issue.

missing required parameter pool

I admittedly don't have any experience with Ceph so I can't say for certain, but this error message may also suggest you need a parameters stanza in your volume spec.
I've added that to my volume definition, but I get this error message:

parameters {
  pool = "data"
}

Error decoding the volume definition: unexpected keys parameters

@tgross I think the implementation of this issue will allow for the creation of Ceph volumes. I am basing this answer on the 'Available volume parameters' section here: ceph-csi config

Citing from here:

The CSI plugin requires configuration information regarding the Ceph cluster(s), that would host the dynamically or statically provisioned volumes. This is provided by adding a per-cluster identifier (referred to as clusterID), and the required monitor details for the same, as in the provided sample config map.

From my point of view it seems a new feature similar to a Kubernetes ConfigMap is required. Unfortunately I am not that familiar with Kubernetes design, but this should shed some light. If I got it right, a ConfigMap is a kind of volume (basically a YAML file provided as a volume), and ceph-csi requires a volume named ceph-csi-config that it can mount. If you are more familiar with Kubernetes, reading the provided information might let you figure out an easier way around this.

@sbouts Could you explain how it would be possible to provision the config map? Or is there another way to provide the Ceph monitor IPs? If I am able to build master I will try it with my Ceph deployment.

@sbouts Could you explain how it would be possible to provision the config map? Or is there another way to provide the Ceph monitor IPs? If I am able to build master I will try it with my Ceph deployment.

@kriestof I overlooked the monitor list requirement. I think that in order for it to work, Nomad CSI needs functionality like the Kubernetes ConfigMap.

Typically you'd implement something like a ConfigMap with a template stanza in the job spec. In this case, it looks like you'd want to add that to the plugin's jobs and have it write out a config.json file, I think?
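A minimal sketch of that suggestion, not a verified configuration: a template stanza inside the plugin task that renders the cluster map ceph-csi expects. The JSON shape follows the ceph-csi config discussed above, the destination path mirrors the full job posted a few comments below, and the clusterID and monitor addresses are placeholders.

      # Hypothetical sketch; add inside the plugin task alongside its config block.
      template {
        data = <<EOH
[
  {
    "clusterID": "<ceph fsid>",
    "monitors": ["<mon-1-ip>", "<mon-2-ip>", "<mon-3-ip>"]
  }
]
EOH
        destination = "/etc/ceph-csi-config/config.json"
      }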

@tgross I believe the ConfigMap could be more related to the volume stanza. I'm not sure how it works internally in ceph-csi, but it somehow uses a ConfigMap volume under the name ceph-csi-config.

On the other hand, I was able to track down that ceph-csi seems to handle the ConfigMap simply by reading a file: look here. Unfortunately I was not able to find out how the file path is determined. Maybe CSI handles that? It would be better if someone more familiar with Kubernetes/CSI could give advice here.

What's the "monitor" supposed to mean in this context?

@tgross Not sure if I got you right. Monitors implement consensus for Ceph with Paxos, the same way Consul servers reach consensus with Raft. In this context this could just be their IPs (the same way you need to provide retry_join in Consul).

Ok. Oddly, I don't see the ConfigMap actually used anywhere in that examples directory. But looking at the plugin deployment for k8s here, it looks like the ConfigMap is being consumed by the plugins, not the volumes.

Cool! That was my missing piece. Then probably mounting config.json at /etc/ceph-csi-config/ should be enough. Sadly the ceph-csi docs don't mention that, concentrating on k8s instead.

I'll try to find some time and build master to test if that is enough to use ceph-csi.

Cool! That was my missing piece. Then probably mounting config.json at /etc/ceph-csi-config/ should be enough. Sadly the ceph-csi docs don't mention that, concentrating on k8s instead.

I'll try to find some time and build master to test if that is enough to use ceph-csi.

job "ceph-csi-nodes" {
  datacenters = ["dc1"]

  # you can run node plugins as service jobs as well, but this ensures
  # that all nodes in the DC have a copy.
  type = "system"

  group "nodes" {
    task "plugin" {
      driver = "docker"

      config {
        image = "quay.io/cephcsi/cephcsi:v2.1.2"

        args = [
            "--nodeid=${node.unique.name}",
            "--type=rbd",
            "--nodeserver=true",
            "--endpoint=unix://csi/csi.sock",
            "--v=5",
            "--drivername=rbd.csi.ceph.com",
        ]
        mounts =[
          {
            type = "tmpfs"
            target = "/tmp/csi/keys"
            readonly = false
            tmpfs_options {
              size = 100000 # size in bytes
            }
          },
        ]

        privileged = true
      }

      env {
        "pool" = "data"
        "clusterID" = "7bf47f90-9909-11ea-a283-ecb1d777f070"
      }

      template {
        data = <<EOH
[
  {
    "clusterID": "7bf47f90-9909-11ea-a283-ecb1d777f070",
    "monitors": [
      "10.3.28.61",
      "10.3.28.62",
      "10.3.28.63"
    ]
  }
]
        EOH
        destination = "/etc/ceph-csi-config/config.json"
      }

      csi_plugin {
        id        = "ceph-rdb"
        type      = "node"
        mount_dir = "/csi"
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

Here is my volume description:

id = "es1"
name = "ES1 volume"
type = "csi"
external_id = "7bf47f90-9909-11ea-a283-ecb1d777f070"
plugin_id = "ceph-rdb"
access_mode = "single-node-writer"
attachment_mode = "file-system"
mount_options {
   fs_type = "ext4"
   mount_flags = ["rw"]
}

secrets {
  userID = "admin"
  userKey = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
parameters {
  pool = "data"
}

But when I try to launch my job, I get the same error as before:

failed to setup alloc: pre-run hook "csi_hook" failed: rpc error: code = Internal desc = missing required parameter pool

@moodymob

  • Are you sure you properly updated the volume? I struggled with that yesterday with no effect so far.
  • Do you use Nomad v0.11.3 or an older one? If I got it right, the parameters stanza was introduced in Nomad v0.11.3.

@moodymob

  • Are you sure you properly updated the volume? I struggled with that yesterday with no effect so far.
  • Do you use Nomad v0.11.3 or an older one? If I got it right, the parameters stanza was introduced in Nomad v0.11.3.

Sorry, I forgot to mention that I use Nomad 0.11.3 / Ceph 15.2.3 / Debian 10.

And yes, I did deregister and re-register my volume.

I can confirm what @moodymob reported: after upgrading the cluster to Nomad 0.11.3 and using the CSI plugin and volume definition above, I got the same error:

_Jun 08, '20 09:10:20 +0200
Setup Failure
failed to setup alloc: pre-run hook "csi_hook" failed: node plugin returned an internal error, check the plugin allocation logs for more information: rpc error: code = Internal desc = missing required parameter pool_

So for some reason the parameters stanza isn't being picked up by the CSI plugin.

Can you provide the plugin allocation logs, as well as the volume HCL you used?

Hi @tgross

Here is the log, regards.

Jun 06, '20 12:10:28 +0200 | CSI | failed fingerprinting with error: rpc error: code = Canceled desc = context canceled

Below are the logs of the alloc. In the log line where the gRPC request is received, the request contains the secrets stanza but no parameters stanza, while the volume HCL I used does contain one.

plugin-csi-rbd logs

I0609 09:11:55.484819       1 utils.go:159] ID: 9474 Req-ID: vol-0b756b75620d63af5 GRPC call: /csi.v1.Node/NodeStageVolume
I0609 09:11:55.484852       1 utils.go:160] ID: 9474 Req-ID: vol-0b756b75620d63af5 GRPC request: {"secrets":"***stripped***","staging_target_path":"/csi/staging/mysql0/rw-file-system-single-node-writer","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4","mount_flags":["rw"]}},"access_mode":{"mode":1}},"volume_id":"vol-0b756b75620d63af5"}
E0609 09:11:55.486134       1 utils.go:163] ID: 9474 Req-ID: vol-0b756b75620d63af5 GRPC error: rpc error: code = Internal desc = missing required parameter pool
I0609 09:11:59.725259       1 utils.go:159] ID: 9475 Req-ID: vol-0b756b75620d63af5 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0609 09:11:59.725287       1 utils.go:160] ID: 9475 Req-ID: vol-0b756b75620d63af5 GRPC request: {"target_path":"/csi/per-alloc/61402fcc-1084-6437-571a-ea4c282c4a27/mysql0/rw-file-system-single-node-writer","volume_id":"vol-0b756b75620d63af5"}
I0609 09:11:59.725909       1 nodeserver.go:501] ID: 9475 Req-ID: vol-0b756b75620d63af5 targetPath: /csi/per-alloc/61402fcc-1084-6437-571a-ea4c282c4a27/mysql0/rw-file-system-single-node-writer has already been deleted
I0609 09:11:59.725915       1 utils.go:165] ID: 9475 Req-ID: vol-0b756b75620d63af5 GRPC response: {}
I0609 09:11:59.726794       1 utils.go:159] ID: 9476 Req-ID: vol-0b756b75620d63af5 GRPC call: /csi.v1.Node/NodeUnstageVolume
I0609 09:11:59.726806       1 utils.go:160] ID: 9476 Req-ID: vol-0b756b75620d63af5 GRPC request: {"staging_target_path":"/csi/staging/mysql0/rw-file-system-single-node-writer","volume_id":"vol-0b756b75620d63af5"}
I0609 09:11:59.727382       1 nodeserver.go:586] ID: 9476 Req-ID: vol-0b756b75620d63af5 failed to find image metadata: open /csi/staging/mysql0/rw-file-system-single-node-writer/image-meta.json: no such file or directory
I0609 09:11:59.727396       1 utils.go:165] ID: 9476 Req-ID: vol-0b756b75620d63af5 GRPC response: {}


volume HCL

type = "csi"
id = "mysql0"
name = "mysql0"
external_id = "vol-0b756b75620d63af5"
access_mode = "single-node-writer"
attachment_mode = "file-system"
plugin_id = "plugin-csi-rbd"
mount_options {
   fs_type = "ext4"
   mount_flags = ["rw"]
}

secrets {
  userID = "***stripped***"
  userKey = "***stripped***"
}
parameters {
  pool = "volumes"
}

@tgross Could this be explained by the missing Parameters field in this _ControllerPublishVolumeRequest_ struct and its connected _ToCSIRepresentation_ func?

https://github.com/hashicorp/nomad/blob/103d873ebe84cb57cbb09a599112a39c69592ac1/plugins/csi/plugin.go#L256-L278

I see the Parameters field is validated in _ControllerValidateVolumeRequest_ but not included in the request sent to the CSI plugin.

Found this container-storage-interface spec issue, so it seems the Parameters field is also missing from some request types on their side.

ceph-csi uses the same csi/spec module, and they seem to take the incoming Parameters field from the csi.CreateVolumeRequest and move the parameters into the volumeContext.

See: https://github.com/ceph/ceph-csi/blob/3364fe7b781a97fff8c74e72d973bd498e0af50c/internal/rbd/controllerserver.go#L244-L253

So I don't know what's missing to enable Nomad to pass in the parameters.

Hi all, I have looked into why the Nomad <--> ceph-csi communication was not as smooth as it could be. In short: Nomad has yet to fully implement the CSI specification. You can look at the feature request here: https://github.com/hashicorp/nomad/issues/8212

Has anyone been able to successfully integrate Ceph with Nomad's CSI implementation?

@finish06 as @sbouts has pointed out above, our CSI support doesn't currently implement the full spec, including the CreateVolume (and related) RPCs we'd need to support Ceph.
