Nomad: CSI implement secrets support

Created on 9 Apr 2020 · 36 comments · Source: hashicorp/nomad

The MVP for CSI in the Nomad 0.11.0 release did not include secrets support (ref: https://github.com/container-storage-interface/spec/blob/master/spec.md#secrets-requirements).
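For illustration, what is being requested is a secrets block in the volume registration whose key/value pairs are passed through to the plugin's CSI RPCs. A minimal sketch of such a registration (all values are placeholders; the userID/userKey key names follow the ceph-csi examples that appear later in this thread, where the same shape is used once support shipped):

type            = "csi"
id              = "myvol"
name            = "myvol"
external_id     = "<volume id known to the storage backend>"
plugin_id       = "<csi plugin id>"
access_mode     = "single-node-writer"
attachment_mode = "file-system"

# Placeholder values; the key names are whatever the CSI plugin expects.
secrets {
  userID  = "<cephx user>"
  userKey = "<cephx key>"
}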

Labels: theme/storage, type/enhancement

Most helpful comment

Below are the logs of the alloc. In the log line where the gRPC request is received, the request contains the secrets stanza but no parameters stanza, while the volume HCL I used does contain one.

plugin-csi-rbd logs

I0609 09:11:55.484819       1 utils.go:159] ID: 9474 Req-ID: vol-0b756b75620d63af5 GRPC call: /csi.v1.Node/NodeStageVolume
I0609 09:11:55.484852       1 utils.go:160] ID: 9474 Req-ID: vol-0b756b75620d63af5 GRPC request: {"secrets":"***stripped***","staging_target_path":"/csi/staging/mysql0/rw-file-system-single-node-writer","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4","mount_flags":["rw"]}},"access_mode":{"mode":1}},"volume_id":"vol-0b756b75620d63af5"}
E0609 09:11:55.486134       1 utils.go:163] ID: 9474 Req-ID: vol-0b756b75620d63af5 GRPC error: rpc error: code = Internal desc = missing required parameter pool
I0609 09:11:59.725259       1 utils.go:159] ID: 9475 Req-ID: vol-0b756b75620d63af5 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0609 09:11:59.725287       1 utils.go:160] ID: 9475 Req-ID: vol-0b756b75620d63af5 GRPC request: {"target_path":"/csi/per-alloc/61402fcc-1084-6437-571a-ea4c282c4a27/mysql0/rw-file-system-single-node-writer","volume_id":"vol-0b756b75620d63af5"}
I0609 09:11:59.725909       1 nodeserver.go:501] ID: 9475 Req-ID: vol-0b756b75620d63af5 targetPath: /csi/per-alloc/61402fcc-1084-6437-571a-ea4c282c4a27/mysql0/rw-file-system-single-node-writer has already been deleted
I0609 09:11:59.725915       1 utils.go:165] ID: 9475 Req-ID: vol-0b756b75620d63af5 GRPC response: {}
I0609 09:11:59.726794       1 utils.go:159] ID: 9476 Req-ID: vol-0b756b75620d63af5 GRPC call: /csi.v1.Node/NodeUnstageVolume
I0609 09:11:59.726806       1 utils.go:160] ID: 9476 Req-ID: vol-0b756b75620d63af5 GRPC request: {"staging_target_path":"/csi/staging/mysql0/rw-file-system-single-node-writer","volume_id":"vol-0b756b75620d63af5"}
I0609 09:11:59.727382       1 nodeserver.go:586] ID: 9476 Req-ID: vol-0b756b75620d63af5 failed to find image metadata: open /csi/staging/mysql0/rw-file-system-single-node-writer/image-meta.json: no such file or directory
I0609 09:11:59.727396       1 utils.go:165] ID: 9476 Req-ID: vol-0b756b75620d63af5 GRPC response: {}


volume HCL

type = "csi"
id = "mysql0"
name = "mysql0"
external_id = "vol-0b756b75620d63af5"
access_mode = "single-node-writer"
attachment_mode = "file-system"
plugin_id = "plugin-csi-rbd"
mount_options {
   fs_type = "ext4"
   mount_flags = ["rw"]
}

secrets {
  userID = "***stripped***"
  userKey = "***stripped***"
}
parameters {
  pool = "volumes"
}

All 36 comments

Just ran into this too when evaluating the ceph-rbd-csi plugin, and it seems to be the only missing piece needed to get it working, since the plugin expects the cephx auth info to be passed to it this way.

When will secrets support be implemented?

When will secrets support be implemented?

Hi folks! I can't give you an exact timeline but we have enough requests for this specific feature that it's going to be in the first set of post-MVP features we're implementing for CSI.

Like tomiles and AdrianRibao (see here), we want to run Nomad with ceph-csi, but we need this feature for that to work.

Hello, I'm also posting here what I wrote in the forum, in case it helps:

We are running a Ceph cluster and are evaluating moving from Kubernetes to Nomad.

I've set up a Nomad cluster, and I'm trying to use the ceph-csi driver.

I tried to follow the documentation https://learn.hashicorp.com/nomad/stateful-workloads/csi-volumes.

I've created this job:

jb "ceph-csi-nodes" {
  datacenters = ["dc1"]

  # you can run node plugins as service jobs as well, but this ensures
  # that all nodes in the DC have a copy.
  type = "system"

  group "nodes" {
    task "plugin" {
      driver = "docker"

      config {
        image = "quay.io/cephcsi/cephcsi:v2.1.0"

        args = [
            "--nodeid=${node.unique.name}",
            "--type=rbd",
            "--nodeserver=true",
            "--endpoint=unix://csi/csi.sock",
            "--v=5",
            "--drivername=rbd.csi.ceph.com",
        ]

        privileged = true
      }

      env {
        "ClusterID" = "<my_cluster_id>"
        "pool" = "SSDPool"
      }

      csi_plugin {
        id        = "ceph-rdb"
        type      = "node"
        mount_dir = "/csi"
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

Output from nomad job status:

ID              Type     Priority  Status   Submit Date
ceph-csi-nodes  system   50        running  2020-04-22T09:13:47+02:00

Output from nomad plugin status ceph-rdb:

ID                   = ceph-rdb
Provider             = rbd.csi.ceph.com
Version              = v2.1.0
Controllers Healthy  = 0
Controllers Expected = 0
Nodes Healthy        = 1
Nodes Expected       = 1

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
480b6015  61b2567b  nodes       3        run      running  56m36s ago  22s ago

I've created a volume configuration like this one:

id = "ssd-volume"
name = "ssd volume"
type = "csi"
external_id = "<ceph_cluster_id>"
plugin_id = "ceph-rdb"
access_mode = "single-node-writer"
attachment_mode = "file-system"
mount_options {
   fs_type = "ext4"
   mount_flags = ["rw"]
}

Note: <ceph_cluster_id> is the value obtained from running ceph fsid.

Output from nomad volume status:

Container Storage Interface
ID        Name        Plugin ID  Schedulable  Access Mode
ssd-volu  ssd volume  ceph-rdb   true         single-node-writer

If I try to create a job running that volume I get this error:

failed to setup alloc: pre-run hook "csi_hook" failed: rpc error: code = InvalidArgument desc = stage secrets cannot be nil or empty

I guess this is because I need to point the plugin to the Ceph monitors and configure the cluster details so it knows how to interact with Ceph.

In Kubernetes this is done with a ConfigMap. Check this example.

We need secrets support to pass values to the plugin.

Thanks!

+1

This is also needed for many on-prem storage solutions that provide CSI interfaces, such as NetApp and Dell EMC.

https://github.com/hashicorp/nomad/pull/7923 has been merged and this will ship in 0.11.2

Hello, I tried this with version 0.11.2:

job "ceph-csi-nodes" {
  datacenters = ["dc1"]

  # you can run node plugins as service jobs as well, but this ensures
  # that all nodes in the DC have a copy.
  type = "system"

  group "nodes" {
    task "plugin" {
      driver = "docker"

      config {
        image = "quay.io/cephcsi/cephcsi:v2.1.1"

        args = [
            "--nodeid=${node.unique.name}",
            "--type=rbd",
            "--nodeserver=true",
            "--endpoint=unix://csi/csi.sock",
            "--v=5",
            "--drivername=rbd.csi.ceph.com",
        ]
        mounts =[
          {
            type = "tmpfs"
            target = "/tmp/csi/keys"
            readonly = false
            tmpfs_options {
              size = 1000000 # size in bytes
            }
          },
        ]

        privileged = true
      }

      env {
        "clusterID" = "85a3220c-9487-11ea-92b7-ecb1d777f070"
        "pool" = "data"
      }

      csi_plugin {
        id        = "ceph-rdb"
        type      = "node"
        mount_dir = "/csi"
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

Volume desc:

id = "es1"
name = "ES1 volume"
type = "csi"
external_id = "85a3220c-9487-11ea-92b7-ecb1d777f070"
plugin_id = "ceph-rdb"
access_mode = "single-node-writer"
attachment_mode = "file-system"
mount_options {
   fs_type = "ext4"
   mount_flags = ["rw"]
}

secrets {
  userID = "admin"
  userKey = "ATBF+rpepn2wNBCA1gn7mmEetFR3+Y4sd8rWiA=="
}

But now I get this message:

failed to setup alloc: pre-run hook "csi_hook" failed: rpc error: code = Internal desc = missing required parameter pool

@moodymob We have a similar issue, and I think it must be related to this one. For the Ceph CSI plugin to work, I think you must specify a whole topology in the volume description, like the example here.

missing required parameter pool

I admittedly don't have any experience with Ceph so I can't say for certain, but this error message may also suggest you need a parameters stanza in your volume spec.

@sbouts if you have more details on topology issues, if you could add them to https://github.com/hashicorp/nomad/issues/7669 that would be helpful for getting resources allocated to that issue.

missing required parameter pool

I admittedly don't have any experience with Ceph so I can't say for certain, but this error message may also suggest you need a parameters stanza in your volume spec.
I've added that to my volume definition, but I get this error message:

parameters {
  pool = "data"
}

Error decoding the volume definition: unexpected keys parameters

@tgross I think the implementation of this issue will allow for the creation of Ceph volumes. I am basing this answer on the 'Available volume parameters' section here: ceph-csi config

Citing from here:

The CSI plugin requires configuration information regarding the Ceph cluster(s), that would host the dynamically or statically provisioned volumes. This is provided by adding a per-cluster identifier (referred to as clusterID), and the required monitor details for the same, as in the provided sample config map.

From my point of view it seems a new feature similar to a Kubernetes ConfigMap is required. Unfortunately I am not that familiar with Kubernetes design, but this should shed some light. If I got it right, a ConfigMap is a kind of volume (basically a YAML file provided as a volume), and ceph-csi requires a volume named ceph-csi-config that it can mount. If you are more familiar with Kubernetes, reading the provided information might let you figure out an easier way around this.

@sbouts Could you explain how it would be possible to provision the config map? Or is there another way to provide the Ceph monitor IPs? If I am able to build master I will try it with my Ceph deployment.

@sbouts Could you explain how it would be possible to provision the config map? Or is there another way to provide the Ceph monitor IPs? If I am able to build master I will try it with my Ceph deployment.

@kriestof I overlooked the monitor list requirement. I think that in order for it to work, Nomad CSI needs functionality like the Kubernetes ConfigMap.

Typically you'd implement something like a ConfigMap with a template stanza in the job spec. In this case, it looks like you'd want to add that to the plugin's jobs and have it write out a config.json file, I think?
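A minimal sketch of that suggestion, not a verified configuration: a template stanza inside the plugin task that renders the cluster map ceph-csi expects. The JSON shape follows the ceph-csi config discussed above, the destination path mirrors the full job posted a few comments below, and the clusterID and monitor addresses are placeholders.

      # Hypothetical sketch; add inside the plugin task alongside its config block.
      template {
        data = <<EOH
[
  {
    "clusterID": "<ceph fsid>",
    "monitors": ["<mon-1-ip>", "<mon-2-ip>", "<mon-3-ip>"]
  }
]
EOH
        destination = "/etc/ceph-csi-config/config.json"
      }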

@tgross I believe the ConfigMap could be more related to the volume stanza. I'm not sure how it works internally in ceph-csi, but it somehow uses a ConfigMap volume under the name ceph-csi-config.

On the other hand, I was able to track down that ceph-csi seems to handle the ConfigMap simply by reading a file: look here. Unfortunately I was not able to find out how the file path is determined. Maybe CSI handles that? It would be better if someone more familiar with Kubernetes/CSI could give advice here.

What's the "monitor" supposed to mean in this context?

@tgross Not sure if I got you right. Monitors implement consensus for Ceph with Paxos, the same way Consul servers reach consensus with Raft. In this context this could just be their IPs (the same way you need to provide retry_join in Consul).

Ok. Oddly, I don't see the ConfigMap actually used anywhere in that examples directory. But looking at the plugin deployment for k8s here, it looks like the ConfigMap is being consumed by the plugins, not the volumes.

Cool! That was my missing piece. Then probably mounting config.json at /etc/ceph-csi-config/ should be enough. Sadly the ceph-csi docs don't mention that, concentrating on k8s instead.

I'll try to find some time and build master to test if that is enough to use ceph-csi.

Cool! That was my missing piece. Then probably mounting config.json at /etc/ceph-csi-config/ should be enough. Sadly the ceph-csi docs don't mention that, concentrating on k8s instead.

I'll try to find some time and build master to test if that is enough to use ceph-csi.

job "ceph-csi-nodes" {
  datacenters = ["dc1"]

  # you can run node plugins as service jobs as well, but this ensures
  # that all nodes in the DC have a copy.
  type = "system"

  group "nodes" {
    task "plugin" {
      driver = "docker"

      config {
        image = "quay.io/cephcsi/cephcsi:v2.1.2"

        args = [
            "--nodeid=${node.unique.name}",
            "--type=rbd",
            "--nodeserver=true",
            "--endpoint=unix://csi/csi.sock",
            "--v=5",
            "--drivername=rbd.csi.ceph.com",
        ]
        mounts =[
          {
            type = "tmpfs"
            target = "/tmp/csi/keys"
            readonly = false
            tmpfs_options {
              size = 100000 # size in bytes
            }
          },
        ]

        privileged = true
      }

      env {
        "pool" = "data"
        "clusterID" = "7bf47f90-9909-11ea-a283-ecb1d777f070"
      }

      template {
        data = <<EOH
[
  {
    "clusterID": "7bf47f90-9909-11ea-a283-ecb1d777f070",
    "monitors": [
      "10.3.28.61",
      "10.3.28.62",
      "10.3.28.63"
    ]
  }
]
        EOH
        destination = "/etc/ceph-csi-config/config.json"
      }

      csi_plugin {
        id        = "ceph-rdb"
        type      = "node"
        mount_dir = "/csi"
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

Here is my volume description:

id = "es1"
name = "ES1 volume"
type = "csi"
external_id = "7bf47f90-9909-11ea-a283-ecb1d777f070"
plugin_id = "ceph-rdb"
access_mode = "single-node-writer"
attachment_mode = "file-system"
mount_options {
   fs_type = "ext4"
   mount_flags = ["rw"]
}

secrets {
  userID = "admin"
  userKey = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
parameters {
  pool = "data"
}

But when I try to launch my job, I get the same error as before:

failed to setup alloc: pre-run hook "csi_hook" failed: rpc error: code = Internal desc = missing required parameter pool

@moodymob

  • Are you sure you properly updated the volume? I struggled with that yesterday with no effect so far.
  • Do you use Nomad v0.11.3 or an older one? If I got it right, the parameters stanza was introduced in Nomad v0.11.3.

@moodymob

  • Are you sure you properly updated the volume? I struggled with that yesterday with no effect so far.
  • Do you use Nomad v0.11.3 or an older one? If I got it right, the parameters stanza was introduced in Nomad v0.11.3.

Sorry, I forgot to mention that I use Nomad 0.11.3 / Ceph 15.2.3 / Debian 10.

And yes, I did deregister and re-register my volume.

I can confirm what @moodymob reported: after upgrading the cluster to Nomad 0.11.3 and using the CSI plugin and volume definition above, I got the same error:

_Jun 08, '20 09:10:20 +0200
Setup Failure
failed to setup alloc: pre-run hook "csi_hook" failed: node plugin returned an internal error, check the plugin allocation logs for more information: rpc error: code = Internal desc = missing required parameter pool_

So for some reason the parameters stanza isn't being picked up by the CSI plugin.

Can you provide the plugin allocation logs, as well as the volume HCL you used?

Hi @tgross

Here is the log, regards.

Jun 06, '20 12:10:28 +0200 | CSI | failed fingerprinting with error: rpc error: code = Canceled desc = context canceled

Below are the logs of the alloc. In the log line where the gRPC request is received, the request contains the secrets stanza but no parameters stanza, while the volume HCL I used does contain one.

plugin-csi-rbd logs

I0609 09:11:55.484819       1 utils.go:159] ID: 9474 Req-ID: vol-0b756b75620d63af5 GRPC call: /csi.v1.Node/NodeStageVolume
I0609 09:11:55.484852       1 utils.go:160] ID: 9474 Req-ID: vol-0b756b75620d63af5 GRPC request: {"secrets":"***stripped***","staging_target_path":"/csi/staging/mysql0/rw-file-system-single-node-writer","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4","mount_flags":["rw"]}},"access_mode":{"mode":1}},"volume_id":"vol-0b756b75620d63af5"}
E0609 09:11:55.486134       1 utils.go:163] ID: 9474 Req-ID: vol-0b756b75620d63af5 GRPC error: rpc error: code = Internal desc = missing required parameter pool
I0609 09:11:59.725259       1 utils.go:159] ID: 9475 Req-ID: vol-0b756b75620d63af5 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0609 09:11:59.725287       1 utils.go:160] ID: 9475 Req-ID: vol-0b756b75620d63af5 GRPC request: {"target_path":"/csi/per-alloc/61402fcc-1084-6437-571a-ea4c282c4a27/mysql0/rw-file-system-single-node-writer","volume_id":"vol-0b756b75620d63af5"}
I0609 09:11:59.725909       1 nodeserver.go:501] ID: 9475 Req-ID: vol-0b756b75620d63af5 targetPath: /csi/per-alloc/61402fcc-1084-6437-571a-ea4c282c4a27/mysql0/rw-file-system-single-node-writer has already been deleted
I0609 09:11:59.725915       1 utils.go:165] ID: 9475 Req-ID: vol-0b756b75620d63af5 GRPC response: {}
I0609 09:11:59.726794       1 utils.go:159] ID: 9476 Req-ID: vol-0b756b75620d63af5 GRPC call: /csi.v1.Node/NodeUnstageVolume
I0609 09:11:59.726806       1 utils.go:160] ID: 9476 Req-ID: vol-0b756b75620d63af5 GRPC request: {"staging_target_path":"/csi/staging/mysql0/rw-file-system-single-node-writer","volume_id":"vol-0b756b75620d63af5"}
I0609 09:11:59.727382       1 nodeserver.go:586] ID: 9476 Req-ID: vol-0b756b75620d63af5 failed to find image metadata: open /csi/staging/mysql0/rw-file-system-single-node-writer/image-meta.json: no such file or directory
I0609 09:11:59.727396       1 utils.go:165] ID: 9476 Req-ID: vol-0b756b75620d63af5 GRPC response: {}


volume HCL

type = "csi"
id = "mysql0"
name = "mysql0"
external_id = "vol-0b756b75620d63af5"
access_mode = "single-node-writer"
attachment_mode = "file-system"
plugin_id = "plugin-csi-rbd"
mount_options {
   fs_type = "ext4"
   mount_flags = ["rw"]
}

secrets {
  userID = "***stripped***"
  userKey = "***stripped***"
}
parameters {
  pool = "volumes"
}

@tgross Could this be explained by the missing Parameters field in this _ControllerPublishVolumeRequest_ struct and its connected _ToCSIRepresentation_ func?

https://github.com/hashicorp/nomad/blob/103d873ebe84cb57cbb09a599112a39c69592ac1/plugins/csi/plugin.go#L256-L278

I see the Parameters field is validated in _ControllerValidateVolumeRequest_ but not included in the request sent to the CSI plugin.

Found this container-storage-interface spec issue, so it seems the Parameters field is also missing from some request types on their side.

ceph-csi uses the same csi/spec module, and they seem to take the incoming Parameters field from the csi.CreateVolumeRequest and move the parameters into the volumeContext.

See: https://github.com/ceph/ceph-csi/blob/3364fe7b781a97fff8c74e72d973bd498e0af50c/internal/rbd/controllerserver.go#L244-L253

So I don't know what's missing to enable Nomad to pass in the parameters.

Hi all, I have looked into why the Nomad <--> ceph-csi communication was not as smooth as it could be. In short: Nomad has yet to fully implement the CSI specification. You can look at the feature request here: https://github.com/hashicorp/nomad/issues/8212

Has anyone been able to successfully integrate Ceph with Nomad's CSI implementation?

@finish06 as @sbouts has pointed out above, our CSI support doesn't currently implement the full spec, including the CreateVolume (and related) RPCs we'd need to support Ceph.
