Nomad servers and clients are both running the following build:
Nomad v0.11.1 (b43457070037800fcc8442c8ff095ff4005dab33)
Operating system: Amazon Linux 2, kernel 4.14.173-137.229.amzn2.x86_64
While running the EBS CSI plugin, I have noticed that Nomad expects plugin tasks that have completed to still report as healthy:
$ nomad plugin status aws-ebs4
ID                   = aws-ebs4
Provider             = ebs.csi.aws.com
Version              = v0.6.0-dirty
Controllers Healthy  = 1
Controllers Expected = 2
Nodes Healthy        = 3
Nodes Expected       = 4

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
738adb4b  46e6db9e  controller  4        run      running   33m59s ago  31m16s ago
a999a840  4470dc51  controller  3        stop     complete  32m4s ago   31m25s ago
9290e85e  46e6db9e  nodes       0        run      running   42m23s ago  42m16s ago
eed3459a  ec4c06b3  nodes       0        stop     complete  42m23s ago  35m8s ago
d9ecfc6b  4470dc51  nodes       0        run      running   42m23s ago  42m8s ago
ad2698aa  eaac2f32  nodes       0        run      running   37m49s ago  37m31s ago
This seems unusual: once a CSI plugin task has completed, it should no longer be counted as expected to be running and healthy. When this mismatch between healthy and expected plugin task counts occurs, any task that needs to attach a CSI volume using the plugin in question is unable to do so. Instead of successfully mounting the volume, the following error occurs:
failed to setup alloc: pre-run hook "csi_hook" failed: rpc error: code = InvalidArgument desc = Device path not provided
The plugin returns this error when it is missing information in the PublishContext passed to a NodePublishVolume/NodeStageVolume RPC, as seen here.
The PublishContext is returned by a ControllerPublishVolume RPC; however, after checking the logs of my controller plugin, it turns out ControllerPublishVolume is never called.
Again, this only occurs when there is a mismatch between the healthy and expected counts. Otherwise ControllerPublishVolume is called when a task requesting a CSI volume is scheduled, and the volume is attached successfully.
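For example, grepping the controller plugin's logs for the RPC name (the allocation ID and task name below are taken from the plugin status output above) turns up nothing while the mismatch is present:
# controller alloc ID and task name from the status output above
$ nomad alloc logs -stderr 738adb4b plugin | grep ControllerPublishVolume
# (no output while the healthy/expected mismatch is present)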
The easiest way to create a healthy/expected value mismatch is to increase the number of controller tasks to 2 and then decrease it back to 1. To reproduce:
1. Run the CSI controller plugin job:
job "plugin-aws-ebs-controller" {
datacenters = ["dc1"]
group "controller" {
task "plugin" {
driver = "docker"
config {
image = "amazon/aws-ebs-csi-driver:latest"
args = [
"controller",
"--endpoint=unix://csi/csi.sock",
"--logtostderr",
"--v=5",
]
}
csi_plugin {
id = "aws-ebs0"
type = "controller"
mount_dir = "/csi"
}
resources {
cpu = 500
memory = 256
}
# ensuring the plugin has time to shut down gracefully
kill_timeout = "2m"
}
}
}
2. Run the CSI node plugin job (commands for running both plugin jobs are sketched after the spec):
job "plugin-aws-ebs-nodes" {
datacenters = ["dc1"]
# you can run node plugins as service jobs as well, but this ensures
# that all nodes in the DC have a copy.
type = "system"
group "nodes" {
task "plugin" {
driver = "docker"
config {
image = "amazon/aws-ebs-csi-driver:latest"
args = [
"node",
"--endpoint=unix://csi/csi.sock",
"--logtostderr",
"--v=5",
]
# node plugins must run as privileged jobs because they
# mount disks to the host
privileged = true
}
csi_plugin {
id = "aws-ebs0"
type = "node"
mount_dir = "/csi"
}
resources {
cpu = 500
memory = 256
}
# ensuring the plugin has time to shut down gracefully
kill_timeout = "2m"
}
}
}
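Assuming the two specs above are saved as plugin-aws-ebs-controller.nomad and plugin-aws-ebs-nodes.nomad (both filenames are just placeholders), they can be submitted and the plugin checked with:
$ nomad job run plugin-aws-ebs-controller.nomad
$ nomad job run plugin-aws-ebs-nodes.nomad
# wait until Controllers Healthy = 1 and Nodes Healthy matches your client count
$ nomad plugin status aws-ebs0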
3. Create and register an EBS volume with Nomad, e.g. following https://learn.hashicorp.com/nomad/stateful-workloads/csi-volumes (a minimal registration spec is sketched below).
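A minimal registration sketch, assuming the aws-ebs0 plugin ID from the specs above and a placeholder EBS volume ID; the id and name must match the source = "mysql" used in the MySQL job below. Register it with nomad volume register volume.hcl:
# volume.hcl (filename is a placeholder)
type            = "csi"
id              = "mysql"
name            = "mysql"
external_id     = "vol-0123456789abcdef0"  # placeholder; use your EBS volume ID
plugin_id       = "aws-ebs0"
access_mode     = "single-node-writer"
attachment_mode = "file-system"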
4. Optionally run the example MySQL job to verify that volumes can be attached successfully. Be sure to use constraints to run the task using the volume in the same availability zone as your EBS volume (a constraint sketch follows the job spec below).
job "mysql-server" {
datacenters = ["dc1"]
type = "service"
group "mysql-server" {
count = 1
volume "mysql" {
type = "csi"
read_only = false
source = "mysql"
}
restart {
attempts = 10
interval = "5m"
delay = "25s"
mode = "delay"
}
task "mysql-server" {
driver = "docker"
volume_mount {
volume = "mysql"
destination = "/srv"
read_only = false
}
env = {
"MYSQL_ROOT_PASSWORD" = "password"
}
config {
image = "hashicorp/mysql-portworx-demo:latest"
args = ["--datadir", "/srv/mysql"]
port_map {
db = 3306
}
}
resources {
cpu = 500
memory = 1024
network {
port "db" {
static = 3306
}
}
}
service {
name = "mysql-server"
port = "db"
check {
type = "tcp"
interval = "10s"
timeout = "2s"
}
}
}
}
}
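The job above omits the availability-zone constraint mentioned in step 4. A minimal sketch, assuming Nomad's AWS fingerprint attribute and us-east-1a as a placeholder zone, added at the job or group level:
constraint {
  attribute = "${attr.platform.aws.placement.availability-zone}"
  value     = "us-east-1a"  # placeholder; use the AZ your EBS volume lives in
}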
5. Increase the controller plugin task group's count to 2 and wait for the new task to become healthy, then scale back down to 1 task and wait for the extra task to complete.
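For example, assuming the controller spec is saved as plugin-aws-ebs-controller.nomad (the group count defaults to 1 when unset):
# add `count = 2` to the "controller" group, then re-run the job
$ nomad job run plugin-aws-ebs-controller.nomad
# once both controller allocations are healthy, set count back to 1 and re-run
$ nomad job run plugin-aws-ebs-controller.nomad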
6. Run nomad plugin status. You should see mismatched healthy/expected values for the controller plugins, e.g.:
Container Storage Interface
ID        Provider         Controllers Healthy/Expected  Nodes Healthy/Expected
aws-ebs4  ebs.csi.aws.com  1/2                           3/4
I am also seeing issues where plugins with no running jobs are not being garbage collected as described here:
https://github.com/hashicorp/nomad/issues/7743
$ nomad plugin status
Container Storage Interface
ID        Provider         Controllers Healthy/Expected  Nodes Healthy/Expected
aws-ebs0  ebs.csi.aws.com  0/3                           0/29
aws-ebs2  ebs.csi.aws.com  0/2                           0/25
aws-ebs3  ebs.csi.aws.com  0/2                           0/3
aws-ebs4  ebs.csi.aws.com  1/2                           3/4
I'm not sure if this is related, but I figured it was worth mentioning.
Hi @tydomitrovich! Thanks for the thorough reproduction!
This seems unusual since if a CSI plugin has completed it should no longer be expected to be running and healthy. When this mismatch between healthy and expected plugin task counts occurs, all tasks that need to attach a CSI volume using the plugin in question are unable to do so.
Yeah, agreed that this is totally a bug. That'll impact updates to plugins too, I think. I don't have a good workaround for you at the moment but I'll dig in and see if I can come up with a fix shortly.
Hello @tgross, thanks for taking a look! I'll be monitoring this, as I'm really excited about using the new CSI features.
I'm working up a PR, https://github.com/hashicorp/nomad/pull/7844, which should clear this up. I need to check a few more things, but I'm making good progress on it.