Nomad: [CSI] "volume in use" error deregistering volume previously associated with a now-stopped job

Created on 27 May 2020 · 24 comments · Source: hashicorp/nomad

Getting a "volume in use" error when trying to deregister, long after the volume has stopped being in use. Possibly a recurrence or variation of https://github.com/hashicorp/nomad/issues/7625, but this time my whole cluster is running Nomad 0.11.2, whereas that issue was on a 0.11.0 RC.

Nomad version

v0.11.2

Operating system and Environment details

Ubuntu 16

Issue

I have jobs that have been stopped for over a day, but they're still somehow preventing me from deregistering the CSI volume that was associated with them.

Nomad Client logs (if appropriate)

[hashidev] hwilkins ~ $ nomad volume deregister test_default_1a
Error deregistering volume: Unexpected response code: 500 (volume in use: test_default_1a)
[hashidev] hwilkins ~ $ nomad volume status test_default_1a
ID                   = test_default_1a
Name                 = test_default_1a
External ID          = vol-<SCRUB>
Plugin ID            = aws-ebs0
Provider             = ebs.csi.aws.com
Version              = v0.6.0-dirty
Schedulable          = false
Controllers Healthy  = 0
Controllers Expected = 0
Nodes Healthy        = 0
Nodes Expected       = 0
Access Mode          = single-node-writer
Attachment Mode      = file-system
Mount Options        = <none>
Namespace            = default

Allocations
ID        Node ID   Task Group           Version  Desired  Status    Created   Modified
090982bb  fed5596e  csi-ebs-hello-world  0        run      complete  1d3h ago  1d3h ago
[hashidev] hwilkins ~ $ nomad alloc status 090982bb
No allocation(s) with prefix or id "090982bb" found
[hashidev] hwilkins ~ $ nomad status csi-ebs-hello-world
Multiple matches found for id "csi-ebs-hello-world"

Volumes:
csi-ebs-hello-world-ebs_1a, csi-ebs-hello-world_1a
[hashidev] hwilkins ~ $ nomad status csi-ebs-hello-world-ebs_1a
ID                   = csi-ebs-hello-world-ebs_1a
Name                 = csi-ebs-hello-world-ebs_1a
External ID          = vol-<SCRUB>
Plugin ID            = aws-ebs0
Provider             = ebs.csi.aws.com
Version              = v0.6.0-dirty
Schedulable          = false
Controllers Healthy  = 0
Controllers Expected = 0
Nodes Healthy        = 0
Nodes Expected       = 0
Access Mode          = single-node-writer
Attachment Mode      = file-system
Mount Options        = <none>
Namespace            = default

Allocations
ID        Node ID   Task Group           Version  Desired  Status    Created   Modified
7eba8625  fed5596e  csi-ebs-hello-world  0        run      complete  1d1h ago  1d1h ago
[hashidev] hwilkins ~ $ nomad status csi-ebs-hello-world_1a
ID                   = csi-ebs-hello-world_1a
Name                 = csi-ebs-hello-world_1a
External ID          = vol-<SCRUB>
Plugin ID            = aws-ebs0
Provider             = ebs.csi.aws.com
Version              = v0.6.0-dirty
Schedulable          = false
Controllers Healthy  = 0
Controllers Expected = 0
Nodes Healthy        = 0
Nodes Expected       = 0
Access Mode          = single-node-writer
Attachment Mode      = file-system
Mount Options        = <none>
Namespace            = default

Allocations
ID        Node ID   Task Group           Version  Desired  Status    Created   Modified
af81bebc  fed5596e  csi-ebs-hello-world  0        run      complete  1d2h ago  1d2h ago

I've also verified via the AWS console that the volume itself is not attached to an instance, but in the available state.

Nomad Server logs (if appropriate)

May 27 04:58:50 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T04:58:50.174Z [ERROR] nomad.fsm: CSIVolumeDeregister failed: error="volume in use: test_default_1a"
May 27 04:58:50 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T04:58:50.174Z [ERROR] http: request failed: method=DELETE path=/v1/volume/csi/test_default_1a error="volume in use: test_default_1a" code=500
May 27 04:58:50 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T04:58:50.174Z [DEBUG] http: request complete: method=DELETE path=/v1/volume/csi/test_default_1a duration=1.07008ms
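
For reference, the CLI deregister maps to the DELETE endpoint shown in these logs, so the same request can be issued directly against the HTTP API. A minimal sketch, assuming the API is reachable on localhost:4646 and no ACL token is required:

# Equivalent of `nomad volume deregister test_default_1a`; returns the same
# 500 "volume in use" error while Nomad still tracks claims on the volume.
curl -X DELETE "http://localhost:4646/v1/volume/csi/test_default_1a"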
Labels: theme/drain, theme/storage, type/bug

Most helpful comment

Is there a workaround for "unsticking" volumes stuck in limbo like this?

The nomad volume deregister -force option was merged for 0.12.0-rc1 but unfortunately it looks like there's a bug in it that isn't going to make it into the GA (see https://github.com/hashicorp/nomad/pull/8380)

To be blunt, I don't know that CSI support can even be called "beta" at this point. The current bugs are severe enough to make CSI effectively unusable, and it doesn't appear from the release notes that any progress was made on it in 0.12.

I really can't disagree with you there, @jfcantu. The CSI project had been my primary focus but we were pretty short-handed for it during development. As you might note from some of the other open issues with the storage tag there's constant struggle with the spec itself and plugins that may not be bug-free themselves. I'll admit I'm not particularly proud of the shape it's in. Unfortunately I had to focus on some other projects for the 0.12.0 release cycle, but post-0.12.0 GA I've got some time carved out to spend on this to get the rest of these bugs worked out.

Is there a release with the force option for Nomad 0.11?

There is not and we won't be backporting that.

All 24 comments

I also just saw the -purge flag and tried that:

[hashidev] hwilkins ~ $ nomad status csi-ebs-hello-world-ebs_1a
ID                   = csi-ebs-hello-world-ebs_1a
Name                 = csi-ebs-hello-world-ebs_1a
External ID          = vol-<SCRUB>
Plugin ID            = aws-ebs0
Provider             = ebs.csi.aws.com
Version              = v0.6.0-dirty
Schedulable          = false
Controllers Healthy  = 0
Controllers Expected = 0
Nodes Healthy        = 0
Nodes Expected       = 0
Access Mode          = single-node-writer
Attachment Mode      = file-system
Mount Options        = <none>
Namespace            = default

Allocations
ID        Node ID   Task Group           Version  Desired  Status    Created   Modified
7eba8625  fed5596e  csi-ebs-hello-world  0        run      complete  1d1h ago  1d1h ago
[hashidev] hwilkins ~ $ nomad stop -purge csi-ebs-hello-world-ebs_1a
No job(s) with prefix or id "csi-ebs-hello-world-ebs_1a" found

It's also probably worth noting that the node where these old allocs ran had been deployed with 0.11.1, but those nodes have since been cycled out and replaced by nodes running 0.11.2.

[hashidev] hwilkins ~ $ nomad node status fed5596e

error fetching node stats: Unexpected response code: 404 (No path to node)
ID              = fed5596e-968a-a260-e819-0b7abf8f64bf
Name            = hashidev-worker-<SCRUB>
Class           = <none>
DC              = us-east-1
Drain           = false
Eligibility     = ineligible
Status          = down
CSI Controllers = aws-ebs0
CSI Drivers     = aws-ebs0
Host Volumes    = <none>
CSI Volumes     = <none>
Driver Status   = docker

Node Events
Time                  Subsystem  Message
2020-05-26T05:12:58Z  Cluster    Node heartbeat missed
2020-05-26T05:12:23Z  CSI        failed fingerprinting with error: rpc error: code = Canceled desc = context canceled
2020-05-26T05:12:22Z  CSI        failed fingerprinting with error: rpc error: code = Canceled desc = context canceled
2020-05-26T05:12:22Z  Drain      Node drain complete
2020-05-26T05:12:21Z  Drain      Node drain strategy set
2020-05-26T04:00:11Z  Storage    Unmount volume
2020-05-26T04:00:04Z  Storage    Mount volume
2020-05-26T03:00:07Z  Storage    Unmount volume
2020-05-26T03:00:04Z  Storage    Mount volume
2020-05-26T02:40:09Z  Storage    Unmount volume

Allocated Resources
CPU         Memory       Disk
0/4250 MHz  0 B/6.4 GiB  0 B/65 GiB

Allocation Resource Utilization
CPU         Memory
0/4250 MHz  0 B/6.4 GiB

error fetching node stats: actual resource usage not present

Allocations
No allocations placed

So first, I'm looking for a way to forcefully get rid of the reference to this volume, and second, to lodge this as a bug :).

I also tried forcing nomad system gc many hours ago, so that doesn't seem to have cleared it up either.
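
In case it's useful to others, this is roughly how I've been forcing a GC cycle; a sketch assuming the default API address (an X-Nomad-Token header would be needed if ACLs require it):

# Force a garbage-collection cycle from the CLI
nomad system gc

# The same thing via the HTTP API (PUT /v1/system/gc)
curl -X PUT "http://localhost:4646/v1/system/gc"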

I'm a little confused by the CLI outputs you've provided here. It looks like you're showing me three volumes:

  • test_default_1a, claimed by alloc 090982bb
  • csi-ebs-hello-world-ebs_1a, claimed by alloc 7eba8625
  • csi-ebs-hello-world_1a, claimed by alloc af81bebc

But then you're trying to run a job stop on the volume, which isn't an operation supported on volumes:

$ nomad stop -purge csi-ebs-hello-world-ebs_1a
No job(s) with prefix or id "csi-ebs-hello-world-ebs_1a" found

By any chance do the jobs have the same name as the volume? I'm wondering if using nomad status :id without the more specific subcommands nomad volume status :id or nomad job status :id is confusing matters here.
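
For example, the more specific subcommands make it unambiguous whether you're looking at the job or at one of the volumes; a sketch using the names from this thread:

# Query the job and the volumes explicitly rather than relying on prefix matching
nomad job status csi-ebs-hello-world
nomad volume status csi-ebs-hello-world_1a
nomad volume status csi-ebs-hello-world-ebs_1a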


Moving on to some observations:

I've also verified via the AWS console that the volume itself is not attached to an instance, but in the available state.

It looks like all 3 allocs that claim these 3 volumes are complete, so having the volume unattached is a good sign. Do you have any logs from when the volumes were detached? That might give us some insight.

This is a little weird:

$ nomad alloc status 090982bb
No allocation(s) with prefix or id "090982bb" found

I'm not sure how we could get into a state where the job would be GC'd out from under us but the allocation could still be queried by the volume API to give us information like complete status. What I'd want to look at is the jobs that are claiming the volumes, to see what they are showing for those allocations.

2020-05-26T05:12:23Z CSI failed fingerprinting with error: rpc error: code = Canceled desc = context canceled

Is the aws-ebs0 plugin still running and healthy on all the nodes? Can you provide a nomad plugin status aws-ebs0?

Hey @tgross , thanks for the response.

You're right that there are 3 volumes that all reference the same volume ID. I was playing around while developing my automation, and ended up creating 3 different volumes for the same EBS volume.

The job was previously called simply csi-ebs-hello-world; I don't think I ever created a job with the other suffixes, so those are probably only in the volume names.

[hashidev] hwilkins ~ $ nomad plugin status aws-ebs0
ID                   = aws-ebs0
Provider             = ebs.csi.aws.com
Version              = v0.6.0-dirty
Controllers Healthy  = 2
Controllers Expected = 2
Nodes Healthy        = 4
Nodes Expected       = 4

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
8b9cd937  ff38c034  nodes       0        run      running  13h29m ago  13h29m ago
e2430e2c  f1b03f8f  nodes       0        run      running  13h29m ago  13h29m ago
94c3e6b7  00f7f888  nodes       0        run      running  13h29m ago  13h29m ago
81d32416  d733c1a3  nodes       0        run      running  13h29m ago  13h29m ago
e1c0f03b  f1b03f8f  controller  11       run      running  13h29m ago  13h29m ago
62504f57  00f7f888  controller  11       run      running  13h29m ago  13h29m ago

However, note that this plugin was broken for some time as I transitioned it to use Nomad ACLs (which I may be doing incorrectly since how to do that isn't documented - the only doc I could find assumed you didn't have Nomad ACLs). I fixed it around 12h ago, and I still can't deregister the volume(s). I'm now wondering if this has something to do with ACLs. I found some more logs floating around on the Nomad leader as I attempt to deregister the volume:

May 27 21:04:35 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:35.726Z [ERROR] nomad.fsm: CSIVolumeDeregister failed: error="volume in use: csi-ebs-hello-world_1a"
May 27 21:04:35 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:35.726Z [ERROR] http: request failed: method=DELETE path=/v1/volume/csi/csi-ebs-hello-world_1a error="volume in use: csi-ebs-hello-world_1a" code=500
May 27 21:04:35 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:35.726Z [DEBUG] http: request complete: method=DELETE path=/v1/volume/csi/csi-ebs-hello-world_1a duration=1.269438ms
May 27 21:04:36 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:36.764Z [DEBUG] http: request complete: method=GET path=/v1/status/leader duration=131.481µs
May 27 21:04:38 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:38.777Z [DEBUG] http: request complete: method=GET path=/v1/status/leader duration=123.815µs
May 27 21:04:40 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:40.549Z [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=server duration=148.313µs
May 27 21:04:41 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:41.768Z [DEBUG] http: request complete: method=GET path=/v1/status/leader duration=129.428µs
May 27 21:04:43 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:43.778Z [DEBUG] http: request complete: method=GET path=/v1/status/leader duration=121.493µs
May 27 21:04:44 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:44.547Z [DEBUG] nomad: memberlist: Stream connection from=127.0.0.1:33592
May 27 21:04:46 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:46.770Z [DEBUG] http: request complete: method=GET path=/v1/status/leader duration=131.599µs
May 27 21:04:48 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:48.781Z [DEBUG] http: request complete: method=GET path=/v1/status/leader duration=117.261µs
May 27 21:04:50 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:50.550Z [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=server duration=135.271µs
May 27 21:04:51 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:51.217Z [DEBUG] worker: dequeued evaluation: eval_id=8d257055-07ce-ced6-a1e1-ae7f47ebcaf6
May 27 21:04:51 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:51.217Z [DEBUG] core.sched: CSI plugin GC scanning before cutoff index: index=1733620 csi_plugin_gc_threshold=1h0m0s
May 27 21:04:51 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:51.217Z [ERROR] core.sched: failed to GC plugin: plugin_id=aws-ebs0 error="Permission denied"
May 27 21:04:51 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:51.217Z [ERROR] worker: error invoking scheduler: error="failed to process evaluation: Permission denied"
May 27 21:04:51 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:51.217Z [DEBUG] worker: nack evaluation: eval_id=8d257055-07ce-ced6-a1e1-ae7f47ebcaf6
May 27 21:04:51 hashidev-nomad-<SCRUB> nomad[8186]:     2020-05-27T21:04:51.217Z [WARN]  nomad: eval reached delivery limit, marking as failed: eval="<Eval "8d257055-07ce-ced6-a1e1-ae7f47ebcaf6" JobID: "csi-plugin-gc" Namespace: "-">"

I also see some of this:

May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:     2020-05-27T21:00:46.412Z [DEBUG] worker: dequeued evaluation: eval_id=9e17f245-8036-101c-a6f7-1ec0a12b61e2
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:     2020-05-27T21:00:46.412Z [DEBUG] core.sched: node GC scanning before cutoff index: index=1721801 node_gc_threshold=24h0m0s
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:     2020-05-27T21:00:46.412Z [DEBUG] worker: ack evaluation: eval_id=9e17f245-8036-101c-a6f7-1ec0a12b61e2
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:     2020-05-27T21:00:46.414Z [ERROR] nomad.volumes_watcher: error releasing volume claims: namespace=default volume_id=csi-ebs-hello-world-ebs_1a error="1 error occurred:
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:         * could not detach from node: node lookup failed: index error: UUID must be 36 characters
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]: "
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:     2020-05-27T21:00:46.414Z [ERROR] nomad.volumes_watcher: error releasing volume claims: namespace=default volume_id=csi-ebs-hello-world_1a error="1 error occurred:
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:         * could not detach from node: node lookup failed: index error: UUID must be 36 characters
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]: "
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:     2020-05-27T21:00:46.414Z [ERROR] nomad.volumes_watcher: error releasing volume claims: namespace=default volume_id=test_default_1a error="1 error occurred:
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:         * could not detach from node: node lookup failed: index error: UUID must be 36 characters
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]: "
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:     2020-05-27T21:00:46.416Z [ERROR] nomad.volumes_watcher: error releasing volume claims: namespace=default volume_id=test_default_1a error="1 error occurred:
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:         * could not detach from node: node lookup failed: index error: UUID must be 36 characters
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]: "
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:     2020-05-27T21:00:46.416Z [ERROR] nomad.volumes_watcher: error releasing volume claims: namespace=default volume_id=csi-ebs-hello-world-ebs_1a error="1 error occurred:
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:         * could not detach from node: node lookup failed: index error: UUID must be 36 characters
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]: "
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:     2020-05-27T21:00:46.416Z [ERROR] nomad.volumes_watcher: error releasing volume claims: namespace=default volume_id=csi-ebs-hello-world_1a error="1 error occurred:
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:         * could not detach from node: node lookup failed: index error: UUID must be 36 characters
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]: "
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:     2020-05-27T21:00:46.417Z [ERROR] nomad.volumes_watcher: error releasing volume claims: namespace=default volume_id=test_default_1a error="1 error occurred:
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:         * could not detach from node: node lookup failed: index error: UUID must be 36 characters
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]: "
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:     2020-05-27T21:00:46.417Z [ERROR] nomad.volumes_watcher: error releasing volume claims: namespace=default volume_id=csi-ebs-hello-world-ebs_1a error="1 error occurred:
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:         * could not detach from node: node lookup failed: index error: UUID must be 36 characters
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]: "
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:     2020-05-27T21:00:46.418Z [ERROR] nomad.volumes_watcher: error releasing volume claims: namespace=default volume_id=csi-ebs-hello-world_1a error="1 error occurred:
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]:         * could not detach from node: node lookup failed: index error: UUID must be 36 characters
May 27 21:00:46 hashidev-nomad-<SCRUB>> nomad[8186]: "

Hey @tgross, I've sent my plugin job definitions to nomad-oss-debug. Here's my Nomad anonymous ACL policy:

namespace "default" {
  policy = "read"

  capabilities = [
    # I want to get rid of this line
    "csi-mount-volume",
    "submit-job",
  ]
}

# View status of Nomad servers
agent {
  policy = "read"
}

# View status of Nomad clients
node {
  policy = "read"
}

# manage Raft cluster status
operator {
  policy = "read"
}

# manage namespace resource governing
quota {
  policy = "read"
}

# read plugins, like the CSI plugin
plugin {
  policy = "read"
}
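
For completeness, this is roughly how that policy gets applied as the anonymous policy; a sketch assuming the HCL above is saved as anonymous.hcl and NOMAD_TOKEN holds a management token:

# Naming the policy "anonymous" makes it apply to requests that carry no token
nomad acl policy apply -description "anonymous policy" anonymous anonymous.hcl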

Hey @tgross, any thoughts on how to work around this bug? I can't seem to get Nomad to "let go" of these volume definitions because it's still tracking those allocations. And of course none of the allocations exist when I run nomad alloc status on them.

Not sure if it's helpful or not, but I also tried deploying a job to use this volume, since it's theoretically defined, and here's the relevant placement failure:

Constraint "CSI volume csi-ebs-hello-world-ebs_1a has exhausted its available writer claims": 1 nodes excluded by filter

Note that I've also recycled all nodes in the cluster at this point. The only thing I haven't done is totally destroy & recreate the Nomad raft DB from my deployment scripts. I'm fairly confident that will fix this, but I'd really like to figure out how to fix this in case we hit this in production at some point. However, if I can't get this fixed in the next day, a clean wipe of the DB is what I'm going to have to do so I can keep going on my automation here.

I got this to happen again. It seems like the problem is related to draining a node. Say my job using the EBS volume was running on node x. I issued a drain on node x, and the job tried to move to node y. However, the node plugin job (which is a system job, as prescribed) didn't unmount the volume from the node. Then, when I terminated the drained node, Nomad still thought the volume's writer claims were exhausted, even though it isn't claimed by any node (since the node originally claiming it has been destroyed).

Constraint "CSI volume csi-ebs-staticvolume-alt-service-test has exhausted its available writer claims": 1 nodes excluded by filter

[hashidev] hwilkins ~ $ aws ec2 describe-volumes --volume-ids vol-00b...
{
    "Volumes": [
        {
            "Attachments": [],
            "AvailabilityZone": "us-east-1c",
            "CreateTime": "2020-05-18T02:39:41.645000+00:00",
            "Encrypted": true,
            "KmsKeyId": "arn:aws:kms:us-east-1:<NUM>:key/<KEY>",
            "Size": 1,
            "SnapshotId": "",
            "State": "available",
            "VolumeId": "vol-00b...",
            "Iops": 100,
            "Tags": [
                {
                    "Key": "Hashistack",
                    "Value": "hashidev..."
                },
                {
                    "Key": "JobName",
                    "Value": "csi-ebs-test-default"
                }
            ],
            "VolumeType": "gp2",
            "MultiAttachEnabled": false
        }
    ]
}
[hashidev] hwilkins ~ $ nomad volume status csi-ebs-staticvolume-hello-world-vol1
ID                   = csi-ebs-staticvolume-hello-world-vol1
Name                 = csi-ebs-staticvolume-hello-world-vol1
External ID          = vol-00b...
Plugin ID            = aws-ebs0
Provider             = ebs.csi.aws.com
Version              = v0.6.0-dirty
Schedulable          = false
Controllers Healthy  = 1
Controllers Expected = 1
Nodes Healthy        = 0
Nodes Expected       = 0
Access Mode          = single-node-writer
Attachment Mode      = file-system
Mount Options        = <none>
Namespace            = default

Allocations
ID        Node ID   Task Group                Version  Desired  Status    Created     Modified
80da1a5b  7c981673  staticvolume-hello-world  7        stop     complete  12h46m ago  4h7m ago
17596645  0e1c4bcf  staticvolume-hello-world  7        stop     complete  18h9m ago   12h46m ago
[hashidev] hwilkins ~ $ nomad volume deregister csi-ebs-staticvolume-hello-world-vol1
Error deregistering volume: Unexpected response code: 500 (rpc error: volume in use: csi-ebs-staticvolume-hello-world-vol1)

Faced a similar issue when trying to deregister an absent volume.

Here's how to reproduce:

  • Run CSI plugin jobs
  • Properly register volume
  • Run some job claiming it
  • Stop it with -purge flag
  • Remove CSI plugin jobs
  • Remove volume resource

Nomad thinks my volume is still there in the UI/CLI and I'm not able to deregister it:

$ nomad system gc  # has no effect

$ nomad volume status 
Container Storage Interface
ID      Name    Plugin ID          Schedulable  Access Mode
mysql0  mysql0  csi.hetzner.cloud  false        single-node-writer

$ nomad volume deregister mysql0
Error deregistering volume: Unexpected response code: 500 (volume in use: mysql0)

Obviously my bad for doing things in the wrong order, but I would still be glad to have something like a nomad volume deregister -force flag for such cases.

@holtwilkins I'm looking into this, but in the meanwhile Nomad 0.11.3 was shipped with some improvements to the volume claim reaping (ref https://github.com/hashicorp/nomad/blob/master/CHANGELOG.md#0113-june-5-2020) so it might be worth your while to check that out.

Thanks @tgross , I've reproduced this same issue on Nomad 0.11.3 shortly after it was released, unfortunately.

It seems like the problem is related to draining a node. Say my job using the EBS volume was running on node x. I issued a drain on node x, and the job tried to move to node y. However, the node plugin job (which is a system job, as prescribed) didn't unmount the volume from the node.

Ok, so this is probably related to https://github.com/hashicorp/nomad/issues/8232 and https://github.com/hashicorp/nomad/issues/8080

The advice I gave to folks in those issues applies here as well, but that only helps prevent the issue in the future by reducing the risk of triggering the race condition between plugin teardown and volume claim reaping:

  • Run multiple controller plugins, spread across the cluster such that a node that drains always has a controller to talk to.
  • Run the node plugins as system jobs, as you're doing
  • Use the -ignore-system flag while node draining to ensure that the node plugin stays on the host even after it's done draining, so as to avoid the race with the volumewatcher (see the sketch below).
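
A minimal sketch of such a drain, using the node ID from earlier in this thread as a placeholder:

# Drain the node but leave system jobs (including the CSI node plugin) running
nomad node drain -enable -ignore-system -yes fed5596e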

I've also opened https://github.com/hashicorp/nomad/issues/8251 for the nomad volume deregister -force flag suggestion, which is something I've been meaning to do for a while.

Hi,

I've also encountered the same problem. We have Nomad client nodes in an ASG, and when they got rotated/replaced we were left with multiple clients stuck in the "ineligible" state even though they were terminated long ago. Removing the job using the volume, the volume itself, and the plugin did not help.

It's probably worth mentioning that, as part of autoscaling, clients are automatically drained by a Lambda before termination proceeds, and that worked as expected.

Every time GC is run or one of the servers is restarted, there are multiple messages like this:
Jun 23 15:44:26 nomad[8303]: 2020-06-23T15:44:26.510Z [ERROR] nomad.fsm: DeleteNode failed: error="csi plugin delete failed: csi_plugins missing plugin aws-ebs0"

I've even changed the defaults on the servers to:

node_gc_threshold = "30s"
csi_plugin_gc_threshold = "30s"

but it just made the messages appear more often, not only when forcing GC via the command line.

Cheers

Is there a workaround for "unsticking" volumes stuck in limbo like this? I have one that's currently attached to a five-days-dead job that I don't know how to reclaim.

Is there a workaround for "unsticking" volumes stuck in limbo like this? I have one that's currently attached to a five-days-dead job that I don't know how to reclaim.

Update: my five-days-dead job is now over two weeks dead, and the EBS volume is still attached.

To be blunt, I don't know that CSI support can even be called "beta" at this point. The current bugs are severe enough to make CSI effectively unusable, and it doesn't appear from the release notes that any progress was made on it in 0.12.

Hello, I'm facing a similar issue. Is there a release with the force option for Nomad 0.11?
It seems that http://localhost:4646/v1/volume/csi/id_of_the_volume does not update the write allocs for non-existent allocations.
It prevents any job from using this volume.
The only solution I had was to register the same volume with another ID.
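
For what it's worth, this is how I've been inspecting the stale claims; a sketch assuming the default API address and that the claim maps appear as ReadAllocs/WriteAllocs in the volume API response:

# Show the allocation claims Nomad still records for the volume
curl -s "http://localhost:4646/v1/volume/csi/id_of_the_volume" | jq '{ReadAllocs, WriteAllocs}'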

I have the same message and can't detach/attach my volume, which has write allocations linked in the volume API, but those allocations do not exist in the allocation API:

could not detach from node: node lookup failed: index error: UUID must be 36 characters

The error seems to come from https://github.com/hashicorp/nomad/blob/master/nomad/state/state_store.go#L723,
thrown from https://github.com/hashicorp/nomad/blob/master/nomad/volumewatcher/volume_watcher.go#L312.

Could it be that an empty NodeID is preventing the volume from being detached and the lost allocations from being removed?
Could the NodeID being empty be caused by a lost node?

EDIT: my volume isn't attached, it is just not claimable.

Is there a workaround for "unsticking" volumes stuck in limbo like this?

The nomad volume deregister -force option was merged for 0.12.0-rc1 but unfortunately it looks like there's a bug in it that isn't going to make it into the GA (see https://github.com/hashicorp/nomad/pull/8380)

To be blunt, I don't know that CSI support can even be called "beta" at this point. The current bugs are severe enough to make CSI effectively unusable, and it doesn't appear from the release notes that any progress was made on it in 0.12.

I really can't disagree with you there, @jfcantu. The CSI project had been my primary focus but we were pretty short-handed for it during development. As you might note from some of the other open issues with the storage tag there's constant struggle with the spec itself and plugins that may not be bug-free themselves. I'll admit I'm not particularly proud of the shape it's in. Unfortunately I had to focus on some other projects for the 0.12.0 release cycle, but post-0.12.0 GA I've got some time carved out to spend on this to get the rest of these bugs worked out.

Is there a release with the force option for Nomad 0.11?

There is not and we won't be backporting that.

I really can't disagree with you there, @jfcantu. The CSI project had been my primary focus but we were pretty short-handed for it during development. As you might note from some of the other open issues with the storage tag there's constant struggle with the spec itself and plugins that may not be bug-free themselves. I'll admit I'm not particularly proud of the shape it's in. Unfortunately I had to focus on some other projects for the 0.12.0 release cycle, but post-0.12.0 GA I've got some time carved out to spend on this to get the rest of these bugs worked out.

That's great info. Thanks for the update! I'm eagerly anticipating the next steps.

Hello,

I have just updated to 0.12.0, and it seems your fix is in.
Most of my ghost volumes have been removed :tada:

However, I am still unable to remove this ghost allocation, which leaves the volume in a locked state:

# nomad volume status csi-disk-0
ID                   = csi-disk-0
Name                 = csi-disk-0
External ID          = projects/myproject/zones/myzone/disks/data-csi-disk-0
Plugin ID            = gce-pd
Provider             = pd.csi.storage.gke.io
Version              = v0.7.0-gke.0
Schedulable          = true
Controllers Healthy  = 2
Controllers Expected = 2
Nodes Healthy        = 2
Nodes Expected       = 2
Access Mode          = single-node-writer
Attachment Mode      = file-system
Mount Options        = <none>
Namespace            = default

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
88cfffe3  b1042488  csi-disk  47       run      running  3d4h ago   2d2h ago
bed001f0  511b5f41  csi-disk  47       stop     lost     3d14h ago  3d4h ago
# nomad alloc-status 88cfffe3
No allocation(s) with prefix or id "88cfffe3" found
# nomad node status b1042488
No node(s) with prefix "b1042488" found
# nomad volume deregister csi-disk-0
Error deregistering volume: Unexpected response code: 500 (volume in use: csi-disk-0)
# nomad volume deregister -force csi-disk-0
Are you sure you want to force deregister volume "csi-disk-0"? [y/N] y
Error deregistering volume: Unexpected response code: 500 (volume in use: csi-disk-0)

Corresponding log

     2020-07-10T07:09:57.713Z [ERROR] nomad.fsm: CSIVolumeDeregister failed: error="volume in use: csi-disk-0"
    2020-07-10T07:09:57.714Z [ERROR] http: request failed: method=DELETE path=/v1/volume/csi/csi-disk-0?force=true error="volume in use: csi-disk-0" code=500

Ok, that's an interesting state: the allocation doesn't exist _at all_ but is showing _running_ status for the volume. That gives me a new clue on where to look.

Wanted to give a quick status update. I've landed a handful of PRs that will be released as part of the upcoming 0.12.2 release:

  • #8561 retries controller RPCs so that we take advantage of controllers deployed in a HA configuration.
  • #8572 #8579 are some improved plumbing that makes volume claim reaping synchronous in common cases, which reduces the number of places where things can go wrong (and makes it easier to reason about).
  • #8580 uses that plumbing to drive the volume claim unpublish step from the client, so that in most cases (except when we lose touch with the client) we're running the volume unpublish synchronously as part of allocation shutdown.
  • #8605 improves our error handling so that the checkpointing we do will work correctly by ignoring "you already did that" errors.
  • #8607 fixes some missing ACLs and Region flags in the Nomad leader

I believe these fixes combined should get us into pretty good shape, and #8584 will give you an escape hatch to manually detach the volume via nomad volume detach once that's merged.
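
Once that's merged, the escape hatch would look roughly like this; a sketch using the volume and node IDs from the comment above as placeholders:

# Manually release the claim for a volume on a specific node
nomad volume detach csi-disk-0 b1042488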

For sake of our planning, I'm going to close this issue. We'll continue to track progress of this set of problems in https://github.com/hashicorp/nomad/issues/8100.

I'm still seeing this issue on 0.12.4. Even when all allocations are failed or stopped (or I run garbage collection so they don't exist anymore), the volume status still shows these ghost allocations.

I can deregister if I use -force, but it doesn't seem to garbage collect these ghost allocations correctly.
