Nomad: CSI volume keeps references to failed allocations

Created on 10 Jun 2020 · 9 comments · Source: hashicorp/nomad

Nomad version

0.11.3

Issue

  1. I built my own CSI plugin which mounts volumes from Gluster
  2. I ran the CSI plugin and registered a new volume
  3. I ran a job which uses the volume (a rough sketch of these steps is below)
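
For reference, a rough sketch of steps 2 and 3 using the standard Nomad CLI; the plugin ID, volume ID, and file names are hypothetical placeholders:

    # Register the volume from an HCL spec (field values here are placeholders):
    #   id              = "gluster-vol0"
    #   name            = "gluster-vol0"
    #   type            = "csi"
    #   external_id     = "vol0"
    #   plugin_id       = "glusterfs"
    #   access_mode     = "single-node-writer"
    #   attachment_mode = "file-system"
    nomad volume register volume.hcl

    # Check that the plugin and volume look healthy, then run the job that claims it.
    nomad plugin status glusterfs
    nomad volume status gluster-vol0
    nomad job run job.nomad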

I don't know exactly what I did afterwards. I probably removed the volume while the job was still pending. The problem now is that I cannot remove the volume because the allocation is still pending and stays there.

[screenshot: "stuck" allocation shown in the UI]

I have stopped the plugin and the job, but the allocation is still there. It is only visible in the UI; when I query the allocation with the Nomad CLI, I cannot find it.

I tried running a garbage collection and restarting the client and server, but nothing changed.
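
For illustration, the cleanup attempts described above amount to roughly the following; the job, allocation, and volume names are placeholders:

    # Stop the plugin job and the consuming job, then force a garbage collection.
    nomad job stop gluster-csi-plugin
    nomad job stop example
    nomad system gc

    # The allocation shown in the UI cannot be found from the CLI,
    # yet the volume still lists it.
    nomad alloc status 7e9627a5        # hypothetical alloc ID; the CLI reports no matching allocation
    nomad volume status gluster-vol0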

How can I remove the allocation?

Labels: theme/storage, type/bug

All 9 comments

The problem seems to be that allocations from failed jobs are not removed from the volume. I could reproduce the problem with these steps:

  1. Register a volume
  2. Run a job which uses the volume
  3. The job fails for some reason and is left in state pending or failed.
  4. Stop the job with nomad stop -purge

=> The volume still holds the allocation for the job.

In my case I registered a volume which could not be mounted because no volume exists for its external ID.
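
A sketch of that reproduction, reusing the same hypothetical names as above and an external_id with no backing Gluster volume:

    # 1. Register a volume whose external_id points at nothing on the storage side.
    nomad volume register volume.hcl

    # 2. Run a job that claims it; the mount fails and the job sits in pending/failed.
    nomad job run job.nomad
    nomad job status example

    # 3. Purge the job entirely.
    nomad job stop -purge example

    # 4. The volume still lists the purged job's allocation among its claims,
    #    so a plain deregister is refused because the volume appears to be in use.
    nomad volume status gluster-vol0
    nomad volume deregister gluster-vol0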

Hi @mkrueger-sabio! Thanks for opening this issue. This is definitely unexpected behavior and I'll be digging into this.

It is only visible in the UI; when I query the allocation with the Nomad CLI, I cannot find it.

This is an interesting detail. ~Can the volume be claimed by a new allocation at this point, or does the Nomad server still think it has a claim?~ Never mind, I see the answer below:

In my case I registered a volume which could not be mounted because no volume exists for its external ID.

So what we end up with is a Nomad-registered volume that has no physical counterpart, and because of that Nomad can't clean up the allocs that claimed it? It shouldn't be possible to write the claim in that case, but that may be where the bug is.

I encountered much the same problem, but with "running" allocations that I stopped.
https://github.com/hashicorp/nomad/issues/8285

Running into the same issue. Volumes reference a non-existent allocation and cannot be removed. I'm not sure of any way to manually force these volumes out of existence (the deregister force option unfortunately doesn't work), so I assume they will likely be stuck there until a fix is released.

Hey folks, just FYI: we shipped a nomad volume deregister -force command in 0.12.0, which might help you out here. In the meantime, we're ramping up to wrap up these remaining CSI issues over the next couple of weeks, so hopefully we should have some progress for you soon.
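
Usage is along these lines (the volume ID is a placeholder):

    # Force-deregister a volume that Nomad still believes has claims (Nomad 0.12.0+).
    nomad volume deregister -force gluster-vol0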

Thanks, this helped to remove a lot of volumes.

I still have the problem that I cannot remove a volume which has a pending allocation, even though the allocation does not exist anymore.

Understood. I'm pretty sure I know what's going on there and I'm working on a fix for this set of problems.

Wanted to give a quick status update. I've landed a handful of PRs that will be released as part of the upcoming 0.12.2 release:

  • #8561 retries controller RPCs so that we take advantage of controllers deployed in an HA configuration.
  • #8572 #8579 are some improved plumbing that makes volume claim reaping synchronous in common cases, which reduces the number of places where things can go wrong (and makes it easier to reason about).
  • #8580 uses that plumbing to drive the volume claim unpublish step from the client, so that in most cases (except when we lose touch with the client) we're running the volume unpublish synchronously as part of allocation shutdown.
  • #8605 improves our error handling so that the checkpointing we do will work correctly by ignoring "you already did that" errors.
  • #8607 fixes some missing ACLs and region flags in the Nomad leader.

I believe these fixes combined should get us into pretty good shape, and #8584 will give you an escape hatch to manually detach the volume via nomad volume detach once that's merged.
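
For what it's worth, once that command ships the escape hatch would look roughly like this; the volume and node IDs are placeholders, and the exact arguments may differ in the released version:

    # Manually release a stuck claim for a volume against a specific node.
    nomad volume detach gluster-vol0 9a5c1d2e    # <volume id> <node id>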

For sake of our planning, I'm going to close this issue. We'll continue to track progress of this set of problems in https://github.com/hashicorp/nomad/issues/8100.
