I see this error quite regularly after having updated to the current git head. It seems like current development versions of cabal-install don't work well when multiple instances are running in parallel, i.e. there is probably some race condition around the code that creates this lock/directory.
Just to add some more information: in my use-case cabal fetch is called from a process that's been started with cabal new-run, so there are definitely 2+ copies of cabal running simultaneously.
We definitely still have a couple of issues regarding concurrent execution of multiple cabal processes working against the same ~/.cabal folder :-(
Looks like this is caused by hackage-security.
/cc @edsko
Which failure mode do we want, btw? Shall cabal block until the lock is released, or just fail (without leaving locks behind, unless the process crashed abnormally, e.g. via SIGKILL)?
apt-get fails in this case, so perhaps we should, too?
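The two failure modes can be sketched with a lock directory, relying on `mkdir` being atomic on POSIX. This is only an illustration of the trade-off, not cabal's actual implementation; all names here are hypothetical:

```python
# Illustrative sketch: fail-fast (apt-get style) vs. blocking lock
# acquisition, using an atomic mkdir as the lock primitive.
import errno
import os
import time

def try_lock(path: str) -> bool:
    """apt-get style: one attempt; fail immediately if someone holds the lock."""
    try:
        os.mkdir(path)  # atomic: exactly one concurrent caller succeeds
        return True
    except OSError as e:
        if e.errno == errno.EEXIST:
            return False
        raise

def blocking_lock(path: str, poll: float = 0.1) -> None:
    """Blocking variant: poll until the lock is released."""
    while not try_lock(path):
        time.sleep(poll)

def unlock(path: str) -> None:
    os.rmdir(path)
```

As long as `unlock` runs in a `finally` block, a normal exit (even on exceptions) leaves no lock behind; only an abnormal kill such as SIGKILL can leave a stale one.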
does this mean that the use case of @peti will now just lead to a more standard/reliable sort of failure?
Making it fail doesn't really solve @peti's problem; if cabal new-run calls a process that transitively calls cabal again you will deadlock/unconditionally fail with this logic. Would be a lot better to reduce the amount of time you actually need to take out the lock, and make lock-free reads possible.
@ezyang sure, but why would cabal new-run hold a lock over the package db after it had already completed its build-process and passed control over to the executed build artifact?
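The "short exclusive window plus concurrent reads" idea above can be sketched with shared vs. exclusive `flock(2)` locks: any number of readers (including a child cabal spawned from `cabal new-run`) proceed concurrently, and only a writer takes the exclusive lock, only around the actual mutation. A hedged illustration, not cabal's code:

```python
# Illustrative sketch: shared lock for readers, exclusive lock for the
# brief write window, via advisory flock(2) locks.
import fcntl
from contextlib import contextmanager

@contextmanager
def db_lock(lock_path: str, exclusive: bool = False):
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    with open(lock_path, "a+") as f:
        fcntl.flock(f, mode)  # blocks only on an incompatible holder
        try:
            yield
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Shared locks are mutually compatible, so concurrent `db_lock(path)` readers never wait on each other; an exclusive acquisition waits until all readers are gone.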
How about multiple "cabal fetch" processes running simultaneously? Is that going to create random failures?
Btw, cabal fetch is an interesting problem, since we don't yet have a content-addressed package cache (i.e. one where the filename is the sha256sum). That poses a challenge for generalized package indices that allow package mutation (Hackage's primary index forbids it by policy, but we need to support it for other kinds of indices): two cabal fetch operations running concurrently may have an inconsistent view of which sha256sum is the correct one.
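A content-addressed cache would sidestep that inconsistency: name the cached file after the sha256 of its contents, write to a unique temp file, and move it into place with an atomic rename. Two concurrent fetches that disagree about a mutated package then produce two distinct, correctly named files rather than clobbering each other. A hedged sketch (the layout and names are illustrative, not a proposed cabal design):

```python
# Illustrative sketch: content-addressed cache write with an atomic
# rename, so concurrent writers can never leave a torn or misnamed file.
import hashlib
import os
import tempfile

def store_blob(cache_root: str, blob: bytes) -> str:
    digest = hashlib.sha256(blob).hexdigest()
    final = os.path.join(cache_root, digest)
    fd, tmp = tempfile.mkstemp(dir=cache_root)  # unique temp per writer
    try:
        os.write(fd, blob)
    finally:
        os.close(fd)
    os.replace(tmp, final)  # atomic on POSIX: readers see nothing or all
    return final
```

A reader can validate any cache entry independently of who wrote it, by rehashing the contents and comparing against the filename.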