Cabal: Intermittent Appveyor failure: "you don't have permission to modify this file"

Created on 19 Oct 2016 · 5 comments · Source: haskell/cabal

This kind of failure happens fairly frequently on AppVeyor:

Build log ( C:\Users\appveyor\AppData\Roaming\cabal\logs\base16-bytestring-0.1.1.6.log ):
Registering base16-bytestring-0.1.1.6...
cabal.exe: 'C:\ProgramData\chocolatey\lib\ghc\tools\ghc-8.0.1\bin\ghc-pkg.exe'
exited with an error:
base16-bytestring-0.1.1.6: Warning: haddock-interfaces:
C:\Users\appveyor\AppData\Roaming\cabal\doc\x86_64-windows-ghc-8.0.1\base16-bytestring-0.1.1.6\html\base16-bytestring.haddock
doesn't exist or isn't a file
base16-bytestring-0.1.1.6: Warning: haddock-html:
C:\Users\appveyor\AppData\Roaming\cabal\doc\x86_64-windows-ghc-8.0.1\base16-bytestring-0.1.1.6\html
doesn't exist or isn't a directory
ghc-pkg.exe:
C:\Users\appveyor\AppData\Roaming\ghc\x86_64-mingw32-8.0.1\package.conf.d\package.cache:
you don't have permission to modify this file
cabal: Leaving directory 'C:\Users\appveyor\AppData\Local\Temp\cabal-tmp-836\base16-bytestring-0.1.1.6'
Installed cryptohash-sha256-0.11.100.1
Configuring hashable-1.2.4.0...

https://ci.appveyor.com/project/23Skidoo/cabal/build/%232365%20(master)

The obvious explanation is insufficient locking, but it's not altogether clear why: we DO take an MVar lock for copying and registering. Maybe there is some sort of concurrent reader/writer problem going on.
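That MVar discipline can be sketched roughly as follows. This is a hypothetical illustration, not cabal-install's actual code; `registerPkg` is a stand-in for invoking ghc-pkg. The point is that the MVar only serialises the copy/register steps against each other:

```haskell
import Control.Concurrent.MVar

-- Hypothetical sketch: a single MVar used as a mutex so that at most one
-- copy/register step runs at a time.  Names are illustrative only.
registerPkg :: MVar () -> String -> IO ()
registerPkg lock pkg =
  withMVar lock $ \() ->            -- blocks while another register is running
    putStrLn ("Registering " ++ pkg ++ "...")

main :: IO ()
main = do
  lock <- newMVar ()
  registerPkg lock "base16-bytestring-0.1.1.6"
```

Note that nothing here stops a concurrently running ghc from *reading* the package database while a register is in flight, which turns out to be the crux of the problem.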

continuous-integration windows


All 5 comments

@arybczak and I have a diagnosis, and @arybczak is working on a solution in ghc/ghc-pkg and possibly also a workaround to use with existing ghc versions.

So, cabal-install and stack are careful to only run one ghc-pkg register at a time; however, this is not enough to avoid the problem.

The failing scenario goes like this:

  • ghc opens the package.cache file with a share mode that allows other readers and writers but not processes that will delete the file
  • ghc-pkg tries to do an atomic rename to replace the package.cache file. This involves opening the target file with a "delete" share mode. This conflicts with the share mode that ghc used when reading the file, and so the open/rename fails.

This problem cannot be solved simply by changing the share mode. It would not be OK to have ghc open the file with a share mode that allows deletion: ghc-pkg could then overwrite the file, but ghc would see a corrupted version of it (the file would either appear truncated or produce a read error).

This problem cannot be solved by using the atomic overwrite trick, because that simply does not work on Windows. Windows supports atomic rename but does not allow one process to continue to read the old file while another has replaced the file with new content.

The solution is proper reader/writer file locking: ghc and ghc-pkg have to cooperate on it. This will also let us do the locking properly, making it actually safe to run ghc-pkg registration updates concurrently, which is ultimately better since every tool that calls ghc-pkg can benefit.
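The reader/writer locking that was eventually added can be sketched with the `hLock`/`hTryLock` API that `base` gained in GHC 8.2 (`GHC.IO.Handle.Lock`), which was motivated in part by this class of bug. A minimal sketch, using an illustrative lock-file name; the handles stand in for a reader (ghc) and a writer (ghc-pkg):

```haskell
import GHC.IO.Handle.Lock (LockMode (..), hLock, hTryLock)
import System.IO

main :: IO ()
main = do
  writeFile "package.cache.lock" ""        -- illustrative lock file
  -- a reader (e.g. ghc) takes a shared lock ...
  r <- openFile "package.cache.lock" ReadMode
  hLock r SharedLock
  -- ... so a writer (e.g. ghc-pkg) cannot take an exclusive lock yet
  w <- openFile "package.cache.lock" ReadWriteMode
  ok <- hTryLock w ExclusiveLock
  print ok                                 -- False: a reader holds the lock
  hClose r                                 -- reader done; shared lock released
  ok' <- hTryLock w ExclusiveLock
  print ok'                                -- True: safe to replace the cache now
  hClose w
```

The writer only replaces package.cache while holding the exclusive lock, so a reader can never observe a half-written or deleted file.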

A workaround in cabal-install/stack is to switch from simply excluding writers from each other to full reader/writer locking for registration, where configure/build counts as a reader. This will significantly delay when registration can be done, and may reduce overall build parallelism.
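The shape of that workaround can be sketched as an in-process reader/writer lock in STM. This is a hypothetical illustration with invented names, not cabal-install's implementation: configure/build steps enter as readers, registration as the sole writer:

```haskell
import Control.Concurrent.STM

-- Hypothetical sketch of the workaround's locking discipline.
-- A reader count in a TVar: n > 0 means n readers; -1 means one writer.
newtype RW = RW (TVar Int)

newRW :: IO RW
newRW = RW <$> newTVarIO 0

beginBuild, endBuild, beginRegister, endRegister :: RW -> IO ()
beginBuild (RW v) = atomically $ do
  n <- readTVar v
  check (n >= 0)                 -- wait while a registration is in progress
  writeTVar v (n + 1)
endBuild (RW v) = atomically $ modifyTVar' v (subtract 1)
beginRegister (RW v) = atomically $ do
  n <- readTVar v
  check (n == 0)                 -- wait until no configure/build is reading
  writeTVar v (-1)
endRegister (RW v) = atomically $ writeTVar v 0
```

Any number of builds can overlap, but a registration waits for all of them to finish, which is exactly why this workaround can delay registration and reduce parallelism.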

Replying to @dcoutts:

  "@arybczak and I have a diagnosis, and @arybczak is working on a solution in ghc/ghc-pkg and possibly also a workaround to use with existing ghc versions."

Nice, this bug is super-annoying.

Relevant GHC ticket filed by @arybczak: https://ghc.haskell.org/trac/ghc/ticket/13194

That GHC bug seems to have been fixed in 8.2, but our AppVeyor builds are on 8.0. On the other hand, I don't think I've seen this failure recently, so maybe it stopped of its own accord?
