Stack: Failing to unpack GHC (sometimes)

Created on 19 Jun 2019 · 13 Comments · Source: commercialhaskell/stack

General info

For the last couple of days I've been running into an issue where untarring GHC fails:

Preparing to download ghc-8.6.5 ...
ghc-8.6.5: download has begun
ghc-8.6.5:   17.49 MiB / 175.83 MiB (  9.95%) downloaded...
ghc-8.6.5:   47.24 MiB / 175.83 MiB ( 26.87%) downloaded...
ghc-8.6.5:   76.67 MiB / 175.83 MiB ( 43.61%) downloaded...
ghc-8.6.5:  107.02 MiB / 175.83 MiB ( 60.86%) downloaded...
ghc-8.6.5:  136.92 MiB / 175.83 MiB ( 77.87%) downloaded...
ghc-8.6.5:  167.14 MiB / 175.83 MiB ( 95.06%) downloaded...
ghc-8.6.5:  175.83 MiB / 175.83 MiB (100.00%) downloaded...
Downloaded ghc-8.6.5.
Unpacking GHC into /home/vsts_azpcontainer/.stack/programs/x86_64-linux/ghc-8.6.5.temp/ ...
Received ExitFailure (-15) when running
Raw command: /bin/tar Jxf /home/vsts_azpcontainer/.stack/programs/x86_64-linux/ghc-8.6.5.tar.xz
Run from: /home/vsts_azpcontainer/.stack/programs/x86_64-linux/ghc-8.6.5.temp/


Error: Error encountered while unpacking GHC with
         tar Jxf /home/vsts_azpcontainer/.stack/programs/x86_64-linux/ghc-8.6.5.tar.xz
         run in /home/vsts_azpcontainer/.stack/programs/x86_64-linux/ghc-8.6.5.temp/

       The following directories may now contain files, but won't be used by stack:
         - /home/vsts_azpcontainer/.stack/programs/x86_64-linux/ghc-8.6.5.temp/
         - /home/vsts_azpcontainer/.stack/programs/x86_64-linux/ghc-8.6.5/

       For more information consider rerunning with --verbose flag

Steps to reproduce

I don't have exact steps, but the code and CI builds are all open and available.

The code is available at: https://github.com/magthe/ci-test-hs/ (the branch Add Azure Pipelines)

Examples of CI builds at:

Building the image locally (docker build -t foo:0 .) failed at first; I then followed the suggestion and added --verbose, and it succeeded. However, the CI builds keep failing sporadically.

Expected

I'm used to stack setup working like a charm.

Actual

Well, see above.

Stack version

The version I'm using on VMs is the pre-built 2.1.1 downloaded from GitHub, e.g. https://github.com/magthe/ci-test-hs/blob/153ca80eaca23eae6444abdbf32e0e3b91240d76/.travis.yml#L15

The version used in containers, including when building images, is the one found in fpco/stack-build:lts-13 (I believe that's been fpco/stack-build:lts-13.25 and thus stack 2.1.1).

Method of installation

See above.

bug

All 13 comments

I'm also seeing this issue with Snapcraft which uses Multipass to build snap packages. It happens every time on a fresh build. https://forum.snapcraft.io/t/haskell-stack-snaps-help/11909

I'm really confused by this one, and would love to hear some thoughts from others. I can't see any reason why a SIGTERM would be sent to Stack, what process would be sending it, or what changes in the Stack.Setup codepath could generate this difference.
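For anyone wondering how the -15 in the error relates to SIGTERM: on Unix, Haskell's process library reports a child terminated by signal N as ExitFailure (-N), so -15 corresponds to signal 15. Here is a minimal shell sketch for checking that mapping. The helper name signal_name is made up for this illustration and is not part of Stack.

```shell
# signal_name is a hypothetical helper: it takes an exit code as printed by
# Stack (for example -15) and prints the name of the matching signal.
signal_name() {
  code=$1
  # Strip the leading minus sign: -15 becomes 15.
  signum=${code#-}
  # kill -l <number> prints the signal name for that number.
  kill -l "$signum"
}

signal_name -15
```

On a typical Linux shell this should print TERM, i.e. the tar process was terminated by SIGTERM rather than failing on its own.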

I _am_ able to reproduce.

@jamesdbrock provided a Dockerfile for reproing this in #4889, but it's not a reliable repro. I'm not familiar with Snapcraft @dmp1ce. Do you think you'd be able to put together a reliable Docker-based repro for easier testing?

It isn't easy to get snapd installed in a docker container. snapd is needed to install snapcraft which in turn uses multipass to create a virtual machine for building snap images. Probably the easiest thing to do is run snapcraft on a spare computer running Ubuntu. Multipass might work in LXD but I'm not sure because multipass requires a KVM device.

On Ubuntu the steps would be:

  • Ensure snapd is running (sudo apt install snapd)
  • Install snapcraft (sudo snap install snapcraft --classic)
  • Get basic snap of a stack project (git clone https://github.com/dmp1ce/snapcraft-stack-example.git)
  • Try to build project with snapcraft (snapcraft)
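The steps above can be collected into one small script. This is only a sketch: the run wrapper and the DRY_RUN switch are additions for illustration, and the cd into the cloned directory is an assumption about where snapcraft is meant to be invoked.

```shell
# run is a hypothetical wrapper: with DRY_RUN=1 it prints the command
# instead of executing it, so the steps can be reviewed first.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

repro_steps() {
  # Ensure snapd is installed and running.
  run sudo apt install snapd
  # Install snapcraft.
  run sudo snap install snapcraft --classic
  # Get the basic snap of a stack project.
  run git clone https://github.com/dmp1ce/snapcraft-stack-example.git
  # Assumption: snapcraft is run from inside the cloned repository.
  run cd snapcraft-stack-example
  # Try to build the project with snapcraft.
  run snapcraft
}
```

Setting DRY_RUN=1 before calling repro_steps lists the commands without touching the system. Calling repro_steps with DRY_RUN unset runs them for real.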

I tried to look into stack setup with strace and I got this glimpse.

strace -e %signal stack setup --verbose
2019-06-24 00:20:51.863074: [debug] Unpacking /root/.stack/programs/x86_64-linux/ghc-8.6.5.tar.xz
2019-06-24 00:20:51.864394: [debug] Run process within /root/.stack/programs/x86_64-linux/ghc-8.6.5.temp/: /bin/tar Jxf /root/.stack/programs/x86_64-linux/ghc-8.6.5.tar.xz
rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
tgkill(14, 16, SIGPIPE)                 = 0
kill(21, SIGTERM)                       = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=21, si_uid=0, si_status=SIGTERM, si_utime=21, si_stime=179} ---
2019-06-24 00:21:05.061188: [error] Received ExitFailure (-15) when running
Raw command: /bin/tar Jxf /root/.stack/programs/x86_64-linux/ghc-8.6.5.tar.xz
Run from: /root/.stack/programs/x86_64-linux/ghc-8.6.5.temp/




strace -f -e %signal,%process stack setup

but I cannot reproduce the error again.

That strace output is just what we needed! @psibi and @nh2 provided the missing piece of insight: the SIGTERM is coming from Stack itself. This reminded me of a test suite bug I fixed recently:

https://github.com/snoyberg/conduit/commit/20fd6e2204e84ecf9c1ce99ed3778210a12545ff

Which ultimately led to this PR: #4902

I'd appreciate it if those affected by this bug could test this out and confirm that it fixes the problem for them.
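To illustrate the pattern seen in the strace output (a parent sending SIGTERM to its own child and then observing that the child was killed), here is a small shell sketch. It is not Stack's code: sleep merely stands in for the tar process, and the variable names are arbitrary.

```shell
# Start a long-running child in the background; sleep stands in for tar.
sleep 30 &
child=$!

# The parent itself sends SIGTERM to its child, like kill(21, SIGTERM) in the trace.
kill -TERM "$child"

# For a child killed by a signal, wait reports 128 + the signal number.
status=0
wait "$child" || status=$?
echo "child exit status: $status"
```

The shell reports the death as 128 + 15 = 143, whereas Stack shows the same event as ExitFailure (-15). In both cases the child did not fail by itself but was terminated by its parent.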

I'm not that familiar with the whole stack/stackage setup, so I'll have to ask. Are there builds with this change included available somewhere, e.g. using some specific tag on DockerHub or in an artefact store on the CI system you use?

We have nothing automated (though I wish we did). I generated a Linux executable and uploaded it to S3, and started using it for typed-process. If you'd like to use it too, it's available at

https://s3.amazonaws.com/www.snoyman.com/stack-1ed71cae36a64365ead72da1427e1685ccec8246.bz2

Relevant commit: https://github.com/fpco/typed-process/commit/af31b7bb7ba78e63694b7be9d7a12548518a92e9#diff-354f30a63fb0907d4ad57269548329e3

Yes, that'll make it a little easier to try it out on CI services, since that's where I've observed the issue most frequently.

Running stack build with LTS 12.26 in Docker fpco/stack-build-small (3523caf4fba2) always fails with the mysterious -15 error message, even though execution of the corresponding tar command succeeds when done manually.

Stack build provided by @snoyberg heals my woes, and my app builds without issues.

I'm completely failing to reproduce without the fix for the last few days... don't ask me what's different on the various CI services I'm experimenting with.

I can't even reproduce using @SkyWriter's recipe above :confused:

> I generated a Linux executable and uploaded it to S3, and started using it for typed-process.

A new official release would be great – due to this issue, our CI builds fail more often than not nowadays.

@snoyberg is there anything I can do (regarding maintenance tasks) to help make the new release happen faster? My email is in my GitHub profile.

@neongreen The only holdup right now is that 4938-non-ascii-module-names is failing for macOS on CI (see https://github.com/commercialhaskell/stack/pull/4939#issuecomment-510066631). If there's anything you can do to help with that, it would push things along.
