Stack: Avoid writing large amounts of data to system temp directory

Created on 16 Sep 2015 · 19 Comments · Source: commercialhaskell/stack

Reported by joseph07 in #haskell. stack setup failed for lack of space on the device, despite there being 38GB available. According to pdxleif, Arch Linux defaults to mounting /tmp as tmpfs, which keeps its contents in RAM and is therefore limited in size.

Here's where a system temp directory is used as a place to unpack the GHC tarball: https://github.com/commercialhaskell/stack/blob/c0525a2431b7e383e06473efacee1e2084abf2d0/src/Stack/Setup.hs#L760

It'd also likely be worthwhile to take a look at other usages of the system temp directory. On one hand, it seems odd to have a system configuration with such limited space in /tmp. On the other hand, I see few downsides to instead storing such temp directories somewhere in ~/.stack (and the big upside of resolving this issue).

awaiting pull request enhancement

mgsloan

All 19 comments

The Arch wiki article says tmpfs on Arch defaults to a maximum size of half the total RAM. https://wiki.archlinux.org/index.php/Tmpfs#Examples

LeifW on 16 Sep 2015

The downside to not using a proper temp directory is that if the process fails halfway through, the disk space is never reclaimed.

snoyberg on 16 Sep 2015

👍1

If that's a concern, what about using say https://hackage.haskell.org/package/temporary-1.2.0.3/docs/System-IO-Temp.html#v:withSystemTempDirectory ? Or are you referring to the Haskell process crashing?

LeifW on 16 Sep 2015

I'm referring to the Haskell process crashing (or system simply shutting down). And that's the function we're using already; the problem is Arch's setup not providing enough disk space in the temp directory.

snoyberg on 16 Sep 2015

👍1

I think other distros using systemd will have /tmp be tmpfs, too (edit: Sorry, apparently that's not the case; e.g. I don't think RedHat / CentOS have this. Maybe just Arch and Fedora?). E.g. Fedora since version 18: http://fedoraproject.org/wiki/Features/tmp-on-tmpfs

LeifW on 16 Sep 2015

"The downside to not using a proper temp directory is that if the process fails halfway through, the disk space is never reclaimed." (quoting snoyberg)

I figured we'd have stack check for leftover directories on its next run and free them up. This way, the only case where disk space is never reclaimed is if stack isn't run again. Could get a bit tricky with concurrent stack executions, though.
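
A minimal sketch of that cleanup idea, written in Python purely for illustration (it is not stack's code). The directory layout, the function name `remove_stale_temp_dirs`, and the 24-hour threshold are all assumptions. It removes leftover subdirectories that have not been modified recently; the age threshold is only a crude guard against deleting a directory that a concurrent run is still using, so a real implementation would likely need a lock or PID file on top.

```python
import os
import shutil
import time


def remove_stale_temp_dirs(tmp_root, max_age_seconds=24 * 60 * 60, now=None):
    """Delete subdirectories of tmp_root not modified within max_age_seconds.

    tmp_root is a hypothetical location such as a tmp folder under ~/.stack.
    Returns the list of paths that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    if not os.path.isdir(tmp_root):
        return removed
    for name in os.listdir(tmp_root):
        path = os.path.join(tmp_root, name)
        # Only consider real directories; skip files and symlinks.
        if os.path.islink(path) or not os.path.isdir(path):
            continue
        age = now - os.path.getmtime(path)
        if age > max_age_seconds:
            # Assumed stale: probably left behind by a crashed or killed run.
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
    return removed
```

Calling this once at startup would mean space is only left unreclaimed if the tool is never run again.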

mgsloan on 16 Sep 2015

That Fedora page mentions a workaround: getTemporaryDirectory respects $TMPDIR, so on systems that have issues one could override the default of /tmp by running TMPDIR=/var/tmp stack ... The page also suggests that applications needing to write large files to /tmp should use /var/tmp instead, or maybe "XDG user-dir's Download directory". It sounds like there's a cron job cleaning out /var/tmp, too?

If /var/tmp is indeed supposed to become some standard alternative to /tmp for large files, it makes one think there ought to be a similar env var for referencing it, and a corresponding function in System.Directory for getting its value in some cross-platform way?

LeifW on 16 Sep 2015

I've just noticed that there are a couple other related issues:

https://github.com/commercialhaskell/stack/issues/623

https://github.com/commercialhaskell/stack/issues/841 (this one mentions the TMPDIR workaround)

mgsloan on 18 Sep 2015

http://fedoraproject.org/wiki/Features/tmp-on-tmpfs#Detailed_Description says /tmp on tmpfs is becoming more common.
I was thinking the problem would go away on its own as the average system gradually got more memory, but I forgot about VPSs. Also, I'm wondering when that space is reclaimed: filling up your RAM with large files would be unhelpful while you're running a compiler.

Not sure of a cross-platform way of specifying /var/tmp; seems like that functionality might belong in System.Directory, anyways. Perhaps we could just check if it exists. Proposed behaviour: Check for $TMPDIR (to still allow override of location), check if /var/tmp exists (and is writeable), and finally fall back to getTemporaryDirectory (/tmp).
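
The proposed lookup order can be sketched as follows. This is a Python illustration only, not stack's implementation: `choose_temp_dir` is a hypothetical name, and `tempfile.gettempdir()` stands in for Haskell's getTemporaryDirectory as the final fallback.

```python
import os
import tempfile


def choose_temp_dir(env=None, var_tmp="/var/tmp"):
    """Pick a directory for large temporary files (illustrative sketch)."""
    env = os.environ if env is None else env

    # 1. An explicit $TMPDIR always wins, so the user can still override.
    override = env.get("TMPDIR")
    if override:
        return override

    # 2. Prefer /var/tmp when it exists and we can create entries in it.
    if os.path.isdir(var_tmp) and os.access(var_tmp, os.W_OK | os.X_OK):
        return var_tmp

    # 3. Otherwise fall back to the platform default (usually /tmp on Linux).
    return tempfile.gettempdir()
```

On Windows, /var/tmp will normally not exist, so the sketch simply falls through to the platform default there.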

Though, the articles I read seem to discourage using /tmp and /var/tmp (especially when they are not being used to communicate between processes) due to security concerns, and suggest something like XDG user dirs instead (somewhere in ~/.stack seems perfect for that).

LeifW on 18 Sep 2015

I just ran into this issue. I have a VM setup where the root fs is mounted readonly and (therefore) /tmp is mounted as tmpfs with limited amount of space available. There is sufficient space available in /var/tmp though.

I know of at least one big installation where many systems are booted from the same image, which is mounted over NFS. In such cases /tmp is mounted as a RAM file system. In enterprise settings the available RAM is usually enough to provide sufficient space, but I guess there may be cases where /tmp is limited. /var/tmp, however, should usually have enough space, so preferring /var/tmp over /tmp might be a good idea.

harendra-kumar on 9 Nov 2015

Unfortunately, ~/.stack is usually NFS for any of our user accounts and thus a lot slower than /tmp/...

It sounds like there's a strong need for a portable way of finding a tmp directory that's on local disk, not in memory, and not on the network.

rrnewton on 13 Nov 2015

There doesn't seem to be any perfect solution to this, but I think writing big temporary files under ~/.stack is going to be the best default.

Advantages:

/tmp on tmpfs is increasingly common
modern practice recommends /tmp and /var/tmp only for inter-process communication
no portable solution to find the "best" temporary location for every case, so let's go with the simplest option

Disadvantages:

Will be slower if $HOME is on NFS, but this will only affect stack setup, so it won't be a drag on everyday performance
Users with a small quota on $HOME may have trouble, but they already do since ~/.stack gets big fast, so this will only make it happen a bit sooner

We must ensure any error messages about disk space are clear and offer workarounds.

borsboom on 30 Nov 2015

👎1

Sounds good.

Can we first check the amount of space available in a given volume before we try another one? We can try /tmp, $HOME, /var/tmp in that order based on the amount of space available. That way we will be able to try our best and bail out only if it is not at all possible to install. Space check will also allow us to provide a way to gracefully exit with an error message rather than trying and running out of space.
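
A rough sketch of that space check, in Python for illustration only (not stack's code). The function names, the candidate list, and the required size are assumptions; `shutil.disk_usage` is a standard-library call that reports free bytes on the volume containing a path.

```python
import os
import shutil


def pick_dir_with_space(candidates, required_bytes):
    """Return the first existing candidate with enough free space, else None."""
    for path in candidates:
        if not os.path.isdir(path):
            continue
        try:
            free = shutil.disk_usage(path).free
        except OSError:
            # Volume could not be queried; try the next candidate.
            continue
        if free >= required_bytes:
            return path
    return None


def require_dir_with_space(candidates, required_bytes):
    """Like pick_dir_with_space, but fail early with a clear message."""
    path = pick_dir_with_space(candidates, required_bytes)
    if path is None:
        tried = ", ".join(candidates)
        raise RuntimeError(
            f"No candidate directory has {required_bytes} bytes free "
            f"(tried: {tried}). Set TMPDIR to a directory with more space."
        )
    return path
```

With the order suggested above, the call would look like `require_dir_with_space(["/tmp", os.path.expanduser("~"), "/var/tmp"], needed)`, where `needed` is whatever size the unpacked tarball is estimated to require. The check is only advisory, since free space can change between the check and the write.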

harendra-kumar on 30 Nov 2015

Another reason to manage our own tmp directory is that we can make the process invocations in the verbose log copy+pasteable. Currently, some commands don't work because the temporary files are eagerly deleted. See the first point of https://github.com/commercialhaskell/stack/issues/1596 for more info.

mgsloan on 3 Jan 2016

Note that in some instances you need an absolute path, e.g. TMPDIR=/home/user/tmp.

maxzinkus on 13 Mar 2016

I've fixed this! The downside is that now there will be leftover dirs that the user will need to clean up (unless they try to do setup again). This is the case even when stack gets to handle the exceptions, etc. The reason for this is that configure errors etc. say things like "See `config.log' for more details", so it's convenient for the user to be able to manually run the command:

[Screenshot: 2016-08-08-192859_583x115_scrot]

mgsloan on 9 Aug 2016

I am now experiencing the exact opposite issue: stack setup fails because there is no space left on the disk that contains $LOCALAPPDATA\Programs\stack, but I have other disks with plenty of space. My home directory is on a small SSD that contains only my operating system and a few critical files. Unfortunately it is not possible on Windows to move the home directory after installation, so I am stuck with $LOCALAPPDATA\Programs\stack living on the small disk.

I tried the following:

Set $STACK_ROOT to be on a different disk. This works fine for most stuff, the index and snapshots are stored there, but not the download.
Set $TMP to a directory on a different disk, as indicated in the FAQ. However, Stack still downloads to $LOCALAPPDATA\Programs\stack.
As a hack, I tried to set $LOCALAPPDATA to a custom temporary directory when running stack setup. This allowed the download to complete, but then Stack went on to actually install GHC in that location, not in $STACK_ROOT.

To support both the original scenario in this issue and the case of a small home directory (or a slow one, as mentioned earlier in this thread), it would be nice if the download location were configurable. As there is TMPDIR already, would that be a good way to override the default download location?

And shouldn’t GHC be installed to $STACK_ROOT?