Stack: RFC: A new script command

Created on 24 Nov 2016 · 39Comments · Source: commercialhaskell/stack

I've discussed this in some part with various people, so I thought I'd get the thoughts down in writing. Currently, script interpreter usage of Stack has (IMO) three limitations:

It's slower than just using runghc directly. Some of that is inherent: it needs to check for the presence of GHC and libraries, for example, which runghc does not. Nonetheless, there is more overhead than necessary. From my understanding, settings file checking is a significant component of this (and also likely something we don't want to impact the behavior of script usage).
Scripts violate some reproducibility rules, because:
- They don't require that a resolver be set
- They don't require that all packages be listed
- They can be impacted by other packages installed
These trade-offs all make sense when considering the runghc feature in general, but are less appetizing when applied to reproducible scripts.

There are multiple options that should almost always be set but aren't enforced: --install-ghc, a --resolver as implied above.

Proposal: let's add a new Stack command: stack script. It would have the following semantics:

Require that a --resolver be set
Hide all packages from the package database except those specified with --package. (Perhaps make an exception for wired in packages, or at the very least base.)
Automatically imply --install-ghc
Ignore all settings files

As an alternative (or addition) to the above, we could consider:

Making these behaviors options available to runghc (e.g., --no-parse-settings-file)
Don't actually add a script command, but add support to the script interpreter bits of Stack to recognize a script command and do the right thing.

script interpreter discuss

Source

snoyberg

👍6

Most helpful comment

Additional nice-to-have feature for this: a --compiled flag which, instead of using runghc, will compile the script and then run the resulting executable.

snoyberg on 8 Jan 2017

👍2

All 39 comments

i prefer the first proposal; a new command. it will be cleaner indication that it is doing something special.

silky on 24 Nov 2016

I think the script command is a good proposal, better than changing runghc.

chrisdone on 24 Nov 2016

👍1

Sounds good to have a packaged command since this is a common enough use case. Users will not have to scratch their head and consult documentation on what all options to specify. Making this part of runghc will not solve the complexity of too many options (we already have tricky ways to specify options, for example see #2486).

I also do not like the second alternative of a magical command only visible to interpreter, it will cause confusion.

How about using stack run or stack runsure instead of stack script? Just a quick thought.

harendra-kumar on 24 Nov 2016

I think stack run would be confused with cabal run, which (IIRC) builds an executable in a project and runs it.

I agree with the overall consensus that an explicit new command is better than a bunch of new options or a secret command.

snoyberg on 24 Nov 2016

I was planning on hacking on this next week, but I was a little over-anxious to hit it, so: #2806. Still needs work, but may be worth poking at to see if the functionality meets what we're all expecting.

snoyberg on 24 Nov 2016

Do you think a shorter way of listing packages can be introduced? Instead of including as many --package as needed, perhaps a coma separated list ? Or a special comment in the haskell file, something such as:

#!/usr/bin/env/stack
-- resolver: lts-...
-- packages: foo bar baz

guibou on 24 Nov 2016

@guibou I've actually set this up already such that the following works:

#!/usr/bin/env stack
-- stack script --resolver lts-7.9 --package "stm async"
import Control.Concurrent.Async
import Control.Concurrent.STM

main = putStrLn "Hello World!"

I can see advantages in both directions on your request. The current script approach is very nice, because it's a direct translation of what you'd do on the command line. There's nothing extra that needs to be learnt. On the other hand, a more explicit syntax may be nice.

Given the ability to give a space-separated (or maybe also comma-separated as you suggested? seems unobjectionable to me) list of packages, do you think that reduces the pain enough to keep the syntax as-is?

snoyberg on 25 Nov 2016

@snoyberg The space separated list inside --package(s) is alright.

Another feature request, script needs a way to fallback to ghci once the script is done executing (or if there is no main method). I'm always using ghci as a interface for my scripts when they need some kind of user interaction, mostly to use the auto completion, the type checker and avoid writing a CLI parser.

guibou on 25 Nov 2016

@guibou you can use stack exec ghci command in interpreter comment. See #2788 .

harendra-kumar on 25 Nov 2016

@harendra-kumar Yes but, as far as I know, it suffers from the same issues about reproducibility discussed in this thread.

guibou on 25 Nov 2016

I'm not following the connection between this and ghci. Keep in mind that I almost never use ghci myself, so it's almost certain I'm just missing something very obvious.

snoyberg on 25 Nov 2016

Even I don't use ghci but it seems quite a few people use it. This is about starting ghci instead of running the script via the interpreter comment. It is already supported by using stack exec ghci command instead of stack runghc or stack script in the comment. But I guess @guibou wants reproducibility with ghci as well.

harendra-kumar on 25 Nov 2016

Re speed, here are some timings I posted yesterday.

As I mentioned there, and perhaps related to this stack ghci discussion, it would be lovely if stack ghc script.hs also used the stack script options, so that compiling a stack script just works.

simonmichael on 25 Nov 2016

Interesting point on a ghc command... maybe we should just allow a flag to script to switch its behavior to instead compile? It's also possible to try and apply the optimization I made to the exec command, but it makes some assumptions that won't generally apply (e.g., that only packages present in a snapshot can be specified).

I also pushed a commit to the PR as a WIP commit, which begins a major refactoring of the BuildConfig and EnvConfig types to allow for the possibility of no config files and no local databases. It's mostly busywork, but certainly adds some complexity, and I'm not convinced the complexity it worth it. By contrast, we can just go with the original implementation I wrote, and add special logic to the build command that, when given a flag, it will ensure that no local packages are specified.

Any feedback on which of those approaches I should take? Especially CCing @mgsloan @borsboom and @Blaisorblade from an ongoing-maintenance perspective.

snoyberg on 26 Nov 2016

I think a requirement for such a command is that it should be possible to just write

#/usr/bin/env stack script

and make it dwim.

I'd go as far as to ask for a different prelude in this case as well, to cut down on the import boiler plate. It should be possible to write a "hello world" haskell script in <5 lines.

alexanderkjeldaas on 26 Nov 2016

That's unfortunately impossible, there's no standard behavior across shells for what happens when multiple arguments are provided. That's why we've been using the special syntax until now.

An alternate prelude is almost certainly far too opinionated a decision to make here.

snoyberg on 26 Nov 2016

#!/usr/bin/env stack
-- stack script --resolver lts-7.9 --package "stm async"
import Control.Concurrent.Async
import Control.Concurrent.STM

main = putStrLn "Hello World!"

The above script violates DRY. Some solutions:

Imports should be implied by searching for the first package with the first module to export a function (thus utilizing the --package "stm async" part)
Packages are allowed to export a "main" module, which will be imported by default for all packages from --package.
--package is removed, and import Foo from "base" used.

Stack could have a global default resolver.

#!/usr/bin/env stack
import Turtle from "turtle"

main = putStrLn "Hello World!"

not sure how we can get rid of main, but at least that's slightly more succinct.

alexanderkjeldaas on 26 Nov 2016

Overall, I like the idea of the refactoring to make it possible to use with no config files in a somewhat general way. It does seem like something that will be useful to many people.

borsboom on 29 Nov 2016

@alexanderkjeldaas These features would be nice and be more "scripty", but would force Stack to parse the Haskell first to figure out the packages it's using and also change some of the behaviour of the compiler. We're sticking with "vanilla" GHC, but may be worth making a separate issue to discuss this, separate from the current proposal. BTW there is existing syntax with the PackageImports extension to specify which package to load a module from using import "turtle Turtle.

borsboom on 29 Nov 2016

@borsboom Stack doesn't actually have to parse the file. Instead if could use https://hackage.haskell.org/package/HFuse to lazily mount a full stackage copy, so all packages are available immediately (windows not supported).

alexanderkjeldaas on 29 Nov 2016

But would that solution actually work? I might be missing details, but it's far from clear.

"Windows not supported" probably means core stack wouldn't do it, but you can always write additional commands.

But I guess you'd need to mount (via HTTP? FTP? What else) a compiled snapshot of full Stackage, together with its ghc-pkg package database and more, and then download the binaries. This is assuming they'd work, and some wouldn't, since the build is supposed to customize those to your machine, see e.g. the ghc-paths package (maybe it'd work for a fixed distribution?).
Or you'd have to merge all the sources, except i don't think that would actually work.

Don't misunderstand me: if that works it'd be amazing and you could go around giving talks about it. But the effort and questions involved seem to go go beyond "let's just give stack devs a tip".

Blaisorblade on 29 Nov 2016

@Blaisorblade sure it might require work :-)

the build is supposed to customize those to your machine

it this is true, then that's an issue in itself. I think packages that don't have deterministic builds should be marked somehow in stackage, to avoid caching. The same applies to nixpkgs or any other distribution system for pre-compiled packages.

But I guess you'd need to mount (via HTTP? FTP? What else) ..

This is what HFuse does - mounts a file system that calls back into a daemon that implements the file system. So no HTTP/FTP, but stack would implement a file system while running the build, yes.

alexanderkjeldaas on 29 Nov 2016

@alexanderkjeldaas Note that this extends to GHC itselfs whose build-time configure tries to detect various aspects of the environment and select the optimal entry points into libc/libgmp/etc and may affect what becomes inlineable or not at the .hs level. And to make it worse, we've got quite a few autoconf-based (or even Setup.hs-based ones) on Hackage which detect the current cpu/architecture (some of the crypto packages do that) at compile-time (rather than generating code that makes the choice at runtime).

hvr on 29 Nov 2016

@hvr regarding the CPU architecture detection stuff, this is handled by distros by forcing a compatible one. For example we had the i386 -> i586 change years ago in linux distributions - the changes were basically which CPU architecture to compile against.

for libc/libgmp it _is_ hard. I don't know if it is possible to cache these unless at least the major version of the libraries are part of the cache key.

But note that the fuse based solution works partly without any sort of caching as well. stack can block the file system call and compile packages on demand. the package database might be a challenge, but if some packages are marked "can't be cached", but the package database is complete, then it could work.

alexanderkjeldaas on 29 Nov 2016

In some cases, you might have to block calls that try to list directory
contents, and predict what the contents are going to be.
And the ghc-pkg DB needs to be prepopulated in any case.
All in all, this looks very much like a proper research project. I'd be
happy to be proven wrong by somebody actually doing this work.

And BTW I'm confident this is harder than parsing Haskell files–if that
were the only goal, this would be IMHO a bad idea. But a development
environment in such a setup (with ghci, Haddock and everything), OTOH,
could be very cool, even if it required using Nix to avoid some of the
issues mentioned.

In any case, we're certainly derailing the original conversation, this
needs to be elsewhere. A separate issue was proposed above for some of the
goals. This hfuse-based setup sounds more like a separate project IMHO.

On Nov 29, 2016 11:48, "Alexander Kjeldaas" notifications@github.com
wrote:

@hvr https://github.com/hvr regarding the CPU architecture detection
stuff, this is handled by distros by forcing a compatible one. For example
we had the i386 -> i586 change years ago in linux distributions - the
changes were basically which CPU architecture to compile against.

for libc/libgmp it is hard. I don't know if it is possible to cache
these unless at least the major version of the libraries are part of the
cache key.

But note that the fuse based solution works without any sort of caching as
well. stack can block the file system call and compile packages on demand.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/commercialhaskell/stack/issues/2805#issuecomment-263537949,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AARsqA_9i_Wjdwnk78DTL1gdEz2OWcpZks5rDAMDgaJpZM4K7Ve7
.

Blaisorblade on 29 Nov 2016

@alexanderkjeldaas fwiw, for the new http://matrix.hackage.haskell.org/ prototype I've got a remote shared 2nd level nix-store cache. The new nix-style features in cabal make this quite easy to implement for the case of all nodes using the same hardware & distro (i.e. you have to index the cache by distro/abi in addition to os/arch/ghc-version etc), w/o having to resort to a layer like FUSE.

hvr on 29 Nov 2016

❤2

Additional nice-to-have feature for this: a --compiled flag which, instead of using runghc, will compile the script and then run the resulting executable.

snoyberg on 8 Jan 2017

👍2

Great. Here are a few use cases I've seen, and my guess of how they might be invoked in future:

compiling a script without running it. Because you are doing installation/setup and don't want to trigger the script's behaviour. UI: stack ghc script.hs, stack ghc ignores the stack script command but parses the options and installs deps. Maybe it should be stack build script.hs instead ?
installing the dependencies of a script without running it. Similar to the above, but you don't want to compile the script itself. UI: stack build script.hs --only-deps ?
doing a dry run of installing the dependencies of a script. As above, but you don't want to install the deps yet; just see if you can, and how many they will be. UI: stack build script.hs --dry-run ?
running a script in interpreted mode only. Because you are tweaking it and want to see changes right away. UI: stack runghc ./script.hs ? Though, a newcomer will guess "script" does that, rather than "runghc".
running a script with on-demand compilation for more speed. Because you don't want to care whether there's a binary there, and you'd like whatever speed you can get automatically (like python). UI: ./script.hs --compiled, assuming a stack script shebang line ? Though, having to add --compiled every time will be tedious.
running a script that is definitely compiled. Because you want maximum speed (and stack shebang line doesn't provide that yet), and/or a binary that works independently of stack being installed/what git revision is checked out/etc. UI: stack build script.hs, then ./script ?
loading a script with GHCI. Because you want to debug it, or you want to monitor compilation errors with ghcid or intero and those don't understand stack shebang options yet. UI: stack ghci script.hs, stack ghci installs deps like stack build ?

I get the feeling it would be worth using a nice short run command however best simplifies the above; I think it's fine for stack run to be different from cabal run.

simonmichael on 8 Jan 2017

PS as a thought experiment, what commands would you choose if starting over ? Eg:

stack build script[.hs] - installs deps, compiles script.hs
stack run script - installs deps and compiles script.hs if needed, runs compiled script
stack run script.hs - installs deps, runs script.hs in interpreted mode
stack repl script[.hs] - installs deps, loads script.hs in GHCI
./script - runs compiled script
./script.hs - runs shebang line, which is probably "stack run script ...", about as fast as ./script

simonmichael on 8 Jan 2017

One issue I find with the current "runghc" machinery for scripts is that I often want to reference a custom stack.yaml file using the --stack-yaml argument (eg to specify a package from a git repo). If a relative path is provided, it is relative to the current working directory. It would be more usefuly if it were a path relative to the script being executed.

Is this feasible?

timbod7 on 1 Feb 2017

That should be taken up in a separate issue. The script command under
discussion here would not use any stack.yaml file, by design.

On Wed, Feb 1, 2017, 7:02 AM Tim Docker notifications@github.com wrote:

One issue I find with the current "runghc" machinery for scripts is that I
often want to reference a custom stack.yaml file using the --stack-yaml
argument (eg to specify a package from a git repo). If a relative path is
provided, it is relative to the current working directory. It would be more
usefuly if it were a path relative to the script being executed.

Is this feasible?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/commercialhaskell/stack/issues/2805#issuecomment-276573203,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AADBB1mYWz9xqP6OjOu7umDKgeU6DUseks5rYBHpgaJpZM4K7Ve7
.

snoyberg on 1 Feb 2017

Does this mean that scripts implemented with this new command can only reference packages from the specified resolver? That seems like a significant limitation.

timbod7 on 1 Feb 2017

@timbod7 You'll be able to specify extra dependencies atop the resolver via --package

mgsloan on 1 Feb 2017

@mgsloan: Will I be able to specify an explicit version, or a github commit for a package via the --package arg?

timbod7 on 1 Feb 2017

Explicit version yes. The --package flag used by exec / runghc does not yet support github commits.

mgsloan on 1 Feb 2017

This is not actually correct: the plan is to disallow packages outside the snapshot. There are many downsides that come with allowing additional packages, such as needing to create .stack-work directories which may conflict with other scripts. If you need non-snapshot packages, you can continue to use the runghc command. This command is intended to solve the common case in the best way possible.

snoyberg on 2 Feb 2017

This is now implemented on a branch:

2805-new-script-command-no-config-change

For anyone interested, please give this a shot and comment on the PR at:

https://github.com/commercialhaskell/stack/pull/2992

If I don't hear any objections, I'll merge later this week.

snoyberg on 13 Feb 2017

This has been merged into master, including the compile-to-executable capabilities and the import parsing (though that's experimental). Improvements can continue in separate issues, so closing.

snoyberg on 16 Feb 2017

I'm super late to this particular party, but this is an awesome feature @snoyberg! Thank you :)

I've been burned a few times by sending Haskell scripts using Stack that wouldn't run on the destination because I missed packages that I had installed locally.

This removes the "works on my machine" problem very elegantly.