Currently `stack build` will emit warnings if I do not specify `size` and `sha256` for `github` dependencies in `stack.yaml`.
Running `stack freeze` will output the relevant information, which I can then add to `stack.yaml` manually.
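For illustration, a completed entry ends up looking roughly like this (a sketch that reuses the URL and hashes from the lock-file example later in this thread; the exact output of `stack freeze` may differ):

```yaml
extra-deps:
- size: 99713
  url: https://github.com/hspec/hspec/archive/3c0f266bd3ce71958bf6b6daaf7d0cbcda7e7227.tar.gz
  subdirs:
  - hspec-core
  sha256: ad03a807857ce4ed07c6a8410dbce8b5244c4e193e86767ce0ca93a3ba28dadd
```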
I propose to write the information that is currently produced by `stack freeze` to a `.lock` file after every successful build (e.g. `pantry.lock`):
If a `.lock` file exists, then for each listed dependency:

- `stack` uses the information as if it were directly listed in `stack.yaml`; specifically, it errors out on any hash mismatches
- when `stack` detects that a dependency was updated in `stack.yaml`, it silently ignores the information from the `.lock` file

After every successful build, `stack` creates the `.lock` file, overwriting any existing versions (which in effect means that we remove unused entries and add missing entries).

Regarding revision control, we can follow conventions that have been developed elsewhere:
When developing an application, the `.lock` file should be checked into revision control. When developing a library, the `.lock` file should be added to `.gitignore`.
A word on naming: if we expect the `.lock` file to be specific to `stack`, we may want to name it `stack.lock` (or possibly `.stack.lock`, to not interfere with file name completion); if, on the other hand, we think that it will be relevant for other build tools that implement support for pantry, then we could name it `pantry.lock`.
Assuming we adopt this approach we can:

- get rid of the `stack freeze` command

CC @snoyberg
This looks to be a sensible alternative approach, though it appears to be a bit more intrusive and a bit less explicit about the resulting constraints. E.g. shouldn't I set explicit hashes for my library, so that users are able to get reproducible results when trying to build on their machines? With a somewhat implicit lock file this will look as if it's not quite required, so in my opinion having visible warnings is a good thing to promote more accurate constraints. I'd prefer to follow "explicit is better than implicit" in this case, but probably @snoyberg has some more arguments on this question.
Let's separate out the case of _snapshot files_ and _stack.yaml files_. For snapshot files: a fair assumption is that authors will want to share these files outside of their project at some point. Having a lock file to go along with snapshot files in that context probably wouldn't make much sense. For example: imagine saying "if you download a snapshot file from http://example.com/snapshot.yaml, also check http://example.com/snapshot.yaml.lock." That seems pretty arbitrary and awkward.
By contrast, I _can_ picture having lock files for stack.yaml. It's definitely nicer than barfing warnings at the user (my current design choice 😄) and doesn't require manual intervention from users. However:
- Users will need to learn about lock files as a new concept (the `freeze` name is also a bit of a problem according to this argument).
- We could embed the completed information in the `stack.yaml` file itself. On the other hand, I also like the idea that the `stack.yaml` file is easily writeable by hand.

I'd say a good next step is identifying exactly what Stack would do in the possible cases it could encounter.
As I'm starting to think this through, it may not be as complicated as I was worried it would be.
@snoyberg I think the last 2 questions you posted are easily resolved with what @sol originally proposed: missing hashes will get completed and saved in a lock file after a successful build, and that new lock file will contain only data corresponding to `stack.yaml`.
But there's still a question about snapshots: we could add that info into a `pantry.lock` (or `stack.lock`) too, but shouldn't snapshot maintainers get warned if they have incomplete data in their snapshots? And if they should, then `stack freeze` could probably still be useful for them?
And another thing which bothers me while implementing this: what to do with the changes a user specifies on the command line? Those could influence the dependency graph quite a bit, and it doesn't seem meaningful to store exact dependencies for every used CLI flag combination.
I don't have a strong opinion on custom snapshots; what I care about is `extra-deps` in `stack.yaml`.
> what to do with the changes a user specifies on the command line?
That makes things more complicated indeed. I'll describe what I would ideally want from a user's perspective:
1. I would want the lock file to be created as if all build targets were requested (libs, executables, tests, benchmarks). I guess this implies that `stack` would have to construct the whole build plan whenever there is no `pantry.lock` (or the existing file is stale). We would still only want to actually build what the user requested.
2. I would tend to ignore the fact that flags can influence the content of `pantry.lock`. That is, just use the flags that are specified by the user and produce the `pantry.lock` that results from that combination of flags.
Regarding (2), I think this is a corner case that should be relatively rare (you would need both a dependency that is guarded by a flag and that same dependency listed under `extra-deps` in `stack.yaml`).
If a user still runs into this, then I think there is a workaround that s/he can use in most situations: change the `.cabal` or `package.yaml` file so that the dependencies that are listed under `extra-deps` are not guarded by flags (e.g. make them top-level dependencies in `package.yaml`; this won't work for `Win32`, but for almost anything else it should be ok).
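A sketch of that workaround in `package.yaml` (hpack) syntax, with hypothetical package and flag names; the point is just that the dependency stops being guarded by a flag condition:

```yaml
# before: foo was only a dependency when the production flag was enabled
# when:
# - condition: flag(production)
#   dependencies:
#   - foo
#
# after: foo is an unconditional top-level dependency, so it is always
# part of the build plan and hence always ends up in the lock file
dependencies:
- base
- foo
```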
Why is this ok?
- We only check in `pantry.lock` (== put it under revision control) for applications (not for libraries).

Finally, there is only one thing I'm worried about. Let's consider the following scenario:
I have a flag `-fproduction` that guards a dependency `foo`; and I also have `foo` listed under `extra-deps`. Locally during development I don't build with `-fproduction`, so my `pantry.lock` does not contain a hash for `foo`. My CI/CD pipeline on the other hand builds with `-fproduction` and will use a version of `foo` that is not properly locked.
One way to address this could be to fail on missing entries in the lock file on `--pedantic`.
I'm not sure what (1) would mean on the implementation side.
> - when `stack` detects that a dependency was updated in `stack.yaml`, it silently ignores the information from the `.lock` file
At the risk of stating the obvious, I think this implies that if we e.g. have
```yaml
extra-deps:
- github: hspec/hspec
  commit: 3c0f266bd3ce71958bf6b6daaf7d0cbcda7e7227
  subdirs:
  - hspec-core
```
in `stack.yaml`, then we need to include the commit hash `3c0f266bd3ce71958bf6b6daaf7d0cbcda7e7227` in the `.lock` file so that we can detect when the "dependency was updated in `stack.yaml`".
That is, the `.lock` file is a superset of `extra-deps`.
> But there's still a question about snapshots: we could add that info into a `pantry.lock` (or `stack.lock`) too, but shouldn't snapshot maintainers get warned if they have incomplete data in their snapshots? And if they should, then `stack freeze` could probably still be useful for them?
I'd say, for now, let's ignore the problems of creating snapshot files. That seems like something distinctly out of scope for Stack itself. Perhaps we'll have a Pantry executable that is useful for filling in missing info in a snapshot. After all, Pantry snapshots will hopefully work for more than just Stack users (e.g., Nix users).
> what to do with the changes a user specifies on the command line?
Seems to be totally out of scope. The lock file should only address things that are in the `stack.yaml` itself. Command line changes are non-reproducible by nature.
Let's try to make things just a bit more concrete. Let's say we have @sol's example `stack.yaml`:
```yaml
extra-deps:
- github: hspec/hspec
  commit: 3c0f266bd3ce71958bf6b6daaf7d0cbcda7e7227
  subdirs:
  - hspec-core
```
I'm imagining we'll end up with a lock file that looks like this:
```yaml
dependencies:
- original:
    github: hspec/hspec
    commit: 3c0f266bd3ce71958bf6b6daaf7d0cbcda7e7227
    subdirs:
    - hspec-core
  complete:
  - size: 99713
    subdir: hspec-core
    url: https://github.com/hspec/hspec/archive/3c0f266bd3ce71958bf6b6daaf7d0cbcda7e7227.tar.gz
    cabal-file:
      size: 4506
      sha256: f79008d723c203f36815b304c0db90b065fa1b20cc06033e5055ccea8b096c3b
    name: hspec-core
    version: 2.6.0
    sha256: ad03a807857ce4ed07c6a8410dbce8b5244c4e193e86767ce0ca93a3ba28dadd
    pantry-tree:
      size: 3751
      sha256: 1bad19b4c4afde31c5f57d919ed768a0e034587e30070e4ac3420896fcb48b90
```
Rules:
I think this approach will address all of the reproducibility and performance concerns I have, and mean that the original `stack.yaml` can be _even more_ lightweight than it is today, e.g. including the `@rev:0` information there will no longer be necessary at all.
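For example, with `acme-missiles` (the illustrative package from the Stack docs) as a stand-in: instead of pinning a cabal-file revision inline,

```yaml
extra-deps:
- acme-missiles-0.3@rev:0
```

the `stack.yaml` could simply say

```yaml
extra-deps:
- acme-missiles-0.3
```

with the exact revision recorded in the lock file instead.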
@qrilka @sol does that make sense?
> I'd say, for now, let's ignore the problems of creating snapshot files. That seems like something distinctly out of scope for Stack itself. Perhaps we'll have a Pantry executable that is useful for filling in missing info in a snapshot. After all, Pantry snapshots will hopefully work for more than just Stack users (e.g., Nix users).
:+1:
> what to do with the changes a user specifies on the command line?

> Seems to be totally out of scope. The lock file should only address things that are in the `stack.yaml` itself. Command line changes are non-reproducible by nature.
That makes sense. I was misled by the wrong assumption that you can specify Hackage dependencies without exact versions as `extra-deps` (but that is not the case, and it wouldn't really make sense anyway).
Now everything actually looks neat and simple! That's great!
Actually, I still have one question: the original proposal talks about saving the lock file only on a successful build, but what if that build was modified by some flags? In that case we don't have any guarantees about the original (completed) dependencies. Should we drop that requirement of a successful build then, @snoyberg? And use the lock file only to lock package versions, without giving extra guarantees?
Regarding the question of `pantry.lock` vs `stack.lock`: this feature looks to be out of Pantry's scope, at least with the current separation between Stack and Pantry: the details of `stack.yaml` live in Stack, and in Pantry there's no entity combining some list of packages (i.e. `extra-deps` plus project packages, though the latter don't need to be locked) and a snapshot. So I would propose to stick with `stack.lock` if there are no objections to that.
Also, it wasn't stated explicitly in this ticket yet, but it should be noted that besides information about completed packages, we also need to complete the package location itself.
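To illustrate what completing a location means (a sketch: the mutable reference is hypothetical, and the completed fields reuse values from the example above):

```yaml
original:
  github: hspec/hspec
  commit: master   # hypothetical mutable reference; gets pinned below
complete:
  size: 99713
  url: https://github.com/hspec/hspec/archive/3c0f266bd3ce71958bf6b6daaf7d0cbcda7e7227.tar.gz
  sha256: ad03a807857ce4ed07c6a8410dbce8b5244c4e193e86767ce0ca93a3ba28dadd
```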
> Should we drop that requirement of a successful build then?
At least I would be fine with that. My original feature request is influenced by what I implemented in sol/tinc, where I do dependency solving and have the successful-build requirement. But for `stack` it seems like this can just be an (impure) function from `stack.yaml` to `stack.lock`.
> Regarding the question of `pantry.lock` vs `stack.lock`
I used `pantry.lock` as a placeholder name only. I'm ok with `stack.lock` or anything else. One drawback of `stack.lock` is that completion à la `vim sta<tab><enter>` to edit `stack.yaml` won't work anymore. But not a major issue for me.
Still, should we consider `.stack.lock`?
> the original proposal talks about saving the lock file only on a successful build, but what if that build was modified by some flags? In that case we don't have any guarantees about the original (completed) dependencies. Should we drop that requirement of a successful build then, @snoyberg? And use the lock file only to lock package versions, without giving extra guarantees?
Yes, I think so, good catch.
Regarding the file name, we need to consider the fact that the `stack.yaml` file may _not_ be called `stack.yaml`. It might be `stack-lts-12.yaml`, for example, or even something ridiculous like `my-stack-yaml.txt`. How about simply appending `.lock` to the filename?
Sounds good to me. I also think it makes sense to resolve symlinks to get the real file name, so that a `stack.yaml` pointing to `stack-lts-12.yaml` will work the same as using the file `stack-lts-12.yaml` itself.
> How about simply appending `.lock` to the filename?
I like this 👍
This is also vital for improving performance. Currently, Stack will spend significant time completing package information on each startup. Automatically writing a `.lock` file will bypass that overhead.
Specification for this lock file implementation is now written here: https://github.com/commercialhaskell/stack/blob/f2ac3e64d70c990c6a5045f02c13f5869a77c5c4/doc/lock_files.md
Looks great 👍
> QUESTION Do we want to go the easy way at first and later implement the more complicated update procedure?
No strong preference, but it's the approach I tend to go for. Maybe even simplify the file format by skipping originals until we actually need them.
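Skipping originals would shrink each entry to just its completed form; under that assumption, the example above would reduce to something like this (a sketch that simply drops the `original` key and flattens `complete`):

```yaml
dependencies:
- size: 99713
  subdir: hspec-core
  url: https://github.com/hspec/hspec/archive/3c0f266bd3ce71958bf6b6daaf7d0cbcda7e7227.tar.gz
  name: hspec-core
  version: 2.6.0
  sha256: ad03a807857ce4ed07c6a8410dbce8b5244c4e193e86767ce0ca93a3ba28dadd
```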
@simonmichael regarding originals: as far as I understand, in #4550 they are used to simplify tracking between locations in source YAML files and the resulting completed locations, so I'd say that we need them already.