Currently `stack build` will emit warnings if I do not specify `size` and `sha256` for `github` dependencies in `stack.yaml`.
Running `stack freeze` will output the relevant information, which I can then add to `stack.yaml` manually.
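For illustration, a completed entry ends up looking roughly like this (a sketch that reuses the URL and hashes from the lock-file example later in this thread; the exact output of `stack freeze` may differ):

```yaml
extra-deps:
- size: 99713
  url: https://github.com/hspec/hspec/archive/3c0f266bd3ce71958bf6b6daaf7d0cbcda7e7227.tar.gz
  subdirs:
  - hspec-core
  sha256: ad03a807857ce4ed07c6a8410dbce8b5244c4e193e86767ce0ca93a3ba28dadd
```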
I propose to write the information that is currently produced by `stack freeze` to a `.lock` file after every successful build (e.g. `pantry.lock`):
If a `.lock` file exists, then for each listed dependency:

- `stack` uses the information as if it were directly listed in `stack.yaml`; specifically, it errors out on any hash mismatches
- when `stack` detects that a dependency was updated in `stack.yaml`, it silently ignores the information from the `.lock` file

After every successful build, `stack` creates the `.lock` file, overwriting any existing versions (which in effect means that we remove unused entries and add missing entries).

Regarding revision control, we can follow conventions that have been developed elsewhere:
When developing an application, the `.lock` file should be checked into revision control. When developing a library, the `.lock` file should be added to `.gitignore`.
A word on naming: if we expect the `.lock` file to be specific to `stack`, we may want to name it `stack.lock` (or possibly `.stack.lock`, to not interfere with file name completion); if, on the other hand, we think that it will be relevant for other build tools that implement support for pantry, then we could name it `pantry.lock`.
Assuming we adopt this approach we can:

- get rid of the `stack freeze` command

CC @snoyberg
This looks to be a sensible alternative approach, though it appears to be a bit more intrusive and a bit less explicit about the resulting constraints. E.g. shouldn't I set explicit hashes for my library, so that users are able to get reproducible results when trying to build on their machines? With a somewhat implicit lock file this will look as if it's not quite required, so in my opinion having visible warnings is a good thing to promote more accurate constraints. I'd prefer to follow "explicit is better than implicit" in this case, but probably @snoyberg has some more arguments on this question.
Let's separate out the case of _snapshot files_ and _stack.yaml files_. For snapshot files: a fair assumption is that authors will want to share these files outside of their project at some point. Having a lock file to go along with snapshot files in that context probably wouldn't make much sense. For example: imagine saying "if you download a snapshot file from http://example.com/snapshot.yaml, also check http://example.com/snapshot.yaml.lock." That seems pretty arbitrary and awkward.
By contrast, I _can_ picture having lock files for stack.yaml. It's definitely nicer than barfing warnings at the user (my current design choice 😄) and doesn't require manual intervention from users. However:
- Users will need to learn about lock files as a new concept (the `freeze` name is also a bit of a problem according to this argument).
- We could embed the completed information in the `stack.yaml` file itself. On the other hand, I also like the idea that the `stack.yaml` file is easily writeable by hand.

I'd say a good next step is identifying exactly what Stack would do in the possible cases it could encounter.
As I'm starting to think this through, it may not be as complicated as I was worried it would be.
@snoyberg I think the last 2 questions you posted are easily resolved with what @sol originally proposed: missing hashes will get completed and saved in a lock file after a successful build, and that new lock file will contain only data corresponding to `stack.yaml`.
But there's still a question about snapshots: we could add that info into a `pantry.lock` (or `stack.lock`) too, but shouldn't snapshot maintainers get warned if they have incomplete data in their snapshots? And if they should, then `stack freeze` could probably still be useful for them?
And another thing which bothers me while implementing this: what to do with the changes a user specifies on the command line? Those could influence the dependency graph quite a bit, and it doesn't seem meaningful to store exact dependencies for every used CLI flag combination.
I don't have a strong opinion on custom snapshots; what I care about is `extra-deps` in `stack.yaml`.
> what to do with the changes a user specifies on the command line?
That makes things more complicated indeed. I'll describe what I would ideally want from a user's perspective:
1. I would want the lock file to be created as if all build targets were requested (libs, executables, tests, benchmarks). I guess this implies that `stack` would have to construct the whole build plan whenever there is no `pantry.lock` (or the existing file is stale). We would still only want to actually build what the user requested.
2. I would tend to ignore the fact that flags can influence the content of `pantry.lock`. That is, just use the flags that are specified by the user and produce the `pantry.lock` that results from that combination of flags.
Regarding (2), I think this is a corner case that should be relatively rare (you would need both a dependency that is guarded by a flag and that same dependency listed under `extra-deps` in `stack.yaml`).
If a user still runs into this, then I think there is a workaround that s/he can use in most situations: change the `.cabal` or `package.yaml` file so that the dependencies that are listed under `extra-deps` are not guarded by flags (e.g. make them top-level dependencies in `package.yaml`; this won't work for `Win32`, but for almost anything else it should be ok).
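A sketch of that workaround in `package.yaml` (hpack) syntax, with hypothetical package and flag names; the point is just that the dependency stops being guarded by a flag condition:

```yaml
# before: foo was only a dependency when the production flag was enabled
# when:
# - condition: flag(production)
#   dependencies:
#   - foo
#
# after: foo is an unconditional top-level dependency, so it is always
# part of the build plan and hence always ends up in the lock file
dependencies:
- base
- foo
```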
Why is this ok?
- We only check in `pantry.lock` (== put it under revision control) for applications (not for libraries).

Finally, there is only one thing I'm worried about. Let's consider the following scenario:
I have a flag `-fproduction` that guards a dependency `foo`; and I also have `foo` listed under `extra-deps`. Locally during development I don't build with `-fproduction`, so my `pantry.lock` does not contain a hash for `foo`. My CI/CD pipeline on the other hand builds with `-fproduction` and will use a version of `foo` that is not properly locked.
One way to address this could be to fail on missing entries in the lock file on `--pedantic`.
I'm not sure what (1) would mean on the implementation side.
> - when `stack` detects that a dependency was updated in `stack.yaml`, it silently ignores the information from the `.lock` file
At the risk of stating the obvious, I think this implies that if we e.g. have
```yaml
extra-deps:
- github: hspec/hspec
  commit: 3c0f266bd3ce71958bf6b6daaf7d0cbcda7e7227
  subdirs:
  - hspec-core
```
in `stack.yaml`, then we need to include the commit hash `3c0f266bd3ce71958bf6b6daaf7d0cbcda7e7227` in the `.lock` file so that we can detect when the "dependency was updated in `stack.yaml`".
That is, the `.lock` file is a superset of `extra-deps`.
> But there's still a question about snapshots: we could add that info into a `pantry.lock` (or `stack.lock`) too, but shouldn't snapshot maintainers get warned if they have incomplete data in their snapshots? And if they should, then `stack freeze` could probably still be useful for them?
I'd say, for now, let's ignore the problems of creating snapshot files. That seems like something distinctly out of scope for Stack itself. Perhaps we'll have a Pantry executable that is useful for filling in missing info in a snapshot. After all, Pantry snapshots will hopefully work for more than just Stack users (e.g., Nix users).
> what to do with the changes a user specifies on the command line?
Seems to be totally out of scope. The lock file should only address things that are in the `stack.yaml` itself. Command line changes are non-reproducible by nature.
Let's try to make things just a bit more concrete. Let's say we have @sol's example `stack.yaml`:
```yaml
extra-deps:
- github: hspec/hspec
  commit: 3c0f266bd3ce71958bf6b6daaf7d0cbcda7e7227
  subdirs:
  - hspec-core
```
I'm imagining we'll end up with a lock file that looks like this:
```yaml
dependencies:
- original:
    github: hspec/hspec
    commit: 3c0f266bd3ce71958bf6b6daaf7d0cbcda7e7227
    subdirs:
    - hspec-core
  complete:
  - size: 99713
    subdir: hspec-core
    url: https://github.com/hspec/hspec/archive/3c0f266bd3ce71958bf6b6daaf7d0cbcda7e7227.tar.gz
    cabal-file:
      size: 4506
      sha256: f79008d723c203f36815b304c0db90b065fa1b20cc06033e5055ccea8b096c3b
    name: hspec-core
    version: 2.6.0
    sha256: ad03a807857ce4ed07c6a8410dbce8b5244c4e193e86767ce0ca93a3ba28dadd
    pantry-tree:
      size: 3751
      sha256: 1bad19b4c4afde31c5f57d919ed768a0e034587e30070e4ac3420896fcb48b90
```
Rules:
I think this approach will address all of the reproducibility and performance concerns I have, and mean that the original `stack.yaml` can be _even more_ lightweight than it is today, e.g. including the `@rev:0` information there will no longer be necessary at all.
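For example, with `acme-missiles` (the illustrative package from the Stack docs) as a stand-in: instead of pinning a cabal-file revision inline,

```yaml
extra-deps:
- acme-missiles-0.3@rev:0
```

the `stack.yaml` could simply say

```yaml
extra-deps:
- acme-missiles-0.3
```

with the exact revision recorded in the lock file instead.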
@qrilka @sol does that make sense?
> I'd say, for now, let's ignore the problems of creating snapshot files. That seems like something distinctly out of scope for Stack itself. Perhaps we'll have a Pantry executable that is useful for filling in missing info in a snapshot. After all, Pantry snapshots will hopefully work for more than just Stack users (e.g., Nix users).
:+1:
> what to do with the changes a user specifies on the command line?

> Seems to be totally out of scope. The lock file should only address things that are in the `stack.yaml` itself. Command line changes are non-reproducible by nature.
That makes sense. I was misled by the wrong assumption that you can specify Hackage dependencies without exact versions as `extra-deps` (but that is not the case, and it wouldn't really make sense anyway).
Now everything actually looks neat and simple! That's great!
Actually, I still have one question: the original proposal talks about saving the lock file only on a successful build, but what if that build was modified by some flags? In that case we don't have any guarantees about the original (completed) dependencies. Should we drop that requirement of a successful build then, @snoyberg? And use the lock file only to lock package versions, without giving extra guarantees?
Regarding the question of `pantry.lock` vs `stack.lock`: this feature looks to be out of Pantry's scope, at least with the current separation between Stack and Pantry: the details of `stack.yaml` live in Stack, and in Pantry there's no entity combining some list of packages (i.e. `extra-deps` plus project packages, though the latter don't need to be locked) and a snapshot. So I would propose to stick with `stack.lock` if there are no objections to that.
Also, it wasn't stated explicitly in this ticket yet, but it should be noted that besides information about completed packages, we also need to complete the package location itself.
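To illustrate what completing a location means (a sketch: the mutable reference is hypothetical, and the completed fields reuse values from the example above):

```yaml
original:
  github: hspec/hspec
  commit: master   # hypothetical mutable reference; gets pinned below
complete:
  size: 99713
  url: https://github.com/hspec/hspec/archive/3c0f266bd3ce71958bf6b6daaf7d0cbcda7e7227.tar.gz
  sha256: ad03a807857ce4ed07c6a8410dbce8b5244c4e193e86767ce0ca93a3ba28dadd
```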
> Should we drop that requirement of a successful build then?
At least I would be fine with that. My original feature request is influenced by what I implemented in sol/tinc, where I do dependency solving and have the successful-build requirement. But for `stack` it seems like this can just be an (impure) function from `stack.yaml` to `stack.lock`.
> Regarding the question of `pantry.lock` vs `stack.lock`
I used `pantry.lock` as a placeholder name only. I'm ok with `stack.lock` or anything else. One drawback of `stack.lock` is that completion à la `vim sta<tab><enter>` to edit `stack.yaml` won't work anymore. But not a major issue for me.
Still, should we consider `.stack.lock`?
> the original proposal talks about saving the lock file only on a successful build, but what if that build was modified by some flags? In that case we don't have any guarantees about the original (completed) dependencies. Should we drop that requirement of a successful build then, @snoyberg? And use the lock file only to lock package versions, without giving extra guarantees?
Yes, I think so, good catch.
Regarding the file name, we need to consider the fact that the `stack.yaml` file may _not_ be called `stack.yaml`. It might be `stack-lts-12.yaml`, for example, or even something ridiculous like `my-stack-yaml.txt`. How about simply appending `.lock` to the filename?
Sounds good to me. I also think it makes sense to resolve symlinks to get the real file name, so that a `stack.yaml` pointing to `stack-lts-12.yaml` will work the same as using the file `stack-lts-12.yaml` itself.
> How about simply appending `.lock` to the filename?
I like this 👍
This is also vital for improving performance. Currently, Stack will spend significant time completing package information on each startup. Automatically writing a `.lock` file will bypass that overhead.
Specification for this lock file implementation is now written here: https://github.com/commercialhaskell/stack/blob/f2ac3e64d70c990c6a5045f02c13f5869a77c5c4/doc/lock_files.md
Looks great 👍
> QUESTION Do we want to go the easy way at first and later implement the more complicated update procedure?
No strong preference, but it's the approach I tend to go for. Maybe even simplify the file format by skipping originals until we actually need them.
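Skipping originals would shrink each entry to just its completed form; under that assumption, the example above would reduce to something like this (a sketch that simply drops the `original` key and flattens `complete`):

```yaml
dependencies:
- size: 99713
  subdir: hspec-core
  url: https://github.com/hspec/hspec/archive/3c0f266bd3ce71958bf6b6daaf7d0cbcda7e7227.tar.gz
  name: hspec-core
  version: 2.6.0
  sha256: ad03a807857ce4ed07c6a8410dbce8b5244c4e193e86767ce0ca93a3ba28dadd
```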
@simonmichael regarding originals: as far as I understand, in #4550 they are used to simplify tracking between locations in source YAML files and the resulting completed locations, so I'd say that we need them already.