Spack: Proposal: Enforcement of Trusted Downloads

Created on 2 Sep 2016  ·  33 Comments  ·  Source: spack/spack

@tgamblin @davydden @ad

In an effort to ensure safety, Spack:

  1. Checksums tarball downloads.
  2. Refuses to install non-checksummed tarballs by default (unless the user specifies --no-checksums).

Unfortunately, this approach does not meet Spack's security goals, because Spack will quite happily process any Git download, whether or not the user has specified --no-checksums. Unless a checksum or Git hash is used to verify a download, I call it an untrusted download. This is a real issue, because a number of packages in Spack default to untrusted Git downloads. For example:

$ grep 'branch=' `find . -name '*.py'`
./lib/spack/external/nose/plugins/cover.py:                branch=self.coverBranches, data_suffix=conf.worker,
./lib/spack/spack/fetch_strategy.py:                hg='https://jay.grs.rwth-aachen.de/hg/lwm2', branch='torus')
./var/spack/repos/builtin/packages/bbcp/package.py:            branch="master")
./var/spack/repos/builtin/packages/cbtf/package.py:    version('1.6', branch='master',
./var/spack/repos/builtin/packages/cbtf-argonavis/package.py:    version('1.6', branch='master',
./var/spack/repos/builtin/packages/cbtf-krell/package.py:    version('1.6', branch='master',
./var/spack/repos/builtin/packages/cbtf-lanl/package.py:    version('1.6', branch='master',
./var/spack/repos/builtin/packages/cityhash/package.py:    version('master', branch='master',
./var/spack/repos/builtin/packages/cleverleaf/package.py:            branch='develop')
./var/spack/repos/builtin/packages/cnmem/package.py:    version('git', git='https://github.com/NVIDIA/cnmem.git', branch="master")
./var/spack/repos/builtin/packages/flux/package.py:    version('master', branch='master',
./var/spack/repos/builtin/packages/hdf5-blosc/package.py:            branch='master')
./var/spack/repos/builtin/packages/julia/package.py:            git='https://github.com/JuliaLang/julia.git', branch='master')
./var/spack/repos/builtin/packages/julia/package.py:            git='https://github.com/JuliaLang/julia.git', branch='release-0.5')
./var/spack/repos/builtin/packages/julia/package.py:            git='https://github.com/JuliaLang/julia.git', branch='release-0.4')
./var/spack/repos/builtin/packages/openspeedshop/package.py:    version('2.2', branch='master',
./var/spack/repos/builtin/packages/qthreads/package.py:            branch="release-1.10")
./var/spack/repos/builtin/packages/r-BiocGenerics/package.py:            branch='release-3.3')
./var/spack/repos/builtin/packages/r-BiocGenerics/package.py:            branch='release-3.2')
./var/spack/repos/builtin/packages/raja/package.py:    version('git', git='https://github.com/LLNL/RAJA.git', branch="master")
./var/spack/repos/builtin/packages/rose/package.py:    version('master', branch='master',

Note that this list violates another assumption we've been making: that numeric-versioned packages are always trusted (checksummed), whereas non-numeric versions are not. If you assume that, you will be surprised when downloading version 1.6 of cbtf, for example. Or, more seriously, openspeedshop@2.2.

Another problem is that --no-checksums is too coarse: it is the equivalent of turning off a firewall when you really need to open just one port. You usually want to turn off safety for just one package, not for all of its possible dependencies.

I therefore propose an improved scheme for trusted downloads that _will_ meet Spack's security goals:

  1. Define a trusted download as one in which Spack can verify that what it downloaded is the same as what the author saw when the package was created. This means checksums, Git hashes, etc. Someone will have to go through the different download methods and figure out how to determine when a download is trusted vs. not trusted. (Do NOT rely on https://; it can guarantee the source of the download, but not its content.) This needs to get written up in Python, so Spack can know when a download is trusted.
  2. By default, Spack will only install trusted downloads.
  3. The user can override this default by adding something to packages.yaml, specifying that Spack should allow untrusted downloads on certain packages ONLY. For example, I might use the following while developing a package:
    ibmisc:
        version: [develop]
        verify: no    # Allow untrusted downloads
        variants: +python +netcdf
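Item 1 above calls for a Python-level notion of trust. A hypothetical sketch of such a predicate might look like the following; the attribute names (`url_attr`, `digest`, `commit`) loosely mirror Spack's fetch strategies, but this function is not part of Spack:

```python
def is_trusted(fetcher):
    """Return True if the download can be verified to match what the
    package author saw: a checksummed tarball or an exact VCS commit."""
    kind = getattr(fetcher, "url_attr", None)   # 'url', 'git', 'hg', 'svn', ...
    if kind == "url":
        return bool(getattr(fetcher, "digest", None))   # tarball + checksum
    if kind in ("git", "hg"):
        return bool(getattr(fetcher, "commit", None))   # pinned revision hash
    return False  # branches, tags and svn revisions can change server-side
```

Anything that falls through to the final `return False` (branches, tags, bare svn revisions) would be refused by default under this proposal.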

Thoughts? Feedback?


All 33 comments

Forgot to mention.. if we make this change, we will also have to fix any packages that currently provide untrusted downloads by default.

I'm interested in feedback on the commit referenced above.

I think one can distinguish between untrusted release downloads and the @develop version tag.

I agree that for cases when VCS is used for delivering release versions (which I find a bad practice), one should be able to checksum the downloaded content and ensure that this is exactly what Spack downloads.

Whereas for the @develop version of a package I would not require setting verify: no or --no-checksums in order to run spack install package@develop. If one needs to make sure that the user understands that @develop is not checksummed, one could print a warning message along the lines of:

You are about to install a @develop version which cannot be checksummed. The downloaded content, and thereby the installed package, may change depending on when the installation is done.

ps. if the collective decision is to use verify: no # Allow untrusted downloads, it would be just a single line to add to my packages.yaml. Not a biggie.

  1. I think that @develop discussions will become clearer once we have better security in place. Therefore, I suggest we address the security issues (this Issue) first, and put further discussion of @develop on hold. Please read my original examples above assuming @develop is any old non-numeric version, not a special case.
  2. Suppose we have two possible solutions to support our workflow requirements --- one relies on special cases, and one does not (@develop is a special case). Other things being equal, I suggest we should prefer the solution _without_ the special cases.

With that in mind...

if the collective decision is to use verify: no # Allow untrusted downloads, it would be just a single line to add to my packages.yaml. Not a biggie.

The verify: no solution would avoid us having to bake special cases in the Spack security stuff. Are you saying that it would meet your needs? Would you want to use verify: no on a per-package basis, or a global basis, or both? (Global verify: no would be equivalent to the current --no-checksums, but implemented correctly).

I suggest we address the security issues (this Issue) first, and put further discussion of @develop on hold.

I am fine with that.

The verify: no solution would avoid us having to bake special cases in the Spack security stuff. Are you saying that it would meet your needs? Would you want to use verify: no on a per-package basis, or a global basis, or both?

Per-package verify: no is enough for my needs. Do I understand right that one would still have a CLI interface like spack install <packageA> --no-checksum to do the same for a single <packageA>, not its dependencies?

Do I understand right that one would still have a CLI interface like spack install <packageA> --no-checksum to do the same for a single <packageA>, not its dependencies?

The possibilities are wide open on what UI this system would have. My only suggestion is we don't call it --no-checksum because:

  1. Trusted downloads is about more than just checksums.
  2. Make sure we break any past uses of --no-checksum, and get users to convert to the new style. (Spack is Alpha software, we are allowed to break UIs).

Would you be interested in spec'ing out command line options that would control how Spack works with packages in this framework of trusted / untrusted downloads?

Would you be interested in spec'ing out command line options that would control how Spack works with packages in this framework of trusted / untrusted downloads?

Yes, I think there should be a CLI option in addition to verify: no in packages.yaml.

Would you like to spec it out further?

  • What would the option be named? How would it work?
  • Would it apply to just the top level of the spec or everything in the spec? Or would there be two versions?

@scheibelp (because you made the Spack download cache), @tgamblin

To summarize: I'm working on a PR that introduces a new concept I call "trusted download." A download is trusted if Spack can verify that what it sees is the same as what the Spack package author saw when the package was created. The idea is, unless directed otherwise, Spack will only install packages from trusted downloads. That is an important security feature.

In the past, we've had notions that are almost like trusted downloads, but not quite. The idea here is to "harden" the system and bring everything in line with a safer, more secure setup.

As far as I know, the following download methods are the only ones that are trusted:
a) Tarball + checksum
b) Git hash
c) GitHub dynamic tarball + checksum (could be based on a Git hash, tag or branch)
d) Probably something similar with hg

The following download methods I believe are NOT trusted:
a) Git tag or branch --- because they could change
b) SVN revision number --- because it could change if the server is hacked

The Spack cache implicitly has a similar but different notion, which I will call "permanence." A download is permanent if we don't expect it to change. The following download methods are permanent:
a) Tarball + checksum
b) Git hash + checksum (whether straight git or GitHub tarball)
c) Probably something similar with hg
d) SVN revision numbers

The following download methods are not permanent:
a) Git tag or branch
b) Git hash + tag or branch (whether straight git or GitHub tarball)
c) Probably something similar with hg

Note the overlap but difference between the notions of trust vs. permanence.
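To make the overlap concrete, here is a hypothetical companion predicate for "permanence" (argument names are illustrative, not Spack API): an svn revision is permanent enough to cache, yet not trusted, while a pinned git commit is both.

```python
def is_permanent(kind, has_checksum=False, has_commit=False, has_revision=False):
    """Permanent = we don't expect the content behind this reference to change."""
    if kind == "url":
        return has_checksum          # checksummed tarball
    if kind in ("git", "hg"):
        return has_commit            # exact commit hash
    if kind == "svn":
        return has_revision          # cacheable, but not verifiable/trusted
    return False
```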

The big question is, how do we integrate the idea of trust with the Spack cache? I'm going to suggest the following, and am interested in your comments:

  1. Spack will cache any download it expects to be permanent (as in the current situation).
  2. Spack will keep (conceptually at least) two caches: one of trusted/verified downloads, and one of untrusted/non-verifiable downloads.
  3. Spack will NEVER cache something where verification (checksum) failed.
  4. When using the cache, Spack would do:
    a) If our requested download is of a trusted type, look for it ONLY in the trusted cache.
    b) If our requested download is of an untrusted type, look for it FIRST in the trusted cache, and THEN in the untrusted cache. (Or should it look in the other order?)

  5. We might also think of adding ways to verify additional download methods. For example, why not put a checksum on the download of a Git branch? This is a nice "extra" feature.
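Steps 4a/4b of the lookup could be sketched roughly like this (the two-directory layout and function name are assumptions for illustration, not the actual cache implementation):

```python
import os

def cache_lookup(filename, download_is_trusted,
                 trusted_dir="cache/trusted", untrusted_dir="cache/untrusted"):
    """Look up a cached download: trusted download types may only come from
    the trusted cache; untrusted types fall back to the untrusted cache."""
    dirs = [trusted_dir] if download_is_trusted else [trusted_dir, untrusted_dir]
    for d in dirs:
        path = os.path.join(d, filename)
        if os.path.exists(path):
            return path
    return None   # cache miss: fetch (and verify) from the network
```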

Thoughts? Comments? Feedback from @scheibelp the cache author is appreciated.

What would the option be named? How would it work?

I don't have any preferences for names.

Would it apply to just the top level of the spec or everything in the spec? Or would there be two versions?

Perhaps having two versions is the most flexible, but I don't have a strong opinion on this.

Some thoughts/questions which IMO are relevant to the above:

Does "git hash" mean the git commit hash, i.e. the one that identifies a commit? If so, I'm confused why "Git hash + tag or branch" isn't permanent (I thought the git hash is like a primary key, in that it implies the tag/branch). Furthermore, based on some preliminary reading, the commit hash may be kind of OK for trust purposes because it is based on a SHA-1 which includes the file contents (although I think it is explicitly _not_ advertised as a security feature).

Likewise, I'm not sure what you mean by the checksum: are you talking about the algorithm that produces the commit hash in git, or some arbitrary technique that produces a checksum by looking at the retrieved source? In the latter case it's a challenge to produce a consistent checksum: for example, if one tries to checksum a .tar.gz archive, it changes between two archives of the same repository because a timestamp is included in the archive. (It turns out this shouldn't be difficult: GZIP=-n tar -cz ... will create a consistent .tar.gz.) The latter case is attractive if a solution can be found, because it introduces a consistent method for verification across git, hg, svn, etc.
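The timestamp problem can also be sidestepped by normalizing metadata before hashing. A sketch (not Spack code) that hashes a source tree deterministically by sorting entries and zeroing mtimes and ownership:

```python
import hashlib
import io
import os
import tarfile

def tree_sha256(root):
    """Hash a directory deterministically: sorted paths, zeroed metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # fix traversal order
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                info = tar.gettarinfo(full, arcname=os.path.relpath(full, root))
                info.mtime = 0            # drop timestamps
                info.uid = info.gid = 0   # drop ownership
                info.uname = info.gname = ""
                with open(full, "rb") as f:
                    tar.addfile(info, f)
    return hashlib.sha256(buf.getvalue()).hexdigest()
```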

On a side note: with the above in mind does github guarantee that two .zip downloads of a repo for the same commit hash have the same checksum?

Also note that Spack doesn't explicitly verify git repositories, including cases where versions have a commit hash (the "check" method for all SCM fetchers is a noop): in other words it doesn't invoke any particular command, although for the git case the cloning action itself I think could be considered sufficient based on the above.

Spack will NEVER cache something where verification (checksum) failed.

I wanted to make sure the intent here was not to allow skipping verification of sources retrieved from the trusted cache. If so, IMO it's easier to relax this and just make sure that when you pull something out of the cache it's what you expect. This allows the user to edit checksums for versions in their package.py files without worrying about sync issues with the cache.

Also IMO it may be useful to still allow the user to install things automatically in an untrusted mode (perhaps printing warnings about which packages are not trusted): If we presume that a Spack package only uses untrusted techniques when no other option is available, then a user may be annoyed to explicitly enable untrusted downloads for a large number of packages when there would be no other option for them anyways (regardless of whether they were using Spack).

As far as I know, the following download methods are the only ones that are trusted:
a) Tarball + checksum
b) Git hash
c) GitHub dynamic tarball + checksum (could be based on a Git hash, tag or branch)
d) Probably something similar with hg

I mentioned this already in a different issue, but downloading any git tag or any tarball can also be verified by gpg signatures. We just need to add which (list of) fingerprint/pubkeys to trust once in the package - then the upstream maintainers just need to continue signing with it (which most people maintaining a large user base do, even on github). See this comment: https://github.com/LLNL/spack/issues/562#issuecomment-245323463 and this test repo.

Signing releases is also not uncommon in the Python/pip release world: https://github.com/pypa/twine/issues/157 although tools are currently not verifying those signatures on download (but they could...).
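For reference, verifying a detached signature with the stock gpg CLI is `gpg --verify <sig> <file>`. A small helper that merely assembles that command might look like this; the per-package keyring handling is an assumption about how pinned keys could be stored:

```python
def gpg_verify_command(signature_file, payload_file, keyring=None):
    """Build the argv for verifying a detached GPG signature.
    Running it (e.g. via subprocess) requires gpg and the trusted public key."""
    cmd = ["gpg", "--verify"]
    if keyring:
        # restrict verification to the package's pinned keyring
        cmd += ["--no-default-keyring", "--keyring", keyring]
    cmd += [signature_file, payload_file]
    return cmd
```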

We need to be careful about what we're guaranteeing here. I think Spack
should guarantee that the version downloaded by the user will be the same
as the version downloaded by the Spack package author, as long as Spack
itself hasn't been tampered with.

Relying on upstream publishers to provide signatures does not give this
guarantee. The Spack package author should use the upstream signatures to
increase the assurance that the tarball the Spack package author sees is
the same as the tarball intended by the upstream author. However... since
so many upstream signatures appear on the same website / FTP server beside
the tarball itself, they provide little or no actual security. (I suppose
they do help ensure something didn't go wrong in the download).


I think Spack should guarantee that the version downloaded by the user will be the same as the version downloaded by the Spack package author, as long as Spack itself hasn't been tampered with.

That's actually the question: whether you want a Spack package author to _guarantee_ you a version (whatever that means; it does not guarantee the packager did a review, or even remotely knows the code base).

Or whether spack simply provides a trustworthy way to download and install a specific piece of software, where trusting the upstream dev is enough; see the pip/npm/docker/conda-forge/... "trust" models. Everything else will just delay shipping software by waiting for stable releases of spack.

In my understanding, HPC is a moving target and even the fundamental software such as MPI, I/O libraries, etc. are still quickly evolving.

see the pip/npm/docker/conda-forge/... "trust" models.

Do you have a reference?


I'll try as far as I get. Basically, all of the above just provide repositories and tools around them without the need to trust the repository itself (but with the need to trust the individual packager).

PyPI & Pip

NPM

Conda (Forge)

Docker

  • again, trust the container author (or not)
  • authors: signing changes in containers possible since 1.8
  • several papers about scans of those containers (and how vulnerable/old the software is it ships)
  • a few trusted "official" containers for e.g., ubuntu (which basically means: trust a single person of the ubuntu maintainers)

All of the above seem to work well for fast-moving targets (e.g., python and nodejs packages). The central-pinning approach otherwise can quickly lead to a collection of outdated software.

(Examples: anyone still using the Debian stable HDF5 1.8.13 from 2014? Or waiting for a new release of pyinstaller to finally include a few tweaks to ship your Python package?)

I think the strengths of spack are:

  • always ship the source
  • compile with all optimizations and a matching compiler (_"wohoww, MPI ABI incompatibilities..."_)
  • in the correct environment/architecture (_"hey, I might be quicker using AVX-512 on my prototype!"_)
  • its easy way to express, maintain and build variants (_"ever tried explaining >install h5py with parallel HDF5< to a user?"_)
  • "HPC first" recipes, with user-side additions possible for system-specific tweaks
  • module ("virtual environment") support for installing multiple applications side by side
  • quick and reproducible installs and updates (where quick means: without the need to read all configure options for each dependency)

None of the existing solutions I know meets that set of requirements, but HPC sysadmins, HPC software developers, and users all face them daily.

But if we get stuck in long release cycles for upstream releases of software that spack ships, this could outweigh the benefit of the easy install. I think the HPC community should read more announcements like this:

yt 3.3 has been released!
Install with
  “pip install -U yt”
or
  “conda install -c conda-forge yt”.

but of course via spack install and of course... with all the cluster support we need.

Comparison:
conda/docker: pre-configured binary blobs
pip: python only
npm: nodejs only
btw, I seem to remember homebrew also builds from source on its very limited number of platforms

Am I right that the central issue here is getting new releases into Spack
quickly? It seems that's the kind of issue that can be addressed by Spack
project policy, without changing core Spack infrastructure.

Yep, that's the central concern.
But besides policy one might be able to achieve that by dividing the package repo from the central spack "client" source code - which would be an infrastructure change.

With such a division one could achieve quick, even continuous delivery of new third party software releases independent of the CLI release cycle.

But besides policy one might be able to achieve that by dividing the
package repo from the central spack "client" source code - which would be
an infrastructure change.

No, this is independent of policy. Current policy is, updates are
submitted as PR's. How long it takes to get into the Spack repo depends on
how long it takes for someone to get around to approving the PR.

Do people feel that Package PR's have been approved too slowly in the past?

@citibeth nope, quick and competent response is definitely given in the PRs of spack! :rocket: :sparkles:

it's more about the workflow in general: would we like to draft a release of the CLI even if it did not change at all?
A peer-reviewed update of a package repo on GitHub could do exactly the same. The CLI can simply cache that repo when internet access is available and all updates of the packages still go in via PRs but in a _second_ repo.

An example is conda forge _staged-recipes_. E.g., an update of a software does not need to test the CLI functionality ("is spack bootstrap working", "is spack spec working", "is spack help working", ...) but only that specific package (and its deps).

It's just about a division of fully orthogonal aspects of the software.

We've discussed this before. It is likely to happen at some point in the
future. Spack is currently alpha software, and the packages are not always
orthogonal from core Spack.


@ax3l Something you might be interested in is my proposal in #1576. Specifically, I recommended only running Spack's unit tests when the core Spack libraries are modified, and instead running package-specific tests when a certain package is modified. For example, if you add a new package, Travis could make sure it installs. If you add a new version to a package, it could check to make sure it is downloadable or that the previous patches still work. We ended up merging #1576 prematurely to get the documentation updates in, but I can always get back to working on package-specific tests. @tgamblin was concerned that spack install would take too long for Travis, so I would probably leave those out for now, but I think there was talk of running those on another cluster ourselves.

I mentioned elsewhere that e.g. XSEDE (or NCSA?) are offering cloud services where people can apply for allocations, and then run tests like these. I'd assume that this requires more manual setup than Travis, but a respective Docker image should not be too difficult to define.

Oh, that sounds great. I feel I accidentally pushed the main question off-topic here, sorry for that.

Should we maybe document the "future splitting of CLI and package repo" and its implications on testing orthogonal parts of the software in a separate proposal issue?

For the initial issue, I think the change is fine if we ignore the questions arising from develop versions for now.

@scheibelp

On a side note: with the above in mind does github guarantee that two .zip downloads of a repo for the same commit hash have the same checksum?

I could not find that guarantee, nor an API for it in the docs for tags/shas. For version(branch="..."), the zip's checksum, which points at the branch's HEAD, will be subject to change. (Where "is this commit signed with KeyID ...?" could still be a valid check if one wants to follow a branch. Nevertheless, GitHub web merge-commits will not be signed, for example.)

I would generally go for tag + sha on github and clone with a short --depth. Technically you are right: the tag is kind of superfluous and could (well: shouldn't) change, but it makes the version() way more readable.

@ax3l : see https://groups.google.com/forum/?fromgroups#!topic/spack/zGs-0RKz3_U that includes a discussion about splitting of Spack into separate repositories.

Summary

Ok I'm just getting to this discussion. Go to Taiwan for a week, come back to 500 spack emails 💥. I'm glad people think Spack is fast, though! That was actually quite surprising to me 🚀✨. I'll try to summarize and clarify:

  1. @citibeth wants to make sure all Spack downloads are secure. The way we trust sources is currently rather inconsistent and depends on the fetch strategy.
  2. Finally, @citibeth proposes "trusted downloads" and a per-package setting for how to handle them. She points out that --no-checksum should be reconsidered.
  3. @ax3l gave a really nice summary of the security schemes and rationales from existing package managers, which is a good guide. I also like the summary of what Spack offers.
  4. @ax3l mentioned using git's GPG signing. I think there are enough repos we care about where people don't sign their commits that this can't be an exclusive solution, so I'd rather look at hashes. We can probably consider trusting particular GPG keys eventually but I think a consistent trust model ought to get done first, before we decide what we think of that.
  5. We talked about better ways to test Spack. I am working on getting testing set up at LLNL, NERSC, and in AWS. That is going slower than I would like, but the eventual intent is to have real package tests running in lots of places. Apparently XSEDE and/or NCSA is offering public testing resources, and I did not know that. I would definitely like to run our tests there if they are offering. Conda Forge is a really nice example of what Spack should probably have.

Basically I agree with all of the above and if someone wants to work on this that would be great. A few points I didn't see mentioned above:

  1. I don't think cached archives are so different from remote downloads, so I don't think "permanence" is as much of a special case as you might think. Cached archives are actually fetched using curl, using the same fetch mechanism as remote downloads, and it does checksum the archive, _except_ when it's a tarred up repo.
  2. I think git and hg will be easy to download and verify using commit hashes. I also think we can come up with a way to cache them that allows the commit hash to be verified again.
  3. I don't know what to do about svn.

Suggestions

Finally, on the security scheme itself, here are my suggestions:

  1. I don't think "trusted" is the right word, as it's overloaded with SSL terminology, and one should really specify _who_ is being trusted. I like "verified" better -- as in "verified" with a checksum.
  2. I think the user should probably be able to specify several trust levels:

    1. Trust only checksum-verified downloads. We should probably switch to require SHA-256 for this, as homebrew has done.

    2. Trust remote server certs (i.e., you're ok if SSL trusts the certificate, as with svn)

    3. Trust anything.

Maybe the context is narrow enough that you could use trusted to mean something downloaded from a trusted svn or https server, and verified to mean something you checksummed, with a checksum from some second party (like the Spack repo).

maybe this could go in etc/spack/defaults/config.yaml, and you could override it in a site- or user- config.yaml, or per-package in packages.yaml:

# allow only verified downloads
allow-downloads: [verified]

User override in ~/config.yaml:

# allow both verified and trusted downloads
allow-downloads: [verified, trusted]
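Enforcement of such a setting might reduce to a small policy check. In this sketch the three levels are ranked so that a policy listing a weaker level implicitly accepts stronger ones; whether that ranking is wanted is an open design choice, and only the level names come from the proposal above:

```python
# Rank the proposed trust levels from strongest to weakest requirement.
RANK = {"verified": 0, "trusted": 1, "anything": 2}

def download_allowed(download_level, allowed_levels):
    """True if the download's level is acceptable under the configured policy."""
    needed = RANK[download_level]
    return any(RANK[level] >= needed for level in allowed_levels)
```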

The other thing I should mention is that there are going to be other situations we care about that don't quite fit the above model. More below...

Binary caching

If you've been following recent telcons, we've been talking to the Fermi guys about adding binary caching. We will likely not be adding hashes per configuration to package.py files for binaries, mainly because there would be zillions of them. Unlike homebrew, there is not a 1:1 or 1:<small number> relationship between package.py files and binaries.

Generally to scale this you use some kind of signing setup, and in discussions with LLNL folks we came up with this scheme:

  1. Spack ships with an LLNL GPG public key.
  2. Binary mirrors host installation tarballs whose names include their full spack DAG hash.
  3. Mirrors ALSO host signed metadata files per tarball. The signed metadata files would contain:

    1. A mapping from Spack hash to the SHA256 checksum for the tarball.

    2. The full spec.yaml for the tarball.

Metadata might not be a file-per-tarball thing, but at any rate it's signed. With this scheme, you trust the LLNL GPG public key that comes with Spack, and you rely on that to tell you that a particular binary checksum is ok. You then verify the tarball with the checksum you read from the trusted metadata file. This is not unlike Debian package signing. It also scales to lots of tarballs.
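The verification flow for one binary under this scheme might look like the following sketch. All names are hypothetical, and the real signature check would go through GPG with the shipped public key; here it is abstracted as a callable:

```python
import hashlib

def verify_binary(tarball_bytes, spack_hash, signed_metadata, signature_ok):
    """Check the metadata's signature, then check the tarball's SHA-256
    against the checksum the metadata records for this spec (DAG) hash."""
    if not signature_ok(signed_metadata):
        raise ValueError("metadata signature check failed")
    expected = signed_metadata["checksums"][spack_hash]
    actual = hashlib.sha256(tarball_bytes).hexdigest()
    if actual != expected:
        raise ValueError("tarball checksum mismatch")
    return True
```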

Now, what if LLNL gets compromised? We release a new version of spack with a new key, and we revoke the old one.

URL downloads

@ax3l mentioned quick "staged installs". We've talked about quick developer installs with @DavidPoliakoff pretty extensively in #1158 and #1151. That would allow people to do quick installs from URLs. I consider that something I wouldn't put in a mainline Spack repo, but I might like it in a bundler file or something similar. I think David has that on hold -- it's not a short-term priority, but just listing some precedent. It seems like a useful use case that could give you fast turnaround for interactions directly with a developer (that's what he wanted it for -- working w/his users).

@ax3l I'm interested in how Spack compares to EasyBuild, from your viewpoint? I've been meaning to set up a page presenting a comparison between EasyBuild and Spack (endorsed by @tgamblin); any input regarding that is interesting.

(I don't want to hijack this issue though, let me know if there's a better place to discuss this)

Sorry for the late answer, I have not tried EasyBuild before.

A question on "security" for trusted downloads: I just added a new CUDA release manually and... good lord, why are we still using md5 hashes in the versions? Why not sha512 or something that is known to be... secure?

I plan to switch to sha256 to be future-proof based on the latest NIST recommendations, but maybe a little clarification is needed here. We are using MD5 to checksum downloads. MD5 is broken as a cryptographic hash with respect to collisions. That means you shouldn't use it for things like certificates, but it doesn't mean it's broken (yet) as a download checker. See these two answers on the Security StackExchange:

  1. Why are MD5 and SHA1 still used for checksums if they are called broken?
  2. Why do people still recommend MD5 if it is cracked since 1996?

Think about what you would need to do to attack an MD5-checked download. You would need to come up with a second tarball that has the same MD5 sum as the source tarball, and you would need to ensure that it responds to at least ./configure (or whatever the build instructions are) in a way that attacks the installer. That is a very hard second-preimage problem.

Either way, we should switch, but Spack is not insecure (at the moment) just because we're using MD5.
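For what it's worth, the verification step itself is algorithm-agnostic; with Python's `hashlib`, switching from md5 to sha256 is a one-word change. This is a generic sketch, not Spack's actual checker:

```python
import hashlib

def checksum_ok(data, expected, algorithm="sha256"):
    """Compare the data's digest against the expected hex string."""
    return hashlib.new(algorithm, data).hexdigest() == expected
```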

For The EB/Spack matrix: It's worth noting that the vast majority of EasyBuild packages are not checksummed at all. e.g., here is openssl with no checksum over raw http. 😬

I see, the warning signs for broken pre-image resistance are only alarmingly high, but not yet reported ;)

(In case someone finds a way to predict hashes from additional padding data, it would indeed be possible to build a tar with a configure script and fill it up with padding data to fit the hash.)

Thanks for the clarification!

@tgamblin fwiw: I have plans to fix the lack of consistency there, probably combined with a switch to sha256 or sha512, because, why not... Just need to find some time for it.
