What should our strategy be for external dependencies in Mason, given the research gathered in the previous research spike? An answer to this spike includes the beginning of a prototype.
#### 1) Extend Mason via a third party package manager
Conan - An extensible package manager, primarily for C/C++
- Has an active developer community (100 contributors)
- Provides easy generator method to export build information via JSON
- Allows for use of a number of build systems with simple configuration
- Developer team has expressed interest in helping us
- Good documentation
- Lacks a number of HPC packages
- Smaller curated registry than Spack (~100 in conan-center, ~300 in bincrafters)
Spack - A Flexible HPC package manager
- Active developer community (~300 contributors)
- Created with HPC in mind
- Multi-version dependencies, similar to nix in that respect
- Specify compiler per dependency
- More curated packages, many of which are HPC specific (~3000)
- Works specifically with Cray modules
- More overhead to installation and first use for new users compared to Conan
Both package managers:
- Written in Python
- Good documentation
- Tested daily with Travis and passing builds
- Work with multiple languages
Some example use cases of the build command are:
This option is how Rust goes about managing external dependencies. Rust/Cargo, more or less, puts it all on the user to figure out how to build their program if it has external dependencies. They provide a number of fields in the configuration file to make such build scripts easier to use, such as [build-dependencies] and specific build script overrides. These could easily be included in the Mason.toml as well, should we decide to go this route.
apt, brew, pkg-config, etc. The user would specify which packages they want to include in the Mason.toml, and Mason would search the known locations of those packages, prompting for manual entry upon failure. This is the quickest route to get up and running with external dependencies, since most users already have these system package managers installed on their machines. This option puts the majority of the work on the user.
1) (Option 1: Spack) Interface with Spack, allowing Spack to handle dependency resolution and package installation while still building with Mason.
2) (Option 2 + 3: apt/pkg-config/etc + build script): Download and install with system package managers and offer multiple methods of building such files (build.py, buildspec.toml, etc) all specified in the Mason.toml
If we went with Conan or Spack, what would happen for something that depends on a library that is typically already installed? Say the OpenSSL library, for example. Just to get a sense of portability, I ran `pkg-config --list-all | grep ssl` on 3 systems: Mac OS X (with homebrew), Ubuntu 18.04, and SLES 12. Not only did the command succeed but all 3 (happened to) report that libssl was installed. So, not only did these 3 systems all have the library information available with pkg-config but also it had the same name. E.g. `pkg-config --libs libssl` works on all 3 to show the necessary link arguments. I think the pkg-config approach is reasonable and easy enough for things normally included in an OS these days.
My guess here is that it will make sense for us to both provide pkg-config integration and Conan or Spack integration, where we would use pkg-config for system-wide-OS-distribution-installed things and Conan/Spack for things that exist in them and are probably less likely to be installed in /usr/include and /usr/lib (say). E.g. pkg-config is appropriate for libssl but Conan/Spack would give us more esoteric libraries that would commonly be downloaded during regular use of mason.
But, my feeling on the matter might change if Conan or Spack already included pkg-config (or apt/dpkg or brew or yum or whatever) integration to get system-installed packages. I suspect they do not, but that would be really good to know.
> (Option 2 + 3: apt/pkg-config/etc + build script): Download and install with system package managers and offer multiple methods of building such files (build.py, buildspec.toml, etc) all specified in the Mason.toml
I don't understand why a build script is necessary in the main use case I see for pkg-config, which is where there is a system-installed library we want to use. We'd simply query pkg-config for compile and link flags that the library requires and include them in the build command. Why would somebody need to do more than that?
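As a rough sketch of that query (written in Python purely for illustration, not in Chapel and not taken from Mason), the following asks pkg-config for a library's compile and link flags and sorts the link flags into library directories and library names. The function names `pkg_config_flags` and `split_link_flags` are hypothetical; only the `--cflags` and `--libs` options belong to pkg-config itself.

```python
import shlex
import shutil
import subprocess


def pkg_config_flags(package):
    """Ask pkg-config for the compile and link flags of `package`.

    Returns a (cflags, libs) pair of token lists. Raises if pkg-config is
    missing or does not know the package.
    """
    if shutil.which("pkg-config") is None:
        raise RuntimeError("pkg-config was not found on PATH")
    flags = []
    for option in ("--cflags", "--libs"):
        proc = subprocess.run(
            ["pkg-config", option, package],
            capture_output=True, text=True, check=True,
        )
        flags.append(shlex.split(proc.stdout))
    return flags[0], flags[1]


def split_link_flags(tokens):
    """Group link flags into -L directories, -l libraries, and everything else."""
    groups = {"lib_dirs": [], "libs": [], "other": []}
    for token in tokens:
        if token.startswith("-L"):
            groups["lib_dirs"].append(token[2:])
        elif token.startswith("-l"):
            groups["libs"].append(token[2:])
        else:
            groups["other"].append(token)
    return groups
```

On a system where `pkg-config --libs libssl` works, `split_link_flags(pkg_config_flags("libssl")[1])` would give the pieces to append to the build command. How those pieces are passed on to the Chapel compiler is left open here.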
Separately, is there some set of these we're pretty sure we must have eventually? I.e. for flexibility reasons, are we always going to need the build script strategy, no matter what else we also support?
Are we going to need more general script-ability of Mason TOML files anyway? Say we have to run a command to figure out what package name we want to use with pkg-config - how could that work?
@mppf - What is the benefit of not managing libraries such as ssl through the external package manager? Is it an optimization in package installation time?
I think more control of package dependencies and their versions is better in general. OpenSSL is an interesting example. Say a user was using a SLES 11 system without sudo access and had a package that depended on the latest SSL. If we relied on pkg-config and the system package manager in this case, the user would be out of luck.
> @mppf - What is the benefit of not managing libraries such as `ssl` through the external package manager? Is it an optimization in package installation time?
I don't think one should be required to install such packages if they are included in the OS. Using the OS copy significantly reduces the amount of code that must be downloaded to build a package. Besides that, OS versions of packages (if they are available) are probably better tested than the alternatives.
But, my opinion here would change if we knew that Conan / Spack knows how to work with these OS packages.
> Say a user was using a SLES 11 system without sudo access and had a package that depended on the latest SSL. If we relied on `pkg-config` and the system package manager in this case, the user would be out of luck.
Sure, in that event I'd hope that they could specify both pkg-config and Spack/Conan dependencies for the same package and we'd try the pkg-config one first (since it is easier / faster / more stable).
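A minimal sketch of that "try pkg-config first, then fall back" ordering, in Python for illustration only. The `Resolver` type and `resolve_with_fallback` are hypothetical names, and each resolver is assumed to return `None` when it cannot provide the package.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A resolver takes a package name and returns build information, or None
# when it cannot provide the package.
Resolver = Callable[[str], Optional[Dict[str, str]]]


def resolve_with_fallback(
    package: str, resolvers: List[Tuple[str, Resolver]]
) -> Tuple[str, Dict[str, str]]:
    """Try each (backend name, resolver) pair in order; return the first hit."""
    tried = []
    for backend, resolver in resolvers:
        info = resolver(package)
        if info is not None:
            return backend, info
        tried.append(backend)
    raise LookupError(f"{package!r} not found via: {', '.join(tried)}")
```

Listing the pkg-config resolver first gives the cheap, system-installed path priority, and the Spack/Conan resolver only runs when the system cannot supply the package.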
> I don't understand why a build script is necessary in the main use case I see for pkg-config, which is where there is a system-installed library we want to use. We'd simply query pkg-config for compile and link flags that the library requires and include them in the build command. Why would somebody need to do more than that?
>
> Are we going to need more general script-ability of Mason TOML files anyway? Say we have to run a command to figure out what package name we want to use with pkg-config - how could that work?
@mppf Build scripts would be more of a bonus to users than a requirement. If we interface with a system package manager, they might be design overkill, as most build configurations could be set in the Mason.toml. For Rust, most of the build configurations are set in the config file. We could do something similar and expand it to support system package managers without needing a specific build script. But for a reference on what build scripts do in Rust, and some of their build configurations, look here.
> But, my feeling on the matter might change if Conan or Spack already included pkg-config (or apt/dpkg or brew or yum or whatever) integration to get system-installed packages. I suspect they do not, but that would be really good to know.
Spack and Conan both offer a way to use libraries found on the user's system. Spack allows system paths to be included in the config(yaml) file
> Spack and Conan both offer a way to use libraries found on the user's system. Conan uses imports, whereas Spack allows system paths to be included in the config(yaml) file
The documentation about Conan imports leads me to believe 'import' is a totally different thing.
The Spack documentation says it's better to use the Spack versions of packages, except in certain cases (one of which is OpenSSL). I find this surprising.
@mppf Conan offers a tool to interface with pkg-config, as well as many other integrations
I think my main question at this point is - are we trying to develop the plan for how this will work? Or merely choosing what to try to prototype next?
In terms of prototyping, I'd lean towards starting with pkg-config because it's relatively simple & we can get some basic functionality in this area pretty quickly. But, it would be reasonable to argue that it'd be better to go directly to Conan or Spack. It just depends on if the goal is incremental progress or a long term solution.
In terms of design, I think our effort there probably should go towards deciding if we want to work with Conan or Spack. (Or deciding that we'll eventually support any number of these). Are there questions that you'd like to answer in order to help us make this choice?
I see that with Spack, hashing your package is a regular part of releasing it: https://spack.readthedocs.io/en/latest/workflows.html#release-your-software and additionally they support GPG signing: https://spack.readthedocs.io/en/latest/getting_started.html#gpg-signing . I think such security / validity protections are really important and Conan stuff I've read seemed to indicate that checksumming is supported but optional and signing packages is a TODO: https://github.com/conan-io/conan/issues/773.
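For reference, the kind of check that checksumming enables is small. A minimal sketch in Python (not taken from Spack or Conan; `sha256_matches` is a hypothetical helper) that verifies a downloaded archive against a published SHA-256 digest:

```python
import hashlib


def sha256_matches(path, expected_hex, chunk_size=1 << 20):
    """Return True if the file at `path` has the expected SHA-256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

A checksum only shows that the archive matches what the package recipe expects. Signing is what ties the recipe itself to a trusted publisher, which is why both protections matter.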
Anyway I'd personally lean towards Spack for this reason & because it's already working with the HPC community. The audience of cluster/supercomputer users tends to have slightly different requirements and I'd be happy to have confidence that we don't have to do all the work to get whatever we choose working in that context.
> I think my main question at this point is - are we trying to develop the plan for how this will work? Or merely choosing what to try to prototype next?
I think we are aiming to answer the latter right now, but that is somewhat dependent on knowing what we want in the long term.
> In terms of prototyping, I'd lean towards starting with pkg-config because it's relatively simple & we can get some basic functionality in this area pretty quickly.
I am also leaning this direction now. It seems like we could support non-Chapel packages with pkg-config as a stepping stone. Integrating with an external package manager could be a follow-on task to handle the case where a dependency is not available on the system or in the user-specified environment variables already.
Some questions:
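- How will we ensure `pkg-config` is available on the system (e.g. I don't think it comes on OS X)
- Can we specify version numbers and rely on `pkg-config` to determine the version number?
- What will the manifest file listing a non-Chapel package look like?
- How will we handle cases where a dependency is installed, but doesn't necessarily have a `.pc` metadata file for `pkg-config` to read?
- … `.pc` file on OS X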
> - How will we ensure `pkg-config` is available on the system (e.g. I don't think it comes on OS X)
Right, it doesn't come with OS X, but it is available through Homebrew.
> - Can we specify version numbers and rely on `pkg-config` to determine the version number?
e.g.

```
$ pkg-config --modversion libssl
1.1.0g
```
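Beyond printing the version, pkg-config can also answer a version constraint through its exit status, using its `--exact-version` and `--atleast-version` options. A hedged Python sketch of how a declared version could be checked; `version_ok` and its `run` parameter are hypothetical, and this is not how the Mason prototype is necessarily implemented:

```python
import subprocess


def version_ok(package, version, exact=True, run=subprocess.run):
    """Return True if pkg-config reports an acceptable version of `package`.

    With exact=True the installed version must equal `version`; otherwise it
    must be at least `version`. pkg-config signals the answer through its
    exit status. `run` is injectable so the check can be exercised without
    pkg-config installed.
    """
    option = "--exact-version" if exact else "--atleast-version"
    proc = run(["pkg-config", f"{option}={version}", package])
    return proc.returncode == 0
```

Delegating the comparison to pkg-config avoids having to parse version strings such as `1.1.0g` ourselves.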
> - What will the manifest file listing a non-Chapel package look like?
Good question. I think we're imagining it could be just included in an existing Mason package or another way of using it would be to create Mason packages for the dependencies. I think including in an existing package is easier to think about and you can build the independent Mason package out of that functionality. But we still have to decide on the preferred approach.
> - How will we handle cases where a dependency is installed, but doesn't necessarily have a `.pc` metadata file for `pkg-config` to read?
At this stage -- give up / ask the user to write the .pc file.
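For a sense of how much the user would have to write, here is a small Python sketch that renders a minimal `.pc` file. The helper `render_pc` is hypothetical, and it assumes the library lives under `<prefix>/lib` with headers under `<prefix>/include`, which will not hold for every installation.

```python
def render_pc(name, version, prefix, libs=None, description=None):
    """Return the text of a minimal pkg-config .pc file for one library."""
    libs = libs or [name]  # assumption: the library is linked as -l<name>
    link = " ".join(f"-l{lib}" for lib in libs)
    lines = [
        f"prefix={prefix}",
        "libdir=${prefix}/lib",
        "includedir=${prefix}/include",
        "",
        f"Name: {name}",
        f"Description: {description or name + ' (hand-written metadata)'}",
        f"Version: {version}",
        f"Libs: -L${{libdir}} {link}",
        "Cflags: -I${includedir}",
    ]
    return "\n".join(lines) + "\n"
```

Saving the result as `<name>.pc` in a directory listed in `PKG_CONFIG_PATH` should be enough for pkg-config to find the library.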
From the OP:
> Allow for third-party non-Chapel code to be retrieved, installed and compiled via script specified in the Mason.toml
I would like Mason to adopt a consistent interface for all dependencies. The C/C++ dependencies should be annotated as such in Mason.toml, but the other metadata should be consistently documented similar to other Chapel dependencies in Mason. I don't want to fall into a trap where Mason points to an arbitrary executable local script that has to go with each external dependency; instead, I would rather that decision be offloaded to some notional "external dependency" backend that will support the actual retrieval and possible management of said dependency.
On the topic of which package manager, I would suggest building the necessary abstraction/interface and not be tied to any one technology if at all possible because of the lack of a dominant package manager for C/C++. Each of these package managers has enough special features that it's hard to completely abstract these details, but we should sufficiently try (sort of, kind of, similar to the model that Rust is using with Cargo, .. sort of).
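To make the notional backend concrete, here is one possible shape for such an abstraction, sketched in Python for illustration. Every name here (`ExternalBackend`, `StaticBackend`, `resolve_external`) is hypothetical, and the input layout of backend name mapped to `{package: version}` is an assumption rather than a settled manifest format.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class ExternalBackend(ABC):
    """One source of non-Chapel dependencies (pkg-config, Spack, Conan, ...)."""

    @abstractmethod
    def resolve(self, name: str, version: str) -> Optional[Dict[str, str]]:
        """Return build information for name at version, or None if unavailable."""


class StaticBackend(ExternalBackend):
    """Toy backend backed by an in-memory table, standing in for a real one."""

    def __init__(self, table):
        self._table = table

    def resolve(self, name, version):
        return self._table.get((name, version))


def resolve_external(external, backends):
    """Resolve a table of the form {backend name: {package: version}}."""
    resolved = {}
    for backend_name, packages in external.items():
        backend = backends[backend_name]
        for name, version in packages.items():
            info = backend.resolve(name, version)
            if info is None:
                raise LookupError(
                    f"{backend_name} cannot provide {name} {version}"
                )
            resolved[name] = info
    return resolved
```

With this split, the manifest only names the dependency and the backend, and anything specific to Conan, Spack, or a locally installed module stays behind the `resolve` method.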
I also do want to echo the viewpoint of packages already installed on a system. If I have a system that has manually updated packages by the sysadmins in a local directory, I want to be able to point to a specific version of a package, especially if it isn't tracked by a traditional distro repo manager like apt or yum.
This problem is fairly multifaceted. It'd be worthwhile to have a quick bake-off with two or three packages in each of e.g., Conan, Spack, distro repo managed, or locally available modules.
Progress has been made on the first prototype, which uses pkg-config. In my current branch, I am able to declare and use openblas and lapack, installed on my OS X machine via brew, in Mason. Here is an example that uses the LinearAlgebra library, which requires both BLAS and LAPACK:
Mason.toml
```toml
[brick]
name = "cholesky"
version = "0.1.0"
chplVersion = "1.18.0"
compopts = "--ccflags -Wno-enum-conversion --ccflags -Wno-strict-prototypes"

[dependencies]

[external]

[external.pkgconfig]
lapack = "3.8.0"
openblas = "0.3.1"
```
From this Mason.toml, Mason generates a Mason.lock that records the dependency information needed to build the package. In this case, Mason gets the external dependency information from the .pc files found via the PKG_CONFIG_PATH.
Mason.lock
```toml
[root]
name = "cholesky"
compopts = "--ccflags -Wno-enum-conversion --ccflags -Wno-strict-prototypes"
version = "0.1.0"
chplVersion = "1.18.0..1.18.0"

[external]

[external.pkgconfig.lapack]
name = "lapack"
pcDir = "/usr/local/opt/lapack/lib/pkgconfig"
version = "3.8.0"
libs = "-L/usr/local/Cellar/lapack/3.8.0_1/lib -llapack"
include = "/usr/local/Cellar/lapack/3.8.0_1/include"

[external.pkgconfig.openblas]
name = "openblas"
pcDir = "/usr/local/opt/openblas/lib/pkgconfig"
version = "0.3.1"
libs = "-L/usr/local/Cellar/openblas/0.3.1/lib -lopenblas"
include = "/usr/local/Cellar/openblas/0.3.1/include"
```
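To illustrate where lock fields like these can come from, the sketch below locates a `.pc` file on a `PKG_CONFIG_PATH`-style search path and reads the same five fields out of it. It is written in Python for illustration and is not the prototype's code; `parse_pc`, `find_pc`, and `lock_entry` are hypothetical names. It also assumes the `.pc` file defines an `includedir` variable, and it ignores the default directories that the real pkg-config searches.

```python
import os
import re
from pathlib import Path

_VAR = re.compile(r"\$\{(\w+)\}")


def parse_pc(text):
    """Parse .pc text into (variables, fields), expanding ${name} references."""
    variables, fields = {}, {}

    def expand(value):
        return _VAR.sub(lambda m: variables.get(m.group(1), ""), value)

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        eq, colon = line.find("="), line.find(":")
        # "name=value" defines a variable; "Key: value" defines a field.
        if eq != -1 and (colon == -1 or eq < colon):
            variables[line[:eq].strip()] = expand(line[eq + 1:].strip())
        elif colon != -1:
            fields[line[:colon].strip()] = expand(line[colon + 1:].strip())
    return variables, fields


def find_pc(name, search_path=None):
    """Return the first <name>.pc on the search path, or None."""
    if search_path is None:
        search_path = os.environ.get("PKG_CONFIG_PATH", "")
    for directory in filter(None, search_path.split(os.pathsep)):
        candidate = Path(directory) / f"{name}.pc"
        if candidate.is_file():
            return candidate
    return None


def lock_entry(name, search_path=None):
    """Build a dict shaped like one [external.pkgconfig.<name>] lock table."""
    pc = find_pc(name, search_path)
    if pc is None:
        raise FileNotFoundError(f"no {name}.pc on the search path")
    variables, fields = parse_pc(pc.read_text())
    return {
        "name": name,
        "pcDir": str(pc.parent),
        "version": fields.get("Version", ""),
        "libs": fields.get("Libs", ""),
        "include": variables.get("includedir", ""),
    }
```

In practice it is probably more robust to ask pkg-config itself for these values, since it also handles `Requires` chains and private link flags that this parser skips.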
The source code of the example, which uses the LinearAlgebra library:
```chapel
use LinearAlgebra;

var D = {1..4, 1..4};
var A: [D] real = ((18.0, 22.0, 54.0, 42.0),
                   (22.0, 70.0, 86.0, 62.0),
                   (54.0, 86.0, 174.0, 134.0),
                   (42.0, 62.0, 134.0, 106.0));
var L = cholesky(A);
var U = cholesky(A, lower=false);
writeln("A:");
writeln(A);
writeln("L:");
writeln(L);
writeln("U:");
writeln(U);
```
This is not the final design, but it is close to how we plan on supporting external dependencies. In this design, the packages listed directly under the [external] table, rather than under a sub-table of [external], will all come from the same package manager. This example shows how one can quickly get up and running with their existing system packages using pkg-config. For Mason packages that go into the registry, however, we will want the external packages to come from a secure package manager such as Spack.
There are still a few problems with the pkg-config prototype:
- … `PKG_CONFIG_PATH` themselves.
- `pkg-config` does a poor job supporting multiple versions

More to come!