Conan: Authentication of downloaded binaries

Created on 13 Sep 2016 · 43 Comments · Source: conan-io/conan

We are evaluating the use of Conan in the CI infrastructure of the Qt Project (http://qt.io). However, concerns were raised that Conan does not guarantee that downloaded binaries cannot be silently replaced at an unpredictable moment.

In an IRC discussion @memsharded proposed the following solution: add a tool that captures the list of installed dependencies of the current project and produces a manifest file with hash sums that allows verifying them.
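
As a rough illustration of the proposed tool, here is a minimal Python sketch. It assumes the dependencies are already installed as plain directories; the names `file_digest`, `capture_manifest` and `write_manifest`, the `path: digest` line format and the choice of SHA-256 are all assumptions of this sketch, not Conan's implementation.

```python
import hashlib
import os


def file_digest(path, algorithm="sha256"):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def capture_manifest(root):
    """Map every file under `root` (relative, '/'-separated path) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = file_digest(full)
    return manifest


def write_manifest(manifest, out_path):
    """Write one 'path: digest' line per file, sorted for stable diffs."""
    with open(out_path, "w") as f:
        for rel in sorted(manifest):
            f.write("%s: %s\n" % (rel, manifest[rel]))
```

The resulting file can be kept next to the project and compared against a fresh capture later; any difference means some installed file changed.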

Feedback please!

All 43 comments

Current manifests contain md5 checksums, which could be forged; consider upgrading the checksum algorithm (might be a complicated migration, though).

An additional approach would be package-signing, as a way to authenticate and trust the creator of the package, irrespective of the conan.io or other remote accounts.

To specify the concern and look for a fix:

  1. The current manifest system allows verifying that the stored binaries match the downloaded manifest. You can check the integrity with the conan install zlib/1.2.8@lasote/stable -i command. I think the md5 sum is good enough for this purpose. Collisions in md5 are possible but, for this purpose, can be considered practically impossible. I think updating the hash algorithm won't fix anything.
  2. But if someone alters both a binary and the manifest file, the conan install -i check will succeed. <= problem
  3. You can check the remote for updates (conan install -u) and see if the package binaries have changed. I'm analyzing conan's source code and I think we can improve it easily. Right now, if the timestamp is the same but the files' md5 changes, we print "Current package is newer than remote upstream one". That is not correct: if the timestamp is the same but the md5s differ, it is clearly a bad package.
  4. But what if we don't want to update a package (one that really was updated in the remote) but want to check its integrity? The answer is we can't, since we don't store the history of versions in remotes.
  5. About "the downloaded binaries could not be replaced silently in unpredictable moment": well, I suppose the best way to achieve this is to run conan server behind nginx or similar, over https with valid public SSL certificates. Then, if files are altered on the consumer's computer... Well, I think the attacker can directly do whatever he wants instead of playing with the conan binaries.
  6. About GPG signing, it is a great way to guarantee authorship, especially to ensure that the author has uploaded the package and no one has spoofed his account. But it requires a lot of work to implement.
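
The improvement suggested in point 3 can be sketched as a small classification function. This is only an illustration: the `Manifest` tuple and the returned strings are hypothetical and do not mirror Conan's internal data structures or messages.

```python
from collections import namedtuple

# Hypothetical, simplified view of a manifest: a timestamp plus {file: md5}.
Manifest = namedtuple("Manifest", ["timestamp", "file_md5s"])


def compare_with_remote(local, remote):
    """Classify the local package relative to the remote one."""
    if local.file_md5s == remote.file_md5s:
        return "up to date"
    if local.timestamp == remote.timestamp:
        # Same timestamp but different contents: not an update, something is wrong.
        return "corrupted or tampered"
    if local.timestamp < remote.timestamp:
        return "remote is newer"
    return "local is newer"
```

The key difference from the behavior described above is the second branch: equal timestamps with different checksums are reported as an error instead of as a newer local package.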

Please, I'm sure I'm missing something; let's write down all of conan's security weak points in this issue and then we can look for the best fix.

@lasote, I think the approach is not related to integrity checks (which are basically to ensure that you didn't modify by mistake files in the local cache, e.g. editing a header that was navigated to by the IDE), nor to update.

The feature would be more related to snapshotting the manifests, something like the trusted hosts in ssh. So, with this feature enabled, conan install could just copy manifests to a "conanmanifests" folder, right beside the consumer conanfile, which can also be committed to version control. The system could ask for permission when adding new manifests that were not there before, like "do you trust this new package?"

Such "conanmanifests" folder can be later used from a clean clone to conan install and ensure that all installed and used packages match the existing ones. In that way, it is almost impossible to forge/replace an existing package/binary without the user noticing.
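
The ssh-like "trust on first use" behavior could look roughly like the sketch below. It assumes both the committed snapshot and the freshly installed state are available as dictionaries mapping a package reference to its `{file: hash}` manifest; the function name `check_against_snapshot` is hypothetical.

```python
def check_against_snapshot(snapshot, installed):
    """Compare installed manifests with the committed snapshot.

    Both arguments map a package reference to a dict of {file: hash}.
    Returns (new, changed): references absent from the snapshot, and
    references whose manifest differs from the snapshotted one.
    """
    new = sorted(ref for ref in installed if ref not in snapshot)
    changed = sorted(ref for ref in installed
                     if ref in snapshot and installed[ref] != snapshot[ref])
    return new, changed
```

A caller would typically prompt the user for every reference in `new` ("do you trust this new package?") and fail the install for every reference in `changed`.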

What about the updates?

For updates, the user sets a higher package version in the conanfile and updates the manifest accordingly.

Updates can happen also without changing version numbers, as packages can be re-uploaded by the package creator.

I think that updates go through the same procedure as a normal conan install, just with different behavior for updates: instead of throwing an error, telling the user or asking for confirmation.

Updates can happen also without changing version numbers, as packages can be re-uploaded by the package creator

And it's undesirable if such an update happens silently, as it will break reproducibility of the build. For example, in the repositories of Linux distributions such behavior is prohibited by policy; instead, packages have separate packaging version numbers.

Yes, totally agree. So far, conan allows package creators to do that if they want (it was a requested feature, as it is something very common for continuous development, a different use case than depending on very stable third parties), but that policy might be enforced at some time, maybe for "stable" channel packages.

And even if they are not allowed by the regular tools, there is always a risk that they are replaced by hacking, so the mechanisms to avoid such silent replacement (package signing, package checksums verification) have to be there.

BTW, I've just noticed that the released conan-win_0_12_0.exe file was silently updated without changing the version number. Behavior like this is not acceptable for public releases.

@annulen yeah, sorry about that. Crazy release. It's not the normal behaviour.
About the manifest mechanism, now I get it. As I said I lost the chat between you and didn't understand it. thanks

That's my fault as well, I should have provided the full context in the issue description, not just the conclusion.

So far, conan allows package creators to do that if they want (it was a requested feature, as it is something very common for continuous development, a different use case than depending on very stable third parties), but that policy might be enforced at some time, maybe for "stable" channel packages.

That's fine, but I think it would be a good idea to add another property to conanfile.py named e.g. "package_revision", and require it to be defined in new packages. conan install should support both the "old" way, zlib/1.2.8@lasote/stable, implying that you request the latest package available for zlib 1.2.8, and an exact package revision, zlib/1.2.8-1@lasote/stable.
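
To make the proposed reference syntax concrete, here is a hypothetical parser for `name/version[-revision]@user/channel`. This syntax is only the proposal from the comment above, not something Conan implements, and the `Reference` tuple and `parse_reference` function are names invented for this sketch.

```python
import re
from collections import namedtuple

Reference = namedtuple("Reference", ["name", "version", "revision", "user", "channel"])

# A trailing "-<digits>" before "@" is treated as the package revision.
_REF = re.compile(
    r"^(?P<name>[^/@]+)/(?P<version>[^/@]+?)(?:-(?P<revision>\d+))?"
    r"@(?P<user>[^/@]+)/(?P<channel>[^/@]+)$"
)


def parse_reference(text):
    """Split 'name/version[-revision]@user/channel'; revision is None when absent."""
    m = _REF.match(text)
    if m is None:
        raise ValueError("not a package reference: %r" % text)
    rev = m.group("revision")
    return Reference(m.group("name"), m.group("version"),
                     int(rev) if rev is not None else None,
                     m.group("user"), m.group("channel"))
```

One caveat of encoding the revision in the version string: an upstream version that itself ends in a hyphen followed by digits would be misread as carrying a revision, so a real design would need an unambiguous separator or a separate field.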

I have started to implement the manifest check. Initially I implemented an "inline", implicit check that was configured in the conan.conf configuration. But I have reconsidered, and I think the explicit, command-driven approach is better.

So my proposal for it would be something like:

$ conan verify

That would take the conanfile being used in the project (accepting a path as an optional param), get the full dependency tree and capture all the manifests in a folder beside the conanfile, by default "conanmanifests" or something like that, but also configurable on the command line. Typically this folder can be committed to version control if desired, so changes in deps can also be tracked. The behavior can also be defined, as "ask always", "yes to all", "allow updates", etc., so it will prompt for new manifests, error out or update for changed manifests, etc.

Then, running the $ conan verify command again with different behavior parameters can be used to check for integrity, update when new requirements, etc.

After considering several different options, like doing verification for the whole local conan cache, I think this is the most convenient, clear and pragmatic way to do so. And what is more important, it is probably the safest, as it protects against any change, mistake, or tampering, even against the local conan cache.

Is this ok? Please tell me, I will try to finish the implementation next week. Thanks!

Looks good to me, but I think it would be more logical to check the manifest when running conan install, e.g. by adding another command line option like -v path/to/manifest.

After considering several different options, like doing verification for the whole local conan cache, I think this is the most convenient, clear and pragmatic way to do so. And what is more important, it is probably the safest, as it protects against any change, mistake, or tampering, even against the local conan cache.

Yeah, that's exactly what I had in mind.

For updates I still think it would be better to have package revisions, like I've described above. In this case it would be obvious that a new revision cannot be verified against an old manifest entry.

Also it doesn't seem logical that command conan verify captures state instead of doing verification.

And for the truly paranoid there should be a way to get the equivalent of a manifest entry for a locally built package, so that the package author can compare it with a manifest generated elsewhere. Just show the manifest entry and modification date in the web UI, in the detailed package info.

Yes, conan verify is just poor naming; I will probably call it conan manifest capture, conan manifest check, etc. Also, it will probably end up as an option to the conan install command, as you point out. I think it will become clearer while defining the use cases and the tests; I am not very concerned about it yet, as long as the workflows are clear enough.

About the package revisions: that is true, if package creators follow that convention it would really help, but with a small change. Having version numbers that diverge from the real version of the library that is being packaged is rejected by quite a few users, so the package revision should probably be associated with the channel. As we can't guarantee/enforce this behavior right now, let's keep the focus of this issue on the authentication side, as both things can evolve separately.

Yes, conan verify is just poor naming; I will probably call it conan manifest capture, conan manifest check, etc.

FWIW, Gentoo's portage uses ebuild digest command to capture hash sum of package sources when adding new package to the tree (and checks it automatically when package is actually built)

having version numbers that diverge from the real version of the library that is being packaged, is rejected by quite a few users

Sure, that's why it has to be a separate field (e.g. in FreeBSD it's PORTREVISION variable and software version is PORTVERSION)

we can't guarantee/enforce this behavior

There's no need to enforce anything, but having this in place would simplify dealing with well-behaved packages.

The final proposal would be opt-in for the capture and verify using:

$ conan install --manifests

Equivalent to $ conan install --manifests=capture

Installs the manifests in the .conan_manifests folder (default), configurable with --manifests-folder=PATH, with PATH either relative or absolute
Optional --manifests-interactive for interactive user prompt (yes/no) to add new manifests

$ conan install --manifests=verify

Do not add new manifests, but check them against previous stored versions. Same --manifests-folder=PATH argument as above

Slightly simplified the interface, the default "capture" was a bit obscure:

Capture manifests:

$ conan install --manifests

Capture manifests in specified folder:

$ conan install --manifests="my_manifest_folder"

--manifests can be substituted by --manifests-interactive for prompting the user before accepting to install a new manifest.

Verify manifests:

$ conan install --verify

Verify manifests against specified folder:

$ conan install --verify="my_manifest_folder"

With regard to md5/sha1, it is not necessary to upgrade the algorithm: neither of them is broken for second preimage resistance, which is the property that matters here, so we can keep md5 and avoid such a migration, which can be very complicated.

Well, I'd rather stick to the safe side and upgrade to sha1 at least. While md5 may not be broken with respect to this particular attack, it doesn't provide proper collision resistance => brute force can be a viable strategy to produce a second preimage. (I'm not aware of any practical attempts to do it, though.)

conan install --manifests

I think it would be useful to have a variant of capture command that is not combined with install

At least with current state of the art, there is no viable strategy to produce such a second pre-image that matches the md5 hash. Broken collision resistance does not imply at all broken second pre-image resistance, so we are safe.
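
The difference between the two properties can be seen with a toy experiment. The sketch below truncates MD5 to 16 bits (an artificial weakening chosen only so that brute force finishes instantly) and compares a generic birthday search for any collision with a generic search for a second preimage of one fixed input. It illustrates generic brute-force cost only; it does not model the actual cryptanalytic attacks on MD5, and all function names are invented for this sketch.

```python
import hashlib
from itertools import count


def tiny_hash(data, bits=16):
    """MD5 truncated to `bits` bits, so brute force finishes instantly."""
    full = int.from_bytes(hashlib.md5(data).digest(), "big")
    return full >> (128 - bits)


def find_collision(bits=16):
    """Find ANY two distinct inputs with equal hash (birthday search)."""
    seen = {}
    for i in count():
        msg = b"a%d" % i
        h = tiny_hash(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1  # (first input, second input, tries)
        seen[h] = msg


def find_second_preimage(target, bits=16):
    """Find a different input matching the hash of one FIXED target."""
    wanted = tiny_hash(target, bits)
    for i in count():
        msg = b"b%d" % i
        if msg != target and tiny_hash(msg, bits) == wanted:
            return msg, i + 1  # (second preimage, tries)
```

For an n-bit hash, the birthday search needs on the order of 2^(n/2) tries while the second-preimage search needs on the order of 2^n, which is why replacing one specific, already published package is a much harder problem than producing two colliding files.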

Please consider that upgrading to sha1 implies a massive migration, all the existing recipes and package binaries have to be migrated to the sha1, both in servers and in local caches of users, keeping everything in sync. This is a very complicated and risky thing, and can involve a huge amount of development work and troubleshooting. So it seems a bit overkill and premature right now.

The label "fixed" is applied when features are merged in develop branch. Issues are closed when they are finally released (this one will be in 0.13)

There are two closely related issues:

  • How should tool packages like cmake_installer verify their binaries? Surely it's possible to check the hash sum of the downloaded file right in conanfile.py; however, it seems to me that a more generic manifest-like approach would be beneficial here.
  • Similar issue with installing packages from sources, sources archives also need authentication

Should I create new issue(s) for these items? For Qt project they are not critical for now, as we use binary packages and decided to postpone cmake_installer use.

I think for the first one, the current proposal is valid, they can be verified as libraries too, just by capturing their manifests in a given folder that can be used afterwards. Or is there something that I am missing?

For the second one, there are two different approaches to get the sources with conan:

  • The in-source, which has the conanfile.py in the same repo as the source code, and uses the exports feature to snapshot the source code into the recipe. This approach is already checked in the current proposal, as package recipes also have their own manifest, which is checked too in the verify process.
  • The out-source, in which the conanfile.py uses the source() method to retrieve source code from external sources. Current helpers in tools implement the check for different algorithms: tools.check_sha1, tools.check_sha256 and tools.check_md5. They can be easily used in the package recipe to authenticate the external sources.
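
For the out-of-source case, the kind of check such a helper performs can be sketched with the standard library alone. The function below is a stand-alone illustration that happens to share the name `check_sha256`; it is not Conan's code, and its exact signature and error type are assumptions of this sketch.

```python
import hashlib


def check_sha256(file_path, expected_hex):
    """Raise ValueError unless the file's SHA-256 equals `expected_hex`."""
    h = hashlib.sha256()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    actual = h.hexdigest()
    if actual != expected_hex.lower():
        raise ValueError("sha256 mismatch for %s: expected %s, got %s"
                         % (file_path, expected_hex, actual))
```

In a recipe, a check like this would run in source() right after the archive is downloaded and before it is unpacked, with the expected digest hard-coded in the conanfile.py so that it is covered by the recipe's own manifest.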

Released in conan 0.13.1

It seems to me that current implementation does not play well with version control and code review

  • Manifests are stored in deeply nested tree with hashes in paths, which makes it hard to review changes.
  • Manifests can be large, e.g. for cmake_installer manifest has > 4K lines

https://codereview.qt-project.org/#/c/172823/

Yes, I see that it's exactly the same structure as local cache of Conan, but honestly I expected to have a single file with hash sums of files that were downloaded from network. If this is not compatible with your architecture, I think it would be fine to store hashes of manifest files, check hashes of downloaded manifests, and then use contents of those manifests to check integrity.

Sorry for not raising these concerns before release :(

Thanks for the feedback, let me explain the thinking behind the current implementation:

  • We thought that having the full manifests was useful, it would allow to diff it to be aware of the changes when something happens. It is not implemented right now, but it is not difficult to do.
  • The sizes of manifests didn't seem a problem, especially because they are designed to not change that much. Custom output folder allows for output-ing to its own repo if desired. Git uses compression for storage, and is very efficient with packfiles for changes, so those files sizes shouldn't be a problem at all for git.
  • The important thing is that the hashes from the downloaded files from the network didn't seem a guarantee. The downloads are often called "conan_package.tgz" which is a compressed file for the whole package contents, but it will be typically removed after transfer. Other packages (not used in the dependency tree, maybe retrieved to test, to play, for other pet projects) can tamper/hack the local cache, replacing one file with another.
  • Doing a hash for the full package contents, as the "tgz" that came from the network, would be more inefficient, as it will always require to read all files, zip them again to compare the hashes. Just checking the hashes of the manifests (cache manifest against project manifests) is not enough, we actually need to check that the files in the cache folders match the declared manifest.
  • Manifests that actually list the files are more robust than just reading all the files in a folder, as some OSs can introduce other temporary, hidden files into folders (.DS_Store, Thumbs.db, etc). Conan would need to take care of filtering out those files to compute the hashes or they will never match. It is likely that some of those files will not be filtered, and troubleshooting can be a thing.
  • Using that folder layout is done for maintenance and robustness: A single file instead of a hierarchy of folders would be more difficult to implement and more error prone. Please note that new binaries for the same recipe can be more easily added/removed to the folder layout than to a single file. Actually the current approach allows concurrent installations of manifests, while the file would have to implement locks. There are users that are currently building different binaries concurrently, so it is important to be taken into account. It is not that it cannot be done, but indeed more complicated and error prone.

These were our thoughts, we know that they have tradeoffs as those you suggested, but for those above reasons, the current implementation might be a good first approach. Please tell me what you think, thanks!

Just checking the hashes of the manifests (cache manifest against project manifests) is not enough, we actually need to check that the files in the cache folders match the declared manifest.

Could you explain this point please? AFAIU, checking hash (e.g. SHA-256) of downloaded manifest file protects it from contents change, and then we use md5 sums inside manifest file to check individual files in a package.

Oh, yes sure, I probably didn't understand properly. Yes, that could be doable. But other issues like concurrency (folder layout) still remain... Will have a second look.

Anyone else, more feedback about @annulen's suggestions?

Using that folder layout is done for maintenance and robustness...

OK, still I think it would be nicer to have one file per package&configuration pair, containing hash sums of conanfile and manifest, instead of 2 files, e.g. file cmake_installer/0.1/lasote/testing/conanmanifest.txt with contents

exports: <manifest hash here>
package/354fbd74f1ae59f60305867d3d280c6d3cb25fbc: <manifest hash here>
...

Actually, any hierarchical format would be fine, JSON for instance.

And I still don't see how concurrency can matter in the "capture manifests" operation, which is done rarely, is not performance critical, and can be executed after all installation is completed.

It is not that rarely done. Some conan users are just firing off concurrent CI builds of a package or project for different configurations, one configuration per process. If they want to use this feature, which makes sense, so they get their project release together with the manifests for their dependencies, they will likely have concurrent access to the project file containing the manifests.

Even more likely if it is executed after the first install (which could take different times for different configurations), because configurations of already installed deps will take basically the same time. So the users would have to be explicitly warned against launching this in parallel, which they would likely do, because they are already doing that for the normal install. But we can't detect that they are running in parallel, so if they do => race conditions over the manifest file, unless we start to use locks on it.

It sounds like you are talking about manifest verification step (--verify) which indeed can be performed often and is performance-critical, and I'm talking about capturing manifests which is done once when dependencies change. Sorry if I'm misunderstanding you.

Oh, you mean that concurrent CI processes start capturing manifests to the same output dir? It doesn't make any sense to me, and it can also have a race if 2 CI configurations somehow use the same configuration of a single package. Looks like you need some locking, or just different manifest dirs for different configurations. (In Qt we even use a separate conanfile.txt per CI configuration.)

Yes, that is true if you concurrently run the same configuration over the same package, but that doesn't make sense either. That is what CONAN_USER_HOME is useful for: being able to concurrently run different processes that could eventually build the same package for the same configuration (like jobs of two different projects).

But this is something that is already being done by users, in their companies: building the same project concurrently for different configurations. No package will be generated twice, so no problem here. And they are not splitting into different conanfiles, they are using the same one for all CI configurations, and providing variable options and settings over the command line.

OK, another idea: in manifests directory create subdir, which is a hash of OS and compiler settings (like this is done for package builds), and create there single file (e.g. in JSON format) with all hash sums for all packages. If all JSON keys are sorted e.g. alphabetically, it will be easy to review any changes made to package configuration without looking out for files getting removed or moved to different paths (git and gerrit do not always visualize such changes properly)
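
A minimal sketch of the sorted-JSON idea, assuming the hashes are already collected in a nested dictionary keyed by package reference; the function name and the exact nesting are hypothetical.

```python
import json


def dump_manifest_hashes(hashes):
    """Serialize {reference: {entry: hash}} with sorted keys, one key per line."""
    return json.dumps(hashes, sort_keys=True, indent=2) + "\n"
```

Because the keys are sorted and each one sits on its own line, the output is independent of insertion order, so adding or changing one package shows up as a small, local diff in code review.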

what would be so hard about signing? a signature is simply an additional file. it would have to be created during uploading (with some option like --sign-with ) and checked during installation. and users who need this would probably be ready to go the extra mile and fulfil key infrastructure requirements, others should not even notice (they could just ignore the file).
It could be done on top of conan i guess, but it would be much harder.
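
As a sketch of what "a signature is simply an additional file" could mean in practice, the helpers below build command lines for the gpg CLI that create and check a detached signature next to a file. This is a hypothetical wrapper, not a Conan feature; it assumes gpg is installed and that `key_id` names a secret key in the local keyring.

```python
import subprocess


def sign_command(path, key_id):
    """argv for a detached, ASCII-armored signature written to '<path>.asc'."""
    return ["gpg", "--batch", "--yes", "--local-user", key_id,
            "--armor", "--output", path + ".asc", "--detach-sign", path]


def verify_command(path):
    """argv that checks '<path>.asc' against the file at `path`."""
    return ["gpg", "--batch", "--verify", path + ".asc", path]


def run(argv):
    """Run the command; True when it exits with status 0."""
    return subprocess.call(argv) == 0
```

The uploader would call `run(sign_command(...))` before upload and the consumer `run(verify_command(...))` after download; the server only has to transport the extra `.asc` file. Deciding which signing keys to trust is a separate step that this sketch does not cover.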

would there be any interest in a corresponding pull request? First step would be generating a signature and transport that along with, maybe, the manifest.

Hi @maddanio

The code for signing might not be complicated, but doing a conan release (pip packages, installers) that includes the signer (GPG, for example) might be a more complex task than it seems at first sight, especially if it has to work across so many platforms (including Windows, OSX and Linux distros), both in pip packages and in pyinstaller generated binaries. Anything that is built and distributed in conan should be MIT-compatible too. At the moment we are not integrating anything that cannot be managed as a pure python pip requirement. Can you elaborate a bit further on what your approach would be, which tools you would use, etc?

I think this could be one of the strong candidates for the "plugins" feature that is right now under development. In this way, signing packages can be done by adding a plugin, and same for checking the signed packages, and this could be done with different licenses, and no need to create pre-compiled installers with them.

Hello,

So how far off is usability of this plugins infrastructure? Thing is we would like to use this asap (I think our admin will really hold off on conan usage until we have signing).

Couldn't this be made optional with options (--sign on upload, --check-signature on install) that only work with available gpg packages? Also the server would not need gpg ever, since all it would have to do is transport the signature file.

Hi,

The plugin infrastructure is under ongoing development, I think it will not be available until 1.8 or 1.9 (1 or 2 months).

The feature might require more changes than just the upload function, as packages and recipes have a manifest file with the md5 checksums of the files that go there. Adding an extra file on the fly would probably confuse conan, and might require extra changes. Also, checking the signature requires checking in other places, as there are other commands that can retrieve things (create, info, for example). Other users might want to use other signatures, so it sounds more like an extension point than a builtin feature.

This PR contains some work to use GPG checking of downloaded stuff: https://github.com/conan-io/conan/pull/2356. We also have concerns regarding it.

In any case, probably this discussion belongs more to this issue: https://github.com/conan-io/conan/issues/773, please continue there. I am pushing that feature to be discussed, but I doubt there will be time enough to move it forward soon enough.

May I ask why it is so important to have this signing? Don't you have the infrastructure and your own conan servers (Artifactory) under your control? That is, packages don't need to be signed because all the packages are yours. Thanks for all the feedback!
