Conan: Occasional random failure of package uploads on CI system

Created on 19 Apr 2019 · 5 comments · Source: conan-io/conan

My CI is set up to build Conan packages as soon as a new feature is merged into the stable branch. The CI system must build Conan packages for all platforms we support, namely Android (all ABIs), iOS (all architectures), Windows, macOS, and Linux (both with clang and gcc). We are using Jenkins as our CI system, and the build step runs in parallel on multiple Jenkins slaves. Since we do not want to upload any packages until we are completely certain that the build has succeeded on all platforms, the slaves do not upload built packages directly to Artifactory. Instead, each slave performs conan upload * -r myRemote --all --skip-upload --confirm so that it prepares the .tgz files, and then stashes them using Jenkins' stash command (you can find more info here on why we are doing it this way).

Finally, after all nodes have completed building the packages, a single node unstashes all the artifacts and uploads them to Artifactory. However, with Conan v1.13 and later, each created package also contains a metadata.json file with meta information about the built binaries. When multiple nodes build different versions of packages on different machines, each built package therefore contains only a partial metadata.json file. When the "upload node" unstashes all built binaries, it repeatedly overwrites metadata.json with each stash. This makes it impossible for Conan to upload all the binaries, because the final metadata.json does not contain references to all available package IDs, and the upload fails.

One possible solution would be to merge the metadata files manually using Jenkins Groovy scripting, but since I found that rather complex, I implemented a different approach: spawn N new parallel jobs (using Jenkins' parallel command), each on a separate node (just like when building), where each node uploads only a single stash. Thus, the "merging" of metadata.json actually happens on the Artifactory server, and the upload is actually faster, as Conan still does not support concurrent uploading of packages (see issue #3452).
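For reference, the manual merge I decided against can be sketched in a few lines of Python. This is only a sketch under an assumption: that each partial metadata.json is a JSON object with a "recipe" section (identical across partials) and a "packages" section keyed by package ID, which is how the Conan 1.x metadata looks as far as I can tell. Verify against your client version before relying on it.

```python
import json
from pathlib import Path


def merge_metadata(paths):
    """Merge several partial metadata.json files into one dict.

    Assumes each file is a JSON object with a "recipe" section (identical
    across partials) and a "packages" section mapping package IDs to their
    metadata. This mirrors the Conan 1.x layout, but check your version.
    """
    merged = {}
    for path in paths:
        data = json.loads(Path(path).read_text())
        if not merged:
            # First partial provides the shared "recipe" section.
            merged = data
        else:
            # Later partials only contribute their package entries.
            merged.setdefault("packages", {}).update(data.get("packages", {}))
    return merged
```

The merged dict would then be written back over metadata.json in the unstashed package folder before the single upload.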

However, occasionally and at random, one or more of the parallel upload "jobs" fail with the following stack trace:

[b] $ sh -c "conan upload \* -r 090a7942-cd0f-45af-9347-465cbbe94a6e --confirm --all --no-overwrite "
MicroblinkConanFile/3.0.2@microblink/stable: Not found in local cache, looking in remotes...
MicroblinkConanFile/3.0.2@microblink/stable: Trying with '090a7942-cd0f-45af-9347-465cbbe94a6e'...
Downloading conanmanifest.txt

Downloading conanfile.py

MicroblinkConanFile/3.0.2@microblink/stable: Downloaded recipe revision 0
Uploading to remote '090a7942-cd0f-45af-9347-465cbbe94a6e':
Uploading CoreUtils/2.2.0@microblink/master to remote '090a7942-cd0f-45af-9347-465cbbe94a6e'
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/conans/client/command.py", line 1579, in run
    method(args[0][1:])
  File "/usr/lib/python3.7/site-packages/conans/client/command.py", line 1182, in upload
    retry_wait=args.retry_wait, integrity_check=args.check)
  File "/usr/lib/python3.7/site-packages/conans/client/conan_api.py", line 93, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/conans/client/conan_api.py", line 868, in upload
    retry_wait, integrity_check, policy, remote_name, query=query)
  File "/usr/lib/python3.7/site-packages/conans/client/cmd/uploader.py", line 88, in upload
    integrity_check, policy, remote, upload_recorder)
  File "/usr/lib/python3.7/site-packages/conans/client/cmd/uploader.py", line 190, in _upload_ref
    self._upload_recipe(ref, conanfile, retry, retry_wait, policy, recipe_remote)
  File "/usr/lib/python3.7/site-packages/conans/client/cmd/uploader.py", line 230, in _upload_recipe
    remote, remote_manifest)
  File "/usr/lib/python3.7/site-packages/conans/client/cmd/uploader.py", line 344, in _recipe_files_to_upload
    if remote_manifest == local_manifest:
  File "/usr/lib/python3.7/site-packages/conans/model/manifest.py", line 132, in __eq__
    return self.file_sums == other.file_sums
AttributeError: 'NoneType' object has no attribute 'file_sums'

ERROR: 'NoneType' object has no attribute 'file_sums'

I've currently worked around this by retrying the upload several times until it succeeds, but this is a hacky and dirty workaround that I would like to remove from my Jenkins pipeline script.
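For completeness, the retry workaround boils down to a generic retry loop around the upload command. The helper below is a minimal sketch; the upload wrapper and the remote name myRemote are hypothetical illustrations of wrapping the conan CLI call from my pipeline, not part of Conan itself.

```python
import subprocess
import time


def retry(action, attempts=3, wait=5.0):
    """Call `action` until it succeeds or `attempts` are exhausted.

    Re-raises the last exception if every attempt fails; sleeps `wait`
    seconds between attempts.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(wait)


def upload(remote="myRemote"):
    # Hypothetical wrapper around the conan CLI; the remote name is
    # illustrative. check=True makes a non-zero exit raise CalledProcessError.
    subprocess.run(
        ["conan", "upload", "*", "-r", remote,
         "--all", "--confirm", "--no-overwrite"],
        check=True,
    )
```

A Jenkins node would then call retry(upload, attempts=3, wait=10) instead of invoking the CLI once.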

Unfortunately, I don't have a deterministic way of reproducing the above issue, as it never happens on my development machine (which never uploads anything in parallel), so I hope that the above stack trace will help you track down the part of Conan's code that is causing trouble.

Our CI servers use Conan client 1.14.3, and the Artifactory instance is v6.9.0 Community Edition for C++.

Labels: high, bug

All 5 comments

Thanks for reporting. I've seen this error randomly also with Bintray.
I'll try to investigate.

About the issues with the "stash", I think you should open a new issue, because it is a totally different problem. That approach is indeed very problematic for the metadata.json; maybe we can discuss or figure out a better one. (Related to https://github.com/conan-io/conan/issues/3073)

This issue, about the ERROR: 'NoneType' object has no attribute 'file_sums', might be a duplicate of https://github.com/conan-io/conan/issues/4953

Thank you for responding to the report so quickly.

About the issues with the "stash", I think you should open a new issue, because it is a totally different problem. That approach is indeed very problematic for the metadata.json; maybe we can discuss or figure out a better one. (Related to #3073)

I wouldn't call this an issue, since this is not part of Conan's design. As Conan does not support uploading of rogue .tgz files (something that was requested in #3073), I am OK with the fact that I need to merge metadata.json manually, or even better, perform the upload of packages in parallel from multiple machines. Since support for concurrent upload is still not there (issue #3452), I actually get a faster upload by performing it from different Jenkins nodes - the available network bandwidth gets used more efficiently.

However, this leads to some race conditions that you mentioned in issue 4953 (comment), since my workflow is highly parallel (see the screenshot from my Jenkins job below - the first attempt failed with the above error, but the second succeeded).
[screenshot of the parallel Jenkins upload stages]

Therefore, I think this probably is a duplicate of #4953.

Closed by #5014, will be released in 1.14.4
