Kaniko: docker push retry loop

Created on 24 Feb 2019 · 9 comments · Source: GoogleContainerTools/kaniko

Actual behavior
First of all: this is a feature request rather than a bug report.
I am running a private cluster, and occasionally the push to the (cloud-hosted) registry receives a 504.
I am aware that this is an issue with the cluster itself, but that doesn't make the feature request any less valid:

Please wrap the actual docker push command in a retry loop.
Additional thoughts on that:

  • exponential backoff
  • enabled by a flag (e.g. --retries=0 as the default, settable to --retries=10 or more)

Expected behavior
Avoid starting Kaniko all over again and recomputing everything from scratch.
Have it retry only the push.

To Reproduce
Steps to reproduce the behavior:
I would assume it is possible to shut down the target repository for the first few retries to test/implement this.

Additional Information
Irrelevant, I suppose.

Best regards, and thanks for this marvelous project!

Labels: area/registry · kind/feature-request · priority/p3

All 9 comments

I just ran into this in our integration tests for Skaffold. It would be very good if Kaniko could retry on temporary errors; Docker also retries pushes:

error pushing image: failed to push to destination gcr.io/k8s-skaffold/skaffold-example:v0.37.1-223-gd2701f2: Patch https://gcr.io/v2/k8s-skaffold/skaffold-example/blobs/uploads/ABTmro6w_NXP3bfCN4xRTOpqbI8zPHZLwF0fTLX-NZaZUHaPliXwgjlx9nB31-z1prYDx0Rss-9fXdxowbL1vLs: io: read/write on closed pipe

Retry support already exists in https://github.com/google/go-containerregistry/pull/459, but Kaniko needs to upgrade this dependency.

I did some extra checking, and I think the issue is that only the tryUpload part, not the checkExistingBlob part, in https://github.com/GoogleContainerTools/kaniko/blob/e0e59e619c03da1e60e9e9520aee5cc741000e3d/vendor/github.com/google/go-containerregistry/pkg/v1/remote/write.go#L291 is wrapped in the retry.
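To illustrate the point about retry scope, here is a minimal sketch in which the whole check-then-upload step is retried, so a transient failure in the existence check is retried just like one in the upload. `checkExistingBlob` and `tryUpload` below are hypothetical stand-ins named after the functions in the comment; the real go-containerregistry functions have different signatures, and backoff is omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// uploader simulates a registry whose existence check fails transiently
// a given number of times. Hypothetical, for illustration only.
type uploader struct {
	checkFailures int // remaining transient failures of the existence check
	uploaded      bool
}

func (u *uploader) checkExistingBlob() (bool, error) {
	if u.checkFailures > 0 {
		u.checkFailures--
		return false, errors.New("transient: 504 Gateway Timeout")
	}
	return u.uploaded, nil
}

func (u *uploader) tryUpload() error {
	u.uploaded = true
	return nil
}

// writeBlob retries the check and the upload together, instead of
// retrying only the upload.
func writeBlob(u *uploader, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		var exists bool
		if exists, err = u.checkExistingBlob(); err != nil {
			continue // retry the existence check too
		}
		if exists {
			return nil // blob already in the registry, nothing to upload
		}
		if err = u.tryUpload(); err == nil {
			return nil
		}
	}
	return fmt.Errorf("blob write failed after %d attempts: %w", attempts, err)
}

func main() {
	u := &uploader{checkFailures: 2}
	fmt.Println(writeBlob(u, 3), u.uploaded) // <nil> true
}
```

If only `tryUpload` were inside the loop, the first 504 from the existence check would abort the whole push, which matches the behaviour described above.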

+1 Very much desired functionality

I have Kaniko jobs that run in parallel on my CI system. If the system is busy, I see these errors:

PUT https://registry.mydomain.com/myimage/manifests/latest: MANIFEST_BLOB_UNKNOWN: blob unknown to registry; sha256:7bbd61231447a971b972bf6e62b7e5aecc52c8edbdf1cf372d63aa1a3b1ed821

This appears to be a problem on my registry's side, but it only happens under high traffic. It would be great to have a retry, since the build succeeded and only the push failed temporarily.

+1, still waiting for this functionality.

+1 Very much desired functionality

:+1:

In my experience, P3 is the "edge feature" / "nice to have" category. I would kindly request bumping it to P2.

Since the continued annoyance of the devs and ops using this project seems not to be enough: please think of the excess CO2 produced by needlessly recompiling steps that already compiled successfully. (Yes, the "think of the trees!" argument.)
