Actual behavior
First off: this is a feature request rather than a bug report.
I am running a private cluster and occasionally the push to the (on-cloud) registry receives a 504.
I am aware that this is an issue with the cluster itself, but that doesn't make the feature request any less valid:
please wrap the actual docker push command in a retry loop.
Additional thoughts on that:
make the number of retries configurable (--retries=0 as default, up to --retries=10 or more)
Expected behavior
Avoid starting kaniko all over again and recomputing everything.
Have it retry the push only.
To Reproduce
Steps to reproduce the behavior:
I would assume it should be possible to shut down the target repository for the first few retries to test/implement this.
Additional Information
Irrelevant, I suppose.
Best regards and thanks for this marvelous project!
I just ran into this in our integration tests for skaffold. It would be very good if Kaniko could retry temporary errors; Docker also retries pushes:
error pushing image: failed to push to destination gcr.io/k8s-skaffold/skaffold-example:v0.37.1-223-gd2701f2: Patch https://gcr.io/v2/k8s-skaffold/skaffold-example/blobs/uploads/ABTmro6w_NXP3bfCN4xRTOpqbI8zPHZLwF0fTLX-NZaZUHaPliXwgjlx9nB31-z1prYDx0Rss-9fXdxowbL1vLs: io: read/write on closed pipe
Retry is already in https://github.com/google/go-containerregistry/pull/459 but kaniko needs to upgrade this dependency.
I did some extra checking and I think the issue is that only the tryUpload part, but not the checkExistingBlob part, is wrapped in a retry in https://github.com/GoogleContainerTools/kaniko/blob/e0e59e619c03da1e60e9e9520aee5cc741000e3d/vendor/github.com/google/go-containerregistry/pkg/v1/remote/write.go#L291
+1 Very much desired functionality
I have Kaniko jobs that run in parallel on my CI system. If the system is busy, I see these errors:
PUT https://registry.mydomain.com/myimage/manifests/latest: MANIFEST_BLOB_UNKNOWN: blob unknown to registry; sha256:7bbd61231447a971b972bf6e62b7e5aecc52c8edbdf1cf372d63aa1a3b1ed821
This appears to be a problem on my registry's side, but it only happens under high traffic. It would be great to have a retry, since the build succeeded but the push temporarily failed.
+1 Still waiting for this functionality.
+1 Very much desired functionality
:+1:
In my experience p3 is the "edge feature" / "nice to have" category. I would kindly request bumping it to p2.
Since the continued annoyance of devs and ops using this project seems not to be enough: please think of the excess CO2 produced by the unnecessary recompilation of steps that already compiled successfully. (Yes, the 'think of the trees!' argument.)