Describe the bug
I'm trying to deploy an ArgoCD Application that contains a configmap, two certificates (Certificate custom resource from cert-manager) and a KafkaConnect instance (from Strimzi operator).
I defined the following annotations: argocd.argoproj.io/sync-wave (to make sure to have configmap and certificates before the kafkaconnect instance) and argocd.argoproj.io/sync-options on CRDs. When the application is deployed, the sync gets stuck: it keeps saying OutOfSync and Syncing (see attached image). However, if I stop the sync (click on Syncing, terminate) and then Sync the application again, then it successfully deploys all the defined resources.
Although I am using Custom Resources here (Certificate from cert-manager and KafkaConnect from Strimzi), the related custom health checks seem to exist already (https://github.com/argoproj/argo-cd/tree/master/resource_customizations).
The main problem is that I have several applications of this kind, so I would like to be able to automate this (instead of relying on manually stopping the sync and restarting it for all these applications). Any idea?
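For reference, the ordering described above could look roughly like the sketch below. It is only an illustration: all names, API versions and spec values are placeholders, and it assumes the sync-options value was SkipDryRunOnMissingResource=true set on the custom resources, which the description does not actually state. The ConfigMap and Certificate sit in wave 0 and the KafkaConnect instance in wave 1, so Argo CD waits for the wave 0 resources to become healthy before it applies wave 1.

```yaml
# Illustrative sketch only: names, API versions and spec values are
# assumptions, not taken from the original report.
apiVersion: v1
kind: ConfigMap
metadata:
  name: connect-config                  # hypothetical name
  annotations:
    argocd.argoproj.io/sync-wave: "0"   # applied in the first wave
data:
  example.key: example-value
---
apiVersion: cert-manager.io/v1          # assumed; older clusters may use v1alpha2
kind: Certificate
metadata:
  name: connect-client-cert             # hypothetical name
  annotations:
    argocd.argoproj.io/sync-wave: "0"   # must be healthy before wave 1 starts
    # Assumed option value; the report only says sync-options was set.
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  secretName: connect-client-cert-tls   # hypothetical Secret name
  dnsNames:
    - connect.example.com
  issuerRef:
    name: example-issuer                # hypothetical issuer
    kind: ClusterIssuer
---
apiVersion: kafka.strimzi.io/v1beta2    # assumed; depends on the Strimzi release
kind: KafkaConnect
metadata:
  name: example-connect                 # hypothetical name
  annotations:
    argocd.argoproj.io/sync-wave: "1"   # applied only after wave 0 is healthy
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  replicas: 1
  bootstrapServers: example-kafka-bootstrap:9093
```

With this layout, a Certificate that never reports healthy during the sync would keep wave 1 from ever being applied, which matches the stuck Syncing state.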
To Reproduce
Deploy an ArgoCD Application that contains the resources mentioned above. The Sync phase will start by itself and get stuck.
Expected behavior
The Sync should not get stuck; it should continue without needing any manual action (terminating the Sync and starting it again).
Screenshots

Version
argocd: v1.7.6+b04c25e
  BuildDate: 2020-09-19T00:50:44Z
  GitCommit: b04c25eca8f1660359e325acd4be5338719e59a0
  GitTreeState: clean
  GoVersion: go1.14.1
  Compiler: gc
  Platform: linux/amd64
argocd-server: v1.7.6+b04c25e
  BuildDate: 2020-09-19T00:52:04Z
  GitCommit: b04c25eca8f1660359e325acd4be5338719e59a0
  GitTreeState: clean
  GoVersion: go1.14.1
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: {Version:kustomize/v3.6.1 GitCommit:c97fa946d576eb6ed559f17f2ac43b3b5a8d5dbd BuildDate:2020-05-27T20:47:35Z GoOs:linux GoArch:amd64}
  Helm Version: version.BuildInfo{Version:"v3.2.0", GitCommit:"e11b7ce3b12db2941e90399e874513fbd24bcb71", GitTreeState:"clean", GoVersion:"go1.13.10"}
  Kubectl Version: v1.17.8
I have been playing with a similar issue on version v1.7.8+ef5010c, in an app of apps scenario.
When the sync hit a "Degraded" state as part of the certificate issuing, the application sync seemed to start waiting on everything all over again and never received the healthy notifications.
I think it is an issue with the certificate health check. I overrode the default check by changing the argocd-cm.yaml file as shown below (only replacing Degraded with Progressing):
data:
  resource.customizations: |
    cert-manager.io/Certificate:
      health.lua: |
        hs = {}
        if obj.status ~= nil then
          if obj.status.conditions ~= nil then
            for i, condition in ipairs(obj.status.conditions) do
              if condition.type == "Ready" and condition.status == "False" then
                hs.status = "Progressing"
                hs.message = condition.message
                return hs
              end
              if condition.type == "Ready" and condition.status == "True" then
                hs.status = "Healthy"
                hs.message = condition.message
                return hs
              end
            end
          end
        end
        hs.status = "Progressing"
        hs.message = "Waiting for certificate"
        return hs
This seems to have resolved the issue for me (at least in the very small number of tests I have done since).
Thanks a lot, this seems to work for this specific ArgoCD Application! On other applications I have similar issues that require further investigation, but now at least I know how I can start approaching the problem.
Just wondering: it looks like the Certificate health check proposed by argocd itself (mentioned in https://argoproj.github.io/argo-cd/operator-manual/health/ and in https://github.com/argoproj/argo-cd/tree/master/resource_customizations/cert-manager.io/Certificate) causes this issue. Do you think it would make sense to open a PR to change that check, in order to fix that?
I can say this is happening also to us.
We have a very odd situation: we have 4 environments with pretty much the same configuration, e.g. the same ArgoCD application deployed in all of them, and only ONE of them very frequently has this sync issue.
I'd like to understand how to debug it, because there is no apparent problem, other than random syncs getting stuck D:
Our setup does not include custom health checks, but we do have sync waves.
Just wondering: it looks like the Certificate health check proposed by argocd itself (mentioned in https://argoproj.github.io/argo-cd/operator-manual/health/ and in https://github.com/argoproj/argo-cd/tree/master/resource_customizations/cert-manager.io/Certificate) causes this issue. Do you think it would make sense to open a PR to change that check, in order to fix that?
Yes, if the health check is not functioning, please send a PR.
Thinking about this a bit more: could the application syncing issue be just a symptom of a wider issue?
The health check does eventually get to a Healthy state; it is only in a "Degraded" state while the certificate issuance is pending.
I can make a PR for the small change above, which reduces issues when using cert-manager certificates, but that won't fix the underlying sync issue: after a resource gets into a Degraded state, the application waits for all resources to report "Healthy" and seemingly deadlocks, or waits for an event that never arrives.
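To make the pending state concrete, here is roughly what a Certificate's status might look like while issuance is still in progress. This is a hand-written illustration; the reason and message strings are assumed examples, not output captured from an affected cluster.

```yaml
# Illustrative Certificate status during pending issuance.
# The reason and message values are assumed examples.
status:
  conditions:
    - type: Ready
      status: "False"        # the bundled check reports Degraded for this
      reason: DoesNotExist
      message: Issuing certificate as Secret does not exist
```

A Ready condition with status "False" is what the bundled check maps to Degraded and what the override above maps to Progressing. Once cert-manager flips the condition to "True", both versions report Healthy.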