Argo-cd: CertManager becomes `SyncError` with ArgoCD v1.1.0-rc1

Created on 27 Jun 2019 · 17 Comments · Source: argoproj/argo-cd

Describe the bug
I tried to deploy the stable release of CertManager (v0.8.0) with Argo CD v1.1.0-rc1.
However, CertManager sometimes ended up in SyncError, and auto-sync stopped.

This error did not occur with Argo CD v1.0.1.
I suspect this is a regression in v1.1.0-rc1.

Detail
In our tryout, CertManager's custom resources (Certificate and Issuer) sometimes became Degraded, and the CertManager sync task was judged as SyncError.

As far as I can tell, the status of these resources is False immediately after they are created.
If the Argo CD health check runs at exactly this moment, the task is judged as SyncError through the following steps.

  1. The resources are judged as Degraded by this Lua script. (This step is unchanged from v1.0.1)
  2. The operation is judged as failed by this code. (Probably a behavior change in v1.1.0-rc1?)

With declarative operations, it is not unusual for resources to be judged as Degraded for a short time after creation.
So I hope auto-sync will not be stopped in this situation.
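
The two steps above can be sketched roughly as follows. This is a minimal Python illustration, not Argo CD's actual code (Argo CD implements this in Lua and Go). The function names, the use of a `Ready` condition, and the exact mapping from health to task phase are assumptions made for the sketch; the message string is taken from the logs below.

```python
def cert_resource_health(obj):
    """Hypothetical stand-in for the Lua health script (step 1).

    Assumes the resource exposes a "Ready" condition: False maps to
    Degraded, True maps to Healthy, anything else is still Progressing.
    """
    conditions = (obj.get("status") or {}).get("conditions") or []
    for cond in conditions:
        if cond.get("type") != "Ready":
            continue
        if cond.get("status") == "False":
            return "Degraded", cond.get("message", "")
        if cond.get("status") == "True":
            return "Healthy", cond.get("message", "")
    return "Progressing", "Waiting for resource to become ready"


def task_phase(health):
    """Hypothetical stand-in for step 2 as reported for v1.1.0-rc1:
    a Degraded resource marks its sync task as Failed."""
    if health == "Degraded":
        return "Failed"
    if health == "Healthy":
        return "Succeeded"
    return "Running"


# A Certificate observed right after creation, before it has been issued.
fresh_certificate = {
    "kind": "Certificate",
    "status": {
        "conditions": [
            {
                "type": "Ready",
                "status": "False",
                "message": "Certificate does not exist",
            }
        ]
    },
}

health, message = cert_resource_health(fresh_certificate)
print(health, task_phase(health), message)
```

In this sketch a Certificate that is checked too early is classified as Degraded, so its task becomes Failed even though the resource would turn Healthy shortly afterwards.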

Version
v1.1.0-rc1

Logs

When resources are judged as Degraded

time="2019-06-27T05:22:37Z" level=info msg="updating resource result, status: 'Synced' -> 'Synced', phase 'Running' -> 'Failed', message 'issuer.certmanager.k8s.io/cert-manager-webhook-ca created' -> 'Error getting keypair for CA issuer: secret \"cert-manager-webhook-ca\" not found'" application=external-dns kind=Issuer name=cert-manager-webhook-ca namespace=external-dns phase=Sync
time="2019-06-27T05:22:37Z" level=info msg="updating resource result, status: 'Synced' -> 'Synced', phase 'Running' -> 'Failed', message 'certificate.certmanager.k8s.io/cert-manager-webhook-webhook-tls created' -> 'Certificate does not exist'" application=external-dns kind=Certificate name=cert-manager-webhook-webhook-tls namespace=external-dns phase=Sync

When the task becomes SyncError

time="2019-06-27T05:23:39Z" level=info msg=tasks application=external-dns isSelectiveSync=false tasks="[Sync/0 resource /Namespace:external-dns/external-dns obj->obj (Synced,Succeeded,namespace/external-dns configured), Sync/0 resource /ServiceAccount:external-dns/cert-manager obj->obj (Synced,Succeeded,serviceaccount/cert-manager created), Sync/0 resource /ServiceAccount:external-dns/cert-manager-cainjector obj->obj (Synced,Succeeded,serviceaccount/cert-manager-cainjector created), Sync/0 resource /ServiceAccount:external-dns/cert-manager-webhook obj->obj (Synced,Succeeded,serviceaccount/cert-manager-webhook created), Sync/0 resource /ServiceAccount:external-dns/external-dns obj->obj (Synced,Succeeded,serviceaccount/external-dns created), Sync/0 resource apiextensions.k8s.io/CustomResourceDefinition:external-dns/certificates.certmanager.k8s.io obj->obj (Synced,Succeeded,customresourcedefinition.apiextensions.k8s.io/certificates.certmanager.k8s.io created), Sync/0 resource apiextensions.k8s.io/CustomResourceDefinition:external-dns/challenges.certmanager.k8s.io obj->obj (Synced,Succeeded,customresourcedefinition.apiextensions.k8s.io/challenges.certmanager.k8s.io created), Sync/0 resource apiextensions.k8s.io/CustomResourceDefinition:external-dns/clusterissuers.certmanager.k8s.io obj->obj (Synced,Succeeded,customresourcedefinition.apiextensions.k8s.io/clusterissuers.certmanager.k8s.io created), Sync/0 resource apiextensions.k8s.io/CustomResourceDefinition:external-dns/dnsendpoints.externaldns.k8s.io obj->obj (Synced,Succeeded,customresourcedefinition.apiextensions.k8s.io/dnsendpoints.externaldns.k8s.io created), Sync/0 resource apiextensions.k8s.io/CustomResourceDefinition:external-dns/issuers.certmanager.k8s.io obj->obj (Synced,Succeeded,customresourcedefinition.apiextensions.k8s.io/issuers.certmanager.k8s.io created), Sync/0 resource apiextensions.k8s.io/CustomResourceDefinition:external-dns/orders.certmanager.k8s.io obj->obj 
(Synced,Succeeded,customresourcedefinition.apiextensions.k8s.io/orders.certmanager.k8s.io created), Sync/0 resource rbac.authorization.k8s.io/ClusterRole:external-dns/cert-manager obj->obj (Synced,Succeeded,clusterrole.rbac.authorization.k8s.io/cert-manager reconciled. clusterrole.rbac.authorization.k8s.io/cert-manager configured), Sync/0 resource rbac.authorization.k8s.io/ClusterRole:external-dns/cert-manager-cainjector obj->obj (Synced,Succeeded,clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector reconciled. clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector configured), Sync/0 resource rbac.authorization.k8s.io/ClusterRole:external-dns/cert-manager-edit obj->obj (Synced,Succeeded,clusterrole.rbac.authorization.k8s.io/cert-manager-edit reconciled. clusterrole.rbac.authorization.k8s.io/cert-manager-edit configured), Sync/0 resource rbac.authorization.k8s.io/ClusterRole:external-dns/cert-manager-view obj->obj (Synced,Succeeded,clusterrole.rbac.authorization.k8s.io/cert-manager-view reconciled. clusterrole.rbac.authorization.k8s.io/cert-manager-view configured), Sync/0 resource rbac.authorization.k8s.io/ClusterRole:external-dns/cert-manager-webhook:webhook-requester obj->obj (Synced,Succeeded,clusterrole.rbac.authorization.k8s.io/cert-manager-webhook:webhook-requester reconciled. clusterrole.rbac.authorization.k8s.io/cert-manager-webhook:webhook-requester configured), Sync/0 resource rbac.authorization.k8s.io/ClusterRole:external-dns/external-dns obj->obj (Synced,Succeeded,clusterrole.rbac.authorization.k8s.io/external-dns created), Sync/0 resource rbac.authorization.k8s.io/ClusterRoleBinding:external-dns/cert-manager obj->obj (Synced,Succeeded,clusterrolebinding.rbac.authorization.k8s.io/cert-manager reconciled. 
clusterrolebinding.rbac.authorization.k8s.io/cert-manager configured), Sync/0 resource rbac.authorization.k8s.io/ClusterRoleBinding:external-dns/cert-manager-cainjector obj->obj (Synced,Succeeded,clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector reconciled. clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector configured), Sync/0 resource rbac.authorization.k8s.io/ClusterRoleBinding:external-dns/cert-manager-webhook:auth-delegator obj->obj (Synced,Succeeded,clusterrolebinding.rbac.authorization.k8s.io/cert-manager-webhook:auth-delegator created), Sync/0 resource rbac.authorization.k8s.io/ClusterRoleBinding:external-dns/external-dns-viewer obj->obj (Synced,Succeeded,clusterrolebinding.rbac.authorization.k8s.io/external-dns-viewer created), Sync/0 resource rbac.authorization.k8s.io/RoleBinding:kube-system/cert-manager-webhook:webhook-authentication-reader obj->obj (Synced,Succeeded,rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:webhook-authentication-reader created), Sync/0 resource /Service:external-dns/cert-manager-webhook obj->obj (Synced,Succeeded,service/cert-manager-webhook created), Sync/0 resource /Service:external-dns/external-dns-metrics obj->obj (Synced,Succeeded,service/external-dns-metrics created), Sync/0 resource apps/Deployment:external-dns/cert-manager obj->obj (Synced,Succeeded,deployment.apps/cert-manager created), Sync/0 resource apps/Deployment:external-dns/cert-manager-cainjector obj->obj (Synced,Succeeded,deployment.apps/cert-manager-cainjector created), Sync/0 resource apps/Deployment:external-dns/cert-manager-webhook obj->obj (Synced,Running,deployment.apps/cert-manager-webhook created), Sync/0 resource apps/Deployment:external-dns/external-dns obj->obj (Synced,Succeeded,deployment.apps/external-dns created), Sync/0 resource apiregistration.k8s.io/APIService:external-dns/v1beta1.admission.certmanager.k8s.io obj->obj 
(Synced,Running,apiservice.apiregistration.k8s.io/v1beta1.admission.certmanager.k8s.io created), Sync/0 resource admissionregistration.k8s.io/ValidatingWebhookConfiguration:external-dns/cert-manager-webhook obj->obj (Synced,Succeeded,validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created), Sync/0 resource certmanager.k8s.io/Certificate:external-dns/cert-manager-webhook-ca obj->obj (Synced,Succeeded,Certificate is up to date and has not expired), Sync/0 resource certmanager.k8s.io/Issuer:external-dns/cert-manager-webhook-ca obj->obj (Synced,Failed,Error getting keypair for CA issuer: secret \"cert-manager-webhook-ca\" not found), Sync/0 resource certmanager.k8s.io/Issuer:external-dns/cert-manager-webhook-selfsign obj->obj (Synced,Succeeded,issuer.certmanager.k8s.io/cert-manager-webhook-selfsign created), Sync/0 resource certmanager.k8s.io/Certificate:external-dns/cert-manager-webhook-webhook-tls obj->obj (Synced,Failed,Certificate does not exist), Sync/1 resource certmanager.k8s.io/ClusterIssuer:external-dns/clouddns nil->obj (,,)]"
time="2019-06-27T05:23:39Z" level=info msg="updating resource result, status: 'Synced' -> 'Synced', phase 'Running' -> 'Succeeded', message 'deployment.apps/cert-manager-webhook created' -> 'deployment.apps/cert-manager-webhook created'" application=external-dns kind=Deployment name=cert-manager-webhook namespace=external-dns phase=Sync
time="2019-06-27T05:23:39Z" level=info msg="updating resource result, status: 'Synced' -> 'Synced', phase 'Running' -> 'Succeeded', message 'apiservice.apiregistration.k8s.io/v1beta1.admission.certmanager.k8s.io created' -> 'all checks passed'" application=external-dns kind=APIService name=v1beta1.admission.certmanager.k8s.io namespace=external-dns phase=Sync
time="2019-06-27T05:23:39Z" level=info msg="Updating operation state. phase: Running -> Failed, message: 'one or more tasks are running' -> 'one or more synchronization tasks completed unsuccessfully'" application=external-dns
time="2019-06-27T05:23:39Z" level=info msg="sync/terminate complete" application=external-dns
time="2019-06-27T05:23:39Z" level=info msg="Sync operation to 62802b64bf4a3df19bce40e4f44354a39655b5b1 failed: one or more synchronization tasks completed unsuccessfully" application=external-dns reason=OperationCompleted type=Warning
Labels: bug

All 17 comments

@alexec - I think the health assessment logic is a regression from previous behavior. We really should not be assessing health unless we are either:

  1. using sync waves and depend on previous wave
  2. using sync hooks and depend on previous hook to complete

@jessesuen I'm not sure about this. I think what's happening is that the certs become degraded before they become healthy. This can happen in both normal and wave/hook syncs. This would mean that you could not use these in either of those styles at all. I think that's a bug, but a different bug. Let me ponder this.

If I apply a single resource (no waves, no hooks), which is a Certificate, then as long as kubectl apply returned a zero exit code, the sync should be deemed successful regardless of whether the Certificate is degraded.

Health should only come into play when there are dependencies.
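
A minimal sketch of the rule proposed here, written in Python for illustration only. The function name and parameters are hypothetical. The thread leaves open how a resource that is Degraded before it becomes Healthy should be treated when dependents exist; this sketch simply keeps waiting in that case, which is an assumption.

```python
def resource_sync_phase(apply_exit_code, health, has_dependents):
    """Decide the sync phase of one resource under the proposed rule.

    has_dependents is True when a later sync wave or hook depends on
    this resource; only then is health taken into account.
    """
    if apply_exit_code != 0:
        return "Failed"  # the apply itself failed
    if not has_dependents:
        return "Succeeded"  # apply succeeded; health is ignored
    # Something depends on this resource, so wait until it is Healthy.
    # Keeping a Degraded resource in "Running" is an assumption.
    return "Succeeded" if health == "Healthy" else "Running"


# A lone Certificate that is still Degraded: the sync still succeeds.
print(resource_sync_phase(0, "Degraded", has_dependents=False))
# The same Certificate with a later wave depending on it: keep waiting.
print(resource_sync_phase(0, "Degraded", has_dependents=True))
```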

I think we should have a point fix for this in v1.1, but I'd like to address the issue of wave-based syncs that flip into degraded before healthy.

@ishii-masayuki just to check, do you have any hooks in your app?

@alexec @jessesuen
Thank you for the discussion!
Now, we use only waves. We don't use any hooks.

These are our manifest files for CertManager. We sync the other resources with no wave, and ClusterIssuer is synced as "wave 1", because it clearly depends on the validation webhook.

After a lot of trial and error, we now use a workaround like this: we override the default Lua scripts to mask the Degraded condition for our resources.

In addition, we also need a custom script for APIService in our app.

APIService is always healthy by default (there is no health check for it).
So Argo CD might proceed to the next wave in the short window between the webhook's Deployment becoming healthy and the APIService's status becoming true. In this case, applying the ClusterIssuer will fail.
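
The idea behind such an APIService check can be sketched as below. This is a Python illustration of the logic only, not the script we actually use (in argocd-cm the check would be written in Lua). The function names are hypothetical, and the sketch assumes the APIService reports an `Available` condition in `status.conditions`. The "not ready" message is made up for the example.

```python
def apiservice_health(obj):
    """Treat an APIService as Healthy only once Available is True."""
    conditions = (obj.get("status") or {}).get("conditions") or []
    for cond in conditions:
        if cond.get("type") == "Available":
            if cond.get("status") == "True":
                return "Healthy", cond.get("message", "")
            return "Progressing", cond.get("message", "")
    return "Progressing", "Waiting for APIService to become available"


def next_wave_may_start(health_statuses):
    """The next wave starts only when every resource in this wave is Healthy."""
    return all(status == "Healthy" for status in health_statuses)


# The webhook Deployment is already healthy, but the APIService is not yet available.
pending = {
    "status": {
        "conditions": [
            {"type": "Available", "status": "False", "message": "endpoints not ready"}
        ]
    }
}
available = {
    "status": {
        "conditions": [
            {"type": "Available", "status": "True", "message": "all checks passed"}
        ]
    }
}

print(next_wave_may_start(["Healthy", apiservice_health(pending)[0]]))
print(next_wave_may_start(["Healthy", apiservice_health(available)[0]]))
```

With the default behavior (APIService always healthy), the first case would already let the ClusterIssuer wave start. With a check like this, the wave waits until the APIService is available.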

> In addition, we also need a custom script for APIService in our app.

Nice. We should make the health check for APIService a built in (native golang) one so everyone will benefit from this.

I've created a ticket to make it built-in.

This won't be fixed by the related PR.

@ishii-masayuki - you have a workaround. So you don't need a fix anymore?

@alexec
Thanks.
I think that APIService's health check benefits everyone who uses APIService, including cert-manager.
So I want the fix to be built in.

I will work on it when I have time. Please wait a little.

thank you @ishii-masayuki - that'd be fantastic!

What's the status on this? What is the recommended work-around until the fix is available? Is it simply adding the APIService health check to argocd-cm?

I'm sorry. I've been a little busy, but I have free time this week.
If this issue is urgent, please fix ...

> What's the status on this? What is the recommended work-around until the fix is available? Is it simply adding the APIService health check to argocd-cm?

I believe so.
