I don't know if it's the same issue, but I have observed similar behaviour. I have a simple Argo CD application consisting only of secrets and config maps. Occasionally (once every few runs) the application never reaches the Synced/Healthy state. What's more, the application-controller stops working altogether: it neither syncs anything nor self-heals. Restarting the application-controller pod fixes things immediately.
I have created a script that reproduces this issue fairly consistently (although it takes some time, because a few iterations are usually needed). I also think it occurs more often the more secrets/config maps there are.
The code and results are available here:
https://github.com/code4free/argocd-stuck
If you feel that it might be a separate issue, I will create a separate ticket.
I am also seeing this issue on v1.5.4 with a non-HA configuration at a small scale (~70 applications). I've been resorting to restarting the application-controller whenever it gets wedged.
Thank you @code4free! I owe you a beer 🍺
I can consistently reproduce the deadlock using the attached application. The bug is in the SettingsManager.notifySubscribers method:
Apparently I already fixed it in master:
I think the fix should be cherry-picked into 1.5. Working on it.
Cherry-picked into 1.5.
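The actual notifySubscribers code and its fix are not reproduced in this thread, so the following is only a hypothetical Go sketch of one common way a "notify subscribers" method can deadlock. It is not Argo CD's code. The names settingsManager, Get, notifySubscribersUnsafe, notifySubscribersSafe and runScenario are all illustrative assumptions. The unsafe variant holds the mutex while doing blocking sends on unbuffered channels; if a subscriber calls back into the manager between two receives, each side waits on the other forever. The safe variant copies the subscriber list under the lock and sends only after releasing it.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// settingsManager is a HYPOTHETICAL, simplified stand-in for illustration only;
// it is not Argo CD's SettingsManager.
type settingsManager struct {
	mu          sync.Mutex
	subscribers []chan<- string
}

func (m *settingsManager) Subscribe(ch chan<- string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.subscribers = append(m.subscribers, ch)
}

// Get simulates a subscriber reading settings back; it needs the same mutex.
func (m *settingsManager) Get() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.subscribers)
}

// notifySubscribersUnsafe keeps m.mu locked while blocking on each send.
func (m *settingsManager) notifySubscribersUnsafe(msg string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for _, ch := range m.subscribers {
		ch <- msg // blocks until the subscriber receives, with the lock still held
	}
}

// notifySubscribersSafe snapshots the subscribers under the lock, then sends without it.
func (m *settingsManager) notifySubscribersSafe(msg string) {
	m.mu.Lock()
	subs := make([]chan<- string, len(m.subscribers))
	copy(subs, m.subscribers)
	m.mu.Unlock()
	for _, ch := range subs {
		ch <- msg
	}
}

// runScenario reports whether one consumer received both notifications before a timeout.
// The consumer owns two subscriptions and calls Get between the two receives.
func runScenario(safe bool) bool {
	m := &settingsManager{}
	ch1 := make(chan string) // unbuffered: a send waits for a receiver
	ch2 := make(chan string)
	m.Subscribe(ch1)
	m.Subscribe(ch2)

	done := make(chan struct{})
	go func() {
		<-ch1
		_ = m.Get() // re-enters the manager; needs m.mu
		<-ch2
		close(done)
	}()

	if safe {
		go m.notifySubscribersSafe("settings changed")
	} else {
		go m.notifySubscribersUnsafe("settings changed")
	}

	select {
	case <-done:
		return true
	case <-time.After(500 * time.Millisecond):
		return false // notifier and consumer are stuck waiting on each other
	}
}

func main() {
	fmt.Println("unsafe delivered:", runScenario(false))
	fmt.Println("safe delivered:", runScenario(true))
}
```

In the unsafe run the notifier holds the lock from before the first send until after the second, so the consumer's Get can never proceed and the program prints `unsafe delivered: false`; the safe run prints `safe delivered: true`. Copying the list and releasing the lock before sending is one common remedy; buffered or non-blocking sends are another, with different trade-offs.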
@alexmt cool, happy to help. I tried to reduce our application to the minimal example that still produces the same result, and that's how I ended up with an application containing only secrets and config maps. I have also noticed that the more secrets/config maps there are, the more often this error occurs.
Also, I didn't observe the same behaviour (or I was just lucky) when Argo CD itself was in a different namespace than the application's "destination" namespace.