Argo-cd: Login attempt fails sometimes after 1.5.3 upgrade

Created on 5 May 2020 · 24 comments · Source: argoproj/argo-cd

After upgrading to 1.5.3, I'm getting these errors on the first attempt to authenticate via the /api/v1/session route after some idle period. It eventually works if I retry a second time or a few more times. I have not configured any rate limiting, and the credentials used to hit the route are the admin user's. Not quite sure why this is happening.

time="2020-05-05T16:58:46Z" level=error msg="finished unary call with code Unknown" error="failed to enforce max concurrent logins limit: EOF" grpc.code=Unknown grpc.method=Create grpc.service=session.SessionService grpc.start_time="2020-05-05T16:58:46Z" grpc.time_ms=0.664 span.kind=server system=grpc
bug high critical api

All 24 comments

Hi @eroji, can you share a few more details about your environment, please? This error suggests that the Redis cache is not available (although it seems to be intermittent, according to your description).

It would be interesting to know:

  • Have you installed Argo CD in the HA setup?
  • How did you upgrade, and from which version?

My apologies. I'm using the HA install. The only modification I added was the --insecure flag for argocd-server. I upgraded from 1.5.1.

I've encountered the same issue during the upgrade. The solution was to restart both the Redis statefulset and the Redis HA proxy. @eroji, can you give it a try, please?

I've seen the same issue with 1.5.1.

Trying it now.

It seems to be working? I'll check throughout the day to see if I hit this error again and report back.

We should look for/file an upstream bug in the Redis HA Helm chart. It doesn't seem to happen often; in my case, it happened for 1 out of ~40 Argo CD instances.

Looks like it's still happening. I see that 1.5.4 has been released. I will try upgrading to that to see if it helps.

time="2020-05-06T07:54:14Z" level=error msg="finished unary call with code Unknown" error="failed to enforce max concurrent logins limit: EOF" grpc.code=Unknown grpc.method=Create grpc.service=session.SessionService grpc.start_time="2020-05-06T07:54:14Z" grpc.time_ms=0.886 span.kind=server system=grpc

Hello @eroji, 1.5.4 does not include Redis-related changes. I don't think it will help.

As a quick workaround, you might disable the concurrent login limit feature: set the env var ARGOCD_MAX_CONCURRENT_LOGIN_REQUESTS_COUNT=0 in the argocd-server deployment.
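In manifest form, the workaround is the snippet below added to the argocd-server container in the Deployment (this matches what worked for commenters later in this thread; note the closing quote on the value, since env var values must be strings):

```yaml
# In the argocd-server Deployment, under spec.template.spec.containers[0]:
env:
  - name: ARGOCD_MAX_CONCURRENT_LOGIN_REQUESTS_COUNT
    value: "0"
```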

I'm going to enable retries in the Redis client and test it on local deployments.

@alexmt not sure why but it seems like upgrading to 1.5.4 resolved the issue. I didn't have to add the env var at all...

Created ticket in redis-ha chart repository: https://github.com/DandyDeveloper/charts/issues/26

The PR that introduces Redis retries during the login flow has been merged: https://github.com/argoproj/argo-cd/pull/3575

Adding a big WARNING about the possible Redis issue to the 1.4 -> 1.5 upgrade instructions as well: https://github.com/argoproj/argo-cd/pull/3584. Probably this is as much as we can do.

Once all three are done, I think the ticket can be closed. Does that look reasonable to you, @jannfis, @jessesuen?

v1.5.5 with the Redis retries has been released. Please give it a try. Closing the ticket until we hear again about Redis issues.

I still have this issue in 1.5.5. Like @eroji, the only modification I have is the --insecure flag.

Same issue with v1.5.5; it works only when setting this env variable on argocd-server:
ARGOCD_MAX_CONCURRENT_LOGIN_REQUESTS_COUNT=0

In the logs I'm getting this after many timeouts; thought this might help:

5/29/2020 7:13:54 PM 2020/05/29 17:13:54 cache: Get key="session|login.attempts|1.0.0" failed: dial tcp: i/o timeout
5/29/2020 7:13:54 PM time="2020-05-29T17:13:54Z" level=error msg="Could not retrieve login attempts: dial tcp: i/o timeout"
5/29/2020 7:14:14 PM 2020/05/29 17:14:14 cache: Get key="session|login.attempts|1.0.0" failed: dial tcp: i/o timeout
5/29/2020 7:14:14 PM time="2020-05-29T17:14:14Z" level=error msg="Could not retrieve login attempts: dial tcp: i/o timeout"
5/29/2020 7:14:34 PM 2020/05/29 17:14:34 cache: Set key="session|login.attempts|1.0.0" failed: dial tcp: i/o timeout
5/29/2020 7:14:34 PM time="2020-05-29T17:14:34Z" level=error msg="Could not update login attempts: dial tcp: i/o timeout"
5/29/2020 7:14:34 PM time="2020-05-29T17:14:34Z" level=info msg="Issuing claims: { 0 1590772474 argocd 1590772474 admin}"
5/29/2020 7:14:34 PM time="2020-05-29T17:14:34Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=Create grpc.service=session.SessionService grpc.start_time="2020-05-29T17:13:34Z" grpc.time_ms=60206.44 span.kind=server system=grpc
5/29/2020 7:14:35 PM time="2020-05-29T17:14:35Z" level=info msg="received unary call /session.SessionService/GetUserInfo" grpc.method=GetUserInfo grpc.request.claims="{\"iat\":1590772474,\"iss\":\"argocd\",\"nbf\":1590772474,\"sub\":\"admin\"}" grpc.request.content= grpc.service=session.SessionService grpc.start_time="2020-05-29T17:14:35Z" span.kind=server system=grpc
5/29/2020 7:14:35 PM time="2020-05-29T17:14:35Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=GetUserInfo grpc.service=session.SessionService grpc.start_time="2020-05-29T17:14:35Z" grpc.time_ms=0.456 span.kind=server system=grpc
5/29/2020 7:14:35 PM time="2020-05-29T17:14:35Z" level=info msg="received unary call /cluster.ClusterService/List" grpc.method=List grpc.request.claims="{\"iat\":1590772474,\"iss\":\"argocd\",\"nbf\":1590772474,\"sub\":\"admin\"}" grpc.request.content= grpc.service=cluster.ClusterService grpc.start_time="2020-05-29T17:14:35Z" span.kind=server system=grpc 

Does this affect users signing in via an IDP such as Okta?

Adding this to my argocd-server deployment resolved the issue:

env:
  - name: ARGOCD_MAX_CONCURRENT_LOGIN_REQUESTS_COUNT
    value: "0"

Often, when log entries like these

5/29/2020 7:14:34 PM 2020/05/29 17:14:34 cache: Set key="session|login.attempts|1.0.0" failed: dial tcp: i/o timeout

can be observed, there is usually a problem with in-cluster DNS resolution or general connectivity within the cluster, or the Redis pod is not running at all.

I get this issue only when creating the cluster on a bare-metal Azure VM. It works perfectly fine with a cluster on an EC2 instance.
Now I'm getting this error while adding a Git repo:
rpc error: code = Unknown desc = Get "https://gitlab.com/xxxxx/xxxxxxx.git/info/refs?service=git-upload-pack": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

env:

  • name: ARGOCD_MAX_CONCURRENT_LOGIN_REQUESTS_COUNT
    value: "0"

Where exactly do I add these values? Can you show me a screenshot of this?

Add this to the argocd-server Deployment in install.yaml. You can try adding it at https://github.com/argoproj/argo-cd/blob/master/manifests/install.yaml#L2646


I got this:

error: error validating "install.yaml": error validating data: ValidationError(Deployment.spec.template.spec.containers[0]): unknown field "-env" in io.k8s.api.core.v1.Container; if you choose to ignore these errors, turn validation off with --validate=false

Now it worked. Thanks a lot :)

