Envoy: Possible to configure retry policy for external authorization requests?

Created on 15 Feb 2019 · 14 Comments · Source: envoyproxy/envoy

Title: Retry policy for external authorization requests?

I'm using Envoy 1.9.0 and configured external authorization as described in https://www.envoyproxy.io/docs/envoy/v1.9.0/configuration/http_filters/ext_authz_filter#config-http-filters-ext-authz

It works as expected, but sometimes the external authorization server responds with a 5xx status (this could be caused by problems with the server, the network, etc.). I haven't managed to configure a retry policy for these cases and wonder whether it is possible to do so. Also, I can see that when this occurs, the response to the calling client is 403, which is a bit misleading; it might be better to propagate 5xx response codes from the authorization server.

So my questions are whether retries are possible for authorization requests and whether 5xx response codes can be propagated to the client.

Any hints are welcome!

question stale

All 14 comments

Can anyone comment on the above? If retries are not possible here, it feels like something is really missing, since the whole design of Envoy is to enable fault tolerance for microservices. Has it been missed in this case?

@gsagula WDYT?

I actually have it already in the pipe, but sadly I haven't seen this question before. I think it's a great enhancement. What I have is exactly the same retry policy already implemented in Envoy:

retry_on: "5xx"
num_retries: 2
per_try_timeout_ms: 2000
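
For readers unfamiliar with these three settings, here is a minimal Python sketch of the semantics they describe: one initial attempt plus up to `num_retries` retries, each bounded by a per-try timeout, retrying only on 5xx. This is an illustrative model, not Envoy's implementation; `call_with_retries` and `send` are hypothetical names, and treating a timed-out attempt as a 504 is a simplifying assumption.

```python
from typing import Callable, Tuple


def call_with_retries(
    send: Callable[[float], int],
    num_retries: int = 2,
    per_try_timeout_s: float = 2.0,
) -> Tuple[int, int]:
    """Call `send` until it returns a non-5xx status or retries run out.

    `send` is a hypothetical callable that takes the per-try timeout in
    seconds and returns an HTTP status code, or raises TimeoutError.
    Returns (final_status, attempts_made).
    """
    attempts = 0
    status = 0
    # One initial try plus `num_retries` retries.
    for _ in range(num_retries + 1):
        attempts += 1
        try:
            status = send(per_try_timeout_s)
        except TimeoutError:
            # Assumption: model a timed-out attempt as a retryable 504.
            status = 504
        if not 500 <= status <= 599:
            break  # Not a 5xx, so "retry_on: 5xx" does not apply.
    return status, attempts


# Usage: an auth server that fails twice and then succeeds.
responses = iter([503, 503, 200])
final_status, attempts = call_with_retries(lambda timeout: next(responses))
```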

@enbohm Just out of curiosity. Did you try any of these retry strategies:
https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/route/route.proto#envoy-api-field-route-routeaction-retry-policy
https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/route/route.proto#envoy-api-field-route-virtualhost-retry-policy

@gsagula I've tried retry policies, but they only retried when the target service/cluster was failing, not the authorization request. For instance, the config below does retry when "my-service" fails:

match:
  prefix: "/apis/my-service/"
route:
  prefix_rewrite: "/"
  cluster: my_service
  retry_policy: {"retry_on": "5xx", "num_retries": 3}

but I can't figure out how to make my auth service inherit this policy. Currently, my auth config looks like this:

name: envoy.ext_authz
config:
  http_service:
    server_uri:
      uri: http://authorization-service:8080
      cluster: ext-authz
      timeout: 2s

I'd like to be able to configure it something like this:

name: envoy.ext_authz
config:
  http_service:
    server_uri:
      uri: http://authorization-service:8080
      cluster: ext-authz
      timeout: 2s
      retry_policy: {"retry_on": "5xx", "num_retries": 3}

but that is currently not supported AFAIK (I'm running Envoy 1.9.0). Also, as per the design proposal in https://github.com/envoyproxy/envoy/issues/6119, it would be great to have an option to propagate 5xx error codes instead of always returning 403.

The same goes for the gRPC service. Something like this:

- name: envoy.ext_authz
  config:
    grpc_service:
      envoy_grpc:
        cluster_name: authorization
      timeout: 5s
      retry_policy: {"retry_on": "5xx", "num_retries": 3}
    failure_mode_allow: false

@gsagula Are you saying that retries for authz are already being added?

@gdheller42 I have https://github.com/envoyproxy/envoy/issues/6119 pretty much done. I just need to add a test. I believe you should be able to use Envoy's retry.

@gsagula great thanks.. !!

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@gsagula Hi. I can't tell what the status is. Has it been merged into master yet? Thanks.

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.

@gsagula I've been trying to use the strategy suggested: envoy http route retries + ext_authz 5xx errors on connection failure. I don't see it retrying (tcpdump shows two connection attempts then immediate failure, regardless of the number of retry attempts I specify). Perhaps this has something to do with my setup, but there's something I don't understand about how it should be working. Doesn't the authz filter run prior to the route filter, and doesn't it terminate the filter chain on failure? If so, what triggers the retry logic?

My config is something along these lines:

authz config:

- name: envoy.ext_authz
  config:
    grpc_service:
      envoy_grpc:
        cluster_name: authz_cluster
    timeout: "5.000s"
    status_on_error:
      code: ServiceUnavailable

http route retry policy:

retry_policy:
  retry_on: "5xx"
  num_retries: 100
  per_try_timeout: "4s"
  retry_back_off:
    base_interval: "1s"
    max_interval: "5s"
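
The ordering concern raised in this last comment can be made concrete with a small Python model. It assumes, as the comment describes, that the authz filter runs before the router and answers the client directly on failure, so the router and its route-level `retry_policy` are never reached. All function names here are hypothetical, and this is a sketch of that assumed behavior rather than Envoy code.

```python
from typing import Callable, Optional


def ext_authz_filter(check: Callable[[], bool]) -> Optional[int]:
    """Return a local reply status to stop the chain, or None to continue."""
    try:
        allowed = check()
    except ConnectionError:
        # Mirrors "status_on_error: ServiceUnavailable" from the config above.
        return 503
    return None if allowed else 403


def router_filter(upstream: Callable[[], int], num_retries: int) -> int:
    """Forward to the upstream cluster, retrying on 5xx up to num_retries times."""
    status = upstream()
    for _ in range(num_retries):
        if status < 500:
            break
        status = upstream()
    return status


def handle_request(
    check: Callable[[], bool], upstream: Callable[[], int], num_retries: int
) -> int:
    local_reply = ext_authz_filter(check)
    if local_reply is not None:
        # The chain stops here: the router's retry policy is never consulted.
        return local_reply
    return router_filter(upstream, num_retries)


# Usage: the auth server is unreachable and the route allows 100 retries.
auth_calls = {"n": 0}


def failing_check() -> bool:
    auth_calls["n"] += 1
    raise ConnectionError("authz cluster unreachable")


status = handle_request(failing_check, lambda: 200, num_retries=100)
```

In this model the auth check is attempted exactly once no matter how large `num_retries` is, because the retry loop lives in the router step that the failed request never reaches.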