Title: Retry policy for external authorization requests?
I'm using Envoy 1.9.0 and configured external authorization as described in https://www.envoyproxy.io/docs/envoy/v1.9.0/configuration/http_filters/ext_authz_filter#config-http-filters-ext-authz
It works as expected but sometimes the external authorization server respond with a 5xx status (could be either problems with the server, network, etc.). I haven't managed to configure a retry policy for these cases and wonder if it is possible to do this? Also, I can see that when this occurs, the actual response to the calling client is 403 which is actually a bit misleading and it might be better to propagate 5xx response codes from the authorization server.
So my questions are if retries are possible with authorization requests and if 5xx response codes can be propagates to the client?
Any hints are welcome!
Anyone who could comment on the above? Feels like something that is really missing if this is the case since the whole design of Envoy is to enable fault tolerance characteristics of microservices - but has it been missed in this case?
@gsagula WDYT?
I actually have it already in the pipe, but sadly I haven't seen this question before. I think it's a great enhancement. What I have is exactly the same retry policy already implemented in Envoy:
retry_on: "5xx"
num_retries: 2
per_try_timeout_ms: 2000
@enbohm Just out of curiosity. Did you try any of these retry strategies:
https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/route/route.proto#envoy-api-field-route-routeaction-retry-policy
https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/route/route.proto#envoy-api-field-route-virtualhost-retry-policy
@gsagula I've tried retry polices but that did only retry when the targeting service/cluster was failing not the authorization request. For instance, the below config does retry when "my-service" fails:
match:
prefix: "/apis/my-service/"
route:
prefix_rewrite: "/"
cluster: my_service
retry_policy: {"retry_on": "5xx", "num_retries": 3}
but I can't figure out how to make my auth-service to inherit this policy. Currently, my auth. config looks like
name: envoy.ext_authz
config:
http_service:
server_uri:
uri: http://authorization-service:8080
cluster: ext-authz
timeout: 2s
I'd like to be able to configure it something like this:
name: envoy.ext_authz
config:
http_service:
server_uri:
uri: http://authorization-service:8080
cluster: ext-authz
timeout: 2s
retry_policy: {"retry_on": "5xx", "num_retries": 3}
but that is currently not supported AFAIK (I'm running envoy 1.9.0). Also, as per https://github.com/envoyproxy/envoy/issues/6119 design proposal, it would be great to have an option to propagate 5xx error codes instead of always returning 403.
GRPC Service as well... Something like this:
- name: envoy.ext_authz
config:
grpc_service:
envoy_grpc:
cluster_name: authorization
timeout: 5s
retry_policy: {"retry_on": "5xx", "num_retries": 3}
failure_mode_allow: false
I actually have it already in the pipe, but sadly I haven't seen this question before. I think it's a great enhancement. What I have is exactly the same retry policy already implemented in Envoy:
retry_on: "5xx" num_retries: 2 per_try_timeout_ms: 2000@gsagula Are you saying the retries on authz are already being added??
@gdheller42 I have https://github.com/envoyproxy/envoy/issues/6119 pretty much done. Just need to add a test. You should be able to use the Envoy's retry I believe.
@gsagula great thanks.. !!
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
@gsagula Hi. Can't tell the status. Is it merged with master yet ? Thanks
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.
@gsagula I've been trying to use the strategy suggested: envoy http route retries + ext_authz 5xx errors on connection failure. I don't see it retrying (tcpdump shows two connection attempts then immediate failure, regardless of the number of retry attempts I specify). Perhaps this has something to do with my setup, but there's something I don't understand about how it should be working. Doesn't the authz filter run prior to the route filter, and doesn't it terminate the filter chain on failure? If so, what triggers the retry logic?
My config is something along these lines:
authz config:
- name: envoy.ext_authz
config:
grpc_service:
envoy_grpc:
cluster_name: authz_cluster
timeout: "5.000s"
status_on_error:
code: ServiceUnavailable
http route retry policy:
retry_policy:
retry_on: "5xx"
num_retries: 100
per_try_timeout: "4s"
retry_back_off:
base_interval: "1s"
max_interval: "5s"
Most helpful comment
I actually have it already in the pipe, but sadly I haven't seen this question before. I think it's a great enhancement. What I have is exactly the same retry policy already implemented in Envoy: