Envoy: [Feature] add ability to serve custom error responses

Created on 27 Jun 2017 · 22 comments · Source: envoyproxy/envoy

For envoy to serve as an external proxy, I believe it needs the ability to return custom error responses. For example, instead of returning the generic http body Service Unavailable, I could have it return an error message consistent with the formatting of my public-facing api, such as:

{
  "error": {
    "type": "service_error",
    "status_code": 503,
    "message": "The service is temporarily unavailable."
  }
}

To do this in haproxy, one adds something like errorfile 503 /etc/haproxy/503.json to the config. Can we add similar functionality to envoy? (To use envoy right now for my public-facing api, I actually have to put haproxy in front of it to serve custom error files based upon envoy's response codes.)

As for the best place to put it, since it is an http attribute, perhaps it belongs in an http filter?

related to #378 (for what it is worth, I don't think envoy should serve general static files)
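
What eventually landed (via the PR linked at the end of this thread) is a local reply configuration on the HTTP connection manager. A minimal sketch of the file-based analogue of haproxy's errorfile, assuming a 503.json on disk; the runtime key name is arbitrary:

  local_reply_config:
    mappers:
    - filter:
        status_code_filter:          # match only locally generated 503s
          comparison:
            op: EQ
            value:
              default_value: 503
              runtime_key: local_reply_503   # arbitrary runtime key name
      body:
        filename: /etc/envoy/503.json        # file served in place of the default plain-text body

This is roughly the moral equivalent of errorfile: a status-code match plus a file on disk.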

area/http enhancement help wanted

Most helpful comment

Please deliver this as soon as possible. We're wanting to move from haproxy to envoy for many reasons, but lack of custom error pages (even if not different per content type) is preventing us from moving forward.

All 22 comments

At a high level this feature makes sense to me. This actually will require some thought, because we need to differentiate between a "local origin" reply vs. a routed reply, so I don't think a filter is the way to go here. It probably makes sense to allow configuration in the route/vhost level for message overrides on a per response code basis, and then beef up the code in envoy that is used to send "local origin" replies.

@junr03 can you take a look at this in the context of the conversation we had today about increasing the fidelity of Envoy error responses in certain cases? We should put together a small design that covers this and the common cases.

For everyone else, at Lyft we would like the ability to have better control over what Envoy returns in error cases. For example, could we return not only a 503, but also JSON that carries additional information about what happened, e.g. that a circuit breaker was hit? This will allow apps to have much better error messages and take appropriate action.

I think we can potentially provide some built-in options as well as provide additional customizations if we do this right.

At Cloud Foundry we have also seen operators and app developers asking for more control over the error returned to the downstream client.

The specific feature we were looking to build out in the near term was the ability to distinguish between Envoy being aware of a route but the backend misbehaving (503 error) and Envoy not being aware of a route (404 error). It's been requested by users in our community, ref here.

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

I've just hit this seeming limitation of envoy. We need to serve a custom 503 page during maintenance, but it seems like the only way to do this simply is to serve it from the web servers, and to special-case the envoy health checks so they don't 503 when the maintenance page is up (so the web servers aren't removed from rotation, causing envoy to serve a generic single-line error message).

@mattklein123 is there a timeline for adding this feature? Being able to override the response of a routed request has become a higher-priority need for the product my team is working on.

@qiannawang can you sync up with @junr03 next week? I think his work in this area got de-prioritized but I'm not sure. If so, perhaps you can pick this up.

hmm, it looks like we can call StreamDecoderFilterCallbacks::sendLocalReply to override the entire HTTP response from the upstream endpoint. At least, my local experiment shows this override works.

It is kind of unexpected to me to call the decoder callbacks in the response path, e.g. in encodeHeaders. WDYT, @mattklein123? Would the decoder callbacks be destructed once the request is routed (or forwarded) to the upstream endpoint?

@qiannawang can you describe a bit more about what your exact use case is? I'm a little confused. (It might be true that sendLocalReply() can override a response, but that is not what this issue is tracking, which is to override the response that Envoy sends for locally originated responses such as 503, 404, etc.)

We use plugged-in filters in the order of X and Y. Then, the envoy.router forwards the request to the upstream endpoint, which might respond with 200s or 500s for example.

In our case, we would like the encoder filter X to override the response with a 503, no matter what response was returned by other filters or the upstream endpoint. It seems that StreamDecoderFilterCallbacks::sendLocalReply does achieve this. I am wondering if it is reasonable to invoke the decoder filter callbacks in the encoder path.

@qiannawang you should be able to do what you are looking for with the existing filter interface. E.g., turning a response into headers only, adding trailers, changing/removing the body, etc. Can you describe what you can't do? It's possible that sendLocalReply() "works" but that's likely accidental for your use case. As I said already, this issue tracks a different feature request, which is to allow modifying the responses that Envoy sends itself. If you have further questions can you please open a new issue?

I landed on this issue while searching for how to serve custom error pages with Envoy. The suggestion in the issue of something similar to HAProxy's errorfile 503 /etc/haproxy/503.json seems close to ideal (although I would want the ability to automatically determine whether json or html should be served based on the request's accepted content types).

In the meantime, is there any capability to configure an override when a synthetic 503 would be returned? I'm thinking something along the lines of connecting to another upstream service that could provide the error pages.

Now that envoy 1.10.0 is released and this issue has carried the 1.10.0 and 1.11.0 milestones, is it realistic to see support for this within the next release?

Is it fair at this moment to assume that all replies generated by Envoy itself go through StreamDecoderFilterCallbacks::sendLocalReply?
That would make it a reasonable injection point for overriding the output (status code, headers, body) based on configuration.

@euroelessar yes, agreed. See also the convo in https://github.com/envoyproxy/envoy/issues/7537

I know this won't suit a lot of the use cases mentioned in this thread, but at least for the complete failure of a set of endpoints being actively health checked by Envoy, we did the following:

  1. Set up health checks on the endpoints for a given cluster:
      - interval: 5s
        no_traffic_interval: 45s
        timeout: 5s
        unhealthy_threshold: 3
        healthy_threshold: 3
        reuse_connection: yes
        http_health_check:
          path: /healthcheck
  2. Have an infinitely-high-priority endpoint in the cluster that points back at envoy itself on a unique listener:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: host.docker.internal
                port_value: 9001
        priority: 128
  3. Have that point at a listener that only serves a custom error response page/content and 200 OK (so that the health check for this endpoint succeeds), no matter what the path/HTTP call is:
  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 9001
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: auto
          stat_prefix: ingress_http
          route_config: {}
          http_filters:
          - name: envoy.lua
            config:
              inline_code: |
                local failurecontent = require("lib.envoy.lua.failurecontent")
                function envoy_on_request(request_handle)
                  request_handle:respond(
                    {[":status"] = "200",
                     ["envoy-fallback"] = "true"},
                    failurecontent.htmlcontent())
                end
          - name: envoy.gzip
          - name: envoy.router
  4. Finally, on your main listener, have a bit of lua which captures the envoy-fallback response and turns it into a 500 for clients:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 80
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: auto
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/service/1"
                route:
                  cluster: service1
              - match:
                  prefix: "/service/2"
                route:
                  cluster: service2
          http_filters:
          - name: envoy.lua
            config:
              inline_code: |
                function envoy_on_response(response_handle)
                  if response_handle:headers():get("envoy-fallback") == "true" then
                    response_handle:headers():replace(":status", "500")
                    response_handle:headers():remove("envoy-fallback")
                  end
                end
          - name: envoy.gzip
          - name: envoy.router

It's messy but it works well.

Please deliver this as soon as possible. We're wanting to move from haproxy to envoy for many reasons, but lack of custom error pages (even if not different per content type) is preventing us from moving forward.

This would be relevant for my team as well. Thanks for the work on Envoy everyone!

This is a big problem for us as well.

We have another use case for this which blocks us from going with envoy as our public api gateway:

Our services respond with json for both error and success responses. For error responses we use zalando's problem format. So when a user is not authorized to access a resource, he gets a json response with http code 401 from the backend service.
We could simply proxy that response through; that's fine. But we want to use envoy's JWT validation feature so that invalid requests don't get through to the backend services. In that case, however, the response from envoy does not match the expected 401 response which the backend service would have served. It would be perfect if we could set a static error response which matches our backend service responses.

Ideally this would respect Accept headers, to enable both xml and json responses; otherwise Accept: application/xml gets json as the response.

Thanks for making envoy the best proxy in the market
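
For what it's worth, the local reply configuration that landed via the PR in the next comment can cover this JWT case. A rough sketch, assuming the JWT filter's rejection surfaces as a locally generated 401; the runtime key and field values here are illustrative:

  local_reply_config:
    mappers:
    - filter:
        status_code_filter:
          comparison:
            op: EQ
            value:
              default_value: 401
              runtime_key: local_reply_401   # arbitrary runtime key name
      body_format_override:
        content_type: "application/problem+json"
        json_format:                          # approximates the zalando problem format
          title: "Unauthorized"
          status: "%RESPONSE_CODE%"
          detail: "%LOCAL_REPLY_BODY%"        # the original local reply text from the JWT filter

Varying the body between json and xml per the Accept header would presumably take additional mappers keyed on a header filter rather than this single status-code match.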

Fixed by https://github.com/envoyproxy/envoy/pull/11007. Please open new issues with specific requests.
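
For later readers, a minimal sketch of what that configuration looks like for the 503 example from the opening post, under the HTTP connection manager (the runtime key name is arbitrary):

  local_reply_config:
    mappers:
    - filter:
        status_code_filter:
          comparison:
            op: EQ
            value:
              default_value: 503
              runtime_key: local_reply_503   # arbitrary runtime key name
      body_format_override:
        content_type: "application/json"
        json_format:                          # mirrors the body requested in the opening post
          error:
            type: "service_error"
            status_code: "%RESPONSE_CODE%"
            message: "The service is temporarily unavailable."

Since this is local-reply configuration, it only affects responses Envoy generates itself; responses proxied from an upstream are left untouched.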

That PR is very awesome, but I want to know: in nginx we can route to another service when the main service/upstream is down, adding any additional data we need.

After checking the documentation here, I did not find any option for us to route traffic on error to another service with additional data. Or am I missing something somewhere else?
