Consul: Consul connect provide a way to configure envoy route timeout

Created on 23 Aug 2019 · 9 Comments · Source: hashicorp/consul

Feature Description

Provide a way to configure the request timeout on upstream listeners.

Use Case(s)

https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/route/route.proto#route-routeaction
The Envoy default value is 15 seconds, which is too high or too low depending on the use case.

theme/connect theme/envoy/xds type/enhancement

All 9 comments

Hey there,
We wanted to check in on this request since it has been inactive for at least 60 days.
If you think this is still an important issue in the latest version of Consul
or its documentation please reply with a comment here which will cause it to stay open for investigation.
If there is still no activity on this issue for 30 more days, we will go ahead and close it.

Feel free to check out the community forum as well!
Thank you!

Hi,

we set this timeout using a service router:

Kind = "service-router"
Name = "myservice"
Routes = [
  {
    Destination {
      Service = ""
      RetryOnConnectFailure = true
      RequestTimeout = "120s"
    }
  }
]

which can then be seen in the envoy configuration:
"route": { "cluster": "myservice.default.dc1.internal.1af17f86-a6e7-28b2-3176-5a22dfc90678.consul", "timeout": "120s", "retry_policy": { "retry_on": "connect-failure" } }

However, envoy sets this header:
x-envoy-expected-rq-timeout-ms: 15000
(see https://www.envoyproxy.io/docs/envoy/v1.11.1/configuration/http_filters/router_filter.html?highlight=timeout#x-envoy-expected-rq-timeout-ms)
which seems to cause the upstream envoy to give up after 15 seconds.

We couldn't find a way to increase the value of this header (or disable it) via consul, which would leave us with the timeout configured in the service router.
As a result we sporadically get a 504 Gateway Timeout for some long-running requests.

At this point we are not sure whether there is a way to manage that header that we missed, or whether there is still no way to manage it, in which case it would be a very helpful feature.

We are using consul 1.6.1 with envoy 1.11.1

If you could shed some light on this that would help us a lot
Thanks!

I have the same issue, and I think it is because the Envoy proxy in front of the local app doesn't set a specific timeout.
The Envoy proxy in front of the target app applies the default timeout to all requests it sends to its local app.
To avoid this problem, the only solution I found was to override the public listener with the envoy_public_listener_json config key.

Hi @rrondeau ,
many thanks for the hint. Would it be possible to get an example snippet of the configuration you used?
Thanks!

I faced the same issue. I ended up building Consul from source with my fix. I hope you find it useful.

diff --git a/agent/xds/listeners.go b/agent/xds/listeners.go
index b44cd996c..67805ec29 100644
--- a/agent/xds/listeners.go
+++ b/agent/xds/listeners.go
@@ -9,6 +9,7 @@ import (
        "regexp"
        "strconv"
        "strings"
+       "time"

        envoy "github.com/envoyproxy/go-control-plane/envoy/api/v2"
        envoyauth "github.com/envoyproxy/go-control-plane/envoy/api/v2/auth"
@@ -727,6 +728,7 @@ func makeHTTPFilter(
                                        ClusterSpecifier: &envoyroute.RouteAction_Cluster{
                                                Cluster: cluster,
                                        },
+                                       Timeout: addrTime(5*time.Minute),
                                },
                        },
                }
@@ -844,3 +846,7 @@ func makeCommonTLSContext(cfgSnap *proxycfg.ConfigSnapshot) *envoyauth.CommonTls
                },
        }
 }
+
+func addrTime(t time.Duration) *time.Duration {
+       return &t
+}
diff --git a/agent/xds/routes.go b/agent/xds/routes.go
index 520545bdc..9cadaa2b1 100644
--- a/agent/xds/routes.go
+++ b/agent/xds/routes.go
@@ -4,6 +4,7 @@ import (
        "errors"
        "fmt"
        "strings"
+       "time"

        "github.com/gogo/protobuf/proto"

@@ -317,6 +318,7 @@ func makeRouteActionForSingleCluster(targetID string, chain *structs.CompiledDis
                        ClusterSpecifier: &envoyroute.RouteAction_Cluster{
                                Cluster: clusterName,
                        },
+                       Timeout: addrTime(5*time.Minute),
                },
        }
 }
@@ -353,6 +355,7 @@ func makeRouteActionForSplitter(splits []*structs.DiscoverySplit, chain *structs
                                        TotalWeight: makeUint32Value(10000), // scaled up 100%
                                },
                        },
+                       Timeout: addrTime(5*time.Minute),
                },
        }, nil
 }

@msuarezd, I believe the issue you're describing is related to https://github.com/envoyproxy/envoy/issues/7358 which was fixed in Envoy version 1.12.0 with https://github.com/envoyproxy/envoy/pull/8051. If that's correct, the correct fix would be for Consul to also allow configuring respect_expected_rq_timeout in the envoy.router HTTP filter config on the destination sidecars.

Has there been any progress on this issue?
I'm extremely surprised to find there's no easy way to change request timeouts when using Consul Connect; this seems pretty critical.

Is setting RequestTimeout in a Service Resolver configuration the expected way to configure this? And it just doesn't work because of the respect_expected_rq_timeout issue mentioned above?

I'm not super familiar with Envoy itself and the documentation around the Envoy escape hatch configuration is hard to follow.
Are these options the expected way to configure a request timeout?
If so, are there any examples of how to do this, particularly when using the consul-k8s injector?

Wew, this has been a bit of a rabbit hole but for anyone else who runs into this issue...

The quickest fix is to change the destination service protocol to tcp. This obviously means you can't use any of the Connect L7 features, but at least your application-level timeouts will work.
edit: this does seem to have changed all the upstream listeners to TCP proxies, even though only one of the services is TCP and the rest are HTTP. A different bug, maybe?

Setting a service-router configuration with a RequestTimeout should work.
It injects a timeout configuration into the envoy route configuration as @msuarezd pointed out.

This had no effect for me: I couldn't find a timeout parameter in the config dump on either envoy instance.
I'm reasonably sure this is me doing something wrong, but I didn't dig too hard, given that the respect_expected_rq_timeout issue means this doesn't work properly anyway.

I had a look at enabling respect_expected_rq_timeout in the envoy.router config, but that's not an easy fix either.
The go-control-plane dependency doesn't support that configuration until v0.9.1, but v0.9.0 is a major breaking change, and it looks like it would need quite a few changes in the Consul codebase to support, or some hacking around to force that parameter.

Hi,
Same problem here: 15 seconds is too short for some of our workloads.
Moreover, we are using Connect L7 features, so we cannot set the endpoint protocol to tcp.
I am really interested in @rrondeau's solution. Do you have an example?

I suppose you override the 'listener_filters_timeout' value?
Is it possible to leave all the other keys unset?

Yours faithfully,
LCDP
