Describe the bug
If the Host header provided in a request includes the port, Ambassador responds with a 404 where there is an otherwise valid route.
To Reproduce
Steps to reproduce the behavior:
curl https://myhostname.com will work because curl sends Host: myhostname.com
curl https://myhostname.com -H "Host: myhostname.com:443" will return a 404 with an empty body
Expected behavior
Ambassador should ignore the port portion of the Host header, or at least treat the default ports as equivalent to the absence of a port.
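To make the expected behavior concrete, here is a minimal sketch in Python of "treat the default port as equivalent to no port". It is not Ambassador or Envoy code; the function name normalize_host and the DEFAULT_PORTS table are made up for illustration, and it assumes the scheme of the request is known.

```python
# Hypothetical helper, not part of Ambassador or Envoy.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_host(host: str, scheme: str = "https") -> str:
    """Drop a trailing ":port" from a Host value when it is the scheme's default port."""
    name, sep, port = host.rpartition(":")
    # No colon, nothing before the colon, or a non-numeric tail (for example the
    # inside of a bracketed IPv6 literal such as "[::1]"): leave the value alone.
    if not sep or not name or not port.isdecimal():
        return host
    if int(port) == DEFAULT_PORTS.get(scheme):
        return name
    return host


if __name__ == "__main__":
    for value in ("myhostname.com", "myhostname.com:443", "myhostname.com:8443"):
        print(value, "->", normalize_host(value))
```

With this rule, Host: myhostname.com and Host: myhostname.com:443 would select the same route on an HTTPS listener, while a non-default port such as :8443 is left untouched.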
Versions (please complete the following information):
Additional context
Our client uses Qt's QNetworkAccessManager to send requests. While testing Ambassador, I found that requests from our client were failing (while requests from browsers, curl, etc. worked). Initially I assumed this was a bug in Qt, but the RFC states that the Host header may include a port.
We ran into the same issue and worked around it by using host_regex: true and setting the host pattern like so: "some\.sub\.domain\.tld.*". It also works using . to stand in for the problematic colon: "some\.domain\.tld.9090"
@MattSurabian how exactly did you make it work? I am having the same problem.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
not stale
@yokiworks Sorry for the delayed response here, you've probably figured it out or set it down by now. But if you're still wondering here's the mapping object spec as an example of the regex workaround I mentioned above:
prefix: /
rewrite: ""
host: www\.greatwebsite\.com.*
host_regex: true
service: great-website-service.master:8080
The UNESCAPED . after com, together with the star, stands in for the : that will be sent in the Host header value and anything that comes after it. The star also ensures that if the Host header doesn't include the port number, this mapping will still match.
It should be noted that this work around would also allow host header matching for mutations of the TLD. Specifically, the above example will also host map to www.greatwebsite.computer. This probably isn't an issue in most use cases but it's worth noting as a side effect of this work around.
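The side effect is easy to see with a quick check. The sketch below uses Python's re module as a stand-in for whatever regex engine Envoy uses, and assumes the pattern must match the whole Host value. The tighter pattern (an optional :port and nothing else) is only a suggestion and has not been tested against any Ambassador version.

```python
import re

# Pattern from the mapping above: unescaped "." plus "*" matches any suffix.
LOOSE = re.compile(r"www\.greatwebsite\.com.*")
# Hypothetical tighter pattern: only an optional ":<digits>" may follow the hostname.
STRICT = re.compile(r"www\.greatwebsite\.com(:[0-9]+)?")


def host_matches(pattern: re.Pattern, host: str) -> bool:
    """Return True when the pattern matches the entire Host value."""
    return pattern.fullmatch(host) is not None


if __name__ == "__main__":
    hosts = (
        "www.greatwebsite.com",
        "www.greatwebsite.com:443",
        "www.greatwebsite.computer",
    )
    for host in hosts:
        print(host, host_matches(LOOSE, host), host_matches(STRICT, host))
```

Both patterns accept the hostname with and without a port; only the loose one also accepts www.greatwebsite.computer.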
@MattSurabian which version of ambassador are you using? We tried it with 1.4.2 but could not get it to work. It feels like the regex is only applied to the hostname without the port.
I've been putting off the upgrade to 1.X.X; we're still on the .8X.0 minor version family. I was looking to do the migration in a few weeks, and it sounds like I may be in for at least one surprise when I do. I'd assume the dropped support for this workaround comes from Envoy, right?
I'll let you know if I end up with a different workaround on the latest version. Sorry for misleading anyone that's running the newer version :-(
I found this problem in 1.4.2 too. With regex it works with only one host. If you have multiple hosts and configure all of them using regex, only the first host works (in ascending order).
Same issue for us, exactly the same problem. With a regular expression it only works with the first host.
version 1.4.2
We upgraded recently to 1.5.3 and hit this. Somewhere between 0.86 and 1.5.3, Ambassador moved from a wildcard domains value in Envoy's configuration to the virtual host's FQDN.
(Some of the TLSContext hostname stuff is handwavey, please feel free to correct me.)
Mapping.host will ever work with recent Ambassador releases if a matching TLSContext is present, since domains is then taken from the matching TLSContext.host value.
server_names interpolation should either reject or strip port numbers, since it doesn't make sense to have a port number in an SNI value.
Ideally, the result is that Ambassador implements functionally equivalent logic to envoy#10960, so routing decisions treat an explicit external port as equivalent to implicit-port requests. Then, explicit port requests will work, and do so without a separate Mapping.
It seems the shortest path to that is by:
host:externalport to domains
match block using host:externalport as the match value.
Figuring out the external port number might be tricky. It looks like some of the k8s service envvars have the external port number (I'm using the :443 HTTPS default), but I'm not sure how universal/portable that assumption is.
For example, with these mappings:
---
apiVersion: ambassador/v1
circuit_breakers:
- max_connections: 10000
  max_pending_requests: 10000
connect_timeout_ms: 15000
host: graphql.dev.devoted.com
kind: Mapping
name: orinoco-7dc3eb653-1593020418_graphql_dev_devoted_com
prefix: /
resolver: endpoint
rewrite: /
service: graphql.orinoco-7dc3eb653-1593020418:443
timeout_ms: 1800000
tls: backend-graphql-client
---
apiVersion: ambassador/v1
circuit_breakers:
- max_connections: 10000
  max_pending_requests: 10000
connect_timeout_ms: 15000
host: graphql.dev.devoted.com:443
kind: Mapping
name: orinoco-7dc3eb653-1593020418_graphql_dev_devoted_com:443
prefix: /
resolver: endpoint
rewrite: /
service: graphql.orinoco-7dc3eb653-1593020418:443
timeout_ms: 1800000
tls: backend-graphql-client
Ambassador 0.86 generates this config:
"filter_chain_match": {
"server_names": [
"graphql.prod.devoted.com"
]
},
"filters": [
"config": {
"route_config": {
"virtual_hosts": [
{
"domains": [
"*"
],
"name": "backend",
"routes": [
{
"match": {
"case_sensitive": true,
"headers": [
{
"exact_match": "graphql.prod.devoted.com:443",
"name": ":authority"
}
],
"prefix": "/",
"runtime_fraction": {
"default_value": {
"denominator": "HUNDRED",
"numerator": 0
},
"runtime_key": "routing.traffic_shift.cluster_graphql_orinoco_1e5fe20b5_159293-0"
}
},
"route": {
"cluster": "cluster_graphql_orinoco_1e5fe20b5_159293-0",
"prefix_rewrite": "/",
"priority": null,
"timeout": "1800.000s"
}
},
{
"match": {
"case_sensitive": true,
"headers": [
{
"exact_match": "graphql.prod.devoted.com",
"name": ":authority"
}
],
"prefix": "/",
"runtime_fraction": {
"default_value": {
"denominator": "HUNDRED",
"numerator": 0
},
"runtime_key": "routing.traffic_shift.cluster_graphql_orinoco_1e5fe20b5_159293-0"
}
},
"route": {
"cluster": "cluster_graphql_orinoco_1e5fe20b5_159293-0",
"prefix_rewrite": "/",
"priority": null,
"timeout": "1800.000s"
}
},
Ambassador 1.5.3, however, generates the following config (domains has changed from * to an FQDN; everything else is roughly the same). Since the host:port doesn't match anything in domains, Envoy returns a 404 NR.
{
"filter_chain_match": {
"server_names": [
"graphql.dev.devoted.com"
],
"transport_protocol": "tls"
},
"filters": [
{
"typed_config": {
"route_config": {
"virtual_hosts": [
{
"domains": [
"graphql.dev.devoted.com",
],
"name": "ambassador-listener-8443-graphql.dev.devoted.com",
"routes": [
{
"match": {
"case_sensitive": true,
"headers": [
{
"exact_match": "graphql.dev.devoted.com:443",
"name": ":authority"
},
{
"exact_match": "https",
"name": "x-forwarded-proto"
}
],
"prefix": "/",
"runtime_fraction": {
"default_value": {
"denominator": "HUNDRED",
"numerator": 0
},
"runtime_key": "routing.traffic_shift.cluster_graphql_orinoco_0dfd61322_159294-0"
}
},
"route": {
"cluster": "cluster_graphql_orinoco_0dfd61322_159294-0",
"prefix_rewrite": "/",
"priority": null,
"timeout": "1800.000s"
}
},
{
"match": {
"case_sensitive": true,
"headers": [
{
"exact_match": "graphql.dev.devoted.com",
"name": ":authority"
},
{
"exact_match": "https",
"name": "x-forwarded-proto"
}
],
"prefix": "/",
"runtime_fraction": {
"default_value": {
"denominator": "HUNDRED",
"numerator": 0
},
"runtime_key": "routing.traffic_shift.cluster_graphql_orinoco_0dfd61322_159294-0"
}
},
"route": { "cluster": "cluster_graphql_orinoco_0dfd61322_159294-0",
"prefix_rewrite": "/",
"priority": null,
"timeout": "1800.000s"
}
},
I found Envoy's troubleshooting guide for cases like this, which backs up the conclusion that domains is the cause.
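To illustrate why domains matters before the :authority header matches are even consulted, here is a deliberately simplified model of virtual host selection in Python. It only handles exact entries and the "*" catch-all (real Envoy also supports wildcard prefixes and suffixes), and select_virtual_host is a name I made up; this is not Envoy's implementation.

```python
from typing import Optional


def select_virtual_host(virtual_hosts: list, authority: str) -> Optional[dict]:
    """Pick a virtual host for an :authority value: exact domain first, then "*"."""
    for vhost in virtual_hosts:
        if authority in vhost["domains"]:
            return vhost
    for vhost in virtual_hosts:
        if "*" in vhost["domains"]:
            return vhost
    # No virtual host matched, so no route is ever evaluated (the 404 NR case).
    return None


if __name__ == "__main__":
    config_086 = [{"name": "backend", "domains": ["*"]}]
    config_153 = [
        {
            "name": "ambassador-listener-8443-graphql.dev.devoted.com",
            "domains": ["graphql.dev.devoted.com"],
        }
    ]
    authority = "graphql.dev.devoted.com:443"
    print(select_virtual_host(config_086, authority))
    print(select_virtual_host(config_153, authority))
```

In this model the 0.86-style config always reaches the route list, so the exact_match on graphql.dev.devoted.com:443 gets a chance to apply. The 1.5.3-style config never selects a virtual host for the host:port form, so the route that would have matched is unreachable.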
At first, I thought envoy#10960 (datawire/ambassador#2818 datawire/envoy#3) would help with the :authority header matching, but:
domains and/or server_names matching
Beyond that, every Envoy binary I built segfaulted on startup. I thought Envoy's walk of its protobuf messages on startup caused this because of the protobuf field numbering discontinuity in my backport. Even after fixing that, Envoy still segfaulted with the same backtrace.
domains
Since that didn't work out, I started adding graphql.dev.devoted.com:443 to the domains list with this kludge in V2Listener.finalize().
Then, the request is successfully routed:
diff --git python/ambassador/envoy/v2/v2listener.py python/ambassador/envoy/v2/v2listener.py
index 5dbe693b1..1b5b1f28d 100644
--- python/ambassador/envoy/v2/v2listener.py
+++ python/ambassador/envoy/v2/v2listener.py
@@ -914,6 +914,9 @@ class V2Listener(dict):
]
}
+ if 'graphql' in vhost._hostname:
+     http_config["route_config"]["virtual_hosts"][0]["domains"].append(vhost._hostname + ":443")
+
filter_chain["filters"] = [
{
"name": "envoy.http_connection_manager",
server_names and domains get populated with V2VirtualHost._hostname. When SNI is enabled, I _think_ this value comes from the TLSContext's hosts.
domains via TLSContext
Trying a configuration-based approach, adding a host:port (graphql:443) item to the existing TLSContext.hosts (which had contained only graphql) generates the following Envoy config. It doesn't work, since server_names is matched against the SNI value of the incoming request, which will never have a port number. Otherwise, it seems this config hunk would work.
{
"filter_chain_match": {
"server_names": [
"graphql.staging.devoted.com:443"
],
"transport_protocol": "tls"
},
"filters": [
{
"name": "envoy.http_connection_manager",
"typed_config": {
"route_config": {
"virtual_hosts": [
{
"domains": [
"graphql.staging.devoted.com:443"
],
"name": "ambassador-listener-8443-graphql.staging.devoted.com:443",
{
"match": {
"case_sensitive": true,
"headers": [
{
"exact_match": "graphql.staging.devoted.com:443",
"name": ":authority"
},
{
"exact_match": "https",
"name": "x-forwarded-proto"
}
],
"prefix": "/",
"runtime_fraction": {
"default_value": {
"denominator": "HUNDRED",
"numerator": 0
},
"runtime_key": "routing.traffic_shift.cluster_graphql_orinoco_4debe8700_159301-0"
}
},
"route": {
"cluster": "cluster_graphql_orinoco_4debe8700_159301-0",
"prefix_rewrite": "/",
"priority": null,
"timeout": "1800.000s"
}
},
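One way to read the result above: the same TLSContext hosts list feeds both server_names (compared against SNI, which never has a port) and domains (compared against :authority, which may). A hypothetical sketch of deriving the two lists separately follows. split_hosts is an invented name and this is not how Ambassador's V2Listener is actually structured; it only illustrates the "strip port numbers for server_names" idea.

```python
from typing import List, Tuple


def split_hosts(hosts: List[str]) -> Tuple[List[str], List[str]]:
    """Derive (server_names, domains) from a list of configured hosts.

    server_names gets each host with any ":port" stripped, since SNI carries no port.
    domains keeps the hosts exactly as configured. Both lists are de-duplicated.
    """
    server_names: List[str] = []
    domains: List[str] = []
    for host in hosts:
        name, sep, port = host.rpartition(":")
        bare = name if sep and name and port.isdecimal() else host
        if bare not in server_names:
            server_names.append(bare)
        if host not in domains:
            domains.append(host)
    return server_names, domains


if __name__ == "__main__":
    names, domains = split_hosts(
        ["graphql.staging.devoted.com", "graphql.staging.devoted.com:443"]
    )
    print("server_names:", names)
    print("domains:", domains)
```

For the two hosts in the example, server_names ends up with only the bare hostname while domains keeps both forms, which is the shape the failing config would have needed.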
@jwm Thanks for the detailed info!
What I'm thinking of here is
Host and/or TLSContext
Is that something you could test quickly if we toss a build your way?
@kflynn totally. Thanks for looking into this!
@jwm I am also facing this issue. I am getting 404's when an external Prometheus stack is trying to scrape a /metrics endpoint that I have behind a mapping. I would also be happy to test.
I am facing a similar issue. I tried the regex workaround, but it didn't work and I am still getting 404s (it works from a browser and curl). Is there any update on this issue?
Upgrading Ambassador on our production cluster completely broke the monitoring of all our microservices due to this issue. Has there been any progress? It seems as if it hasn't been figured out, which means we will probably have to migrate to another solution, unfortunately.
Note, this has been added as a config on Envoy https://github.com/envoyproxy/envoy/pull/10960.
@kflynn the better approach would be to allow users to specify enabling this flag in the Host file
As is probably obvious, since my last comment here I got pulled in (many) different directions. 🙁 @matdehaast, many thanks! As of 1.7.0, we have that Envoy fix, so let me see if there's a quick way to enable that flag.
Hi, is there an update on this? We are currently facing an issue integrating a third-party solution because of this. I'm wondering if I should wait for this to get solved or work around it another way 😊
Hi,
In terms of a workaround the two I've seen done are either:
A) use a Lua script in the Module to strip the port number from the Authority or Host header
B) use host_regex on the mapping so that both the regular hostname and hostname:port get matched.
I think you can also do a literal match on hostname:port in a mapping as well, although that duplicates the mappings.
Hope that helps!
Cool, I tried your Lua suggestion and it seems to be working.
Here is the script for it if anyone needs it:
spec:
  config:
    lua_scripts: |
      function envoy_on_request(request_handle)
        if request_handle:headers():get("Host") == "someurl.com:443" then
          request_handle:headers():replace("Host", "someurl.com")
        end
      end
Will this, however, be supported by Ambassador so I don't need to have a Lua script run on each request? Sadly, I did not get host_regex to work the last time I tried.
Hm, did this behavior end up changing in 1.7.x? I upgraded today, and noticed that our local kludge wasn't necessary any more.
In fact, it was breaking the Envoy config, which wound up looking like:
"virtual_hosts": [
{
"domains": [
"graphql.staging.devoted.com",
"graphql.staging.devoted.com:443",
"graphql.staging.devoted.com:443",
"graphql.staging.devoted.com:443:443"
],
which caused:
[2020-10-01 22:25:55.563][105][critical][main] [source/server/config_validation/server.cc:60] error initializing configuration '/ambassador/snapshots/econf-tmp.json': Only unique values for domains are permitted. Duplicate entry of domain graphql.staging.devoted.com:443
Removing the kludge (i.e., going back to a stock Ambassador) yields:
"virtual_hosts": [
{
"domains": [
"graphql.staging.devoted.com",
"graphql.staging.devoted.com:443"
],
Oops, ignore that previous comment. There was a TLSContext that I didn't realize had graphql:443 configured on it.
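For anyone carrying a similar local patch: the duplicate and :443:443 entries above come from appending ":443" unconditionally to every hostname. A guarded version might look like the sketch below. add_port_domain is a hypothetical helper, not something in Ambassador, and the default port of 443 is an assumption that matches my setup.

```python
from typing import List


def add_port_domain(domains: List[str], hostname: str, port: int = 443) -> List[str]:
    """Return domains plus "hostname:port", without duplicates or doubled ports."""
    # dict.fromkeys drops duplicate entries while keeping their original order.
    result = list(dict.fromkeys(domains))
    name, sep, tail = hostname.rpartition(":")
    already_has_port = bool(sep) and bool(name) and tail.isdecimal()
    candidate = hostname if already_has_port else f"{hostname}:{port}"
    if candidate not in result:
        result.append(candidate)
    return result


if __name__ == "__main__":
    existing = ["graphql.staging.devoted.com", "graphql.staging.devoted.com:443"]
    print(add_port_domain(existing, "graphql.staging.devoted.com"))
    print(add_port_domain(existing, "graphql.staging.devoted.com:443"))
```

Both calls leave the list at the two unique entries, which avoids the "Only unique values for domains are permitted" validation error.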
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
not stale