Description
This bug has already been fixed in the past and it came back when updating the Helm Chart from 6.5.5 to 6.5.6 (or Ambassador 1.7.1 to 1.7.2).
Even if one follows the documentation for ClearText support, Ambassador categorically refuses to forward non HTTP requests and always forces redirection (HTTP 301).
---
apiVersion: getambassador.io/v2
kind: Module
metadata:
name: ambassador
namespace: ambassador
spec:
config:
enable_grpc_web: true
use_proxy_proto: false
use_remote_address: true
x_forwarded_proto_redirect: false
...
apiVersion: getambassador.io/v2
kind: Host
metadata:
name: ambassador
namespace: ambassador
spec:
hostname: "*.example.com"
acmeProvider:
authority: none
requestPolicy:
insecure:
action: Route
additionalPort: -1
...
Hey -- could we get the logs from an Ambassador log running into this problem? kubectl logs ... | grep -v ACCESS would be lovely.
(Also, if you're not already on our Slack, that might be a good place to talk about this too.
ACCESS [2020-09-18T16:06:34.287Z] "GET / HTTP/1.1" 301 - 0 0 0 - "-" "curl/7.68.0" "ccb560ea-c965-450e-9189-679431e3b5e1" "argo-cd.example.com:8080" "-"
ACCESS [2020-09-18T16:07:09.345Z] "GET / HTTP/1.1" 301 - 0 0 0 - "-" "curl/7.68.0" "8d6d8282-c631-47f1-b024-ff14bfe5d59d" "argo-cd.example.com:8080" "-"
Hi,
Based on your configuration, I'm a little confused on what you're trying to accomplish. Right now, in the Module it seems like you want all traffic that contains the x_forwarded_proto: http to get redirected to HTTPS (which is the current behavior you are seeing). The other thing is that the additionalPort value describes what port to apply the insecure.action on. By setting it to -1, you're effectively saying "apply Route to the port -1".
edit: I was wrong about the above: For future reference: additionalPort means to specify a dedicated port for insecure requests. By default this is usually 8080, but a -1 value explicity turns off the additionalPort behavior entirely.
I tried several variants of configuration for the additionalPort parameter:
The value '-1' is suggested in the Ambassador documentation to disable all redirects. Here is an excerpt from the documentation:
The additionalPort element tells Ambassador to listen on the specified insecure-port and treat any request arriving on that port as insecure. By default, additionalPort will be set to 8080 for any Host using TLS. To disable this redirection entirely, set additionalPort explicitly to -1:
I also tried this configuration for the module:
use_remote_address: true
x_forwarded_proto_redirect: false
It gives the same behavior. My configuration has been working for a long time. My ex-colleagues who also work with Ambassador saw the same problem with version 1.7.2.
@srheaume What are you trying to do, exactly? The thing that's confusing me is that x_forwarded_proto_redirect: true means "respond with a 301 to any request that doesn't include X-Forwarded-Proto: https", where an insecure action of Route means "route any insecure request", which explicitly includes those without X-Forwarded-Proto: https. So on the face of it, those two would seem to be at odds with each other, which may imply that we don't correctly understand how you've deployed Ambassador. Is there a load balancer involved? Who's terminating TLS?
Wrong copy/paste. Sorry for the ambiguity. My config is the following for the module:
---
apiVersion: getambassador.io/v2
kind: Module
metadata:
name: ambassador
namespace: ambassador
spec:
config:
enable_grpc_web: true
use_proxy_proto: false
use_remote_address: true
x_forwarded_proto_redirect: false
...
Communications are in HTTP from end to end.
I believe the issue is that the .insecure.action isn't being applied to hostnames with * in them.
@LukeShu I tried a hostname without a wildcard and unfortunately it gives the same result.
@srheaume Can you verify whether docker.io/datawiredev/aes:d0d6227c3a9b fixes the issue? It does in all of our testing, but let's verify that we're testing the same thing that you're hitting.
still have ERR_SSL_PROTOCOL_ERROR
@kflynn Do you think this could be the same thing as the redirect_cleartext_from regression (#2957)?
@srheaume Wait. ERR_SSL_PROTOCOL_ERROR implies that you're trying to speak HTTPS rather than HTTP. Could you please describe exactly how your Ambassador is set up, including the load balancer in front of it? I apologize for rewinding here, but if you're getting an SSL protocol error, we pretty clearly have some inconsistencies in what you're trying to say and what we're understanding.
Here is my config to reproduce the issue. If you need a conference call, just let me know.
apiVersion: getambassador.io/v2
kind: Module
name: ambassador
namespace: prd-ambassador
spec:
ambassador_id: ambassador-prd
config:
enable_grpc_web: true
use_proxy_proto: false
use_remote_address: true
x_forwarded_proto_redirect: false
apiVersion: getambassador.io/v2
kind: Host
metadata:
name: prd-ambassador
namespace: prd-ambassador
spec:
acmeProvider:
authority: none
ambassador_id: ambassador-prd
hostname: '*.prd.infra.eng.acme.internal'
requestPolicy:
insecure:
action: Route
apiVersion: argoproj.io/v1alpha1
kind: Application
finalizers:
- resources-finalizer.argocd.argoproj.io
name: prd-ambassador
namespace: prd-argo-cd
spec:
destination:
namespace: prd-ambassador
server: https://kubernetes.default.svc
project: infra
source:
chart: ambassador
helm:
parameters:
- name: fullnameOverride
value: prd-ambassador
- name: env.AMBASSADOR_ID
value: ambassador-prd
- name: image.repository
value: docker.io/datawiredev/aes
- name: image.tag
value: d0d6227c3a9b
- name: adminService.create
value: "true"
- name: adminService.type
value: NodePort
- name: adminService.port
value: "8877"
- name: adminService.nodePort
- name: service.type
value: ClusterIP
- name: service.ports[0].name
value: http
- name: service.ports[0].port
value: "8080"
- name: service.ports[0].targetPort
value: "8080"
- name: service.ports[1].name
value: https
- name: service.ports[1].port
value: "8443"
- name: service.ports[1].targetPort
value: "8443"
- name: service.externalTrafficPolicy
- name: enableAES
value: "false"
values: |
nodeSelector:
{}
tolerations:
[]
resources:
{}
repoURL: https://getambassador.io
targetRevision: 6.5.5
syncPolicy:
automated:
prune: true
selfHeal: true
apiVersion: getambassador.io/v2
kind: Mapping
metadata:
name: prd-argo-cd
namespace: prd-argo-cd
spec:
ambassador_id: ambassador-prd
bypass_auth: true
connect_timeout_ms: 5000
host: argo-cd.prd.infra.eng.acme.internal:8080
idle_timeout_ms: 500000
prefix: /
service: prd-argo-cd-server.prd-argo-cd:80
timeout_ms: 5000
127.0.0.1 argo-cd.prd.infra.eng.acme.internal
kubectl port-forward -n prd-ambassador svc/prd-ambassador 8080:8080
Login by visiting http://argo-cd.prd.infra.eng.acme.internal:8080
in a browser and use the following credentials:
More info. The problem does not happen with:
- name: image.repository
value: quay.io/datawire/ambassador
- name: image.tag
value: 1.7.1
It might be related to redirects. I forgot to mention that ERR_SSL_PROTOCOL_ERROR is new. I didn't have this return code before. It went from HTTP 301 to ERR_SSL_PROTOCOL_ERROR. The Argo-CD redirect URL is set to http://argo-cd.prd.infra.eng.acme.internal:8080
Personally, this is not a scenario I would use in production. I use HTTP only during development. I can do without it. However, it would be nice if the product supports it properly.
Re: @kflynn's question: It sounds like it's redirecting-to-https, then after the client follows the redirect it's getting the error.
OK, I no longer think this is the same underlying cause as #2957.
There are at least 2 things going on here:
x_forwarded_proto_redirect should be doing is changing the default insecure.action from Redirect to Route; it shouldn't be necessary to both x_forwarded_proto_redirect=false and insecure.action=Route. (note: this is happening in irlistener.py, and I'm real confused if that's the right time/place for it)Host resources with a * in the domain (except for the special case of just "*")x_forwarded_proto_redirect not working right, and possibly interfering?So, @srheaume I think you should be able to work around the issue by using either the dev build I posted above or 1.7.3 (just released a few moments ago), and _only_ using the Hosts and dropping x_forwarded_proto_redirect?
But I'm still going to spend more time digging in to this and trying to reproduce.
@LukeShu it didn't work for me. Go back to 1.7.1.
This doesn't work anymore for me for root URLs:
https://www.getambassador.io/docs/latest/topics/running/ambassador-with-aws/#l7-load-balancer-elb-or-alb
I had to revert to previous Ambassador version to make it work again.
We also had to rollback a production release a couple of minutes ago. We're using an ALB and doing TLS termination at LB level. Ambassador is exposing HTTP only and all requests routed through the API responded with 301.
The upgrade was from version 1.7.1 to 1.7.3
@bmariesan, we'd be interested in seeing your configuration, if possible?
@jorgebirck, I think you're seeing #2996, for which we're also already working on a fix. I'm separating these two issues because the cause of #2996 is well-understood, and this one - at least originally - was more about wildcard hosts.
That being said, though -- @srheaume, are you in fact seeing trouble with 1.7.3 and Mappings that _aren't_ on /? or is it just / with 1.7.3?
@kflynn I cannot share the LB configuration but I've tested with a NLB and there we cannot reproduce the issue
@bmariesan I was thinking more of the Ambassador configuration -- the Hosts and/or Mappings involved.
@kflynn
Host:
apiVersion: getambassador.io/v2
kind: Host
metadata:
name: ambassador-host-definition
spec:
hostname: our-api-dns
acmeProvider:
authority: none
requestPolicy:
insecure:
action: Route
Mapping:
apiVersion: getambassador.io/v2
kind: Mapping
metadata:
name: api-mapping
spec:
prefix: /
service: our-api.dns
timeout_ms: 60000
connect_timeout_ms: 60000
labels:
ambassador:
- request_label_header:
- path:
header: ":path"
omit_if_not_present: false
- request_label_header:
- api_key:
header: "Authorization"
omit_if_not_present: true
mappings with '/'
@bmariesan Thanks! Any chance I could get your system's Ambassador snapshot? If you kubectl exec onto an Ambassador pod and run grab-snapshots, it'll produce a file called sanitized.tgz in your current directory. I would strongly encourage you to look at the contents first, but if that turns out to be something you can send me, that'd be great.
Same for you, too, actually, @srheaume?
@bmariesan @srheaume @jorgebirck I would also welcome having you folks on [our Slack]{https://d6e.co/slack) as that might be a simpler way to kick ideas around. 馃檪
@kflynn a bit late on my side since I've did a rollback of all envs back to 1.7.1
@bmariesan Ah, darn, was hoping you had a dev environment somewhere. OK. Your ALB is presumably sending along X-Forwarded-Proto? Do you have xff_num_trusted_hops configured in the ambassador Module?
@kflynn
Nope, just the default for that (according to the docs that xff_num_trusted_hops: 0). And regarding the X-Forwarded-Proto it is indeed sent along by the ALB. Perhaps if I get some time tomorrow I'll upgrade an environment back to 1.7.3 and try to get the diagnostics for you.
apiVersion: getambassador.io/v2
kind: Module
metadata:
name: ambassador
spec:
config:
envoy_log_type: json
envoy_log_path: /dev/stdout
diagnostics:
enabled: {{ .Values.enableDiagnostics }}
@srheaume Can you verify that docker.io/datawiredev/aes:4f77966b7ee9 fixes the issue for you? It's an unsatisfying fix for me, because it is rolling back to the v1.6.2 version of the code because the 1.7.x changes have been so problematic.
One of the several things going on here is the port number (cc @rhs).
When you hit http://argo-cd.prd.infra.eng.acme.internal:8080/, the redirect it sends back is location: https://argo-cd.prd.infra.eng.acme.internal:8080/--it doesn't strip the port number (this is the same kind of thing we see with edgectl login for local dev).
A second of the several things going on here is also the port number (cc @rhs).
It's setting insecure.action=Route for :authority==argo-cd.prd.infra.eng.acme.internal, but the request has :authority=argo-cd.prd.infra.eng.acme.internal:8080, so that doesn't match, it falls back to the default insecure.action=Redirect.
So now my question is if the old behavior is vhosts accepting any port number, or if it's just the single-insecure-action thing.
Answer to that question: It doesn't matter, because forced-star; so there's only 1 vhost.
This has been fixed in 1.7.4 by reverting the changes for https://github.com/datawire/ambassador/issues/2888 that shipped in 1.7.0.