Ambassador: Regression: Ambassador does not route insecure requests

Created on 18 Sep 2020 · 37Comments · Source: datawire/ambassador

Description

This bug has already been fixed in the past and it came back when updating the Helm Chart from 6.5.5 to 6.5.6 (or Ambassador 1.7.1 to 1.7.2).

Even if one follows the documentation for ClearText support, Ambassador categorically refuses to forward non HTTP requests and always forces redirection (HTTP 301).

---
apiVersion: getambassador.io/v2
kind: Module
metadata:
  name: ambassador
  namespace: ambassador
spec:
  config:
    enable_grpc_web: true
    use_proxy_proto: false
    use_remote_address: true
    x_forwarded_proto_redirect: false
...
apiVersion: getambassador.io/v2
kind: Host
metadata:
  name: ambassador
  namespace: ambassador
spec:
  hostname: "*.example.com"
  acmeProvider:
    authority: none
  requestPolicy:
    insecure:
      action: Route
      additionalPort: -1
...

bug

Source

srheaume

👍1

All 37 comments

Hey -- could we get the logs from an Ambassador log running into this problem? kubectl logs ... | grep -v ACCESS would be lovely.

kflynn on 18 Sep 2020

(Also, if you're not already on our Slack, that might be a good place to talk about this too.

kflynn on 18 Sep 2020

ACCESS [2020-09-18T16:06:34.287Z] "GET / HTTP/1.1" 301 - 0 0 0 - "-" "curl/7.68.0" "ccb560ea-c965-450e-9189-679431e3b5e1" "argo-cd.example.com:8080" "-"
ACCESS [2020-09-18T16:07:09.345Z] "GET / HTTP/1.1" 301 - 0 0 0 - "-" "curl/7.68.0" "8d6d8282-c631-47f1-b024-ff14bfe5d59d" "argo-cd.example.com:8080" "-"

srheaume on 18 Sep 2020

Hi,

Based on your configuration, I'm a little confused on what you're trying to accomplish. Right now, in the Module it seems like you want all traffic that contains the x_forwarded_proto: http to get redirected to HTTPS (which is the current behavior you are seeing). ~~The other thing is that the additionalPort value describes what port to apply the insecure.action on. By setting it to -1, you're effectively saying "apply Route to the port -1".~~

edit: I was wrong about the above: For future reference: additionalPort means to specify a dedicated port for insecure requests. By default this is usually 8080, but a -1 value explicity turns off the additionalPort behavior entirely.

cakuros on 18 Sep 2020

I tried several variants of configuration for the additionalPort parameter:

no additionalPort specified
additionalPort=8080
additionalPort=-1

The value '-1' is suggested in the Ambassador documentation to disable all redirects. Here is an excerpt from the documentation:

The additionalPort element tells Ambassador to listen on the specified insecure-port and treat any request arriving on that port as insecure. By default, additionalPort will be set to 8080 for any Host using TLS. To disable this redirection entirely, set additionalPort explicitly to -1:

I also tried this configuration for the module:

use_remote_address: true
x_forwarded_proto_redirect: false

It gives the same behavior. My configuration has been working for a long time. My ex-colleagues who also work with Ambassador saw the same problem with version 1.7.2.

srheaume on 18 Sep 2020

@srheaume What are you trying to do, exactly? The thing that's confusing me is that x_forwarded_proto_redirect: true means "respond with a 301 to any request that doesn't include X-Forwarded-Proto: https", where an insecure action of Route means "route any insecure request", which explicitly includes those without X-Forwarded-Proto: https. So on the face of it, those two would seem to be at odds with each other, which may imply that we don't correctly understand how you've deployed Ambassador. Is there a load balancer involved? Who's terminating TLS?

kflynn on 18 Sep 2020

Wrong copy/paste. Sorry for the ambiguity. My config is the following for the module:

---
apiVersion: getambassador.io/v2
kind: Module
metadata:
  name: ambassador
  namespace: ambassador
spec:
  config:
    enable_grpc_web: true
    use_proxy_proto: false
    use_remote_address: true
    x_forwarded_proto_redirect: false
...

Communications are in HTTP from end to end.

srheaume on 18 Sep 2020

I believe the issue is that the .insecure.action isn't being applied to hostnames with * in them.

LukeShu on 18 Sep 2020

👀1

@LukeShu I tried a hostname without a wildcard and unfortunately it gives the same result.

srheaume on 18 Sep 2020

@srheaume Can you verify whether docker.io/datawiredev/aes:d0d6227c3a9b fixes the issue? It does in all of our testing, but let's verify that we're testing the same thing that you're hitting.

LukeShu on 21 Sep 2020

still have ERR_SSL_PROTOCOL_ERROR

srheaume on 29 Sep 2020

@kflynn Do you think this could be the same thing as the redirect_cleartext_from regression (#2957)?

LukeShu on 29 Sep 2020

@srheaume Wait. ERR_SSL_PROTOCOL_ERROR implies that you're trying to speak HTTPS rather than HTTP. Could you please describe exactly how your Ambassador is set up, including the load balancer in front of it? I apologize for rewinding here, but if you're getting an SSL protocol error, we pretty clearly have some inconsistencies in what you're trying to say and what we're understanding.

kflynn on 29 Sep 2020

👍1

Here is my config to reproduce the issue. If you need a conference call, just let me know.

K8S Configuration

Ambassador Module

apiVersion: getambassador.io/v2
kind: Module
  name: ambassador
  namespace: prd-ambassador
spec:
  ambassador_id: ambassador-prd
  config:
    enable_grpc_web: true
    use_proxy_proto: false
    use_remote_address: true
    x_forwarded_proto_redirect: false

Ambassador Host

apiVersion: getambassador.io/v2
kind: Host
metadata:
  name: prd-ambassador
  namespace: prd-ambassador
spec:
  acmeProvider:
    authority: none
  ambassador_id: ambassador-prd
  hostname: '*.prd.infra.eng.acme.internal'
  requestPolicy:
    insecure:
      action: Route

Ambassador Deployment (with Argo-CD)

apiVersion: argoproj.io/v1alpha1
kind: Application
  finalizers:
  - resources-finalizer.argocd.argoproj.io
  name: prd-ambassador
  namespace: prd-argo-cd
spec:
  destination:
    namespace: prd-ambassador
    server: https://kubernetes.default.svc
  project: infra
  source:
    chart: ambassador
    helm:
      parameters:
      - name: fullnameOverride
        value: prd-ambassador
      - name: env.AMBASSADOR_ID
        value: ambassador-prd
      - name: image.repository
        value: docker.io/datawiredev/aes
      - name: image.tag
        value: d0d6227c3a9b
      - name: adminService.create
        value: "true"
      - name: adminService.type
        value: NodePort
      - name: adminService.port
        value: "8877"
      - name: adminService.nodePort
      - name: service.type
        value: ClusterIP
      - name: service.ports[0].name
        value: http
      - name: service.ports[0].port
        value: "8080"
      - name: service.ports[0].targetPort
        value: "8080"
      - name: service.ports[1].name
        value: https
      - name: service.ports[1].port
        value: "8443"
      - name: service.ports[1].targetPort
        value: "8443"
      - name: service.externalTrafficPolicy
      - name: enableAES
        value: "false"
      values: |
        nodeSelector:
          {}
        tolerations:
          []
        resources:
          {}
    repoURL: https://getambassador.io
    targetRevision: 6.5.5
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Argo-CD UI Mapping

apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  name: prd-argo-cd
  namespace: prd-argo-cd
spec:
  ambassador_id: ambassador-prd
  bypass_auth: true
  connect_timeout_ms: 5000
  host: argo-cd.prd.infra.eng.acme.internal:8080
  idle_timeout_ms: 500000
  prefix: /
  service: prd-argo-cd-server.prd-argo-cd:80
  timeout_ms: 5000

Add the following to /etc/hosts

127.0.0.1 argo-cd.prd.infra.eng.acme.internal

Start port-forwarding for Ambassador

kubectl port-forward -n prd-ambassador svc/prd-ambassador 8080:8080

Open a browser to the Argo CD external UI

username: admin
password: admin

srheaume on 29 Sep 2020

More info. The problem does not happen with:

- name: image.repository
  value: quay.io/datawire/ambassador
- name: image.tag
  value: 1.7.1

srheaume on 29 Sep 2020

It might be related to redirects. I forgot to mention that ERR_SSL_PROTOCOL_ERROR is new. I didn't have this return code before. It went from HTTP 301 to ERR_SSL_PROTOCOL_ERROR. The Argo-CD redirect URL is set to http://argo-cd.prd.infra.eng.acme.internal:8080

srheaume on 29 Sep 2020

Personally, this is not a scenario I would use in production. I use HTTP only during development. I can do without it. However, it would be nice if the product supports it properly.

srheaume on 29 Sep 2020

Re: @kflynn's question: It sounds like it's redirecting-to-https, then after the client follows the redirect it's getting the error.

OK, I no longer think this is the same underlying cause as #2957.

There are at least 2 things going on here:

background: What x_forwarded_proto_redirect should be doing is changing the default insecure.action from Redirect to Route; it shouldn't be necessary to both x_forwarded_proto_redirect=false and insecure.action=Route. (note: this is happening in irlistener.py, and I'm real confused if that's the right time/place for it)
1: there was a regression in 1.7.2 where it no longer properly handled the insecure.action for Host resources with a * in the domain (except for the special case of just "*")
2: There's also clearly something hokey going on with x_forwarded_proto_redirect not working right, and possibly interfering?

So, @srheaume I think you should be able to work around the issue by using either the dev build I posted above or 1.7.3 (just released a few moments ago), and _only_ using the Hosts and dropping x_forwarded_proto_redirect?

But I'm still going to spend more time digging in to this and trying to reproduce.

LukeShu on 29 Sep 2020

@LukeShu it didn't work for me. Go back to 1.7.1.

srheaume on 29 Sep 2020

This doesn't work anymore for me for root URLs:

https://www.getambassador.io/docs/latest/topics/running/ambassador-with-aws/#l7-load-balancer-elb-or-alb

I had to revert to previous Ambassador version to make it work again.

jorgebirck on 30 Sep 2020

We also had to rollback a production release a couple of minutes ago. We're using an ALB and doing TLS termination at LB level. Ambassador is exposing HTTP only and all requests routed through the API responded with 301.

The upgrade was from version 1.7.1 to 1.7.3

bmariesan on 1 Oct 2020

@bmariesan, we'd be interested in seeing your configuration, if possible?

@jorgebirck, I think you're seeing #2996, for which we're also already working on a fix. I'm separating these two issues because the cause of #2996 is well-understood, and this one - at least originally - was more about wildcard hosts.

That being said, though -- @srheaume, are you in fact seeing trouble with 1.7.3 and Mappings that _aren't_ on /? or is it just / with 1.7.3?

kflynn on 1 Oct 2020

@kflynn I cannot share the LB configuration but I've tested with a NLB and there we cannot reproduce the issue

bmariesan on 1 Oct 2020

@bmariesan I was thinking more of the Ambassador configuration -- the Hosts and/or Mappings involved.

kflynn on 1 Oct 2020

@kflynn
Host:

apiVersion: getambassador.io/v2
kind: Host
metadata:
  name: ambassador-host-definition
spec:
  hostname: our-api-dns
  acmeProvider:
    authority: none
  requestPolicy:
    insecure:
      action: Route

Mapping:

apiVersion: getambassador.io/v2
kind: Mapping
metadata:
  name: api-mapping
spec:
  prefix: /
  service: our-api.dns
  timeout_ms: 60000
  connect_timeout_ms: 60000
  labels:
    ambassador:
      - request_label_header:
        - path:
            header: ":path"
            omit_if_not_present: false
      - request_label_header:
        - api_key:
            header: "Authorization"
            omit_if_not_present: true

bmariesan on 1 Oct 2020

mappings with '/'

srheaume on 1 Oct 2020

@bmariesan Thanks! Any chance I could get your system's Ambassador snapshot? If you kubectl exec onto an Ambassador pod and run grab-snapshots, it'll produce a file called sanitized.tgz in your current directory. I would strongly encourage you to look at the contents first, but if that turns out to be something you can send me, that'd be great.

Same for you, too, actually, @srheaume?

kflynn on 1 Oct 2020

@bmariesan @srheaume @jorgebirck I would also welcome having you folks on [our Slack]{https://d6e.co/slack) as that might be a simpler way to kick ideas around. 🙂

kflynn on 1 Oct 2020

@kflynn a bit late on my side since I've did a rollback of all envs back to 1.7.1

bmariesan on 1 Oct 2020

@bmariesan Ah, darn, was hoping you had a dev environment somewhere. OK. Your ALB is presumably sending along X-Forwarded-Proto? Do you have xff_num_trusted_hops configured in the ambassador Module?

kflynn on 1 Oct 2020

@kflynn
Nope, just the default for that (according to the docs that xff_num_trusted_hops: 0). And regarding the X-Forwarded-Proto it is indeed sent along by the ALB. Perhaps if I get some time tomorrow I'll upgrade an environment back to 1.7.3 and try to get the diagnostics for you.

apiVersion: getambassador.io/v2
kind:  Module
metadata:
  name:  ambassador
spec:
  config:
    envoy_log_type: json
    envoy_log_path: /dev/stdout
    diagnostics:
      enabled: {{ .Values.enableDiagnostics }}

bmariesan on 1 Oct 2020

@srheaume Can you verify that docker.io/datawiredev/aes:4f77966b7ee9 fixes the issue for you? It's an unsatisfying fix for me, because it is rolling back to the v1.6.2 version of the code because the 1.7.x changes have been so problematic.

LukeShu on 2 Oct 2020

One of the several things going on here is the port number (cc @rhs).

When you hit http://argo-cd.prd.infra.eng.acme.internal:8080/, the redirect it sends back is location: https://argo-cd.prd.infra.eng.acme.internal:8080/--it doesn't strip the port number (this is the same kind of thing we see with edgectl login for local dev).

LukeShu on 6 Oct 2020

A second of the several things going on here is also the port number (cc @rhs).

It's setting insecure.action=Route for :authority==argo-cd.prd.infra.eng.acme.internal, but the request has :authority=argo-cd.prd.infra.eng.acme.internal:8080, so that doesn't match, it falls back to the default insecure.action=Redirect.

LukeShu on 6 Oct 2020

So now my question is if the old behavior is vhosts accepting any port number, or if it's just the single-insecure-action thing.

LukeShu on 6 Oct 2020

Answer to that question: It doesn't matter, because forced-star; so there's only 1 vhost.

LukeShu on 6 Oct 2020

This has been fixed in 1.7.4 by reverting the changes for https://github.com/datawire/ambassador/issues/2888 that shipped in 1.7.0.