Ingress-nginx: Nginx dropping Connect/Upgrade headers for WebSocket handshake

Created on 9 Feb 2019  ·  35 comments  ·  Source: kubernetes/ingress-nginx

NGINX Ingress controller version: 0.22.0 and 0.17.1

Kubernetes version (use kubectl version): 1.9.6

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Don't know
  • Kernel (e.g. uname -a): 4.4.121-k8s
  • Install tools: kops
  • Others:

What happened:

When connecting to a backend via ingress-nginx and a frontend ELB, the Connection: upgrade and Upgrade: websocket headers are being dropped from the request. This causes my backend to reject the request with a 426 Upgrade Required response, though that response is specific to the app server (Cowboy in a Phoenix/Elixir backend).

What you expected to happen:

Those headers should be passed through. Looking at the generated nginx.conf, I have these lines in the server block for the Ingress:

            # Allow websocket connections
            proxy_set_header                        Upgrade           $http_upgrade;
            proxy_set_header                        Connection        $connection_upgrade;

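For context, $connection_upgrade comes from a map in the http block of the generated config (this is the standard ingress-nginx template, quoted from memory, so double-check it against your own nginx.conf):

    map $http_upgrade $connection_upgrade {
        default upgrade;
        ''      close;
    }

So when the client sends an Upgrade header, the proxied Connection header becomes "upgrade"; otherwise it becomes "close".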
That should be passing them along. I have also tried overriding them via the Ingress:

    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection $connection_upgrade;

No such luck there. I've also tried forcing their values, but it doesn't work:

    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header Upgrade "websocket";
      proxy_set_header Connection "upgrade";

How to reproduce it (as minimally and precisely as possible):

I'm able to reproduce this issue with the echoserver container:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: websocket-test
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.ingress.kubernetes.io/send-timeout: "3600"
spec:
  rules:
  - host: websocket-test.domain.com
    http:
      paths:
      - path: /
        backend:
          serviceName: websocket-test
          servicePort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: websocket-test
spec:
  ports:
  - name: websocket-test
    port: 80
    targetPort: 8080
    protocol: TCP
  selector:
    app: websocket-test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: websocket-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: websocket-test
  template:
    metadata:
      labels:
        app: websocket-test
    spec:
      containers:
      - name: websocket-test
        image: k8s.gcr.io/echoserver:1.4
        ports:
        - containerPort: 8080

Making a dummy WebSocket request to the endpoint produces these results:

curl -v 'http://websocket-test.domain.com/' -H 'Upgrade: websocket' -H 'Connection: Upgrade'
*   Trying 1.2.3.4...
* TCP_NODELAY set
* Connected to websocket-test.domain.com (1.2.3.4) port 80 (#0)
> GET / HTTP/1.1
> Host: websocket-test.domain.com
> User-Agent: curl/7.54.0
> Accept: */*
> Upgrade: websocket
> Connection: Upgrade
>
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Date: Sat, 09 Feb 2019 20:58:11 GMT
< Server: nginx/1.15.8
< Vary: Accept-Encoding
< transfer-encoding: chunked
< Connection: keep-alive
<
CLIENT VALUES:
client_address=100.100.0.16
command=GET
real path=/
query=nil
request_version=1.1
request_uri=http://websocket-test.domain.com:8080/

SERVER VALUES:
server_version=nginx: 1.10.0 - lua: 10001

HEADERS RECEIVED:
accept=*/*
host=websocket-test.domain.com
user-agent=curl/7.54.0
x-forwarded-for=1.2.3.4
x-forwarded-host=websocket-test.domain.com
x-forwarded-port=80
x-forwarded-proto=http
x-original-forwarded-for=1.2.3.4
x-original-uri=/
x-real-ip=1.2.3.4
x-request-id=a788919db6caf0294be42fdfea14ca27
x-scheme=http

Now, for the fun part: It works just fine from within the cluster!

Here's a request from within another pod:

curl 'ingress-nginx.ingress-nginx.svc.cluster.local' -H 'Upgrade: websocket' -H 'Connection: Upgrade'  -H 'Host: websocket-test.domain.com' -v
* Rebuilt URL to: ingress-nginx.ingress-nginx.svc.cluster.local/
*   Trying 100.70.191.39...
* TCP_NODELAY set
* Connected to ingress-nginx.ingress-nginx.svc.cluster.local (100.70.191.39) port 80 (#0)
> GET / HTTP/1.1
> Host: websocket-test.domain.com
> User-Agent: curl/7.52.1
> Accept: */*
> Upgrade: websocket
> Connection: Upgrade
>
< HTTP/1.1 200 OK
< Server: nginx/1.15.8
< Date: Sat, 09 Feb 2019 20:58:07 GMT
< Content-Type: text/plain
< Transfer-Encoding: chunked
< Connection: keep-alive
< Vary: Accept-Encoding
<
CLIENT VALUES:
client_address=100.100.0.16
command=GET
real path=/
query=nil
request_version=1.1
request_uri=http://websocket-test.domain.com:8080/

SERVER VALUES:
server_version=nginx: 1.10.0 - lua: 10001

HEADERS RECEIVED:
accept=*/*
connection=upgrade
host=websocket-test.domain.com
upgrade=websocket
user-agent=curl/7.52.1
x-forwarded-for=100.126.0.10
x-forwarded-host=websocket-test.domain.com
x-forwarded-port=80
x-forwarded-proto=http
x-original-uri=/
x-real-ip=100.126.0.10
x-request-id=9dcc6cc94455ec7e04fcf89cd488714b
x-scheme=http

Even more odd is that I can get them to go through for the first request after the config is reloaded in nginx. Whatever is filtering it out is only doing so after a first pass of the request chain.

My best guess is something to do with OpenResty. I have yet to do more testing with manual tweaking of the nginx config. I've tried logging the request headers from Lua, but they appear to be stripped by that point (not surprising):

    nginx.ingress.kubernetes.io/configuration-snippet: |
      access_by_lua_block {
        local h = ngx.req.get_headers()
        for k, v in pairs(h) do
          -- repeated headers come back as a table, so flatten them first
          if type(v) == "table" then
            v = table.concat(v, ", ")
          end
          ngx.log(ngx.ERR, "Got header "..k..": "..v..";")
        end
      }

Any thoughts on what we're doing wrong?

Most helpful comment

@jsdevtom This isn't about generalized WebSocket support. That's definitely supported. This is about it breaking when using an SSL-terminating AWS ELB in front of the ingress controller.

All 35 comments

FWIW, I've also set up a LoadBalancer Service fronted by an ELB externally. I am getting the headers when I take ingress-nginx out of the chain:

curl -v 'https://websocket-test-lb.domain.com/' -H 'Upgrade: websocket' -H 'Connection: Upgrade'
* Trying 1.2.3.4...
* TCP_NODELAY set
* Connected to websocket-test-lb.domain.com (1.2.3.4) port 443 (#0)
* SSL certificate verify ok.
> GET / HTTP/1.1
> Host: websocket-test-lb.domain.com
> User-Agent: curl/7.54.0
> Accept: */*
> Upgrade: websocket
> Connection: Upgrade
>
< HTTP/1.1 200 OK
< Server: nginx/1.10.0
< Date: Sat, 09 Feb 2019 21:12:11 GMT
< Content-Type: text/plain
< Transfer-Encoding: chunked
< Connection: keep-alive
<
CLIENT VALUES:
client_address=100.112.0.0
command=GET
real path=/
query=nil
request_version=1.1
request_uri=http://websocket-test-lb.domain.com:8080/

SERVER VALUES:
server_version=nginx: 1.10.0 - lua: 10001

HEADERS RECEIVED:
accept=*/*
connection=Upgrade
host=websocket-test-lb.domain.com
upgrade=websocket
user-agent=curl/7.54.0

@timdorr They only support websockets over http, or over https/SSL if you terminate SSL on the nginx side; it doesn't work if SSL is terminated on the ELB side - https://github.com/kubernetes/ingress-nginx/issues/1822

I just ran into the same issue today. However, it worked for one ingress but didn't for the other one.

Comparing the ingress service load balancer, I needed to change the ELB to use TCP instead of HTTP with the following annotation:

service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"

It works, only on HTTP, doing HTTPS and terminating SSL on ELB won't work.

For me, it works with the ELB terminating the ssl, both for http, https, ws and wss using this:

kind: Service
apiVersion: v1
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    app: ingress-nginx
  annotations:
    # replace with the correct value of the generated certificate in the AWS console
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "<redacted>" 
    # the backend instances are HTTP/HTTPS/TCP so let Nginx do that
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
    # Map port 443
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
    # Increase the ELB idle timeout to avoid issues with WebSockets or Server-Sent Events.
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: '3600'
spec:
  externalTrafficPolicy: Cluster
  type: LoadBalancer
  selector:
    app: ingress-nginx
  ports:
  - name: http
    port: 80
    targetPort: http
  - name: https
    port: 443
    targetPort: http

just some annotations are broken for forcing SSL, see #2000

@m1schka interesting! What does your ingress for the websocket service look like, and what version of the nginx ingress controller are you using?

nothing special, no additional annotations needed, regular ingress. one thing I forgot is the following annotation

service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"

(I added that to the previous answer)

and then in the ingress configmap:

kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
  labels:
    app: ingress-nginx
data:
  use-proxy-protocol: "true"
  real-ip-header: "proxy_protocol"
  set-real-ip-from: "0.0.0.0/0"
  proxy-read-timeout: "3600"
  proxy-send-timeout: "3600"
  use-forwarded-headers: "true"
  force-ssl-redirect: "false"

then the ingress also shows the real ip.

I've used that config etc with many versions of the nginx-ingress, so no special version needed

@midN @m1schka The above works but only on layer4 with proxy-protocol enabled and this annotation service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp".

When the SSL cert terminates on the ELB, nginx-ingress returns < HTTP/1.1 400 Bad Request on behalf of the backend websocket service and does not upgrade the connection. This is a problem related to where the SSL certs exist for securing the websocket connection.

A simple solution to get it working with nginx on layer7 (service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http") might be to install cert-manager on the AWS cluster, provision a wildcard cert, and provide a default certificate to nginx-ingress using these values:

controller:
  extraArgs:
    default-ssl-certificate: "kube-system/wildcard-certificate"

This is still a kind of hacky solution but I'm 99% certain it will work for layer7 at that point since it is the default SSL http listener on nginx that closes the connection before reaching the backend socket. Nginx Layer 4 (aws-load-balancer-backend-protocol: "tcp") works because the raw TCP listener does not check for an SSL cert for the connection. :)

For others looking to solve their WebSockets/Ingress issues, I created a checklist here: https://gist.github.com/jsdevtom/7045c03c021ce46b08cb3f41db0d76da#file-ingress-service-yaml

@jsdevtom This isn't about generalized WebSocket support. That's definitely supported. This is about it breaking when using an SSL-terminating AWS ELB in front of the ingress controller.

When the SSL cert terminates on the ELB, nginx-ingress returns < HTTP/1.1 400 Bad Request on behalf of the backend websocket service and does not upgrade the connection. This is a problem related to where the SSL certs exist for securing the websocket connection.

This is about it breaking when using an SSL-terminating AWS ELB in front of the ingress controller.

It doesn't do that for me. As I said, I got the SSL termination working (https/wss on port 443) with the configuration I posted earlier. There is no 400 Bad Request. What exactly is the problem with doing tcp instead of http on the load balancer? I don't see any issues and I never had any problems

@m1schka , your configuration is working. There is nothing wrong with it and I am using the TCP listener on nginx-ingress and my cert is on the AWS ELB as well.

I posted that comment to make sure people understand why the SSL termination on AWS is different and it breaks websockets when SSL is terminated on the AWS ELB instead of nginx-ingress and to give them my understanding from solving that problem. If you try switching to layer7 http(s) listener on ingress-nginx your websocket service won't work either.

SSL Termination happens on ELB side:

  1. HTTPS/HTTP - Nginx knows it was terminated on ELB by looking at few passed in headers and just proxies the connection
  2. Websocket - SSL was terminated on ELB side, Nginx tries to proxy non-ssl ws to your backend, Chrome blocks the req with "non-secure ws over https, stop or i'll spoil endgame"

@midN This isn't the browser blocking anything. ingress-nginx is dropping those headers before they get to my backend. It only does this when the ELB terminates SSL and is of the HTTP protocol (which is ideal for me). My best guess is something with the OpenResty WAF module.

@timdorr in my case nginx knows the origin IP, because the ELB does the proxy protocol on a TCP basis somehow. So there is no real drawback to using tcp instead of http, except that force-ssl is not working for the websocket. I created an issue for that (#2000) but it never got solved or much attention.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@m1schka thanks for posting a working solution. Could you update your response https://github.com/kubernetes/ingress-nginx/issues/3746#issuecomment-474738867 to mention configMap update required to set use-proxy-protocol to true. Without that, nginx ingress returns 400 Bad Request. Alternately, users can remove aws-load-balancer-proxy-protocol from annotations.
When I tried your fix, I thought aws-load-balancer-proxy-protocol might be harmless and when it failed, I was not sure which part of the fix caused it :).

In my case, changing aws-load-balancer-backend-protocol to tcp was all that was required. I do realize that my services don't see the client-ip.

I think the above comment was addressed to @m1schka

Sorry about that. Updated.

Also, without the 'tcp' fix, the issue with websockets still persisted even with a direct 'LoadBalancer' service bypassing nginx-ingress. The echoserver test worked exactly as in @timdorr's description when ingress was not involved. But the Jupyter notebook service kept failing to start its kernel in the same way, with or without ingress. The Jupyter notebook service uses websockets. So there is more involved. I can help anyone looking deeper into this issue with configs to repro.

I hope this gets looked into. It will be good to have websockets working well with default 'http' protocol between ELB and ingress or services.

Does anyone have an update to this issue?

@rangadi I see you mentioned the 400 Bad Request, and I am facing a similar issue. As soon as I change service.beta.kubernetes.io/aws-load-balancer-backend-protocol from http (my current config) to tcp, 400 Bad Request errors start popping up.

Using AWS ACM certificates for SSL, a classic ELB, and trying with ingress-nginx image 0.21.0 (0.26, the latest, throws a too-many-redirects error). I have been stuck here for a while and have not been able to test WebSockets on a WebSocket-enabled app.

I have followed the instructions (I hope) which have been mentioned in the above posts, but not working. Any pointers?

Thanks to @rangadi's suggestion about removing aws-load-balancer-proxy-protocol from annotations, the 400 error went away.

I can still confirm that for me it works with changing aws-load-balancer-backend-protocol from http to tcp.
The change occurs in the controller. I'm using the nginx-ingress-controller helm chart; this is what my values look like:

[screenshot: helm controller values]

And this is how the ELB listeners look after applying those changes:

[screenshot: ELB listeners]

We need aws-load-balancer-backend-protocol set to https, as we need end-to-end encryption. It works if you change it to tcp, but that is not the requirement.
For wss://sample.example.net we get: failed: Error during WebSocket handshake: Unexpected response code: 400

I've used all of @m1schka's configuration (backend protocol = tcp) and am getting 400 errors. Should I manually enable the proxy protocol on the ELB? My scenario is as mentioned here: terminating on the ELB, redirecting to SSL on the nginx-ingress controller.

It was mentioned above that setting use-proxy-protocol: true will get past the 400 errors, but for me they persist. This is my nginx ingress config:

controller:
  config:
    redirect-to-https: "true"
    use-forwarded-headers: "true"
    real-ip-header: "proxy_protocol"
    set-real-ip-from: "0.0.0.0/0"
    proxy-read-timeout: "3600"
    proxy-send-timeout: "3600"
    force-ssl-redirect: "false"
    use-proxy-protocol: "true"

  replicaCount: 3
  extraArgs:
    publish-service: kube-system/jxing-nginx-ingress-controller
  service:
    targetPorts: 
      https: http
    omitClusterIP: true
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: <certArnRedacted>
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
      service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
      service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: '3600'

defaultBackend:
  service:
    omitClusterIP: true
rbac:
  create: true

and still getting:

[screenshot: 400 Bad Request error]

Can anyone confirm they have managed to setup using a single ELB:

  • SSL termination by ELB (https and wss)
  • Both ws and http backends
  • Redirection from http to https
  • Redirection to www
  • Without manually enabling anything

Thank you,

Move to NLB. That solved it for us.

– Akshat

I have the same issue on Azure / AKS :(

I have no special configuration for ingress.

First WebSocket request fails with error message: Error during WebSocket handshake: Unexpected response code: 400

If I expose the service with a LoadBalancer it works fine, so it looks like some configuration is missing on the ingress side?

@RomainWilbert, 400 means Bad Request and is probably returned by nginx-ingress.

Did you try adding these annotations to the ingress object of your service?

    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection $connection_upgrade;

@RomainWilbert, 400 means Bad Request and is probably returned by nginx-ingress.

Did you try adding these annotations to the ingress object of your service?

    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection $connection_upgrade;

This does not help. I don't think we have to configure this, because the correct headers are already set in the nginx-generated configuration by default.

Faced the exact same issue. In a newer NGINX ingress version (1.18.0), the annotation naming syntax has changed. (Note that the nginx.org/* annotations below belong to NGINX Inc's controller, not kubernetes/ingress-nginx.)

ref: https://docs.nginx.com/nginx-ingress-controller/configuration/ingress-resources/advanced-configuration-with-snippets/

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.org/location-snippets: |
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection upgrade;

After just adding the above annotation, the error disappeared. The change can also be verified from the nginx-ingress pod:

master:~ # kubectl exec -it -n nginx-ingress nginx-ingress-bk5sd -- cat /etc/nginx/conf.d/cattle-system-rancher.conf | grep upgrade
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection upgrade;

From what I can see, I believe the value of the HTTP Connection header gets transformed to lowercase.

I use an ingress and ingress-nginx without any special config or annotations for websockets.

In my application I had to check for a websocket request this way:

import (
	"net/http"
	"strings"
)

// IsWebsocket reports whether r asks for a WebSocket upgrade (case-insensitive).
func IsWebsocket(r *http.Request) bool {
	connection := strings.ToLower(r.Header.Get("Connection"))
	upgrade := strings.ToLower(r.Header.Get("Upgrade"))
	return strings.Contains(connection, "upgrade") && upgrade == "websocket"
}

The browser sends

Connection: Upgrade
Upgrade: websocket

and it arrives

Connection: upgrade
Upgrade: websocket

I hope this will help someone :D

Can anyone confirm they have managed to setup using a single ELB:

  • SSL termination by ELB (https and wss)
  • Both ws and http backends
  • Redirection from http to https
  • Redirection to www
  • Without manually enabling anything

Thank you,

@grifx This is probably because of this - https://www.built.io/blog/websockets-on-aws-s-elb

And also this https://github.com/kubernetes/kubernetes/issues/40244

These two basically mean that you cannot define both TCP (needed for websocket) and HTTP backends on the same ELB via Kubernetes. You will have to use a different ELB to handle TCP connections, or use an NLB for all of it together.

We've managed to have all the things you mentioned above setup using two different ELBs, I found this issue while checking up on the migration to an NLB and thought this might be useful for you.
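For anyone trying the two-ELB route, a second Service in TCP mode for the websocket/wss traffic might look something like this (a sketch based on the configs earlier in this thread; the name and cert ARN are placeholders of mine, not from the thread):

    kind: Service
    apiVersion: v1
    metadata:
      name: ingress-nginx-wss
      namespace: ingress-nginx
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"
        service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "<your-acm-cert-arn>"
        service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
        service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "3600"
    spec:
      type: LoadBalancer
      selector:
        app: ingress-nginx
      ports:
      - name: https
        port: 443
        targetPort: http

The original HTTP-mode ELB then keeps serving plain http/https backends, while this one carries the upgraded connections.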

So one of the fixes is to use two different ELBs: the first will work with HTTPS and the second with WSS (WS over TLS)?
