Socket.io: Question: What could be the possible reasons for a socket.io server returning 400 Bad Request via nginx?

Created on 4 Dec 2015 · 28 comments · Source: socketio/socket.io

Hi,

We are facing a scenario where a good percentage of requests going to /socket.io/?... are returning 400.

Default nginx access log results are littered with:

10.0.0.82 - - [04/Dec/2015:11:43:27 +0000] "GET /socket.io/?__sails_io_sdk_version=0.11.0&__sails_io_sdk_platform=browser&__sails_io_sdk_language=javascript&EIO=3&transport=websocket HTTP/1.1" 400 45 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"

The requests are not even reaching our overlying SailsJS server. We suspect something is going wrong at the socket.io layer. Any thoughts?

All 28 comments

Interesting discovery: the Upgrade header is blank for the requests that get a 400 response. Forcing the Upgrade header to "websocket" stops the 400s, but causes intermittent 499 errors on nginx.
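(For reference, "forcing the Upgrade header" presumably means hard-coding it in the proxy config instead of passing the client's value through. The sketch below is an assumption about the kind of change involved, not the poster's actual config; the upstream address is a placeholder.)

location /socket.io {
    proxy_http_version 1.1;
    # Hard-coded instead of "proxy_set_header Upgrade $http_upgrade;",
    # so the upstream sees an Upgrade header even when the client sent none.
    proxy_set_header Upgrade "websocket";
    proxy_set_header Connection "upgrade";
    proxy_pass http://127.0.0.1:3000;  # placeholder upstream
}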

After a LOT of painstaking log-watching, this is our temporary nginx workaround (to stop our servers from freaking out). The following check detects a missing "Upgrade" header in the request and returns a 202 without hitting the Node server.

location /socket.io {
    set $local_upgrade_failure "${arg___sails_io_sdk_platform}${arg_transport}${connection_upgrade}";
    if ( $local_upgrade_failure = "browserwebsocket" ) { return 202; break; }

    # ... remaining proxy configuration
}

Now, this does NOT fix the clients that are sending these requests, but it surely frees up the server. The question, @rauchg, is why some socket.io clients are not sending the "Upgrade" header. One pattern: they are almost all Windows machines, with a very few Linux ones. We did not spot a single OS X client in one full hour of heavy usage.
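(For context, $connection_upgrade in the snippet above is not a built-in nginx variable; it is normally defined by a map block in the http context. A minimal sketch follows, written on the assumption that a missing Upgrade header maps to an empty string, which is what makes the "browserwebsocket" comparison match.)

map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      '';    # no Upgrade header from the client -> empty string
}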

Did you set up sticky sessions? http://socket.io/docs/using-multiple-nodes/

Not needed. We use websockets alone, and that's sticky anyway.

A bit late of a reply, but seems related to socketio/engine.io#283... open since 2014, mind you.

Just as a note, we are still facing this error.

  • transport set to websocket only
  • nginx proxy configured correctly

We see within nginx that the request headers are missing Upgrade: "websocket". If we force it using set_header, the socket.io server closes the connection and we get a 499 from nginx.

If using nginx, we worked around this by setting max_fails to 0. Not a good solution, but it got us going again.
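(A minimal sketch of that workaround; the upstream name and address are placeholders. With max_fails=0, nginx never marks the server as unavailable, which avoids the "no live upstreams" errors mentioned later in this thread.)

upstream socketio_backend {
    server 127.0.0.1:3000 max_fails=0;  # never mark this server as failed
}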

@julianlam - I will research and try it out. If it works, lunch is on me if we ever meet.

Ok. That does not solve the problem. I am thinking of ditching as many network layers as possible to get to the root of this.

We have an AWS ELB in TCP mode which proxies to EC2 instances (c4.2xlarge), and those instances have nginx and Docker running on them. nginx is configured simply and cleanly (all websocket-related configs in place) to reverse proxy to a Docker port. Inside Docker, pm2 runs our Node.js server (SailsJS) in cluster mode. The kernel is tweaked to handle a large number of connections.

This is so frustrating that I am going crazy.

Server tweaks? To handle a large number of connections? What does that look like? Maybe turn them off and see if they're responsible.
I read somewhere (a remote syslog site) that the ELB is a bad bottleneck under "heavy load" burst conditions. I think they opted for a DNS-only solution.
Here it is...
https://www.loggly.com/blog/why-aws-route-53-over-elastic-load-balancing/

More on the load balancer being recommended against in this thread:
https://github.com/socketio/socket.io/issues/1942#issuecomment-94243556

Thanks for feedback.

The criteria for ELB failure in the Loggly article seem a bit dated and do not apply to us. We are doing TCP load balancing with predictable traffic on standard ports. Also, all the other points in that article have been addressed in ELB over time, except the burst traffic drop and warm-up (not relevant to us).

The server tweaks are simply an increase of the file ulimits, net.core.somaxconn raised to a modest 1024, and an increase to the local port range.

I've tried turning each of them off, making it a vanilla nginx with just the websocket proxy tweaks, but to no avail.

Oh well. Maybe I was thinking of bad gateway.
400 is bad request. Maybe try to get better logging of the request content on both sides and see if you can spot why it's "bad", i.e. malformed.

I will then have to deep-dive into socket.io. The 400 is being returned by socket.io with no additional message:

{ code: 3, message: "badRequest"}

Guys, has anyone found a resolution? If I use the polling transport, everything works fine, but if I change the transport to websocket I get a 400.

The bug has been reported (and ignored) upstream: socketio/engine.io#283

There is no real fix. Clients will drop to XHR for that session. If you are using nginx and run into "no live upstreams", this is the cause; a workaround is to set max_fails to 0 so nginx never considers the upstream "down". A hacky workaround (especially if your upstream ever _does_ go down), but there it is.

@denzelwamburu - what's different in that tutorial from what we were already facing in this thread? All of the nginx configuration mentioned there is exactly what all of us tried out here. What am I missing?

@shamasis there are several issues listed here, some of which have been fixed (I think socketio/engine.io#283 was fixed by https://github.com/socketio/engine.io/pull/458). Could you please be more specific?

Note: it's indeed the same configuration that is used in the nginx example here.

@shamasis did you ever find a solution? I am also facing a similar problem: just a bad request, with no further details provided.

Is there a solution for this? All of a sudden our servers are freaking out and we have tons of 400 errors in the logs.

Anything new? Getting some 400s from IE11 users...

same thing

Same thing here. My logs are littered with 400s. Very difficult to debug.

https://stackoverflow.com/questions/54933396/elastic-beanstalk-socket-io-sticky-sessions

I was having the very same issue; I faced this error when I was proxying my Node.js API from http://baseurl/api. Apparently I was missing the ; (semicolons) at the end of the nginx config lines.

I was doing

proxy_http_version 1.1
proxy_set_header Upgrade $http_upgrade
proxy_set_header Connection "upgrade"

What worked

proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";

Yep, missing semicolons. Please double-check your configs.

I have the following, which works fine for HTTP requests. How do I make it work with HTTPS?

files:
  /etc/nginx/conf.d/proxy.conf:
    mode: "000644"
    owner: root
    group: root
    content: |
      upstream nodejs {
        server 127.0.0.1:8081;
        keepalive 256;
      }

      server {
        listen 8080;

        if ($time_iso8601 ~ "^(\d{4})-(\d{2})-(\d{2})T(\d{2})") {
            set $year $1;
            set $month $2;
            set $day $3;
            set $hour $4;
        }
        access_log /var/log/nginx/healthd/application.log.$year-$month-$day-$hour healthd;
        access_log  /var/log/nginx/access.log  main;

        location / {
            proxy_pass  http://nodejs;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_http_version 1.1;
            proxy_set_header        Host            $host;
            proxy_set_header        X-Real-IP       $remote_addr;
            proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
        }

        gzip on;
        gzip_comp_level 4;
        gzip_types text/html text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;

        location /static {
            alias /var/app/current/static;
        }

      }

container_commands:
  removeconfig:
    command: "rm -f /tmp/deployment/config/#etc#nginx#conf.d#00_elastic_beanstalk_proxy.conf /etc/nginx/conf.d/00_elastic_beanstalk_proxy.conf"
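(A minimal sketch of an HTTPS variant of that server block, assuming TLS is terminated at nginx rather than at the load balancer, with placeholder certificate paths. In many Elastic Beanstalk setups the load balancer terminates TLS instead and keeps forwarding plain HTTP to nginx, in which case the existing block is left as-is.)

server {
  listen 443 ssl;
  ssl_certificate     /etc/nginx/certs/server.crt;  # placeholder path
  ssl_certificate_key /etc/nginx/certs/server.key;  # placeholder path

  location / {
      proxy_pass  http://nodejs;
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "upgrade";
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }
}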

How could missing semicolons be the solution? That is obviously invalid syntax in the nginx conf, which will end with an "invalid number of arguments" error, and nginx will never start. And without a working nginx you can't test anything. So I don't see that as a solution.

Definitely a problem of sticky sessions! If you have workers, like gunicorn -w or a Node.js cluster, it will not work!
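(If stickiness is indeed the issue, the using-multiple-nodes docs linked earlier describe pinning each client to a single process. At the nginx level that can be sketched with ip_hash, as below, with placeholder worker addresses; note this only helps when each worker listens on its own port, so a single-port Node cluster needs a sticky-session approach inside Node instead.)

upstream socketio_nodes {
    ip_hash;                  # route each client IP to the same worker
    server 127.0.0.1:3000;    # placeholder worker addresses
    server 127.0.0.1:3001;
}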
