Socket.io: Question: What could be the possible reasons for a socket.io server returning 400 Bad Request via nginx?

Created on 4 Dec 2015 · 28 comments · Source: socketio/socket.io

Hi,

We are facing a scenario where a good percentage of requests going to /socket.io/?... are returning 400.

Default nginx access log results are littered with:

10.0.0.82 - - [04/Dec/2015:11:43:27 +0000] "GET /socket.io/?__sails_io_sdk_version=0.11.0&__sails_io_sdk_platform=browser&__sails_io_sdk_language=javascript&EIO=3&transport=websocket HTTP/1.1" 400 45 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"

The requests are not even reaching our overlying SailsJS server. We suspect something is going wrong at the socket.io layer. Any thoughts?

All 28 comments

Interesting discovery: the Upgrade header is blank for the requests that get a 400 response. Forcing the Upgrade header to "websocket" stops the 400s, but causes intermittent 499 errors on nginx.
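(For reference, "forcing the Upgrade header" presumably means hard-coding it in the proxy config instead of passing the client's value through. The sketch below is an assumption about the kind of change involved, not the poster's actual config; the upstream address is a placeholder.)

location /socket.io {
    proxy_http_version 1.1;
    # Hard-coded instead of "proxy_set_header Upgrade $http_upgrade;",
    # so the upstream sees an Upgrade header even when the client sent none.
    proxy_set_header Upgrade "websocket";
    proxy_set_header Connection "upgrade";
    proxy_pass http://127.0.0.1:3000;  # placeholder upstream
}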

After a LOT of painstaking log-watching, this is our temporary nginx workaround (to stop our servers from freaking out). The following check detects a missing "Upgrade" header in the request and returns a 202 without hitting the Node server.

location /socket.io {
    set $local_upgrade_failure "${arg___sails_io_sdk_platform}${arg_transport}${connection_upgrade}";
    if ( $local_upgrade_failure = "browserwebsocket" ) { return 202; break; }

    # ... remaining proxy configuration
}

Now, this does NOT fix the clients that are sending these requests, but it surely frees up the server. The question, @rauchg, is why some socket.io clients are not sending the "Upgrade" header. One pattern: they are almost all Windows machines, with a very few Linux ones. We did not spot a single OS X client in one full hour of heavy usage.
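(For context, $connection_upgrade in the snippet above is not a built-in nginx variable; it is normally defined by a map block in the http context. A minimal sketch follows, written on the assumption that a missing Upgrade header maps to an empty string, which is what makes the "browserwebsocket" comparison match.)

map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      '';    # no Upgrade header from the client -> empty string
}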

Did you set up sticky sessions? http://socket.io/docs/using-multiple-nodes/

Not needed. We use websockets alone, and that's sticky anyway.

A bit late of a reply, but seems related to socketio/engine.io#283... open since 2014, mind you.

Just as a note, we are still facing this error.

  • transport set to websocket only
  • nginx proxy configured correctly

We see within nginx that the request headers are missing Upgrade: "websocket". If we force it using set_header, the socket.io server closes the connection and we get a 499 from nginx.

If using nginx, we worked around this by setting max_fails to 0. Not a good solution, but it got us going again.
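(A minimal sketch of that workaround; the upstream name and address are placeholders. With max_fails=0, nginx never marks the server as unavailable, which avoids the "no live upstreams" errors mentioned later in this thread.)

upstream socketio_backend {
    server 127.0.0.1:3000 max_fails=0;  # never mark this server as failed
}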

@julianlam - I will research and try it out. If it works, lunch is on me if we ever meet.

Ok. That does not solve the problem. I am thinking of ditching as many network layers as possible to get to the root of this.

We have an AWS ELB in TCP mode which proxies to EC2 instances (c4.2xlarge), and those instances have nginx and Docker running on them. nginx is configured simply and cleanly (all websocket-related configs in place) to reverse proxy to a Docker port. Inside Docker, pm2 runs our Node.js server (SailsJS) in cluster mode. The kernel is tweaked to handle a large number of connections.

This is so frustrating that I am going crazy.

Server tweaks? To handle a large number of connections? What does that look like? Maybe turn them off and see if they're responsible.
I read somewhere (a remote syslog site) that the ELB is a bad bottleneck under "heavy load" burst conditions. I think they opted for a DNS-only solution.
Here it is...
https://www.loggly.com/blog/why-aws-route-53-over-elastic-load-balancing/

More on the load balancer being recommended against in this thread:
https://github.com/socketio/socket.io/issues/1942#issuecomment-94243556

Thanks for feedback.

The criteria for ELB failure in the Loggly article seem a bit dated and do not apply to us. We are doing TCP load balancing with predictable traffic on standard ports. Also, all the other points in that article have been addressed in ELB over time, except the burst traffic drop and warm-up (not relevant to us).

The server tweaks are simply an increase of the file ulimits, net.core.somaxconn raised to a modest 1024, and an increase to the local port range.

I've tried turning each of them off, making it a vanilla nginx with just the websocket proxy tweaks, but to no avail.

Oh well. Maybe I was thinking of bad gateway.
400 is bad request. Maybe try to get better logging of the request content on both sides and see if you can spot why it's "bad", i.e. malformed.

I will then have to deep-dive into socket.io. The 400 is being returned by socket.io with no additional message:

{ code: 3, message: "badRequest"}

Guys, has anyone found a resolution? If I use the polling transport, everything works fine, but if I change the transport to websocket I get a 400.

The bug has been reported (and ignored) upstream: socketio/engine.io#283

There is no real fix. Clients will drop to XHR for that session. If you are using nginx and run into "no live upstreams", this is the cause; a workaround is to set max_fails to 0 so nginx never considers the upstream "down". A hacky workaround (especially if your upstream ever _does_ go down), but there it is.

@denzelwamburu - what's different in that tutorial from what we were already facing in this thread? All of the nginx configuration mentioned there is exactly what all of us tried out here. What am I missing?

@shamasis there are several issues listed here, some of which have been fixed (I think socketio/engine.io#283 was fixed by https://github.com/socketio/engine.io/pull/458). Could you please be more specific?

Note: it's indeed the same configuration that is used in the nginx example here.

@shamasis did you ever find a solution? I am also facing a similar problem: just a bad request, with no further details provided.

Is there a solution for this? All of a sudden our servers are freaking out and we have tons of 400 errors in the logs.

Anything new? Getting some 400s from IE11 users...

same thing

Same thing here. My logs are littered with 400s. Very difficult to debug.

https://stackoverflow.com/questions/54933396/elastic-beanstalk-socket-io-sticky-sessions

I was having the very same issue; I faced this error when I was proxying my Node.js API from http://baseurl/api. Apparently I was missing the ; (semicolons) at the end of the nginx config lines.

I was doing

proxy_http_version 1.1
proxy_set_header Upgrade $http_upgrade
proxy_set_header Connection "upgrade"

What worked

proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";

Yep, missing semicolons. Please double-check your configs.

I have the following, which works fine for HTTP requests. How do I make it work with HTTPS?

files:
  /etc/nginx/conf.d/proxy.conf:
    mode: "000644"
    owner: root
    group: root
    content: |
      upstream nodejs {
        server 127.0.0.1:8081;
        keepalive 256;
      }

      server {
        listen 8080;

        if ($time_iso8601 ~ "^(\d{4})-(\d{2})-(\d{2})T(\d{2})") {
            set $year $1;
            set $month $2;
            set $day $3;
            set $hour $4;
        }
        access_log /var/log/nginx/healthd/application.log.$year-$month-$day-$hour healthd;
        access_log  /var/log/nginx/access.log  main;

        location / {
            proxy_pass  http://nodejs;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_http_version 1.1;
            proxy_set_header        Host            $host;
            proxy_set_header        X-Real-IP       $remote_addr;
            proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
        }

        gzip on;
        gzip_comp_level 4;
        gzip_types text/html text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;

        location /static {
            alias /var/app/current/static;
        }

      }

container_commands:
  removeconfig:
    command: "rm -f /tmp/deployment/config/#etc#nginx#conf.d#00_elastic_beanstalk_proxy.conf /etc/nginx/conf.d/00_elastic_beanstalk_proxy.conf"
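(A minimal sketch of an HTTPS variant of that server block, assuming TLS is terminated at nginx rather than at the load balancer, with placeholder certificate paths. In many Elastic Beanstalk setups the load balancer terminates TLS instead and keeps forwarding plain HTTP to nginx, in which case the existing block is left as-is.)

server {
  listen 443 ssl;
  ssl_certificate     /etc/nginx/certs/server.crt;  # placeholder path
  ssl_certificate_key /etc/nginx/certs/server.key;  # placeholder path

  location / {
      proxy_pass  http://nodejs;
      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "upgrade";
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }
}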

How could missing semicolons be the solution? That is obviously invalid syntax in the nginx conf, which will end with an "invalid number of arguments" error, and nginx will never start. And without a working nginx you can't test anything. So I don't see that as a solution.

Definitely a problem of sticky sessions! If you have workers, like gunicorn -w or a Node.js cluster, it will not work!
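(If stickiness is indeed the issue, the using-multiple-nodes docs linked earlier describe pinning each client to a single process. At the nginx level that can be sketched with ip_hash, as below, with placeholder worker addresses; note this only helps when each worker listens on its own port, so a single-port Node cluster needs a sticky-session approach inside Node instead.)

upstream socketio_nodes {
    ip_hash;                  # route each client IP to the same worker
    server 127.0.0.1:3000;    # placeholder worker addresses
    server 127.0.0.1:3001;
}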
