Channels: Messages getting lost, connections dropped

Created on 20 Jun 2017 · 10 Comments · Source: django/channels

I'm currently running Channels on a load-balanced Azure configuration with two dedicated Daphne servers, one Redis cache, and one database. For some reason, however, after a few minutes the channels stop accepting messages from the frontend and get stuck. I keep sending messages, often 20 times over, and the server never acknowledges any of them. There are also no errors in the console. I suspect the load balancer is switching servers or something, but I'm not sure.

I don't use channel sessions much, so I don't think it could be sessions being dropped. Here's a look at the consumers:
https://github.com/taleship/taleship/blob/master/rooms/consumers.py
(yes, it's kinda messy)

All 10 comments

What versions are you running?

OK, thanks. Can you be clearer when you say "doesn't accept messages from the frontend" - does it do any or all of the following?

  • Accepts HTTP connections but does not return a status code
  • Returns responses with a status code other than 200
  • Accepts WebSockets but does not finish the handshake (they never open)
  • Finishes the WebSocket handshake (the socket opens) but your code does not run in response to a message being sent down it

You can debug WebSocket connection state in the developer tools pane of your browser.

HTTP is fully operational and WebSocket connections are not dropped (RWS would have reconnected). The server accepts messages for about 3-4 minutes. After that, whenever I send it a WebSocket message it should reply with either a success or an error, but no matter how many times I send, no response comes back at all. I've debugged with RWS and it is sending correctly, but the server returns nothing.

I'm going to keep debugging the frontend and report back, but so far nothing is working.

OK, so if the servers are handshaking the websockets then Daphne is happy. My suspicion is that enforce_ordering is failing - could you try removing that and seeing if it works?
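Roughly speaking, enforce_ordering in Channels 1.x numbers the messages on each socket, keeps the next expected number in the channel session, and holds back any message that arrives ahead of its turn. The toy model below is plain Python, not the Channels code (`StrictOrderer` and its methods are hypothetical), but it shows why that design can produce exactly the symptom described: if one message, or the stored counter, goes missing, every later message waits forever and nothing raises an error.

```python
"""Toy model of strict per-connection message ordering.

NOT the Channels implementation; it only illustrates why a strict
"process message N only after N-1" rule can stall silently.
"""
from dataclasses import dataclass, field


@dataclass
class StrictOrderer:
    next_order: int = 0                            # next sequence number we will process
    waiting: dict = field(default_factory=dict)    # early messages held back, keyed by order
    processed: list = field(default_factory=list)  # messages actually handled, in order

    def receive(self, order, text):
        # Park every incoming message first.
        self.waiting[order] = text
        # Then drain everything that is now in sequence.
        while self.next_order in self.waiting:
            self.processed.append(self.waiting.pop(self.next_order))
            self.next_order += 1


orderer = StrictOrderer()
for order, text in [(0, "a"), (1, "b"), (3, "d"), (4, "e")]:  # message 2 was lost
    orderer.receive(order, text)
print(orderer.processed)        # ['a', 'b']
print(sorted(orderer.waiting))  # [3, 4]: stuck until message 2 shows up
```

If message 2 eventually arrives, 3 and 4 are released at once; if it never does, the socket stays open but nothing after it is ever handled.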

Sure, I'll try. I'll have to push a quick hotfix, so give me a second.

@andrewgodwin I do need enforce_ordering for authentication though, so that'd be difficult. I'll try.

Update 2: I've disabled it in production and am testing now. Give me about 10-15 minutes to try to reproduce the problem and I'll report back. I disabled it on ws_message, which seems to be the source of the problems.

@andrewgodwin Wow, that seemed to have worked. Disabling enforce_ordering is working _so far._ I will keep running tests to validate this.

Yup, it's really a performance killer and I'm honestly considering removing it entirely unless we can get a stable version implemented.

I did see that you had it on connect and receive. You don't need it there: newer versions of Channels always run connect before receive, even without the decorator, because the server waits for the connect consumer to reply with either a send or an accept before it finishes the WebSocket handshake. If that's all you had it for, you're fine without it.
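The ordering guarantee here comes from the handshake itself: the browser cannot send a frame until the socket is open, and the socket does not open until connect has replied. The toy model below is plain Python, not Daphne or Channels (`ToySocket`, `ws_connect`, and `ws_receive` are hypothetical, and the `{"accept": True}` reply only mimics the Channels 1.x message format). It shows that state written during connect, such as the authenticated user, is already present when the first receive runs, with no ordering decorator involved.

```python
"""Toy model: the WebSocket handshake gates receive behind connect.

NOT the Daphne/Channels implementation; names are hypothetical.
"""


class ToySocket:
    def __init__(self, on_connect, on_receive):
        self.on_connect = on_connect
        self.on_receive = on_receive
        self.session = {}     # stands in for per-socket session storage
        self.is_open = False

    def handshake(self):
        # The handshake only completes once the connect handler replies with accept.
        reply = self.on_connect(self.session)
        self.is_open = bool(reply.get("accept"))
        return self.is_open

    def client_send(self, text):
        # A client cannot send frames on a socket that has not opened yet.
        if not self.is_open:
            raise RuntimeError("socket is not open; nothing can be sent yet")
        return self.on_receive(self.session, text)


def ws_connect(session):
    session["user"] = "alice"  # e.g. the user resolved from the HTTP session cookie
    return {"accept": True}


def ws_receive(session, text):
    # Safe without enforce_ordering: connect has necessarily run already.
    return f"{session['user']} says {text}"


sock = ToySocket(ws_connect, ws_receive)
sock.handshake()
print(sock.client_send("hi"))  # alice says hi
```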

@andrewgodwin Makes sense, will get rid of it. Thanks for the fast reply. What a strange bug 😛

I'll close this for now and file it under the general "enforce_ordering is not great" section :)