I'm currently running Channels on a load-balanced Azure configuration, with two dedicated Daphne servers, one Redis cache, and one database. For some reason, however, after a few minutes Channels stops accepting messages from the frontend and gets stuck. I keep sending messages, usually around 20 times over, and nothing gets acknowledged on the server. There are also no errors on the console. I suspect the load balancer is switching between servers or something, but I'm not so sure.
I don't use channel sessions much, so I don't think it could be sessions being dropped. Here's a look at the consumers:
https://github.com/taleship/taleship/blob/master/rooms/consumers.py
(yes, it's kinda messy)
What versions are you running?
@andrewgodwin All latest versions. https://github.com/taleship/taleship/blob/master/requirements.txt
OK, thanks. Can you be clearer when you say "doesn't accept messages from the frontend" - does it do any or all of the following?
You can debug WebSocket connection state in the developer tools pane of your browser.
HTTP is fully operational and WebSocket connections are not dropped (RWS would've reconnected). It accepts messages for about 3-4 minutes; after that, when I send the server WebSocket messages, which it is supposed to answer with either success or an error, no response is received at all, no matter how many I send. I've tried debugging with RWS and it is sending correctly, but the server is returning nothing.
I'm going to keep debugging the frontend and report back, but so far nothing is working.
OK, so if the servers are handshaking the websockets then Daphne is happy. My suspicion is that enforce_ordering is failing - could you try removing that and seeing if it works?
Sure, I'll try. I'd have to do a hotfix real quick, so one second.
@andrewgodwin I do need enforce_ordering for authentication though, so that'd be difficult. I'll try.
Update 2: So far I disabled them on production, currently testing them. Give me about 10-15 minutes to try to reproduce and I'll report back results. I disabled it on ws_message, which seems to be the source of problems.
@andrewgodwin Wow, that seems to have worked. Disabling enforce_ordering is working _so far._ I will keep running tests to validate this.
Yup, it's really a performance killer and I'm honestly considering removing it entirely unless we can get a stable version implemented.
I did see that you had it on connect and receive - you don't need this, as new Channels always sequences connect before receive even without the decorator (it waits for connect to send back either a send or accept before it finishes the WebSocket handshake). If that's all you had it for, then you're good without it.
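For illustration, here is a minimal sketch of what consumers without the decorator could look like, assuming Channels 1.x-style function consumers. The JSON payload shape (an `action` key with `status` replies) is hypothetical and not taken from the linked consumers.py, and `FakeReplyChannel` / `FakeMessage` are stand-ins invented here so the consumers can be exercised without a running channel layer.

```python
import json


def ws_connect(message):
    # Accepting here lets the WebSocket handshake complete; no
    # @enforce_ordering decorator is applied to this consumer.
    message.reply_channel.send({"accept": True})


def ws_message(message):
    # Hypothetical payload format: a JSON object with an "action" key.
    # Always reply, so the frontend sees either success or an error.
    try:
        payload = json.loads(message.content["text"])
        action = payload["action"]
    except (KeyError, ValueError, TypeError):
        message.reply_channel.send({"text": json.dumps({"status": "error"})})
        return
    message.reply_channel.send(
        {"text": json.dumps({"status": "success", "action": action})}
    )


class FakeReplyChannel:
    """Hypothetical stand-in that records what a consumer sends."""

    def __init__(self):
        self.sent = []

    def send(self, content):
        self.sent.append(content)


class FakeMessage:
    """Hypothetical stand-in for a Channels message object."""

    def __init__(self, content=None):
        self.content = content or {}
        self.reply_channel = FakeReplyChannel()
```

Calling `ws_message(FakeMessage({"text": '{"action": "join"}'}))` records a single success reply on the fake reply channel, which makes it easy to check locally that every incoming frame gets some response.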
@andrewgodwin Makes sense, will get rid of it. Thanks for the fast reply. What a strange bug 😛
I'll close this for now and file it under the general "enforce_ordering is not great" section :)