I'm trying to implement the following scenario via django channels: if the user is currently online, send them a notification via websocket. If not, send them an email notification.
My hope was to leverage Group membership in a group keyed on a user id to send the websocket message, but also to detect if the user is currently online (so I can handle the other case of sending them an email if they are not online).
I see that the Group class supports the add, discard, and send instance methods. But to accomplish what I'd like to do, I'd need to know whether the Group currently contains a reply channel. Is there a way to get the number of reply channels within a Group? Or perhaps to get the list of channels contained within the group?
If not today, would this be a reasonable enhancement request?
I assume an alternative implementation is simply to keep track in my own database of who is online, but then I'd have to handle expiry, etc. myself, which Group now does for me.
Channels deliberately does not keep any of this state because it is near-impossible to provide it in a scalable way that performs well while also being accurate; you might want to know if a user's socket has been confirmed alive in the past 10 seconds; others might want it to within 10 milliseconds or a few minutes.
That means you can't tell what's in a Group; you can only send to it. The Redis version does indeed know internally, but in a sharded scenario it's not necessarily true that you can quickly and accurately enumerate all the channels and return them.
Specifically, you need to establish what you mean when you say a WebSocket is disconnected. A socket can be apparently open but not connected to a browser for a decent amount of time; the only way to establish this is to ping through it and wait for the response. A socket disconnection message could also get lost between the protocol server and your business logic. Or, a socket may be open but not ping properly due to a temporary network glitch but still be connected when you try to send to it 5 seconds later. Also bear in mind a user may have multiple channels open to the server, some of which may be good and some may be bad (so length isn't good enough).
For all these reasons, Channels does not ship with a way to tell the state of sockets. I recommend you build it yourself based on connect/disconnect messages plus a timeout, stored against the user in a suitable datastore (a database for low volume, something else for high volume); that should be reasonably accurate (around 99.9% or so). I envision this being the kind of thing that becomes a few reusable apps, each catering to a different case (e.g. one is somewhat accurate but can handle millions of users, another is very accurate but only handles a few hundred at once).
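To make the connect/disconnect-plus-timeout idea concrete, here is a minimal sketch assuming a single process and an in-memory dict; a real deployment would store this against the user in a database or shared cache as suggested above. `PresenceTracker` and its methods are hypothetical names, not part of Channels. It tracks every reply channel per user separately (since one good socket is what matters, not the count) and expires channels whose last heartbeat is too old, which also covers lost disconnect messages.

```python
import time
from collections import defaultdict


class PresenceTracker:
    """Hypothetical in-memory presence store keyed on user id.

    A user may hold several reply channels at once; a channel only counts
    as alive if it was seen (connect or heartbeat) within `timeout` seconds.
    """

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        # user_id -> {reply_channel_name: last_seen_timestamp}
        self._channels = defaultdict(dict)

    def connect(self, user_id, reply_channel):
        self._channels[user_id][reply_channel] = self.clock()

    def heartbeat(self, user_id, reply_channel):
        # Any ping response or inbound frame counts as proof of life.
        self._channels[user_id][reply_channel] = self.clock()

    def disconnect(self, user_id, reply_channel):
        self._channels[user_id].pop(reply_channel, None)

    def live_channels(self, user_id):
        now = self.clock()
        channels = self._channels.get(user_id, {})
        # Drop channels not seen within the timeout; this also cleans up
        # after disconnect messages that never arrived.
        for name, last_seen in list(channels.items()):
            if now - last_seen > self.timeout:
                del channels[name]
        return sorted(channels)

    def is_online(self, user_id):
        return bool(self.live_channels(user_id))
```

The caller would then send over the socket when `is_online(user_id)` is true and fall back to email otherwise; the timeout value is the knob that trades accuracy against heartbeat traffic.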
@sachinrekhi
You could just send the notification via websocket and require some sort of reply. If you don't get one, send an email.
That's a good idea.
Any suggestions on the best way to handle a missing reply? I'll do the initial socket send from an HTTP handler, and the reply will be handled by a socket receive handler. Now I need to send my email when there is no reply. So it seems I would need to run a separate handler after some delay and have it check whether a reply was received (via some persistence). So maybe on the socket send I also send a message to a separate channel whose handler simply checks for the persisted reply and sends an email if it wasn't received. But then I would likely have to put a sleep in that handler to enforce the delay? I don't think there is a way to have the channel process the message after some delay, is there?
I was discussing a very similar situation with someone at DjangoCon - that being the ability to send a notification down a socket, and then if it's not read/acknowledged within a certain amount of time, send an email.
I don't want to add task delays to Channels if I can help it; it seems like something that could be served by a third-party app easily enough (in particular, something that looks at messages with run-at timestamps and does something like raise channels.exceptions.ConsumeLater if they're not ready to be served yet). But also, yes, Celery already has this.
A third option is to have some management command that runs ~once a minute (maybe with cron, or a future scheduling protocol server) that looks in a table or cache of sent notifications and sends emails for those that are old enough.
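A rough sketch of what that periodic sweep could do, assuming sent notifications are available as plain records and that email sending is passed in as a callable; `SentNotification`, `sweep_unacknowledged`, and `send_email` are hypothetical names. In a Django project the list would more likely be a queryset filtered on age and status, and the loop would live in a custom management command's `handle()` method.

```python
import time
from dataclasses import dataclass


@dataclass
class SentNotification:
    # Hypothetical record of a notification that was pushed down a socket.
    id: int
    user_email: str
    sent_at: float
    acknowledged: bool = False
    emailed: bool = False


def sweep_unacknowledged(notifications, send_email, max_age=60.0, now=None):
    """Email every notification older than max_age that is still unread.

    Meant to run periodically (e.g. once a minute from cron).
    Returns the ids emailed on this pass.
    """
    now = time.time() if now is None else now
    emailed_ids = []
    for n in notifications:
        if n.acknowledged or n.emailed:
            continue
        if now - n.sent_at >= max_age:
            send_email(n.user_email, n.id)
            n.emailed = True  # never email the same notification twice
            emailed_ids.append(n.id)
    return emailed_ids
```

Because the sweep only flips `emailed` after sending, running it more often than needed is harmless; the granularity of the delay is simply the cron interval.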
No matter what, I would suggest making notifications a first-class Model so you can easily tie them to users and objects, and have an ID to pass around to refer to them. It also provides a convenient place to note if it was read over a socket/emailed/mobile app pushed/etc.
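As an illustration of a first-class notification record, here is a sketch using a plain dataclass so it runs without a Django project; in Django it would be a `models.Model` with a foreign key to the user. `Notification`, `Via`, and the `target` string are hypothetical. The point is the id: it goes down the socket, the client echoes it back as an acknowledgement, and the record remembers which routes it was delivered through.

```python
import itertools
from dataclasses import dataclass, field
from enum import Enum


class Via(Enum):
    SOCKET = "socket"
    EMAIL = "email"
    PUSH = "push"


_ids = itertools.count(1)  # stands in for a database primary key


@dataclass
class Notification:
    """Hypothetical notification record tied to a user and an object."""
    user_id: int
    target: str  # e.g. "comment:42", the object this notification is about
    id: int = field(default_factory=lambda: next(_ids))
    delivered_via: set = field(default_factory=set)
    read: bool = False

    def to_socket_payload(self):
        # The client is expected to send notification_id back once shown.
        return {"notification_id": self.id, "target": self.target}

    def acknowledge(self, via=Via.SOCKET):
        self.read = True
        self.delivered_via.add(via)

    def needs_email(self):
        return not self.read and Via.EMAIL not in self.delivered_via
```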
Appreciate the additional thoughts on this Andrew!
While Celery does support delayed tasks, it feels a bit much to add it to my stack just for delayed tasks (given the stack is already nginx, daphne, channel workers, redis, memcache). If I run Celery, I need another broker layer (Celery prefers RabbitMQ but seems to suggest Redis will also work fine, so I wouldn't need to duplicate that). I also need another pool of Celery workers separate from the channel workers, which seems like it just creates more opportunities for contention/thrashing of processes if run on a single box. And tasks are defined in a slightly different way from django channel tasks, adding a bit more complexity. So it feels like a lot of overhead just to add delays to what I'd like to do with django channel tasks.
I like the suggestion of just using django channel tasks, storing a run-at time in the message, and raising an exception if it's not ready to be served. That's easy enough to do. But am I correct that this will just result in a lot of thrashing? The workers will constantly be picking up the task, raising the exception, and putting it back in the queue until the run-at time arrives. It seems like it would waste a ton of cycles on the consumers and might starve other tasks, or at least eat up CPU cycles, right?
I guess this is just a long-winded way of saying I would LOVE support for delaying tasks directly in django channels ;) And honestly, real-time notifications are going to be one of the killer scenarios for channels in the first place (it's the one scenario I've already launched in production with django channels).
You're right that doing that is essentially a busyloop, and moreover, channels has code to detect when a message gets requeued too many times (it's 10 right now by default) and then it kills it to prevent livelocks, so that's not a workable solution really.
I don't want to extend the core Channel abstraction to allow delays; it's suddenly asking a lot more of channel layers (and also just by virtue of adding complexity is going to make things harder to scale).
My current thought is this: what if there was a channel you could send messages to with delay, content and channel keys, and then a new protocol server whose job was to read messages from this channel, wait for the delay, and then insert the content onto the channel named in the message?
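The scheduling core of such a delay server could be as small as a heap ordered by due time. This is a sketch under assumptions: `channel_layer` is any object with a `send(channel, content)` method, the message keys `delay`, `channel`, and `content` follow the proposal above, and `DelayServer` is a hypothetical name. It does not talk to a real channel layer; an event loop would call `receive()` for each incoming message and sleep for `seconds_until_next()` between calls to `dispatch_due()`.

```python
import heapq
import itertools
import time


class DelayServer:
    """Sketch: hold messages, then forward them once their delay elapses."""

    def __init__(self, channel_layer, clock=time.monotonic):
        self.channel_layer = channel_layer
        self.clock = clock
        self._heap = []  # entries: (due_time, tiebreak, channel, content)
        # Tiebreak keeps equal due times in arrival order and avoids ever
        # comparing content dicts inside the heap.
        self._counter = itertools.count()

    def receive(self, message):
        due = self.clock() + message["delay"]
        heapq.heappush(
            self._heap,
            (due, next(self._counter), message["channel"], message["content"]),
        )

    def dispatch_due(self):
        """Send every message whose delay has elapsed; return how many."""
        now = self.clock()
        sent = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, channel, content = heapq.heappop(self._heap)
            self.channel_layer.send(channel, content)
            sent += 1
        return sent

    def seconds_until_next(self):
        """How long the loop may sleep before the next message is due."""
        if not self._heap:
            return None
        return max(0.0, self._heap[0][0] - self.clock())
```

Since the heap always exposes the earliest due message, one process can hold any number of delayed messages without polling or requeueing them.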
This same server could theoretically be the same one that handled scheduled tasks or something similar as well. It does mean running an extra process, but it would be pretty lightweight, need very little configuration, and solve two things in one. Shouldn't be too hard to write, either.
Very interesting idea. I'm liking the direction of this. What do you think about taking it even further and enabling the following:
What if we allow a worker to specify which channels it can pull messages from. By default all workers pull messages from any channel to process, but if you provide a command like the following, then it will only pull messages from say the "backend" channel:
python manage.py runworker --channel=backend
I could see this being useful in general for channels to improve scaling. I'm used to always running the workers that operate on back-end tasks separately from the pool of workers processing user-facing web requests. The reason is that I can work hard to ensure every web-facing request is very short-lived, but then I don't need to be as careful with back-end processing tasks. By mixing the pools, as django-channels does, you have to be very careful that your web requests don't get starved or end up with higher latency/response times because some of the same workers are processing long-running background tasks. The ability to run a dedicated pool of workers for, say, back-end tasks and a separate pool for web requests would make it easy to let latency/response time increase on back-end tasks while closely monitoring and preventing it on web requests. This could be completely optional to avoid complexity in the simple case.
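A toy sketch of the proposed filtering, assuming an in-memory dict of deques stands in for the channel layer. The `--channel` option here is the one proposed in this comment, not an existing `runworker` flag, and `parse_worker_args` and `next_message` are hypothetical helpers. With no option given, the worker consumes from every channel, as today; a real worker would block on the channel layer rather than poll queues.

```python
import argparse
from collections import deque


def parse_worker_args(argv):
    # Proposed `--channel` option; repeat it to consume from several channels.
    parser = argparse.ArgumentParser(prog="runworker")
    parser.add_argument("--channel", action="append", dest="channels", default=None)
    return parser.parse_args(argv)


def next_message(queues, channels=None):
    """Pop the next message from the allowed channels, or None if all are empty.

    `queues` maps channel name -> deque of messages; channels=None keeps the
    default behaviour of consuming from every channel.
    """
    names = sorted(queues) if channels is None else channels
    for name in names:
        queue = queues.get(name)
        if queue:
            return name, queue.popleft()
    return None
```

A back-end pool started with `--channel backend` would then never pick up `http.request` messages, so slow jobs cannot delay web requests handled by the other pool.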
With this capability you could then have dedicated worker(s) for handling delayed requests on a specific channel. These workers could do short sleeps when the run_at time specified in the message has not yet arrived and then requeue (or just sleep until it is time to operate, if run_at is only about 30s away). And you wouldn't have to worry about the sleeps starving other meaningful tasks, because you've segregated these workers to operate only on these delayed back-end tasks, and your web requests are handled by a different pool of workers that never have such sleeps in the tasks they're processing.
Thoughts (on both the feature for scaling front-end and back-end independently as well as the usage of this functionality for processing delayed scheduled tasks)?
I did consider the idea of specifying a subset of channels for workers to consume earlier on in the project (it may have even been a command line option at one point) but didn't add it yet for the sake of simplicity. It might be a nice idea, though; while the best way to separate concerns is to run two separate channel layers, that's overkill for most applications.
It's also something that ops departments would quite like (e.g. at work we separate API and website traffic onto two clusters as they're different levels of interactivity). I still don't think the workers should do consume-and-sleep to delay messages; a full on protocol server with a specialised event loop would be better, as then it could handle all delayed messages in one process with ease.
I've opened tickets for both of these ideas, though, as I think they have value: #115 and #116