If there is a problem in SMSSync/medic-gateway and SMS are stuck in the pending state on webapp for a long time, then when the gateway is finally fixed, very old messages will be sent.
Messages should be transitioned to a timed-out/failed state on webapp after they have been pending for too long (e.g. one week).
From a technical perspective, I would expect medic-sentinel to update messages from pending to timed-out-from-pending after a certain amount of time. A safer solution might be for the sms-gateway controller in medic-api to check how long a message has been pending before sending it. Perhaps a combination of both would be ideal.
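A minimal sketch of the send-time check described above, in Python for brevity rather than the project's own stack. The message shape (`id`, `pending_since`) and the two-week default are assumptions for illustration, not the actual medic-api data model.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical default; the thread discusses one to two weeks.
PENDING_TIMEOUT = timedelta(weeks=2)


def is_stale(pending_since, now, timeout=PENDING_TIMEOUT):
    """Return True if a message has been pending for longer than the timeout."""
    return now - pending_since > timeout


def messages_to_send(pending, now, timeout=PENDING_TIMEOUT):
    """Return only the pending messages that are still fresh enough to send.

    `pending` is a list of dicts with a hypothetical `pending_since` datetime.
    """
    return [m for m in pending if not is_stale(m["pending_since"], now, timeout)]
```

With a check like this in the controller, a gateway that reconnects after a long outage would only receive messages that went pending within the timeout window, even if the scheduled job that marks them as timed out has not run yet.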
It would be ideal if message timeouts were flagged clearly to the tech lead, project manager, or the project's admin user.
I would probably also make this a setting, so we'd have a default `pending_messages_timeout` value or something similar that could be modified if someone wants old messages to go out.
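One way the setting could be read, sketched in Python. The unit (days), the 14-day default, and the convention that an explicit null disables the timeout are all assumptions; only the name `pending_messages_timeout` comes from the comment above.

```python
from datetime import timedelta

# Hypothetical default, in days.
DEFAULT_PENDING_MESSAGES_TIMEOUT_DAYS = 14


def pending_timeout(settings):
    """Return the configured timeout as a timedelta, or None if disabled.

    `settings` is a plain dict standing in for the app settings document.
    A missing key falls back to the default; an explicit None means
    "never time out", so old messages still go out.
    """
    days = settings.get("pending_messages_timeout",
                        DEFAULT_PENDING_MESSAGES_TIMEOUT_DAYS)
    if days is None:
        return None
    if not isinstance(days, (int, float)) or days <= 0:
        raise ValueError("pending_messages_timeout must be a positive number of days")
    return timedelta(days=days)
```

A project that relies on long-lived scheduled reminders could then raise the value or disable the timeout without a code change.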
Out of scope for this issue, but it would also be nice if someone could see which messages are queued up before they plug in a gateway, or even disable all outgoing messages with a setting. I could also see value in editing pending messages so they don't get sent. There is currently no way in webapp to see or get statistics about which messages have been, will be, or are being sent. We have an API for this but aren't using it.
@mandric any thoughts on what would take responsibility for changing the state of a message?
Messages that are older than a week or two will still be very relevant, as we send out scheduled reminders to CHWs at least a week in advance for a lot of use cases and workflows. I would suggest waiting at least 2 weeks before flagging any pending messages as timed-out or failed.
@alxndrsn yeah probably sentinel, it has a "scheduler" that runs every 5 minutes that looks for messages that should be sent (set to pending). So probably makes sense to also have it do stuff for medic-gateway.
Unless you wanted all that logic in one server/place... I don't see a problem with medic-gateway having a standalone server either, and we would just route to it through an nginx config.
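A rough sketch of what the scheduled sweep could do on each run, in Python. The field names (`state`, `pending_since`, `state_changed`) are hypothetical, and the state name `timed-out-from-pending` is borrowed from the suggestion earlier in this thread; this is not sentinel's actual code.

```python
from datetime import datetime, timedelta, timezone


def expire_pending(messages, now, timeout):
    """Mark overdue pending messages as timed out and return their ids.

    Mutates each overdue message dict in place. The returned ids could be
    used to flag the timeouts to an admin user.
    """
    expired = []
    for message in messages:
        overdue = now - message["pending_since"] > timeout
        if message["state"] == "pending" and overdue:
            message["state"] = "timed-out-from-pending"
            message["state_changed"] = now
            expired.append(message["id"])
    return expired
```

Running the sweep is idempotent: a message that has already been transitioned is no longer `pending`, so a second pass leaves it alone.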
> I don't see a problem with medic-gateway having a standalone server
I can see the attraction to this - it seems like a logical division of responsibilities - but if we are currently separating APIs into medic-api and scheduled jobs into medic-sentinel then I think there's logic to that as well and I'd be inclined to stick with the current approach.
By dividing the logic into two places, I can see us getting into a situation where they need to be kept in sync. So when we add a feature or bugfix to medic-gateway we need to touch 3 repos and make sure they are deployed in sync, which seems like a pain.
I also see value in abstracting medic-gateway as much as possible from the medic-webapp family of deps. The only real dependency for medic-gateway currently should be an instance of couchdb. I know this gets into refactoring, as I typically do, but at this point that's all I really care about doing anyway. Basically I think if non-@medic humans can't parse a readme to get something set up to pass sms messages in and out of couchdb at a low level then things are broken, and that's the biggest fire I see, and one that's been burning for 5 years. Continuing to ignore that feels almost as bad in my soul as orgs that continue to ignore injustices that have been happening for centuries. Different scale obviously but same feeling. i.e. I am highly motivated to fix it.
I see what you mean, but the current medic data model for storing SMS is wholly unsuitable for abstract users, so I don't think medic-gateway having a dependency on couchdb makes much sense - better just to define a clean JSON API which the Android app expects the server to implement.
I also prefer having a clean JSON API that anyone can implement against. But we'd be maintaining the adapter for couchdb (and some legacy hacks for medic-webapp compatibility) since that's what we use. The ugly data model should be replaced with something much simpler, and it should all be decoupled so anyone can get something up and running quickly.
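To illustrate the decoupling being discussed, here is a small Python sketch of a JSON endpoint that only talks to a storage adapter. The JSON field names (`received`, `to_send`) and the `MessageStore` interface are made up for the example and are not the actual medic-gateway API; a couchdb-backed adapter would implement the same two methods.

```python
import json
from abc import ABC, abstractmethod


class MessageStore(ABC):
    """Hypothetical storage adapter; one implementation per backend."""

    @abstractmethod
    def save_incoming(self, message):
        """Persist a message received by the gateway."""

    @abstractmethod
    def fetch_outgoing(self):
        """Return and clear the messages waiting to be sent."""


class InMemoryStore(MessageStore):
    """Trivial adapter, handy for tests and for trying the API out."""

    def __init__(self):
        self.incoming = []
        self.outgoing = []

    def save_incoming(self, message):
        self.incoming.append(message)

    def fetch_outgoing(self):
        batch, self.outgoing = self.outgoing, []
        return batch


def handle_poll(store, request_body):
    """Handle one gateway poll: store received SMS, reply with SMS to send."""
    request = json.loads(request_body)
    for message in request.get("received", []):
        store.save_incoming(message)
    return json.dumps({"to_send": store.fetch_outgoing()})
```

The handler never touches the storage model directly, so the legacy medic-webapp hacks could live entirely inside one adapter.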
@mandric did you write a script this morning which can be used to solve this problem?
IMO _dummy send mode_ deals with the immediate need for tech leads to prevent old messages being sent out by a project.