Cht-core: If gateway is fixed/reconnected after a long time, very old messages may be sent out

Created on 1 Mar 2017 · 10 Comments · Source: medic/cht-core

Observed

If there is a problem in SMSSync/medic-gateway and SMS are stuck in the pending state on webapp for a long time, then when the gateway is finally fixed, very old messages will be sent.

Expected

Messages should be transitioned to a timed-out/failed state on webapp after they have been pending for too long (e.g. one week).

Recommendation

From a technical perspective, I would expect medic-sentinel to update messages from pending to timed-out-from-pending after a certain amount of time. A safer solution might be for the sms-gateway controller in medic-api to check how long a message has been pending before sending it. Perhaps a combination of both would be ideal.
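The age check described above could look roughly like the following. This is only a sketch under stated assumptions: the message shape (`state`, `pending_since`), the function name, and the one-week default are hypothetical and not taken from medic-api or medic-sentinel. Only the state name `timed-out-from-pending` comes from the recommendation itself.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold; the issue only suggests "e.g. one week".
PENDING_TIMEOUT = timedelta(weeks=1)


def expire_stale_pending(messages, now, timeout=PENDING_TIMEOUT):
    """Split pending messages into (to_send, timed_out).

    Messages that have been pending longer than `timeout` are moved to
    the 'timed-out-from-pending' state instead of being sent.
    Messages in any other state are ignored.
    """
    to_send, timed_out = [], []
    for msg in messages:
        if msg["state"] != "pending":
            continue
        if now - msg["pending_since"] > timeout:
            msg["state"] = "timed-out-from-pending"
            timed_out.append(msg)
        else:
            to_send.append(msg)
    return to_send, timed_out


now = datetime(2017, 3, 1, tzinfo=timezone.utc)
messages = [
    {"id": "a", "state": "pending", "pending_since": now - timedelta(days=10)},
    {"id": "b", "state": "pending", "pending_since": now - timedelta(days=1)},
    {"id": "c", "state": "sent", "pending_since": now - timedelta(days=30)},
]
to_send, timed_out = expire_stale_pending(messages, now)
```

The same function could serve both options: a scheduled job would persist the `timed_out` list, while a gateway controller would hand only `to_send` to the phone. The `timed_out` list is also what one would report to whoever needs to be notified.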

It would be ideal if message timeouts were flagged clearly to the tech lead, project manager, or the project's admin user.

Bug

alxndrsn

All 10 comments

I would probably also make this a setting? So we'd have a default pending_messages_timeout value or something and that could be modified if someone wants old messages to go out.
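A setting with a default might be resolved along these lines. This is a minimal sketch: the key `pending_messages_timeout` is the name floated in this comment, while the unit (days), the default of 7, and treating `0` or `None` as "never time out" are assumptions made for illustration.

```python
from datetime import timedelta

# Assumed default, in days; not an actual cht-core setting.
DEFAULT_PENDING_MESSAGES_TIMEOUT_DAYS = 7


def pending_timeout(settings):
    """Return the configured timeout as a timedelta.

    Returns None when the timeout is disabled (value of 0, negative, or
    None), meaning old messages are allowed to go out.
    """
    days = settings.get(
        "pending_messages_timeout", DEFAULT_PENDING_MESSAGES_TIMEOUT_DAYS
    )
    if days is None or days <= 0:
        return None
    return timedelta(days=days)
```

With this shape, a project that wants old messages to go out sets the value to `0`, and every other project gets the default without any configuration.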

Out of scope for this issue but would also be nice if someone could tell what messages are queued up before they plug in a gateway, or even disable all outgoing messages with a setting. Also I could see editing pending messages so they don't get sent. There is currently no way in webapp to see or get statistics about what messages have been/will be/are being sent. We have an api for this but aren't using it.

mandric on 6 Mar 2017

👍1

@mandric any thoughts on what would take responsibility for changing the state of a message?

alxndrsn on 7 Mar 2017

Messages that are older than a week or two will still be very relevant, as we send out scheduled reminders to CHWs at least a week in advance for a lot of use cases and workflows. I would suggest waiting at least 2 weeks before flagging any pending messages as timed-out or failed.

bishwasBhatta on 7 Mar 2017

@alxndrsn yeah probably sentinel, it has a "scheduler" that runs every 5 minutes that looks for messages that should be sent (set to pending). So probably makes sense to also have it do stuff for medic-gateway.

Unless you wanted all that logic in one server/place... I don't see a problem with having medic-gateway having a standalone server either and we would just route it through an nginx config.

mandric on 7 Mar 2017

👍1

"I don't see a problem with having medic-gateway having a standalone server"

I can see the attraction to this - it seems like a logical division of responsibilities - but if we are currently separating APIs into medic-api and scheduled jobs into medic-sentinel then I think there's logic to that as well and I'd be inclined to stick with the current approach.

alxndrsn on 8 Mar 2017

By dividing the logic into two places I can see us getting into a situation where they need to be kept in sync. So we add a feature or bugfix to medic-gateway and we need to touch 3 repos and make sure they are deployed in sync, which seems like a pain.

I also see value in abstracting medic-gateway as much as possible from the medic-webapp family of deps. The only real dependency for medic-gateway currently should be an instance of couchdb. I know this gets into refactoring, as I typically do, but at this point that's all I really care about doing anyway. Basically I think if non-@medic humans can't parse a readme to get something set up to pass sms messages in and out of couchdb on a low level then things are broken, and that's the biggest fire I see, and one that's been burning for 5 years. Continuing to ignore that feels almost as bad in my soul as orgs that continue to ignore injustices that have been happening for centuries. Different scale obviously but same feeling. i.e. I am highly motivated to fix it.

mandric on 8 Mar 2017

I see what you mean, but the current medic data model for storing SMS is wholly unsuitable for abstract users, so I don't think medic-gateway having a dependency on couchdb makes much sense - better just to define a clean JSON API which the Android app expects the server to implement.

alxndrsn on 11 Mar 2017

I also prefer having a clean JSON API, which anyone can implement around. But we'd be maintaining the adapter for couchdb (and some legacy hacks for medic-webapp compatibility) since that's what we use. The ugly data model should be replaced with something much simpler and it should all be decoupled so anyone can get something up and running quickly.
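One way to picture the decoupling discussed here is a small storage interface behind a JSON endpoint, with couchdb as just one adapter. The sketch below is illustrative only: the `MessageStore` interface, the `InMemoryStore` stand-in, the handler name, and the JSON field names are all hypothetical and are not claimed to match the real medic-gateway protocol.

```python
import json
from typing import Protocol


class MessageStore(Protocol):
    """Hypothetical storage interface; a couchdb adapter would implement it."""

    def save_incoming(self, message: dict) -> None: ...

    def next_outgoing(self) -> list: ...


class InMemoryStore:
    """Stand-in adapter so the handler can run without a database."""

    def __init__(self, outgoing=None):
        self.incoming = []
        self.outgoing = list(outgoing or [])

    def save_incoming(self, message):
        self.incoming.append(message)

    def next_outgoing(self):
        # Hand over the queued messages and clear the queue.
        batch, self.outgoing = self.outgoing, []
        return batch


def handle_gateway_request(store: MessageStore, body: str) -> str:
    """Accept messages received by the phone, reply with messages to send."""
    request = json.loads(body)
    for message in request.get("messages", []):
        store.save_incoming(message)
    return json.dumps({"messages": store.next_outgoing()})


store = InMemoryStore(outgoing=[{"id": "1", "to": "+111", "content": "hi"}])
request_body = json.dumps(
    {"messages": [{"id": "x", "from": "+222", "content": "yo"}]}
)
response = json.loads(handle_gateway_request(store, request_body))
```

The handler only knows about the interface, so anyone could swap in their own store, and the legacy medic-webapp data model would stay confined to a single adapter.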

mandric on 27 Mar 2017

@mandric did you write a script this morning which can be used to solve this problem?

alxndrsn on 12 Apr 2017

IMO _dummy send mode_ deals with the immediate need for tech leads to prevent old messages being sent out by a project.

alxndrsn on 19 Apr 2017
